---
name: metric-diagnosis
description: Use when investigating unexpected metric changes - systematically narrows root cause through 4D segmentation, intrinsic vs extrinsic factor analysis, hypothesis testing, and North Star impact assessment
---
# Metric Diagnosis Workflow
## Purpose
Systematically investigate why a metric changed unexpectedly (dropped or spiked) by segmenting data, distinguishing internal vs. external factors, and testing hypotheses to identify root cause. Prevents rushing to wrong conclusions and ensures evidence-based responses.
## When to Use This Workflow
Use this workflow when:
- Key metrics drop or spike unexpectedly
- Leadership asks "what happened to [metric]?"
- Post-mortem analysis needed after incident
- A/B test shows unexpected patterns
- Need to validate suspected root cause
- User behavior changed suddenly without obvious reason
## Skills Sequence
This workflow applies the three phases of the `root-cause-diagnosis` skill in sequence, then assesses North Star impact:
```
1. Root Cause Diagnosis (Phase 1)
↓ (Segment across 4 dimensions: People, Geography, Technology, Time)
2. Root Cause Diagnosis (Phase 2)
↓ (Distinguish intrinsic vs extrinsic factors)
3. Root Cause Diagnosis (Phase 3)
↓ (Build hypothesis table and test against data)
4. North Star Alignment
↓ (Assess impact on top-line metrics)
OUTPUT: Narrowed scope, tested hypotheses, identified root cause, recommended actions
```
## Required Inputs
Gather this information before starting:
### Metric Change Information
- **The metric that changed**
- Name and definition
- Example: "Weekly Active Users (WAU)"
- **Magnitude of change**
- Absolute and percentage
- Example: "10,000 → 9,000 (10% drop)"
- **Timeframe**
- When was it noticed?
- Over what period did change occur?
- Example: "Noticed today, change happened over past week"
### Data Access
- **Segmented data availability**
- Can you segment by user type, geography, platform?
- Are raw logs accessible?
- **Comparison data**
- Historical baselines
- Year-over-year seasonality
- Competitor benchmarks (if available)
### Recent Activity Log
- **Recent releases** (past 1-4 weeks)
- Feature launches
- Bug fixes
- Infrastructure changes
- **Running experiments**
- A/B tests active
- Configuration changes
- **External context**
- Marketing campaigns
- News events
- Competitor activity
## Workflow Steps
### Phase 0: Data Quality Verification (5 minutes)
**Before investigating, confirm the data is real:**
1. **Check reporting systems**
- Is tracking still functional?
- Any pipeline issues?
- Data freshness correct?
2. **Cross-reference sources**
- Do multiple data sources show same pattern?
- Internal dashboards vs. external tools aligned?
3. **Validate sample sizes**
- Sufficient data for statistical significance?
- Time periods comparable?
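A minimal sketch of the cross-source and sample-size checks in Python/pandas, assuming daily values for the metric can be pulled from two independent sources that share a date index (every name below is an assumption, not a required schema):

```python
import pandas as pd

def phase0_checks(primary: pd.Series, secondary: pd.Series, history: pd.Series) -> list[str]:
    """Return a list of data-quality flags to resolve before any root-cause work.

    primary / secondary: recent daily values from two independent sources (shared date index).
    history: a longer daily series from the primary source, used for variability context.
    All three inputs are hypothetical -- adapt to whatever your pipeline exposes.
    """
    flags = []

    # Cross-reference sources: do the two systems agree on recent days?
    gap = (primary.tail(7) - secondary.tail(7)).abs() / secondary.tail(7).clip(lower=1)
    if (gap > 0.10).any():
        flags.append("primary and secondary sources differ by >10% on recent days")

    # Sample size / comparability: is the drop large relative to normal week-over-week noise?
    wow = history.pct_change(7).dropna()
    z = (wow.iloc[-1] - wow.mean()) / wow.std()
    if abs(z) < 2:
        flags.append(f"latest change is within normal week-over-week variability (z = {z:.1f})")

    # One of the red flags listed below: a metric at exactly zero usually means broken tracking.
    if (primary.tail(7) == 0).any():
        flags.append("metric hit exactly zero in the past week -- check the tracking tag")

    return flags
```

If any flags come back, resolve them before treating the change as real.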
**Red flags that indicate data quality issues:**
- Metric went to exactly zero
- Change happened at exact midnight
- Only visible in one data source
- Coincides with known tracking deployment
**If data quality issue found:** Fix tracking first, then re-investigate if problem persists.
**If data is valid:** Proceed to Phase 1.
### Phase 1: Narrow the Scope (15-20 minutes)
**Use the `root-cause-diagnosis` skill - 4 Dimension Segmentation**
Ask clarifying questions across all four dimensions:
#### Dimension 1: People (User Segments)
Questions to ask:
- Does it affect all users equally?
- Specific demographics (age, gender, location)?
- New vs. returning users?
- Free vs. paid users?
- Power users vs. casual users?
- Specific cohorts or user types?
Request data segmentation by user attributes.
#### Dimension 2: Geography
Questions to ask:
- Specific countries or regions affected?
- Certain cities?
- Urban vs. rural?
- Climate zones?
- Time zones?
Request data segmentation by location.
#### Dimension 3: Technology (Platform)
Questions to ask:
- iOS vs. Android?
- Web vs. mobile app?
- Desktop vs. mobile web?
- Specific browser or OS versions?
- Device types?
- Network types (WiFi vs. cellular)?
Request data segmentation by platform/device.
#### Dimension 4: Time
Questions to ask:
- When exactly did it start?
- Gradual decline or sudden drop?
- Day of week patterns?
- Time of day patterns?
- Recurring or one-time?
- Seasonal effects?
Request time-series data with granular breakdown.
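A minimal sketch of the four-dimension breakdown with pandas, assuming a flat table with one row per active user per day; every file, column name, and date window below is an assumption used only to illustrate the shape of the request:

```python
import pandas as pd

# Hypothetical schema: one row per user per active day.
events = pd.read_parquet("daily_active_users.parquet")
# Columns assumed: date, user_id, user_segment, country, platform

dimensions = {
    "People": "user_segment",   # e.g. new vs. returning, free vs. paid
    "Geography": "country",
    "Technology": "platform",   # e.g. ios, android, web
}

# Example comparison windows: baseline week vs. the week the change was noticed.
baseline = events[events["date"].between("2024-08-18", "2024-08-24")]
current = events[events["date"].between("2024-08-25", "2024-08-31")]

for label, col in dimensions.items():
    before = baseline.groupby(col)["user_id"].nunique()
    after = current.groupby(col)["user_id"].nunique()
    change = ((after - before) / before).sort_values()
    print(f"\n{label} ({col}) -- % change vs. baseline week")
    print(change.to_string(float_format=lambda v: f"{v:+.1%}"))

# Time dimension: a daily series shows whether the change is gradual or a step.
daily = events.groupby("date")["user_id"].nunique()
print(daily.tail(21).to_string())
```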
**Output of Phase 1:**
Narrow problem statement from:
- ❌ "Usage is down"
To:
- ✓ "iOS usage among teens in US dropped 10% starting Sept 1st"
**Document:**
```markdown
## Narrowed Scope
**Original statement:** [Broad problem]
**Narrowed statement:** [Specific problem]
**Affected segments:**
- People: [Which user groups]
- Geography: [Which locations]
- Technology: [Which platforms]
- Time: [When and pattern]
**Unaffected segments:**
- [What stayed stable - equally important]
```
### Phase 2: Generate Hypotheses (15-20 minutes)
**Use the `root-cause-diagnosis` skill - Intrinsic vs. Extrinsic Factor Analysis**
Brainstorm potential causes in both categories:
#### Intrinsic Factors (Internal Changes)
**Engineering & Product:**
- Recent code releases (check deploy logs)
- Bug introductions
- Feature launches by any team
- A/B experiments started/stopped
- Algorithm or ranking changes
- Performance degradations
**Check with:**
- Engineering team (deploys, errors, performance)
- Other product teams (their launches)
- Data team (tracking changes)
**Operations & Business:**
- Pricing changes
- Policy updates
- Support process changes
- Marketing campaign changes
**Check with:**
- Operations team
- Marketing team
- Customer success team
#### Extrinsic Factors (External Events)
**Competition:**
- Competitor product launches
- Competitor pricing changes
- Competitor marketing campaigns
**Market:**
- Economic shifts
- Industry trends
- Regulatory changes
- Platform policy changes (iOS, Android, etc.)
**Calendar/Seasonal:**
- Holidays
- School schedules
- Weather patterns
- Cultural events
**News/PR:**
- News coverage (positive or negative)
- Social media trends
- External events affecting user behavior
**Check with:**
- Sales team (competitive intelligence)
- Customer success (user feedback)
- Marketing (market trends)
- Industry news sources
**Output of Phase 2:**
List of 4-6 hypotheses (mix of intrinsic and extrinsic):
```markdown
## Hypothesis List
**Intrinsic (Internal):**
1. [Hypothesis name]: [Description]
2. [Hypothesis name]: [Description]
3. [Hypothesis name]: [Description]
**Extrinsic (External):**
4. [Hypothesis name]: [Description]
5. [Hypothesis name]: [Description]
6. [Hypothesis name]: [Description]
```
### Phase 3: Test Hypotheses (20-30 minutes)
**Use the `root-cause-diagnosis` skill - Hypothesis Table Method**
#### Step 3.1: Create Hypothesis Table
**Columns:** Data points observed (from Phase 1)
**Rows:** Potential causes (from Phase 2)
**Fill each cell with predicted impact:**
- ↓ = Would decrease this segment
- ↑ = Would increase this segment
- → = No effect on this segment
- ? = Uncertain
#### Step 3.2: Match Patterns
For each hypothesis:
- Do predictions match observations?
- If ALL predictions match → Possible cause
- If ANY predictions contradict → Rule out
#### Step 3.3: Request Differentiating Data
If multiple hypotheses fit:
- Identify data points that would differ between them
- Request additional segmentation
- Narrow to single most likely cause
**Example Table** (after matching in Step 3.2; ✓ = the hypothesis's prediction matches the observed pattern for that segment, ✗ = it contradicts it):
| Cause | iOS ↓ | Android → | Teens ↓ | Adults → | US ↓ | EU → |
|-------|-------|-----------|---------|----------|------|------|
| iOS bug in latest update | ✓ | ✓ | ✓ | ✗ (would hit adults too) | ✓ | ✗ (would hit EU too) |
| School started (US only) | ✓ (US teens skew iOS) | ✓ | ✓ | ✓ | ✓ | ✓ |
| Competitor targeted teens | ✓ | ✗ (would hit Android teens too) | ✓ | ✓ | ✓ | ✗ (would hit EU teens too) |
**Pattern match:** "School started (US only)" is the only hypothesis with no contradictions → Most likely cause
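The matching step can also be kept honest with a few lines of code: encode each hypothesis's predicted direction per segment and compare it against the observed directions. A minimal sketch using the example above (the dictionaries are illustrative; any segments and hypotheses from Phases 1-2 work the same way):

```python
# Observed direction per segment from Phase 1: "down", "up", or "flat".
observations = {
    "ios": "down", "android": "flat",
    "teens": "down", "adults": "flat",
    "us": "down", "eu": "flat",
}

# Predicted direction per segment for each hypothesis from Phase 2.
hypotheses = {
    "iOS bug in latest update":  {"ios": "down", "android": "flat", "teens": "down",
                                  "adults": "down", "us": "down", "eu": "down"},
    "School started (US only)":  {"ios": "down", "android": "flat", "teens": "down",
                                  "adults": "flat", "us": "down", "eu": "flat"},
    "Competitor targeted teens": {"ios": "down", "android": "down", "teens": "down",
                                  "adults": "flat", "us": "down", "eu": "down"},
}

for name, predicted in hypotheses.items():
    contradictions = [seg for seg, direction in predicted.items()
                      if observations[seg] != direction]
    verdict = "possible cause" if not contradictions else f"ruled out ({', '.join(contradictions)})"
    print(f"{name}: {verdict}")
```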
**Output of Phase 3:**
```markdown
## Hypothesis Testing Results
**Ruled out:**
- [Hypothesis]: [Why ruled out]
- [Hypothesis]: [Why ruled out]
**Possible causes (remaining):**
1. [Most likely hypothesis]
- Evidence supporting: [List]
- Evidence against: [List if any]
- Confidence: [High/Medium/Low]
2. [Second most likely] (if applicable)
- Evidence supporting: [List]
- Confidence: [High/Medium/Low]
**Additional data needed to confirm:**
- [Data point that would differentiate remaining hypotheses]
```
### Phase 4: Assess North Star Impact (10 minutes)
**Use the `north-star-alignment` skill**
Evaluate whether identified root cause impacts company-level metrics:
**Questions:**
- Does this affect North Star metrics?
- Is impact temporary or permanent?
- Does it require immediate action?
- What's the business impact (revenue, growth, retention)?
**Severity classification:**
**Critical (Act immediately):**
- Impacts revenue directly
- Affects North Star metrics substantially
- Permanent degradation if not fixed
- Competitive disadvantage
**Important (Act soon):**
- Impacts user experience
- Affects intermediate metrics
- May worsen over time
- Needs monitoring
**Low Priority (Monitor):**
- Expected variation (seasonality)
- Affects non-critical segments
- Self-resolving
- No action needed
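If the rubric above needs to be applied consistently across many investigations, it can be reduced to a small decision function. A rough sketch; the field names and the exact precedence are illustrative, not part of the skill:

```python
from dataclasses import dataclass

@dataclass
class Impact:
    hits_revenue: bool          # direct revenue impact
    hits_north_star: bool       # substantial effect on a North Star metric
    permanent_if_ignored: bool  # degradation will not self-resolve
    expected_variation: bool    # e.g. seasonality, promotion ending

def classify_severity(impact: Impact) -> str:
    if impact.expected_variation and not impact.hits_revenue:
        return "Low Priority (Monitor)"
    if impact.hits_revenue or (impact.hits_north_star and impact.permanent_if_ignored):
        return "Critical (Act immediately)"
    if impact.hits_north_star or impact.permanent_if_ignored:
        return "Important (Act soon)"
    return "Low Priority (Monitor)"

# Example: the Teams download drop later in this document.
print(classify_severity(Impact(False, False, False, True)))  # -> Low Priority (Monitor)
```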
**Output of Phase 4:**
```markdown
## North Star Impact Assessment
**Affected North Star metrics:**
- [Metric]: [Impact description]
- [Metric]: [Impact description]
**Business impact:**
- Revenue: [Direct/Indirect/None]
- Growth: [Slowed/Accelerated/Neutral]
- Retention: [Harmed/Improved/Neutral]
**Severity:** [Critical/Important/Low Priority]
**Urgency:** [Immediate/This week/Monitor]
```
### Phase 5: Recommend Actions (10 minutes)
Based on root cause and severity, recommend next steps:
**Action categories:**
1. **Fix (for bugs/issues)**
- What needs to be fixed?
- Who owns the fix?
- Timeline for fix?
2. **Mitigate (for external factors)**
- What can reduce harm?
- Counter-measures available?
- Timeline for mitigation?
3. **Monitor (for expected variations)**
- What tracking is needed?
   - Alert thresholds? (see the sketch after this list)
- Review cadence?
4. **No Action (for acceptable changes)**
- Why no action needed?
- What validates this decision?
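For the Monitor category, alert thresholds can be anchored to the metric's own historical variability rather than a fixed percentage. A rough sketch, assuming a daily pandas series; the window and sigma values are illustrative:

```python
import pandas as pd

def should_alert(daily: pd.Series, window: int = 28, sigmas: float = 3.0) -> bool:
    """Alert when today's value falls outside the trailing mean +/- sigmas * std."""
    history, today = daily.iloc[-(window + 1):-1], daily.iloc[-1]
    lower = history.mean() - sigmas * history.std()
    upper = history.mean() + sigmas * history.std()
    return not (lower <= today <= upper)
```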
**Output document:**
```markdown
# Metric Diagnosis Report: [Metric Name]
## Executive Summary
[One paragraph: What happened, why, what to do]
## Problem Statement
- **Initial observation:** [Broad statement]
- **Narrowed scope:** [Specific statement with segments]
- **Magnitude:** [Absolute and % change]
- **Timeframe:** [When occurred]
## Root Cause Analysis
### Narrowed Scope (Phase 1)
- **Affected:** [User segments, geography, platforms, time pattern]
- **Unaffected:** [What stayed stable]
### Hypotheses Tested (Phase 2-3)
| Hypothesis | Category | Evidence | Conclusion |
|------------|----------|----------|------------|
| [Name] | Intrinsic/Extrinsic | [Supporting/Contradicting] | Ruled Out/Possible/Likely |
### Identified Root Cause
**Most likely cause:** [Hypothesis name]
**Evidence supporting:**
1. [Data point matching prediction]
2. [Data point matching prediction]
3. [Stakeholder confirmation]
**Confidence level:** [High/Medium/Low]
### North Star Impact (Phase 4)
- **Affected metrics:** [List with impact]
- **Severity:** [Critical/Important/Low]
- **Urgency:** [Immediate/Soon/Monitor]
## Recommended Actions
### Primary action:
**[Fix/Mitigate/Monitor/No Action]**
- What: [Specific action]
- Who: [Owner]
- When: [Timeline]
- Success criteria: [How to know it worked]
### Secondary actions:
- [Action 1]
- [Action 2]
### Monitoring plan:
- Metrics to track: [List]
- Review frequency: [Daily/Weekly]
- Alert thresholds: [When to escalate]
## Appendix
- Hypothesis table: [Detailed]
- Data sources: [Links]
- Stakeholders consulted: [Names/teams]
```
## Common Diagnostic Patterns
### Pattern 1: Seasonal Variation
- **Symptoms:** Recurring pattern (weekly, yearly)
- **Investigation:** Compare to historical same period
- **Action:** Usually monitor, adjust baselines
- **Example:** School year start, holiday season
### Pattern 2: Competitor Impact
- **Symptoms:** Sudden shift, specific demographic
- **Investigation:** Check competitor news, market research
- **Action:** Counter-strategy, feature parity
- **Example:** Competing app launch with targeted marketing
### Pattern 3: Internal Bug
- **Symptoms:** Sudden drop, platform-specific
- **Investigation:** Check recent releases, error logs
- **Action:** Hotfix, rollback if needed
- **Example:** iOS update broke key feature
### Pattern 4: Product Change Side Effect
- **Symptoms:** Gradual shift after release
- **Investigation:** Analyze feature adoption patterns
- **Action:** Iterate on feature, A/B test variations
- **Example:** Redesign changed user behavior
### Pattern 5: Tracking Issue (False Alarm)
- **Symptoms:** Suspiciously exact values (e.g., exactly zero), midnight timing, visible in only one source
- **Investigation:** Verify data pipeline
- **Action:** Fix tracking, dismiss false alarm
- **Example:** Analytics tag removed in deploy
## Common Mistakes
| Mistake | Fix |
|---------|-----|
| Jumping to conclusions without segmentation | Complete Phase 1 (4D segmentation) first |
| Only considering one type of factor | Brainstorm both intrinsic AND extrinsic |
| Accepting first matching hypothesis | Test all hypotheses systematically |
| Skipping data quality check | Always verify data is valid first |
| Not consulting stakeholders | Talk to sales, CS, engineering teams |
| Immediate rollback without diagnosis | Understand cause before acting |
## Success Criteria
Metric diagnosis succeeds when:
- Data quality verified
- Problem narrowed to specific segments (4 dimensions)
- 4-6 hypotheses brainstormed (intrinsic + extrinsic)
- Hypotheses tested against data systematically
- Root cause identified (or top 2-3 candidates with confidence levels)
- North Star impact assessed
- Recommended actions clear with owners and timelines
- Diagnosis document created and shared
- Stakeholders aligned on next steps
## Real-World Example: Microsoft Teams Downloads Drop
### Initial Report
"New app downloads down 50% from last week"
### Phase 1: Narrow Scope (15 min)
**Questions asked:**
- Geography? → "Predominantly US and Europe"
- User type? → "Enterprise customers, not consumers"
- Platform? → "Web and desktop primarily, some mobile"
- Other metrics? → "Usage unchanged (existing users still active)"
**Narrowed statement:**
"Enterprise customer downloads via web/desktop down 50% in US/Europe, starting last week. Existing user engagement unchanged."
### Phase 2: Generate Hypotheses (15 min)
**Intrinsic:**
1. Bug in download flow
2. Recent release broke something
3. A/B test reducing visibility
**Extrinsic:**
4. Marketing promotion ended
5. Competitor launch
6. Economic downturn affecting enterprise spending
**Stakeholder consults:**
- Engineering: No recent releases, no bugs reported
- Marketing: "We were running 50% off for enterprises switching from Slack"
- Sales: No competitor news, pipeline healthy
### Phase 3: Test Hypotheses (20 min)
(✓ = prediction matches the observation, ✗ = contradicts it)
| Hypothesis | Enterprise ↓ | Consumer → | US/EU ↓ | Usage unchanged | Downloads only |
|------------|--------------|------------|---------|-----------------|----------------|
| Bug in download flow | ✓ | ✗ (would hit consumers too) | ✗ (would hit all regions) | ✓ | ✓ |
| Promo ended | ✓ | ✓ | ✓ | ✓ | ✓ |
| Competitor launch | ✓ | ✗ (would hit consumers too) | ✗ (would hit all regions) | ? | ✓ |
**Pattern match:** "Promotion ended" fits all observations
**Confirmation:** Marketing confirms promotion ended last weekend
### Phase 4: North Star Impact (5 min)
**Root cause:** End of promotional campaign
- Downloads during promo = artificially inflated baseline
- Current downloads = return to normal baseline
**North Star impact:**
- MAU: No impact (existing users unaffected)
- Revenue: No impact (existing customers retained)
- Severity: Low (false alarm, not actual problem)
### Phase 5: Recommended Actions (5 min)
**Primary action: No action needed**
- Not an actual decline, just return to baseline
- Adjust baselines for future comparisons
**Secondary actions:**
- Update dashboards to flag promotional periods
- Establish "normal" baseline for future monitoring
- Document for future reference (prevent false alarms)
**Time to diagnosis: 60 minutes**
## Related Skills
This workflow orchestrates these skills:
- **root-cause-diagnosis** (Phases 1-3) - Core investigation method
- **north-star-alignment** (Phase 4) - Assess strategic impact
## Related Workflows
- **tradeoff-decision**: May rely on this diagnosis to understand why metrics changed before a decision is made
- **metrics-definition**: Applies a similarly systematic approach to defining metrics rather than diagnosing changes in them
## Time Estimate
**Total: roughly 75-95 minutes** (a straightforward case, like the example above, can close in about 60)
- Phase 0 (Data quality): 5 min
- Phase 1 (Narrow scope): 15-20 min
- Phase 2 (Hypotheses): 15-20 min
- Phase 3 (Test): 20-30 min
- Phase 4 (Impact): 10 min
- Phase 5 (Actions): 10 min
Complex issues may require multiple iterations or a longer investigation.