📅 2025-10-20 15:00
🕒 Reading time: 37 min
🏷️ Measurement 🏷️ Quantification 🏷️ BOM 🏷️ Learning 🏷️ 【🔒Classified File】
Detective's Memo: "This article was good," "Today was tough," "The customer is satisfied": business overflows with qualitative expressions. Many people write these off as "unmeasurable sensations," but true detectives find hidden codes within them. The key to quantification lies in building a shared measurement standard called "the Baseline of Measurement (BOM)." Three tools support it: the adverb anchor method, which grades feelings on a 0-10 scale; the Fibonacci sequence, which automatically reflects uncertainty in large workload estimates; and, most importantly, team-common standards rather than global ones. In practice that means the implicit agreement that "quite good" equals 8 points, the common understanding that "the easiest task" equals 1, and the process of measuring and adjusting deviations in the BOM among members. Converting qualitative data into quantitative data makes it visible, shareable, and improvable. This is the measurement philosophy that creates reproducibility.
"The Baseline of Measurement (BOM)," formally described to clients as "a method of quantifying qualitative data by establishing and sharing a standard baseline for measurement," covers an area where many business professionals still settle for "sensation," "intuition," or "somehow." While physics has clear unit definitions like 1 meter, 1 second, and 1 kilogram, business concepts like "difficulty," "satisfaction," and "quality" lack standard measurement baselines. However, this investigation revealed that global definitions are unnecessary: as long as the BOM is shared within a project team, qualitative data can be quantified, visualized, compared, and improved.
Investigation Memo: Why is "satisfied" insufficient and "satisfaction 8/10" necessary? Why does the Fibonacci sequence (1, 2, 3, 5, 8, 13...) suit large workload estimates better than an equal-interval scale (1, 2, 3, 4...)? And most importantly, why is aligning "the Baseline of Measurement" within a team the key to project success? We need to clarify this foundational measurement technique, which enables the "Record" step of the RCD Model and makes single indicators like NPS work.
Basic Evidence: Quantification realizes visualization
Four Problems with the Invisible:
Problem 1: Cannot Share
When told "feels good"
→ Unclear how good
→ Cannot convey to others
Problem 2: Cannot Compare
Last week: "was good"
This week: "was good"
→ Unclear which was better
→ Cannot judge improvement
Problem 3: Cannot Improve
"Let's improve quality"
→ Current state not quantified
→ Cannot set goals
→ Cannot judge achievement
Problem 4: Cannot Reproduce
"It went well that time"
→ No record of what was good, or how good
→ Cannot reproduce same success
Four Powers Brought by Quantification:
Power 1: Shareability
"Satisfaction 8/10"
→ Concrete number
→ Conveys to others
→ Forms common understanding
Power 2: Comparability
Last week: "Satisfaction 6/10"
This week: "Satisfaction 8/10"
→ +2 point improvement
→ Clear progress
Power 3: Improvability
Current: "Quality 7/10"
Goal: "Quality 9/10"
→ Aim for +2 point improvement
→ Can plan measures and measure effects
Power 4: Reproducibility
Success case: "Difficulty 5, 8 hours work, satisfaction 9/10"
→ Can aim for success with same conditions next time
→ Pattern recognition, creating laws
Evidence Analysis: The essence of measurement is "making the invisible visible." Through quantification, subjective sensations become objective data, and individual experiences become team assets.
Qualitative Data:
Characteristics:
- Expressed in words or descriptions
- Qualitative properties
- Subjective interpretation
- Rich nuances
Examples:
- "This article is very good"
- "Customer is satisfied"
- "Team atmosphere is bad"
- "Design is sophisticated"
Advantages:
- Understand context and background
- Preserve emotions and nuances
- Gain deep insights
Disadvantages:
- Interpretations vary by person
- Difficult to compare
- Hard to aggregate and analyze
- Low objectivity
Quantitative Data:
Characteristics:
- Expressed in numbers
- Quantitative properties
- Objective measurement
- Clear comparability
Examples:
- "This article, 9/10 points"
- "Customer satisfaction NPS +40"
- "Team atmosphere 3/10"
- "Design quality 8/10"
Advantages:
- Clear and objective
- Can compare and aggregate
- Can statistically analyze
- High reproducibility
Disadvantages:
- Nuances are lost
- Context is omitted
- "Meaning" of numbers can become unclear
Essential Insight:
Not a binary opposition of qualitative vs quantitative → "Convert" qualitative to quantitative to gain benefits of both
Method:
1. Preserve qualitative richness
2. Acquire quantitative measurability
3. This is the role of "the Baseline of Measurement (BOM)"
Investigation Finding 1: Creating Rulers with Anchor Point Method
Bad Example (vague):
"Today's condition"
→ Condition of what?
→ Work? Health? Mood?
→ Unmeasurable
Good Example (clear):
"Today's writing productivity"
→ Clear what is being measured
→ Measurable
Principle: Fix with concrete examples
❌ Abstract anchor:
"The easiest state"
→ Different imagination per person
✅ Concrete anchor:
"Reply to 1 email (5 minutes)"
→ Same image for everyone
Practice Examples:
Minimum anchor for writing productivity:
- "Cannot write even 500 words in 1 hour" = 1 point
- Concrete situation: cannot concentrate at all, keep rewriting
Minimum anchor for task difficulty:
- "Fix 1 typo" = 1 point
- Concrete work: open file, fix, and commit
Principle: Maximum within one's experience range
❌ Unrealistic anchor:
"Most difficult in human history"
→ Zero practicality
✅ Realistic anchor:
"Most difficult work I've ever done"
→ Based on actual experience
→ Can recall
Practice Examples:
Maximum anchor for writing productivity:
- "Wrote 3,000 words in 1 hour" = 10 points
- Concrete situation: complete concentration, clear structure, materials ready
Maximum anchor for task difficulty:
- "Writing a super-large article at the X040_NPS level" = 13 points
- Concrete work: 10,000+ words, multiple case studies, including translation, 8+ hours
For more precise measurement:
Writing productivity scale:
1 point: Less than 500 words per hour (minimum)
3 points: 1,000 words per hour (poor)
5 points: 1,500 words per hour (average)
7 points: 2,000 words per hour (good)
9 points: 2,500 words per hour (excellent)
10 points: 3,000 words per hour (maximum)
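The scale above can be turned into a small lookup. What follows is a minimal Python sketch, not the agency's actual tooling: the names `ANCHORS` and `productivity_score` are hypothetical, and snapping to the highest anchor reached (so 700 words per hour still scores 1) is an assumption, because the scale leaves the in-between scores undefined.

```python
from bisect import bisect_right

# (words per hour, score) pairs copied from the scale above.
# The first entry starts at 0 so that anything below 1,000 words/hour
# falls back to the minimum anchor; this is an assumption of the sketch.
ANCHORS = [(0, 1), (1000, 3), (1500, 5), (2000, 7), (2500, 9), (3000, 10)]


def productivity_score(words_per_hour: int) -> int:
    """Return the score of the highest anchor that words_per_hour reaches."""
    if words_per_hour < 0:
        raise ValueError("words_per_hour must be non-negative")
    thresholds = [threshold for threshold, _ in ANCHORS]
    # bisect_right gives the count of thresholds <= words_per_hour.
    index = bisect_right(thresholds, words_per_hour) - 1
    return ANCHORS[index][1]
```

A team that wants the even scores (2, 4, 6, 8) could add its own anchors to the list without touching the function.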
Investigation Finding 2: Gradation Method Using Adverbs
Method established in psychology:
Attitude measurement method developed by Rensis Likert (1932):
- Measures degree of agreement with statements in 5 or 7 levels
- Clarifies each level with adverbs
- Standard method used in surveys worldwide
0-10 Scale (11 levels, most practical):
0: Worst
1: Very bad
2: Bad
3: Somewhat bad
4: Below average
5: Average
6: Somewhat good
7: Good
8: Quite good
9: Very good
10: Perfect
Intensity Adverbs:
High intensity → Low intensity
Extremely > Very > Quite > Somewhat > Slightly > Hardly > Not at all
Examples:
"Extremely good" = 10 points
"Very good" = 9 points
"Quite good" = 8 points
"Somewhat good" = 6 points
"Slightly good" = 4 points
"Hardly good" = 2 points
"Not at all good" = 0 points
Scenario: Evaluating article writing satisfaction
Step 1: Qualitative sensation
"The article I wrote today is quite good"
Step 2: Identifying adverb
"Quite good" = High degree but not maximum
Step 3: Converting to number
"Quite good" = 8/10 points
Reason:
- Not as much as "very good" (9 points)
- Above "good" (7 points)
→ 8 points is appropriate
Step 4: Recording
2025-10-20 | X041 article writing | Satisfaction 8/10
Reason: Clear structure and readable, but feels like one example is missing
Step 5: Utilizing for next time
Next goal: Satisfaction 9/10
Improvement: Increase examples to 3
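Steps 2 and 3 amount to a lookup in the adverb table. Here is a minimal sketch in Python, assuming phrases of the form "<adverb> good"; the names `ADVERB_SCORES` and `score_phrase` are hypothetical, and the fallback of 7 for an unmodified "good" follows the reasoning in Step 3.

```python
# Adverb-to-score table copied from the examples above.
ADVERB_SCORES = {
    "extremely": 10,
    "very": 9,
    "quite": 8,
    "somewhat": 6,
    "slightly": 4,
    "hardly": 2,
    "not at all": 0,
}


def score_phrase(phrase: str, unmodified_score: int = 7) -> int:
    """Convert a phrase such as "Quite good" into a 0-10 score."""
    text = phrase.lower().strip()
    # Try longer adverbs first so "not at all" is matched as one unit.
    for adverb in sorted(ADVERB_SCORES, key=len, reverse=True):
        if text.startswith(adverb + " "):
            return ADVERB_SCORES[adverb]
    # No intensity adverb found: treat it as a plain "good".
    return unmodified_score
```

The table is the team's agreement, not a universal truth, so each team would replace the numbers with its own.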
Investigation Finding 3: Workload Estimation Using Fibonacci Sequence
Problems with Equal-Interval Scale (1,2,3,4,5,6,7,8,9,10):
Small tasks:
1 hour and 2 hours → Difference is clear
2 hours and 3 hours → Difference is clear
Large tasks:
7 hours and 8 hours → Difference is ambiguous
8 hours and 9 hours → Difference is ambiguous
10 hours and 11 hours → Almost indistinguishable
Problem:
Equal-interval scale creates illusion of "constant precision"
→ Actually estimates vary more for larger values
→ Leads to overconfidence
Human Cognitive Limitations:
Weber-Fechner Law (Weber 1834, Fechner 1860):
Perceived sensation ∝ Logarithm of stimulus intensity
Meaning:
- 1 and 2 can be clearly distinguished
- 100 and 101 are difficult to distinguish
- Humans perceive logarithmically
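As a worked equation, Fechner's form of the law is commonly written as below, where $S$ is the perceived sensation, $I$ the stimulus intensity, $I_0$ the detection threshold, and $k$ a constant that depends on the sense involved. Applying it to the two comparisons above shows how different they are:

```latex
\begin{align*}
S &= k \ln\frac{I}{I_0} \\
\Delta S &= S(I_2) - S(I_1) = k \ln\frac{I_2}{I_1} \\
\Delta S_{1 \to 2} &= k \ln 2 \approx 0.693\,k \\
\Delta S_{100 \to 101} &= k \ln 1.01 \approx 0.00995\,k
\end{align*}
```

The perceived step depends on the ratio of the two values, not on their difference, so a scale whose steps keep a roughly constant ratio stays distinguishable at large values.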
Fibonacci Sequence:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89...
Definition:
F(n) = F(n-1) + F(n-2)
F(1) = 1, F(2) = 1
Relationship with Golden Ratio:
Ratio of adjacent numbers:
1/1 = 1.0
2/1 = 2.0
3/2 = 1.5
5/3 = 1.666...
8/5 = 1.6
13/8 = 1.625
21/13 = 1.615...
→ Converges to Golden Ratio φ ≈ 1.618
Meaning:
Each step increases by about 1.6 times
→ A ratio humans can clearly feel as "different"
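The convergence can be checked directly. This is a small illustrative sketch in Python; the function names `fibonacci` and `adjacent_ratios` are hypothetical helpers introduced here, not part of any library.

```python
def fibonacci(count: int) -> list[int]:
    """Return the first `count` Fibonacci numbers, starting 1, 1."""
    sequence = []
    current, following = 1, 1
    for _ in range(count):
        sequence.append(current)
        current, following = following, current + following
    return sequence


def adjacent_ratios(sequence: list[int]) -> list[float]:
    """Return each number divided by the one before it."""
    return [later / earlier for earlier, later in zip(sequence, sequence[1:])]


# Closed form of the Golden Ratio, for comparison with the ratios.
GOLDEN_RATIO = (1 + 5 ** 0.5) / 2

numbers = fibonacci(12)
for value, ratio in zip(numbers[1:], adjacent_ratios(numbers)):
    print(f"{value:>4}  ratio to previous: {ratio:.3f}")
```

After the first few terms the printed ratios settle around 1.618, which is the constant step the estimation scale relies on.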
ROI Detective Agency Implementation:
【1 Point】= Standard unit
- Typo correction
- Link addition
- Minor updates
Actual: 5-10 minutes
【2 Points】
- Partial rewrite of existing article
- Image replacement
Actual: 15-30 minutes
【3 Points】
- Small new article
- Template reuse
Actual: 30-60 minutes
【5 Points】
- Medium-scale original article
- Some research needed
Actual: 1.5-3 hours
Buffer: ±50%
【8 Points】
- Large article
- Multiple case studies needed
Actual: 3-6 hours
Buffer: ±100%
【13 Points】
- Super-large article (X040_NPS level)
- Large-scale research, including translation
Actual: 6-12 hours
Buffer: ±100-150%
【21 Points or more】
→ Task decomposition required
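The point table above can be kept as data so that plans are expressed as ranges rather than single numbers. The following Python sketch assumes the hour ranges listed above; the names `POINT_HOURS`, `expected_hours`, and `plan_range` are hypothetical.

```python
# Actual time range per point value, in hours, copied from the table above.
POINT_HOURS = {
    1: (5 / 60, 10 / 60),
    2: (0.25, 0.5),
    3: (0.5, 1.0),
    5: (1.5, 3.0),
    8: (3.0, 6.0),
    13: (6.0, 12.0),
}
MAX_POINTS = 13  # 21 points or more must be decomposed first


def expected_hours(points: int) -> tuple[float, float]:
    """Return the (low, high) hour range for one task."""
    if points > MAX_POINTS:
        raise ValueError(f"{points} points: decompose the task before estimating")
    if points not in POINT_HOURS:
        raise ValueError(f"{points} is not a card on this scale")
    return POINT_HOURS[points]


def plan_range(task_points: list[int]) -> tuple[float, float]:
    """Sum the low and high bounds over a list of tasks."""
    ranges = [expected_hours(points) for points in task_points]
    return sum(low for low, _ in ranges), sum(high for _, high in ranges)
```

For example, `plan_range([5, 8])` gives a range of 4.5 to 9.0 hours, which keeps the uncertainty visible instead of collapsing it into one figure.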
Reason 1: Automatic Reflection of Uncertainty
Equal-interval scale:
Small task: 2 hour estimate
Large task: 8 hour estimate
→ Only a "4x difference"
→ Insufficient buffer for large task
Fibonacci scale:
Small task: 2 points
Large task: 13 points
→ "6.5x difference"
→ Automatic buffer for large task
Reason 2: Preventing Overconfidence
Equal interval: "This task 8 hours"
→ Feels precisely estimated
→ Actually 5-12 hours
Fibonacci: "This task 8 points"
→ Recognizes "there's a range"
→ Avoids overconfidence in the estimate
Reason 3: Promoting Decomposition
Task exceeds 13 points
→ Warning that "estimate is too rough"
→ Should break down smaller
→ Reduces failure risk
Investigation Finding 4: Why Global Standards are Unnecessary and Team Standards Sufficient
Reason 1: Cultural and Language Differences
Example from NPS:
Japanese respondents (typical tendency):
Even feeling "very good" → Modestly 8 points
Reason: Perceives "perfect" as 10, perfection is impossible
American respondents (typical tendency):
Feeling "very good" → Frankly 10 points
Reason: Culture of positive expression
Result:
2-point difference for same satisfaction
→ Comparison without cultural adjustment is meaningless
Reason 2: Experience Level Differences
Junior engineer:
"This bug fix, difficult" = 8 points
Reason: First time seeing error, anxious
Senior engineer:
"This bug, easy" = 2 points
Reason: Dealt with many times, understands pattern
Same task but BOM differs by experience
Practical Purposes:
What's needed in projects:
✅ Smooth communication within team
✅ Appropriate task allocation
✅ Accurate progress tracking
✅ Mutual understanding among members
All achievable with "common BOM within team"
❌ Comparison with global standards
❌ Absolute benchmarking with other companies
❌ Pursuit of universal truth
These are mostly unnecessary in practice
Why Focus on "Deviations":
Pattern where many teams fail:
1. Each estimates arbitrarily
2. Don't notice deviations
3. Schedule and actual progress diverge
4. Blame game over "Why was it delayed?"
5. Trust deteriorates
Successful team pattern:
1. Each estimates
2. Measure deviations ← Important here
3. Discuss why deviated
4. Adjust BOM
5. Deviations decrease next time
6. Trust relationship improves
Practice: Deviation Measurement Process
Scenario: Estimating new article creation
【Initial Estimation】
Task: Business framework article creation
Director: "8 points"
Claude: "5 points"
Deviation: 3 points (the Director's estimate is 60% higher!)
【Dialogue Phase】
Director: "Why did you think 5 points?"
Claude: "I thought that if we referenced the existing X039_HEART article,
we could reuse its structure.
If it's mainly rewriting, I figured 5 points."
Director: "I see. But this time we need case studies from 3 companies,
and the interviews and literature research will take time.
We also need a completely new structure."
Claude: "If it includes research and a new structure,
it's certainly 8 points. Understood."
【Learning/Adjustment】
Claude's improvement from next time:
- Confirm "presence of research" when receiving task
- Distinguish between "rewrite" and "new creation"
- Confirm number of examples before estimating
Director's improvement from next time:
- Clearly indicate research scope when explaining task
- Clearly distinguish "new" and "rewrite"
- Share prerequisites first
Result:
Next similar task, deviation converges within 1 point
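The deviation in the scenario above is simple arithmetic. Here is a minimal Python sketch, assuming the percentage is taken relative to the lower estimate (which is how 8 versus 5 comes out at 60%); the name `estimate_deviation` is hypothetical.

```python
def estimate_deviation(estimate_a: int, estimate_b: int) -> tuple[int, float]:
    """Return (gap in points, gap as a percentage of the lower estimate)."""
    if estimate_a <= 0 or estimate_b <= 0:
        raise ValueError("estimates must be positive point values")
    gap = abs(estimate_a - estimate_b)
    lower = min(estimate_a, estimate_b)
    # Multiply before dividing to keep the arithmetic exact for whole points.
    return gap, gap * 100 / lower


gap, percent = estimate_deviation(8, 5)
print(f"Deviation: {gap} points ({percent:.0f}% above the lower estimate)")
```

Logging this pair for every task gives the team a running record of whether its deviations are shrinking.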
Technique 1: Setting Reference Tasks
ROI Detective Agency reference tasks (agreed by all):
【1 Point Standard】
Task: Fix 1 typo
Actual: 5 minutes
Everyone's recognition: "This is definitely 1"
【3 Points Standard】
Task: Partial rewrite of existing article (about 500 words)
Actual: 30-60 minutes
Everyone's recognition: "This is 3"
【8 Points Standard】
Task: New large article (X039_HEART level)
Actual: 4-6 hours
Everyone's recognition: "This is 8"
Usage:
New task → Compare with reference tasks → Estimate relatively
Technique 2: Planning Poker Method
Established method in agile development:
【Rules】
1. Listen to explanation of new task
2. Each estimates silently (Fibonacci cards)
3. Reveal numbers simultaneously on cue
4. Highest and lowest explain reasons
5. Discuss and re-estimate
6. Repeat until convergence
【Example】
Task: Create new framework article
Round 1:
Director: 8
Gemini: 8
Claude: 5
ChatGPT: 3
Discussion:
ChatGPT(3): "If this is mainly a new planning idea, I'd try it at 3"
Director(8): "But this time it's a systematic explanatory article, so research is needed"
Claude(5): "I thought 5 if a template is available, but it's 8 if research is included"
Round 2:
Director: 8
Gemini: 8
Claude: 8
ChatGPT: 5
Discussion:
ChatGPT: "If it's mainly research and writing rather than planning, I agree on 8"
Final agreement: 8 points
Effects:
- Confirm prerequisites through dialogue
- Discover and adjust BOM deviations
- Deepens team mutual understanding
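One reveal in the Planning Poker rules above can be summarized mechanically: check the cards, see whether everyone agrees, and pick out who explains. This is a minimal Python sketch; `FIBONACCI_CARDS` and `review_round` are hypothetical names, and the exact deck is an assumption.

```python
# Cards available to each estimator; the deck contents are an assumption.
FIBONACCI_CARDS = (1, 2, 3, 5, 8, 13, 21)


def review_round(votes: dict[str, int]) -> dict[str, object]:
    """Summarize one simultaneous reveal of estimates."""
    for name, card in votes.items():
        if card not in FIBONACCI_CARDS:
            raise ValueError(f"{name} played {card}, which is not in the deck")
    lowest, highest = min(votes.values()), max(votes.values())
    return {
        # Rule 6: stop once every card shows the same number.
        "converged": lowest == highest,
        # Rule 4: the lowest and highest estimators explain their reasons.
        "explain_low": sorted(n for n, c in votes.items() if c == lowest),
        "explain_high": sorted(n for n, c in votes.items() if c == highest),
    }


round_1 = {"Director": 8, "Gemini": 8, "Claude": 5, "ChatGPT": 3}
print(review_round(round_1))
```

In the example above, Round 2 would still report `converged` as false; the final 8 is reached through discussion rather than through another reveal.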
Technique 3: Calibration Meeting
Frequency: Monthly (weekly in early project)
Agenda:
1. Review last month's tasks (15 min)
- Compare estimates and actuals
- Pick up tasks with large deviations
2. Discuss deviation causes (30 min)
Example:
"Estimated article A as 3 points but actually 8 hours (13 points equivalent)"
→ Why deviated?
→ Research scope was 2x expected
→ Next time confirm research scope beforehand
3. Reconfirm BOM (15 min)
- Review reference tasks
- Add new reference tasks
- Realign everyone's recognition
4. Share success stories (10 min)
- Tasks where estimates were accurate
- Why were they accurate
- Horizontal deployment of best practices
Effects:
- Continuous precision improvement
- Team measurement skill improvement
- Learning culture from failures
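The first agenda item, comparing estimates with actuals, can be prepared before the meeting. The Python sketch below converts actual hours back into points and flags large gaps. It is an illustration under assumptions: the upper bounds come from the agency's point table, hours that fall between two ranges go to the next card up, and a gap of two cards counts as "large". All names are hypothetical.

```python
DECK = [1, 2, 3, 5, 8, 13, 21]
# Upper bound in hours for each card up to 13; anything longer maps to 21.
UPPER_HOURS = [10 / 60, 0.5, 1.0, 3.0, 6.0, 12.0]


def points_equivalent(actual_hours: float) -> int:
    """Map actual hours worked to the smallest card whose range covers them."""
    for card, upper in zip(DECK, UPPER_HOURS):
        if actual_hours <= upper:
            return card
    return DECK[-1]


def large_deviations(log, min_steps: int = 2):
    """Return (task, estimated, actual points, steps apart), biggest gap first.

    `log` holds (task name, estimated points, actual hours) tuples.
    """
    flagged = []
    for name, estimated, actual_hours in log:
        actual_points = points_equivalent(actual_hours)
        steps = abs(DECK.index(actual_points) - DECK.index(estimated))
        if steps >= min_steps:
            flagged.append((name, estimated, actual_points, steps))
    return sorted(flagged, key=lambda row: row[3], reverse=True)
```

The flagged list gives the meeting its discussion items; the causes still have to come from the dialogue.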
Part 2 covers:
- Power of Measurement
- Limitations and Cautions
- Applications and Integration
- Practical Tools
- Philosophy of Measurement
- Future Prospects
- Conclusion
【Part 1 Complete】
Warning File 1: Four Transformations Created by Visualization
Transformation 1: From Subjective to Objective
Before:
"Today's condition was good"
→ Individual sensation
→ Cannot convey to others
→ Not recorded
After:
"Today's productivity: 8/10"
→ Objective indicator
→ Shareable with team
→ Accumulated as data
Transformation 2: From Ambiguous to Clear
Before:
"This task seems difficult"
→ How difficult?
→ Schedule unclear
→ Cannot allocate resources
After:
"This task, 13 points"
→ Range of 6-12 hours
→ Schedule possible
→ Appropriate staffing
Transformation 3: From Past to Future
Before:
"It went well last time"
→ Unclear why went well
→ Cannot reproduce
After:
"Last time: difficulty 5, 8 hours work, satisfaction 9/10"
→ Success pattern clear
→ Can aim for success with same conditions next time
Transformation 4: From Individual to Team
Before:
Each judges arbitrarily
→ Recognition scattered
→ Difficult collaboration
After:
Share the BOM
→ Converse in common language
→ Smooth collaboration
Warning File 2: Compound Effect Brought by Measurement
Data Compound Effect:
Month 1: 10 tasks recorded
→ Trends vaguely visible
Month 3: 30 tasks recorded
→ Patterns becoming visible
Month 6: 60 tasks recorded
→ Can predict with confidence
Month 12: 120 tasks recorded
→ High-precision estimates become natural
Effect:
Quality of insights improves over time
→ Prediction accuracy improves
→ Project success rate increases
Skill Compound Effect:
Initially: Estimate by trial and error
↓
After 1 month: Own patterns become visible
↓
After 3 months: Team BOM aligns
↓
After 6 months: High-probability accurate estimates
↓
After 12 months: Almost certainly successful project plans
Warning File 3: Integration with RCD Model
Measurement is prerequisite for recording:
Record:
❌ "Today was good" alone has low recording value
✅ "Productivity 8/10, satisfaction 9/10" enables analysis
Check:
Because there's measurement data:
- Can analyze trends
- Can discover patterns
- Common points become visible
Do:
Based on measurement data:
- Plan improvement measures
- Measure effects
- Further improve
Warning File 4: Why NPS Functions
Essence of NPS = Establishing BOM:
Question: "How likely to recommend?"
0-10 point scale
Global common BOM:
- 0-6: Detractors (dissatisfied)
- 7-8: Passives (satisfied but won't recommend)
- 9-10: Promoters (enthusiastic fans)
Why this works:
✅ Measurable with single question
✅ Globally comparable
✅ Can track year-over-year changes
✅ Correlates with behavioral prediction
→ Successful example of globalizing BOM
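The three bands above feed the standard NPS calculation: the percentage of promoters minus the percentage of detractors. Here is a minimal Python sketch; the function name `net_promoter_score` and the sample responses are hypothetical.

```python
def net_promoter_score(scores: list[int]) -> float:
    """Return NPS: percentage of promoters minus percentage of detractors."""
    if not scores:
        raise ValueError("at least one response is required")
    if any(score < 0 or score > 10 for score in scores):
        raise ValueError("responses must be on the 0-10 scale")
    promoters = sum(1 for score in scores if score >= 9)   # 9-10
    detractors = sum(1 for score in scores if score <= 6)  # 0-6
    # Passives (7-8) count toward the total but toward neither group.
    return (promoters - detractors) * 100 / len(scores)


responses = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
print(f"NPS: {net_promoter_score(responses):+.0f}")
```

With these ten responses there are five promoters and three detractors, so the result is +20.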
Warning File 1: Trap of Perfectionism
Trap Structure:
Pattern 1: Perfectionism before measurement
"Let's establish correct measurement method before starting"
→ Keep researching perfect method
→ Never start measuring
→ Data doesn't accumulate
Correct approach:
"Start with 60-point measurement method"
→ Measure while improving
→ Data accumulates
→ Precision improves
Countermeasure:
Done is better than perfect → Continuous measurement rather than perfect measurement
Warning File 2: Loss of Richness Through Quantification
Problem:
Qualitative: "Claude's writing touched my heart"
→ Rich nuance
→ Quality of emotion
Quantitative: "Writing quality: 9/10"
→ Only number
→ Unclear why good
Solution: Hybrid Approach
Combining Quantitative + Qualitative:
Recording example:
Date: 2025-10-20
Task: X041 article creation
Satisfaction: 8/10
Reason (qualitative):
"Fibonacci sequence explanation written clearly.
Many practical examples of adverb anchor method too. However,
team standardization section became somewhat long.
Want to summarize more concisely next time"
→ Comparable with numbers + Understandable with context
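The hybrid record above pairs a number with its reason. A minimal Python sketch of such a record follows, assuming a 0-10 score and a mandatory reason; the class name `MeasurementRecord` and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MeasurementRecord:
    day: date
    task: str
    metric: str
    score: int   # quantitative part, 0-10
    reason: str  # qualitative part, kept alongside the number

    def __post_init__(self) -> None:
        if not 0 <= self.score <= 10:
            raise ValueError("score must be on the 0-10 scale")
        if not self.reason.strip():
            raise ValueError("a reason is required so the number keeps its context")

    def summary(self) -> str:
        """One-line form for logs and comparison."""
        return f"{self.day.isoformat()} | {self.task} | {self.metric} {self.score}/10"


record = MeasurementRecord(
    day=date(2025, 10, 20),
    task="X041 article creation",
    metric="Satisfaction",
    score=8,
    reason="Fibonacci explanation is clear; team standardization section ran long.",
)
print(record.summary())
```

Refusing an empty reason is a design choice of this sketch: it keeps the number from being recorded without its context.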
Warning File 3: Measurement for Measurement's Sake
Putting Cart Before Horse Pattern:
❌ "Let's increase KPIs"
→ 50 measurement items
→ Takes too much time to measure
→ Can't do actual work
→ Measurement becomes purpose
✅ "Measure only 3 most important"
→ Productivity, quality, satisfaction
→ Measurement time: 1 minute/day
→ Sustainable
Countermeasure:
Principle of measurement cost:
Value of measurement > Cost of measurement
High-value measurement:
- Directly leads to decision-making
- Leads to improvement actions
- Shared and discussed in team
Low-value measurement:
- Nobody looks at it
- Doesn't lead to action
- Serves only as an alibi: "we're measuring"
Warning File 4: Inappropriate Scale Selection
Failure Example 1: Measuring large workload with equal intervals
❌ "This project, 50 hours"
→ Feels precisely estimated
→ Actually 30-80 hours
→ Number 50 leads to overconfidence
✅ "This project, 34 points → decomposition required"
→ Recognizes "cannot estimate"
→ Break down smaller
→ Each task 13 points or less
Countermeasure: Scale Selection Principles
Qualitative data (emotions, satisfaction, etc.):
→ Equal-interval scale (1-10) + adverb anchors
Quantitative data (time, workload) with large values:
→ Fibonacci scale
Quantitative data with small values:
→ Direct measurement (minutes, hours)
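The Fibonacci rule for large workloads can be sketched as follows. The scale values and the 13-point ceiling come from this file; rounding a raw guess up to the next point is an assumption made here for illustration, as are the function names.

```python
# Point scale used in this file; 21 and above signals "decompose first".
FIB_SCALE = [1, 2, 3, 5, 8, 13, 21, 34]
MAX_TASK_POINTS = 13


def snap_to_scale(raw_points):
    """Round a raw size guess UP to the next Fibonacci point (assumed rule)."""
    if raw_points <= 0:
        raise ValueError("raw_points must be positive")
    for point in FIB_SCALE:
        if raw_points <= point:
            return point
    return FIB_SCALE[-1]  # anything larger is capped at the top of the scale


def needs_decomposition(points):
    """A task above 13 points is too uncertain to estimate as one piece."""
    return points > MAX_TASK_POINTS


estimate = snap_to_scale(30)
print(estimate, needs_decomposition(estimate))
```

Snapping a guess of 30 to 34 deliberately throws away false precision: the result says "too big to estimate" instead of pretending to know the answer.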
Related Evidence 1: RCD Model Foundation
Record: Measurement enables valuable recording
Check: Analysis possible because of data
Do: Improvement based on measurement
Related Evidence 2: NPS Integration
Common: 0-10 scale with clear anchors
Application: Article evaluation system
Measurement: Internal + reader NPS
Related Evidence 3: OKR Goal Setting
Objective: Improve article quality
Key Results (measurable):
- Self-evaluation average 8/10+
- 5+ articles of difficulty 8 points per month
- Reader NPS of +50 or higher
Related Evidence 4: HEART Framework
5 dimensions all measurable:
- Happiness: NPS / 10-level
- Engagement: Dwell time
- Adoption: New readers
- Retention: Repeat rate
- Task Success: Completion rate
Tool 1: Estimation Dictionary
# Work Point Definition
### 1 Point (Standard)
- Fix 1-3 typos
- Actual: 5-10 minutes
### 2 Points
- Minor rewrite (200 words)
- Actual: 15-30 minutes
### 3 Points
- Partial rewrite (500 words)
- Actual: 30-60 minutes
### 5 Points
- Medium article (3,000 words)
- Actual: 1.5-3 hours
- Uncertainty: ±50%
### 8 Points
- Large article (5,000-7,000 words)
- Actual: 3-6 hours
- Uncertainty: ±100%
### 13 Points
- Super-large (8,000-10,000 words)
- Actual: 6-12 hours
- Uncertainty: ±150%
### 21+ Points
→ Decomposition required
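If the team wants the dictionary in a form a script can query, a lookup table is enough. The sketch below copies the time ranges above (converted to minutes); the names `ESTIMATION_DICTIONARY` and `expected_minutes` are hypothetical.

```python
# Points -> (min, max) expected minutes, copied from the definitions above.
ESTIMATION_DICTIONARY = {
    1: (5, 10),
    2: (15, 30),
    3: (30, 60),
    5: (90, 180),    # 1.5-3 hours
    8: (180, 360),   # 3-6 hours
    13: (360, 720),  # 6-12 hours
}
DECOMPOSE_FROM = 21


def expected_minutes(points):
    """Return the (min, max) minutes the team expects for a point value."""
    if points >= DECOMPOSE_FROM:
        raise ValueError(f"{points} points: decomposition required")
    try:
        return ESTIMATION_DICTIONARY[points]
    except KeyError:
        raise ValueError(f"{points} is not on the point scale") from None


low, high = expected_minutes(5)
print(f"5 points: {low}-{high} minutes")
```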
Tool 2: Deviation Tracking Sheet
| Date | Task | Est. | Actual | Dev. | Analysis | Action |
|------|------|------|--------|------|----------|--------|
| 10/15 | X040 | 8pt | 13pt | +5pt | Translation time | Separate task |
| 10/16 | Image | 2pt | 2pt | 0pt | Accurate | Standardized |
| 10/17 | Rewrite | 3pt | 5pt | +2pt | Structure review | Pre-review |
Monthly: 45 tasks, average deviation ±1.2pt (improved by 0.3pt)
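The deviation column and a summary figure can be computed rather than filled in by hand. A minimal sketch using only the three rows shown in the sheet; the mean absolute deviation is one reasonable summary, assumed here, and the article does not say how its monthly average is calculated.

```python
# (date, task, estimated points, actual points) from the three rows above.
rows = [
    ("10/15", "X040", 8, 13),
    ("10/16", "Image", 2, 2),
    ("10/17", "Rewrite", 3, 5),
]


def deviations(rows):
    """Actual minus estimate for each row; positive means underestimated."""
    return [actual - est for _, _, est, actual in rows]


def mean_abs_deviation(rows):
    """Average size of the miss, ignoring direction."""
    devs = deviations(rows)
    return sum(abs(d) for d in devs) / len(devs)


print(deviations(rows))                  # [5, 0, 2]
print(round(mean_abs_deviation(rows), 2))  # 2.33
```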
Tool 3: Adverb Conversion Matrix
| Expression | Number | Description |
|------------|--------|-------------|
| Perfect | 10 | Beyond comparison |
| Excellent | 9 | Exceeds expectations |
| Very good | 8 | Above expected |
| Quite good | 7 | As expected |
| Good | 6 | Fairly satisfied |
| Average | 5 | Neither good nor bad |
| Somewhat bad | 3 | Below expectations |
| Bad | 2 | Greatly below |
| Worst | 0 | Complete failure |
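The matrix translates directly into a lookup table. In this sketch only the expressions listed above are included; the table gives no expression for 4 or 1, so none is invented here. The function name `to_score` is hypothetical.

```python
# Expression -> number, exactly as listed in the matrix above.
ADVERB_TO_SCORE = {
    "perfect": 10,
    "excellent": 9,
    "very good": 8,
    "quite good": 7,
    "good": 6,
    "average": 5,
    "somewhat bad": 3,
    "bad": 2,
    "worst": 0,
}


def to_score(expression):
    """Convert a team-agreed expression to its number (case-insensitive)."""
    key = " ".join(expression.lower().split())  # normalize case and spacing
    if key not in ADVERB_TO_SCORE:
        raise KeyError(f"no agreed score for {expression!r}")
    return ADVERB_TO_SCORE[key]


print(to_score("Quite good"))  # 7
```

Raising an error for an unknown expression is intentional: an unlisted phrase is a sign the team has not yet agreed on its value, which is exactly the misalignment the BOM is meant to surface.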
Evidence: Measurement as Building Common Language
Thomas Kuhn "The Structure of Scientific Revolutions" (1962):
Scientific knowledge rests on the agreement of a scientific community (a simplified reading of Kuhn)
Example: the definition of "1 meter" has changed over time
→ It still functions because scientists agree on it
Business application:
Value of measurement = Common understanding within team
→ Global standard unnecessary
Evidence: Continuation Over Perfection
Pragmatism wisdom:
Idealism:
"Perfect method before starting"
→ Never starts
→ Data doesn't accumulate
→ Cannot improve
Pragmatism:
"Start with 60-point method today"
→ Start immediately
→ Data accumulates
→ Improve while using
→ 80-point precision in 3 months
ROI Detective Agency practice:
2025/04/28: Introduced GA4 (started measurement)
↓
2025/06/15: Added graphs (improved visualization)
↓
2025/07/26: Added segment analysis (improved precision)
↓
Continuously improving
→ Start without waiting for perfection
→ Refine while using
→ Evolve spirally
Evidence: Democratization of Data
Organizational transformation:
Traditional organization:
Data only for executives/specialists
→ Frontline judges by "sensation"
→ Recognition misalignment
→ Inefficiency
Measurement-driven organization:
Everyone sees same data
→ Share BOM
→ Dialogue in common language
→ Efficient collaboration
ROI Detective Agency:
Director, Gemini, Claude, ChatGPT
→ All use same BOM
→ Flat dialogue
→ Optimal task allocation
Evidence: AI-Assisted Measurement Automation
Current challenges:
Manual measurement by humans:
- Takes time to measure
- Forget to measure
- Subjective variation
Future possibilities:
AI-assisted measurement:
- Auto-track work time
- Auto-measure emotion with sentiment analysis
- Auto-evaluate quality
- Real-time dashboard updates
Example:
While writing, AI:
→ Estimates productivity from keystroke speed
→ Auto-evaluates text quality
→ "Today's productivity was 8/10" at work end
→ Auto-analyzes reasons "long concentration time"
Evidence: Integration with Biometric Data
Psychological/physiological data:
Current:
Subjective "satisfaction" evaluation
→ Self-report bias
Future:
Objective physiological data:
- Heart rate variability (stress level)
- Facial recognition (emotional state)
- Brain waves (concentration level)
Integration example:
Subjective evaluation: "Today's satisfaction 7/10"
Biometric data: "Stress value 3/10, concentration 8/10"
→ Comprehensive state understanding
Evidence: Blockchain for Measurement Reliability
Preventing data tampering:
Current challenge:
"Completed this project in 8 hours"
→ Really? Underreporting?
Blockchain utilization:
- Auto-record work start/end time
- Tamper-resistant once recorded
- Transparency, auditability
Application:
Trust building in freelance/remote work era
Investigator's Final Report:
"The Baseline of Measurement (BOM)" is "the method of quantifying qualitative data through the establishment and sharing of standard baseline in measurement." The most striking finding of this investigation was a piece of practical wisdom: rather than pursuing a perfect, globally common measurement standard, a team can achieve practical measurement simply by aligning the BOM among its own members.
The essence of measurement is "making the invisible visible." By converting the subjective sensation "This article is good" into objective data "article quality 8/10," we acquire four powers: shareability, comparability, improvability, and reproducibility. Most importantly, this measurement functions as a catalyst that transforms individual experience into team assets.
The technique of "ruler-making" through the anchor point method follows the same principle as the invention of thermometers in physics (water freezing point 0°C, boiling point 100°C). Fix the minimum anchor (easiest difficulty = 1) and maximum anchor (most difficult = 10) with concrete examples, and measure relatively in between. This simple principle transforms vague sensations into precise rulers.
Gradation using adverbs builds on the rating-scale method that Rensis Likert introduced in 1932. By associating adverbs like "very good," "quite good," and "somewhat good" with numbers, we preserve qualitative richness while gaining quantitative measurability. This "hybrid approach" makes measurement far more practical.
Applying the Fibonacci sequence (1, 2, 3, 5, 8, 13...) to workload estimation builds human cognitive limits into the scale itself. The Weber-Fechner law describes perception as roughly logarithmic: we notice relative differences rather than absolute ones. Because each Fibonacci number is about 1.6 times the previous one, the gaps widen as estimates grow, which automatically builds in a buffer for the fact that larger estimates deviate more. The scale is a widely used convention among agile development teams worldwide; the link to perception research is a plausible rationale rather than a formal proof.
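The widening gaps are easy to check numerically. This short sketch prints, for each step on the scale, the absolute gap from the previous value and the ratio between the two; nothing in it goes beyond arithmetic on the sequence itself.

```python
scale = [1, 2, 3, 5, 8, 13, 21]

# Absolute gap and relative ratio between each pair of neighbours.
gaps = [b - a for a, b in zip(scale, scale[1:])]
ratios = [b / a for a, b in zip(scale, scale[1:])]

for value, gap, ratio in zip(scale[1:], gaps, ratios):
    print(f"{value:>2} points: gap +{gap}, ratio {ratio:.2f}")
```

The gaps themselves follow the Fibonacci pattern (1, 1, 2, 3, 5, 8), while the ratios settle near 1.6. In other words, each step up the scale is a similar relative jump, which is the kind of difference people can actually tell apart.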
The most important discovery is the "deviation measurement and adjustment" process. Many teams fail because they never notice estimation deviations, or notice them and leave them unaddressed. Successful teams measure deviations, discuss why they occurred, and continuously adjust the BOM. This calibration process is the key to building a common language within the team and improving measurement precision in an upward spiral.
Planning poker, a simultaneous estimation method, is a dialogue facilitation technique established in agile development. Each member estimates silently, everyone reveals their numbers at the same time, and those with the highest and lowest estimates explain their reasoning. This dialogue surfaces differences in assumptions, experience levels, and ambiguity in the task definition, and the BOM aligns naturally as a result.
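The reveal step can be sketched in a few lines: collect everyone's number, then pick out who should explain. The function name `who_explains` and the point values are hypothetical; only the member names come from this file.

```python
def who_explains(estimates):
    """Return (lowest, highest) estimator names after a simultaneous reveal.

    `estimates` maps member name -> points. If everyone agrees,
    both lists are empty and no discussion is needed.
    """
    low = min(estimates.values())
    high = max(estimates.values())
    if low == high:
        return [], []
    lowest = sorted(n for n, p in estimates.items() if p == low)
    highest = sorted(n for n, p in estimates.items() if p == high)
    return lowest, highest


# Made-up round of estimates for one task.
round_one = {"Director": 5, "Gemini": 8, "Claude": 3, "ChatGPT": 5}
lowest, highest = who_explains(round_one)
print("Lowest:", lowest, "Highest:", highest)
```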
The pragmatic acceptance that "a global standard is difficult, a team standard is sufficient" is also an important insight. As Thomas Kuhn argued, even scientific knowledge rests on the agreement of a scientific community. The same holds for business measurement: if the BOM is aligned within a team, comparison with global standards is unnecessary. Given differences in culture, language, experience, and expertise, a globally common measurement standard is usually unrealistic and rarely worth pursuing.
The integration with the RCD Model also became clear. Measurement is a prerequisite for Record; because records exist, Check can discover patterns and Do can plan improvement measures. To realize the detective's catchphrase, "record and analyze experience to pursue reproducibility," measurement must come first.
NPS succeeded globally for the same reason: it globalized the BOM. By fixing clear anchors worldwide (0-6 points = Detractors, 7-8 points = Passives, 9-10 points = Promoters), it made comparison across cultures possible. This shows that when the BOM is designed appropriately, a global standard can also work.
The limitations and cautions of measurement were also confirmed: the trap of perfectionism (starting nothing while seeking the perfect measurement), loss of richness through quantification (qualitative nuance disappears), measurement for measurement's sake (measurement becomes the purpose), and inappropriate scale selection (measuring a large workload with equal intervals). These dangers must always be guarded against when introducing measurement.
The countermeasures are clear: done is better than perfect (continuation over perfection), the hybrid approach (combining quantitative and qualitative), measurement value > measurement cost (measure only the 3 most important items), and appropriate scale selection (equal intervals for qualitative data, Fibonacci for large quantitative data).
As for future prospects, the investigation also confirmed ways measurement technology may evolve: AI-assisted automatic measurement, integration with biometric data, and reliability assurance with blockchain. However, the essence does not change: align the BOM within the team, measure continuously, adjust deviations, and build a common language. This is the royal road to creating a measurement-driven organization.
Most impressive was the attitude of ROI Detective Agency, which "proves through practice" rather than "speaks as theory." Introducing GA4, then adding graphs, then segment analysis, then continuous improvement: starting at 60 points without waiting for perfection, refining while using, and evolving in a spiral. This practical wisdom is worth more than any book of theory.
Measurement is the technology that transforms vague sensations into clear numbers, turns individual experience into team assets, and predicts the future from past data. At its core is "the Baseline of Measurement (BOM)": the standard baseline of measurement shared within the team.
Recommended Maxim: "What cannot be measured cannot be improved. But rather than waiting for perfect measurement, measure from today even if rough. Once the Baseline of Measurement (BOM) is aligned, the team has a common language."
【ROI Detective Agency Classified File Series X041 Complete】
Case Closed