【🔒Classified File】 No. X041 | What is BOM | ROI Detective Agency

🏷️ Measurement 🏷️ Quantification 🏷️ BOM 🏷️ Learning 🏷️ 【🔒Classified File】

What is the Baseline of Measurement - Case Overview
Basic Structure of Measurement - Evidence Analysis
- Why Quantification is Necessary
- Difference Between Qualitative and Quantitative Data
Technique of Building "The BOM" - Investigation Methods
Building Team-Common BOM - Practice Methods
Power of Measurement - Hidden Truth
Limitations and Cautions of Measurement - Potential Dangers
Application and Integration - Related Case Files
Practical Tools - Special Measures
Essence of Measurement Philosophy - Prospect Analysis
Future of Measurement - Direction of Evolution
Conclusion - Investigation Summary

Detective's Memo: "This article was good," "Today was tough," "The customer is satisfied": business is overflowing with qualitative expressions. Many people leave these as "unmeasurable sensations," but a true detective finds the hidden code within them. The key to quantification lies in building a measurement standard called "the Baseline of Measurement (BOM)." Its tools are the adverb anchor method, which places feelings on a 0-10 scale; the Fibonacci sequence, which automatically reflects uncertainty in large workload estimates; and, most importantly, team-common standards rather than global ones. The implicit agreement that "quite good" equals 8 points, the common understanding that "the easiest task" equals 1, and the process of measuring and adjusting deviations in the BOM among members all serve this purpose. Converting qualitative data into quantitative data makes it visible, shareable, and improvable: this is the measurement philosophy that creates reproducibility.

What is the Baseline of Measurement - Case Overview

"The Baseline of Measurement (BOM)," formally known among clients as "the method of quantifying qualitative data by establishing and sharing a standard baseline for measurement," addresses an area where many business professionals settle for "sensation," "intuition," or "somehow." Physics has clear unit definitions such as 1 meter, 1 second, and 1 kilogram, but business concepts such as "difficulty," "satisfaction," and "quality" lack standard measurement baselines. This investigation revealed, however, that global definitions are unnecessary: as long as the BOM is shared within a project team, qualitative data can be quantified, visualized, compared, and improved.

Investigation Memo: Why is "satisfied" insufficient, and why is "satisfaction 8/10" necessary? Why is the Fibonacci sequence (1, 2, 3, 5, 8, 13...) better suited to large workload estimates than an equal-interval scale (1, 2, 3, 4...)? And most importantly, why does aligning the BOM within a team become the key to project success? We need to clarify this foundational measurement technique, which enables "Record" in the RCD Model and makes single indicators like NPS work.

Basic Structure of Measurement - Evidence Analysis

Basic Evidence: Quantification makes things visible

Why Quantification is Necessary

Four Problems with the Invisible:

Problem 1: Cannot Share
When told "feels good"
→ Unclear how good
→ Cannot convey to others

Problem 2: Cannot Compare
Last week: "was good"
This week: "was good"
→ Unclear which was better
→ Cannot judge improvement

Problem 3: Cannot Improve
"Let's improve quality"
→ Current state not quantified
→ Cannot set goals
→ Cannot judge achievement

Problem 4: Cannot Reproduce
"It went well that time"
→ No record of what went well, or how well
→ Cannot reproduce same success

Four Powers Brought by Quantification:

Power 1: Shareability
"Satisfaction 8/10"
→ Concrete number
→ Conveys to others
→ Forms common understanding

Power 2: Comparability
Last week: "Satisfaction 6/10"
This week: "Satisfaction 8/10"
→ +2 point improvement
→ Clear progress

Power 3: Improvability
Current: "Quality 7/10"
Goal: "Quality 9/10"
→ Aim for +2 point improvement
→ Can plan measures and measure effects

Power 4: Reproducibility
Success case: "Difficulty 5, 8 hours work, satisfaction 9/10"
→ Can aim for success with same conditions next time
→ Patterns can be recognized and turned into rules

Evidence Analysis: The essence of measurement is "making the invisible visible." Through quantification, subjective sensations become objective data, and individual experiences become team assets.

Difference Between Qualitative and Quantitative Data

Qualitative Data:

Characteristics:
- Expressed in words or descriptions
- Qualitative properties
- Subjective interpretation
- Rich nuances

Examples:
- "This article is very good"
- "Customer is satisfied"
- "Team atmosphere is bad"
- "Design is sophisticated"

Advantages:
- Understand context and background
- Preserve emotions and nuances
- Gain deep insights

Disadvantages:
- Interpretations vary by person
- Difficult to compare
- Hard to aggregate and analyze
- Low objectivity

Quantitative Data:

Characteristics:
- Expressed in numbers
- Quantitative properties
- Objective measurement
- Clear comparability

Examples:
- "This article, 9/10 points"
- "Customer satisfaction NPS +40"
- "Team atmosphere 3/10"
- "Design quality 8/10"

Advantages:
- Clear and objective
- Can compare and aggregate
- Can statistically analyze
- High reproducibility

Disadvantages:
- Nuances are lost
- Context is omitted
- "Meaning" of numbers can become unclear

Essential Insight:

Not a binary opposition of qualitative vs. quantitative → "Convert" qualitative data into quantitative data to gain the benefits of both

Method: Preserve qualitative richness while acquiring quantitative measurability. This is the role of "the Baseline of Measurement (BOM)"

Technique of Building "The BOM" - Investigation Methods

Investigation Finding 1: Creating Rulers with Anchor Point Method

Step 1: Clarifying Measurement Target

Bad Example (vague):

"Today's condition"
→ Condition of what?
→ Work? Health? Mood?
→ Unmeasurable

Good Example (clear):

"Today's writing productivity"
→ Clearly what to measure
→ Measurable

Step 2: Setting Minimum Anchor

Principle: Anchor with a concrete example

❌ Abstract anchor:
"The easiest state"
→ Each person imagines something different

✅ Concrete anchor:
"Reply to 1 email (5 minutes)"
→ Same image for everyone

Practice Examples:

Minimum anchor for writing productivity:
- "Cannot write even 500 words in 1 hour" = 1 point
- Concrete situation: Cannot concentrate at all, rewriting again and again

Minimum anchor for task difficulty:
- "Fix 1 typo" = 1 point
- Concrete work: Open the file, fix the typo, and commit

Step 3: Setting Maximum Anchor

Principle: Use the maximum within your own experience

❌ Unrealistic anchor:
"Most difficult in human history"
→ No practical use

✅ Realistic anchor:
"Most difficult work I've ever done"
→ Based on actual experience
→ Can recall

Practice Examples:

Maximum anchor for writing productivity:
- "Wrote 3,000 words in 1 hour" = 10 points
- Concrete situation: Complete concentration, clear structure, materials ready

Maximum anchor for task difficulty:
- "Writing a super-large article at the X040_NPS level" = 13 points
- Concrete work: 10,000+ words, multiple case studies, translation included, 8+ hours

Step 4: Verbalizing Midpoints (Optional)

For more precise measurement:

Writing productivity scale:

1 point: Less than 500 words per hour (minimum)
3 points: 1,000 words per hour (poor)
5 points: 1,500 words per hour (average)
7 points: 2,000 words per hour (good)
9 points: 2,500 words per hour (excellent)
10 points: 3,000 words per hour (maximum)
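The anchor-and-midpoint scale above can be turned into a small lookup that scores any measured writing rate. This is a minimal sketch, assuming straight-line interpolation between the listed anchors and clamping outside them; the name `productivity_score` is hypothetical. Note that the listed points are not perfectly linear at the top (9 points at 2,500 and 10 points at 3,000), so the last segment is flatter than the others.

```python
from bisect import bisect_right

# (words per hour, score) anchors from the writing productivity scale.
# Interpolating linearly between them is an assumption of this sketch.
ANCHORS = [(500, 1), (1000, 3), (1500, 5), (2000, 7), (2500, 9), (3000, 10)]


def productivity_score(words_per_hour: float) -> float:
    """Score a writing rate on the 1-10 scale by interpolating between anchors."""
    xs = [x for x, _ in ANCHORS]
    # Below the minimum anchor or above the maximum anchor: clamp.
    if words_per_hour <= xs[0]:
        return float(ANCHORS[0][1])
    if words_per_hour >= xs[-1]:
        return float(ANCHORS[-1][1])
    # Find the pair of anchors that brackets the value.
    i = bisect_right(xs, words_per_hour)
    (x0, y0), (x1, y1) = ANCHORS[i - 1], ANCHORS[i]
    return y0 + (words_per_hour - x0) * (y1 - y0) / (x1 - x0)


if __name__ == "__main__":
    for rate in (400, 1250, 2000, 2750, 3200):
        print(rate, "words/hour ->", productivity_score(rate))
```

A rate of 1,250 words per hour falls halfway between the 3-point and 5-point anchors and scores 4.0.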

Investigation Finding 2: Gradation Method Using Adverbs

Practice of Likert Scale

Method established in psychology:

An attitude measurement method developed by Rensis Likert (1932):
- Measures the degree of agreement with a statement on 5 or 7 levels
- Labels each level with adverbs (e.g., "strongly agree")
- A standard method used in surveys worldwide

0-10 Scale (11 levels, most practical):

0: Worst
1: Very bad
2: Bad
3: Somewhat bad
4: Below average
5: Average
6: Somewhat good
7: Good
8: Quite good
9: Very good / Excellent
10: Perfect

Systematizing Adverbs

Intensity Adverbs:

High intensity → Low intensity

Extremely > Very > Quite > Somewhat > Slightly > Hardly > Not at all

Examples:
"Extremely good" = 10 points
"Very good" = 9 points
"Quite good" = 8 points
"Somewhat good" = 6 points
"Slightly good" = 4 points
"Hardly good" = 2 points
"Not at all good" = 0 points
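The adverb ladder can be written as a plain lookup table. This sketch assumes a phrase starts with one of the listed adverbs; anything else (including a bare "good") returns None so it can be scored against the 0-10 scale instead. `score_phrase` is a hypothetical helper, not an established API.

```python
from typing import Optional

# Intensity adverbs and their scores, copied from the list above.
ADVERB_SCORES = {
    "extremely": 10,
    "very": 9,
    "quite": 8,
    "somewhat": 6,
    "slightly": 4,
    "hardly": 2,
    "not at all": 0,
}


def score_phrase(phrase: str) -> Optional[int]:
    """Return the score of a phrase such as "Quite good", or None if no listed adverb leads it."""
    text = " ".join(phrase.lower().split())  # normalize case and spacing
    # Try longer adverbs first so multi-word ones like "not at all" match whole.
    for adverb in sorted(ADVERB_SCORES, key=len, reverse=True):
        if text.startswith(adverb + " "):
            return ADVERB_SCORES[adverb]
    return None


if __name__ == "__main__":
    for p in ("Quite good", "Very good", "Not at all good", "good"):
        print(p, "->", score_phrase(p))
```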

Practice: Quantifying Emotions

Scenario: Evaluating article writing satisfaction

Step 1: Qualitative sensation

"The article I wrote today is quite good"

Step 2: Identifying adverb

"Quite good" = High degree but not maximum

Step 3: Converting to number

"Quite good" = 8/10 points

Reason:
- Not as much as "very good" (9 points)
- Above "good" (7 points)
→ 8 points is appropriate

Step 4: Recording

2025-10-20 | X041 article writing | Satisfaction 8/10
Reason: Clear structure and readable, but feels like one example is missing

Step 5: Utilizing for next time

Next goal: Satisfaction 9/10
Improvement: Increase examples to 3
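Steps 4 and 5 only pay off if every record uses the same format. A minimal sketch, assuming the pipe-separated line from Step 4 is the canonical layout; the class and field names are illustrative, not an established schema:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MeasurementRecord:
    """One measurement log entry; field names are illustrative."""
    day: date
    task: str
    metric: str
    score: int
    scale_max: int = 10
    reason: str = ""  # qualitative context kept next to the number

    def to_line(self) -> str:
        # Same layout as Step 4: "2025-10-20 | X041 article writing | Satisfaction 8/10"
        return f"{self.day.isoformat()} | {self.task} | {self.metric} {self.score}/{self.scale_max}"

    @classmethod
    def from_line(cls, line: str, reason: str = "") -> "MeasurementRecord":
        day_text, task, measure = (part.strip() for part in line.split("|"))
        metric, fraction = measure.rsplit(" ", 1)
        score, scale_max = (int(n) for n in fraction.split("/"))
        return cls(date.fromisoformat(day_text), task, metric, score, scale_max, reason)


if __name__ == "__main__":
    rec = MeasurementRecord.from_line(
        "2025-10-20 | X041 article writing | Satisfaction 8/10",
        reason="Clear structure and readable, but feels like one example is missing",
    )
    goal = 9
    print(rec.to_line())
    print("Gap to next goal:", goal - rec.score)
```

Keeping the reason in its own field preserves the qualitative context alongside the number, so the next goal can be tied to a concrete improvement.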

Investigation Finding 3: Workload Estimation Using Fibonacci Sequence

Why Equal Intervals Don't Work

Problems with Equal-Interval Scale (1,2,3,4,5,6,7,8,9,10):

Small tasks:
1 hour and 2 hours → Difference is clear
2 hours and 3 hours → Difference is clear

Large tasks:
7 hours and 8 hours → Difference is ambiguous
8 hours and 9 hours → Difference is ambiguous
10 hours and 11 hours → Almost indistinguishable

Problem:
Equal-interval scale creates illusion of "constant precision"
→ Actually estimates vary more for larger values
→ Leads to overconfidence

Human Cognitive Limitations:

Weber-Fechner Law (Weber 1834, Fechner 1860):

Change in sensation ∝ Logarithmic change in stimulus

Meaning:
- 1 and 2 can be clearly distinguished
- 100 and 101 are difficult to distinguish
- Humans perceive logarithmically
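In Fechner's formulation, perceived intensity grows with the logarithm of the stimulus. Taking S as perceived intensity, I as stimulus intensity, I0 as the detection threshold and k as a constant (symbols chosen here for illustration), the two comparisons above work out as follows:

```latex
\begin{aligned}
S &= k \ln \frac{I}{I_0} \\
\Delta S &= S(I_2) - S(I_1) = k \ln \frac{I_2}{I_1} \\
\text{1 vs. 2:} \quad \Delta S &= k \ln 2 \approx 0.693\,k \\
\text{100 vs. 101:} \quad \Delta S &= k \ln 1.01 \approx 0.00995\,k
\end{aligned}
```

The absolute difference is 1 in both cases, yet the perceived gap between 100 and 101 is roughly 70 times smaller than between 1 and 2.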

Mathematical Beauty of Fibonacci Sequence

Fibonacci Sequence:

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89...

Definition:
F(n) = F(n-1) + F(n-2)
F(1) = 1, F(2) = 1

Relationship with Golden Ratio:

Ratio of adjacent numbers:
1/1 = 1.0
2/1 = 2.0
3/2 = 1.5
5/3 = 1.666...
8/5 = 1.6
13/8 = 1.625
21/13 = 1.615...

→ Converges to Golden Ratio φ ≈ 1.618

Meaning:
Each step increases by about 1.6 times
→ A ratio humans can clearly feel as "different"
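The convergence of adjacent ratios is easy to check directly. This short sketch generates the sequence with the recurrence above and prints each ratio alongside φ = (1 + √5) / 2:

```python
def fibonacci(count: int) -> list[int]:
    """Return the first `count` Fibonacci numbers, starting F(1) = F(2) = 1."""
    seq = [1, 1]
    while len(seq) < count:
        seq.append(seq[-1] + seq[-2])  # F(n) = F(n-1) + F(n-2)
    return seq[:count]


def adjacent_ratios(seq: list[int]) -> list[float]:
    """Ratio of each term to the term before it."""
    return [b / a for a, b in zip(seq, seq[1:])]


if __name__ == "__main__":
    seq = fibonacci(11)
    print(seq)
    for ratio in adjacent_ratios(seq):
        print(round(ratio, 3))
    print("phi =", (1 + 5 ** 0.5) / 2)
```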

Implementing Fibonacci Scale

ROI Detective Agency Implementation:

【1 Point】= Standard unit
- Typo correction
- Link addition
- Minor updates
Actual: 5-10 minutes

【2 Points】
- Partial rewrite of existing article
- Image replacement
Actual: 15-30 minutes

【3 Points】
- Small new article
- Template reuse
Actual: 30-60 minutes

【5 Points】
- Medium-scale original article
- Some research needed
Actual: 1.5-3 hours
Buffer: ±50%

【8 Points】
- Large article
- Multiple case studies needed
Actual: 3-6 hours
Buffer: ±100%

【13 Points】
- Super-large article (X040_NPS level)
- Large-scale research, including translation
Actual: 6-12 hours
Buffer: ±100-150%

【21 Points or more】
→ Task decomposition required
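One way to apply this table is to round any raw estimate up to the next allowed value and read off the expected time range. This is a sketch under stated assumptions: rounding up (rather than to the nearest value) is a choice made here to stay on the cautious side, the hour ranges are copied from the table above, and `snap_to_scale` and `plan` are hypothetical helper names.

```python
from bisect import bisect_left

# Allowed estimates; anything above 13 means "decompose the task first".
POINTS = [1, 2, 3, 5, 8, 13]
DECOMPOSE_AT = 21

# Actual-time ranges in hours, taken from the table above.
HOURS = {
    1: (5 / 60, 10 / 60),
    2: (0.25, 0.5),
    3: (0.5, 1.0),
    5: (1.5, 3.0),
    8: (3.0, 6.0),
    13: (6.0, 12.0),
}


def snap_to_scale(raw: float) -> int:
    """Round a raw estimate up to the next allowed Fibonacci value."""
    i = bisect_left(POINTS, raw)
    return POINTS[i] if i < len(POINTS) else DECOMPOSE_AT


def plan(raw: float) -> str:
    points = snap_to_scale(raw)
    if points >= DECOMPOSE_AT:
        return f"{points}+ points: decompose the task before estimating"
    low, high = HOURS[points]
    return f"{points} points: expect roughly {low:g}-{high:g} hours"


if __name__ == "__main__":
    for raw in (2, 4, 9, 20):
        print(raw, "->", plan(raw))
```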

Why Fibonacci is Suitable for Workload Estimation

Reason 1: Automatic Reflection of Uncertainty

Equal-interval scale:
Small task: 2 hour estimate
Large task: 8 hour estimate
→ Both "4x difference"
→ Insufficient buffer for large task

Fibonacci scale:
Small task: 2 points
Large task: 13 points
→ "6.5x difference"
→ Automatic buffer for large task

Reason 2: Preventing Overconfidence

Equal interval: "This task 8 hours"
→ Feels precisely estimated
→ Actually 5-12 hours

Fibonacci: "This task 8 points"
→ Recognizes "there's a range"
→ Avoids overstating precision

Reason 3: Promoting Decomposition

Task exceeds 13 points
→ Warning that "estimate is too rough"
→ Should break down smaller
→ Reduces failure risk

Building Team-Common BOM - Practice Methods

Investigation Finding 4: Why Global Standards are Unnecessary and Team Standards Sufficient

Limitations of Global Standards

Reason 1: Cultural and Language Differences

Example from NPS:

Japanese:
Even when feeling "very good" → Modestly gives 8 points
Reason: 10 means "perfect," and perfection feels impossible

Americans:
Feeling "very good" → Readily gives 10 points
Reason: Culture of positive expression

Result:
2-point difference for same satisfaction
→ Comparison without cultural adjustment is meaningless

Reason 2: Experience Level Differences

Junior engineer:
"This bug fix, difficult" = 8 points
Reason: First time seeing this error, feels anxious

Senior engineer:
"This bug, easy" = 2 points
Reason: Has dealt with it many times, understands the pattern

Same task, but the BOM differs by experience

Why Team Standards are Sufficient

Practical Purposes:

What's needed in projects:

✅ Smooth communication within team
✅ Appropriate task allocation
✅ Accurate progress tracking
✅ Mutual understanding among members

All achievable with a common BOM within the team

❌ Comparison with global standards
❌ Absolute benchmarking with other companies
❌ Pursuit of universal truth

These are mostly unnecessary in practice

Measuring and Adjusting Deviations

Why Focus on "Deviations":

Pattern where many teams fail:

1. Each estimates arbitrarily
2. Don't notice deviations
3. Schedule doesn't match
4. "Why is it late?" turns into blame
5. Trust relationship deteriorates

Successful team pattern:

1. Each estimates
2. Measure deviations ← Important here
3. Discuss why deviated
4. Adjust BOM
5. Deviations decrease next time
6. Trust relationship improves

Practice: Deviation Measurement Process

Scenario: Estimating new article creation

【Initial Estimation】
Task: Business framework article creation

Director: "8 points"
Claude: "5 points"

Deviation: 3 points (60% of the lower estimate!)

【Dialogue Phase】
Director: "Why did you think 5 points?"

Claude: "I assumed we could reference the existing X039_HEART article
        and reuse its structure.
        If it's mostly rewriting, 5 points seemed right"

Director: "I see. But this time we need case studies from 3 companies,
          and the interviews and literature research will take time.
          We also need a completely new structure"

Claude: "Including research and a new structure,
        it's certainly 8 points. Understood"

【Learning/Adjustment】
Claude's improvements going forward:
- Confirm "presence of research" when receiving task
- Distinguish between "rewrite" and "new creation"
- Confirm number of examples before estimating

Director's improvements going forward:
- Clearly indicate research scope when explaining task
- Clearly distinguish "new" and "rewrite"
- Share prerequisites first

Result:
Next similar task, deviation converges within 1 point
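The "60%" in this scenario is the gap measured against the lower estimate; measured against the higher one it would be 37.5%. A small helper makes the choice of baseline explicit and also counts how many steps apart the two cards sit on the Fibonacci scale. The function and the scale list are illustrative assumptions, not part of the original process.

```python
# Fibonacci scale used for estimates in this file.
SCALE = [1, 2, 3, 5, 8, 13, 21]


def compare_estimates(a: int, b: int) -> dict:
    """Describe the gap between two estimates in several ways."""
    low, high = sorted((a, b))
    gap = high - low
    return {
        "gap_points": gap,
        "relative_to_lower": gap / low,    # 3 / 5 = 0.6 for Director 8 vs Claude 5
        "relative_to_higher": gap / high,  # 3 / 8 = 0.375
        "scale_steps": SCALE.index(high) - SCALE.index(low),
    }


if __name__ == "__main__":
    print(compare_estimates(8, 5))
```

On the Fibonacci scale, 5 and 8 are adjacent cards, so a "3-point" gap here is only one step of disagreement.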

Three Techniques for Team Standardization

Technique 1: Setting Reference Tasks

ROI Detective Agency reference tasks (agreed by all):

【1 Point Standard】
Task: Fix 1 typo
Actual: 5 minutes
Shared view: "This is definitely 1"

【3 Points Standard】
Task: Partial rewrite of existing article (about 500 words)
Actual: 30-60 minutes
Shared view: "This is 3"

【8 Points Standard】
Task: New large article (X039_HEART level)
Actual: 4-6 hours
Shared view: "This is 8"

Usage:
New task → Compare with reference tasks → Estimate relatively

Technique 2: Planning Poker Method

Established method in agile development:

【Rules】
1. Listen to explanation of new task
2. Each estimates silently (Fibonacci cards)
3. Reveal numbers simultaneously on cue
4. Highest and lowest explain reasons
5. Discuss and re-estimate
6. Repeat until convergence

【Example】
Task: Create new framework article

Round 1:
Director: 8
Gemini: 8
Claude: 5
ChatGPT: 3

Discussion:
ChatGPT(3): "If it's a new planning idea, I can try it at 3"
Director(8): "But this time it's a systematic explanatory article, so research is needed"
Claude(5): "I thought 5 if a template is available, but 8 if research is included"

Round 2:
Director: 8
Gemini: 8
Claude: 8
ChatGPT: 5

Discussion:
ChatGPT: "If it's mainly research and writing rather than planning, I agree on 8"

Final agreement: 8 points

Effects:
- Confirm prerequisites through dialogue
- Discover and adjust BOM deviations
- Deepens team mutual understanding
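The reveal step of planning poker is mechanical enough to sketch in code. This version assumes the Fibonacci card set used in this file and treats "convergence" as every participant showing the same card; a team may instead accept adjacent cards and settle the rest by discussion, as happened after Round 2 above. `reveal` and `RoundResult` are hypothetical names.

```python
from dataclasses import dataclass

FIBONACCI_CARDS = [1, 2, 3, 5, 8, 13, 21]


@dataclass
class RoundResult:
    converged: bool
    highest: list[str]  # players who explain the high end
    lowest: list[str]   # players who explain the low end


def reveal(estimates: dict[str, int]) -> RoundResult:
    """Reveal all cards at once and pick who explains their reasoning."""
    for name, card in estimates.items():
        if card not in FIBONACCI_CARDS:
            raise ValueError(f"{name} played {card}, which is not a Fibonacci card")
    high, low = max(estimates.values()), min(estimates.values())
    return RoundResult(
        # Assumption: "convergence" means every player shows the same card.
        converged=(high == low),
        highest=[n for n, c in estimates.items() if c == high],
        lowest=[n for n, c in estimates.items() if c == low],
    )


if __name__ == "__main__":
    round1 = {"Director": 8, "Gemini": 8, "Claude": 5, "ChatGPT": 3}
    round2 = {"Director": 8, "Gemini": 8, "Claude": 8, "ChatGPT": 5}
    for label, cards in (("Round 1", round1), ("Round 2", round2)):
        print(label, reveal(cards))
```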

Technique 3: Calibration Meeting

Frequency: Monthly (weekly early in a project)

Agenda:

1. Review last month's tasks (15 min)
   - Compare estimates and actuals
   - Pick up tasks with large deviations

2. Discuss deviation causes (30 min)
   Example:
   "Estimated article A as 3 points, but it actually took 8 hours (13 points equivalent on our scale)"
   → Why deviated?
   → Research scope was 2x expected
   → Next time confirm research scope beforehand

3. Reconfirm BOM (15 min)
   - Review reference tasks
   - Add new reference tasks
   - Realign everyone's recognition

4. Share success stories (10 min)
   - Tasks where estimates were accurate
   - Why were they accurate
   - Spread best practices across the team

Effects:
- Continuous precision improvement
- Team measurement skill improvement
- Learning culture from failures
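The first agenda item (comparing estimates with actuals) can be partly automated. This sketch assumes each task is logged as (name, estimated points, actual hours), converts hours back to points using the upper bounds of the scale table, and flags tasks that land at least one scale step away from their estimate. Times that fall in the table's gaps (for example between 1 and 1.5 hours) are assigned to the next tier up, which is an assumption of this sketch; the function names are hypothetical.

```python
# Upper bound in hours for each tier of the scale table. Times in the table's
# gaps (e.g. between 1 and 1.5 hours) go to the next tier up: an assumption.
TIER_UPPER_HOURS = [(1, 10 / 60), (2, 0.5), (3, 1.0), (5, 3.0), (8, 6.0), (13, 12.0)]
SCALE = [1, 2, 3, 5, 8, 13, 21]


def points_for_hours(hours: float) -> int:
    """Convert an actual duration back into scale points."""
    for points, upper in TIER_UPPER_HOURS:
        if hours <= upper:
            return points
    return 21  # over 12 hours: the task should have been decomposed


def large_deviations(tasks, min_steps: int = 1):
    """Flag tasks whose actual tier is at least `min_steps` scale steps from the estimate.

    `tasks` is an iterable of (name, estimated_points, actual_hours).
    Returns (name, estimated_points, actual_points, steps); positive steps = underestimate.
    """
    flagged = []
    for name, estimated, actual_hours in tasks:
        actual = points_for_hours(actual_hours)
        steps = SCALE.index(actual) - SCALE.index(estimated)
        if abs(steps) >= min_steps:
            flagged.append((name, estimated, actual, steps))
    return flagged


if __name__ == "__main__":
    log = [("Article A", 3, 8.0), ("Typo fix", 1, 5 / 60), ("Image swap", 2, 0.4)]
    for row in large_deviations(log):
        print(row)
```

With these bounds, article A (estimated 3 points, 8 hours actual) falls in the 13-point tier, three scale steps above its estimate, which makes it the first candidate for the deviation discussion.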

Power of Measurement - Hidden Truth

Warning File 1: Four Transformations Created by Visualization

Transformation 1: From Subjective to Objective

Before:

"Today's condition was good"
→ Individual sensation
→ Cannot convey to others
→ Not recorded

After:

"Today's productivity: 8/10"
→ Objective indicator
→ Shareable with team
→ Accumulated as data

Transformation 2: From Ambiguous to Clear

Before:

"This task seems difficult"
→ How difficult?
→ Schedule unclear
→ Cannot allocate resources

After:

"This task, 13 points"
→ Range of 6-12 hours
→ Schedule possible
→ Appropriate staffing

Transformation 3: From Past to Future

Before:

"It went well last time"
→ Unclear why went well
→ Cannot reproduce

After:

"Last time: difficulty 5, 8 hours work, satisfaction 9/10"
→ Success pattern clear
→ Can aim for success with same conditions next time

Transformation 4: From Individual to Team

Before:

Each judges arbitrarily
→ Recognition scattered
→ Difficult collaboration

After:

Share the BOM
→ Converse in common language
→ Smooth collaboration

Warning File 2: Compound Effect Brought by Measurement

Data Compound Effect:

Month 1: 10 tasks recorded
→ Trends vaguely visible

Month 3: 30 tasks recorded
→ Patterns becoming visible

Month 6: 60 tasks recorded
→ Can predict with confidence

Month 12: 120 tasks recorded
→ High-precision estimates become natural

Effect:
Quality of insights improves over time
→ Prediction accuracy improves
→ Project success rate increases

Skill Compound Effect:

Initially: Estimate by trial and error
↓
After 1 month: Your own patterns become visible
↓
After 3 months: The team's "unit of measurement" aligns
↓
After 6 months: Estimates are accurate most of the time
↓
After 12 months: Project plans succeed with high reliability

Warning File 3: Integration with the RCD Model

Measurement is a prerequisite for recording:

Record:
❌ "Today was good" alone has little value as a record
✅ "Productivity 8/10, satisfaction 9/10" enables analysis

Check:
Because measurement data exists:
- Trends can be analyzed
- Patterns can be discovered
- Commonalities become visible

Do:
Based on the measurement data:
- Plan improvement measures
- Measure their effects
- Improve further

Warning File 4: Why NPS Works

Essence of NPS = Establishing a BOM:

Question: "How likely is it that you would recommend [company/product] to a friend or colleague?"
0-10 point scale

Globally shared BOM:
- 0-6: Detractors (dissatisfied)
- 7-8: Passives (satisfied but unlikely to recommend)
- 9-10: Promoters (enthusiastic fans)

Why this works:
✅ Measurable with a single question
✅ Comparable worldwide
✅ Year-over-year changes can be tracked
✅ Helps predict customer behavior

→ A successful example of globalizing a BOM
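Behind the shared anchors sits simple arithmetic: the score is the percentage of Promoters (9-10) minus the percentage of Detractors (0-6), so it ranges from -100 to +100. A minimal sketch, assuming responses arrive as a plain list of 0-10 integers; the function name `nps` and the sample responses are hypothetical.

```python
from typing import Iterable


def nps(scores: Iterable[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one response")
    for s in scores:
        if not 0 <= s <= 10:
            raise ValueError(f"score out of range 0-10: {s}")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) count toward the total but not the numerator.
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical reader responses, for illustration only.
responses = [10, 9, 9, 8, 7, 6, 10, 3, 9, 8]
print(nps(responses))  # 30.0: 5 promoters, 2 detractors, 10 responses
```

Because Passives sit in the denominator only, they dilute the score without pushing it up or down, which is part of what makes a single 0-10 question comparable across teams and years.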

Limitations and Cautions of Measurement - Potential Dangers

Warning File 1: Trap of Perfectionism

Trap Structure:

Pattern 1: Perfectionism before measurement
"Let's establish the correct measurement method before starting"
→ Keep researching the perfect method
→ Never start measuring
→ Data doesn't accumulate

Correct approach:
"Start with a 60-point measurement method"
→ Measure while improving
→ Data accumulates
→ Precision improves

Countermeasure:

Done is better than perfect → Continuous measurement rather than perfect measurement

Warning File 2: Loss of Richness Through Quantification

Problem:

Qualitative: "Claude's writing touched my heart"
→ Rich nuance
→ Quality of emotion

Quantitative: "Writing quality: 9/10"
→ Only a number
→ Unclear why it was good

Solution: Hybrid Approach

Combining Quantitative + Qualitative:

Recording example:
Date: 2025-10-20
Task: X041 article creation
Satisfaction: 8/10
Reason (qualitative):
"The Fibonacci sequence explanation was clear,
with plenty of practical examples of the adverb anchor method. However,
the team standardization section ran somewhat long.
I want to summarize it more concisely next time."

→ Comparable with numbers + Understandable with context
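The hybrid record can be enforced as a small data structure so that no score is saved without its context. A minimal sketch, assuming each entry pairs a 0-10 score with a free-text reason; the class and field names (`HybridRecord`, `satisfaction`, `reason`) are hypothetical and simply mirror the recording example.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class HybridRecord:
    """One log entry: a comparable number plus the context behind it."""
    day: date
    task: str
    satisfaction: int  # 0-10 scale anchored by the team's BOM
    reason: str        # qualitative note explaining the number

    def __post_init__(self) -> None:
        if not 0 <= self.satisfaction <= 10:
            raise ValueError("satisfaction must be on the 0-10 scale")
        if not self.reason.strip():
            # The hybrid approach requires context, not just a number.
            raise ValueError("a reason is required alongside the score")


log = [
    HybridRecord(date(2025, 10, 20), "X041 article creation", 8,
                 "Fibonacci explanation was clear; team standardization "
                 "section ran long."),
]

# Numbers make entries comparable; reasons explain outliers.
print(mean(r.satisfaction for r in log))
for r in log:
    print(f"{r.day} {r.task}: {r.satisfaction}/10 - {r.reason}")
```

Rejecting empty reasons at construction time is a design choice: it keeps the "number only" failure mode from creeping back into the log.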

Warning File 3: Measurement for Measurement's Sake

Cart-Before-the-Horse Pattern:

❌ "Let's add more KPIs"
→ 50 measurement items
→ Measuring takes too much time
→ No time left for actual work
→ Measurement becomes the goal

✅ "Measure only the 3 most important"
→ Productivity, quality, satisfaction
→ Measurement time: 1 minute/day
→ Sustainable

Countermeasure:

Principle of measurement cost:

Value of measurement > Cost of measurement

High-value measurement:
- Directly informs decision-making
- Leads to improvement actions
- Is shared and discussed within the team

Low-value measurement:
- Nobody looks at it
- Doesn't lead to action
- Serves only as an alibi: "we're measuring"

Warning File 4: Inappropriate Scale Selection

Failure Example 1: Measuring a large workload on an equal-interval scale

❌ "This project: 50 hours"
→ Feels precisely estimated
→ Actually anywhere from 30 to 80 hours
→ The number 50 breeds overconfidence

✅ "This project: 34 points → decomposition required"
→ Recognizes that it "cannot be estimated" as is
→ Break it down into smaller tasks
→ Each task 13 points or less

Countermeasure: Scale Selection Principles

Qualitative data (emotions, satisfaction, etc.):
→ Equal-interval scale (1-10) + adverb anchors

Quantitative data (time, workload) with large values:
→ Fibonacci scale

Quantitative data with small values:
→ Direct measurement (minutes, hours)
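The rule for large workloads can be sketched as a function that rounds a raw estimate up to the next Fibonacci point and flags anything above 13 points for decomposition, matching the "each task 13 points or less" rule. This is an illustrative sketch; the name `to_fibonacci_points` is hypothetical, and rounding up (rather than to the nearest value) is an assumption, chosen because it builds in a buffer for uncertainty.

```python
from bisect import bisect_left

# Planning scale: 1, 2, 3, 5, 8, 13, 21, 34, ...
FIB_POINTS = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
MAX_ESTIMABLE = 13  # above this, split the task before estimating


def to_fibonacci_points(raw: float) -> tuple[int, bool]:
    """Round a raw estimate UP to the next Fibonacci point.

    Returns (points, needs_decomposition).
    """
    if raw <= 0:
        raise ValueError("estimate must be positive")
    # bisect_left finds the first scale value >= raw.
    i = bisect_left(FIB_POINTS, raw)
    if i == len(FIB_POINTS):
        raise ValueError("estimate exceeds the scale; decompose first")
    points = FIB_POINTS[i]
    return points, points > MAX_ESTIMABLE


for raw in (4, 10, 30):
    print(raw, "->", to_fibonacci_points(raw))
```

A raw "30" lands on 34 points and is flagged, which is the point of the scale: the number stops pretending to be precise and instead signals that the task must be broken down.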

Application and Integration - Related Case Files

Related Evidence 1: RCD Model Foundation

Record: Measurement enables valuable recording
Check: Analysis possible because of data
Do: Improvement based on measurement

Related Evidence 2: NPS Integration

Shared element: 0-10 scale with clear anchors
Application: Article evaluation system
Measurement: Internal + reader NPS

Related Evidence 3: OKR Goal Setting

Objective: Improve article quality
Key Results (measurable):
- Self-evaluation average of 8/10 or higher
- 5+ articles of difficulty 8 pt per month
- Reader NPS of +50 or higher

Related Evidence 4: HEART Framework

All 5 dimensions are measurable:
- Happiness: NPS / 10-level rating
- Engagement: Dwell time
- Adoption: New readers
- Retention: Repeat rate
- Task Success: Completion rate

Practical Tools - Special Measures

Tool 1: Estimation Dictionary

# Work Point Definition

### 1 Point (Standard)
- Fix 1-3 typos
- Actual: 5-10 minutes

### 2 Points
- Minor rewrite (200 words)
- Actual: 15-30 minutes

### 3 Points
- Partial rewrite (500 words)
- Actual: 30-60 minutes

### 5 Points
- Medium article (3,000 words)
- Actual: 1.5-3 hours
- Uncertainty: ±50%

### 8 Points
- Large article (5,000-7,000 words)
- Actual: 3-6 hours
- Uncertainty: ±100%

### 13 Points
- Super-large (8,000-10,000 words)
- Actual: 6-12 hours
- Uncertainty: ±150%

### 21+ Points
→ Decomposition required
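The dictionary can also be kept as a lookup table so that a point value always resolves to the same agreed hour range. A minimal sketch encoding the ranges in the dictionary; the names `POINT_HOURS`, `hour_range`, and `within_range` are hypothetical, and the 1-3 point ranges are converted from minutes to hours.

```python
# Agreed hour ranges from the estimation dictionary (1-3 pt converted from minutes).
POINT_HOURS: dict[int, tuple[float, float]] = {
    1: (5 / 60, 10 / 60),
    2: (15 / 60, 30 / 60),
    3: (0.5, 1.0),
    5: (1.5, 3.0),
    8: (3.0, 6.0),
    13: (6.0, 12.0),
}


def hour_range(points: int) -> tuple[float, float]:
    """Return the agreed (min, max) hours for a point value."""
    if points >= 21:
        # 21+ points: the dictionary says decompose instead of estimating.
        raise ValueError(f"{points} pt requires decomposition")
    try:
        return POINT_HOURS[points]
    except KeyError:
        raise ValueError(f"{points} is not a point value on the scale") from None


def within_range(points: int, actual_hours: float) -> bool:
    """Check whether actual work time fell inside the agreed range."""
    low, high = hour_range(points)
    return low <= actual_hours <= high


print(hour_range(13))        # (6.0, 12.0)
print(within_range(8, 4.5))  # True
print(within_range(3, 2.0))  # False: 2 hours overruns the 3 pt range
```

Keeping the table in one place means a team that recalibrates (for example, widening the 8-point range after a retrospective) changes it once and every later check uses the new baseline.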

Tool 2: Deviation Tracking Sheet

| Date | Task | Est. | Actual | Dev. | Analysis | Action |
|------|------|------|--------|------|----------|--------|
| 10/15 | X040 | 8pt | 13pt | +5pt | Translation time | Separate task |
| 10/16 | Image | 2pt | 2pt | 0pt | Accurate | Standardized |
| 10/17 | Rewrite | 3pt | 5pt | +2pt | Structure review | Pre-review |

Monthly summary: 45 tasks, average deviation ±1.2 pt (improved by 0.3 pt)
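The deviation column and a summary figure can be computed directly from the rows. A minimal sketch using the three example rows from the sheet; the `Row` type is hypothetical, and treating "average deviation" as the mean absolute deviation is an assumption, since the sheet does not say how its monthly ±1.2 pt average is computed.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Row:
    date: str
    task: str
    estimate_pt: int
    actual_pt: int

    @property
    def deviation(self) -> int:
        # Positive = took more than estimated (an underestimate).
        return self.actual_pt - self.estimate_pt


rows = [
    Row("10/15", "X040", 8, 13),
    Row("10/16", "Image", 2, 2),
    Row("10/17", "Rewrite", 3, 5),
]

for r in rows:
    print(f"{r.date} {r.task}: {r.deviation:+d}pt")

mad = mean(abs(r.deviation) for r in rows)  # mean absolute deviation
bias = mean(r.deviation for r in rows)      # signed: > 0 means we underestimate
print(f"mean |dev| = {mad:.2f}pt, bias = {bias:+.2f}pt")
```

Tracking both numbers is useful: the absolute mean shows how far off estimates are, while the signed bias shows whether the team leans consistently toward optimism (all three sample rows are underestimates).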

Tool 3: Adverb Conversion Matrix

| Expression | Number | Description |
|------------|--------|-------------|
| Perfect | 10 | Beyond comparison |
| Excellent | 9 | Exceeds expectations |
| Very good | 8 | Above expected |
| Quite good | 7 | As expected |
| Good | 6 | Fairly satisfied |
| Average | 5 | Neither good nor bad |
| Somewhat bad | 3 | Below expectations |
| Bad | 2 | Far below expectations |
| Worst | 0 | Complete failure |
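Encoded as a lookup table, the matrix converts everyday phrases into numbers the same way every time. A minimal sketch using the values in the matrix (which has no entries for 1 or 4, so neither does the sketch); `ADVERB_SCALE` and `to_score` are hypothetical names, and matching is a case-insensitive exact match, so phrases outside the agreed vocabulary are rejected rather than guessed.

```python
ADVERB_SCALE = {
    "perfect": 10,
    "excellent": 9,
    "very good": 8,
    "quite good": 7,
    "good": 6,
    "average": 5,
    "somewhat bad": 3,
    "bad": 2,
    "worst": 0,
}


def to_score(expression: str) -> int:
    """Convert an agreed adverb expression to its number on the 0-10 scale."""
    key = " ".join(expression.lower().split())  # normalize case and spacing
    if key not in ADVERB_SCALE:
        raise KeyError(f"'{expression}' is not in the team's agreed vocabulary")
    return ADVERB_SCALE[key]


feedback = ["Very good", "quite  good", "Average"]
scores = [to_score(f) for f in feedback]
print(scores, sum(scores) / len(scores))
```

Raising on unknown phrases is deliberate: an unmapped expression is a prompt for the team to agree on where it belongs, not something the code should silently estimate.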

Essence of Measurement Philosophy - Prospect Analysis

Evidence: Measurement as Building a Common Language

Thomas Kuhn, "The Structure of Scientific Revolutions" (1962):

Scientific truth = Agreement of the scientific community

Example: The definition of "1 meter" has changed over time
→ Yet it works as long as scientists agree on it

Business application:
Value of measurement = Common understanding within the team
→ A global standard is unnecessary

Evidence: Continuation Over Perfection

The wisdom of pragmatism:

Idealism:
"A perfect method before starting"
→ Never starts
→ Data doesn't accumulate
→ Cannot improve

Pragmatism:
"Start with a 60-point method today"
→ Start immediately
→ Data accumulates
→ Improve while using
→ 80-point precision in 3 months

ROI Detective Agency practice:

2025/04/28: Introduced GA4 (started measurement)
↓
2025/06/15: Added graphs (improved visualization)
↓
2025/07/26: Added segment analysis (improved precision)
↓
Continuously improving

→ Start without waiting for perfection
→ Refine while using
→ Evolve spirally

Evidence: Democratization of Data

Organizational transformation:

Traditional organization:
Data only for executives/specialists
→ Frontline staff judge by "sensation"
→ Misaligned understanding
→ Inefficiency

Measurement-driven organization:
Everyone sees the same data
→ The BOM is shared
→ Dialogue in a common language
→ Efficient collaboration

ROI Detective Agency:
Director, Gemini, Claude, ChatGPT
→ All use the same BOM
→ Flat dialogue
→ Optimal task allocation

Future of Measurement - Direction of Evolution

Evidence: AI-Assisted Measurement Automation

Current challenges:

Manual measurement by humans:
- Measuring takes time
- Measurement gets forgotten
- Subjective variation

Future possibilities:

AI-assisted measurement:
- Auto-track work time
- Auto-measure emotion with sentiment analysis
- Auto-evaluate quality
- Real-time dashboard updates

Example:
While writing, AI:
→ Estimates productivity from keystroke speed
→ Auto-evaluates text quality
→ Reports "Today's productivity: 8/10" at the end of work
→ Auto-analyzes the reason, e.g., "long stretches of concentration"

Evidence: Integration with Biometric Data

Psychological/physiological data:

Current:
Subjective "satisfaction" evaluation
→ Self-report bias

Future:
Objective physiological data:
- Heart rate variability (stress level)
- Facial expression analysis (emotional state)
- Brain waves (concentration level)

Integration example:
Subjective evaluation: "Today's satisfaction 7/10"
Biometric data: "Stress value 3/10, concentration 8/10"
→ Comprehensive state understanding

Evidence: Blockchain for Measurement Reliability

Preventing data tampering:

Current challenge:
"Completed this project in 8 hours"
→ Really? Or underreported?

Blockchain utilization:
- Auto-record work start/end times
- Tamper-proof
- Transparency, auditability

Application:
Building trust in the era of freelance and remote work
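The tamper-evidence idea rests on hash chaining: each entry stores a hash of the previous entry, so editing any past record breaks the link check. The sketch below is a simplified, single-machine hash chain built on Python's standard `hashlib`, not a blockchain (there is no distributed consensus or replication); the entry fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    data = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev,
                  "hash": entry_hash(prev, payload)})


def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited payload breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append(log, {"task": "project", "event": "start", "time": "2025-10-20T09:00"})
append(log, {"task": "project", "event": "end", "time": "2025-10-20T17:00"})
print(verify(log))  # True: an 8-hour record, untouched

log[1]["payload"]["time"] = "2025-10-20T13:00"  # quietly "underreport" the hours
print(verify(log))  # False: the stored hash no longer matches
```

On its own, a hash chain only makes edits detectable; what makes a real blockchain hard to rewrite is that many independent parties hold and validate copies.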

Conclusion - Investigation Summary

Investigator's Final Report:

"The Baseline of Measurement (BOM)" is "the method of quantifying qualitative data through the establishment and sharing of standard baseline in measurement." The most striking finding of this investigation was a piece of practical wisdom: instead of pursuing a perfect, globally shared measurement standard, a team can achieve practical measurement simply by aligning "the BOM" among its members.

The essence of measurement is "making the invisible visible." By converting the subjective sensation "This article is good" into objective data "article quality 8/10," we acquire four powers: shareability, comparability, improvability, and reproducibility. Most importantly, this measurement functions as a catalyst that transforms individual experience into team assets.

The technique of "ruler-making" through the anchor point method follows the same principle as the invention of the thermometer in physics (water's freezing point at 0°C, boiling point at 100°C). Fix the minimum anchor (easiest difficulty = 1) and the maximum anchor (most difficult = 10) with concrete examples, and measure everything else relative to them. This simple principle turns vague sensations into usable rulers.

Gradation using adverbs is the scientific method established by Rensis Likert in 1932. By associating adverbs like "very good," "quite good," "somewhat good" with numbers, we preserve qualitative richness while acquiring quantitative measurability. This "hybrid approach" maximizes measurement practicality.

Applying the Fibonacci sequence (1, 2, 3, 5, 8, 13...) to workload estimation is a clever way to build human cognitive limits into the scale itself. As the Weber-Fechner law describes, humans perceive magnitudes roughly logarithmically, noticing relative rather than absolute differences. Because Fibonacci values grow exponentially, the gaps between them widen as estimates get larger, which automatically builds in a buffer for the fact that larger estimates deviate more. The method is widely adopted by agile development teams worldwide and rests on this well-documented perceptual principle.
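The proportional growth can be checked numerically: the ratio between consecutive Fibonacci values settles near the golden ratio φ ≈ 1.618, so each step up the scale is roughly 60% larger than the last while the absolute gap keeps widening. A short illustrative sketch using only the standard library; the variable names are arbitrary.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618

fib = [1, 2, 3, 5, 8, 13, 21]
for a, b in zip(fib, fib[1:]):
    # Ratios stay roughly constant (proportional steps), consistent with
    # relative perception, while absolute gaps grow with the estimate.
    print(f"{a:>2} -> {b:>2}: ratio {b / a:.3f}, gap {b - a}")

print(f"limit of the ratio: {PHI:.3f}")
```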

The most important discovery is the "deviation measurement and adjustment" process. Many teams fail because they never notice estimation deviations, or notice them and leave them unaddressed. Successful teams measure deviations, discuss why they occurred, and continuously adjust the BOM. This calibration process is the key to building a common language within the team and to improving measurement precision in an upward spiral.

Planning poker, a simultaneous-estimation technique established in agile development, is also a tool for facilitating dialogue. Everyone estimates silently, all numbers are revealed at once, and the people with the highest and lowest estimates explain their reasoning. This dialogue surfaces differences in assumptions, experience levels, and ambiguity in the task definition, and the BOM naturally aligns as a result.
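One round of this protocol can be sketched in code: estimates are collected before anyone sees the others, revealed together, and if they disagree, the holders of the highest and lowest values are asked to explain before a re-vote. A minimal sketch; the name `poker_round` and the consensus rule (all estimates equal) are assumptions, since many teams accept near-agreement instead, and the member names are borrowed from the agency's roster for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoundResult:
    consensus: Optional[int]  # agreed points, or None if another round is needed
    explain_high: list[str]   # members who should explain the highest estimate
    explain_low: list[str]    # members who should explain the lowest estimate


def poker_round(hidden_estimates: dict[str, int]) -> RoundResult:
    """Reveal all estimates at once and decide the next step.

    The dict stands for cards placed face down: it is only read after
    every member has submitted, so nobody anchors on the others.
    """
    if not hidden_estimates:
        raise ValueError("no estimates submitted")
    high = max(hidden_estimates.values())
    low = min(hidden_estimates.values())
    if high == low:
        return RoundResult(consensus=high, explain_high=[], explain_low=[])
    return RoundResult(
        consensus=None,
        explain_high=[m for m, e in hidden_estimates.items() if e == high],
        explain_low=[m for m, e in hidden_estimates.items() if e == low],
    )


# Hypothetical first round: estimates differ, so the outliers explain, then re-vote.
print(poker_round({"Director": 5, "Gemini": 8, "Claude": 3, "ChatGPT": 5}))
print(poker_round({"Director": 5, "Gemini": 5, "Claude": 5, "ChatGPT": 5}))
```

Asking only the extremes to speak keeps the discussion short while still surfacing the assumption gaps that caused the spread.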

The pragmatic acceptance that "a global standard is difficult, but a team standard is sufficient" is another important insight. As Thomas Kuhn pointed out, even scientific truth rests on agreement within the scientific community. Likewise, for business measurement, once the BOM is aligned within a team, comparison with global standards is unnecessary. Given differences in culture, language, experience, and expertise, globally common measurement standards are usually unrealistic and rarely worth pursuing.

The integration with the RCD Model also became clear. Measurement is a prerequisite for Record; because records exist, Check can discover patterns, and Do can plan improvement measures. To live up to the detective's catchphrase, "record and analyze experience to pursue reproducibility," measurement must come first.

NPS succeeded globally for the same reason: it globalized the BOM. By setting clear anchors worldwide (0-6 points = Detractors, 7-8 points = Passives, 9-10 points = Promoters), it made comparison across cultures possible. This shows that when a BOM is carefully designed, a global standard is achievable as well.

Limitations and cautions of measurement were also confirmed: the trap of perfectionism (never starting while seeking the perfect measurement), loss of richness through quantification (qualitative nuance disappears), measurement for measurement's sake (measurement becomes the goal), and inappropriate scale selection (measuring large workloads on an equal-interval scale). These dangers must always be guarded against when introducing measurement.

The countermeasures are clear: done is better than perfect (continuation over perfection), the hybrid approach (combining quantitative and qualitative data), measurement value > measurement cost (measure only the 3 most important items), and appropriate scale selection (equal intervals for qualitative data, Fibonacci for large quantitative data).

As future prospects, the investigation also identified possible directions for measurement technology: AI-assisted automatic measurement, integration with biometric data, and reliability assurance with blockchain. The essence, however, does not change: align the BOM within the team, measure continuously, adjust deviations, and build a common language. This is the royal road to a measurement-driven organization.

Most impressive was that ROI Detective Agency proves this measurement philosophy through practice rather than just discussing it in theory. Starting with the introduction of GA4, then adding graphs and segment analysis and improving continuously, it started at 60 points without waiting for perfection, refined the system while using it, and evolved in an upward spiral. This practical wisdom is worth more than any theoretical book.

Measurement is the technology that transforms vague sensations into clear numbers, transforms individual experience into team assets, and predicts the future from past data. And at its core is "the Baseline of Measurement (BOM)"—the standard baseline of measurement shared within the team.

Recommended Maxim: "What cannot be measured cannot be improved. But rather than waiting for perfect measurement, measure from today even if rough. Once the Baseline of Measurement (BOM) is aligned, the team has a common language."

【ROI Detective Agency Classified File Series X041 Complete】

Case Closed

📚 Related Books