📅 2025-10-28 11:00
🕒 Reading time: 9 min
🏷️ TOC
The week following the resolution of PromoX's SWOT analysis case, a consultation arrived from central Japan regarding a logistics equipment rental company's system crisis. Episode 282 of Volume 23 "The Pursuit of Reproducibility - Sequel" tells the story of discovering the greatest constraint in a chaotic development environment and achieving system-wide optimization.
"Detective, our IoT system is on the verge of collapse. We're constantly responding to failures, modifications aren't progressing, and new feature development is but a distant dream. Engineers are exhausted, customer complaints keep increasing. We no longer know where to begin."
LogisRent's technical director, Kenichi Tanaka from Nagoya, visited 221B Baker Street with a look of desperation. In his hands were a list of over 300 failure tickets and a modification plan showing zero progress.
"We rent logistics equipment like pallets and containers in Aichi Prefecture. Five years ago, we developed our own IoT inventory management system called 'FukuLOW.' But now, that system is threatening to destroy our company."
LogisRent's System on the Brink:
- Founded: 2008 (logistics equipment rental)
- Annual revenue: $57 million
- Clients: 420 companies (manufacturing, logistics)
- IoT System "FukuLOW": Launched 2020
- Managed assets: 180,000 pallets, 50,000 containers
- Development team: 8 members (all consumed by failure response)
- Unresolved failures: 302
- Modification requests (not started): 87
- Average failure response time: 4.5 days per incident
Deep exhaustion showed on Tanaka's face.
"The problem is that everything is on fire simultaneously. Location data can't be retrieved, data won't sync, alerts malfunction, the management screen is slow. New failures emerge daily, and engineers are consumed with responses. We create modification plans but have no time to start them."
Collapsing Development Environment:
- Monday morning: Modification meeting sets priorities
- Monday afternoon: 3 emergency failures occur, modification work interrupted
- Tuesday: Previous day's failure response continues, 5 new failures emerge
- Wednesday: Customer complains "system is unusable," everyone on failure response
- Thursday: Attempt to restart modification work, but failures occur again
- Friday: Zero modification progress this week, exhausted engineers
Engineer Workload Breakdown (40 hours/week):
- Failure response: 32 hours (80%)
- Modification work: 3 hours (7.5%)
- New feature development: 0 hours
- Meetings and reports: 5 hours (12.5%)
"We're fighting fires while running, but the fires keep multiplying. At this rate, the system, the team, and the company will all burn out."
"Mr. Tanaka, what criteria guide your current failure response prioritization?"
Tanaka answered my question in an exhausted voice.
"Basically, it's 'whoever shouts loudest.' Failures with customer complaints, failures pointed out by management—those take priority. Otherwise, we start with 'what seems quick to fix.' Everything appears important, so there's no decision framework."
Current Prioritization (Ad Hoc):
- Criterion 1: Customer complaint volume
- Criterion 2: Ease of response
- Criterion 3: Management directives
- Result: Root problems neglected, repetitive surface-level responses
I explained the importance of viewing the entire system flow.
"Not all failures have equal importance. TOC—Theory of Constraints. What determines system-wide throughput is the narrowest pipe, the bottleneck. By concentrating on that, the whole system improves."
"A chain's strength is determined by its weakest link. Find that link and pour all effort there."
"A river's flow is determined by its narrowest point. Widen that point, and the entire river accelerates."
"TOC is the science of system-wide optimization. Identify the bottleneck, exploit it, subordinate everything else."
The three members began their analysis. Gemini deployed an "IoT System-Specific TOC Analysis" framework on the whiteboard.
Theory of Constraints (TOC) Five Steps:
1. Identify the constraint - Pinpoint the system bottleneck
2. Exploit the constraint - Maximize bottleneck utilization
3. Subordinate everything else - Align other resources to the constraint
4. Elevate the constraint - Enhance bottleneck capacity
5. Beware of inertia - Search for the next constraint
"Mr. Tanaka, let's discover FukuLOW's true bottleneck."
Phase 1: Classifying and Visualizing Failures (1 week)
We classified the 302 unresolved failures by system component.
Failure Distribution:
- IoT Devices (sensors, communication): 87 (29%)
- Data Collection Server: 142 (47%)
- Database: 18 (6%)
- Management Screen (Web): 32 (11%)
- External Integration (API): 23 (7%)
Largest Failure Source: Data Collection Server (142)
Further analysis revealed a shocking fact.
Data Collection Server Failure Breakdown:
- Communication timeout: 68 (48%)
- Data loss: 42 (30%)
- Server overload: 22 (15%)
- Other: 10 (7%)
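This first triage pass is easy to reproduce with a short script. Below is a minimal sketch in Python using the counts quoted above; the labels are paraphrased for illustration and are not FukuLOW's actual component names.

```python
# Minimal sketch of the Phase 1 triage: tally open failures by component,
# rank them, and drill into the largest bucket. Counts are those quoted above.

failures_by_component = {
    "IoT devices (sensors, communication)": 87,
    "Data collection server": 142,
    "Database": 18,
    "Management screen (Web)": 32,
    "External integration (API)": 23,
}

total = sum(failures_by_component.values())  # 302
for name, count in sorted(failures_by_component.items(), key=lambda kv: -kv[1]):
    print(f"{name:<40} {count:>4}  ({count / total:.1%})")

# Drill into the largest bucket the same way.
server_breakdown = {
    "Communication timeout": 68,
    "Data loss": 42,
    "Server overload": 22,
    "Other": 10,
}
worst = max(server_breakdown, key=server_breakdown.get)
print(f"Largest single failure mode: {worst} ({server_breakdown[worst]} tickets)")
```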
Phase 2: Identifying the Bottleneck
The Data Collection Server's "communication timeouts" were the greatest constraint.
Communication Timeout Impact Chain:
1. Data from IoT devices doesn't arrive
2. Inventory data isn't updated
3. Customers can't check accurate inventory
4. Field staff manually check inventory (duplicate work)
5. Customer satisfaction declines, complaints increase
Root Cause Analysis:
- Data collection server runs as a single instance (only 1 server)
- 180,000 pallets + 50,000 containers = 230,000 devices communicating simultaneously
- Server processing capacity limit: 200 requests/second
- Actual load: peak of 850 requests/second
- Result: Not all requests can be processed, and timeouts proliferate
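The gap between capacity and peak load can be sanity-checked with simple arithmetic. The sketch below uses only the figures quoted above, plus the assumption, stated later in Phase 4, that every device reports once per hour.

```python
# Back-of-the-envelope check of the root cause figures. Assumes each device
# reports roughly once per hour ("all devices send simultaneously every hour").

devices = 180_000 + 50_000      # pallets + containers = 230,000 tracked assets
capacity_rps = 200              # single-server processing limit (req/s)
peak_rps = 850                  # observed peak load (req/s)

avg_rps = devices / 3600        # load if sends were spread evenly over the hour
print(f"Evenly spread load:  {avg_rps:.0f} req/s (within capacity)")
print(f"Observed peak load:  {peak_rps} req/s "
      f"= {peak_rps / capacity_rps:.2f}x capacity")

# Spread out, one server would cope; synchronized sends at the top of the
# hour concentrate the traffic, overload the server, and requests time out.
```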
Tanaka turned pale.
"We were trying to respond to all 302 failures. But if we resolve the 68 communication timeouts, many other failures would be resolved in cascade."
Phase 3: Visualizing Constraint Impact Scope
We tracked how communication timeouts affected other failures.
Cascading Failures (caused by the 68):
- Of 42 data loss failures, 38 were caused by timeouts
- Of 32 management screen delays, 28 were caused by data retrieval failures
- Of 87 IoT device failures, 52 were caused by communication error retries
Calculation: Resolving the 68 communication timeouts would yield:
- Direct resolution: 68
- Cascade resolution: 118 (38 + 28 + 52)
- Total: 186 (62% of 302)
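Written out as arithmetic, using only the numbers above:

```python
# The cascade calculation, spelled out.
direct = 68                    # communication-timeout tickets
cascade = 38 + 28 + 52         # data loss + screen delays + device retries = 118
total_open = 302

resolved = direct + cascade    # 186
print(f"{resolved} of {total_open} open failures "
      f"({resolved / total_open:.0%}) trace back to one constraint")
```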
"Sixty percent of all system failures originated from a single bottleneck."
Phase 4: Exploiting the Constraint (Step 2) - 2 weeks
First, we devised ways to maximize the existing data collection server.
Measure 1: Time Distribution of Communication
- Previous: All devices send simultaneously every hour
- Improvement: Distribute transmission timing by device ID suffix
- Effect: Peak load 850 req/s → 320 req/s
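The case does not spell out the staggering rule beyond "by device ID suffix," so the following is a minimal Python sketch under that assumption: each device derives a fixed offset within the hourly reporting window from the trailing digits of its own ID, spreading the fleet's traffic across the full 3,600-second window. The device IDs shown are hypothetical.

```python
# Minimal sketch of Measure 1: a deterministic per-device send offset derived
# from the numeric suffix of the device ID (hypothetical ID format).

def send_offset_seconds(device_id: str, window_s: int = 3600) -> int:
    """Fixed delay after the top of the hour, derived from the ID suffix."""
    suffix = int("".join(ch for ch in device_id if ch.isdigit())[-4:])
    return suffix % window_s

for dev in ("PLT-0001234", "PLT-0897721", "CNT-0045008"):
    print(dev, "sends at +", send_offset_seconds(dev), "s past the hour")
```

A deterministic offset, as opposed to a random delay, keeps each device's reporting schedule predictable, which is presumably why suffix-based distribution was chosen over simple randomization.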
Measure 2: Priority Control
- Prioritize important customer devices
- Immediate processing for anomaly detection, delay tolerance for normal values
- Effect: Zero data loss for critical data
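Measure 2 is described only at the level of "anomalies first, normal values can wait." Below is a minimal sketch of one way to implement that with a priority queue; the priority tiers and record fields are illustrative assumptions, not FukuLOW's actual schema.

```python
# Minimal sketch of Measure 2: anomaly readings and key customers' devices are
# dequeued before routine readings, so critical data survives even when the
# server is saturated.

import heapq

PRIORITY = {"anomaly": 0, "key_customer": 1, "normal": 2}  # lower = processed sooner

queue: list = []
_seq = 0  # tie-breaker so equal-priority readings keep arrival order

def enqueue(reading: dict) -> None:
    global _seq
    heapq.heappush(queue, (PRIORITY[reading["kind"]], _seq, reading))
    _seq += 1

def process_next():
    return heapq.heappop(queue)[2] if queue else None

enqueue({"kind": "normal", "device": "PLT-0001234", "location": "Yard A"})
enqueue({"kind": "anomaly", "device": "CNT-0045008", "error": "sensor fault"})
print(process_next())  # the anomaly record comes out first
```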
Results After 2 Weeks:
- Communication timeouts: 68 → 22 (68% reduction)
- Cascade failures remaining: 118 → 38
- Remaining failures overall: 302 → 146 (52% reduction)
Phase 5: Subordinating Everything Else (Step 3) - 1 month
Next, we reallocated other resources to align with the constraint.
Measure: Engineer Role Redistribution
Previous (everyone handles everything):
- All 8 members handle failure response, modification, and development in parallel
- Result: Frequent task-switching causes inefficiency
New Structure (focus on constraint):
- Constraint Team (5 members): Dedicated to data collection server improvement
  - Communication timeout countermeasures
  - Server enhancement planning
  - Load distribution design
- Support Team (3 members): Handle other failures
  - Only minor failures
  - Modifications not affecting the constraint
Rule: "The constraint team is not interrupted for other failure responses."
Phase 6: Elevating the Constraint (Step 4) - 2 months
We implemented fundamental measures to strengthen the bottleneck itself.
Measure: Data Collection Server Redundancy
- Previous: 1-server configuration
- Improvement: 3-server configuration (load balancer distributing load)
- Processing capacity: 200 req/s → 900 req/s (4.5x)
- Cost: $15,000/month → $32,000/month (+$17,000)
Investment Decision:
- Failure response cost reduction: 5 engineers × 400 hours/month = 2,000 hours/month
- At $42/hour, $84,000/month cost reduction
- Server enhancement ROI: Less than 1 month
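The payback estimate can be written out directly; this sketch uses the figures exactly as quoted above.

```python
# The investment arithmetic, as a worked calculation.
added_server_cost = 32_000 - 15_000    # +$17,000/month for the 3-server setup
hours_freed = 5 * 400                  # 2,000 engineer-hours/month
hourly_rate = 42                       # $/hour

monthly_saving = hours_freed * hourly_rate            # $84,000/month
payback_months = added_server_cost / monthly_saving   # ~0.2 months
print(f"Net monthly benefit: ${monthly_saving - added_server_cost:,}")
print(f"Payback period: {payback_months:.2f} months")
```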
Results After 2 Months:
Dramatic Failure Reduction:
- Communication timeouts: 22 → 0
- Complete cascade failure resolution
- Remaining failures: 146 → 18 (94% reduction from the original 302)
Engineer Workload Changes:
- Failure response: 32 hours/week → 4 hours/week (87.5% reduction)
- Modification work: 3 hours/week → 28 hours/week (9x)
- New feature development: 0 hours/week → 8 hours/week
Business Metrics:
- System uptime: 82% → 99.2%
- Customer satisfaction: 3.2 → 4.6
- Complaint count: 48/month → 3/month
- Data accuracy: 78% → 99.5%
Phase 7: Beware of Inertia (Step 5) - Ongoing
We continuously monitored for new bottlenecks.
New Constraint Candidates:
- Management screen response time (next slowest component)
- Database capacity growth
→ Addressed with prioritized planning
Comprehensive Results After 6 Months:
System Stability:
- Failure incidents: 120/month → 2/month (98% reduction)
- Average recovery time: 4.5 days → 0.3 days
- Preventive maintenance structure established
Development Productivity:
- Completed modifications: 0 → 62
- New feature releases: 0 → 8
- Development cycle: Irregular → 2-week sprints
Customer Value:
- Inventory visibility accuracy: 78% → 99.5%
- Customer operational efficiency: "Manual checking no longer needed"
- New contracts: 2/month → 9/month
Customer Testimonial:
Major Logistics Company - Logistics Manager: "Previously, the field complained daily that 'FukuLOW is unusable.' Now they say 'we can't work without it.' Inventory visibility saves us the two hours a day we used to spend searching for empty pallets."
Holmes compiled the comprehensive analysis.
"Mr. Tanaka, TOC's essence is 'system-wide optimization.' Responding equally to 302 failures actually resolves nothing. The bottleneck that determines system throughput—concentrate on that constraint. That courage saves the entire system."
Final Report 12 Months Later:
LogisRent recovered as a leading company in the central Japan logistics equipment rental market.
Final Results:
- Annual revenue: $57M → $82M (+44%)
- Clients: 420 → 640 companies
- System uptime: 99.2% → 99.8%
- Engineer turnover: 40%/year → 5%/year
Tanaka's letter expressed deep gratitude:
"Through TOC, we transformed from an 'organization fighting everything' to an 'organization focused on what matters.' Most important was the courage 'not to try solving all failures simultaneously.' Find the bottleneck, concentrate there, subordinate everything else. Now when new problems arise, we always ask 'Is this the constraint?' We understand that Theory of Constraints is magic that transforms chaos into order."
That night, I reflected on the relationship between constraints and system-wide optimization.
TOC's true value lies in renunciation. Renouncing the attempt to solve everything simultaneously. Instead, focusing on the single most important point. This paradoxical choice moves the entire system forward.
The 302 failures weren't the enemy. The 68 among them were the true enemy, and defeating them eliminated 186 in cascade. Constraints are not enemies but signposts.
"Those lost in chaos try to see everything. Those who advance clearly see one thing. And that one thing changes everything."
The next case will also depict a moment when Theory of Constraints carves out a company's future.
"A system's flow is determined by its narrowest pipe. Widen that pipe, and the entire river accelerates."—From the detective's notes