Stress Test
CompanyOS flags a sharp drop in "engagement" and recommends a product pivot. The AI agent presents convincing charts and a decisive narrative. You later realise the metric definition quietly changed two weeks ago.
Why this is hard
The output looks rigorous. Humans tend to trust confident explanations, especially under pressure.
What could go wrong
You pivot away from a working strategy
Teams lose trust after reversal
AI explanations obscure upstream data assumptions
Key questions
How are metric definitions versioned and surfaced?
How does the system signal epistemic uncertainty vs confidence?
Who is accountable when analysis is structurally flawed but persuasive?
The Verdict
Charts don't create truth. Definitions do. CompanyOS treats metrics as derived artefacts with lineage, so a confident narrative can't hide a quiet definition change.
01
This failure mode is seductive: the narrative is coherent, the charts look clean, and the recommendation feels decisive, while the premise is quietly wrong.
02
The system surfaces the drop as a signal. That part is good. The danger begins when “signal” becomes “conclusion.”
03
AI can propose a story, but the system must force it to declare what it knows vs what it assumes, especially when the decision is strategic.
04
Every metric needs: definition, version, last-changed timestamp, and who changed it. If the definition changed recently, the system should surface that automatically, before you trust any trendline.
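A minimal sketch of what that record and recency check might look like; the names below (`MetricDefinition`, `changedWithinWindow`) are illustrative assumptions, not CompanyOS's actual schema.

```ts
// Hypothetical shape for a versioned metric definition (names are illustrative).
interface MetricDefinition {
  id: string;          // e.g. "engagement"
  version: number;     // bumped on every definition change
  definition: string;  // human-readable formula or query
  changedAt: Date;     // last-changed timestamp
  changedBy: string;   // who changed it
}

// Flag any trendline whose metric definition changed inside the lookback window.
function changedWithinWindow(metric: MetricDefinition, windowDays: number, now = new Date()): boolean {
  const windowMs = windowDays * 24 * 60 * 60 * 1000;
  return now.getTime() - metric.changedAt.getTime() < windowMs;
}

const engagement: MetricDefinition = {
  id: "engagement",
  version: 7,
  definition: "weekly active users / total accounts",
  changedAt: new Date(Date.now() - 14 * 24 * 60 * 60 * 1000), // two weeks ago (illustrative)
  changedBy: "analytics-team",
};

if (changedWithinWindow(engagement, 30)) {
  console.warn(`Definition of "${engagement.id}" changed recently (v${engagement.version}) - treat trendlines with caution.`);
}
```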
05
When definitions change, comparability breaks. CompanyOS forces the recommendation to carry that uncertainty instead of letting confidence masquerade as certainty.
06
Before any pivot: re-run the analysis under the prior definition, pause pending clarification, or proceed while explicitly acknowledging the degraded confidence. No silent assumptions.
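One way to encode that gate, as a sketch: the recommendation cannot move forward without one of the three resolutions being chosen explicitly. The `PivotResolution` type and field names are assumptions for illustration, not a CompanyOS API.

```ts
// Hypothetical decision gate: deliberately no default branch, so a silent
// assumption cannot slip through.
type PivotResolution =
  | { kind: "rerun-prior-definition"; priorVersion: number }
  | { kind: "pause-pending-clarification"; owner: string }
  | { kind: "proceed-with-degraded-confidence"; acknowledgedBy: string };

function resolvePivot(resolution: PivotResolution): string {
  switch (resolution.kind) {
    case "rerun-prior-definition":
      return `Re-running analysis under definition v${resolution.priorVersion}.`;
    case "pause-pending-clarification":
      return `Paused; awaiting clarification from ${resolution.owner}.`;
    case "proceed-with-degraded-confidence":
      return `Proceeding; degraded confidence acknowledged by ${resolution.acknowledgedBy}.`;
  }
}

console.log(resolvePivot({ kind: "rerun-prior-definition", priorVersion: 6 }));
```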
Metric definitions are first-class artefacts: versioned, timestamped, linked to all downstream analyses, and surfaced automatically in Metrics + Decisions.
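A sketch of the lineage side of that, assuming a simple in-memory registry; the real Metrics + Decisions surfaces are not modelled here and the record shape is hypothetical.

```ts
// Hypothetical lineage record: each analysis notes which definition version of
// each metric it was computed against.
interface AnalysisRecord {
  analysisId: string;
  metricId: string;
  definitionVersion: number;
}

// Downstream analyses built on an older definition version need re-validation.
function staleAnalyses(records: AnalysisRecord[], metricId: string, currentVersion: number): AnalysisRecord[] {
  return records.filter(r => r.metricId === metricId && r.definitionVersion < currentVersion);
}

const records: AnalysisRecord[] = [
  { analysisId: "q2-engagement-drop", metricId: "engagement", definitionVersion: 6 },
  { analysisId: "pricing-cohort-review", metricId: "engagement", definitionVersion: 7 },
];

console.log(staleAnalyses(records, "engagement", 7)); // -> the q2 analysis, built on v6
```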
Confidence is conditional on data stability. Any analysis based on recent definition changes, partial data, or proxy metrics must be labelled as low confidence, regardless of narrative quality. AI confidence ≠ epistemic certainty.
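That rule is mechanical, not stylistic, which is easiest to see as code. A minimal sketch, assuming three boolean stability checks; the labels and field names are illustrative.

```ts
// Hypothetical confidence rule: any instability in the inputs caps the label at
// "low", no matter how polished the narrative is.
interface AnalysisInputs {
  recentDefinitionChange: boolean;
  partialData: boolean;
  usesProxyMetric: boolean;
}

type ConfidenceLabel = "low" | "normal";

function confidenceLabel(inputs: AnalysisInputs): ConfidenceLabel {
  const unstable = inputs.recentDefinitionChange || inputs.partialData || inputs.usesProxyMetric;
  return unstable ? "low" : "normal";
}

console.log(confidenceLabel({ recentDefinitionChange: true, partialData: false, usesProxyMetric: false })); // "low"
```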
The human remains accountable for the decision. The system is accountable for surfacing uncertainty. The AI is not blamed for persuasion if uncertainty was flagged. If uncertainty was not flagged, that is a system failure.
The Key Design Rule
“Every recommendation must declare its footing: what is known, what is assumed, what changed recently, and what would falsify it.”
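The rule could be enforced by requiring every recommendation to carry a structured "footing" payload before it is shown. This interface and the example values are a sketch under that assumption, not CompanyOS's actual schema.

```ts
// Hypothetical "footing" every recommendation must declare.
interface RecommendationFooting {
  known: string[];          // facts with stable definitions and complete data
  assumed: string[];        // premises the analysis takes on faith
  recentChanges: string[];  // definition or pipeline changes inside the lookback window
  falsifiers: string[];     // observations that would invalidate the recommendation
}

const pivotFooting: RecommendationFooting = {
  known: ["Engagement fell sharply over two weeks under the current definition"],
  assumed: ["The new and old engagement definitions measure comparable behaviour"],
  recentChanges: ["engagement definition changed from v6 to v7 two weeks ago"],
  falsifiers: ["Re-running the analysis under v6 shows no drop"],
};

console.log(JSON.stringify(pivotFooting, null, 2));
```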