Stress Test

The Confidently Wrong Metric

CompanyOS flags a sharp drop in "engagement" and recommends a product pivot. The AI agent presents convincing charts and a decisive narrative. You later realise the metric definition quietly changed two weeks ago.

Why this is hard

The output looks rigorous. Humans tend to trust confident explanations, especially under pressure.

What could go wrong

  • You pivot away from a working strategy

  • Teams lose trust after the reversal

  • AI explanations obscure upstream data assumptions

Key questions

  • How are metric definitions versioned and surfaced?

  • How does the system signal epistemic uncertainty vs confidence?

  • Who is accountable when analysis is structurally flawed but persuasive?


The Verdict

Charts don't create truth. Definitions do. CompanyOS treats metrics as derived artefacts with lineage, so a confident narrative can't hide a quiet definition change.


What to Do Instead

01

Why this scenario is dangerous

This failure mode is seductive: the narrative is coherent, the charts look clean, and the recommendation feels decisive, while the premise is quietly wrong.

02

Phase 1: Detect the signal

The system surfaces the drop as a signal. That part is good. The danger begins when "signal" becomes "conclusion."

03

Phase 2: Separate interpretation from fact

AI can propose a story, but the system must force it to declare what it knows vs what it assumes, especially when the decision is strategic.
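
As a minimal sketch, here is one way that contract could look in TypeScript. The Claim and Analysis types and their field names are illustrative assumptions, not CompanyOS's actual schema:

  // Hypothetical shape for an AI-produced analysis. Every claim must
  // declare its basis: an observed fact (with evidence) or an assumption.
  type Basis = "observed" | "assumed";

  interface Claim {
    statement: string;   // e.g. "engagement fell week-over-week"
    basis: Basis;        // declared explicitly, never inferred from tone
    evidence?: string;   // query, dataset, or metric version backing a fact
  }

  interface Analysis {
    claims: Claim[];
    recommendation: string;
  }

  // An "observed" claim without evidence is an assumption in disguise:
  // flag it before the recommendation reaches a human.
  function undeclaredAssumptions(analysis: Analysis): string[] {
    return analysis.claims
      .filter(c => c.basis === "observed" && !c.evidence)
      .map(c => `No evidence for: "${c.statement}"`);
  }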

04

Phase 3: Make metric lineage impossible to miss

Every metric needs: definition, version, last-changed timestamp, and who changed it. If the definition changed recently, the system should surface that automatically, before you trust any trendline.
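
A hedged sketch of such a metric record in TypeScript; every name here is hypothetical, not the real schema:

  // Hypothetical metric record; the field names are illustrative.
  interface MetricDefinition {
    name: string;        // e.g. "engagement"
    version: number;     // bumped on every definition change
    definition: string;  // the computation itself, e.g. SQL
    changedAt: Date;     // last-changed timestamp
    changedBy: string;   // who changed it
  }

  // Before rendering any trendline, check whether the definition
  // changed inside the lookback window; if so, warn automatically.
  function changedWithin(def: MetricDefinition, windowDays: number): boolean {
    const ageMs = Date.now() - def.changedAt.getTime();
    return ageMs < windowDays * 24 * 60 * 60 * 1000;
  }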

05

Phase 4: Downgrade confidence when the footing is shaky

When definitions change, comparability breaks. CompanyOS forces the recommendation to carry that uncertainty instead of letting confidence masquerade as certainty.
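
One way to encode that rule, sketched in TypeScript; the Recommendation shape is an assumption for illustration:

  type Confidence = "high" | "medium" | "low";

  interface Recommendation {
    text: string;
    confidence: Confidence;
    caveats: string[];  // travels with the recommendation, not buried in a footnote
  }

  // When comparability breaks, cap confidence at "low" and attach the
  // reason, no matter how polished the narrative is.
  function carryUncertainty(
    rec: Recommendation,
    definitionChangedInWindow: boolean,
  ): Recommendation {
    if (!definitionChangedInWindow) return rec;
    return {
      ...rec,
      confidence: "low",
      caveats: [
        ...rec.caveats,
        "Metric definition changed inside the analysis window; the trend is not comparable across versions.",
      ],
    };
  }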

06

Phase 5: Give safe next steps

Before any pivot: re-run the analysis under the prior definition, pause pending clarification, or proceed while explicitly acknowledging the degraded confidence. No silent assumptions.
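
Those three options can be made an explicit, exhaustive choice. A TypeScript sketch, with hypothetical names:

  // The three safe paths, as an exhaustive type: there is no fourth,
  // silent option where the assumption goes unrecorded.
  type NextStep =
    | { kind: "rerun"; underDefinitionVersion: number }    // prior definition
    | { kind: "pause"; pendingClarificationFrom: string }  // named owner
    | { kind: "proceed"; acknowledgedConfidence: "low" };  // uncertainty on record

  function describe(step: NextStep): string {
    switch (step.kind) {
      case "rerun":
        return `Re-running analysis under definition v${step.underDefinitionVersion}`;
      case "pause":
        return `Paused pending clarification from ${step.pendingClarificationFrom}`;
      case "proceed":
        return "Proceeding with explicitly acknowledged low confidence";
    }
  }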


Direct Answers

How are metric definitions versioned and surfaced?

Metric definitions are first-class artefacts: versioned, timestamped, linked to all downstream analyses, and surfaced automatically in Metrics + Decisions.
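
A sketch of that downstream link, assuming each analysis records the metric versions it consumed; the names are illustrative:

  // Hypothetical lineage link: each analysis records the metric
  // versions it was computed against, so a definition change can be
  // traced to everything downstream of it.
  interface AnalysisRecord {
    id: string;
    usedMetrics: { name: string; version: number }[];
  }

  // Every analysis built on an older version of the metric is stale
  // and should be flagged wherever it is displayed.
  function affectedAnalyses(
    analyses: AnalysisRecord[],
    metricName: string,
    currentVersion: number,
  ): AnalysisRecord[] {
    return analyses.filter(a =>
      a.usedMetrics.some(m => m.name === metricName && m.version < currentVersion)
    );
  }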

How does the system signal epistemic uncertainty vs confidence?

Confidence is conditional on data stability. Any analysis based on recent definition changes, partial data, or proxy metrics must be labelled as low confidence, regardless of narrative quality. AI confidence ≠ epistemic certainty.
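
The rule is mechanical, which is the point. A TypeScript sketch with assumed field names:

  // Assumed inputs describing the footing of the data, not the model.
  interface DataFooting {
    definitionChangedRecently: boolean;
    partialData: boolean;
    proxyMetric: boolean;
  }

  // The label is driven by data stability alone; narrative fluency
  // never enters the function.
  function mustLabelLowConfidence(f: DataFooting): boolean {
    return f.definitionChangedRecently || f.partialData || f.proxyMetric;
  }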

Who is accountable when analysis is structurally flawed but persuasive?

The human remains accountable for the decision. The system is accountable for surfacing uncertainty. The AI is not blamed for being persuasive if the uncertainty was flagged; if it was not flagged, that is a system failure.


The Key Design Rule

Every recommendation must declare its footing: what is known, what is assumed, what changed recently, and what would falsify it.
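
Sketched as a data contract in TypeScript (all names hypothetical), the rule becomes enforceable rather than aspirational:

  // Hypothetical "footing" block every recommendation carries.
  interface Footing {
    known: string[];          // verified facts, each backed by evidence
    assumed: string[];        // working assumptions, stated plainly
    recentChanges: string[];  // e.g. definition or pipeline changes in the window
    falsifiers: string[];     // observations that would invalidate the recommendation
  }

  // A recommendation that cannot name a falsifier has not declared
  // its footing and should not be presented as decision-ready.
  function declaresFooting(f: Footing): boolean {
    return f.known.length > 0 && f.falsifiers.length > 0;
  }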

Join the CompanyOS early access list

For founders using AI every day who want leverage without losing control.
