WRITING • SYSTEMS • TRUST • SAFETY

The Safety Valve We Keep Forgetting to Build

“A system that can’t admit its limits will eventually demand blind trust.”

A non-technical idea about models, metrics, explanations, and the missing brakes that would keep them from quietly becoming authorities.

December 30, 2025
Most of the tools we use to think are not the things we’re thinking about. Models, metrics, explanations, summaries, dashboards—these are representations. They point at something in the world, but they are not the thing itself.

By a safety valve, I mean a built-in limit on how much authority a model, metric, or explanation is allowed to carry without an explicit, deliberate handoff to human judgment.

We build representations because the world is too large and too complex to hold in our heads all at once. Models help us see patterns we’d miss. Metrics let us track things that would otherwise blur together. Summaries save time. Dashboards keep us oriented.

None of that is the problem. The problem starts later, almost invisibly, when those tools stop feeling like tools and start feeling like the thing itself.

The drift usually looks like this:
  • a model works, so we reuse it
  • then we reuse it in a slightly different context
  • then we start acting on it more directly
  • then someone asks “what else should we use?” and the answer is nothing

At no point did anyone make a dramatic decision to hand over authority. It simply accumulated.

How trust sneaks up on us

Very few failures begin with someone saying, “Let’s blindly trust this model.”

What actually happens is quieter. Metrics become targets. Summaries become justifications. Explanations harden into reasons we can no longer argue with.

By the time something breaks, the trust is already baked in.

The scary part is how normal it feels. It doesn’t require bad intent. It’s what happens when representations get operationalized without an explicit stopping point.

The missing piece isn’t intelligence, it’s brakes

In physical systems, we understand this instinctively. We don’t design pressure systems without relief valves. We don’t build engines without governors. We don’t assume normal operation will hold forever.

But when it comes to thinking—especially collective thinking—we act as if better tools automatically mean safer decisions.

They don’t. They move confidence faster. And without something that knows how to slow confidence down, pressure builds until it escapes somewhere unpleasant.

A small idea borrowed from a big one

There’s a famous pair of results in mathematical logic known as Gödel’s incompleteness theorems. Roughly stated, any consistent formal system expressive enough to describe basic arithmetic will contain true statements it cannot prove using its own rules.

You don’t need the math to get the point. What matters is the shape of the limitation.

The shape of the limit

Some limits can’t be resolved from the inside.

They can only be acknowledged from the outside. The mistake isn’t having limits. The mistake is pretending they aren’t there.

Incompleteness does not mean a system is broken. It means there are truths about the system—including its reliability and boundaries—that cannot be fully derived by the system itself.

In practice, this means you can’t ask a system to certify the conditions under which it should stop being trusted. Any attempt to do so simply pushes the question one level higher.

What a safety valve actually does

The idea here is simple, almost boring on purpose. A safety valve doesn’t decide. It constrains.

Instead of asking “is this correct?”, you ask: what is this safe to rely on for? Where does it stop being reliable? What happens if it’s wrong? And who is still responsible when that happens?

Practically, it does three things:
  • Makes limits visible: what this is valid for, and what it isn’t.
  • Prevents silent escalation: you can’t accidentally treat it as decisive.
  • Forces a judgment handoff: when structure runs out, a person must own the call.

A handoff doesn’t have to be heavy. It can be as simple as naming an owner, requiring an independent signal, or stating plainly: if this output would change someone’s outcome, a human must review and sign off.

Think of it as a governor on certainty. Not slowing everything down—just preventing confidence from growing faster than the system can support.
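To make that concrete, here is a minimal sketch of what such a governor might look like in code. Everything in it (the ScopedSignal name, the valid_for field, the use method) is invented for illustration, not taken from any real library: a signal declares what it is valid for, refuses silent escalation, and demands a named owner before it is used outside that scope.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedSignal:
    """A model output that carries its own limits of validity.

    Every name here is illustrative: a sketch of the 'governor on
    certainty' idea, not an API from any real library.
    """
    value: float
    valid_for: set[str] = field(default_factory=set)  # uses this signal is declared safe for

    def use(self, purpose: str, signed_off_by: str | None = None) -> float:
        # Within the declared scope: the signal can be relied on directly.
        if purpose in self.valid_for:
            return self.value
        # Outside the declared scope: no silent escalation. A named person
        # must own the call before the signal carries any authority here.
        if signed_off_by is None:
            raise PermissionError(
                f"'{purpose}' is outside this signal's declared scope; "
                "a human owner must review and sign off."
            )
        return self.value


if __name__ == "__main__":
    churn_risk = ScopedSignal(value=0.82, valid_for={"monitoring", "investigation"})
    print(churn_risk.use("investigation"))   # within scope: fine
    try:
        churn_risk.use("deny_outreach")      # outside scope, no owner: blocked
    except PermissionError as exc:
        print(exc)
    print(churn_risk.use("deny_outreach", signed_off_by="a named owner"))  # explicit handoff
```

The important part is the default: anything outside the declared scope fails loudly until a person takes responsibility for the call.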

A tiny example

A churn score is created to help a support team spot at-risk customers. At first, it is just a signal for investigation. Over time, it becomes a priority list. Eventually, it becomes a rule: customers below a threshold don’t get outreach.

The same drift happens with any metric designed to spot trends. It works. People trust it. Eventually, someone suggests tying incentives to it.

Nothing about the metric changes. Only the reliance does.

A safety valve would force that moment to be explicit, before the authority transfer becomes irreversible.

Example constraint in plain language
  • Valid for: monitoring and investigation prompts
  • Not valid for: judging individuals or setting incentives
  • If we change that: add independent checks and name an owner
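The same constraint can be written down as something a pipeline could check, rather than a note in a wiki. The sketch below is again illustrative, with made-up names (CONSTRAINT, may_rely_on); it just encodes the three bullets above: a valid scope, an invalid scope, and the conditions under which escalation is allowed.

```python
# An illustrative encoding of the constraint above; CONSTRAINT and
# may_rely_on are made-up names, not any real tool's API.

CONSTRAINT = {
    "signal": "churn_score",
    "valid_for": {"monitoring", "investigation_prompts"},
    "not_valid_for": {"judging_individuals", "setting_incentives"},
}


def may_rely_on(purpose: str, *, independent_check: bool = False, owner: str | None = None) -> bool:
    """True only when reliance stays inside the declared boundary, or when
    the escalation conditions (an independent check and a named owner) are met."""
    if purpose in CONSTRAINT["valid_for"]:
        return True
    if purpose in CONSTRAINT["not_valid_for"]:
        return independent_check and owner is not None
    # Unlisted purposes default to "not yet decided": no silent escalation.
    return False


assert may_rely_on("monitoring")
assert not may_rely_on("setting_incentives")
assert may_rely_on("setting_incentives", independent_check=True, owner="someone accountable")
```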

That pause alone changes behavior. Not bureaucracy—just a visible boundary that prevents accidental certainty.

The most important property: it applies to itself

Here’s where most frameworks break. They quietly exempt themselves. They warn you about authority while becoming an authority.

A real valve has to govern trust in the valve. If the process becomes mandatory, ceremonial, or used as a stamp of legitimacy, it’s being misused—and trust in it should go down.

Any system that can’t tolerate reduced trust will eventually demand blind trust. That’s how dogma forms.

The takeaway

We don’t need perfect knowledge. We need better brakes.

This idea doesn’t promise better decisions. It doesn’t eliminate risk. It doesn’t offer certainty. It simply ensures we feel the moment when confidence starts to outrun what the system can actually support.

Want the structured version?

If you want the more structured version of this idea, the repo below explores it directly:

The risk isn’t that our models are wrong. It’s that we let them become authorities without ever deciding to.