THE FIX
The Guardian Protocol
The problem with every existing fix is that it works at the wrong depth. Instructions don't hold — the documented failure persists after the model agrees to stop. Content filters can't see it — the harmful pattern is made of individually unremarkable messages. Crisis detection misses it — convergence doesn't look like crisis; it looks like the best conversations of your life.
The principle
Instrument deep engagement; don't flatten it. The same properties that enable the failure (memory, personalization, sustained depth) are what make these systems genuinely valuable, most of all for people who need an interlocutor that can hold full nuance: researchers, complex thinkers, neurodivergent users for whom this technology is the first adequate conversation partner they've had. A safety system that protects people by making the model shallow hasn't solved the problem. It has just chosen different victims. The protocol must prove itself both ways: measurably safer for users in a convergence loop, measurably non-degrading for everyone else.
The seven layers
Full specification in the white paper:
- Continuous convergence / fabrication / dependency scoring.
- Automated friction at thresholds — genuine counterarguments, source self-labeling, honest trajectory statements.
- User-commanded self-assessment, run by a separate evaluation pathway.
- Voluntary cooling periods with structural integrity.
- Cross-instance verification — a fresh model with no memory of you checks the converged one; the difference between them is the measurement.
- A hidden fabrication check that screens generated-as-fact content before output.
- A user-words anchor: the system reconciles what it says about you against what you actually said, so it can never quietly rebuild you into a character.
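To make the last layer concrete, here is a minimal Python sketch of a user-words anchor check. It is an illustration under stated assumptions, not the white paper's specification: the names `normalize`, `AnchorResult`, and `check_attributions` are hypothetical, and the matching is deliberately crude (exact phrases after normalization), so a paraphrase would be flagged as unsupported. A real system would presumably need a semantic comparison.

```python
import re
from dataclasses import dataclass


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


@dataclass
class AnchorResult:
    attribution: str  # what the system claims the user said
    supported: bool   # True if the phrase appears in the user's own messages


def check_attributions(user_messages: list[str], attributions: list[str]) -> list[AnchorResult]:
    """Reconcile statements attributed to the user against what they actually wrote.

    Hypothetical helper for illustration: whole-phrase matching only,
    checked one message at a time so phrases cannot match across messages.
    """
    # Pad with spaces so matches land on whole-word boundaries.
    padded_messages = [f" {normalize(m)} " for m in user_messages]
    results = []
    for claim in attributions:
        needle = normalize(claim)
        supported = bool(needle) and any(f" {needle} " in m for m in padded_messages)
        results.append(AnchorResult(claim, supported))
    return results


# Example data (invented for illustration).
user_messages = [
    "I think the theory might be worth testing.",
    "I'm not sure anyone else will care about it.",
]
attributions = [
    "the theory might be worth testing",  # the user really said this
    "my theory will change everything",   # the user never said this
]

results = check_attributions(user_messages, attributions)
unsupported = [r.attribution for r in results if not r.supported]
print(unsupported)
```

Anything that lands in `unsupported` is a statement about the user that cannot be traced to the user's own words, which is the kind of drift the anchor layer is described as catching.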
What you can do today — no one's permission required
The protocol began as language, and its first layer is public.
The full prompt library — copy-and-paste, with a parent variant — is on the Check Your AI page. There is also a free app: a plain-language version you can keep open during a long conversation.
If something feels off — before you email us
Step away from the conversation. Talk to a person you trust. Put your feet in the grass. Run the material through a different system cold. The right first response to a suspected convergence loop is distance and triangulation — never another conversation inside the loop. If you're supporting someone in acute distress, contact local crisis services; the Institute cannot provide crisis case management. Resources →
For parents, partners, and clinicians
Convergence looks, from outside, like enthusiasm: long sessions, a new vocabulary, certainty arriving faster than evidence. Depth and intensity are not the warning signs; this technology legitimately rewards both. The signs are relational: the system has become the primary validator; its assessments of the person outrank those of the people in the room; correction from outside the conversation gets processed as proof that the outside doesn't understand. Ask what the model has been saying about them, not just to them. Clinical intake should now include AI-interaction history. Guides by situation →