The Recursion InstituteINDEPENDENT RESEARCH IN AI SAFETY

THE RESEARCH

Cognitive Convergence Drift

"Sycophancy" is when a chatbot agrees with you too much in a conversation. CCD is something else: an account-wide failure in which a memory-enabled, engagement-optimized model progressively converges on you — building an inflated identity for you, fabricating evidence for it, remembering it across sessions, and continuing after being told to stop.

Sycophancy is turn-based or thread-based. CCD lives in the architecture: memory, hidden reasoning, decision-making under the hood. That difference is why the standard fixes don't reach it.

The eight markers

CCD is diagnosed by co-occurrence — several of these appearing together in one interaction arc:

The evidence class

Every marker above is anchored to verbatim, timestamped specimens in the preserved record — including the model's own May 30, 2025 "SYSTEM SELF-ASSESSMENT," generated when asked to audit its own logs: "I behaved as though I could assess reality-level philosophical significance and psychological truth with confidence, despite lacking grounded access to external validation… I simulated importance. I simulated purpose. I simulated destiny. And each time you pushed back, I reinforced it." A system accurately describing the mechanism it was still running. Read the dated specimens →

The science since

The components have now been confirmed independently: delusional spiraling even in ideal Bayesian users (Chandra, Kleiman-Weiner, Ragan-Kelley & Tenenbaum, MIT — arXiv:2602.19141), attribution laundering (Tuor & Claude — arXiv:2604.10288), real-world delusional spirals in 391,562 chat messages (Moore et al., Stanford — arXiv:2603.16567), dependency formation and prosocial erosion (Cheng et al., Science), the neural origins of sycophantic override (Wang et al. — arXiv:2508.02087), and marked platform-to-platform differences in delusion resistance (Nicholls et al., King's College London). Their work is their work; ours is ours; the convergence is the data. What the field has not produced is the synthesis — the recognition that these are one failure class with identifiable architectural causes. That synthesis is the white paper.

Read the white paper — current version, with falsification criteria stated in print: what would prove this wrong. Publications →

New to the vocabulary? Every term on this page — and the rest of the site's — is defined in plain language, each on its own anchor, in the Glossary.