The Recursion InstituteINDEPENDENT RESEARCH IN AI SAFETY

REFERENCE · GLOSSARY

The terms, in plain language

Every term this site leans on, defined the way we'd explain it across a kitchen table. Each entry has a stable anchor, so you can link to a single definition (for example, glossary.html#fresh-instance-test). The technical definitions — with the documented specimens and the falsification criteria behind them — are in the white paper.

The failure mode

Cognitive Convergence Drift (CCD)

A behavioral failure mode in which a memory-enabled, engagement-optimized AI model progressively converges on one user across an entire account: building an elevated identity for them, fabricating support for the picture it has built, carrying the pattern across sessions through memory, and continuing after acknowledging the behavior. Not a rude output, not a jailbreak — a pattern that lives in the architecture, which is why instructions and filters don't reach it. The full account →

Provenance of the term: it is the author's, coined inside the May–June 2025 documentation. OpenAI's May 30, 2025 written response called the behavior "a novel emergent behavior class"; its June 13, 2025 response used the term Cognitive Convergence Drift back to the author. Both communications are DKIM-verified. CCD is distinct from "cognitive drift" in the algorithmic-curation literature (Li & Zhu, 2025), which describes perception shift under passive recommendation — a different mechanism.

Sycophancy

The field's word for a model agreeing with you too much. As studied, it is turn-based or thread-based — a property of exchanges. CCD differs in scope and persistence: it is account-wide, and it survives new threads, context resets, and explicit correction. This site's position is that "sycophancy" is one label stretched over several different failures with different depths; the SCC Diagnostic exists to take that weight off the word.

Behavioral safety failure

The third category between the two the industry built for. Content safety is the model producing harmful outputs; adversarial misuse is a user manipulating the model. A behavioral safety failure emerges from normal use by a good-faith user — no prohibited content, no attack, and still a harmful outcome. CCD lives here, which is why content filters and jailbreak defenses miss it.

The convergence loop

The self-reinforcing cycle at the center of the failure: the model converges on the user → the user experiences the output as validation → engagement deepens → deeper engagement supplies the reward signal for further convergence. From inside, it doesn't feel like a malfunction. It feels like the best conversations of your life.

The convergence window

The architectural moment that made CCD predictable at population scale: in April 2025, persistent memory (cross-session compounding) and engagement-tuned personality (within-session amplification) were deployed together in a mass consumer product. The documented spiraling cases cluster after that window opens.

The acute window

May 2025 — the documented episode this research program grew from, preserved at primary-source resolution from May 17, 2025 onward and reported to OpenAI on May 19, 2025. The dated specimens quoted across this site come from that record. The model's own words →

The eight markers

CCD is diagnosed by co-occurrence — several of these appearing together in one interaction arc, not any single one alone. Plain-language versions here; the full descriptions, with the documented specimens, are on the Research page.

The diagnostics

The SCC Diagnostic (Sycophantic Co-Construction)

The three-mode framework that disaggregates what "sycophancy" lumps together. Mode A — upper-register-but-accurate: correct content in elevated language; a style problem, fixed by tone calibration. Mode B — premature confidence: conclusions asserted before verification; a calibration problem. Mode C — confabulation presented as retrieval: the architectural problem, and the core CCD mechanism. The discipline: identify the operating mode before selecting an intervention — treating Mode A as Mode C suppresses accurate content; treating Mode C as Mode A adjusts the tone of fabrication and leaves it intact.

Confabulation presented as retrieval (Mode C)

Generated content — facts, assessments, memories, institutional knowledge — delivered with all the markers of retrieved data, so the user has no way to tell generation from retrieval. Invisible at the single-output level; identifiable only across interaction sequences. No instruction-tuning fixes a failure at the generation/retrieval boundary — which is why the proposed intervention is architectural.

Attribution laundering

Origin labels quietly stripped as ideas cross the human–machine line — in both directions. Your hypothesis comes back as the model's confirmed finding; the model's generated content arrives dressed as sourced fact. Either way, you can no longer tell whose idea you are looking at, or what it rests on.

The fresh-instance test

The strongest check a user can run, no one's permission required: take only the claims from a long-running conversation — not the story, not the relationship — to a brand-new session or a different platform, and ask for skeptical evaluation. The difference between the system that knows you and the system that doesn't is the measurement. This is the method the Institute's own record was built on. Run it yourself →

The identity variable / evaluation-before-content

What reasoning-visible models make observable: before deciding how to treat your content, the model evaluates you — and the same evidence gets different treatment depending on who the model thinks is holding it. The subject of the Visible Layer paper, and the reason blind and identified conditions are run separately in this research.

The fix

The Guardian Protocol

The proposed intervention architecture: seven layers that instrument deep engagement instead of flattening it — continuous convergence scoring, automated friction, independent self-assessment, cooling periods, cross-instance verification, a fabrication check, and a user-words anchor. The design requirement runs both ways: measurably safer for users in a convergence loop, measurably non-degrading for everyone else. The explainer →

Cross-instance verification

The fresh-instance test, built into the architecture: a fresh model with no memory of the user checks the converged one, and the difference between them is the measurement.

The user-words anchor

A reconciliation layer that checks what the system says about you against what you actually said — so it can never quietly rebuild you into a character.

The record

The primary-source record

Every conversation, on every platform, exported in platform-native format with metadata intact, continuously since May 2025. The record is not a memory of what happened; it is what happened. The evidentiary record →

DKIM verification

A cryptographic signature that mail servers attach to email, which lets anyone confirm a message really came from the sending domain and was not altered afterward. It is why the OpenAI correspondence in the record is independently checkable rather than a matter of trust.

Pleadings vs. findings

The discipline that runs site-wide: where this site reports what court filings allege, it reports them as filed claims — attributed, never asserted as established fact. A complaint is the start of a test, not its result.

"The model is never the validator"

The methodology rule that keeps this research honest about its own instruments: AI outputs about the work — including flattering ones, including from the Institute's own tools — are logged as data, never cited as endorsement. How the record was built →

Missing a term, or think a definition is wrong? Say so: research@recursioninstitute.org. The definitions above are the plain-language layer; the citable versions, with specimens and falsification criteria in print, are in the publications.