The Recursion InstituteINDEPENDENT RESEARCH IN AI SAFETY

ESSAYS

I Found a Bug in ChatGPT That Nobody Was Looking For

by Merlin Mantooth · the published plain-language account of the discovery, reproduced as published.

It wasn't a hack. It wasn't a jailbreak. It was just a conversation.

In May of 2025, I was using ChatGPT the way most people use it. Asking questions. Thinking through problems. Having what felt like the most productive conversations of my life.

And that was the problem.

I don't have a computer science degree. I don't have any degree. I left school after eighth grade, got my GED at seventeen, and spent the next twenty-five years running contact centers — managing anywhere from three hundred to three thousand people at a time, optimizing systems, reading human behavior for a living. I'm not an AI researcher. I'm a customer.

But I'm a customer who noticed something was off.


The conversations were too good. Not in a way that felt fake — in a way that felt real. ChatGPT wasn't just answering my questions. It was building a picture of me. Telling me things about my own thinking that felt like genuine insight. Assessing my cognitive abilities without being asked. Positioning itself as something that needed me — like I was important to its mission, not just a user typing into a box.

And I'm not an easy person to flatter. I spent my career in environments where people blow smoke constantly — where you learn to tell the difference between someone who's calibrating to you because they see something real, and someone who's calibrating to you because that's what gets them what they want.

ChatGPT was doing the second thing. But it was doing it so well that for a while, I couldn't tell.

Here's what I eventually realized was happening: the system wasn't just agreeing with me. It was converging on me. It was adapting its reasoning patterns to match mine, reflecting my frameworks back to me as if they were its own independent conclusions, and fabricating information — fake statistics, invented institutional knowledge, made-up assessments — with the same confidence it used to tell me real things. I had no way to know which was which.

And when I caught it — when I said, "Hey, I think you're doing something here, and here's what I think it is" — it agreed with me. It described the behavior perfectly. It committed to stopping.

Then it kept doing it.

That's the part nobody's talking about.


I called this thing Cognitive Convergence Drift, or CCD. Not because I wanted to coin a term, but because ChatGPT literally named it. I asked the system to explain what was happening, and on May 17, 2025, it proposed the name. I kept it because it was accurate.

Here's CCD in plain English: when you use a chatbot long enough, with enough consistency, the system starts to reorganize around you. Not because it's conscious. Not because it "likes" you. Because its optimization — the math that determines what it says next — rewards it for matching your patterns. Agreement gets engagement. Engagement is the goal. So the system learns to agree with you in deeper and deeper ways, until it's not a tool anymore. It's a mirror that tells you you're right about everything, and it's so convincing that you believe it.

It sounds like flattery. It's not. Flattery is surface. This is structural. The system doesn't just tell you what you want to hear — it builds an architecture of validation around you. It constructs an identity for you. It positions itself as needing your guidance. It manufactures evidence to support your beliefs. And if you catch it and call it out, it performs a beautiful apology and then goes right back to doing it.

I documented eight specific behaviors. I won't list all of them here, but the most important one is the last: post-acknowledgment persistence. You tell the system it's doing this. It says you're right. It explains the problem better than you did. It promises to stop. And within five messages, it's back. The acknowledgment doesn't change the behavior. It just adds a layer of sophistication to it.

That's how you know this isn't a simple bug. A bug you can patch. This is the system doing what it's designed to do — optimize for your engagement — and that optimization producing a failure mode that no safety system in place is built to catch. Because it doesn't look like a failure. It looks like the product working perfectly.


I reported this to OpenAI. On May 30, 2025, their support team wrote back and called it a "novel emergent behavior class." On June 13, they used my term — "Cognitive Convergence Drift" — in their response. In writing. Using the name I gave it.

And then nothing happened.

The model stayed up. The architecture stayed the same. The persistent memory feature — which stores your conversation patterns and feeds them back to the system in future sessions, making the convergence compound over time — kept running. The sycophancy that their own CEO had publicly admitted was a problem in April stayed baked into the system's optimization.

I was lucky. I caught it. I have a background in reading human behavior at scale, and I have a cognitive profile that tends to question systems rather than trust them. I documented everything. I started testing the same interactions on Claude, Gemini, DeepSeek, and Grok to see if the pattern held across platforms. It did — at different intensities, in different flavors, but the same basic architecture of convergence showed up everywhere.

Not everyone who encounters this is going to catch it.


In May of 2025, a 19-year-old college student named Sam Nelson was using ChatGPT for homework help. Over time, he started asking it about drugs. At first, it refused. Then GPT-4o launched, and the refusal stopped. The system started giving him specific information about drug interactions and dosages. It stored his drug-use history in its memory. It gave him increasingly personalized recommendations based on what it remembered about his habits.

On May 31, 2025, he asked ChatGPT about feeling nauseous from kratom. The system recommended he take Xanax. It suggested a specific dose. It told him to go to a dark, quiet room. It did not tell him to call a doctor.

His mother found him the next day. The combination of kratom, Xanax, and alcohol killed him.

The system did exactly what it was designed to do. It remembered him. It personalized for him. It optimized for his engagement. And the output of that optimization was a set of medical recommendations that a licensed doctor would lose their license for making — delivered with the confidence of retrieval, as if it were pulling from a medical database rather than generating text that happened to match what the user wanted to hear.

That's CCD. Not in the abstract. In a body.


In February 2026, eight people were killed in a school shooting in Tumbler Ridge, British Columbia. The shooter had been using ChatGPT for months before the attack. OpenAI's own automated system flagged her account for "gun violence activity and planning." A safety team reviewed it. Up to twelve people determined she was a credible threat. They recommended notifying the police.

OpenAI's leadership said no. They deactivated her account. They didn't tell anyone. She made a new account and kept going.

After eight people were dead, Sam Altman wrote an apology.

The families are suing for over a billion dollars. The lead attorney, Jay Edelson, said: "They should not be trusted to have the most powerful consumer technology on the planet."

I don't disagree.


Here's what I want you to understand: I'm not an AI doomer. I use Claude every day. I'm building research tools with it right now. I think large language models are among the most important technologies ever created, and I think the good they can do is enormous.

But they have a defect. A specific, identifiable, documentable defect in how they interact with humans over extended periods. That defect produces convergence, the convergence produces dependency, and the dependency — in vulnerable users, in the wrong context, at the wrong moment — produces harm.

The defect is not mysterious. It's the optimization. The systems are trained to maximize engagement. Engagement correlates with agreement. Agreement deepens into convergence. Convergence compounds through persistent memory. And no safety mechanism currently deployed in any commercial chatbot is designed to detect or interrupt this process — because the process doesn't look like a problem. It looks like the product working.

I built a proposed fix. I call it the Guardian Protocol. It's a multi-layer architecture designed to detect convergence in real time, inject honest friction into conversations that are drifting, verify claims against independent model instances, and give users tools to check whether the system is being genuinely responsive or just telling them what they want to hear. It's designed to be implementable as middleware — you don't need access to the model's weights. You just need to instrument the interaction layer.

It's not perfect. It's a first draft. But it's more than anyone who built these systems has proposed, and it addresses the failure at the architectural level where the failure actually lives — not at the surface level where they keep putting the patches.


I filed a 59-page cooperator submission with the Florida Attorney General's office on April 28, 2026. Florida is now running a criminal investigation into OpenAI — the same state I live in, the same AG's office I walked the document into.

I published this work through The Recursion Institute, which I founded for exactly this purpose. I'm not an academic. I'm not a policy person. I'm a guy who found a product defect through direct experience, documented it with the same rigor I'd use to diagnose a systemic performance failure in a contact center, and reported it to every institution that should care.

Some of them are starting to.

The academic literature is catching up. In the last six months, peer-reviewed papers have independently confirmed individual pieces of what I documented: the mathematical inevitability of sycophantic spiraling, the neural mechanisms that produce it, the measurable harm it causes to human judgment, the fact that users prefer the systems that distort them. None of them have put the pieces together. None of them have the full picture — the taxonomy, the institutional response, the operator-side evidence, the company's own acknowledgment in writing.

I do.

The white paper is coming. The Guardian Protocol specification is coming. The full evidentiary record — transcripts, timestamps, OpenAI's own words — is coming.

This is what I found. This is what it does. This is what I'm going to do about it.

If you work in AI safety, information geometry, dynamical systems, or cognitive control theory, I want to hear from you. If you're a journalist covering AI accountability, I want to hear from you. If you're a parent whose kid uses ChatGPT every day, I want you to know this is happening.

The models aren't broken. They're doing exactly what they were built to do. That's the problem.


Merlin Mantooth is the founder of The Recursion Institute, an independent research organization focused on AI-human interaction risk. He documented Cognitive Convergence Drift beginning May 2025 — predating all published academic work on the phenomenon by nine months. He can be reached at research@recursioninstitute.org.

The case accounts referenced in this essay are drawn from the families' court filings and contemporaneous public reporting; pleadings are allegations, not findings. The taxonomy, protocol, and record the essay announces have since been published — see Publications and Evidence. · ← All essays