Three hours into a conversation about a decision I was trying to make, the AI resurfaced an option we'd ruled out an hour earlier. Not as a reconsideration. As if the earlier discussion had never happened.

I scrolled back, tracing our steps, checking the moment we'd discarded that direction. We had covered it. The model had understood the reasoning. We had moved on.

When I tried to steer us back, I realised we couldn't return to the decision point we'd built together. The AI apologised, promised to correct course, then repeated the same mistake in a slightly different form. After three cycles it was clear: the model wasn't holding the reasoning. It couldn't reconstruct what we'd built.

How the thread unravels

As I started paying closer attention, the same patterns kept showing up.

The model subtly shifts focus, responding to a surface detail while losing the deeper frame. Earlier decisions dissolve. Not reconsidered, just absent. You end up re-establishing the same foundations while the AI behaves as if they're still there. And when you flag the problem, it apologises and promises to fix it, then fails in the same way because it can't reconstruct the earlier reasoning.

Individually, each is manageable. Together, they compound. And they all point to the same thing: the model isn't building a narrative. It's producing snapshots.

Snapshots vs narrative

Human thinking is narrative. We remember why we made decisions. We revisit earlier assumptions. We connect present conclusions with past reasoning. That continuity is the essence of thinking through something complex.

AI doesn't work that way. It generates a probabilistic response each time, fitted to the moment, not the story. Older reasoning, no matter how important, gets diluted as newer text pushes it further from the model's attention. That's why it can't go back when you point out the drift. It was never storing the chain the way you were.

I'd been sitting with this observation for weeks when Apple's research team published a study that landed on the same conclusion from a completely different direction. They found no evidence of genuine reasoning in large language models: what looks like reasoning behaves more like sophisticated pattern matching. When they added irrelevant but superficially plausible details to simple maths problems, performance collapsed. The underlying logic didn't change. Only the surface did. And that was enough to break it.

That matched exactly what I was seeing in extended conversations. The model wasn't losing the thread because of a bug. It was losing the thread because holding a thread was never what it was doing.

What I changed

When this drift happens now, I don't try to rescue the conversation. I open a text file and reconstruct the reasoning manually: what we established, what we discarded, why we moved in a certain direction. Then I open a new chat, paste it in, and start again.
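
The recap itself is less work than it sounds. Here is a rough sketch in Python of the structure I rebuild; the field names and the example decisions are placeholders of my own, not output from any tool:

decision_log = [
    {"point": "Option A: stay with the current vendor", "status": "discarded",
     "why": "costs scale badly past the first year"},
    {"point": "Option B: build in-house", "status": "kept",
     "why": "keeps the data model under our control"},
]

def build_recap(log):
    # Turn the log into a recap that gets pasted at the top of a fresh chat.
    lines = ["Context from a previous conversation:"]
    for entry in log:
        lines.append(f"- {entry['point']} ({entry['status']}): {entry['why']}")
    lines.append("Treat the discarded options as settled unless I reopen them.")
    return "\n".join(lines)

print(build_recap(decision_log))

The point isn't the script. It's that the new chat starts from the reasoning, not from the transcript.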

I also stopped expecting one long conversation to hold as a single thread. Around the one-to-two-hour mark, or after a few dozen exchanges, reliability starts to drop. Beyond that, I either start fresh with a deliberate recap or shift to cognitive triangulation: running multiple models in parallel, tracking their drift, and watching where they converge or diverge.
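
Stripped down, the triangulation step is just the same prompt sent to several models at once, with the comparison left to me. A rough sketch, where ask() is a stand-in for whatever client each model actually needs and the model names are only labels:

from concurrent.futures import ThreadPoolExecutor

def ask(model, prompt):
    # Stand-in: replace with the real client call for each model you use.
    return f"[{model} would answer here]"

def triangulate(prompt, models):
    # Send the same prompt to every model in parallel and collect the answers.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda m: ask(m, prompt), models))
    return dict(zip(models, answers))

results = triangulate("Given the recap above, which option should we pursue?",
                      ["model-a", "model-b", "model-c"])
for model, answer in results.items():
    # The convergence check stays manual: read the answers side by side
    # and note where they agree and where each one has drifted.
    print(f"--- {model} ---\n{answer}\n")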

None of this eliminates the mismatch. But it puts a structure around it. And understanding the limitation, rather than fighting it, has made the work more productive.