Lost in Stories: How LLMs Lose the Thread in Long Narratives

Mon, 09 Mar 2026 00:00:00 +0000

Ask any language model to write a 10,000-word story. On page one, the hero has blue eyes. By page five, they are brown. In chapter three it’s Thursday; in chapter six, the same day is suddenly Saturday. A character who died on page seven is chatting away on page ten.

Sound familiar? The paper “Lost in Stories: Consistency Bugs in Long Story Generation by LLMs” systematically investigates this problem for the first time — and the results are sobering. Even the best models produce an average of one consistency error per 10,000 words, and human experts catch only 17% of them.

Evaluation on MLLog.dev
