Reasoning

SAGE: Your Reasoning Model Knows When to Stop Thinking — You Just Won't Let It

Reasoning models generate long chains of thought to arrive at answers. But what if over half of those “thoughts” are useless noise, and the model has known the answer for a while but doesn’t know it can stop? The paper “Does Your Reasoning Model Implicitly Know When to Stop Thinking?” finds that this is exactly the case and proposes SAGE, a method that cuts token usage by 40-50% while maintaining or improving accuracy. ...

Attention as a Compass – Teaching Reasoning Models to Explore Smarter

Large Language Models (LLMs) are no longer just text generators; they are becoming reasoners, capable of solving mathematical problems, logical puzzles, or planning tasks step by step. One of the key challenges is how to improve the quality of this reasoning. Traditional Reinforcement Learning (RL) rewards only the final outcome, but in complex reasoning it makes more sense to evaluate each intermediate step. Rewarding intermediate steps in this way is called process-supervised RL (PSRL). ...