TAPS: Why Your Draft Model's Training Data Matters More Than Its Architecture

Speculative decoding is one of the most elegant tricks in LLM inference: a small, fast draft model (a lightweight language model that quickly proposes candidate tokens) drafts a run of tokens, and a large verifier (the full-size target model) checks all the candidates in a single forward pass, accepting those consistent with its own distribution and rejecting the rest. Same output distribution, fewer expensive forward passes. ...

March 28, 2026
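The accept/reject rule the teaser describes can be sketched in a few lines. This is a toy illustration of standard speculative sampling over a tiny vocabulary, not any particular library's API; `speculative_step`, `draft_probs`, and `target_probs` are made-up names for this example.

```python
import random

def speculative_step(draft_probs, target_probs, k, rng):
    """One speculative-decoding step over a toy vocabulary.

    draft_probs / target_probs: functions mapping a context (list of token
    ids) to a probability distribution (list of floats) over the vocabulary.
    The draft proposes k tokens; the target "verifies" them all at once.
    Accepted output matches the target's distribution exactly.
    """
    proposals = []
    for _ in range(k):
        probs = draft_probs(proposals)
        proposals.append(rng.choices(range(len(probs)), weights=probs)[0])

    accepted = []
    for i, token in enumerate(proposals):
        q = draft_probs(proposals[:i])[token]   # draft probability
        p = target_probs(proposals[:i])[token]  # target probability
        if rng.random() < min(1.0, p / q):
            accepted.append(token)              # accept with prob min(1, p/q)
        else:
            # On rejection: resample from the residual max(0, p - q),
            # renormalized, then stop this step.
            pd = target_probs(proposals[:i])
            qd = draft_probs(proposals[:i])
            residual = [max(0.0, a - b) for a, b in zip(pd, qd)]
            weights = residual if sum(residual) > 0 else pd
            accepted.append(rng.choices(range(len(pd)), weights=weights)[0])
            break
    return accepted
```

Note the key property: if the draft and target distributions agree, every proposal is accepted, which is why a well-matched draft model yields large speedups.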

Demystifying Video Reasoning: Models Don't Think in Frames - They Think in Denoising Steps

Video generation models like Sora can solve mazes, manipulate objects, and answer math questions - all by generating video. But how do they reason? The intuitive answer: step by step, frame by frame, like a person drawing a solution on a whiteboard. That answer is wrong. The paper “Demystifying Video Reasoning” shows that reasoning in video diffusion models doesn’t unfold across frames. It unfolds across denoising steps - the iterative process that turns noise into a coherent video. The authors call this Chain-of-Steps (CoS), and it fundamentally changes how we understand what these models are doing. ...

March 17, 2026

Seoul World Model: AI That Generates Video of Real Cities From Street Photos

What if you could fly a virtual camera through any street in a real city — not a game engine, not a pre-recorded video, but a freshly generated, photorealistic view based on actual street photos? That’s exactly what the Seoul World Model (SWM) does. The paper “Grounding World Simulation Models in a Real-World Metropolis” introduces a city-scale world model (a neural network that learns the dynamics and visual appearance of an environment, letting it ‘imagine’ views and trajectories it has never seen directly) that generates video grounded in real geography — not in imagined scenes. ...

March 16, 2026

Lost in Stories: How LLMs Lose the Thread in Long Narratives

Ask any language model to write a 10,000-word story. On page one, the hero has blue eyes. By page five — brown. In chapter three it’s Thursday; in chapter six, the same day is suddenly Saturday. A character who died on page seven is chatting away on page ten. Sound familiar? The paper “Lost in Stories: Consistency Bugs in Long Story Generation by LLMs” systematically investigates this problem for the first time — and the results are sobering. Even the best models produce an average of one consistency error per 10,000 words, and human experts catch only 17% of them. ...

March 9, 2026

Utonia: One Encoder For All Point Clouds

A LiDAR on a self-driving car, a depth camera in a home robot, a satellite scanner, and a CAD model from a 3D printer — each produces a point cloud (a set of 3D points representing the shape of an object or scene; each point can carry attributes such as color, normals, or intensity), but with radically different density, scale, and geometry. Until now, each domain required its own model. The paper “Utonia: Toward One Encoder for All Point Clouds” breaks this pattern — one encoder, 137M parameters, five domains, and emergent behaviors nobody expected. ...

March 7, 2026

SAGE: Your Reasoning Model Knows When to Stop Thinking — You Just Won't Let It

Reasoning models generate long chains of thought to arrive at answers. But what if over half of those “thoughts” are useless noise, and the model has known the answer for a while — it just doesn’t know it can stop? The paper “Does Your Reasoning Model Implicitly Know When to Stop Thinking?” discovers that this is exactly the case, and proposes SAGE — a method that cuts token usage by 40-50% while maintaining or improving accuracy. ...

February 23, 2026

When GPT Discovers Physics: A Breakthrough in Gluon Theory

What happens when you ask artificial intelligence to solve a problem that theoretical physicists have worked on for decades? In a new publication from a team at Princeton, Harvard, Cambridge, and OpenAI, GPT-5.2 Pro (the latest version of OpenAI’s language model, capable of advanced mathematical reasoning and of formulating scientific hypotheses) was the first to propose a key formula describing gluon scattering — a formula that was then proven by another internal OpenAI model and verified by scientists by hand. ...

February 15, 2026

OPUS: How to Train LLMs 6x Faster by Choosing the Right Data

Training large language models requires astronomical amounts of data and compute. But what if most of that data is redundant, providing no new information because the model already ‘knows’ the patterns it contains? The paper “OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration” introduces a framework that achieves comparable results with 6x fewer tokens (the basic units of text in LLMs; a token can be a word, part of a word, or a character) by intelligently selecting what the model should learn from at each step. ...

February 13, 2026
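The teaser doesn't state OPUS's actual selection criterion, but the general shape of per-iteration data selection can be illustrated with a common loss-based heuristic: score candidate examples with the current model and keep only the most informative fraction. This is a generic sketch under that assumption, not the paper's method; `select_batch` and `loss_fn` are illustrative names.

```python
def select_batch(examples, loss_fn, keep_fraction=0.25):
    """Generic per-iteration data selection (an assumed heuristic,
    not OPUS's criterion): score each candidate with the current
    model's loss and keep the highest-loss fraction, on the idea
    that low-loss examples are largely redundant for learning.
    """
    scored = sorted(examples, key=loss_fn, reverse=True)
    keep = max(1, int(len(scored) * keep_fraction))
    return scored[:keep]

# Usage: with a real model, loss_fn would run a forward pass per
# example; here a dummy scorer stands in.
batch = select_batch(["a", "bb", "ccc", "dddd"], loss_fn=len, keep_fraction=0.5)
```

The "6x fewer tokens" claim then corresponds to the training loop consuming only the selected subset each step rather than the full stream.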

Green-VLA: One AI Brain for All Robots

The quest for a universal robot — one that can seamlessly switch between tasks, platforms, and environments — has long been the holy grail of robotics research. The paper “Green-VLA: Staged Vision-Language-Action Model for Generalist Robots” brings us closer to that vision with a five-stage training framework that enables a single policy to control humanoids, mobile manipulators, and fixed-base robotic arms alike.

The Problem: One Robot, Many Bodies

Today’s robotic systems are typically specialists. A robotic arm in a factory excels at assembly but cannot navigate a warehouse. A mobile robot can move around but lacks fine manipulation skills. Training a separate AI for each type of robot is expensive, time-consuming, and fundamentally limits scalability. ...

February 8, 2026

Comp-LLM: When an Army of Experts Beats a Giant – An Analysis of a Revolution in AI Architecture

Have you ever wondered why the latest artificial intelligence models, like GPT-4 or Claude 3 Opus, are so enormous? We’re talking hundreds of billions or even trillions of parameters. These are digital monsters requiring massive amounts of energy and data-center-level infrastructure. For years, AI followed a simple rule: “Bigger means better.” Want a smarter model? Add more layers, more data, more GPUs. But — what if this is a dead end? ...

December 1, 2025