AI | MLLog.dev

RecursiveMAS: What If Your Multi-Agent System Was Just One Big Recursive Neural Network?

Multi-agent systems built from LLMs have a dirty secret: the agents talk to each other in text. That sounds natural - after all, text is what LLMs do - but it’s catastrophically wasteful. Every time Agent A finishes reasoning and passes its output to Agent B, the system decodes hidden states into tokens, ships those tokens over, and re-encodes them back into hidden states. Information gets destroyed. Gradients die at the text boundary. And you’re paying for a full vocabulary projection at every handoff. The paper “Recursive Multi-Agent Systems” (Yang, Zou, Pan et al., UIUC/Stanford/NVIDIA/MIT, April 2026) asks: what if we just… didn’t do that? What if the agents shared their thoughts directly, in continuous latent space, and the entire system looped like a single recursive neural network? The result is RecursiveMAS - a framework that adds only 0.31% trainable parameters (13.12M) while delivering +8.3% average accuracy, 2.4x inference speedup, and 75.6% token reduction. ...

Tstars-Tryon 1.0: Virtual Try-On as Multi-Image Editing at Taobao Scale

A user opens the Taobao app, picks a model photo, and drops in six reference images: a coat, an inner shirt, pants, shoes, a hat and a bag. They tap a button. Less than seven seconds later, a fresh photo appears — same face, same background, every garment placed correctly with the coat unzipped, revealing the inner shirt. Multiply this by tens of millions of requests per service window, and you get a sense of what Tstars-Tryon 1.0 is solving. This is not the lab-clean VITON-HD setting where one t-shirt gets pasted onto a fashion model in a studio. This is virtual try-on at e-commerce scale, on real-world photos, with stacked outfits and accessories — and it is running today. ...

TAPS: Why Your Draft Model's Training Data Matters More Than Its Architecture

Speculative decoding is one of the most elegant tricks in LLM inference: a small, fast draft model proposes tokens, and a large verifier approves or rejects them in parallel. Same output distribution, fewer expensive forward passes. ...

Demystifying Video Reasoning: Models Don't Think in Frames - They Think in Denoising Steps

Video generation models like Sora can solve mazes, manipulate objects, and answer math questions - all by generating video. But how do they reason? The intuitive answer: step by step, frame by frame, like a person drawing a solution on a whiteboard. That answer is wrong. The paper “Demystifying Video Reasoning” shows that reasoning in video diffusion models doesn’t unfold across frames. It unfolds across denoising steps - the iterative process that turns noise into a coherent video. The authors call this Chain-of-Steps (CoS), and it fundamentally changes how we understand what these models are doing. ...

Seoul World Model: AI That Generates Video of Real Cities From Street Photos

What if you could fly a virtual camera through any street in a real city - not a game engine, not a pre-recorded video, but a freshly generated, photorealistic view based on actual street photos? That’s exactly what the Seoul World Model (SWM) does. The paper “Grounding World Simulation Models in a Real-World Metropolis” introduces a city-scale world model that generates video grounded in real geography, not in imagined scenes. ...

Lost in Stories: How LLMs Lose the Thread in Long Narratives

Ask any language model to write a 10,000-word story. On page one, the hero has blue eyes. By page five — brown. In chapter three it’s Thursday; in chapter six, the same day is suddenly Saturday. A character who died on page seven is chatting away on page ten. Sound familiar? The paper “Lost in Stories: Consistency Bugs in Long Story Generation by LLMs” systematically investigates this problem for the first time — and the results are sobering. Even the best models produce an average of one consistency error per 10,000 words, and human experts catch only 17% of them. ...

Utonia: One Encoder For All Point Clouds

A LiDAR on a self-driving car, a depth camera in a home robot, a satellite scanner, and a CAD model from a 3D printer - each produces a point cloud, but with radically different density, scale, and geometry. Until now, each domain required its own model. The paper “Utonia: Toward One Encoder for All Point Clouds” breaks this pattern: one encoder, 137M parameters, five domains, and emergent behaviors nobody expected. ...

SAGE: Your Reasoning Model Knows When to Stop Thinking — You Just Won't Let It

Reasoning models generate long chains of thought to arrive at answers. But what if over half of those “thoughts” are useless noise, and the model has known the answer for a while — it just doesn’t know it can stop? The paper “Does Your Reasoning Model Implicitly Know When to Stop Thinking?” discovers that this is exactly the case, and proposes SAGE — a method that cuts token usage by 40-50% while maintaining or improving accuracy. ...

When GPT Discovers Physics: A Breakthrough in Gluon Theory

What happens when you ask artificial intelligence to solve a problem that theoretical physicists have worked on for decades? In a new publication from a team at Princeton, Harvard, Cambridge, and OpenAI, GPT-5.2 Pro was the first to propose a key formula describing gluon scattering - a formula that was then proven by another internal OpenAI model and verified by scientists by hand. ...

OPUS: How to Train LLMs 6x Faster by Choosing the Right Data

Training large language models requires astronomical amounts of data and compute. But what if most of that data is redundant? The paper “OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration” introduces a framework that achieves comparable results with 6x fewer tokens by intelligently selecting what the model should learn from at each step. ...