LLM-Collaboration

Multi-agent systems built from LLMs have a dirty secret: the agents talk to each other in text. That sounds natural (after all, text is what LLMs do), but it’s catastrophically wasteful. Every time Agent A finishes reasoning and passes its output to Agent B, the system decodes hidden states into tokens, ships those tokens over, and re-encodes them into hidden states on the other side. Information is lost in that round trip, because each high-dimensional hidden state collapses into a single token choice. Gradients stop at the text boundary, because picking discrete tokens is not differentiable. And you’re paying for a full vocabulary projection at every handoff.

The paper “Recursive Multi-Agent Systems” (Yang, Zou, Pan et al., UIUC/Stanford/NVIDIA/MIT, April 2026) asks: what if we just… didn’t do that? What if the agents shared their thoughts directly, in continuous latent space, and the entire system looped like a single recursive neural network?

The result is RecursiveMAS, a framework that adds only 0.31% trainable parameters (13.12M) while delivering +8.3% average accuracy, a 2.4x inference speedup, and a 75.6% token reduction. ...
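To make the cost of the text handoff described above concrete, here is a minimal toy sketch in plain Python. It is not the paper's architecture: the sizes `HIDDEN` and `VOCAB`, the random matrices, and the `adapter` link are all hypothetical stand-ins chosen for illustration. It only contrasts how many distinct messages survive a decode-then-re-embed handoff versus a direct latent handoff.

```python
import random

HIDDEN = 8   # hypothetical hidden-state size
VOCAB = 16   # hypothetical vocabulary size

rng = random.Random(0)  # fixed seed so the toy run is repeatable


def rand_matrix(rows, cols):
    """Random Gaussian matrix as a list of rows (stand-in for learned weights)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


unembed = rand_matrix(VOCAB, HIDDEN)   # Agent A: hidden state -> vocabulary logits
embed = rand_matrix(VOCAB, HIDDEN)     # Agent B: token id -> input embedding
adapter = rand_matrix(HIDDEN, HIDDEN)  # assumed small trainable latent link


def text_handoff(hidden):
    """Decode to one discrete token, then re-embed it for the next agent."""
    logits = matvec(unembed, hidden)  # full vocabulary projection
    token_id = max(range(VOCAB), key=logits.__getitem__)  # argmax: no gradient
    return token_id, embed[token_id]


def latent_handoff(hidden):
    """Hand the continuous hidden state over through the adapter."""
    return matvec(adapter, hidden)


# 200 different "thoughts" from Agent A.
states = [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN)] for _ in range(200)]

text_outputs = {text_handoff(h)[0] for h in states}
latent_outputs = {tuple(latent_handoff(h)) for h in states}

print(len(text_outputs), "distinct messages reach Agent B via text")
print(len(latent_outputs), "distinct messages reach Agent B via latent link")
```

With a 16-entry vocabulary, a single token can distinguish at most 16 messages (4 bits), so the 200 input states necessarily collapse onto a handful of token ids, while the latent path keeps them apart. Real vocabularies are far larger, but the same bottleneck applies at every decoded step, and the `argmax` (or any sampling step) is where gradient flow between agents is cut.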