Robotics

Orca: What If Next-Token, Next-Frame, and Next-Action Are the Same Task?

Imagine inheriting a system built by three teams that never talked to each other. Three microservices, all describing the same warehouse: one answers questions about it in text, one renders preview images of the shelves, one drives the forklift. Each has its own database, its own schema, and none of them agree on what’s actually sitting on shelf B4. Any architect who has seen this knows the fix — one canonical store, thin adapters at the edges. ...

MolmoAct2: The First Fully Open Robot Controller That Beats Closed-Source Giants

A robot that can fold laundry, pack medication, and pour tea - controlled by a single model - sounds like science fiction. But it’s exactly what’s needed for real deployment. The problem? The best robot controllers are either closed-source (π0.5), too slow (reasoning models that generate hundreds of tokens before moving), or tied to hardware most labs can’t afford. MolmoAct2 (Fang, Duan et al., Allen AI / UW / Stanford / NVIDIA / MIT, May 2026) tackles all of these problems at once: it’s fully open (weights, code, data), runs at 55.79 Hz, deploys on platforms costing under $6,000, and achieves 97.2% success on LIBERO - beating every open and closed baseline. The secret? Let the robot’s action generator peek into the language model’s brain at every layer, not just the final output. ...

Utonia: One Encoder For All Point Clouds

A LiDAR on a self-driving car, a depth camera in a home robot, a satellite scanner, and a CAD model from a 3D printer: each produces a point cloud (a set of 3D points with x, y, z coordinates representing the shape of an object or scene, where each point can carry additional attributes such as color, normal, and intensity), but with radically different density, scale, and geometry. Until now, each domain required its own model. The paper “Utonia: Toward One Encoder for All Point Clouds” breaks this pattern: one encoder, 137M parameters, five domains, and emergent behaviors nobody expected. ...

Green-VLA: One AI Brain for All Robots

The quest for a universal robot, one that can seamlessly switch between tasks, platforms, and environments, has long been the holy grail of robotics research. The paper “Green-VLA: Staged Vision-Language-Action Model for Generalist Robots” brings us closer to that vision with a revolutionary five-stage training framework that enables a single policy to control humanoids, mobile manipulators, and fixed-base robotic arms alike.

The Problem: One Robot, Many Bodies

Today’s robotic systems are typically specialists. A robotic arm in a factory excels at assembly but cannot navigate a warehouse. A mobile robot can move around but lacks fine manipulation skills. Training a separate AI for each type of robot is expensive, time-consuming, and fundamentally limits scalability. ...

ASkDAgger: How Artificial Intelligence Learns More Effectively by Asking Questions

In a world where robots and AI systems increasingly learn through observation and interaction with humans, the efficiency of this process remains a key challenge. Traditional Imitation Learning methods often require a human teacher to constantly supervise and correct errors, which is time-consuming and costly. A team of researchers led by Jelle Luijkx proposes a groundbreaking solution in their latest paper, “ASkDAgger: Active Skill-level Data Aggregation for Interactive Imitation Learning.” ...