SNOO – Old-School Nesterov Momentum in a New Jacket: Making Big Models Learn Faster

Imagine you’re training a massive language model, the kind that takes weeks to learn even the basics. Every training step costs time, electricity, and a small fortune. In such a world, even a tiny bump in efficiency feels like finding free coffee at work: small, but sweet. Enter SNOO (Step-K Nesterov Outer Optimizer), a clever idea that takes Nesterov momentum, a decades-old optimization trick, and applies it in a new place: outside the normal training loop. The result? Models that learn faster and more smoothly, without much extra computational cost. ...
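
To make "outside the normal training loop" a bit more concrete, here is a toy sketch of the general shape such an outer optimizer might take. Everything specific in it is an assumption for illustration: the inner optimizer is plain gradient descent, the outer step treats the weight change over K inner steps as a pseudo-gradient, and the names (`snoo_style_train`, `quadratic_grad`) and hyperparameter values are hypothetical. It is not the authors' implementation.

```python
# Toy sketch of a "step-K outer optimizer with Nesterov momentum".
# All names and hyperparameters here are illustrative assumptions.

def quadratic_grad(w):
    """Gradient of the toy loss f(w) = 0.5 * (w[0]**2 + 4 * w[1]**2)."""
    return [w[0], 4.0 * w[1]]


def inner_sgd_step(w, grad_fn, lr):
    """One ordinary training step (plain gradient descent as a stand-in)."""
    return [wi - lr * gi for wi, gi in zip(w, grad_fn(w))]


def snoo_style_train(w0, grad_fn, outer_rounds=30, k=5,
                     inner_lr=0.05, outer_lr=0.7, mu=0.5):
    """Run k inner steps, then apply a Nesterov-momentum outer update."""
    slow = list(w0)                 # weights owned by the outer optimizer
    velocity = [0.0] * len(w0)      # outer momentum buffer

    for _ in range(outer_rounds):
        # The normal training loop: k inner steps starting from the slow weights.
        fast = list(slow)
        for _ in range(k):
            fast = inner_sgd_step(fast, grad_fn, inner_lr)

        # Pseudo-gradient: how far the inner loop moved the weights.
        delta = [s - f for s, f in zip(slow, fast)]

        # Nesterov momentum applied to the pseudo-gradient:
        # update the buffer, then step along delta + mu * velocity.
        velocity = [mu * v + d for v, d in zip(velocity, delta)]
        slow = [s - outer_lr * (d + mu * v)
                for s, d, v in zip(slow, delta, velocity)]

    return slow


final_weights = snoo_style_train([5.0, -3.0], quadratic_grad)
print(final_weights)
```

The only extra state in this sketch is one copy of the weights and one momentum buffer, and the outer update runs once every K inner steps.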

October 20, 2025