To Grok Grokking: Why Neural Networks Sometimes Understand Late

In machine learning, we expect a model to either learn or overfit. What we don’t expect is for a model to overfit first and then — much later, with no changes — suddenly start generalizing well. This phenomenon is called grokking, and it has puzzled researchers since its discovery. A new paper finally explains why it happens and proves it mathematically — in the simplest possible setting.

What is Grokking?

Grokking was first observed in 2022 on small algorithmic tasks (like modular arithmetic). The pattern is striking: ...
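As a concrete anchor, this is a sketch of the kind of algorithmic dataset on which grokking was first reported: all pairs of residues under modular addition, with a random train/test split. The modulus and split fraction here are illustrative choices, not values from the paper.

```python
import itertools
import random

def modular_addition_dataset(p=97, train_frac=0.5, seed=0):
    """All pairs (a, b) labeled with (a + b) mod p, split into train/test.

    Grokking experiments train a small network on a fraction of the p*p
    pairs and watch held-out accuracy jump long after training accuracy
    has saturated at 100%.
    """
    pairs = [(a, b, (a + b) % p) for a, b in itertools.product(range(p), repeat=2)]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    cut = int(train_frac * len(pairs))
    return pairs[:cut], pairs[cut:]

train, test = modular_addition_dataset()
print(len(train), len(test))  # 4704 4705 for p=97
```

The task is tiny and fully enumerable, which is exactly why it became the standard setting for studying late generalization.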

January 27, 2026

Tensor Networks: A Mathematical Bridge Between Neural and Symbolic AI

Neural networks excel at learning patterns from data. Symbolic AI excels at logical reasoning and interpretability. For decades, researchers have tried to combine them — with limited success. A new paper proposes an elegant mathematical framework that unifies both approaches: tensor networks. The key insight? Both neural and symbolic computations can be expressed as tensor decompositions, and inference in both reduces to tensor contractions.

The Problem: Two Worlds That Don’t Talk

Modern AI is split into two camps: ...
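The unification hinges on tensor contraction being the shared primitive. A minimal sketch of that idea (my own illustration, not the paper's notation): a dense neural layer and a symbolic relational join are both einsum contractions.

```python
import numpy as np

# Neural side: a linear layer is a contraction, y_j = sum_i x_i W_ij.
x = np.array([1.0, 2.0])
W = np.array([[1.0, 0.0],
              [0.0, 3.0]])
y = np.einsum('i,ij->j', x, W)  # -> [1., 6.]

# Symbolic side: the same contraction over Boolean tensors is a join.
# R[a, b] means "a is a parent of b"; contracting R with itself over
# the shared index composes the relation into "grandparent of".
R = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=bool)
grandparent = np.einsum('ab,bc->ac', R, R).astype(bool)
```

The only difference between the two computations is the semiring the sums and products live in, which is a common way to frame this correspondence.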

January 23, 2026

M²FMoE: When Experts Learn to Predict Floods

Time series forecasting is one of the most important applications of machine learning — from demand prediction and infrastructure monitoring to flood forecasting. The problem? Standard models optimize for typical cases. Yet it’s precisely the atypical ones — extreme events — that are often most important to predict. M²FMoE is a model that learns to predict both.

The Problem: Extreme Events Break Standard Models

Time series forecasting has made remarkable progress. Transformers, frequency-domain methods, and hybrid architectures achieve impressive results on benchmarks. But there’s a catch. ...

January 14, 2026

BALLAST: When a Bandit Teaches Your Database How Long to Wait

Imagine you’re a team leader. You send a message and wait for a response. How long do you wait before assuming your colleague has “disappeared”? Too short — and you panic for no reason. Too long — and the whole project stalls. BALLAST is a system that teaches databases to answer this question automatically, using machine learning techniques.

The Problem: Raft’s Achilles Heel

Raft is a consensus protocol — the way distributed databases (like etcd, Consul, CockroachDB) agree on who’s the “leader” and which data is current. It works like this: ...
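To make the "how long to wait" trade-off concrete: a toy ε-greedy bandit choosing among a few candidate election timeouts. This is a generic illustration of bandit-tuned timeouts, not BALLAST's actual algorithm; the arm values and reward shape are assumptions.

```python
import random

class TimeoutBandit:
    """Toy epsilon-greedy bandit over candidate election timeouts (ms).

    A real reward signal would penalize both false leader-failure
    suspicions (timeout too short) and slow failover (timeout too long).
    """
    def __init__(self, arms=(150, 300, 600, 1200), eps=0.1, seed=0):
        self.arms = arms
        self.eps = eps
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}
        self.rng = random.Random(seed)

    def pick(self):
        # Explore a random timeout with probability eps, else exploit.
        if self.rng.random() < self.eps:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean of rewards observed for this timeout.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In use, the deployment would call `pick()` before each election-timeout window and `update()` with a reward once the outcome (spurious election or clean failover) is known.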

January 5, 2026

AI Co-Scientist: Teaching Models to Write Research Plans Better Than Humans

What if AI could not just answer questions, but actively plan scientific research? Not generating text — creating coherent, novel experiment plans that experts rate as better than human-written ones. Sounds like science fiction? Researchers from Meta AI and partners just achieved this.

The Problem: How Do You Grade Scientific Creativity?

Training models for “closed” tasks (math, coding) is relatively straightforward — the answer is correct or not. But how do you evaluate a research plan? ...

December 30, 2025

HyDRA: Teaching Your Phone to Understand Images Without Breaking the Bank

Imagine teaching your phone to recognize photos of dishes and suggest recipes. The catch? Models capable of this are massive and require the computational power of a Google data center. HyDRA is a clever method that adapts such models for mobile devices — without bankruptcy and without melting the planet.

The Problem: An Elephant in Your Phone

Vision Language Models (VLMs) are AI models that understand both images and text simultaneously. You can show them a photo and ask “what do you see?” or “how do I fix this?” Sounds great, but there’s a catch. ...

December 27, 2025