MLLog.dev

BALLAST: When a Bandit Teaches Your Database How Long to Wait

Imagine you’re a team leader. You send a message and wait for a response. How long do you wait before assuming your colleague has “disappeared”? Too short — and you panic for no reason. Too long — and the whole project stalls. BALLAST is a system that teaches databases to answer this question automatically, using machine learning techniques. The Problem: Raft’s Achilles Heel Raft is a consensus protocol — the way distributed databases (like etcd, Consul, CockroachDB) agree on who’s the “leader” and which data is current. It works like this: ...

AI Co-Scientist: Teaching Models to Write Research Plans Better Than Humans

What if AI could not just answer questions, but actively plan scientific research? Not generating text — creating coherent, novel experiment plans that experts rate as better than human-written ones. Sounds like science fiction? Researchers from Meta AI and partners just achieved this. The Problem: How Do You Grade Scientific Creativity? Training models for “closed” tasks (math, coding) is relatively straightforward — the answer is correct or not. But how do you evaluate a research plan? ...

HyDRA: Teaching Your Phone to Understand Images Without Breaking the Bank

Imagine teaching your phone to recognize photos of dishes and suggest recipes. The catch? Models capable of this are massive and require the computational power of a Google data center. HyDRA is a clever method that adapts such models for mobile devices — without bankruptcy and without melting the planet. The Problem: An Elephant in Your Phone Vision Language Models (VLMs) are AI models that understand both images and text simultaneously. You can show them a photo and ask “what do you see?” or “how do I fix this?”. Sounds great, but there’s a catch. ...

Comp-LLM: When an Army of Experts Beats a Giant – An Analysis of a Revolution in AI Architecture

Have you ever wondered why the latest artificial intelligence models, like GPT-4 or Claude 3 Opus, are so enormous? We’re talking hundreds of billions or even trillions of parameters. These are digital monsters requiring massive amounts of energy and data-center-level infrastructure. For years, AI followed a simple rule: “Bigger means better.” Want a smarter model? Add more layers, more data, more GPUs. But — what if this is a dead end? ...

NVIDIA Nemotron Parse v1.1: The Complete Anatomy of the Digital Document Understanding Revolution

Have you ever wondered why, in an age where Artificial Intelligence can generate images from scratch and write poetry, we still struggle with a task as trivial as copying a table from a PDF file to Excel? This is the paradox of today’s technology: we have sent rovers to Mars, but a supplier’s invoice in PDF format is still a “black box” for our computers. For decades, we lived in an era that could be called the “digital dark ages” of document processing. Our tools – classic OCR (Optical Character Recognition) engines – were like medieval scribes: capable of transcribing letters, but understanding not a word of what they wrote, and certainly not grasping what a table, chart, or complex mathematical formula was. ...

Cost-Constrained LLM Cascades — Meet C3PO

Imagine you have an army of helpers — several different Large Language Models (LLMs), each capable of handling tasks from simple queries to complex reasoning. But each helper costs something: time, compute, or actual money if you’re using an API. So the question is: Can we orchestrate these models wisely — starting from the cheapest one that might do the job, escalating only when needed — without exceeding a cost budget? ...

Accurate Satellite Rain Forecasting with Physics-Conditioned Neural Networks

Imagine this: you’re driving, clouds are gathering, and your weather app says “heavy rain in 15 minutes” — but there are no local radars, and it gets it wrong. Sounds familiar? That’s exactly the kind of problem tackled by the new research paper Precipitation nowcasting of satellite data using physically conditioned neural networks (by Antônio Catão et al.). The authors present a model that can forecast precipitation using only satellite data, powered by a neural network that’s conditioned by physics. In short: less “black box” magic, more scientific reasoning — and better forecasts where radar coverage is weak or nonexistent. ...

A Universal Crime Predictor – How Hypernetworks and Knowledge Graphs Are Transforming Forecasting

Imagine this: you’re in a new city that’s just starting to collect crime data – but the types of crimes differ completely from those in your city. Is it possible to train one model that works across both cities? That’s the question tackled by the recent paper 📄 Learning A Universal Crime Predictor with Knowledge-guided Hypernetworks by Fidan Karimova et al., which introduces a framework called HYSTL (HYpernetwork-enhanced Spatial Temporal Learning). ...

SNOO – Old-School Nesterov Momentum in a New Jacket: Making Big Models Learn Faster

Imagine you’re training a massive language model — the kind that takes weeks to learn even the basics. Every training step costs time, electricity, and a small fortune. In such a world, even a tiny bump in efficiency feels like finding a way to get free coffee at work — small, but sweet. Enter SNOO – Step-K Nesterov Outer Optimizer, a clever idea that takes Nesterov momentum, a decades-old optimization trick, and applies it in a new place — outside the normal training loop. The result? Models that learn faster and more smoothly, without much extra computational cost. ...

“Who Said Neural Networks Aren’t Linear?” — explained like over coffee

Alright, let’s start simple. Everyone who’s dabbled a bit in machine learning knows one thing: neural networks are nonlinear. That’s what makes them powerful — they can model weird, curvy, complex relationships, not just straight lines. But the authors of the paper “Who Said Neural Networks Aren’t Linear?” (Nimrod Berman, Assaf Hallak, Assaf Shocher) asked a cheeky question: what if that’s not entirely true? What if nonlinearity is just… a matter of perspective? ...