<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Scientific Research on MLLog.dev</title><link>https://mllog.dev/en/categories/scientific-research/</link><description>Recent content in Scientific Research on MLLog.dev</description><image><title>MLLog.dev</title><url>https://mllog.dev/images/default_mllog.png</url><link>https://mllog.dev/images/default_mllog.png</link></image><generator>Hugo -- 0.147.9</generator><language>en</language><lastBuildDate>Tue, 27 Jan 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://mllog.dev/en/categories/scientific-research/index.xml" rel="self" type="application/rss+xml"/><item><title>To Grok Grokking: Why Neural Networks Sometimes Understand Late</title><link>https://mllog.dev/en/posts/grokking-provable-ridge-regression/</link><pubDate>Tue, 27 Jan 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/grokking-provable-ridge-regression/</guid><description>&lt;p>In machine learning, we expect a model to either learn or overfit. What we don&amp;rsquo;t expect is for a model to overfit first and then — much later, with no changes — suddenly start generalizing well. This phenomenon is called &lt;strong>grokking&lt;/strong>, and it has puzzled researchers since its discovery. A new paper finally explains why it happens and proves it mathematically — in the simplest possible setting.&lt;/p>
&lt;h2 id="what-is-grokking">What is Grokking?&lt;/h2>
&lt;p>Grokking was first observed in 2022 on small algorithmic tasks (like modular arithmetic). The pattern is striking:&lt;/p></description></item><item><title>Cost-Constrained LLM Cascades — Meet C3PO</title><link>https://mllog.dev/en/posts/llm-cascades-cost-constrained-c3po/</link><pubDate>Fri, 14 Nov 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/llm-cascades-cost-constrained-c3po/</guid><description>&lt;p>Imagine you have an army of helpers — several different Large Language Models (LLMs), each capable of handling everything from simple queries to complex reasoning.&lt;br>
But each helper &lt;em>costs&lt;/em> something: time, compute, or actual money if you&amp;rsquo;re using an API.&lt;/p>
&lt;p>So the question is:&lt;br>
Can we orchestrate these models wisely — starting from the cheapest one that might do the job, escalating only when needed — &lt;strong>without exceeding a cost budget&lt;/strong>?&lt;/p></description></item></channel></rss>