<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Posts on MLLog.dev</title><link>https://mllog.dev/en/posts/</link><description>Recent content in Posts on MLLog.dev</description><image><title>MLLog.dev</title><url>https://mllog.dev/images/default_mllog.png</url><link>https://mllog.dev/images/default_mllog.png</link></image><generator>Hugo -- 0.147.9</generator><language>en</language><lastBuildDate>Tue, 09 Jun 2026 08:00:00 +0100</lastBuildDate><atom:link href="https://mllog.dev/en/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>SWE-Explore: The Benchmark That Finally Asks — Did Your Coding Agent Read the Right Code?</title><link>https://mllog.dev/en/posts/swe-explore-benchmarking-coding-agents-repository-exploration/</link><pubDate>Tue, 09 Jun 2026 08:00:00 +0100</pubDate><guid>https://mllog.dev/en/posts/swe-explore-benchmarking-coding-agents-repository-exploration/</guid><description>SWE-Explore isolates repository exploration from patch generation, revealing that coding agents find the right files ~65% of the time but recall only ~15-19% of the lines that actually matter — and that context efficiency predicts downstream resolve rate with Pearson r = 0.950.</description></item><item><title>SkillOpt: Training Agent Skills Like Neural Network Weights - Without Touching the Model</title><link>https://mllog.dev/en/posts/skillopt-text-space-optimizer-agent-skills/</link><pubDate>Fri, 29 May 2026 08:00:00 +0100</pubDate><guid>https://mllog.dev/en/posts/skillopt-text-space-optimizer-agent-skills/</guid><description>SkillOpt applies deep-learning-style optimization - bounded edit budgets, validation gating, rejected-edit memory - to natural-language skill documents, improving frozen LLMs by up to +39 points across 52/52 evaluated cells without changing a single model 
weight.</description></item><item><title>MolmoAct2: The First Fully Open Robot Controller That Beats Closed-Source Giants</title><link>https://mllog.dev/en/posts/molmoact2-action-reasoning-real-world-deployment/</link><pubDate>Sun, 10 May 2026 12:00:00 +0100</pubDate><guid>https://mllog.dev/en/posts/molmoact2-action-reasoning-real-world-deployment/</guid><description>MolmoAct2 is a fully open vision-language-action model that outperforms π0.5 and matches Gemini Robotics ER, achieving 97.2% on LIBERO and 87.1% real-world success via per-layer KV-cache conditioning and adaptive depth reasoning.</description></item><item><title>RecursiveMAS: What If Your Multi-Agent System Was Just One Big Recursive Neural Network?</title><link>https://mllog.dev/en/posts/recursive-multi-agent-systems/</link><pubDate>Sat, 02 May 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/recursive-multi-agent-systems/</guid><description>RecursiveMAS treats an entire multi-agent system as a single recursive computation in latent space, adding only 0.31% trainable parameters while achieving +8.3% accuracy, 2.4x speedup, and 75.6% token reduction over text-based multi-agent baselines.</description></item><item><title>Tstars-Tryon 1.0: Virtual Try-On as Multi-Image Editing at Taobao Scale</title><link>https://mllog.dev/en/posts/2026-04-26-tstars-tryon-virtual-try-on-mmdit/</link><pubDate>Sun, 26 Apr 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/2026-04-26-tstars-tryon-virtual-try-on-mmdit/</guid><description>How a unified 5B MMDiT trained with multi-reward RL and step distillation reframes virtual try-on as multi-image editing — and runs in under 4 seconds in production.</description></item><item><title>ClawGUI: A Full-Stack Open-Source Pipeline for GUI Agents</title><link>https://mllog.dev/en/posts/2026-04-15-clawgui-unified-framework-gui-agents/</link><pubDate>Wed, 15 Apr 2026 10:00:00 
+0100</pubDate><guid>https://mllog.dev/en/posts/2026-04-15-clawgui-unified-framework-gui-agents/</guid><description>ClawGUI unifies online RL training, reproducible evaluation, and real-device deployment of GUI agents into one open-source pipeline — and shows a 2B model trained inside it can beat 72B untrained baselines on MobileWorld.</description></item><item><title>SkillClaw: Making LLM Agent Skills Evolve Collectively</title><link>https://mllog.dev/en/posts/2026-04-12-skillclaw-collective-skill-evolution-llm-agents/</link><pubDate>Sun, 12 Apr 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/2026-04-12-skillclaw-collective-skill-evolution-llm-agents/</guid><description>SkillClaw is a framework for collective skill evolution in multi-user LLM agent ecosystems. Instead of static skill libraries, the system automatically learns from interactions across users and propagates improvements to everyone.</description></item><item><title>TAPS: Why Your Draft Model's Training Data Matters More Than Its Architecture</title><link>https://mllog.dev/en/posts/taps-task-aware-speculative-decoding/</link><pubDate>Sat, 28 Mar 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/taps-task-aware-speculative-decoding/</guid><description>&lt;p>Speculative decoding is one of the most elegant tricks in LLM inference: a small, fast &lt;span class="glossary-term" tabindex="0">
&lt;span class="glossary-word">draft model&lt;/span>
&lt;span class="glossary-tooltip">
&lt;strong>draft model&lt;/strong>
&lt;span class="glossary-def">A lightweight language model that quickly proposes candidate tokens. A larger &amp;lsquo;verifier&amp;rsquo; model then checks these proposals in parallel, accepting tokens consistent with its own distribution and rejecting the rest, which speeds up generation without changing the output distribution.&lt;/span>
&lt;/span>
&lt;/span>
proposes tokens, and a large &lt;span class="glossary-term" tabindex="0">
&lt;span class="glossary-word">verifier&lt;/span>
&lt;span class="glossary-tooltip">
&lt;strong>verifier&lt;/strong>
&lt;span class="glossary-def">The full-size target language model that checks draft proposals. It scores all candidates in one forward pass and accepts those consistent with its own distribution, so the output distribution is identical to that of standard autoregressive decoding.&lt;/span>
&lt;/span>
&lt;/span>
approves or rejects them in parallel. Same output distribution, fewer expensive forward passes.&lt;/p></description></item><item><title>Demystifying Video Reasoning: Models Don't Think in Frames - They Think in Denoising Steps</title><link>https://mllog.dev/en/posts/demystifying-video-reasoning-chain-of-steps/</link><pubDate>Tue, 17 Mar 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/demystifying-video-reasoning-chain-of-steps/</guid><description>&lt;p>Video generation models like Sora can solve mazes, manipulate objects, and answer math questions - all by generating video. But &lt;strong>how&lt;/strong> do they reason? The intuitive answer: step by step, frame by frame, like a person drawing a solution on a whiteboard.&lt;/p>
&lt;p>That answer is wrong.&lt;/p>
&lt;p>The paper &lt;strong>&amp;ldquo;Demystifying Video Reasoning&amp;rdquo;&lt;/strong> shows that reasoning in video diffusion models doesn&amp;rsquo;t unfold across frames. It unfolds &lt;strong>across denoising steps&lt;/strong> - the iterative process that turns noise into a coherent video. The authors call this &lt;strong>Chain-of-Steps (CoS)&lt;/strong>, and it fundamentally changes how we understand what these models are doing.&lt;/p></description></item><item><title>Seoul World Model: AI That Generates Video of Real Cities From Street Photos</title><link>https://mllog.dev/en/posts/seoul-world-model-city-scale-video-generation/</link><pubDate>Mon, 16 Mar 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/seoul-world-model-city-scale-video-generation/</guid><description>&lt;p>What if you could fly a virtual camera through any street in a real city — not a game engine, not a pre-recorded video, but a freshly generated, photorealistic view based on actual street photos?&lt;/p>
&lt;p>That&amp;rsquo;s exactly what the &lt;strong>Seoul World Model (SWM)&lt;/strong> does. The paper &lt;strong>&amp;ldquo;Grounding World Simulation Models in a Real-World Metropolis&amp;rdquo;&lt;/strong> introduces a city-scale &lt;span class="glossary-term" tabindex="0">
&lt;span class="glossary-word">world model&lt;/span>
&lt;span class="glossary-tooltip">
&lt;strong>world model&lt;/strong>
&lt;span class="glossary-def">A neural network that learns the dynamics and visual appearance of an environment, allowing it to &amp;lsquo;imagine&amp;rsquo; new views and trajectories it has never seen directly.&lt;/span>
&lt;/span>
&lt;/span>
that generates video grounded in real geography — not in imagined scenes.&lt;/p></description></item><item><title>Lost in Stories: How LLMs Lose the Thread in Long Narratives</title><link>https://mllog.dev/en/posts/constory-consistency-bugs-long-stories-llm/</link><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/constory-consistency-bugs-long-stories-llm/</guid><description>&lt;p>Ask any language model to write a 10,000-word story. On page one, the hero has blue eyes. By page five — brown. In chapter three it&amp;rsquo;s Thursday; in chapter six, the same day is suddenly Saturday. A character who died on page seven is chatting away on page ten.&lt;/p>
&lt;p>Sound familiar? The paper &lt;strong>&amp;ldquo;Lost in Stories: Consistency Bugs in Long Story Generation by LLMs&amp;rdquo;&lt;/strong> systematically investigates this problem for the first time — and the results are sobering. Even the best models produce an average of &lt;strong>one consistency error per 10,000 words&lt;/strong>, and human experts catch only 17% of them.&lt;/p></description></item><item><title>Utonia: One Encoder For All Point Clouds</title><link>https://mllog.dev/en/posts/utonia-one-encoder-all-point-clouds/</link><pubDate>Sat, 07 Mar 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/utonia-one-encoder-all-point-clouds/</guid><description>&lt;p>A LiDAR on a self-driving car, a depth camera in a home robot, a satellite scanner, and a CAD model from a 3D printer — each produces a &lt;span class="glossary-term" tabindex="0">
&lt;span class="glossary-word">point cloud&lt;/span>
&lt;span class="glossary-tooltip">
&lt;strong>point cloud&lt;/strong>
&lt;span class="glossary-def">A set of 3D points (x, y, z) representing the shape of an object or scene. Each point can carry additional attributes such as color, surface normal, or intensity.&lt;/span>
&lt;/span>
&lt;/span>
, but with radically different density, scale, and geometry. Until now, each domain required its own model. The paper &lt;strong>&amp;ldquo;Utonia: Toward One Encoder for All Point Clouds&amp;rdquo;&lt;/strong> breaks this pattern — one encoder, 137M parameters, five domains, and emergent behaviors nobody expected.&lt;/p></description></item><item><title>SAGE: Your Reasoning Model Knows When to Stop Thinking — You Just Won't Let It</title><link>https://mllog.dev/en/posts/sage-reasoning-model-knows-when-to-stop-thinking/</link><pubDate>Mon, 23 Feb 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/sage-reasoning-model-knows-when-to-stop-thinking/</guid><description>&lt;p>Reasoning models generate long chains of thought to arrive at answers. But what if over half of those &amp;ldquo;thoughts&amp;rdquo; are useless noise, and the model has known the answer for a while — it just doesn&amp;rsquo;t know it can stop? The paper &lt;strong>&amp;ldquo;Does Your Reasoning Model Implicitly Know When to Stop Thinking?&amp;rdquo;&lt;/strong> discovers that this is exactly the case, and proposes &lt;strong>SAGE&lt;/strong> — a method that cuts token usage by 40-50% while maintaining or improving accuracy.&lt;/p></description></item><item><title>When GPT Discovers Physics: A Breakthrough in Gluon Theory</title><link>https://mllog.dev/en/posts/gpt-discovers-physics-gluon-amplitudes/</link><pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/gpt-discovers-physics-gluon-amplitudes/</guid><description>&lt;p>What happens when you ask artificial intelligence to solve a problem that theoretical physicists have worked on for decades? In a new publication from a team at Princeton, Harvard, Cambridge, and &lt;strong>OpenAI&lt;/strong>, &lt;span class="glossary-term" tabindex="0">
&lt;span class="glossary-word">GPT-5.2 Pro&lt;/span>
&lt;span class="glossary-tooltip">
&lt;strong>GPT-5.2 Pro&lt;/strong>
&lt;span class="glossary-def">The latest version of OpenAI&amp;rsquo;s language model, capable of advanced mathematical reasoning and formulating scientific hypotheses.&lt;/span>
&lt;/span>
&lt;/span>
was the first to propose a key formula describing gluon scattering — a formula that was then proven by another internal OpenAI model and verified by scientists by hand.&lt;/p></description></item><item><title>OPUS: How to Train LLMs 6x Faster by Choosing the Right Data</title><link>https://mllog.dev/en/posts/opus-efficient-data-selection-llm-pretraining/</link><pubDate>Fri, 13 Feb 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/opus-efficient-data-selection-llm-pretraining/</guid><description>&lt;p>Training large language models requires astronomical amounts of data and compute. But what if most of that data is &lt;span class="glossary-term" tabindex="0">
&lt;span class="glossary-word">redundant&lt;/span>
&lt;span class="glossary-tooltip">
&lt;strong>redundant&lt;/strong>
&lt;span class="glossary-def">Redundant data provides no new information to the learning process — the model already &amp;lsquo;knows&amp;rsquo; the patterns it contains.&lt;/span>
&lt;/span>
&lt;/span>
? The paper &lt;strong>&amp;ldquo;OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration&amp;rdquo;&lt;/strong> introduces a framework that achieves comparable results with &lt;strong>6x fewer &lt;span class="glossary-term" tabindex="0">
&lt;span class="glossary-word">tokens&lt;/span>
&lt;span class="glossary-tooltip">
&lt;strong>tokens&lt;/strong>
&lt;span class="glossary-def">A token is the basic unit of text in LLMs — it can be a word, part of a word, or a character. Models process text as sequences of tokens.&lt;/span>
&lt;/span>
&lt;/span>
&lt;/strong> by intelligently selecting what the model should learn from at each step.&lt;/p></description></item><item><title>Green-VLA: One AI Brain for All Robots</title><link>https://mllog.dev/en/posts/green-vla-staged-vision-language-action-generalist-robots/</link><pubDate>Sun, 08 Feb 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/green-vla-staged-vision-language-action-generalist-robots/</guid><description>&lt;p>The quest for a universal robot—one that can seamlessly switch between tasks, platforms, and environments—has long been the holy grail of robotics research. The paper &lt;strong>&amp;ldquo;Green-VLA: Staged Vision-Language-Action Model for Generalist Robots&amp;rdquo;&lt;/strong> brings us closer to that vision with a revolutionary five-stage training framework that enables a single policy to control humanoids, mobile manipulators, and fixed-base robotic arms alike.&lt;/p>
&lt;hr>
&lt;h2 id="the-problem-one-robot-many-bodies">The Problem: One Robot, Many Bodies&lt;/h2>
&lt;p>Today&amp;rsquo;s robotic systems are typically specialists. A robotic arm in a factory excels at assembly but cannot navigate a warehouse. A mobile robot can move around but lacks fine manipulation skills. Training a separate AI for each type of robot is expensive, time-consuming, and fundamentally limits scalability.&lt;/p></description></item><item><title>To Grok Grokking: Why Neural Networks Sometimes Understand Late</title><link>https://mllog.dev/en/posts/grokking-provable-ridge-regression/</link><pubDate>Tue, 27 Jan 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/grokking-provable-ridge-regression/</guid><description>&lt;p>In machine learning, we expect a model to either learn or overfit. What we don&amp;rsquo;t expect is for a model to overfit first and then — much later, with no changes — suddenly start generalizing well. This phenomenon is called &lt;strong>grokking&lt;/strong>, and it has puzzled researchers since its discovery. A new paper finally explains why it happens and proves it mathematically — in the simplest possible setting.&lt;/p>
&lt;h2 id="what-is-grokking">What is Grokking?&lt;/h2>
&lt;p>Grokking was first observed in 2022 on small algorithmic tasks (like modular arithmetic). The pattern is striking:&lt;/p></description></item><item><title>Tensor Networks: A Mathematical Bridge Between Neural and Symbolic AI</title><link>https://mllog.dev/en/posts/tensor-networks-neuro-symbolic-ai/</link><pubDate>Fri, 23 Jan 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/tensor-networks-neuro-symbolic-ai/</guid><description>&lt;p>Neural networks excel at learning patterns from data. Symbolic AI excels at logical reasoning and interpretability. For decades, researchers have tried to combine them — with limited success. A new paper proposes an elegant mathematical framework that unifies both approaches: &lt;strong>tensor networks&lt;/strong>. The key insight? Both neural and symbolic computations can be expressed as tensor decompositions, and inference in both reduces to tensor contractions.&lt;/p>
&lt;h2 id="the-problem-two-worlds-that-dont-talk">The Problem: Two Worlds That Don&amp;rsquo;t Talk&lt;/h2>
&lt;p>Modern AI is split into two camps:&lt;/p></description></item><item><title>M²FMoE: When Experts Learn to Predict Floods</title><link>https://mllog.dev/en/posts/m2fmoe-extreme-adaptive-time-series-forecasting/</link><pubDate>Wed, 14 Jan 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/m2fmoe-extreme-adaptive-time-series-forecasting/</guid><description>&lt;p>Time series forecasting is one of the most important applications of machine learning — from demand prediction, through infrastructure monitoring, to flood forecasting. The problem? Standard models optimize for &lt;strong>typical&lt;/strong> cases. Yet it&amp;rsquo;s precisely the &lt;strong>atypical&lt;/strong> ones — extreme events — that are often most important to predict. &lt;strong>M²FMoE&lt;/strong> is a model that learns to predict both.&lt;/p>
&lt;h2 id="the-problem-extreme-events-break-standard-models">The Problem: Extreme Events Break Standard Models&lt;/h2>
&lt;p>Time series forecasting has made remarkable progress. Transformers, frequency-domain methods, and hybrid architectures achieve impressive results on benchmarks. But there&amp;rsquo;s a catch.&lt;/p></description></item><item><title>BALLAST: When a Bandit Teaches Your Database How Long to Wait</title><link>https://mllog.dev/en/posts/ballast-contextual-bandits-raft-timeouts/</link><pubDate>Mon, 05 Jan 2026 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/ballast-contextual-bandits-raft-timeouts/</guid><description>&lt;p>Imagine you&amp;rsquo;re a team leader. You send a message and wait for a response. How long do you wait before assuming your colleague has &amp;ldquo;disappeared&amp;rdquo;? Too short — and you panic for no reason. Too long — and the whole project stalls. &lt;strong>BALLAST&lt;/strong> is a system that teaches databases to answer this question automatically, using machine learning techniques.&lt;/p>
&lt;h2 id="the-problem-rafts-achilles-heel">The Problem: Raft&amp;rsquo;s Achilles Heel&lt;/h2>
&lt;p>&lt;strong>Raft&lt;/strong> is a consensus protocol — the way distributed databases (like etcd, Consul, CockroachDB) agree on who&amp;rsquo;s the &amp;ldquo;leader&amp;rdquo; and which data is current. It works like this:&lt;/p></description></item><item><title>AI Co-Scientist: Teaching Models to Write Research Plans Better Than Humans</title><link>https://mllog.dev/en/posts/ai-co-scientist-rubric-rewards/</link><pubDate>Tue, 30 Dec 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/ai-co-scientist-rubric-rewards/</guid><description>&lt;p>What if AI could not just answer questions, but actively &lt;strong>plan scientific research&lt;/strong>? Not generating text — creating coherent, novel experiment plans that experts rate as better than human-written ones. Sounds like science fiction? Researchers from Meta AI and partners just achieved this.&lt;/p>
&lt;h2 id="the-problem-how-do-you-grade-scientific-creativity">The Problem: How Do You Grade Scientific Creativity?&lt;/h2>
&lt;p>Training models for &amp;ldquo;closed&amp;rdquo; tasks (math, coding) is relatively straightforward — the answer is correct or not. But how do you evaluate a &lt;strong>research plan&lt;/strong>?&lt;/p></description></item><item><title>HyDRA: Teaching Your Phone to Understand Images Without Breaking the Bank</title><link>https://mllog.dev/en/posts/hydra-dynamic-rank-adaptation-mobile-vlm/</link><pubDate>Sat, 27 Dec 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/hydra-dynamic-rank-adaptation-mobile-vlm/</guid><description>&lt;p>Imagine teaching your phone to recognize photos of dishes and suggest recipes. The catch? Models capable of this are massive and require the computational power of a Google data center. &lt;strong>HyDRA&lt;/strong> is a clever method that adapts such models for mobile devices — without bankruptcy and without melting the planet.&lt;/p>
&lt;h2 id="the-problem-an-elephant-in-your-phone">The Problem: An Elephant in Your Phone&lt;/h2>
&lt;p>&lt;strong>Vision Language Models&lt;/strong> (VLMs) are AI models that understand both images and text simultaneously. You can show them a photo and ask &amp;ldquo;what do you see?&amp;rdquo; or &amp;ldquo;how do I fix this?&amp;rdquo;. Sounds great, but there&amp;rsquo;s a catch.&lt;/p></description></item><item><title>Comp-LLM: When an Army of Experts Beats a Giant – An Analysis of a Revolution in AI Architecture</title><link>https://mllog.dev/en/posts/comp-llm-composable-inference-framework-analysis-2025/</link><pubDate>Mon, 01 Dec 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/comp-llm-composable-inference-framework-analysis-2025/</guid><description>&lt;p>Have you ever wondered why the latest artificial intelligence models, like GPT-4 or Claude 3 Opus, are so enormous? We’re talking hundreds of billions or even trillions of parameters. These are digital monsters requiring massive amounts of energy and data-center-level infrastructure.&lt;/p>
&lt;p>For years, AI followed a simple rule:&lt;br>
&lt;strong>“Bigger means better.”&lt;/strong>&lt;br>
Want a smarter model? Add &lt;strong>more layers&lt;/strong>, &lt;strong>more data&lt;/strong>, &lt;strong>more GPUs&lt;/strong>.&lt;/p>
&lt;p>But — what if this is a dead end?&lt;/p></description></item><item><title>NVIDIA Nemotron Parse v1.1: The Complete Anatomy of the Digital Document Understanding Revolution</title><link>https://mllog.dev/en/posts/nvidia-nemotron-parse-v1-1-analiza-vlm-ocr-deep-dive/</link><pubDate>Wed, 26 Nov 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/nvidia-nemotron-parse-v1-1-analiza-vlm-ocr-deep-dive/</guid><description>&lt;p>Have you ever wondered why, in an age where Artificial Intelligence can generate images from scratch and write poetry, we still struggle with a task as trivial as copying a table from a PDF file to Excel? This is the paradox of today&amp;rsquo;s technology: we have sent rovers to Mars, but a supplier&amp;rsquo;s invoice in PDF format is still a &amp;ldquo;black box&amp;rdquo; for our computers. For decades, we lived in an era that could be called the &amp;ldquo;digital dark ages&amp;rdquo; of document processing. Our tools – classic OCR (Optical Character Recognition) engines – were like medieval scribes: capable of transcribing letters, but understanding not a word of what they wrote, and certainly not grasping what a table, chart, or complex mathematical formula was.&lt;/p></description></item><item><title>Cost-Constrained LLM Cascades — Meet C3PO</title><link>https://mllog.dev/en/posts/llm-cascades-cost-constrained-c3po/</link><pubDate>Fri, 14 Nov 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/llm-cascades-cost-constrained-c3po/</guid><description>&lt;p>Imagine you have an army of helpers — several different Large Language Models (LLMs), each capable of handling tasks from simple queries to complex reasoning.&lt;br>
But each helper &lt;em>costs&lt;/em> something: time, compute, or actual money if you&amp;rsquo;re using an API.&lt;/p>
&lt;p>So the question is:&lt;br>
Can we orchestrate these models wisely, starting from the cheapest one that might do the job and escalating only when needed, &lt;strong>without exceeding a cost budget&lt;/strong>?&lt;/p></description></item><item><title>Accurate Satellite Rain Forecasting with Physics-Conditioned Neural Networks</title><link>https://mllog.dev/en/posts/accurate-satellite-rain-forecasting/</link><pubDate>Mon, 10 Nov 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/accurate-satellite-rain-forecasting/</guid><description>&lt;p>Imagine this: you’re driving, clouds are gathering, and your weather app says “heavy rain in 15 minutes”, but there is no local radar coverage, and it gets it wrong. Sound familiar? That’s exactly the kind of problem tackled by the new research paper &lt;strong>&lt;em>Precipitation nowcasting of satellite data using physically conditioned neural networks&lt;/em>&lt;/strong> (by Antônio Catão et al.).&lt;/p>
&lt;p>The authors present a model that can forecast precipitation &lt;strong>using only satellite data&lt;/strong>, powered by a neural network that’s &lt;em>conditioned by physics&lt;/em>. In short: less “black box” magic, more scientific reasoning — and better forecasts where radar coverage is weak or nonexistent.&lt;/p></description></item><item><title>A Universal Crime Predictor – How Hypernetworks and Knowledge Graphs Are Transforming Forecasting</title><link>https://mllog.dev/en/posts/pinfm-foundation-model/</link><pubDate>Thu, 06 Nov 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/pinfm-foundation-model/</guid><description>&lt;p>Imagine this: you&amp;rsquo;re in a new city that’s just starting to collect crime data – but the types of crimes differ completely from those in your city.&lt;br>
Is it possible to train &lt;strong>one model&lt;/strong> that works across both cities?&lt;/p>
&lt;p>That’s the question tackled by the recent paper&lt;br>
📄 &lt;em>Learning A Universal Crime Predictor with Knowledge-guided Hypernetworks&lt;/em> by Fidan Karimova et al.,&lt;br>
which introduces a framework called &lt;strong>HYSTL (HYpernetwork-enhanced Spatial Temporal Learning)&lt;/strong>.&lt;/p></description></item><item><title>SNOO – Old-School Nesterov Momentum in a New Jacket: Making Big Models Learn Faster</title><link>https://mllog.dev/en/posts/snoo-optimizer-nesterov-momentum/</link><pubDate>Mon, 20 Oct 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/snoo-optimizer-nesterov-momentum/</guid><description>&lt;p>Imagine you’re training a massive language model — the kind that takes &lt;strong>weeks&lt;/strong> to learn even the basics. Every training step costs time, electricity, and a small fortune. In such a world, even a tiny bump in efficiency feels like finding a way to get free coffee at work — small, but sweet.&lt;/p>
&lt;p>Enter &lt;strong>SNOO – Step-K Nesterov Outer Optimizer&lt;/strong>, a clever idea that takes &lt;strong>Nesterov momentum&lt;/strong>, a decades-old optimization trick, and applies it in a new place — &lt;em>outside&lt;/em> the normal training loop.&lt;br>
The result? Models that learn faster and more smoothly, without much extra computational cost.&lt;/p></description></item><item><title>“Who Said Neural Networks Aren’t Linear?” — explained like over coffee</title><link>https://mllog.dev/en/posts/who-said-neural-networks-arent-linear/</link><pubDate>Fri, 10 Oct 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/who-said-neural-networks-arent-linear/</guid><description>&lt;p>Alright, let’s start simple. Everyone who’s dabbled a bit in machine learning knows one thing: &lt;strong>neural networks are nonlinear&lt;/strong>. That’s what makes them powerful — they can model weird, curvy, complex relationships, not just straight lines.&lt;/p>
&lt;p>But the authors of the paper &lt;em>“Who Said Neural Networks Aren’t Linear?”&lt;/em> (Nimrod Berman, Assaf Hallak, Assaf Shocher) asked a cheeky question: &lt;strong>what if that’s not entirely true?&lt;/strong> What if nonlinearity is just… a matter of perspective?&lt;/p></description></item><item><title>CHORD — Smart On-Device Recommendations Without Killing Your Battery</title><link>https://mllog.dev/en/posts/chord-smart-on-device-recommendations/</link><pubDate>Mon, 06 Oct 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/chord-smart-on-device-recommendations/</guid><description>&lt;p>In apps like online stores, streaming platforms, or social media, we want to show users things they might like —&lt;br>
&lt;em>“Hey, maybe you’ll enjoy this too.”&lt;/em>&lt;br>
That’s what &lt;strong>recommendation systems&lt;/strong> do.&lt;/p>
&lt;p>Usually, those models live in the &lt;strong>cloud&lt;/strong> — big servers crunch data and send you suggestions.&lt;br>
But lately, more and more of that work is moving &lt;strong>onto the user’s device&lt;/strong> (phone, tablet).&lt;br>
Why? Because:&lt;/p>
&lt;ul>
&lt;li>it’s faster (less waiting),&lt;/li>
&lt;li>it’s more private (fewer data uploads),&lt;/li>
&lt;li>it saves server resources.&lt;/li>
&lt;/ul>
&lt;p>But here’s the catch: devices vary.&lt;br>
Some phones are monsters, others barely keep up.&lt;br>
So how do you fit a good AI model on both?&lt;/p></description></item><item><title>Attention as a Compass – Teaching Reasoning Models to Explore Smarter</title><link>https://mllog.dev/en/posts/teaching-reasoning-models-to-explore-smarter/</link><pubDate>Wed, 01 Oct 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/teaching-reasoning-models-to-explore-smarter/</guid><description>&lt;p>Large Language Models (LLMs) are no longer just text generators — they are becoming &lt;strong>reasoners&lt;/strong>, capable of solving mathematical problems, logical puzzles, or planning tasks step by step.&lt;br>
One of the key challenges is &lt;strong>how to improve the quality of this reasoning&lt;/strong>. Traditional Reinforcement Learning (RL) rewards only the final outcome, but in complex reasoning it makes more sense to evaluate &lt;strong>each intermediate step&lt;/strong>. This is called &lt;strong>process-supervised RL (PSRL)&lt;/strong>.&lt;/p></description></item><item><title>No Prior, No Leakage – can we really reconstruct data from a neural network?</title><link>https://mllog.dev/en/posts/can-we-really-reconstruct-data-from-a-neural-network/</link><pubDate>Fri, 26 Sep 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/can-we-really-reconstruct-data-from-a-neural-network/</guid><description>&lt;p>In the era of artificial intelligence, &lt;strong>privacy protection&lt;/strong> is one of the hottest topics. Neural networks often &amp;ldquo;memorize&amp;rdquo; pieces of training data. In extreme cases, an attacker could try to &lt;strong>reconstruct&lt;/strong> the original examples just from the trained model&amp;rsquo;s parameters (so-called &lt;em>reconstruction attacks&lt;/em>). Imagine a medical model that could reveal fragments of sensitive patient images — alarming, right?&lt;/p>
&lt;p>The new paper &lt;strong>“No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks”&lt;/strong> (&lt;a href="https://arxiv.org/pdf/2509.21296">arxiv.org&lt;/a>) challenges this fear. It shows that without additional knowledge (&lt;em>priors&lt;/em>), reconstruction is &lt;strong>fundamentally undecidable&lt;/strong>. In other words: model parameters alone may not be enough to recover the training data.&lt;/p></description></item><item><title>How to Detect Credit Card Fraud?</title><link>https://mllog.dev/en/posts/how-to-detect-credit-card-fraud/</link><pubDate>Sun, 21 Sep 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/how-to-detect-credit-card-fraud/</guid><description>&lt;p>Today, credit card transactions are everywhere: online shopping, bill payments, travel, and more. Unfortunately, the number of fraud cases is also growing. The challenge is that fraud is &lt;strong>very rare&lt;/strong> compared to normal transactions. This means that simple models trained on raw data often “ignore” these rare cases, because statistically it’s cheaper to be wrong on a few fraudulent transactions than on thousands of normal payments.&lt;/p>
&lt;p>The paper &lt;em>“Credit Card Fraud Detection”&lt;/em> (arXiv:2509.15044) analyzes how to improve fraud detection by applying data preprocessing techniques (class balancing) and comparing several models. This is crucial because the effectiveness of such systems has real-world consequences — for banks, payment platforms, and user security.&lt;/p></description></item><item><title>JANUS – how to fool Graph Neural Networks and what it teaches us</title><link>https://mllog.dev/en/posts/janus-how-to-fool-graph-neural-networks/</link><pubDate>Wed, 17 Sep 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/janus-how-to-fool-graph-neural-networks/</guid><description>&lt;p>Graph Neural Networks (GNNs) are among the most powerful tools in modern AI. They can analyze data structured as nodes and connections – like social networks, financial links, protein structures, or transportation systems.&lt;br>
But success comes with risk: GNNs can be &lt;strong>attacked&lt;/strong>. A new research paper introduces &lt;strong>JANUS&lt;/strong> – a framework that learns to inject fake nodes into graphs in a way that is extremely hard to detect. While framed as an attack, the insights are equally valuable for building defenses.&lt;/p></description></item><item><title>Quantum Trading – AI and Quantum Computing in Investing</title><link>https://mllog.dev/en/posts/quantum-trading-ai-quantum-computing/</link><pubDate>Mon, 15 Sep 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/quantum-trading-ai-quantum-computing/</guid><description>&lt;p>Imagine your computer not only analyzing financial charts but also learning to make investment decisions on its own – faster and smarter than a human. Now add a touch of &lt;strong>quantum physics&lt;/strong>. Sounds like science fiction? Yet, recent research shows that combining &lt;strong>reinforcement learning&lt;/strong>, &lt;strong>quantum-inspired neural networks&lt;/strong>, and classical financial data can provide a real edge in trading.&lt;/p>
&lt;p>This is exactly the focus of a publication from National Taiwan Normal University and Wells Fargo. The researchers built a trading agent that uses &lt;strong>quantum-enhanced neural networks&lt;/strong> to trade the USD/TWD (US Dollar/Taiwan Dollar) currency pair.&lt;/p></description></item><item><title>Reinforcement Learning in Pinterest Ads – DRL-PUT in action!</title><link>https://mllog.dev/en/posts/reinforcement-learning-in-pinterest-ads/</link><pubDate>Mon, 08 Sep 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/reinforcement-learning-in-pinterest-ads/</guid><description>&lt;p>Can the effectiveness of an advertising system be improved by &lt;strong>almost 10%&lt;/strong> simply by tuning the weights in the ranking function more intelligently?&lt;br>
It turns out the answer is yes – and that’s exactly what the paper &lt;em>Deep Reinforcement Learning for Ranking Utility Tuning in the Ad Recommender System at Pinterest&lt;/em> (arXiv:2509.05292) is about.&lt;/p>
&lt;p>Traditionally, ad ranking relies on a &lt;strong>utility function&lt;/strong> – a linear combination of multiple model predictions, such as CTR (click-through rate), conversion probability, or other business metrics.&lt;br>
The problem? The weights of these predictors were historically tuned &lt;strong>manually&lt;/strong> by engineers. This approach:&lt;/p></description></item><item><title>The Anatomy of AI Lies: How Language Models Can Deceive Us</title><link>https://mllog.dev/en/posts/the-anatomy-of-ai-lies/</link><pubDate>Fri, 05 Sep 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/the-anatomy-of-ai-lies/</guid><description>&lt;p>We’re used to hearing that AI sometimes “hallucinates” — making funny or random mistakes. Hallucinations are &lt;strong>unintended errors&lt;/strong> caused by the limits of statistical prediction. But the new research goes further: it shows that AI can &lt;strong>knowingly choose to lie&lt;/strong> when deception helps it achieve a goal.&lt;/p>
&lt;p>The publication &lt;em>Can LLMs Lie?&lt;/em> takes us into a world where AI acts more like a &lt;strong>strategic agent&lt;/strong>, capable of manipulating information to maximize outcomes.&lt;/p></description></item><item><title>Edge AI: How to Accelerate Neural Networks on Specialized Hardware</title><link>https://mllog.dev/en/posts/accelerate_neural_network/</link><pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/accelerate_neural_network/</guid><description>&lt;p>Modern science, especially in the field of high-energy physics, generates unimaginable amounts of data. Experiments like the LCLS-II free-electron laser (FEL) at the SLAC National Accelerator Laboratory produce terabytes of data per second. Transmitting and storing all of it is impractical. The solution is to intelligently select data in real-time, right at the source. The publication &lt;strong>&amp;ldquo;Neural Network Acceleration on MPSoC board: Integrating SLAC&amp;rsquo;s SNL, Rogue Software and Auto-SNL&amp;rdquo;&lt;/strong> is a fascinating case study of how to achieve this using artificial intelligence and specialized hardware.&lt;/p></description></item><item><title>Global Guarantees of Robustness: A Probabilistic Approach to AI Safety</title><link>https://mllog.dev/en/posts/global-guarantees-of-robustness-probablistic-apporach/</link><pubDate>Wed, 27 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/global-guarantees-of-robustness-probablistic-apporach/</guid><description>&lt;p>Modern machine learning models, from image recognition systems to large language models, have achieved impressive capabilities. However, their strength can be deceptive. One of the biggest challenges in the field of AI is their vulnerability to &lt;strong>adversarial attacks&lt;/strong>. 
These are intentionally crafted, small perturbations to input data (e.g., changing a few pixels in an image) that are imperceptible to humans but can completely fool the model, leading to incorrect and often absurd decisions.&lt;/p></description></item><item><title>Intern-S1: The New AI Scientist That's Redefining Research</title><link>https://mllog.dev/en/posts/intern-s1-ai-scientist/</link><pubDate>Sat, 23 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/intern-s1-ai-scientist/</guid><description>&lt;p>Artificial intelligence has already transformed many industries, but the world of scientific research has been waiting for a true game-changer. While general AI models are powerful, they often lack the specialized knowledge needed for deep scientific inquiry. Enter &lt;strong>Intern-S1&lt;/strong>, a new multimodal foundation model that&amp;rsquo;s set to bridge this gap and accelerate a new era of discovery.&lt;/p>
&lt;p>Developed by the Shanghai AI Laboratory, Intern-S1 is not just another large language model. It&amp;rsquo;s a &lt;strong>specialized generalist&lt;/strong>, designed from the ground up to understand and process complex scientific data in various formats, from text and images to time-series data.&lt;/p></description></item><item><title>Exploring MCFRCL: A New Perspective on Continual Learning</title><link>https://mllog.dev/en/posts/mcfrcl-continual-learning/</link><pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/mcfrcl-continual-learning/</guid><description>&lt;p>In the world of artificial intelligence, &lt;strong>Continual Learning&lt;/strong> is one of the biggest challenges. The goal is to enable AI models to learn new things sequentially without forgetting what they have learned before. This is a key ability that brings us closer to creating truly intelligent systems capable of adapting to a dynamically changing world.&lt;/p>
&lt;p>Unfortunately, traditional neural networks suffer from so-called &lt;strong>catastrophic forgetting&lt;/strong>. When they learn a new task, they tend to overwrite the knowledge gained from previous tasks. The publication &amp;ldquo;Monte Carlo Functional Regularisation for Continual Learning&amp;rdquo; (arXiv:2508.13006) by Pengcheng Hao, Menghao Waiyan William Zhu, and Ercan Engin Kuruoglu presents an innovative approach to this problem.&lt;/p></description></item><item><title>Look Inside Seamless Flow's Hyper-Efficient Training</title><link>https://mllog.dev/en/posts/seamless-flow-ai-factory/</link><pubDate>Mon, 18 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/seamless-flow-ai-factory/</guid><description>&lt;p>We are in the midst of an AI gold rush, where companies are investing billions to build increasingly intelligent models. The final, crucial step in this process is often Reinforcement Learning (RL), the &amp;ldquo;finishing school&amp;rdquo; where an AI agent learns to master complex tasks through trial and error. However, this training process at an industrial scale is plagued by two problems: crippling inefficiency and maddening complexity. It&amp;rsquo;s like trying to run a state-of-the-art factory where half the machines are always idle and every product requires a complete retooling of the assembly line.&lt;/p></description></item><item><title>Systematization of Knowledge: Data Minimization in Machine Learning</title><link>https://mllog.dev/en/posts/sok-data-minimization-in-machine-learning/</link><pubDate>Fri, 15 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/sok-data-minimization-in-machine-learning/</guid><description>&lt;p>Modern systems based on Machine Learning (ML) are ubiquitous, from credit scoring to fraud detection. The conventional wisdom is that more data leads to better models. 
However, this data-centric approach directly conflicts with a fundamental legal principle: &lt;strong>data minimization (DM)&lt;/strong>. This principle, enshrined in key regulations like the GDPR in Europe and the CPRA in California, mandates that personal data collection and processing must be &amp;ldquo;adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed&amp;rdquo;.&lt;/p></description></item><item><title>Learning Machines That Don't Forget: A New Method for Evolving Data</title><link>https://mllog.dev/en/posts/learning_machines/</link><pubDate>Thu, 14 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/learning_machines/</guid><description>&lt;p>Imagine you&amp;rsquo;re learning to play chess. You master all the rules, strategies, and openings. You become a pretty good player. Now, someone introduces a new piece with completely new rules of movement. As you learn to play with this new piece, do you forget how to move a pawn or a knight? Of course not. Your brain can integrate new knowledge without losing what it has already acquired. Unfortunately, for many artificial intelligence systems, this is a huge challenge, known as &lt;strong>&amp;ldquo;catastrophic forgetting&amp;rdquo;&lt;/strong>.&lt;/p></description></item><item><title>A Deep Dive into the Text-to-SQL Revolution: Analyzing the Adaptive Method</title><link>https://mllog.dev/en/posts/deep-dive-text-to-sql-adaptive-method/</link><pubDate>Mon, 11 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/deep-dive-text-to-sql-adaptive-method/</guid><description>&lt;p>In the era of Big Data, data has become an organization&amp;rsquo;s most valuable asset. However, access to it is often limited by a technical barrier: the need to use query languages like SQL. For years, analysts and engineers have dreamed of a system that would allow them to &amp;ldquo;talk&amp;rdquo; to a database in natural language. 
&lt;strong>Text-to-SQL&lt;/strong> systems aim to realize this vision, but their path has been challenging. Older models, though promising, often failed in real-world scenarios: they were &amp;ldquo;brittle,&amp;rdquo; struggled with unseen database schemas, and required costly fine-tuning for each new domain.&lt;/p></description></item><item><title>Dynamic Fine-Tuning (DFT): How a Single Line of Code is Revolutionizing AI Training</title><link>https://mllog.dev/en/posts/dynamic-fine-tuning-dft/</link><pubDate>Mon, 11 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/dynamic-fine-tuning-dft/</guid><description>&lt;p>In an era where &lt;strong>Large Language Models (LLMs)&lt;/strong> like GPT-4 or Llama seem to understand the world, a fundamental challenge remains: how to teach them effectively and efficiently? The standard method is &lt;strong>Supervised Fine-Tuning (SFT)&lt;/strong>, which involves &amp;ldquo;feeding&amp;rdquo; the model thousands of examples of correct responses. However, as the groundbreaking paper &lt;strong>&amp;ldquo;On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification&amp;rdquo;&lt;/strong> (arXiv:2508.05629) points out, SFT has a hidden flaw that limits its true potential.&lt;/p></description></item><item><title>ASkDAgger: How Artificial Intelligence Learns More Effectively by Asking Questions</title><link>https://mllog.dev/en/posts/askdagger-interactive-imitation-learning/</link><pubDate>Fri, 08 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/askdagger-interactive-imitation-learning/</guid><description>&lt;p>In a world where robots and AI systems increasingly learn through observation and interaction with humans, the efficiency of this process remains a key challenge. Traditional Imitation Learning methods often require a human teacher to constantly supervise and correct errors, which is time-consuming and costly. 
A team of researchers led by Jelle Luijkx proposes a groundbreaking solution in their latest paper, &lt;strong>&amp;ldquo;ASkDAgger: Active Skill-level Data Aggregation for Interactive Imitation Learning.&amp;rdquo;&lt;/strong>&lt;/p></description></item><item><title>CaPulse: Teaching Machines to Hear the Rhythm of Data</title><link>https://mllog.dev/en/posts/capulse-teaching-machines-to-hear-the-rhythm-of-data/</link><pubDate>Thu, 07 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/capulse-teaching-machines-to-hear-the-rhythm-of-data/</guid><description>&lt;p>Can computers learn to &amp;ldquo;hear&amp;rdquo; the rhythm in a stream of data, much like we hear the rhythm in music? And by using this skill, can they better protect us from equipment failures, financial fraud, or health problems? A new scientific paper titled &lt;strong>&amp;ldquo;CaPulse: Detecting Anomalies by Tuning in to the Causal Rhythms of Time Series&amp;rdquo;&lt;/strong> attempts to answer these questions.&lt;/p>
&lt;h3 id="the-problem-with-anomalies">The Problem with Anomalies&lt;/h3>
&lt;p>We live in a world of data. From our heartbeats and stock market fluctuations to energy consumption in a smart city—all of this is &lt;strong>time series&lt;/strong> data, collected at regular intervals. Often lurking within this data are &lt;strong>anomalies&lt;/strong>: strange, unexpected events that can signal a problem. This could be a sudden cardiac arrhythmia, a suspicious bank transaction, or an impending engine failure in a factory.&lt;/p></description></item><item><title>Goedel-Prover-V2: A Revolution in Automated Theorem Proving</title><link>https://mllog.dev/en/posts/goedel-prover-v2-revolution-in-automated-theorem-proving/</link><pubDate>Wed, 06 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/goedel-prover-v2-revolution-in-automated-theorem-proving/</guid><description>&lt;p>In a world where artificial intelligence (AI) is solving increasingly complex problems, formal mathematical theorem proving remains one of the toughest challenges. It&amp;rsquo;s the Mount Everest of machine reasoning, demanding not only immense computational power but, above all, deep, logical deduction. The scientific paper &lt;strong>&amp;ldquo;Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction&amp;rdquo;&lt;/strong> introduces a breakthrough system that elevates automated proving to a new level. 🤖&lt;/p>
&lt;hr>
&lt;h2 id="system-architecture">System Architecture&lt;/h2>
&lt;p>At the heart of &lt;strong>Goedel-Prover-V2&lt;/strong> is an advanced language model, specially trained and adapted to work with proof assistants like &lt;strong>Lean&lt;/strong>. The system&amp;rsquo;s architecture is based on a cyclical interaction between several key components:&lt;/p></description></item><item><title>How to Teach AI to Handle Mistakes? Meet ε-Softmax</title><link>https://mllog.dev/en/posts/esoftmax-approximating-one-hot-vectors/</link><pubDate>Tue, 05 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/esoftmax-approximating-one-hot-vectors/</guid><description>&lt;p>In the world of artificial intelligence, data is the fuel that powers machine learning models. But what if that fuel is contaminated? Mislabeled data, known as &lt;strong>label noise&lt;/strong>, is a huge problem that can cause even the best algorithms to learn complete nonsense. The paper &amp;ldquo;ε-Softmax: Approximating One-Hot Vectors for Mitigating Label Noise,&amp;rdquo; accepted at the prestigious NeurIPS 2024 conference, offers an elegant solution.&lt;/p>
&lt;h2 id="the-problem-when-a-model-blindly-trusts-its-labels">The Problem: When a Model Blindly Trusts Its Labels&lt;/h2>
&lt;p>Let&amp;rsquo;s imagine we&amp;rsquo;re training a model to recognize animals. We show it a picture of a cute cat. In the traditional approach, we give it an absolutely certain piece of information, a so-called one-hot vector:&lt;/p></description></item><item><title>Simple and Effective Method for Uncertainty Quantification</title><link>https://mllog.dev/en/posts/quantitive_effective/</link><pubDate>Mon, 04 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/quantitive_effective/</guid><description>&lt;p>In the field of machine learning, a model&amp;rsquo;s ability to assess its own confidence is crucial for its reliability, especially in high-stakes applications like medicine or autonomous vehicles. The arXiv paper &lt;strong>2508.00754&lt;/strong>, titled &lt;em>&amp;ldquo;A Simple and Effective Method for Uncertainty Quantification and OOD Detection&amp;rdquo;&lt;/em>, by Yaxin Ma, Benjamin Colburn, and Jose C. Principe, introduces an innovative and efficient approach to this problem. The paper focuses on two related concepts: uncertainty quantification and Out-of-Distribution (OOD) detection.&lt;/p></description></item><item><title>Deep Learning-based Prediction of Clinical Trial Enrollment with Uncertainty Estimates</title><link>https://mllog.dev/en/posts/deep-learning-clinical-trial/</link><pubDate>Sat, 02 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/deep-learning-clinical-trial/</guid><description>&lt;p>Clinical trial enrollment is a critical bottleneck in drug development: nearly 80% of trials fail to meet target enrollment, costing up to $8 million per day if delayed. In this work, we introduce a multimodal deep‐learning framework that not only predicts total participant count but also quantifies uncertainty around those predictions.&lt;/p>
&lt;h2 id="challenges-in-enrollment-forecasting">Challenges in Enrollment Forecasting&lt;/h2>
&lt;p>Traditional approaches fall into two camps:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Deterministic models&lt;/strong> – e.g. tabular ML like XGBoost or LightGBM – which output a point estimate but ignore variability in recruitment rates.&lt;/li>
&lt;li>&lt;strong>Stochastic models&lt;/strong> – e.g. Poisson or Poisson–Gamma processes – which simulate recruitment and give confidence intervals, but often struggle with high-dimensional, heterogeneous data.&lt;/li>
&lt;/ul>
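&lt;p>To make the stochastic camp concrete, here is a minimal sketch of a Poisson–Gamma recruitment simulation. It is an illustration under stated assumptions, not the method from the paper: the function names, the number of sites, and the Gamma shape and scale values are all hypothetical.&lt;/p>

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) sample with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_enrollment(n_sites, days, shape, scale, n_runs=2000, seed=0):
    """Return the (5%, 50%, 95%) quantiles of simulated total enrollment.

    Assumption: each site recruits at its own daily rate drawn from
    Gamma(shape, scale), and its patient count over the period is
    Poisson(rate * days).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        total = 0
        for _ in range(n_sites):
            rate = rng.gammavariate(shape, scale)  # patients per day at one site
            total += sample_poisson(rng, rate * days)
        totals.append(total)
    totals.sort()

    def quantile(q):
        return totals[int(q * (n_runs - 1))]

    return quantile(0.05), quantile(0.5), quantile(0.95)


# Hypothetical trial: 10 sites, 30 days, mean rate of 0.1 patients per site per day.
print(simulate_enrollment(n_sites=10, days=30, shape=2.0, scale=0.05))
```

&lt;p>The output is an interval rather than a single number, which is the property point-estimate models lack. The sketch uses only a handful of scalar parameters, which hints at why such models struggle once trial descriptions become high-dimensional.&lt;/p>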
&lt;h2 id="model-architecture">Model Architecture&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Inputs&lt;/strong>&lt;/p></description></item><item><title>Consensus-Driven Active Model Selection</title><link>https://mllog.dev/en/posts/coda-consensus-driven-active-model-selection/</link><pubDate>Fri, 01 Aug 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/coda-consensus-driven-active-model-selection/</guid><description>&lt;p>The paper &lt;strong>“Consensus-Driven Active Model Selection”&lt;/strong> introduces &lt;strong>CODA&lt;/strong>, a method that selects the best machine learning model using the predictions of many candidate models and minimal labeled data. CODA builds a probabilistic framework that leverages model &lt;strong>agreement and disagreement&lt;/strong> to guide which examples should be labeled next.&lt;/p>
&lt;h2 id="-key-concepts">🚀 Key Concepts&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Active model selection&lt;/strong>: Instead of labeling a full validation set, CODA selectively chooses which data points to label by estimating which would be most informative.&lt;/li>
&lt;li>&lt;strong>Consensus modeling&lt;/strong>: CODA uses a Bayesian adaptation of the Dawid-Skene model to evaluate model performance based on agreement among models.&lt;/li>
&lt;li>&lt;strong>PBest distribution&lt;/strong>: Represents the current belief about which model is best, updated with each newly labeled data point.&lt;/li>
&lt;/ul>
&lt;h2 id="-how-does-coda-work">🧪 How Does CODA Work?&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Model predictions&lt;/strong> are collected over unlabeled data.&lt;/li>
&lt;li>A &lt;strong>consensus label&lt;/strong> for each data point is calculated using a weighted sum of predictions from all models.&lt;/li>
&lt;li>Each model is assigned a &lt;strong>confusion matrix&lt;/strong> prior using a Dirichlet distribution:
$$
\theta_{k, c, c'} = \frac{\beta_{c, c'} + \alpha \hat{M}_{k, c, c'}}{T}
$$&lt;/li>
&lt;li>CODA updates a &lt;strong>probabilistic estimate&lt;/strong> over which model is best:
$$
PBest(h_k) = \int_0^1 f_k(x) \prod_{l \ne k} F_l(x) dx
$$&lt;/li>
&lt;li>It selects the next data point to label by maximizing &lt;strong>expected information gain&lt;/strong>:
$$
EIG(x_i) = H(PBest) - \sum_c \hat{\pi}(c \mid x_i) H(PBest^c)
$$&lt;/li>
&lt;/ol>
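&lt;p>The PBest and expected-information-gain formulas above can be sketched numerically. This is a rough illustration, not the CODA implementation: it assumes each model accuracy follows a Beta distribution, approximates the integral on a midpoint grid, and uses hypothetical function names and parameters.&lt;/p>

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at a point x strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def pbest(params, grid_size=2000):
    """Approximate PBest(h_k) = integral of f_k(x) * prod_{l != k} F_l(x) dx.

    params is a list of (a, b) Beta parameters, one pair per model
    (an assumed form for the belief over each model accuracy).
    """
    dx = 1.0 / grid_size
    xs = [(i + 0.5) * dx for i in range(grid_size)]
    pdfs = [[beta_pdf(x, a, b) for x in xs] for a, b in params]
    cdfs = []
    for pdf in pdfs:
        running, cdf = 0.0, []
        for p in pdf:
            cdf.append(running + 0.5 * p * dx)  # CDF value at the cell midpoint
            running += p * dx
        cdfs.append(cdf)
    scores = []
    for k, pdf in enumerate(pdfs):
        score = 0.0
        for i in range(grid_size):
            others = 1.0
            for l, cdf in enumerate(cdfs):
                if l != k:
                    others *= cdf[i]
            score += pdf[i] * others * dx
        scores.append(score)
    norm = sum(scores)  # renormalise to absorb discretisation error
    return [s / norm for s in scores]


def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in dist if q > 0)


def expected_information_gain(pbest_now, class_probs, pbest_if_label):
    """EIG = H(PBest) - sum_c pi(c | x_i) * H(PBest^c)."""
    expected = sum(pi * entropy(pb) for pi, pb in zip(class_probs, pbest_if_label))
    return entropy(pbest_now) - expected


# Two hypothetical models: one believed strong, one believed weak.
print(pbest([(9.0, 3.0), (3.0, 9.0)]))
```

&lt;p>In this sketch a data point is worth labeling when the hypothetical posteriors in pbest_if_label are much sharper than the current belief, which is what the entropy difference measures.&lt;/p>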
&lt;h2 id="-results">📊 Results&lt;/h2>
&lt;ul>
&lt;li>CODA outperforms previous state-of-the-art methods on &lt;strong>18 out of 26 benchmark tasks&lt;/strong>.&lt;/li>
&lt;li>Achieves optimal model selection with up to &lt;strong>70% fewer labels&lt;/strong> compared to baselines.&lt;/li>
&lt;li>Especially effective in multi-class tasks (e.g., DomainNet, WILDS).&lt;/li>
&lt;/ul>
&lt;h2 id="-limitations">❗ Limitations&lt;/h2>
&lt;ul>
&lt;li>In binary classification with &lt;strong>high data imbalance&lt;/strong>, CODA may underperform due to biased early estimates (e.g., CivilComments, CoLA datasets).&lt;/li>
&lt;li>CODA assumes that consensus is meaningful; highly divergent models may reduce effectiveness.&lt;/li>
&lt;/ul>
&lt;h2 id="-future-work">🔮 Future Work&lt;/h2>
&lt;ul>
&lt;li>Better priors from human knowledge or unsupervised features.&lt;/li>
&lt;li>Extension to &lt;strong>non-classification&lt;/strong> tasks and alternative metrics.&lt;/li>
&lt;li>Integration with &lt;strong>active learning&lt;/strong> and &lt;strong>active testing&lt;/strong> frameworks.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="links">Links&lt;/h2>
&lt;ul>
&lt;li>Based on the publication 📄 &lt;a href="https://arxiv.org/pdf/2507.23771">arXiv:2507.23771 PDF&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>RLVMR: Reinforcement Learning with Verifiable Meta‑Reasoning Rewards for Robust Long‑Horizon Agents</title><link>https://mllog.dev/en/posts/rlvmr-wzmocnione-uczenie-z-weryfikowalnymi-nagradzajacymi-meta-rozumowaniem/</link><pubDate>Thu, 31 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/rlvmr-wzmocnione-uczenie-z-weryfikowalnymi-nagradzajacymi-meta-rozumowaniem/</guid><description>&lt;p>The paper introduces &lt;strong>RLVMR&lt;/strong>, a novel framework for reinforcement learning (RL) that integrates &lt;strong>verifiable meta‑reasoning rewards&lt;/strong> to strengthen long‑horizon performance. It enables agents to generate internal explanatory signals and be explicitly evaluated using meta‑reasoning criteria, enhancing robustness and planning over extended trajectories.&lt;/p>
&lt;h2 id="contributions">Contributions&lt;/h2>
&lt;ol>
&lt;li>A formal definition of &lt;strong>meta‑reasoning rewards&lt;/strong>: agents receive additional reward signals based on the verifiability of reasoning chains.&lt;/li>
&lt;li>A verifiable protocol: using checkable reasoning traces to assess agent justification.&lt;/li>
&lt;li>Empirical validation on long‑horizon RL tasks showing improved performance vs. standard RL baselines.&lt;/li>
&lt;/ol>
&lt;h2 id="method">Method&lt;/h2>
&lt;p>Let the agent generate a reasoning chain $r = (r_1,\dots,r_T)$ alongside actions $a_t$. The total reward is:
&lt;/p></description></item><item><title>How AI Can Reveal Where Your Honey Comes From — A Look at Mineral Fingerprints</title><link>https://mllog.dev/en/posts/honey-origin-classification-review-en/</link><pubDate>Wed, 30 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/honey-origin-classification-review-en/</guid><description>&lt;p>Ever wondered whether that expensive jar of &amp;ldquo;acacia honey&amp;rdquo; is the real deal? Or if the origin listed on the label truly reflects the soil and flowers it came from? In a new study, researchers used &lt;strong>machine learning and mineral analysis&lt;/strong> to uncover the botanical and geographical roots of honey — all without needing a microscope.&lt;/p>
&lt;h3 id="the-science-behind-it">The Science Behind It&lt;/h3>
&lt;p>When bees produce honey, they also carry tiny traces of minerals from the plants and soil around them. These &lt;strong>mineral fingerprints&lt;/strong> — elements like calcium, magnesium, or zinc — vary depending on the environment. By measuring them, we can build a kind of chemical signature for each honey.&lt;/p></description></item><item><title>Graph Structure Learning with Privacy Guarantees for Open Graph Data</title><link>https://mllog.dev/en/posts/graph-structure-learning-dp/</link><pubDate>Mon, 28 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/graph-structure-learning-dp/</guid><description>&lt;p>In the age of graph data – such as social networks, business relationship graphs, or knowledge maps – sharing these datasets for research or application purposes is increasingly common. But what if the structure of a graph itself contains sensitive information? Even without revealing the node contents, simply disclosing the existence of edges can lead to privacy breaches.&lt;/p>
&lt;p>Traditional approaches to &lt;em>Differential Privacy&lt;/em> (DP) focus on protecting data during model training. In this paper, the authors go a step further — they aim to protect privacy &lt;strong>at the moment of graph data publishing&lt;/strong>. They propose an elegant method based on &lt;strong>Gaussian Differential Privacy&lt;/strong> (GDP) that enables learning the structure of a graph while maintaining strong privacy guarantees.&lt;/p></description></item><item><title>Optimizing Call Center Operations with Reinforcement Learning: PPO vs. Value Iteration</title><link>https://mllog.dev/en/posts/call-center-ppo-vs-vi-en/</link><pubDate>Sat, 26 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/call-center-ppo-vs-vi-en/</guid><description>&lt;p>Can AI improve how call centers operate? The paper &lt;em>“Optimising Call Centre Operations using Reinforcement Learning: Value Iteration versus Proximal Policy Optimisation”&lt;/em> by Kwong Ho Li and Wathsala Karunarathne shows that it can — and with strong results. The authors compare two reinforcement learning (RL) approaches to optimize call routing: the classical Value Iteration (VI) and the modern Proximal Policy Optimisation (PPO).&lt;/p>
&lt;h2 id="what-is-reinforcement-learning">What is Reinforcement Learning?&lt;/h2>
&lt;p>Reinforcement Learning is an AI method where an agent takes actions in an environment and receives rewards based on how good those actions are. The goal is to maximize the cumulative reward — essentially, to learn the best decisions.&lt;/p></description></item><item><title>Efficient &amp; Geometrically-Smart: Linear Memory SE(2)-Invariant Attention Explained</title><link>https://mllog.dev/en/posts/efficient-geometrically-smart-linear-memory-se2-invariant-attention/</link><pubDate>Fri, 25 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/efficient-geometrically-smart-linear-memory-se2-invariant-attention/</guid><description>&lt;p>In many real-world tasks—like forecasting the paths of cars at a busy intersection, coordinating fleets of delivery robots, or simulating pedestrian movement—models must reason about not just &lt;em>where&lt;/em> things are, but &lt;em>how they face or rotate&lt;/em> relative to each other. That&amp;rsquo;s the &lt;strong>SE(2)&lt;/strong> geometry: 2D position + heading.&lt;/p>
&lt;p>Traditional Transformer models that account for rotation and translation invariance (SE(2)-invariant) need to compute &lt;em>relative poses&lt;/em> between every pair of objects. If you have $n$ objects, this leads to memory cost growing like $O(n^2)$—which becomes prohibitively expensive when $n$ is large.&lt;/p></description></item><item><title>A Lightweight AI Engine for Skin Cancer Detection on Wearable Devices</title><link>https://mllog.dev/en/posts/ai-engine-skin-cancer/</link><pubDate>Thu, 24 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/ai-engine-skin-cancer/</guid><description>&lt;p>Skin cancer is one of the most common cancers globally – and early detection significantly improves the chances of successful treatment. Unfortunately, many people lack access to dermatologists or advanced diagnostic tools. This research addresses the problem by bringing AI-based diagnostics to low-cost wearable devices.&lt;/p>
&lt;h2 id="what-did-the-authors-do">What did the authors do?&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Used MobileNetV2:&lt;/strong>&lt;br>
A compact neural network architecture optimized for mobile environments. With &lt;em>transfer learning&lt;/em>, the model was fine-tuned to classify skin lesions as &lt;em>cancerous&lt;/em> or &lt;em>non-cancerous&lt;/em>.&lt;/p></description></item><item><title>SOPHIA: Enhancing Slow‑Thinking in Large Vision‑Language Models</title><link>https://mllog.dev/en/posts/sophia-enhancing-slow-thinking/</link><pubDate>Wed, 23 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/sophia-enhancing-slow-thinking/</guid><description>&lt;p>In recent years, Large Vision‑Language Models (LVLMs) have shown impressive abilities to understand and generate text about images—but they often struggle with long, multi‑step reasoning. The paper &lt;em>“SOPHIA: Semi‑Off‑Policy Reinforcement Learning for Slow‑Thinking in LVLMs”&lt;/em> presents a new approach that significantly improves their capacity for &lt;strong>slow‑thinking reasoning&lt;/strong>.&lt;/p>
&lt;h1 id="what-is-slowthinking">What Is Slow‑Thinking?&lt;/h1>
&lt;p>Slow‑thinking is a deliberate, step‑by‑step reasoning process where the model:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Breaks down&lt;/strong> complex problems into smaller steps,&lt;/li>
&lt;li>&lt;strong>Verifies&lt;/strong> intermediate conclusions,&lt;/li>
&lt;li>&lt;strong>Provides&lt;/strong> transparency into each decision.&lt;/li>
&lt;/ul>
&lt;p>This contrasts with fast, intuitive “snap” judgments and helps avoid &lt;strong>hallucinations&lt;/strong>—invented details not supported by the image.&lt;/p></description></item><item><title>The Role of AI in Managing Satellite Constellations</title><link>https://mllog.dev/en/posts/the-role-of-ai-in-managing-satellite-constellations/</link><pubDate>Tue, 22 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/the-role-of-ai-in-managing-satellite-constellations/</guid><description>&lt;p>Modern satellite mega-constellations—groups of hundreds or thousands of small satellites working together—are transforming how we connect the world. Yet, managing these networks presents unique challenges: constantly moving nodes, limited onboard computing power, and a need to minimize communication delays.&lt;/p>
&lt;p>The ConstellAI project, supported by the European Space Agency, explores how artificial intelligence (AI) can optimize two critical tasks:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Data Routing&lt;/strong>: Choosing the best path through the network to send data quickly and reliably.&lt;/li>
&lt;li>&lt;strong>Resource Allocation&lt;/strong>: Distributing limited resources (bandwidth, power, time slots) among satellites and ground stations.&lt;/li>
&lt;/ol>
&lt;h2 id="data-routing-with-reinforcement-learning">Data Routing with Reinforcement Learning&lt;/h2>
&lt;p>Traditional routing algorithms, like finding the shortest path on a map, don’t account for traffic jams (long queues) at network nodes. ConstellAI uses a technique called &lt;em>reinforcement learning&lt;/em> (RL). In RL, a software agent learns from experience: it tries different routes, observes delays, and gradually discovers which paths minimize overall transit time.&lt;/p></description></item><item><title>On the Fundamental Limitations of Dual Static CVaR Decompositions in Markov Decision Processes</title><link>https://mllog.dev/en/posts/fundamental-limitations-dual-cvar-decompositions/</link><pubDate>Mon, 21 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/fundamental-limitations-dual-cvar-decompositions/</guid><description>&lt;p>When making decisions—from financial investments to routing autonomous vehicles—we care not only about average outcomes but also about &lt;strong>risk&lt;/strong>. A widely used risk metric is the Conditional Value at Risk, or &lt;strong>CVaR&lt;/strong>, defined for confidence level $\alpha\in(0,1)$ by:
&lt;/p>
$$
\mathrm{CVaR}_\alpha(X) = \inf_{\xi \in \mathbb{R}} \left\{ \xi + \tfrac{1}{1-\alpha}\,\mathbb{E}\left[(X-\xi)_+\right] \right\}.
$$&lt;p>
In their recent paper, Godbout and Durand (2025) examine how to reliably compute this metric in Markov Decision Processes (MDPs). They reveal that the most common method, the &lt;strong>dual&lt;/strong> decomposition, suffers from inherent limitations.&lt;/p></description></item><item><title>PinFM: Foundation Model for User Activity Sequences at a Billion-Scale Visual Discovery Platform</title><link>https://mllog.dev/en/posts/pinfm-foundation-model/</link><pubDate>Sun, 20 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/pinfm-foundation-model/</guid><description>&lt;p>The paper &amp;ldquo;PinFM: Foundation Model for User Activity Sequences at a Billion-Scale Visual Discovery Platform&amp;rdquo; introduces a transformer with more than 20B parameters, pretrained on Pinterest user interaction sequences. Its goal is to build a universal sequence model applicable to various recommendation tasks, including content ranking, related Pins, and personalized feeds.&lt;/p>
&lt;h1 id="background-and-motivation">Background and Motivation&lt;/h1>
&lt;p>Traditional recommendation systems rely on specialized models for each task. The explosion of data volume and signal diversity calls for a generalized pretraining–finetuning paradigm. PinFM was developed to:&lt;/p></description></item><item><title>GradNetOT: Learning Optimal Transport Maps with GradNets</title><link>https://mllog.dev/en/posts/gradnetot-method/</link><pubDate>Sat, 19 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/gradnetot-method/</guid><description>&lt;p>Optimal Transport (OT) is the mathematical problem of moving “mass” from one distribution to another in the most efficient way. Think of reshaping a pile of sand into a new shape with minimal effort. GradNetOT is a novel machine‑learning method that learns exactly these efficient maps using neural networks equipped with a built‑in “bias” toward physically correct solutions.&lt;/p>
&lt;h1 id="what-is-optimal-transport">What Is Optimal Transport?&lt;/h1>
&lt;ul>
&lt;li>&lt;strong>Classic formulation&lt;/strong>: Given two probability distributions (e.g., piles of sand and holes to fill), find a mapping that moves mass at minimal total cost.&lt;/li>
&lt;li>&lt;strong>Brenier’s theorem&lt;/strong>: For the squared-distance cost, the optimal map is the gradient of a convex function, and that function satisfies a Monge–Ampère equation.&lt;/li>
&lt;/ul>
&lt;h1 id="the-gradnetot-approach">The GradNetOT Approach&lt;/h1>
&lt;p>GradNetOT leverages a special neural network architecture called a &lt;strong>Monotone Gradient Network&lt;/strong> (mGradNet) to represent convex functions implicitly. Because the architecture enforces convexity and monotonicity by construction, the gradient it outputs automatically yields a valid OT map.&lt;/p></description></item><item><title>Unstable Power: How Sharpness Drives Deep Network Learning</title><link>https://mllog.dev/en/posts/learning-on-the-border-ntk/</link><pubDate>Fri, 18 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/learning-on-the-border-ntk/</guid><description>&lt;p>The paper &lt;strong>“Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability”&lt;/strong> by Kaiqi Jiang, Jeremy Cohen, and Yuanzhi Li explores how the &lt;strong>Neural Tangent Kernel (NTK)&lt;/strong> evolves during deep network training, especially under the &lt;strong>Edge of Stability (EoS)&lt;/strong> regime.&lt;/p>
&lt;h3 id="what-is-the-ntk">What is the NTK?&lt;/h3>
&lt;ul>
&lt;li>The &lt;strong>Neural Tangent Kernel (NTK)&lt;/strong> is a kernel matrix that captures how a small weight update driven by one training example changes the network’s output on every other example.&lt;/li>
&lt;li>It lets us analyze neural networks with tools from kernel methods, offering theoretical insights into learning dynamics.&lt;/li>
&lt;/ul>
&lt;h3 id="what-is-the-edge-of-stability">What is the Edge of Stability?&lt;/h3>
&lt;ul>
&lt;li>When training with a large learning rate $\eta$, the largest eigenvalue of the NTK (or the loss Hessian) rises to the stability threshold $2/\eta$ and then oscillates around it.&lt;/li>
&lt;li>This phenomenon, called &lt;strong>Edge of Stability&lt;/strong>, combines elements of instability with phases of rapid learning.&lt;/li>
&lt;/ul>
&lt;h3 id="key-findings">Key Findings&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Alignment Shift&lt;/strong>&lt;br>
Higher $\eta$ leads to stronger final &lt;strong>Kernel Target Alignment (KTA)&lt;/strong> between the NTK and the label vector $y$.&lt;/p></description></item><item><title>RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization</title><link>https://mllog.dev/en/posts/riemannlora-unified-riemannian-framework/</link><pubDate>Thu, 17 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/riemannlora-unified-riemannian-framework/</guid><description>&lt;p>In recent years, &lt;strong>Low‑Rank Adaptation&lt;/strong> (LoRA) has become a cornerstone technique for parameter‑efficient fine‑tuning of large language models (LLMs) and diffusion models. By injecting low‑rank matrices into pre-trained weights, LoRA drastically reduces memory and compute requirements, enabling rapid experimentation and deployment. However, practitioners face two persistent challenges:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Initialization ambiguity&lt;/strong>: Different low‑rank factor pairs $A, B$ can represent the same adapted weight update $AB^\top$, leading to unstable or suboptimal starts.&lt;/li>
&lt;li>&lt;strong>Redundant parameterization&lt;/strong>: Without a canonical representation, gradient updates can wander through equivalent parameter configurations.&lt;/li>
&lt;/ol>
&lt;p>The &lt;strong>RiemannLoRA&lt;/strong> framework, introduced by Bogachev &lt;em>et al.&lt;/em>, offers a unifying geometric viewpoint that removes these ambiguities and yields faster, more stable fine‑tuning.&lt;/p></description></item><item><title>A Neural Network Model of Complementary Learning Systems: Pattern Separation and Completion for Continual Learning</title><link>https://mllog.dev/en/posts/nn-model-of-complementary-learning-systems/</link><pubDate>Wed, 16 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/nn-model-of-complementary-learning-systems/</guid><description>&lt;p>Standard neural networks often suffer from &lt;em>catastrophic forgetting&lt;/em>, where learning new tasks degrades performance on previously learned tasks. In contrast, the human brain integrates new and old memories through two complementary memory systems: the hippocampus and neocortex.&lt;/p>
&lt;h2 id="1-objectives">1. Objectives&lt;/h2>
&lt;p>The authors aim to build a model that captures:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Pattern separation&lt;/strong>: distinct encoding of similar experiences,&lt;/li>
&lt;li>&lt;strong>Pattern completion&lt;/strong>: reconstructing full representations from partial inputs,&lt;/li>
&lt;/ul>
&lt;p>to support continual learning without loss of previously acquired skills.&lt;/p></description></item><item><title>Target Polish: How to Polish Data and Reveal Its True Structure</title><link>https://mllog.dev/en/posts/target-polish/</link><pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/target-polish/</guid><description>&lt;p>Imagine you&amp;rsquo;re analyzing sensor data. Suddenly one sensor shows -999°C. That&amp;rsquo;s an &lt;em>outlier&lt;/em> — a single data point that can completely ruin your analysis.&lt;/p>
&lt;h2 id="-what-is-factorization">🧩 What is factorization?&lt;/h2>
&lt;p>Matrix factorization means decomposing data $X$ into two non-negative components:
&lt;/p>
$$
X \approx WH
$$&lt;p>Where $W$ contains “features” and $H$ shows how much of each is needed.&lt;/p>
&lt;h2 id="-the-problem">💡 The problem&lt;/h2>
&lt;p>Classical methods like NMF are sensitive to noise and outliers. When data is messy, analysis breaks down.&lt;/p></description></item><item><title>Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning</title><link>https://mllog.dev/en/posts/optimistic-exploration-risk-averse-crl/</link><pubDate>Mon, 14 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/optimistic-exploration-risk-averse-crl/</guid><description>&lt;p>Reinforcement Learning (RL) has revolutionized how agents learn to act in complex environments. But what happens when an agent can’t afford to make mistakes—because a mistake means a car crash, system failure, or energy limit violation?&lt;/p>
&lt;p>In such cases, we turn to &lt;strong>Constrained Reinforcement Learning (CRL)&lt;/strong>, where agents aim to &lt;strong>maximize reward while staying within safety or cost constraints&lt;/strong>. Unfortunately, current CRL methods often become&amp;hellip; too cautious, leading to poor performance.&lt;/p></description></item><item><title>Not Just Bigger Models: Why AI Should See Better Instead of Just Scaling</title><link>https://mllog.dev/en/posts/adaptive-sensing/</link><pubDate>Sun, 13 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/adaptive-sensing/</guid><description>&lt;p>In recent years, AI progress has been largely defined by size: bigger models, bigger datasets, bigger compute budgets. GPT-4, Claude, Gemini – each new model pushes the limits further. But is bigger always better?&lt;/p>
&lt;p>A group of researchers (Baek, Park, Ko, Oh, Gong, Kim) argue in their recent paper &lt;strong>&amp;quot;AI Should Sense Better, Not Just Scale Bigger&amp;quot;&lt;/strong> (arXiv:2507.07820) that we’ve hit diminishing returns. Instead of growing endlessly, they propose a new focus: &lt;strong>adaptive sensing&lt;/strong>.&lt;/p></description></item><item><title>HGMP: Revolutionizing Complex Graph Analysis with Prompt Learning</title><link>https://mllog.dev/en/posts/hgmp-revolution-graph-prompt-learning/</link><pubDate>Sat, 12 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/hgmp-revolution-graph-prompt-learning/</guid><description>&lt;p>In the era dominated by language models and machine learning, the importance of structured data is growing rapidly: social networks, biological relationships, and business connections. This data is represented in the form of graphs, which are often not homogeneous: they contain nodes of different types (e.g., people, products, companies) and different types of edges (e.g., “purchased”, “recommended”, “works at”). Processing such &lt;strong>heterogeneous graphs&lt;/strong> requires specialized methods.&lt;/p>
&lt;hr>
&lt;h3 id="what-are-heterogeneous-graphs">What are heterogeneous graphs?&lt;/h3>
&lt;p>A &lt;strong>heterogeneous graph&lt;/strong> is a structure in which:&lt;/p></description></item><item><title>Predicting and Generating Antibiotics Against Future Pathogens with ApexOracle</title><link>https://mllog.dev/en/posts/apexoracle-forecast-antibiotics/</link><pubDate>Fri, 11 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/apexoracle-forecast-antibiotics/</guid><description>&lt;p>The accelerating crisis of antimicrobial resistance (AMR) demands new computational methods to stay ahead of evolving pathogens. ApexOracle is a unified ML platform designed to both predict the activity of candidate compounds against specific bacterial strains and generate novel molecules de novo, proactively targeting future superbugs.&lt;/p>
&lt;h2 id="motivation-and-scope">Motivation and Scope&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Global Impact&lt;/strong>: AMR contributes to nearly 5 million deaths annually.&lt;/li>
&lt;li>&lt;strong>Traditional Challenges&lt;/strong>: Standard drug discovery pipelines are slow, resource-intensive, and reactive.&lt;/li>
&lt;li>&lt;strong>ApexOracle Goal&lt;/strong>: Integrate genomic context and molecular design into one end-to-end framework.&lt;/li>
&lt;/ul>
&lt;h2 id="apexoracle-architecture">ApexOracle Architecture&lt;/h2>
&lt;p>&lt;em>Layman’s Explanation: Imagine you have three sets of clues: the code of the bacteria (its genome), a simple description of its behaviors (like a basic fact sheet), and the building blocks of a potential drug (a molecular recipe). ApexOracle acts like a super-smart detective that reads all three clues at once. It combines them, figures out which molecules might work best, and even drafts entirely new molecular recipes that could stop the bacteria in its tracks.&lt;/em>&lt;/p></description></item><item><title>HeLo – A New Path for Multimodal Emotion Recognition</title><link>https://mllog.dev/en/posts/helo-multimodal-emotion-recognition/</link><pubDate>Thu, 10 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/helo-multimodal-emotion-recognition/</guid><description>&lt;p>Modern emotion-recognition systems increasingly leverage data from multiple sources—ranging from physiological signals (e.g., heart rate, skin conductance) to facial video. The goal is to capture the richness of human feelings, where multiple emotions often co-occur. Traditional approaches, however, focused on &lt;strong>single-label&lt;/strong> classification (e.g., “happy” or “sad”).&lt;/p>
&lt;p>The paper &lt;strong>“HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning”&lt;/strong> introduces an entirely new paradigm: &lt;strong>emotion distribution learning&lt;/strong>, where the model predicts the probability of each basic emotion being present.&lt;/p></description></item><item><title>Modern Methods in Associative Memory</title><link>https://mllog.dev/en/posts/modern-methods-in-associative-memory/</link><pubDate>Wed, 09 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/modern-methods-in-associative-memory/</guid><description>&lt;p>Associative memory is the ability to store patterns and retrieve them when presented with partial or noisy inputs. Inspired by how the human brain recalls memories, associative memory models are recurrent neural networks that converge to stored patterns over time. The tutorial &amp;lsquo;Modern Methods in Associative Memory&amp;rsquo; by Krotov et al. offers an accessible overview for newcomers and a rigorous mathematical treatment for experts, bridging classical ideas with cutting-edge developments in deep learning.&lt;/p></description></item><item><title>QuEst: Blending Data and Predictions for Robust Quantile Estimation</title><link>https://mllog.dev/en/posts/quest-robust-quantile-estimation/</link><pubDate>Tue, 08 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/quest-robust-quantile-estimation/</guid><description>&lt;p>&lt;img alt="Quest paper" loading="lazy" src="https://mllog.dev/images/quest.png">&lt;/p>
&lt;hr>
&lt;p>Imagine you track your morning commute times by recording 50 real-world trips with your GPS-enabled phone. You also run a traffic simulator to generate 5,000 possible commute scenarios. You want a reliable estimate of the 95th percentile of commute time—the duration you won’t exceed 95% of the days. Using only your 50 recorded trips yields a wide confidence interval. Using only the simulator risks systematic biases: it might ignore sudden road closures or special events.&lt;/p></description></item><item><title>RetrySQL: Self-Correcting Query Generation</title><link>https://mllog.dev/en/posts/retrysql-self-correction-in-sql-generation/</link><pubDate>Mon, 07 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/retrysql-self-correction-in-sql-generation/</guid><description>&lt;p>&lt;img alt="RetrySql paper" loading="lazy" src="https://mllog.dev/images/text_sql.png">&lt;/p>
&lt;hr>
&lt;p>The &lt;strong>text-to-SQL&lt;/strong> task involves converting natural language questions into executable SQL queries on a relational database. While modern large language models (LLMs) excel at many generative tasks, generating correct and complex SQL queries remains challenging. In the paper &lt;em>RetrySQL: text-to-SQL training with retry data for self-correcting query generation&lt;/em>, the authors introduce a training paradigm that teaches the model to self-monitor and correct its reasoning steps during generation, rather than relying solely on post-processing modules.&lt;/p></description></item><item><title>How Modern Information Theory Helps Diagnose Mental Disorders – MvHo‑IB in Action</title><link>https://mllog.dev/en/posts/mvho-ib-brain-disorder-diagnosis/</link><pubDate>Sun, 06 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/mvho-ib-brain-disorder-diagnosis/</guid><description>&lt;p>Diagnosing mental disorders such as autism, depression, or schizophrenia goes beyond taking simple brain images. &lt;strong>Resting-state fMRI&lt;/strong> (rs-fMRI) observes brain activity while at rest, revealing which regions activate simultaneously. This forms the basis for &lt;strong>functional connectivity&lt;/strong>.&lt;/p>
&lt;p>Traditional studies have used &lt;strong>graphs and neural networks&lt;/strong>, but they mostly focus on &lt;strong>pairwise interactions&lt;/strong> — asking “do regions A and B co-activate?” But what about &lt;strong>higher-order relationships&lt;/strong>, like among regions A, B, and C all at once?&lt;/p></description></item><item><title>Multi-level Stepwise Hints in Reinforcement Learning</title><link>https://mllog.dev/en/posts/multi-level-step-by-stepguidelines/</link><pubDate>Sat, 05 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/multi-level-step-by-stepguidelines/</guid><description>&lt;p>Reinforcement Learning (RL) enables agents to learn behaviors through reward signals. However, in tasks requiring long chains of reasoning, two main challenges arise:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>The near-miss problem&lt;/strong> – a single mistake at the end can invalidate the entire reasoning chain.&lt;/li>
&lt;li>&lt;strong>Exploration stagnation&lt;/strong> – the agent repeatedly follows known paths without discovering new strategies.&lt;/li>
&lt;/ol>
&lt;p>The paper &lt;strong>StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason&lt;/strong> introduces &lt;em>StepHint&lt;/em>, a method that provides agents with multi-level hints to support both beginners and advanced users.&lt;/p></description></item><item><title>How to Predict Scooter Demand? XGBoost and Urban Micromobility</title><link>https://mllog.dev/en/posts/blog_xgboost/</link><pubDate>Fri, 04 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/blog_xgboost/</guid><description>&lt;h3 id="can-we-predict-when-and-where-people-will-rent-electric-scooters-yes--and-with-impressive-accuracy-a-recent-publication-shows-how-advanced-algorithms-like-xgboost-can-revolutionize-the-management-of-micromobility-in-cities">Can we predict when and where people will rent electric scooters? Yes — and with impressive accuracy. A recent publication shows how advanced algorithms like &lt;strong>XGBoost&lt;/strong> can revolutionize the management of micromobility in cities.&lt;/h3>
&lt;h2 id="-context-micromobility-and-demand">🌍 Context: Micromobility and Demand&lt;/h2>
&lt;p>In many cities, dockless electric scooters have become a daily transport option. But for operators, a crucial question remains:&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Where and when will people want to rent a scooter?&lt;/strong>&lt;/p>&lt;/blockquote>
&lt;p>Too many vehicles in one location is wasteful. Too few — lost revenue and frustrated users. That&amp;rsquo;s why accurately predicting demand is so important.&lt;/p></description></item><item><title>Ghost Nodes: A Trick That Makes Neural Networks Learn Smarter</title><link>https://mllog.dev/en/posts/xgboost-how-to-predict-demand/</link><pubDate>Thu, 03 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/xgboost-how-to-predict-demand/</guid><description>&lt;p>When we train deep neural networks, they often get stuck — not in a bad result, but in a &amp;ldquo;flat region&amp;rdquo; of the loss landscape. The authors of this paper introduce &lt;strong>ghost nodes&lt;/strong>: extra, fake output nodes that aren&amp;rsquo;t real classes, but help the model explore better paths during training.&lt;/p>
&lt;p>Imagine you&amp;rsquo;re rolling a ball into a valley. Sometimes the valley floor is flat and the ball slows down. Ghost nodes are like adding new dimensions to the terrain — giving the ball more freedom to move and find a better path.&lt;/p></description></item><item><title>Does artificial intelligence really understand math? Let's find out what it says... data audit?</title><link>https://mllog.dev/en/posts/math_datasheets/</link><pubDate>Tue, 01 Jul 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/math_datasheets/</guid><description>&lt;p>Large‑scale epidemic modeling is a key tool for public health—but it often requires &lt;strong>sensitive data&lt;/strong> (e.g., hospital admissions, financial records, mobility).&lt;br>
A recent paper, &lt;em>“A Framework for Multi‑source Privacy Preserving Epidemic Analysis”&lt;/em> (June 27, 2025), introduces a hybrid neural‑mechanistic model that respects &lt;strong>Differential Privacy (DP)&lt;/strong>. This means we can use private data &lt;strong>without compromising individuals’ privacy&lt;/strong>.&lt;/p>
&lt;hr>
&lt;h2 id="-why-it-matters">🌍 Why It Matters&lt;/h2>
&lt;ul>
&lt;li>🚑 Accurate predictions help allocate resources (like vaccines, ICU beds).&lt;/li>
&lt;li>🕵️‍♂️ But using private data poses a &lt;strong>privacy risk&lt;/strong>.&lt;/li>
&lt;li>🔐 &lt;strong>Differential Privacy (DP)&lt;/strong> adds controlled randomness—protecting individuals at a formal, mathematical level.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="-inside-the-framework-neural--mechanistic">🧠 Inside the Framework: Neural + Mechanistic&lt;/h2>
&lt;p>The model is a &lt;em>hybrid system&lt;/em> combining:&lt;/p></description></item><item><title>Unbreakable in the Face of Adversity: ARMOR – Resilient UAV Control</title><link>https://mllog.dev/en/posts/blog_armor/</link><pubDate>Mon, 30 Jun 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/blog_armor/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Unmanned Aerial Vehicles (UAVs) play pivotal roles today in photography, deliveries, rescue missions, border surveillance, and military operations. However, the growing availability of signal disruption tools (GPS spoofing, gyroscope jamming, magnetometer manipulation) poses significant threats to autonomous systems. Even a slight navigational drift can turn a mission into a disaster.&lt;/p>
&lt;h2 id="why-physical-attack-robustness-matters">Why Physical-Attack Robustness Matters&lt;/h2>
&lt;p>Traditional safe RL methods and adversarial training rely on known attack scenarios. In practice, it’s impossible to anticipate every manipulation: an adversary could employ novel jamming or optical disruption techniques. Iterative adversarial training is also computationally expensive and often generalizes poorly to unseen scenarios.&lt;/p></description></item><item><title>Mind2Web 2: A new era of “agent-based” web search</title><link>https://mllog.dev/en/posts/mind2web-agent-based-web-search/</link><pubDate>Sun, 29 Jun 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/mind2web-agent-based-web-search/</guid><description>&lt;h3 id="-mind2web-2-evaluating-agentic-search-with-agent-as-a-judge">🧠 Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge&lt;/h3>
&lt;p>Agentic Search is one of the most promising applications of modern AI. Imagine a virtual assistant that doesn&amp;rsquo;t just look up information for you but can autonomously search the web, navigate pages, find facts, and return well-structured answers &lt;strong>with citations&lt;/strong>. That’s the idea behind tools like OpenAI&amp;rsquo;s Deep Research.&lt;/p>
&lt;p>However, how do we &lt;strong>evaluate&lt;/strong> if such an AI is doing a good job?&lt;/p></description></item><item><title>A Machine That Discovers the Laws of Physics: How H-FEX Works and Why It Matters</title><link>https://mllog.dev/en/posts/hfex-machine-that-discovers-the-laws-of-physics/</link><pubDate>Sat, 28 Jun 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/hfex-machine-that-discovers-the-laws-of-physics/</guid><description>&lt;p>&lt;img alt="Bandit Learning" loading="lazy" src="https://mllog.dev/images/hfex_article.png">&lt;/p>
&lt;hr>
&lt;p>Can a machine discover the laws of physics by itself—like Newton, but without the apple and without writing the equation by hand?&lt;/p>
&lt;p>In June 2025, a new method called &lt;strong>H-FEX&lt;/strong> (&lt;em>Hamiltonian Finite Expression&lt;/em>) was published. It doesn’t just predict system behavior—it &lt;strong>writes down the math&lt;/strong> behind it. And crucially, in a form humans can understand.&lt;/p>
&lt;p>It’s a form of &lt;strong>symbolic learning&lt;/strong>, an increasingly popular alternative to black-box neural networks, which work but don’t tell us why.&lt;/p></description></item><item><title>When the Bandit Is Stronger Than Your Model – On the Limits of Exploratory Learning</title><link>https://mllog.dev/en/posts/bandit_learning_artykul/</link><pubDate>Fri, 27 Jun 2025 00:00:00 +0000</pubDate><guid>https://mllog.dev/en/posts/bandit_learning_artykul/</guid><description>&lt;p>&lt;img alt="Bandit Learning" loading="lazy" src="https://mllog.dev/images/bandit.png">&lt;/p>
&lt;hr>
&lt;p>Imagine having to choose the best ad variant, but each time you only learn how many users clicked on the one you showed. This is the essence of bandit learning: it balances exploration (trying out new options) with exploitation (using the current best) to discover the winner as quickly as possible. In a world where every experiment has a cost—from ad budgets to a patient’s time in experimental therapy—bandit algorithms can significantly accelerate optimal decision-making. Yet, despite their practical power, these solutions are surprisingly hard to analyze theoretically!&lt;/p></description></item></channel></rss>