Agents

You can’t fine-tune GPT-5.5. You can’t fine-tune Claude. You can’t fine-tune most of the models you actually deploy in production. Yet we still expect these frozen models to handle spreadsheet automation, mathematical olympiads, and multi-step search tasks, all from a hand-written system prompt.

The paper “SkillOpt: Executive Strategy for Self-Evolving Agent Skills” (arXiv 2605.23904, May 2026) asks: what if the system prompt itself were the trainable parameter? What if we applied the full discipline of deep learning (learning rates, validation splits, negative feedback) to a natural-language document instead of model weights?

The result: SkillOpt wins or ties on all 52 evaluated (model, benchmark, harness) cells, with gains of up to +39 absolute points on procedural benchmarks. It also produces compact skill files of just 300-2,000 tokens that transfer across models, harnesses, and benchmarks. ...