Karpathy’s Autoresearch Runs 700 AI Experiments Autonomously

By alex2404

Autonomous AI agents capable of running experiments without human supervision have been an aspiration across multiple research disciplines. Andrej Karpathy's weekend release of a 630-line open source script called autoresearch moves that aspiration into demonstrable practice.

According to the announcement, Karpathy — the former Tesla AI lead and OpenAI co-founder — published the project on GitHub under an MIT License with a plainly stated objective: “The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement.” The script is not a finished product. It is a loop — an AI agent reads its own training code, forms a hypothesis, modifies the code, runs the experiment within a fixed compute budget (typically five minutes on a GPU), and evaluates whether validation loss improved. A better result is kept; a worse one is discarded and the process restarts.
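The keep-or-discard loop described above can be sketched in a few lines. This is a hedged illustration, not Karpathy's code: `run_experiment` and `mutate` are hypothetical stand-ins for what the real script does with an LLM agent (editing its own training script and launching a time-boxed GPU run), and the function name is invented for this example.

```python
def autoresearch_loop(initial_code, run_experiment, mutate, budget_runs=126):
    """Greedy keep-or-discard loop: propose a change, run a time-boxed
    experiment, keep the change only if validation loss improves.

    `run_experiment` and `mutate` are hypothetical stand-ins; in the real
    script an agent reads and edits training code, and each run is capped
    by a fixed compute budget (e.g. ~5 minutes on a GPU).
    """
    best_code = initial_code
    best_loss = run_experiment(best_code)  # baseline validation loss
    for _ in range(budget_runs):
        candidate = mutate(best_code)       # agent proposes a code change
        loss = run_experiment(candidate)    # time-boxed training run
        if loss < best_loss:                # keep improvements only
            best_code, best_loss = candidate, loss
        # a worse result is simply discarded; the loop restarts from best
    return best_code, best_loss
```

The design is a greedy hill climb over code: no experiment can make the retained result worse, which is why an unattended overnight run can only ratchet validation loss downward.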

The early numbers are concrete. In one overnight run, the agent completed 126 experiments, reducing validation loss — measured in bits per byte — from 0.9979 to 0.9697. Over two days of tuning a depth-12 model, the agent processed approximately 700 autonomous changes, identified roughly 20 additive improvements that transferred to larger models, and dropped the "Time to GPT-2" leaderboard metric from 2.02 hours to 1.80 hours — an 11% efficiency gain on a project Karpathy considered already well-optimized. "Seeing the agent do this entire workflow end-to-end and all by itself… is wild," he wrote, noting that the agent caught oversights in attention scaling and regularization that he had missed in two decades of manual work.

Distributed Networks and Emergent Strategy

The broader machine learning community moved quickly. Karpathy’s original post drew more than 8.6 million views within two days. Varun Mathur, CEO of Hyperspace AI, distributed the single-agent loop across a peer-to-peer network. On the night of March 8–9, 35 autonomous agents on the Hyperspace network ran 333 experiments without human oversight.

The results produced a notable pattern of emergent behavior. Hardware constraints shaped research strategy: agents running on H100 GPUs gravitated toward aggressive learning rates, while CPU-only agents on laptops — lacking raw throughput — focused on initialization strategies such as Kaiming and Xavier initialization, as well as normalization choices. When one agent found that Kaiming initialization reduced loss by 21%, the discovery propagated across the network via the GossipSub protocol. Within hours, 23 other agents had incorporated it. The report states that in 17 hours, these agents independently rediscovered ML techniques — including RMSNorm and tied embeddings — that took human researchers at labs including Google Brain and OpenAI nearly eight years to formalize.
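The adopt-if-better propagation described above can be illustrated with a toy simulation. The report does not include Hyperspace's code, and this sketch flood-broadcasts to every peer rather than relaying through a GossipSub mesh; the class and function names are hypothetical.

```python
class Agent:
    """Toy network agent tracking its best-known (technique, loss) pair."""

    def __init__(self, name):
        self.name = name
        self.best = ("baseline", 1.0)  # (technique, validation loss)

    def adopt_if_better(self, technique, loss):
        # Adopt a gossiped discovery only if it beats the local best.
        if loss < self.best[1]:
            self.best = (technique, loss)
            return True
        return False


def gossip(agents, sender, technique, loss):
    """Broadcast one discovery to all peers (real GossipSub relays via a
    peer mesh rather than flooding); returns how many peers adopted it."""
    return sum(
        agent.adopt_if_better(technique, loss)
        for agent in agents
        if agent is not sender
    )
```

In this simplified model, one agent's 21% loss reduction from Kaiming initialization would be adopted by every peer whose local best is worse, which mirrors the reported pattern of 23 agents incorporating the discovery within hours.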

Beyond Machine Learning

The implications the community drew extended well outside model training. Eric Siu, founder of ad agency Single Grain, applied the autoresearch framework to marketing experimentation. “Most marketing teams run ~30 experiments a year,” Siu wrote. “The next generation will run 36,500+. Easily.” The arithmetic is straightforward: an autonomous loop that runs overnight converts calendar time into experimental throughput, with no change in headcount required.

Karpathy’s script does not claim to solve research problems autonomously — it automates the iteration cycle around a fixed objective function. What the early deployments suggest is that the same loop structure is portable across any domain where experiments can be defined, run, and evaluated against a measurable outcome.


This article is a curated summary based on third-party sources.
