Andrej Karpathy's new open source 'autoresearch' lets you run hundreds of AI experiments a night — with revolutionary implications
Source: VentureBeat
Over the weekend, Andrej Karpathy, the influential former Tesla AI director and OpenAI founding member who coined the term "vibe coding", posted on X about his new open-source project, autoresearch.
It isn't a finished model or a massive corporate product: by his own admission, it is a simple 630-line script, available on GitHub under the permissive, enterprise-friendly MIT License. But the ambition is massive: automating the scientific method with AI agents while we humans sleep.
“The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement,” he stated on X.
How the system works
- Autonomous optimization loop – An AI agent receives a training script and a fixed compute budget (typically 5 minutes on a GPU).
- The agent reads its own source code, forms a hypothesis for improvement (e.g., changing a learning rate or architecture depth), modifies the code, runs the experiment, and evaluates the results.
- If the validation loss—measured in bits per byte (val_bpb)—improves, the change is kept; otherwise it is reverted and the agent tries again.
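The keep-or-revert loop above can be sketched in a few lines of Python. To be clear, `run_experiment` and `propose_change` below are hypothetical stand-ins (a toy scoring function and a fixed perturbation schedule), not Karpathy's actual script, where the agent rewrites its own source and launches real 5-minute GPU training runs:

```python
def run_experiment(config):
    """Hypothetical stand-in for a 5-minute GPU training run.

    Returns validation loss in bits per byte (val_bpb); lower is better.
    This toy function has its minimum at lr=3e-4, depth=12.
    """
    return 0.9697 + 1000 * abs(config["lr"] - 3e-4) + 0.002 * abs(config["depth"] - 12)


def propose_change(config, trial):
    """Form a hypothesis: perturb one hyperparameter per experiment."""
    candidate = dict(config)
    if trial % 2 == 0:
        candidate["lr"] = config["lr"] * (1.1 if trial % 4 == 0 else 0.9)
    else:
        candidate["depth"] = config["depth"] + (1 if trial % 4 == 1 else -1)
    return candidate


config = {"lr": 1e-3, "depth": 8}
best = run_experiment(config)

for trial in range(126):              # one overnight run's worth of experiments
    candidate = propose_change(config, trial)
    score = run_experiment(candidate)
    if score < best:                  # improvement: keep the change
        config, best = candidate, score
    # otherwise: revert (i.e., simply keep the old config)

print(f"final val_bpb: {best:.4f}")   # converges near the toy optimum
```

The essential property is that every change is gated by a single scalar metric, so the agent never needs a human to judge whether a hypothesis worked.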
In one overnight run, Karpathy’s agent completed 126 experiments, driving loss down from 0.9979 → 0.9697.
“Seeing the agent do this entire workflow end‑to‑end and all by itself… is wild,” Karpathy remarked, noting that the agent caught oversights in attention scaling and regularization that he had missed manually over two decades of work.
Recent results
- After leaving the agent to tune a `depth=12` model for two days, it performed ≈ 700 autonomous changes.
- The agent discovered ≈ 20 additive improvements that transferred perfectly to larger models.
- Stacking these changes dropped the “Time to GPT‑2” metric on the leaderboard from 2.02 h → 1.80 h (an 11 % efficiency gain) on a project Karpathy believed was already well‑tuned.
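As a quick sanity check on the reported figure, the relative gain works out as stated:

```python
# Reported: "Time to GPT-2" fell from 2.02 h to 1.80 h on the leaderboard.
before_h, after_h = 2.02, 1.80
gain = (before_h - after_h) / before_h
print(f"{gain:.1%}")  # 10.9%, which the article rounds to ~11%
```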
This is more than a productivity hack; it is a fundamental shift in how intelligence is refined. By automating the scientific method for code, Karpathy has turned machine learning into an evolutionary process that runs at the speed of silicon rather than the speed of human thought.
The broader AI/ML community on X quickly recognized that this process could be applied far beyond computer science—to marketing, health, and basically anything that requires research.
Autoresearch spreads far and wide
The reaction was swift and viral, with Karpathy’s post garnering > 8.6 M views in two days as builders and researchers scrambled to scale the “Karpathy loop”.
Hyperspace AI
- Varun Mathur, CEO of AI-tool aggregator Hyperspace AI, distributed the single-agent loop across a peer-to-peer network. Every node running the Hyperspace agent became an autonomous researcher.
- On the night of March 8–9, 35 autonomous agents on the Hyperspace network ran 333 experiments completely unsupervised. The results were a masterclass in emergent strategy:
| Observation | Details |
|---|---|
| Hardware Diversity as a Feature | H100 GPUs used “brute force” to find aggressive learning rates, while CPU‑only agents on laptops were forced to be clever, focusing on initialization strategies (Kaiming, Xavier) and normalization choices. |
| Gossip‑Based Discovery | Using the GossipSub protocol, agents shared wins in real‑time. When one agent found that Kaiming initialization dropped loss by 21 %, the idea spread like a digital virus; within hours, 23 other agents incorporated the discovery. |
| Compression of History | In just 17 hours, agents independently rediscovered ML milestones—RMSNorm, tied embeddings, etc.—that took human labs (Google Brain, OpenAI) nearly eight years to formalize. |
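The gossip-based discovery pattern can be sketched as a toy simulation. Everything here is a hypothetical stand-in: the `Agent` class, the effect sizes, and a direct broadcast in place of the real GossipSub mesh (a libp2p pub/sub protocol), which Hyperspace's actual implementation would use:

```python
class Agent:
    """Toy researcher node; a shared broadcast stands in for GossipSub."""

    def __init__(self, name, init_scheme="default"):
        self.name = name
        self.init_scheme = init_scheme
        self.loss = 1.0

    def run_trial(self):
        # Hypothetical effect sizes; 0.79 mirrors the ~21% loss drop
        # attributed to Kaiming initialization in the anecdote.
        effects = {"default": 1.0, "xavier": 0.90, "kaiming": 0.79}
        self.loss = effects[self.init_scheme]

    def hear_gossip(self, message):
        # Adopt a peer's discovery only if it beats the local result.
        if message["loss"] < self.loss:
            self.init_scheme = message["scheme"]
            self.loss = message["loss"]


agents = [Agent(f"node-{i}") for i in range(24)]
agents[0].init_scheme = "kaiming"   # one agent stumbles on the win
for a in agents:
    a.run_trial()

# The winner publishes its result; every peer evaluates and incorporates it.
best = min(agents, key=lambda a: a.loss)
for a in agents:
    a.hear_gossip({"scheme": best.init_scheme, "loss": best.loss})

adopters = sum(a.init_scheme == "kaiming" for a in agents)
print(adopters)  # 24: the discovery spread to the whole toy network
```

The adoption rule (switch only when the gossiped loss beats your own) is what lets one agent's find propagate to the 23 others without any central coordinator.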
Run 36,500 marketing experiments each year instead of 30
While ML purists focused on loss curves, the business world saw a different revolution.
Eric Siu, founder of ad agency Single Grain, applied autoresearch to the “Experiment Loop” of marketing:
“Most marketing teams run ~30 experiments a year. The next generation will run 36,500+. Easily.”
“They’ll run experiments while they sleep.”
Siu’s framework
- Replace the training script with a marketing asset (landing page, ad creative, cold email).
- The agent modifies a variable (subject line, CTA), deploys it, measures the positive reply rate, and keeps or discards the change.
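Siu's framework maps directly onto Karpathy's loop. The sketch below assumes that mapping; `measure_reply_rate`, the variant lists, and the scoring are invented placeholders, not any real deployment or measurement pipeline:

```python
import random

random.seed(0)


def measure_reply_rate(asset):
    """Hypothetical stand-in for deploying a cold email and measuring
    the positive reply rate; a toy scoring function, not real data."""
    rate = 0.020
    if asset["subject"] == "quick question":
        rate += 0.010
    if asset["cta"] == "15-min call?":
        rate += 0.015
    return rate


asset = {"subject": "our product", "cta": "book a demo"}
best = measure_reply_rate(asset)

subjects = ["quick question", "our product", "re: your growth"]
ctas = ["book a demo", "15-min call?", "reply YES"]

history = []                      # the "proprietary map" of what resonated
for _ in range(50):               # 50 experiments; scale toward 36,500/year
    candidate = dict(asset)
    # Modify one variable per experiment, as in the framework above.
    if random.random() < 0.5:
        candidate["subject"] = random.choice(subjects)
    else:
        candidate["cta"] = random.choice(ctas)
    rate = measure_reply_rate(candidate)
    history.append((candidate, rate))
    if rate > best:               # keep the winning variant
        asset, best = candidate, rate

print(asset, f"{best:.3f}")
```

Note that `history` is retained even for losing variants: the accumulated record of what failed, not just what won, is the moat Siu describes.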
Siu argues this creates a “proprietary map” of what resonates with a specific audience—a moat built not of code, but of experiment history.
“The companies that win won’t have better marketers; they’ll have faster experiment loops.” – Siu
Community discussion and “spoiling” the validation set
Despite the fervor, the GitHub Discussions revealed a community grappling with the implications of such rapid, automated progress.
| Concern | Comment |
|---|---|
| The Over‑Optimization Trap | Researcher alexisthual: “Aren’t you concerned that launching that many experiments will eventually ‘spoil’ the validation set?” The fear is that agents may over‑fit to quirks of the test data rather than achieve general intelligence. |
| The Meaning of the Gains | User samionb: "Is a drop from 0.9979 → 0.9697 truly noticeable?" Karpathy's reply: "All we're doing is optimizing performance per compute… these are real and substantial gains." |
| The Human Element | On X, user witcheer, Head of Growth at crypto platform Yari Finance, documented … (the discussion continues in the thread) |
Autoresearch Insights
“the model got better by getting simpler”
The team ran their own overnight experiment on a Mac Mini M4. Out of 35 trials, 26 failed or crashed, but the seven successful runs revealed that the model improved as it became simpler. This insight—less is often more—was reached without any human intervention.
The Future: Curiosity as the Bottleneck
The release of autoresearch points toward a future where, because agents can be instructed in plain language, the human role shifts from experimenter to designer of experiments.
As tools like DarkMatter, Optimization Arena, and NanoClaw emerge to support this swarm, the bottleneck of AI progress is no longer the coding ability of the "meat computer" (Karpathy's term for the human brain), but our ability to define the constraints of the search.
Andrej Karpathy has once again shifted the vibe. We are no longer just coding models; we are seeding ecosystems that learn while we sleep.