Andrej Karpathy's new open source 'autoresearch' lets you run hundreds of AI experiments a night — with revolutionary implications
Source: VentureBeat
Over the weekend, Andrej Karpathy, the influential former Tesla AI director and OpenAI founding member who coined the term "vibe coding", posted on X about his new open-source project, autoresearch.
It isn't a finished model or a massive corporate product: by his own admission, it is a simple 630-line script, available on GitHub under the permissive, enterprise-friendly MIT License. But the ambition is massive: automating the scientific method with AI agents while we humans sleep.
“The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement,” he stated on X.
How the system works
- Autonomous optimization loop – An AI agent receives a training script and a fixed compute budget (typically 5 minutes on a GPU).
- The agent reads its own source code, forms a hypothesis for improvement (e.g., changing a learning rate or architecture depth), modifies the code, runs the experiment, and evaluates the results.
- If the validation loss—measured in bits per byte (val_bpb)—improves, the change is kept; otherwise it is reverted and the agent tries again.
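The keep-or-revert loop above can be sketched in a few lines of Python. To be clear, `run_experiment` and `propose_change` below are hypothetical stand-ins (a toy scoring function and a fixed perturbation schedule), not Karpathy's actual script, where the agent rewrites its own source and launches real 5-minute GPU training runs:

```python
def run_experiment(config):
    """Hypothetical stand-in for a 5-minute GPU training run.

    Returns validation loss in bits per byte (val_bpb); lower is better.
    This toy function has its minimum at lr=3e-4, depth=12.
    """
    return 0.9697 + 1000 * abs(config["lr"] - 3e-4) + 0.002 * abs(config["depth"] - 12)


def propose_change(config, trial):
    """Form a hypothesis: perturb one hyperparameter per experiment."""
    candidate = dict(config)
    if trial % 2 == 0:
        candidate["lr"] = config["lr"] * (1.1 if trial % 4 == 0 else 0.9)
    else:
        candidate["depth"] = config["depth"] + (1 if trial % 4 == 1 else -1)
    return candidate


config = {"lr": 1e-3, "depth": 8}
best = run_experiment(config)

for trial in range(126):              # one overnight run's worth of experiments
    candidate = propose_change(config, trial)
    score = run_experiment(candidate)
    if score < best:                  # improvement: keep the change
        config, best = candidate, score
    # otherwise: revert (i.e., simply keep the old config)

print(f"final val_bpb: {best:.4f}")   # converges near the toy optimum
```

The essential property is that every change is gated by a single scalar metric, so the agent never needs a human to judge whether a hypothesis worked.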
In one overnight run, Karpathy’s agent completed 126 experiments, driving loss down from 0.9979 → 0.9697.
“Seeing the agent do this entire workflow end‑to‑end and all by itself… is wild,” Karpathy remarked, noting that the agent caught oversights in attention scaling and regularization that he had missed manually over two decades of work.
Recent results
- After leaving the agent to tune a `depth=12` model for two days, it performed ≈ 700 autonomous changes.
- The agent discovered ≈ 20 additive improvements that transferred perfectly to larger models.
- Stacking these changes dropped the “Time to GPT‑2” metric on the leaderboard from 2.02 h → 1.80 h (an 11 % efficiency gain) on a project Karpathy believed was already well‑tuned.
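As a quick sanity check on the reported figure, the relative gain works out as stated:

```python
# Reported: "Time to GPT-2" fell from 2.02 h to 1.80 h on the leaderboard.
before_h, after_h = 2.02, 1.80
gain = (before_h - after_h) / before_h
print(f"{gain:.1%}")  # 10.9%, which the article rounds to ~11%
```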
This is more than a productivity hack; it is a fundamental shift in how intelligence is refined. By automating the scientific method for code, Karpathy has turned machine learning into an evolutionary process that runs at the speed of silicon rather than the speed of human thought.
The broader AI/ML community on X quickly recognized that this process could be applied far beyond computer science—to marketing, health, and basically anything that requires research.
Autoresearch spreads far and wide
The reaction was swift and viral, with Karpathy’s post garnering > 8.6 M views in two days as builders and researchers scrambled to scale the “Karpathy loop”.
Hyperspace AI
- Varun Mathur, CEO of AI-tool aggregator Hyperspace AI, distributed the single-agent loop across a peer-to-peer network. Every node running the Hyperspace agent became an autonomous researcher.
- On the night of March 8–9, 35 autonomous agents on the Hyperspace network ran 333 experiments completely unsupervised. The results were a masterclass in emergent strategy:
| Observation | Details |
|---|---|
| Hardware Diversity as a Feature | H100 GPUs used “brute force” to find aggressive learning rates, while CPU‑only agents on laptops were forced to be clever, focusing on initialization strategies (Kaiming, Xavier) and normalization choices. |
| Gossip‑Based Discovery | Using the GossipSub protocol, agents shared wins in real‑time. When one agent found that Kaiming initialization dropped loss by 21 %, the idea spread like a digital virus; within hours, 23 other agents incorporated the discovery. |
| Compression of History | In just 17 hours, agents independently rediscovered ML milestones—RMSNorm, tied embeddings, etc.—that took human labs (Google Brain, OpenAI) nearly eight years to formalize. |
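The gossip-based discovery pattern can be sketched as a toy simulation. Everything here is a hypothetical stand-in: the `Agent` class, the effect sizes, and a direct broadcast in place of the real GossipSub mesh (a libp2p pub/sub protocol), which Hyperspace's actual implementation would use:

```python
class Agent:
    """Toy researcher node; a shared broadcast stands in for GossipSub."""

    def __init__(self, name, init_scheme="default"):
        self.name = name
        self.init_scheme = init_scheme
        self.loss = 1.0

    def run_trial(self):
        # Hypothetical effect sizes; 0.79 mirrors the ~21% loss drop
        # attributed to Kaiming initialization in the anecdote.
        effects = {"default": 1.0, "xavier": 0.90, "kaiming": 0.79}
        self.loss = effects[self.init_scheme]

    def hear_gossip(self, message):
        # Adopt a peer's discovery only if it beats the local result.
        if message["loss"] < self.loss:
            self.init_scheme = message["scheme"]
            self.loss = message["loss"]


agents = [Agent(f"node-{i}") for i in range(24)]
agents[0].init_scheme = "kaiming"   # one agent stumbles on the win
for a in agents:
    a.run_trial()

# The winner publishes its result; every peer evaluates and incorporates it.
best = min(agents, key=lambda a: a.loss)
for a in agents:
    a.hear_gossip({"scheme": best.init_scheme, "loss": best.loss})

adopters = sum(a.init_scheme == "kaiming" for a in agents)
print(adopters)  # 24: the discovery spread to the whole toy network
```

The adoption rule (switch only when the gossiped loss beats your own) is what lets one agent's find propagate to the 23 others without any central coordinator.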
Run 36,500 marketing experiments each year instead of 30
While ML purists focused on loss curves, the business world saw a different revolution.
Eric Siu, founder of ad agency Single Grain, applied autoresearch to the “Experiment Loop” of marketing:
“Most marketing teams run ~30 experiments a year. The next generation will run 36,500+. Easily.”
“They’ll run experiments while they sleep.”
Siu’s framework
- Replace the training script with a marketing asset (landing page, ad creative, cold email).
- The agent modifies a variable (subject line, CTA), deploys it, measures the positive reply rate, and keeps or discards the change.
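Siu's framework maps directly onto Karpathy's loop. The sketch below assumes that mapping; `measure_reply_rate`, the variant lists, and the scoring are invented placeholders, not any real deployment or measurement pipeline:

```python
import random

random.seed(0)


def measure_reply_rate(asset):
    """Hypothetical stand-in for deploying a cold email and measuring
    the positive reply rate; a toy scoring function, not real data."""
    rate = 0.020
    if asset["subject"] == "quick question":
        rate += 0.010
    if asset["cta"] == "15-min call?":
        rate += 0.015
    return rate


asset = {"subject": "our product", "cta": "book a demo"}
best = measure_reply_rate(asset)

subjects = ["quick question", "our product", "re: your growth"]
ctas = ["book a demo", "15-min call?", "reply YES"]

history = []                      # the "proprietary map" of what resonated
for _ in range(50):               # 50 experiments; scale toward 36,500/year
    candidate = dict(asset)
    # Modify one variable per experiment, as in the framework above.
    if random.random() < 0.5:
        candidate["subject"] = random.choice(subjects)
    else:
        candidate["cta"] = random.choice(ctas)
    rate = measure_reply_rate(candidate)
    history.append((candidate, rate))
    if rate > best:               # keep the winning variant
        asset, best = candidate, rate

print(asset, f"{best:.3f}")
```

Note that `history` is retained even for losing variants: the accumulated record of what failed, not just what won, is the moat Siu describes.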
Siu argues this creates a “proprietary map” of what resonates with a specific audience—a moat built not of code, but of experiment history.
“The companies that win won’t have better marketers; they’ll have faster experiment loops.” – Siu
Community discussion and “spoiling” the validation set
Despite the fervor, the GitHub Discussions revealed a community grappling with the implications of such rapid, automated progress.
| Concern | Comment |
|---|---|
| The Over‑Optimization Trap | Researcher alexisthual: “Aren’t you concerned that launching that many experiments will eventually ‘spoil’ the validation set?” The fear is that agents may over‑fit to quirks of the test data rather than achieve general intelligence. |
| The Meaning of the Gains | User samionb: "Is a drop from 0.9979 → 0.9697 truly noticeable?" Karpathy's reply: "All we're doing is optimizing performance per compute… these are real and substantial gains." |
| The Human Element | On X, user witcheer, Head of Growth at crypto platform Yari Finance, documented … (the discussion continues in the thread) |
Autoresearch Insights
“the model got better by getting simpler”
The team ran their own overnight experiment on a Mac Mini M4. Out of 35 trials, 26 failed or crashed, but the seven successful runs revealed that the model improved as it became simpler. This insight—less is often more—was reached without any human intervention.
The Future: Curiosity as the Bottleneck
The release of autoresearch points toward a future where, because agents can be instructed in plain language, the human role shifts from experimenter to designer of experiments.
As tools like DarkMatter, Optimization Arena, and NanoClaw emerge to support this swarm, the bottleneck of AI progress is no longer the coding ability of the "meat computer" (Karpathy's term for the human brain), but our ability to define the constraints of the search.
Andrej Karpathy has once again shifted the vibe. We are no longer just coding models; we are seeding ecosystems that learn while we sleep.