AutoResearch and the recursive-self-improvement loop

Vintage: March 2026. Primary source for this page is the 2026-03-20-no-priors-andrej-karpathy-skill-issue-code-agents interview, in which Karpathy describes, in the first person, running his AutoResearch project on his own nanochat model. The framing is fresh, but recursive self-improvement at frontier-lab scale is one of the fastest-moving capability surfaces in AI, so re-validate against newer sources (any later Karpathy material, frontier-lab releases of similar tooling) before treating this as a stable picture of capability.

One-line summary: Karpathy's project (March 2026) where LLM-driven agents handle the experiment-design / hyperparameter-search / training / evaluation loop of LLM training itself, with the human only providing objectives and constraints. He reports it found nanochat tunings he missed manually after two decades of doing it by hand. Framed as a personal-scale prototype of what frontier labs (OpenAI, Anthropic) are doing institutionally to "automate themselves away." This is the most concrete instance the wiki has of a frontier practitioner actively demonstrating the AI-research-automates-AI-research piece of the AI-2027 / fast-takeoff thesis.

The framing

AutoResearch is Karpathy's name for taking a piece of AI research — in his demonstration case, training a small LLM (his nanochat repo) — and arranging it so an agent loop can run experiments autonomously:

  • Objective (target validation loss).
  • Metric (verifiable, automatable: did the training run hit the target?).
  • Boundaries (what the agent is and isn't allowed to do — search this set of hyperparameters, this set of architectural tweaks).
  • Compute (a pool of GPU time the agent can spend).
  • Then hit go and remove yourself from the loop.
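
The four ingredients above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Karpathy's implementation: `Spec`, `train_and_eval`, and `autoresearch` are hypothetical names, the loss surface is a toy, and plain random search stands in for the agent's proposal step.

```python
import random
from dataclasses import dataclass


@dataclass
class Spec:
    target_loss: float  # objective: validation loss to reach
    bounds: dict        # boundaries: allowed (low, high) range per hyperparameter
    budget: int         # compute: maximum number of training runs


def train_and_eval(cfg):
    """Toy stand-in for a real training run (hypothetical loss surface)."""
    return 1.0 + 100 * (cfg["lr"] - 0.03) ** 2 + 10 * (cfg["weight_decay"] - 0.1) ** 2


def autoresearch(spec, seed=0):
    """Spend the budget proposing configs inside the bounds; keep the best one."""
    rng = random.Random(seed)
    best_cfg, best_loss, runs_used = None, float("inf"), 0
    while runs_used < spec.budget and best_loss > spec.target_loss:
        # Assumption: random search stands in for the agent's proposal step.
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in spec.bounds.items()}
        loss = train_and_eval(cfg)  # the metric: automatable, no human in the loop
        runs_used += 1
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss, runs_used


spec = Spec(
    target_loss=1.05,
    bounds={"lr": (0.0, 0.1), "weight_decay": (0.0, 0.3)},
    budget=200,
)
best_cfg, best_loss, runs_used = autoresearch(spec)
print(best_cfg, best_loss, runs_used)
```

The human's contribution is confined to `Spec`; everything inside `autoresearch` runs without prompting, which is the "remove yourself from the loop" step.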

The result: agents try things in parallel, find improvements, and commit them. Each commit is cheap to verify relative to the search that produced it (re-run the training and check the loss), even though the search over candidate commits is expensive. That puts AutoResearch in the same structural class as protein folding (Folding@home), distributed cryptanalysis, and blockchain proof-of-work: expensive to find, cheap to verify.

Karpathy's headline result: nanochat was already "fairly well tuned" after two decades of his own manual experience training LLMs. He let AutoResearch run overnight. It came back with tunings he hadn't found — specifically the weight decay on the value embeddings and the Adam betas. These things "jointly interact", so finding the right combination is exactly the kind of search a manual practitioner can't exhaustively cover.
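
A toy illustration of why jointly interacting settings are hard to tune one at a time. Everything here is an assumption for illustration: `a` and `b` are stand-ins for two knobs such as a weight decay and an Adam beta, and the loss function is invented so that the best value of one knob depends on the other. It makes no claim about how nanochat was actually tuned.

```python
from itertools import product

GRID = [i / 10 for i in range(21)]  # candidate values 0.0 .. 2.0 for each knob


def loss(a, b):
    """Hypothetical loss with an interaction: the best a depends on b."""
    return (a - b) ** 2 + 0.1 * (a + b - 2) ** 2


def tune_one_at_a_time(a=0.0, b=0.0):
    """One pass: pick the best a with b fixed, then the best b with a fixed."""
    a = min(GRID, key=lambda x: loss(x, b))
    b = min(GRID, key=lambda y: loss(a, y))
    return (a, b), loss(a, b)


def tune_jointly():
    """Exhaustive search over every (a, b) pair on the grid."""
    best = min(product(GRID, GRID), key=lambda ab: loss(*ab))
    return best, loss(*best)


seq_cfg, seq_loss = tune_one_at_a_time()
joint_cfg, joint_loss = tune_jointly()
print("one at a time:", seq_cfg, seq_loss)
print("joint search: ", joint_cfg, joint_loss)
```

On this surface a single one-at-a-time pass stalls well short of the joint optimum, and a second pass would improve further. The joint grid costs 441 evaluations here against 42 for the single pass, and that gap grows quickly with more knobs, which is the kind of search budget an overnight agent run can absorb.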

Why it matters to this thread

Three reasons this is high-leverage for the AI thread to track:

  1. Direct AI-2027 / recursive-self-improvement evidence. The bear case Karpathy was advancing in October 2025 (see ai-coding-agent-asymmetry-on-novel-code and agi-timeline-decade-of-agents) was that agents aren't good enough at "code that has never been written before" to automate frontier AI research. In March 2026, on his own model with his own twenty-year baseline, he reports the loop is now closing — agents found improvements he missed. The asymmetry framing has narrowed.

  2. Decentralized capability multiplier. Karpathy frames AutoResearch as having a structural property that lets it scale beyond a single lab: "a swarm of agents on the Internet could collaborate to improve LLMs and could potentially even run circles around Frontier Labs... Frontier Labs have a huge amount of trusted compute, but the Earth is much bigger and has huge amount of untrusted compute." If this works — and it's a real "if" — the standard "compute is the moat" framing weakens because untrusted compute participates via verifiable-commits-only.

  3. Frontier-lab operational mirror. Karpathy directly cites his time inside OpenAI: "I was like, you guys realize if we're successful, like we're all out of job. Like we're just building automation for Sam or something." The 1000+ researcher headcounts at OpenAI / Anthropic are, on his framing, "glorified auto[mation]" — meaning the frontier-lab structure is already pointed at automating its own research labor. AutoResearch is Karpathy's personal-scale prototype of what those labs are doing institutionally.

Evidence

Karpathy's first-person demonstration on nanochat (March 2026)

  • andrej-karpathy in 2026-03-20-no-priors-andrej-karpathy-skill-issue-code-agents (the framing): "to get the most out of the tools that have become available now, you have to remove yourself as the bottleneck. You can't be there to prompt the next thing. You need to take yourself outside. You have to arrange things such that they're completely autonomous... auto research is just, yeah, here's an objective, here's a metric, here's your boundaries of what you can and cannot do. And go."
  • andrej-karpathy in 2026-03-20-no-priors-andrej-karpathy-skill-issue-code-agents (the headline result): "I let autoresearch go for overnight and it came back with tunings that I didn't see. And yeah, I did forget the weight decay on the value embeddings and my atom [Adam] betas were not sufficiently tuned. And these things jointly interact. So once you tune one thing, the other things have to potentially change too. I shouldn't be a bottleneck, I shouldn't be running these hyperparameters search optimizations. I shouldn't be looking at the results."
  • andrej-karpathy in 2026-03-20-no-priors-andrej-karpathy-skill-issue-code-agents (the 20-year-baseline framing that calibrates the result): "I have like two decades of like, oh, I've trained this model like thousands of times of like. So I've done a bunch of experiments, I've done hyperparameter tuning, I've done all the things I'm very used to and I've done for two decades and I've gotten to a certain point and I thought it was fairly well tuned. And then I let autoresearch go for overnight."

The verifiable-commits / search-vs-verify structure

  • andrej-karpathy in 2026-03-20-no-priors-andrej-karpathy-skill-issue-code-agents (the structural property): "if anyone gives you a candidate commit, it's very easy to verify that that commit is correct is good. Someone could claim from the Internet that this piece of code will optimize much better and give you much better performance. You could just check. Very easy, but probably a lot of work goes into that checking. But fundamentally they could lie and et cetera. So you're basically dealing with a similar kind of problem. It almost actually looks a little bit like my designs that incorporate an untrusted pool of workers actually look a little bit more like a blockchain a little bit. Because instead of blocks, you have commits."
  • andrej-karpathy in 2026-03-20-no-priors-andrej-karpathy-skill-issue-code-agents (the swarm framing): "a swarm of agents on the Internet could collaborate to improve LLMs and could potentially even run circles around Frontier Labs, who knows? Yeah, maybe that's even possible. Frontier Labs have a huge amount of trusted compute, but the Earth is much bigger and has huge amount of untrusted compute. But if you put systems in place that deal with this, then maybe it is possible that the swarm out there could come up with better solutions and people kind of contribute cycles to a thing that they care about."
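
The "commits instead of blocks" idea can be sketched as follows. This is an assumption-heavy illustration, not a description of any system Karpathy has built: `Commit`, `evaluate`, and `try_accept` are hypothetical names, and the evaluation is a deterministic toy standing in for re-running training on trusted hardware.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    parent: str          # hash of the commit this one builds on
    config: tuple        # proposed change, here (name, value) pairs
    claimed_loss: float  # what the untrusted worker says it achieved


def evaluate(config):
    """Toy deterministic stand-in for re-running training on trusted hardware."""
    cfg = dict(config)
    return 1.0 + (cfg["weight_decay"] - 0.1) ** 2


def commit_hash(c):
    payload = f"{c.parent}|{sorted(c.config)}|{c.claimed_loss}"
    return hashlib.sha256(payload.encode()).hexdigest()


def try_accept(chain, best_loss, c, tol=1e-9):
    """Append c only if it extends the head and the re-measured loss holds up."""
    if c.parent != chain[-1]:
        return best_loss, False  # stale or forked proposal
    measured = evaluate(c.config)  # never trust the claim; re-measure
    if measured > c.claimed_loss + tol or measured >= best_loss:
        return best_loss, False  # worker overstated, or no real improvement
    chain.append(commit_hash(c))
    return measured, True


chain = ["genesis"]
best = evaluate((("weight_decay", 0.5),))  # current baseline

honest_cfg = (("weight_decay", 0.2),)
honest = Commit(chain[-1], honest_cfg, evaluate(honest_cfg))
best, ok_honest = try_accept(chain, best, honest)

liar = Commit(chain[-1], (("weight_decay", 0.4),), claimed_loss=0.5)
best, ok_liar = try_accept(chain, best, liar)
print(ok_honest, ok_liar, len(chain))
```

Verification here costs one evaluation per candidate, while finding a good candidate may take many. The sketch ignores the harder parts flagged elsewhere on this page: incentives, adversarial workers, and the security cost of running untrusted code on the verifying machine.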

Frontier-lab operational mirror

  • andrej-karpathy in 2026-03-20-no-priors-andrej-karpathy-skill-issue-code-agents: "Obviously, even with other research, OpenAI or Anthropic or these other labs, they're employing what, like a thousand something researchers, Right? These researchers are basically like glorified auto. They're like automating themselves away actively. And this is like the thing they're all trying to do. I think I went around some of those researchers also feel the psychosis. Right? Because they can. It's working, right? And so they're like, oh, it's over for me too."

Karpathy joining Anthropic to lead recursive self-improvement (May 22, 2026)

  • From 2026-05-22-podcast-all-in-podcast-spacex-s-2t-case-nvidia-s-shock-selloff-america — Karpathy joins Anthropic to lead a new pre-training team focused on recursive self-improvement. This is the operational consummation of the March 2026 framing on this concept: Karpathy is moving from personal-prototype scale into frontier-lab institutional scale, taking the same loop that worked on his nanochat into Anthropic's pre-training. The concept is no longer just a personal-research demonstration — it is now a named frontier-lab program at the lab Baker identifies as having a 3-6 month lead.
  • chamath-palihapitiya in same: "If you put those two things [recursive self-learning + scaling] together, I think that you start to potentially live out this idea that there's an order of magnitude improvement on a yearly basis. So like this new form of Moore's Law. So then the model quality just goes absolutely parabolically just like this, straight up." Chamath quantifies the rate: 10x per year as the new Moore's Law if Karpathy's framing lands at frontier-lab scale. Falsification threshold sharpens — track Anthropic capability disclosures over the next 12 months and compare against the 10x/year claim.
  • gavin-baker in same: "I do think what Karpathy is working on, recursive self improvement is really important and unlocking that and continual learning, you know, maybe the two final frontiers for AI... continual learning is the holy grail, where the model learns from experiences the way humans do. And that's something we haven't unlocked yet. And those two combined, I think they might pull the future forward in a very real way." Baker pairs recursive-self-improvement with continual learning as the two unsolved frontier problems whose joint unlocking would compress the AI timeline. Establishes a more specific success criterion than "AGI" — both legs need to land for the parabolic-improvement framing to take.
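
To put the 10x-per-year figure next to Moore's Law, the arithmetic can be checked directly. The 24-month doubling period is an assumption (the commonly cited form of Moore's Law, not something stated in the source), and the quote does not say which quantity is supposed to improve tenfold, so this compares growth rates only.

```python
import math


def doubling_time_months(growth_per_year):
    """Months needed to double at a given multiplicative growth per year."""
    return 12 * math.log(2) / math.log(growth_per_year)


def annual_factor(doubling_months):
    """Multiplicative growth per year implied by a doubling period in months."""
    return 2 ** (12 / doubling_months)


claimed = doubling_time_months(10)  # the 10x/year claim
moore = annual_factor(24)           # assumption: doubling every 24 months
print(f"10x/year doubles every {claimed:.1f} months")
print(f"24-month doubling is {moore:.2f}x per year")
```

Under these assumptions, 10x per year corresponds to a doubling roughly every 3.6 months, against about 1.41x per year for a 24-month doubling. Any capability disclosure used to test the claim needs a metric for which a doubling is well defined.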

What still requires humans (the surviving asymmetry)

  • andrej-karpathy in 2026-03-20-no-priors-andrej-karpathy-skill-issue-code-agents (caveats and limits): "Number one, this is extremely well suited to anything that has objective metrics that are easy to evaluate. So for example, like writing kernels for more efficient CUDA code for various parts of a model, et cetera, are the perfect fit because you have inefficient code and then you want efficient code that has the exact same behavior, but it's much faster. Perfect fit. So a lot of things are perfect fit for auto research, but many things will not be. And so it's just if you can't evaluate it, then you can't auto research it."
  • andrej-karpathy in 2026-03-20-no-priors-andrej-karpathy-skill-issue-code-agents (research-direction still human): "certainly they can contribute ideas, but okay, they shouldn't actually be enacting those ideas. There's a queue of ideas and there's maybe an automated scientist that comes up with ideas based on all the archive papers and GitHub repos and it funnels ideas in. Or researchers can contribute ideas, but it's a single queue and there's workers that pull items and they try them out and whatever works just gets sort of put on the feature branch."
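
The single-queue arrangement described in the last quote can be sketched with a shared queue and a pool of workers. This is a simplified illustration under assumptions: `run_experiment`, `worker`, and the idea names are hypothetical, each idea carries a made-up loss rather than triggering a real training run, and ideas are scored independently rather than stacked on top of earlier merges.

```python
import queue
import threading


def run_experiment(idea):
    """Toy stand-in for training with the idea applied; returns a validation loss."""
    return idea["loss_if_applied"]


def worker(ideas, state, lock):
    """Pull ideas until the queue is empty; merge only those that beat the best."""
    while True:
        try:
            idea = ideas.get_nowait()
        except queue.Empty:
            return
        loss = run_experiment(idea)
        with lock:  # one writer at a time to the shared feature branch
            if loss < state["best_loss"]:
                state["best_loss"] = loss
                state["feature_branch"].append(idea["name"])


# Ideas may come from humans or from an automated proposer; the queue does not care.
ideas = queue.Queue()
for name, loss_value in [("tweak-A", 1.20), ("tweak-B", 0.95), ("tweak-C", 1.05)]:
    ideas.put({"name": name, "loss_if_applied": loss_value})

state = {"best_loss": 1.00, "feature_branch": []}
lock = threading.Lock()
workers = [threading.Thread(target=worker, args=(ideas, state, lock)) for _ in range(2)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(state)
```

The separation matters for the human-role question: contributing to the queue is open to anyone, while the decision to merge is made by the metric alone.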

Cross-thread tensions

  • vs. agi-timeline-decade-of-agents (same thread, October 2025 framing). Karpathy's October "I doubt agents can do code that has never been written before well enough to automate frontier research" is in tension with his March demonstration of agents finding nanochat tunings he missed. Both can be honest first-person reports — different model generations, the December 2025 inflection, fast capability rate — but the wiki's bear case on AI-2027 weakens here. See the two-point snapshot framing on that page.
  • vs. ai-coding-agent-asymmetry-on-novel-code (same thread). The asymmetry framing survives but is narrowed: it now applies to the soft / not-RL-trained domains (intent inference, jokes, design choices before unit tests exist) rather than to all "novel code". The research-direction layer where humans still credit themselves with adding value is exactly the soft-domain layer.

Open questions

  • Which research operations can be Karpathy-style auto-researched, and which can't? Karpathy gives the discriminator: "if you can't evaluate it, then you can't auto research it." Hyperparameter search: yes. CUDA kernel optimization: yes. Architectural innovation that requires new evaluation metrics: less clear. Empirical-research bets on what to evaluate in the first place: not via this loop. Building out a taxonomy would be a worthwhile autoresearch / ingest pass on its own.
  • Does the untrusted-swarm framing actually deploy at scale? Karpathy explicitly says he doesn't have a system he's "super happy with just yet." Track whether any project (his or others') gets to demonstrable collaborative-LLM-improvement across an untrusted pool. The structural argument is right; the implementation might fail on incentive design (no monetary reward as of March), on adversarial-actor handling, or on the practical security cost of running untrusted code on verification machines.
  • What does Karpathy's "thousand researchers automating themselves" framing predict for OpenAI / Anthropic headcount over 2026-2027? If true, frontier-lab research headcount should plateau or contract (the AI-2027 prediction); if false (institutional inertia, regulatory hiring, alignment workload growth), they keep hiring. Direct falsification path.
  • Does the demand-elasticity / Jevons-paradox framing extend to AI research itself? If AutoResearch lowers the cost of "running one more experiment", does the demand for AI experiments grow enough that researcher headcount holds? Karpathy implies no (the OpenAI quote — they're out of a job); the Jevons framing on ai-vampire-pattern would predict yes at least transiently. Worth watching whether this resolves the same way as compiler-and-programmer-headcount did historically.
