What AI research LLMs can and can't automate (the capability boundary)
Vintage: May 2026. From 2026-05-15-dwarkesh-podcast-eric-jang-building-alphago-from-scratch, where Eric Jang reports first-person from running an automated LLM coding-agent loop (Opus 4.6 / 4.7) on his AlphaGo-from-scratch project. A practitioner capability-snapshot — exactly the kind of claim the vintage discipline (../_meta/AI_CAPABILITY_TRACKING) flags for re-validation, since Jang himself expects "Mythos-class models" to move the boundary.
One-line summary: As of May 2026, LLM agents are very good at the verifiable, well-scoped parts of AI research — implementing and running experiments, hyperparameter optimization, and open-ended code search against a fixed metric — but not good at choosing the next experiment in a track, stepping back to abandon a dead-end track ("lateral thinking"), or catching infra bugs without being pointed at them. The discriminator is verifiability: "if you can't evaluate it, then you can't auto research it."
The insight
Jang ran his project largely through an automated agent loop and reports a clear competence split:
What works well today:
- Open-ended optimization, not just grid search. Beyond classic learning-rate/weight-decay sweeps, the agent can diagnose a problem ("the gradients are small in this layer"), rewrite code, invent a new data augmentation, and grind a performance metric with what Jang calls an "almost like grad student like" ability.
- Executing experiments end-to-end. Jang wrote a Claude skill ("Experiment") that takes a described x-axis/y-axis question, runs the experiments, compiles the plot, writes a report, and suggests causes.
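To make the shape of an x-axis/y-axis experiment concrete, here is a minimal sketch of a sweep-and-report harness. It is not Jang's "Experiment" skill, whose internals the source does not describe. The names `run_sweep`, `write_report`, and `toy_experiment` are hypothetical, and the toy loss function stands in for a real training run.

```python
import math
from statistics import mean

def run_sweep(experiment, x_values, seeds=(0, 1, 2)):
    """Run experiment(x, seed) for every x and seed; summarize y per x."""
    rows = []
    for x in x_values:
        ys = [experiment(x, seed) for seed in seeds]
        rows.append({"x": x, "mean": mean(ys), "min": min(ys), "max": max(ys)})
    return rows

def write_report(rows, x_label, y_label):
    """Render a plain-text table plus a one-line finding (lower y is better)."""
    lines = [f"{x_label}\tmean {y_label}\tmin\tmax"]
    for r in rows:
        lines.append(f"{r['x']:g}\t{r['mean']:.4f}\t{r['min']:.4f}\t{r['max']:.4f}")
    best = min(rows, key=lambda r: r["mean"])
    lines.append(f"Lowest mean {y_label} at {x_label}={best['x']:g}")
    return "\n".join(lines)

def toy_experiment(learning_rate, seed):
    # Hypothetical stand-in for a real training run: a loss bowl centered
    # on learning_rate = 1e-2, with a small seed-dependent offset.
    return (math.log10(learning_rate) + 2) ** 2 + 0.01 * seed

rows = run_sweep(toy_experiment, [1e-4, 1e-3, 1e-2, 1e-1])
report = write_report(rows, "learning_rate", "val_loss")
print(report)
```

The point of the sketch is that every step here is mechanically checkable: the question fixes the axes, the runs produce numbers, and the report follows from the numbers. Deciding which axes are worth sweeping is the part that stays outside the harness.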
What doesn't work well today:
- Selecting the next experiment in a track. Current public models "don't seem to be that great at selecting what the next experiment should be."
- Lateral thinking / escaping dead ends. They can't "step back and do the lateral thinking of like wait a minute, this track doesn't really make sense. Let's go back to first principles."
- Catching infra bugs unprompted. Jang "had to catch infra bugs myself" and then prompt Claude with the right question to investigate.
This maps to a longstanding research-taste claim (Jang invokes Ilya Sutskever): a good researcher has a strong prior about which idea should work and so can distinguish "this is a bug" from "the idea is wrong" and persevere. That high-level belief is the hard-to-automate part.
The structural fix Jang proposes: use a hard-to-cheat, quick-to-verify game (Go win-rate vs. KataGo) as an outer loop that grounds an automated scientist, while the inner loop does the research-engineering. Go is attractive because the outer verification is cheap and un-game-able, yet the inner loop contains rich, transferable sub-problems (distributed systems, predicting whether an idea works). Jang's bet: skills learned on a verifiable game positively transfer to messier domains — citing DeepMind's games→LLMs trajectory as precedent.
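The outer-loop/inner-loop split can be sketched as a simple accept-if-better loop. This is an illustrative toy under stated assumptions, not Jang's setup: `play_game` simulates a game with a made-up strength ratio instead of running a Go engine, `random_tweak` stands in for the agent's research-engineering step, and all function names are hypothetical.

```python
import random

def play_game(candidate, reference, rng):
    # Hypothetical stand-in for one full game against a fixed reference
    # opponent; strengths are made-up positive numbers, not Elo ratings.
    return rng.random() < candidate / (candidate + reference)

def win_rate(candidate, reference, games, rng):
    """Outer-loop metric: fraction of games won against the reference."""
    wins = sum(play_game(candidate, reference, rng) for _ in range(games))
    return wins / games

def outer_loop(propose, start, reference, rounds, games=200, seed=0):
    """Keep a proposed change only if its measured win rate beats the best so far."""
    rng = random.Random(seed)
    current = start
    best_rate = win_rate(current, reference, games, rng)
    history = [best_rate]
    for _ in range(rounds):
        candidate = propose(current, rng)                   # inner loop: try an idea
        rate = win_rate(candidate, reference, games, rng)   # outer loop: verify it
        if rate > best_rate:
            current, best_rate = candidate, rate
        history.append(best_rate)
    return current, history

def random_tweak(strength, rng):
    # Hypothetical inner loop: a blind perturbation with no research taste.
    return strength * rng.uniform(0.8, 1.3)

final_strength, history = outer_loop(random_tweak, start=1.0, reference=2.0, rounds=20)
print(f"win rate: {history[0]:.2f} -> {history[-1]:.2f}")
```

Because the measured win rate is noisy, a loop like this can accept a change that only got lucky; a real harness would likely need more games per evaluation or a significance check before accepting.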
Evidence
- eric-jang in 2026-05-15-dwarkesh-podcast-eric-jang-building-alphago-from-scratch (what works): "the models can do a very good job of doing hyperparameter optimization ... it can search a much more open ended set of problems ... you end up with this much more flexible and kind of high level, almost like grad student like ability to just grind a performance metric ... it is also fantastic now at basically executing any experiment."
- eric-jang in 2026-05-15-dwarkesh-podcast-eric-jang-building-alphago-from-scratch (what doesn't): "current closed models that we can access ... don't seem to be that great at selecting what the next experiment should be in a given track. And they don't seem to be able to kind of step back and do the lateral thinking of like wait a minute, this track doesn't really make sense ... I had to catch infra bugs myself."
- eric-jang in 2026-05-15-dwarkesh-podcast-eric-jang-building-alphago-from-scratch (the verifiability discriminator, echoing Karpathy): "if you can't evaluate it, then you can't auto research it" — paraphrased here as the through-line; Jang's framing is the Go-outer-loop: "it's very quick to verify the outer loop ... you can kind of check the outcome of a Go game quite easily."
- dwarkesh-patel in 2026-05-15-dwarkesh-podcast-eric-jang-building-alphago-from-scratch (the research-taste / Ilya framing): "Ilya was talking about ... one of the things he thinks makes him a good researcher is that he has intuition ... he is able to persevere through bugs and know which things are bugs versus mistakes in the fundamental idea based on his high level belief about this idea should work."
- dwarkesh-patel in 2026-05-15-dwarkesh-podcast-eric-jang-building-alphago-from-scratch (the stacking problem at labs): "I've heard rumors that at some AI labs the thing that has gone wrong is that people will individually pursue good ideas, but those don't end up stacking well ... having a single top down vision of how things should work is very important."
- eric-jang in 2026-05-15-dwarkesh-podcast-eric-jang-building-alphago-from-scratch (positive-transfer argument for the singularity question): "DeepMind ... started as a sort of focus on games ... presumably there was some positive transfer from their time working on games ... why wouldn't it also be true for automated AI researchers? They should be able to positively transfer experience ... to something more ambitious and economically useful."
Design implications
- Verifiable, cheap-to-check outer loops are the bottleneck-breaker for research automation — the same enabling ingredient as in mcts-vs-llm-rl-credit-assignment (a groundable value function). Where you can't construct one, the human stays in the loop on direction-setting.
- Compute-multiplier tricks don't reliably stack (Jang: KataGo's algorithmic tricks "just don't matter so much" on Blackwell GPUs; multipliers "might have some correlated benefit" and stack less as hardware improves). This complicates naive intelligence-explosion math that assumes every multiplier compounds independently.
- Connects directly to autoresearch-recursive-self-improvement (Karpathy's nanochat AutoResearch, March 2026): same "remove the human as bottleneck, automate the verifiable loop" pattern, same discriminator (verifiable → automatable). Jang's account adds the negative space — the parts that still need a human — more concretely.
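The stacking point in the list above can be illustrated with toy arithmetic. The model below is an assumption made for illustration, not something Jang states: each trick's log-gain is split into a part shared with the other tricks (counted once) and a part unique to it (counted in full), with a hypothetical `overlap` parameter setting the split. The multiplier values are invented, and the model assumes every multiplier is at least 1.

```python
import math

def naive_stack(multipliers):
    """Naive estimate: every multiplier compounds independently."""
    return math.prod(multipliers)

def correlated_stack(multipliers, overlap):
    """Toy model: an `overlap` fraction of each log-gain is shared and counted once."""
    logs = [math.log(m) for m in multipliers]
    shared = max(overlap * g for g in logs)         # overlapping benefit, counted once
    unique = sum((1 - overlap) * g for g in logs)   # non-overlapping benefit, fully stacked
    return math.exp(shared + unique)

tricks = [1.5, 1.3, 1.2]  # invented per-trick compute multipliers
for overlap in (0.0, 0.5, 1.0):
    combined = correlated_stack(tricks, overlap)
    print(f"overlap={overlap:.1f}: combined {combined:.2f}x vs naive {naive_stack(tricks):.2f}x")
```

With no overlap the toy model reproduces the naive product, and with full overlap the combined gain collapses to the single largest multiplier. Anything in between stacks to less than the naive estimate.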
Contradictions / tensions
- Tension with autoresearch-recursive-self-improvement on how much is automated. Karpathy (March 2026) reports AutoResearch finding nanochat tunings he missed in two decades — bullish on the verifiable inner loop. Jang (May 2026) agrees the inner loop works but is more explicit that direction-selection and dead-end escape still fail, and that compute-multipliers don't stack. Not a hard contradiction — both locate the frontier at "verifiable inner loop yes, research taste no" — but Jang is the more sober read on near-term takeoff. Frame chronologically: similar pictures two months apart, Jang emphasizing the surviving human-only layer.
- Jang on whether games-pretraining helps or hobbles labs (Google): "the jury's still out" — the positive-transfer argument is asserted, not demonstrated.
Open questions
- can-llms-choose-the-right-research-question — the single sharpest open question this episode raises.
- Which specific research operations cross from "human-only" to "automatable" first, and on what timeline (does a Mythos-class model move the lateral-thinking boundary, as Jang speculates)?
- Does positive transfer from verifiable games actually carry to economically-useful research, or does it hobble (the Google-TPU/games-tradeoff worry)?
Related
- autoresearch-recursive-self-improvement — the closely-parallel Karpathy March-2026 account; this page is the May-2026 sober counterpart
- mcts-vs-llm-rl-credit-assignment — the verifiable-outer-loop principle in its RL-training form
- agi-timeline-decade-of-agents — the intelligence-explosion-timing framing this informs
- eric-jang — primary source
- dwarkesh-patel — interlocutor; surfaces the Ilya research-taste and idea-stacking framings