entity · person · stock-market

Dwarkesh Patel

Podcaster, Dwarkesh Podcast

Quotes

So space is really a regulatory. It's really a regulatory play. It's harder to build on land than it is in space.

2026-02-05-dwarkesh-patel-elon-musk-in-36-months-the-cheapest-place-to-put· 2026-02-05#terrestrial-power-flat-to-orbital-dc-arbitrage

if you look at pre-training, if you look at Llama 3, for example, I think it's trained on 15 trillion tokens. And if you look at the 70B model, that would be the equivalent of 0.07 bits per token that it sees in pre-training, in terms of the information in the weights of the model compared to the tokens it reads. Whereas if you look at the KV cache and how it grows per additional token in in-context learning, it's like 320 kilobytes. So that's a 35-million-fold difference in how much information per token is assimilated by the model.

2025-10-17-dwarkesh-patel-andrej-karpathy-summoning-ghosts· 2025-10-17#agi-timeline-decade-of-agents
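
Worked check of the arithmetic in the quote above. This is a rough sketch, not something stated in the episode: it assumes 16-bit weights and KV-cache entries, and assumes Llama 3 70B's shape is 80 layers, 8 key/value heads (grouped-query attention) and a head dimension of 128. The function names are made up for illustration. Under those assumptions the three quoted figures (about 0.07 bits per token, 320 kilobytes, about 35 million fold) come out consistent with each other.

```python
# Back-of-envelope check of the bits-per-token comparison.
# ASSUMPTIONS (not in the quote): 16-bit weights and KV entries, and a
# Llama 3 70B shape of 80 layers, 8 KV heads, head dimension 128.

PARAMS = 70e9             # model parameters
BITS_PER_WEIGHT = 16      # assumed bf16/fp16 storage
PRETRAIN_TOKENS = 15e12   # "15 trillion tokens" from the quote

N_LAYERS = 80             # assumed
N_KV_HEADS = 8            # assumed (grouped-query attention)
HEAD_DIM = 128            # assumed
BYTES_PER_VALUE = 2       # assumed 16-bit KV cache


def pretraining_bits_per_token() -> float:
    """Total bits stored in the weights divided by tokens read in pre-training."""
    return PARAMS * BITS_PER_WEIGHT / PRETRAIN_TOKENS


def kv_cache_bytes_per_token() -> int:
    """Bytes the KV cache grows by per token: one key and one value vector
    per KV head per layer."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


def fold_difference() -> float:
    """Ratio of KV-cache bits per token to pre-training bits per token."""
    return kv_cache_bytes_per_token() * 8 / pretraining_bits_per_token()


if __name__ == "__main__":
    print(f"pre-training: {pretraining_bits_per_token():.3f} bits/token")
    print(f"KV cache:     {kv_cache_bytes_per_token() / 1024:.0f} KiB/token")
    print(f"ratio:        {fold_difference() / 1e6:.1f} million fold")
```

The comparison is between storage sizes, not measured information content, so it is best read as an order-of-magnitude intuition.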

Through the history of programming there's been many productivity improvements, compilers, linting, better programming languages, et cetera, which have increased programmer productivity but have not led to an explosion. So that's like one that sounds very much like an autocomplete tab. And this other category is just like automation of the programmer. And it's interesting you're seeing more in the category of the historical analogies of better compilers or something.

2025-10-17-dwarkesh-patel-andrej-karpathy-summoning-ghosts· 2025-10-17#ai-coding-agent-asymmetry-on-novel-code

this thing you're saying, which would be intractable and prevents you from actually getting beyond a certain level in Go, is just by default how LLMs are trained ... Karpathy, when he was on the podcast, called it like sucking supervision through a straw.

2026-05-15-dwarkesh-podcast-eric-jang-building-alphago-from-scratch· 2026-05-15#mcts-vs-llm-rl-credit-assignment#mcts-per-move-target-to-llm-rl-inefficiency

you're trying to maximize as you're learning bits per flop ... you can think of bits per flop as samples per flop times bits per sample ... the samples per flop go down as RL becomes more and more long horizon. But at least this kind of naive RL is also terrible from a bits per sample perspective.

with supervised learning ... there's a label that says, actually the term here is blue ... Now, if you were doing this through RL ... you would have to do this on the order of 100,000 times in order to just stumble on blue, then get some learning signal off of that.
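
A minimal sketch of the bits-per-sample comparison in the two quotes above. It assumes a 100,000-token vocabulary (inferred from the "100,000 times" figure), an untrained policy that guesses uniformly at random, and a binary right/wrong reward. The entropy of that reward is used as an upper bound on what one rollout can teach. All function names are illustrative, not from the source.

```python
import math

VOCAB_SIZE = 100_000  # ASSUMPTION: inferred from "on the order of 100,000 times"


def supervised_bits_per_sample(vocab_size: int) -> float:
    """Upper bound on information in one label: it picks out one token
    from the whole vocabulary."""
    return math.log2(vocab_size)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no outcome that is 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def naive_rl_bits_per_sample(vocab_size: int) -> float:
    """Upper bound on information in one binary reward when the policy
    guesses uniformly, so it is right with probability 1 / vocab_size."""
    return binary_entropy(1.0 / vocab_size)


def expected_guesses_until_hit(vocab_size: int) -> float:
    """Mean of a geometric distribution with success probability 1 / vocab_size."""
    return float(vocab_size)


def bits_per_flop(samples_per_flop: float, bits_per_sample: float) -> float:
    """The decomposition from the quote: samples per flop times bits per sample."""
    return samples_per_flop * bits_per_sample


if __name__ == "__main__":
    print(f"supervised: {supervised_bits_per_sample(VOCAB_SIZE):.2f} bits/sample")
    print(f"naive RL:   {naive_rl_bits_per_sample(VOCAB_SIZE):.6f} bits/sample")
    print(f"expected guesses to hit the label: {expected_guesses_until_hit(VOCAB_SIZE):.0f}")
```

Under these assumptions a label carries up to about 16.6 bits, while a single uniform-guess rollout carries well under a thousandth of a bit. That gap is the "terrible from a bits per sample perspective" point.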

[On the chip-design-from-bottom-up framing for the Reiner Pope blackboard lecture]: how do chips actually work - starting with basic logic gates, and working up to why GPUs, TPUs, FPGAs, and the human brain each look the way they do.

2026-05-22-podcast-dwarkesh-podcast-reiner-pope-chip-design-from-the-bottom-up· 2026-05-22#picks-and-shovels-leading-edge-fab-buildout
Notes


One-line summary: Interviewer / podcaster known for long-form AI and economics conversations. Tracked here when guests articulate AI-infrastructure theses; Dwarkesh's own framings (regulatory-play analyses, scaling-law intuitions) show up too.

What they're known for

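Host of the Dwarkesh Podcast. Episodes cited on this page include conversations with Elon Musk, Andrej Karpathy, Eric Jang, and Reiner Pope.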

Why they matter to stock-market

Why this person's claims are tracked here — fill in.

Said

Speaker-attributed claims extracted from diarized sources. Each bullet mirrors one entry in quotes: frontmatter — keep them in sync.

Sources

Related

Cross-links — fill in.

Referenced by