CUDA moat erosion at inference

One-line summary: The standard bull case for nvidia is that CUDA is an unassailable software moat. A competitor CEO (andrew-feldman) argues CUDA "has no role whatsoever in inference" and that two of the three leading frontier models now train without it — a re-rating risk for the assumption that the moat extends to the whole AI stack.

The insight

Feldman concedes Nvidia is "probably the greatest company in the first part of this century" and that CUDA "was really important in the creating of the AI landscape." But he argues the moat is decaying along two axes:

Inference: CUDA is irrelevant. Moving a model from GPU to Cerebras takes "10 keystrokes — just move, point to our API." CUDA's lock-in does not bind at inference.
Training: share is hemorrhaging. A year ago every frontier model was CUDA-built. Today, of the three leading frontier models, two are not: Gemini (Google, TPUs, no CUDA), Anthropic (trained on Trainium, no CUDA), vs only OpenAI's GPT (GPUs, CUDA). Feldman frames this as "a hemorrhaging of share."

If the moat is real only for a shrinking slice of training and not at all for the (now-dominant) inference workload, then NVDA's terminal multiple — to the extent it prices a durable software lock-in — carries re-rating risk.

The chain

CUDA built the early AI landscape → frontier labs adopt non-CUDA silicon (TPU, Trainium, wafer-scale) → 2 of 3 leading models drop CUDA and inference is CUDA-irrelevant → Nvidia's software moat narrows to a shrinking training slice → moat-premium in NVDA's valuation is at risk.

Canonical: cuda-moat-erosion-to-nvda-rerate.

Evidence

andrew-feldman in 2026-05-21-odd-lots-why-cerebras-ceo-andrew-feldman-built-the-world-s: "CUDA was really important in the creating of the AI landscape, but it's not important now and it has no role whatsoever in inference. If you want to move from running a model on GPUs today to running it on us, we can move it in 10 keystrokes."
andrew-feldman in 2026-05-21-odd-lots-why-cerebras-ceo-andrew-feldman-built-the-world-s: "a year ago every major Frontier Lab model had been built on a Cuda foundation and today two of three haven't. So they lost 70% market share... Gemini built by Google on TPUs... Anthropics models trained on Trainium, no CUDA... So two of the three leading models today use no CUDA. That's a hemorrhaging of share."

AI-thread angle: portability as the why behind model-layer commoditization

The "10 keystrokes" portability claim is the hardware-layer face of the llm-as-commodity-thesis tracked elsewhere in this thread (Ghodsi: Databricks customers swap models in ~1 day). If a served model can be re-pointed from GPU to wafer-scale by changing an API endpoint, then runtime lock-in does not bind at the inference layer — which is consistent with the commodity-thesis claim that durable value accrues above the model/runtime layer (proprietary data, workflow integration), not in CUDA-style ecosystem lock-in. The three-frontier-models-on-three-silicon-stacks fact (Gemini/TPU, Claude/Trainium, GPT/CUDA-GPU) is itself a marker of the open-vs-closed-source-model-economics landscape diversifying away from a single hardware substrate.

chamath-palihapitiya in 2026-05-29-podcast-all-in-podcast-anthropic-s-digital-god-pope-vs-ai-job-loss (Rogo eval convergence — frontier models indistinguishable): Citing Rogo's financial-analyst eval benchmark: "across all evals, there is no single best model anymore. At the top of the leaderboard opus 4, 7, GPT, 5, 5, sonnet 4, 6 appear almost indistinguishable, separated by less than 3/10 of a percentage point overall...these things are getting commoditized way too quickly. And then you'd say, well, what's the ROI on all this incremental spend?" — An independent financial-services eval (not a competitor) showing frontier models within 0.3 pp is a stronger commoditization signal than Feldman's competitor framing.
chamath-palihapitiya in 2026-05-29-podcast-all-in-podcast-anthropic-s-digital-god-pope-vs-ai-job-loss (Fortune 1000 hot-swap as enterprise behavior): "a lot of the folks that we see now in the Fortune 1000 and increasingly the Global 1000, they want abstraction above it. They want to sit in a control plane. They want to have the flexibility because they don't know how it's going to shake out. They see all the money being invested at the model layers, but they see the model quality asymptote." — Chamath's 8090 enterprise deployments confirm that model-layer commoditization is already driving enterprise contract structure (abstraction / hot-swap) not just research benchmarks.
david-sacks in 2026-05-29-podcast-all-in-podcast-anthropic-s-digital-god-pope-vs-ai-job-loss (Anthropic monopoly trajectory — opposite dynamic): "if you have one company that's growing at 10x year over year and another company that's growing at 3x year over year, within two years the first company will have 90% market share." — Sacks's Anthropic-pulls-away framing is in direct tension with Chamath's commoditization/asymptote framing. Both can be true: Anthropic may dominate a commoditized market that is still winner-take-most on distribution.
rob-wachen in 2026-06-30-podcast-invest-like-the-best-etched-building-ai-hardware-to-make-inference (a new entrant building without CUDA from day one): "the decision explicitly not to build an arbitrary graph compiler, not to support arbitrary Pytorch, not to support arbitrary cuda, not to support arbitrary ONNX graphs. But instead we envisioned a world where there was going to be under 100 models that actually mattered... the only people that took us seriously were in High Frequency Trading. They all hate compilers too." — etched is a second (post-Cerebras) datapoint that inference challengers treat CUDA compatibility as skippable, not table stakes. Interested-party source (founder, interviewed by his own investor).
dave-blundin in 2026-07-08-podcast-moonshots-fable-5-is-back-govt-leashed-altman-offers-5-of (the buy-side custom-silicon count): "Every single Magnum obstacle company is designing its own chips except for Anthropic... Nvidia's stranglehold on 80% gross margin is not forever." A market-cap-concentration framing of the same erosion — 10 of the 11 largest companies building their own AI silicon is the demand-side pressure on the GM.
andrew-feldman in 2026-07-10-podcast-all-in-podcast-open-source-wins-agi-is-here-and-scorsese-s-ai (the architecture-age argument): "All chips prior to us followed Moore's Law. And we broke into doubling every 18 months... in the next 18 months, we'll be way over 2x. Now, if you've got a 20 year old architecture like the GPU, it's much harder. You have to rely on... going to the next fab node." Reframes the moat-erosion as an architecture-generational one — newer non-GPU designs have more optimization headroom. Competitor framing (as before).
anjney-midha in 2026-06-13-podcast-odd-lots-anjney-midha-s-plan-to-radically-lower-the-price (the margin driver behind custom silicon): "About 80 cents of every dollar a lab spends today on R&D flows to a chip provider like Nvidia, so margins are super rough. From a unit-economic perspective you want more control over your margins... and supply-chain independence. That's how it works at the foundry level today — TSMC gets to decide which compute provider's business grows because they only have so much capacity." — A non-competitor practitioner quantifies why every lab (Microsoft Maia, etc.) is building inference silicon: an ~80% take-rate to Nvidia is the economic forcing function eroding the moat from the buy side, independent of Feldman's competitive framing.

Design implications

Bearish-leg input for the durability of NVDA's software-moat premium; bullish for non-CUDA silicon (TPU/Google, Trainium/AWS, wafer-scale/cerebras).
The flip side already tracked in the wiki: Nvidia still owns the cost-optimized slow-token market and >50% of CoWoS; the moat erosion is at the software/lock-in layer, not the hardware-supply layer.

Contradictions / tensions

Source is a direct Nvidia competitor — clear incentive to talk down the moat. Treat the "70% / two-of-three" framing as a directional claim, not an audited share figure.
GPT (OpenAI) still trains in CUDA, and CUDA's training role is "shrinking" rather than gone — the moat is decaying, not dead.
The CoWoS/HBM supply chokehold (cowos-packaging-capacity-crunch, hbm-supply-bottleneck) is a hardware moat independent of CUDA and is not eroding on this evidence.
Nvidia-Groq $20B deal (December 2025): CUDA moat actively DEFENDED at inference. Nvidia licensed Groq's LPU (Language Processing Unit) — the primary chip architecture that was genuinely faster/more efficient than CUDA for inference workloads — in a $20B acqui-hire structure. By neutralizing the leading non-CUDA inference challenger, Nvidia has defended the moat Feldman claims is eroding. This is direct counterevidence to the thesis. Senators Warren and Blumenthal (March 20, 2026) characterized Groq's LPU as "genuinely faster and more energy-efficient for certain workloads" — validating that Groq was a real threat and that Nvidia has now eliminated it. From 2026-05-27-autoresearch-regulatory-antitrust-semis-ai-may-2026.
AMD-Meta $60B deal (February 2026) provides the "primary inference alternative" now that Groq is neutralized — but AMD ROCm is not yet at the level of Groq's LPU for low-latency inference. From 2026-05-26-autoresearch-semis-ai-infra-macro-scan-may-23-26-2026.
2026-07-27 — SemiAnalysis conditionally upgrades AMD (independent written analyst). From 2026-07-25-feed-semianalysis-can-amd-break-the-cuda-moat: AMD upgraded from "0% chance" (early 2025) to "a great chance of success" at eroding the software moat — conditional on fixing a slow Helios rack ramp and unstable internal GPU clusters for ROCm dev/CI. ROCm progress is real (vLLM AMD gating, SGLang disaggregated DeepSeek-V4 nightly tests, 18× Kimi K2.5 latency improvement via AITER, MiniMax M3 competitive with B200) but the binding gate is a disaggregated-inference "software composability problem" — individual optimizations work; combining them breaks. This sharpens the two-sided read: the erosion at inference is credible and now has a merchant #2, but software-readiness remains a genuine frictional moat, and the same source frames disaggregated inference + WideEP as Nvidia's emerging moat ("the pie is growing rapidly for everyone"). Conviction held low-medium.

Open questions

How much of NVDA's multiple is a software-lock-in premium vs a supply-chain-control premium? Only the former is at risk on this thesis.

CUDA moat erosion at inference

CUDA moat erosion at inference

The insight

The chain

Evidence

AI-thread angle: portability as the why behind model-layer commoditization

Design implications

Contradictions / tensions

Open questions

Related