brain/
conceptstock-market

CUDA moat erosion at inference

Notes

CUDA moat erosion at inference

One-line summary: The standard bull case for nvidia is that CUDA is an unassailable software moat. A competitor CEO (andrew-feldman) argues CUDA "has no role whatsoever in inference" and that two of the three leading frontier models now train without it — a re-rating risk for the assumption that the moat extends to the whole AI stack.

The insight

Feldman concedes Nvidia is "probably the greatest company in the first part of this century" and that CUDA "was really important in the creating of the AI landscape." But he argues the moat is decaying along two axes:

  1. Inference: CUDA is irrelevant. Moving a model from GPU to Cerebras takes "10 keystrokes — just move, point to our API." CUDA's lock-in does not bind at inference.
  2. Training: share is hemorrhaging. A year ago every frontier model was CUDA-built. Today, of the three leading frontier models, two are not: Gemini (Google, TPUs, no CUDA), Anthropic (trained on Trainium, no CUDA), vs only OpenAI's GPT (GPUs, CUDA). Feldman frames this as "a hemorrhaging of share."

If the moat is real only for a shrinking slice of training and not at all for the (now-dominant) inference workload, then NVDA's terminal multiple — to the extent it prices a durable software lock-in — carries re-rating risk.

The chain

CUDA built the early AI landscape → frontier labs adopt non-CUDA silicon (TPU, Trainium, wafer-scale) → 2 of 3 leading models drop CUDA and inference is CUDA-irrelevant → Nvidia's software moat narrows to a shrinking training slice → moat-premium in NVDA's valuation is at risk.

Canonical: cuda-moat-erosion-to-nvda-rerate.

Evidence

AI-thread angle: portability as the why behind model-layer commoditization

The "10 keystrokes" portability claim is the hardware-layer face of the llm-as-commodity-thesis tracked elsewhere in this thread (Ghodsi: Databricks customers swap models in ~1 day). If a served model can be re-pointed from GPU to wafer-scale by changing an API endpoint, then runtime lock-in does not bind at the inference layer — which is consistent with the commodity-thesis claim that durable value accrues above the model/runtime layer (proprietary data, workflow integration), not in CUDA-style ecosystem lock-in. The three-frontier-models-on-three-silicon-stacks fact (Gemini/TPU, Claude/Trainium, GPT/CUDA-GPU) is itself a marker of the open-vs-closed-source-model-economics landscape diversifying away from a single hardware substrate.

  • chamath-palihapitiya in 2026-05-29-podcast-all-in-podcast-anthropic-s-digital-god-pope-vs-ai-job-loss (Rogo eval convergence — frontier models indistinguishable): Citing Rogo's financial-analyst eval benchmark: "across all evals, there is no single best model anymore. At the top of the leaderboard opus 4, 7, GPT, 5, 5, sonnet 4, 6 appear almost indistinguishable, separated by less than 3/10 of a percentage point overall...these things are getting commoditized way too quickly. And then you'd say, well, what's the ROI on all this incremental spend?" — An independent financial-services eval (not a competitor) showing frontier models within 0.3 pp is a stronger commoditization signal than Feldman's competitor framing.
  • chamath-palihapitiya in 2026-05-29-podcast-all-in-podcast-anthropic-s-digital-god-pope-vs-ai-job-loss (Fortune 1000 hot-swap as enterprise behavior): "a lot of the folks that we see now in the Fortune 1000 and increasingly the Global 1000, they want abstraction above it. They want to sit in a control plane. They want to have the flexibility because they don't know how it's going to shake out. They see all the money being invested at the model layers, but they see the model quality asymptote." — Chamath's 8090 enterprise deployments confirm that model-layer commoditization is already driving enterprise contract structure (abstraction / hot-swap) not just research benchmarks.
  • david-sacks in 2026-05-29-podcast-all-in-podcast-anthropic-s-digital-god-pope-vs-ai-job-loss (Anthropic monopoly trajectory — opposite dynamic): "if you have one company that's growing at 10x year over year and another company that's growing at 3x year over year, within two years the first company will have 90% market share." — Sacks's Anthropic-pulls-away framing is in direct tension with Chamath's commoditization/asymptote framing. Both can be true: Anthropic may dominate a commoditized market that is still winner-take-most on distribution.

Design implications

  • Bearish-leg input for the durability of NVDA's software-moat premium; bullish for non-CUDA silicon (TPU/Google, Trainium/AWS, wafer-scale/cerebras).
  • The flip side already tracked in the wiki: Nvidia still owns the cost-optimized slow-token market and >50% of CoWoS; the moat erosion is at the software/lock-in layer, not the hardware-supply layer.

Contradictions / tensions

  • Source is a direct Nvidia competitor — clear incentive to talk down the moat. Treat the "70% / two-of-three" framing as a directional claim, not an audited share figure.
  • GPT (OpenAI) still trains in CUDA, and CUDA's training role is "shrinking" rather than gone — the moat is decaying, not dead.
  • The CoWoS/HBM supply chokehold (cowos-packaging-capacity-crunch, hbm-supply-bottleneck) is a hardware moat independent of CUDA and is not eroding on this evidence.
  • Nvidia-Groq $20B deal (December 2025): CUDA moat actively DEFENDED at inference. Nvidia licensed Groq's LPU (Language Processing Unit) — the primary chip architecture that was genuinely faster/more efficient than CUDA for inference workloads — in a $20B acqui-hire structure. By neutralizing the leading non-CUDA inference challenger, Nvidia has defended the moat Feldman claims is eroding. This is direct counterevidence to the thesis. Senators Warren and Blumenthal (March 20, 2026) characterized Groq's LPU as "genuinely faster and more energy-efficient for certain workloads" — validating that Groq was a real threat and that Nvidia has now eliminated it. From 2026-05-27-autoresearch-regulatory-antitrust-semis-ai-may-2026.
  • AMD-Meta $60B deal (February 2026) provides the "primary inference alternative" now that Groq is neutralized — but AMD ROCm is not yet at the level of Groq's LPU for low-latency inference. From 2026-05-26-autoresearch-semis-ai-infra-macro-scan-may-23-26-2026.

Open questions

  • How much of NVDA's multiple is a software-lock-in premium vs a supply-chain-control premium? Only the former is at risk on this thesis.

Related

Referenced by