brain/
← all mechanisms
medium convictionactive · updated 2026-05-21T00:00:00.000Z

CUDA built early AI → non-CUDA silicon adoption → 2 of 3 frontier models drop CUDA + inference CUDA-irrelevant → Nvidia software moat narrows → NVDA re-rate risk

The standard NVDA bull case prices a durable CUDA software moat. A competitor CEO argues CUDA 'has no role whatsoever in inference' and that two of the three leading frontier models now train without it. If the moat binds only on a shrinking training slice, the moat-premium in NVDA's terminal multiple carries re-rating risk. Tradeable: bearish-leg on the CUDA-lock-in component of NVDA's valuation; bullish for non-CUDA silicon (TPU/Google, Trainium/AWS, wafer-scale/CBRS).

The chain
1
CUDA was decisive in creating the AI landscape and was a real lock-in moat 3-5 years ago.
andrew-feldman in 2026-05-21-odd-lots-why-cerebras-ceo-andrew-feldman-built-the-world-s: "CUDA was really important in the creating of the AI landscape... what was true three or five years ago, in which CUDA had a dominant position."
2
CUDA has no lock-in at inference — moving a model from GPU to a non-Nvidia accelerator is near-frictionless.
andrew-feldman in 2026-05-21-odd-lots-why-cerebras-ceo-andrew-feldman-built-the-world-s: "it's not important now and it has no role whatsoever in inference. If you want to move from running a model on GPUs today to running it on us, we can move it in 10 keystrokes. Just move point to our API."
3
Frontier-model training is migrating off CUDA: two of the three leading models (Gemini on TPUs, Anthropic on Trainium) use no CUDA; only GPT remains CUDA-trained.
andrew-feldman in 2026-05-21-odd-lots-why-cerebras-ceo-andrew-feldman-built-the-world-s: "a year ago every major Frontier Lab model had been built on a Cuda foundation and today two of three haven't. So they lost 70% market share... Gemini built by Google on TPUs... Anthropics models trained on Trainium, no CUDA... two of the three leading models today use no CUDA. That's a hemorrhaging of share."
4
If CUDA's lock-in is gone at inference and shrinking at training, the software-moat premium embedded in NVDA's valuation is at risk — the durable-monopoly assumption narrows to a hardware-supply moat (CoWoS/HBM control), not software.
andrew-feldman in 2026-05-21-odd-lots-why-cerebras-ceo-andrew-feldman-built-the-world-s: "what was true three or five years ago, in which CUDA had a dominant position with Central, has shrunk significantly and not important at all at inference and shrinking in its role in training."
What would falsify this
  • Step 3: A new frontier model leader re-adopts CUDA, or Gemini/Anthropic move significant training back onto GPUs/CUDA.
  • Step 4: NVDA sustains pricing power and gross margin even as CUDA share falls — indicating the moat was hardware-supply, not software, and the re-rate thesis is misdirected.
Contradictions / tensions
  • Source is a direct Nvidia competitor with a clear incentive to talk down the moat. The '70% / two-of-three' framing is directional, not an audited share figure.
  • GPT (OpenAI) still trains in CUDA; CUDA's training role is 'shrinking,' not gone.
  • Nvidia retains a hardware-supply chokehold (CoWoS, HBM) that CUDA-erosion does not touch — see cowos-packaging-capacity-crunch. The hardware moat may matter more than the software moat for the medium term.
Implications
  • Bearish-leg input for the software-lock-in component of NVDA's multiple. Note: only the software-moat premium is at risk on this thesis — Nvidia's hardware-supply moat (>50% CoWoS, HBM allocation) is independent and not eroding here.
  • Bullish for non-CUDA silicon: Google TPU (Alphabet), AWS Trainium (Amazon), Cerebras wafer-scale (CBRS).
  • Pairs with inference-demand-to-wafer-scale-advantage: as inference (where CUDA is irrelevant) becomes the bulk of compute, the moat's relevance shrinks with the workload mix.
  • AI-thread reading (non-tradeable): the same chain, read as an ecosystem fact rather than a valuation call, says the AI compute substrate is **fragmenting across vendors** (TPU / Trainium / wafer-scale / GPU), and that runtime portability ("10 keystrokes") removes lock-in at the inference layer — the hardware-side corroboration of llm-as-commodity-thesis and a marker of the diversifying open-vs-closed-source-model-economics landscape.
Companies
Concepts
Open questions
none