Is vision-only perception sufficient for L4 autonomy?

The question

Can a camera-only perception stack (no LiDAR, no radar) achieve the reliability required for true SAE Level 4 driverless operation at scale, or does the failure-mode envelope of cameras (glare, fog, occlusion) necessarily require sensor redundancy?

Why it matters

This is the central empirical question behind tesla-fsd's strategy versus that of waymo, mercedes-drive-pilot, and every other major OEM. See vision-only-vs-sensor-fusion for the architectural split. The answer determines whether Tesla's per-vehicle cost advantage and fleet-learning flywheel dominate in the long run, or whether vision-only tops out at L2.

What we currently believe

As of April 2026, no vision-only system has cleared the bar for commercial driverless operation. Only sensor-fused stacks (Waymo, Zoox) operate at scale without safety drivers (2026-04-20-autoresearch-tesla-fsd).
The nhtsa EA26002 investigation specifically targets camera-only failure modes (sun glare, fog, occluded cameras), which is regulator-level evidence that these failure modes are real rather than hypothetical (2026-04-20-autoresearch-tesla-fsd).

Evidence we have

Community-tracker data put tesla-fsd v14.2 at ~809 city miles per critical disengagement, versus waymo's threshold of ~30,000 miles before safety-driver removal (2026-04-20-autoresearch-tesla-fsd).
Tesla's unsupervised fleet remains ~4–8 Model Ys in Austin with remote supervision: technology-demonstration scale, not commercial-service scale (2026-04-20-autoresearch-tesla-fsd).
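
To make the size of the disengagement gap concrete, the two figures quoted above can be turned into a ratio and a count of doublings. This is a rough sketch only: the function names are illustrative, and the two numbers come from different measurement methods (a community tracker on one side, a deployment threshold on the other), so the result is an order-of-magnitude indication rather than a like-for-like comparison.

```python
import math

# Approximate figures quoted in this note (not independently verified).
FSD_MILES_PER_CRITICAL_DISENGAGEMENT = 809
WAYMO_THRESHOLD_MILES = 30_000


def gap_factor(current: float, target: float) -> float:
    """Return how many times larger the target figure is than the current one."""
    if current <= 0 or target <= 0:
        raise ValueError("both figures must be positive")
    return target / current


def doublings_needed(current: float, target: float) -> float:
    """Return how many successive doublings of `current` are needed to reach `target`."""
    return max(0.0, math.log2(gap_factor(current, target)))


factor = gap_factor(FSD_MILES_PER_CRITICAL_DISENGAGEMENT, WAYMO_THRESHOLD_MILES)
doublings = doublings_needed(FSD_MILES_PER_CRITICAL_DISENGAGEMENT, WAYMO_THRESHOLD_MILES)
print(f"gap: {factor:.1f}x, doublings needed: {doublings:.1f}")
# prints: gap: 37.1x, doublings needed: 5.2
```

On these numbers the gap is roughly 37x, or a little over five doublings of miles per critical disengagement.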

Evidence we need

Whether v14/v15 and AI4/AI5 close the disengagement gap meaningfully.
Whether Tesla's claim that fleet-scale data trumps sensor redundancy holds empirically.
How vision-only handles adversarial weather conditions that defeat cameras (heavy snow, direct sun at horizon, smoke).

How to resolve

Track the disengagement-rate trajectory across v14.x releases.
Watch for any vision-only program besides Tesla demonstrating comparable L4 performance (currently none).
Look for controlled third-party benchmarks against sensor-fused stacks.
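
One way to track the disengagement-rate trajectory is to compute the average per-release improvement factor and project how many more releases would be needed to reach a target. The sketch below is an assumption-laden illustration: the function name is hypothetical, the constant geometric growth model is a simplification chosen for this example, and no real per-release data is included because this note has only a single data point (v14.2).

```python
import math
from typing import Optional, Sequence


def releases_to_target(miles_by_release: Sequence[float], target: float) -> Optional[float]:
    """Project how many further releases are needed to reach `target`.

    `miles_by_release` holds miles per critical disengagement, oldest first.
    Assumes the geometric-mean improvement per release continues unchanged,
    which is a modelling assumption, not an observed fact.
    Returns 0.0 if the target is already met and None if the series is not improving.
    """
    if len(miles_by_release) < 2:
        raise ValueError("need at least two releases")
    if any(m <= 0 for m in miles_by_release) or target <= 0:
        raise ValueError("all figures must be positive")

    first, last = miles_by_release[0], miles_by_release[-1]
    if last >= target:
        return 0.0

    steps = len(miles_by_release) - 1
    growth = (last / first) ** (1 / steps)  # geometric-mean factor per release
    if growth <= 1.0:
        return None  # flat or regressing: no projection possible
    return math.log(target / last) / math.log(growth)


# Made-up values purely to show the call; replace with tracked v14.x figures.
example = releases_to_target([200.0, 400.0, 800.0], target=3200.0)
print(example)
```

A projection like this is only as good as the consistency of the underlying tracker data, so it complements rather than replaces controlled third-party benchmarks.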
