[元戎启行 (DeepRoute.ai)]: Redefining the Boundaries of Autonomous Driving with Foundation Models - 发现报告


Transportation Equipment | 2026-03-09 | 元戎启行 (DeepRoute.ai) | Man💗

Tongyi Cao | CTO, DeepRoute.ai

Vehicles Shipped. Our New Starting Point.

Scaling Law: The Inevitable Path to L5 Autonomous Driving

Today's Talk
01 Model Scaling: 40B VLA foundation model, a unified driver, analyst, and critic
02 Data Scaling: the data flywheel from 200K to 1M vehicles
03 Simulation: bridging the sim-to-real gap

Autonomous Driving is a Scaling Problem
1. Every module in your system is data-driven: end-to-end, no hand-crafted rules.
2. You have a fleet large enough to close the loop on all takeover data: every takeover is a gold sample.
3. Your model has enough capacity to absorb all scenarios: small models saturate, and more data stops helping.
If all three hold → you can solve every takeover → that's L5.

The Data Closed Loop: Everyone Knows It
While True:
1. Collect and label data
2. Train and deploy the model
3. Find failure modes
   a. Curate failure samples
   b. Train a trigger classifier
   c. Deploy the trigger → collect
Reality Check
•The industry has adopted end-to-end...
•But L5 timelines keep slipping
•The loop is correct, but it spins too slowly

Two Bottlenecks That Determine the Speed of Progress: the Model Scaling Bottleneck and the Data Scaling Bottleneck
•Boring data hurts
•Open-source / third-party LLMs hit ceilings
•Rule-based mining, small dedicated models, human review
•Adding data is not helping
•With prompt engineering, SOTA multimodal LLMs solve 90% of failure cases at the decision level and 70% at the motion level (within a 5% error margin)

Our Answer: One Foundation Model (FM) for Both Model and Data
The FM is both the product AND the engine of the data pipeline.
Model Scaling
•40B VLA Foundation Model
•Driver, Analyst, Critic
Data Scaling
•Data quantity: disengagement data from 200K vehicles
•Data quality: FM-powered pipeline (5 days → 12 hrs)

PART 01 Model Scaling
40B VLA Foundation Model: Vision + Language + Action

Pretrain: Why Video Prediction?

Midtrain: Act, Then Explain
Part A: V + A (large scale)
v₀a₀v₁a₁v₂a₂...
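The "While True" loop on the closed-loop slide can be made concrete with a toy Python sketch. It is an illustration under heavy assumptions, not DeepRoute's pipeline: `Clip`, `Model`, `train_and_deploy`, `find_failure_modes`, and `make_trigger` are hypothetical names, a "model" is just the set of scenarios it has seen, and the trigger classifier is a set-membership test. What it does show is the control flow: each round's takeovers define the trigger, the trigger decides what gets collected and labeled, and retraining shrinks the next round's takeovers.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Clip:
    scenario: str  # hypothetical scenario tag, e.g. "cyclist_merge"


@dataclass(frozen=True)
class Model:
    # Toy stand-in: the model "handles" exactly the scenarios it was trained on.
    known: frozenset = frozenset()

    def handles(self, clip: Clip) -> bool:
        return clip.scenario in self.known


def train_and_deploy(dataset: list) -> Model:
    # Step 2 (toy): "training" memorizes every labeled scenario.
    return Model(known=frozenset(c.scenario for c in dataset))


def find_failure_modes(model: Model, drives: list) -> set:
    # Step 3: every clip the deployed model cannot handle counts as a takeover.
    return {c.scenario for c in drives if not model.handles(c)}


def make_trigger(failure_modes: set) -> Callable[[Clip], bool]:
    # Steps 3a-3b (toy): the "trigger classifier" is a set-membership test.
    return lambda clip: clip.scenario in failure_modes


def closed_loop(fleet_days: Iterable[list]) -> tuple:
    model, dataset, takeovers_per_day = Model(), [], []
    for drives in fleet_days:  # "While True", bounded by the logs passed in
        failures = find_failure_modes(model, drives)
        takeovers_per_day.append(sum(not model.handles(c) for c in drives))
        trigger = make_trigger(failures)               # 3c. deploy the trigger
        dataset += [c for c in drives if trigger(c)]   # 1. collect and label
        model = train_and_deploy(dataset)              # 2. train and deploy
    return model, takeovers_per_day


days = [
    [Clip("lane_keep"), Clip("cyclist_merge")],
    [Clip("lane_keep"), Clip("cyclist_merge"), Clip("unprotected_left")],
    [Clip("lane_keep"), Clip("unprotected_left")],
]
model, takeovers = closed_loop(days)
print(takeovers)  # [2, 1, 0]
```

In the toy the loop converges in three rounds; the slide's point is that in practice each round involves fleet-scale labeling, training, and deployment, which is why the loop spins too slowly.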
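The midtrain sample formats can be sketched as plain (inputs, outputs) pairs. The token strings, the `Sample` type, and the function names are hypothetical; the slides do not say how frames and actions are tokenized or which tokens carry the loss. The sketch covers the Part A interleaving v₀a₀v₁a₁v₂a₂... and the two role-based formats the midtrain slide lists: V→L + A for the reasoning driver and V + A→L for the analyst/critic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    role: str
    inputs: tuple   # tokens the model conditions on
    outputs: tuple  # tokens the model is trained to generate


def driver_samples(frames: list, actions: list) -> list:
    # Part A (V + A): at step t the context is v0 a0 ... v(t-1) a(t-1) vt
    # and the target is at, i.e. one causal pass over v0 a0 v1 a1 v2 a2 ...
    if len(frames) != len(actions):
        raise ValueError("need one action per frame")
    samples, context = [], []
    for v, a in zip(frames, actions):
        context.append(v)
        samples.append(Sample("driver", tuple(context), (a,)))
        context.append(a)
    return samples


def reasoning_driver_sample(frames: list, event: str, action: str) -> Sample:
    # V -> L + A: describe the key event first, then output the action.
    return Sample("reasoning_driver", tuple(frames), (event, action))


def critic_sample(frames: list, action: str, judgement: str) -> Sample:
    # V + A -> L: explain or judge an action that has already been taken.
    return Sample("analyst_critic", tuple(frames) + (action,), (judgement,))


for s in driver_samples(["v0", "v1", "v2"], ["a0", "a1", "a2"]):
    print(s.inputs, "->", s.outputs)
# ('v0',) -> ('a0',)
# ('v0', 'a0', 'v1') -> ('a1',)
# ('v0', 'a0', 'v1', 'a1', 'v2') -> ('a2',)
```

Writing the Part A case as one sample per step makes the causal structure explicit; in a transformer the same supervision would normally come from a single pass over the interleaved sequence with a causal mask.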
•You are a reasoning driver
•Data: V→L (key event description) + A
•You are an Analyst/Critic
•Data: V + A→L (action extraction, explanation, judgement)
•Example: "Decelerate: cyclist merging into lane"
•You are a Driver

Real-Time Inference: VLA with KV Cache Acceleration
•KV cache: history is never recomputed
•MTP target: 10-15 Hz real-time closed-loop control

VLA Model Mass Production

PART 02 Data Scaling
The Data Flywheel: From 200K to 1M Vehicles

Total Time: Over 5 Days (100+ Hours)
Every critical node depends on human capability. The process is experience-driven, rule-based, and manually coordinated: it works, but it doesn't scale.

FM Powers Every Node of the Data Loop
•Disengagement Diagnosis: the FM categorizes every takeover as non-AD, map, perception, or behavior
•Data Mining: the FM discovers high-value clips such as near-misses, rare scenarios, and edge cases
•CoT Annotation: the FM generates Chain-of-Thought reasoning traces → feeds VLA midtraining
•The FM scores trajectory quality (comfort, safety, human-likeness) → release gate

Our FM Outperforms on Driving Tasks

PART 03 Simulation
Bridging the Sim-to-Real Gap

Long-Tail Scenarios: The Final Piece

Auto-fixes artifacts, clipping, and floaters to ensure training-grade simulation fidelity.

Synthetic Data: Cosmos + DiPIR
Cosmos Style Transfer (NVIDIA)
•Input: real 7-view video
•Output: all-weather variants (night / rain / snow / glare)
•Geometry and semantic labels preserved
•One daytime case → train night / snow scenarios
DiPIR Insertive Editing (DeepRoute)
•Insert 3D simulated objects into real data
•Pedestrians / cyclists / animals as sudden obstacles
•Lighting, shadows, and ISP style auto-matched
•Systematically generate "impossible-to-capture dangers"

Posttrain: Refine via RL
RL Policy Optimization (RLCS)
•The FM Driver generates L (reasoning) + A (action) over multiple rollouts
•Reduces the need for human labeling
•The FM Critic evaluates reasoning-action consistency, combined with a rule-based reward
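The posttrain slide names the ingredients (multiple rollouts per scene, an FM Critic consistency score, a rule-based reward) but not the algorithm behind RLCS. The sketch below fills that gap with an assumption: it combines the two reward terms with a hypothetical weight and turns them into group-relative advantages in the style of GRPO. The critic score is a stubbed number rather than a model call, and all names are illustrative.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    reasoning: str              # L: the driver's stated reasoning
    action: str                 # A: the chosen action
    critic_consistency: float   # FM Critic score in [0, 1] (stubbed value here)
    rule_reward: float          # rule-based term, e.g. a penalty for violations


def total_reward(r: Rollout, critic_weight: float = 0.5) -> float:
    # Hypothetical mix: the slide names both terms but not how they are combined.
    return critic_weight * r.critic_consistency + r.rule_reward


def group_advantages(rollouts: list, eps: float = 1e-8) -> list:
    # GRPO-style normalization: score each rollout against the other rollouts
    # sampled for the same scene, so no human label or value network is needed.
    rewards = [total_reward(r) for r in rollouts]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(x - mean) / (std + eps) for x in rewards]


group = [
    Rollout("cyclist merging into lane", "decelerate", 1.0, 0.0),
    Rollout("road ahead is clear", "accelerate", 0.0, -1.0),
]
print(group_advantages(group))  # approximately [1.0, -1.0]
```

Because advantages are relative within the group, a scene where every rollout scores the same yields zero advantage and contributes no gradient; only disagreement between rollouts carries a learning signal.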
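The real-time inference slide's claim that a KV cache means history is never recomputed can be checked with a toy: single-head attention in pure Python, with made-up 2x2 weights and tokens. Everything here (names, dimensions, weights) is an assumption for illustration; the slides do not describe the VLA's attention layout, and MTP is not modeled. The sketch compares cached, incremental attention with recomputing the full history at every step, and converts the 10-15 Hz target into a per-cycle time budget.

```python
import math


def matvec(W, x):
    # Plain matrix-vector product (W is a list of rows).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over all given keys/values.
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


class CachedAttention:
    """Single-head causal attention that keeps past keys/values in a cache."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys, self.values = [], []
        self.kv_projections = 0

    def step(self, x):
        # Only the newest token is projected; earlier K/V are read from the cache.
        self.keys.append(matvec(self.Wk, x))
        self.values.append(matvec(self.Wv, x))
        self.kv_projections += 1
        return attend(matvec(self.Wq, x), self.keys, self.values)


def recompute_all(Wq, Wk, Wv, tokens):
    # Baseline without a cache: re-project the entire history at every step.
    outputs, kv_projections = [], 0
    for t in range(len(tokens)):
        history = tokens[: t + 1]
        keys = [matvec(Wk, x) for x in history]
        values = [matvec(Wv, x) for x in history]
        kv_projections += len(history)
        outputs.append(attend(matvec(Wq, tokens[t]), keys, values))
    return outputs, kv_projections


def frame_budget_ms(hz):
    # Wall-clock budget per control cycle at a given rate.
    return 1000.0 / hz


Wq, Wk, Wv = [[1, 0], [0, 1]], [[0.5, 0], [0, 0.5]], [[1, 1], [0, 1]]
tokens = [[1, 0], [0, 1], [1, 1], [2, -1]]
cache = CachedAttention(Wq, Wk, Wv)
incremental = [cache.step(x) for x in tokens]
full, full_projections = recompute_all(Wq, Wk, Wv, tokens)
print(cache.kv_projections, full_projections)  # 4 10
print(frame_budget_ms(10), round(frame_budget_ms(15), 1))  # 100.0 66.7
```

Both paths produce the same outputs, but without a cache the number of key/value projections grows quadratically with history length (1 + 2 + ... + n), while with the cache it grows linearly; at 10-15 Hz each cycle has only about 67-100 ms.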
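The FM-powered data loop slide can be sketched as a few small functions around the model's outputs. The FM itself is not modeled: its takeover labels and trajectory scores are plain inputs, and `TakeoverCause`, `TrajectoryScore`, the thresholds in `GATE`, and the 98% pass rate are hypothetical values chosen for illustration. Only the four cause categories and the three scoring dimensions come from the slide.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class TakeoverCause(Enum):
    NON_AD = "non-AD"  # takeover not attributable to the driving stack (assumed meaning)
    MAP = "map"
    PERCEPTION = "perception"
    BEHAVIOR = "behavior"


@dataclass(frozen=True)
class TrajectoryScore:
    comfort: float         # all scores assumed normalized to [0, 1]
    safety: float
    human_likeness: float


# Hypothetical thresholds; the slides do not publish the real gate criteria.
GATE = TrajectoryScore(comfort=0.7, safety=0.95, human_likeness=0.6)


def diagnose_backlog(causes):
    # Disengagement diagnosis: tally FM-assigned causes so each queue
    # (map fixes, perception data, behavior data) can be routed separately.
    return Counter(causes)


def passes_gate(score: TrajectoryScore, gate: TrajectoryScore = GATE) -> bool:
    # A trajectory passes only if every dimension clears its threshold.
    return (score.comfort >= gate.comfort
            and score.safety >= gate.safety
            and score.human_likeness >= gate.human_likeness)


def release_decision(scores: list, min_pass_rate: float = 0.98) -> bool:
    # Release gate: ship a candidate model only if enough of its FM-scored
    # trajectories pass; an empty evaluation set never ships.
    if not scores:
        return False
    pass_rate = sum(passes_gate(s) for s in scores) / len(scores)
    return pass_rate >= min_pass_rate


backlog = diagnose_backlog([TakeoverCause.MAP, TakeoverCause.BEHAVIOR, TakeoverCause.BEHAVIOR])
print(backlog[TakeoverCause.BEHAVIOR])  # 2
```

Requiring every dimension to pass, rather than averaging them, keeps a very comfortable but unsafe trajectory from slipping through the gate.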
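The label bookkeeping behind the two synthetic-data paths can be sketched without any rendering. `Clip`, `Box3D`, `style_variants`, and `insert_obstacle` are hypothetical names, and the actual Cosmos and DiPIR models are not called; the sketch only encodes what the synthetic-data slide states: style transfer preserves geometry and semantic labels, while insertive editing adds a new object that needs its own label.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box3D:
    category: str
    center: tuple  # (x, y, z) in meters in the ego frame (assumed convention)


@dataclass(frozen=True)
class Clip:
    condition: str   # "day", "night", "rain", ...
    num_views: int   # the slide's input is 7-view video
    boxes: tuple     # geometry and semantic labels (3D boxes only, for brevity)


def style_variants(clip: Clip, conditions=("night", "rain", "snow", "glare")) -> list:
    # Style transfer changes appearance only, so every variant reuses the
    # original labels unchanged; the renderer itself is not modeled here.
    return [replace(clip, condition=c) for c in conditions]


def insert_obstacle(clip: Clip, box: Box3D) -> Clip:
    # Insertive editing adds a rendered object, so its label must be added too.
    return replace(clip, boxes=clip.boxes + (box,))


day = Clip("day", 7, (Box3D("car", (12.0, 0.0, 0.0)),))
night, rain, snow, glare = style_variants(day)
hazard = insert_obstacle(night, Box3D("pedestrian", (8.0, 1.5, 0.0)))
print(night.boxes == day.boxes, len(hazard.boxes))  # True 2
```

This is why one labeled daytime case can seed night and snow training data without new labeling, while each inserted obstacle comes with a label by construction.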
Unified Foundation Model Powering a Broad Suite of Products

1,000,000: Racing Toward One Million Vehicles Shipped
•Bridging sim-to-real for the long tail
•FM-powered closed loop: 200K → 1M
•One unified 40B VLA brain