
Redefining the Boundaries of Autonomous Driving with Foundation Models

Transportation Equipment | 2026-03-09 | DeepRoute.ai (元戎启行) | Man💗

DeepRoute.ai CTO at NVIDIA GTC 2026: Redefining the Boundaries of Autonomous Driving with Foundation Models

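Key Takeaways and Data

The Scaling Law: The Inevitable Path to L5 Autonomous Driving

200,000 vehicles shipped: a new starting point, underscoring that autonomous driving is a scaling problem.
Scaling law: model scaling × data scaling speed = rate of progress toward L5.
Bottleneck analysis: progress is determined by two factors, the speed of the data closed loop and the speed of model scaling.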
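To make the bottleneck argument concrete, the sketch below counts how many closed-loop iterations fit into a 30-day window at the loop times quoted in the talk. The manual loop takes over 5 days, taken here as 120 hours, and the FM-powered loop takes 12 hours. It then reads the summary's "model scaling × data scaling speed" formula as a plain product of two relative factors. The 30-day window, the 120-hour figure, and the names `loop_iterations` and `progress_rate` are illustrative assumptions, not DeepRoute.ai's methodology.

```python
HOURS_PER_30_DAYS = 30 * 24  # assumed comparison window: 720 hours


def loop_iterations(cycle_hours: float, window_hours: float = HOURS_PER_30_DAYS) -> int:
    """Number of complete data closed-loop iterations that fit in a time window."""
    if cycle_hours <= 0:
        raise ValueError("cycle_hours must be positive")
    return int(window_hours // cycle_hours)


def progress_rate(model_scaling: float, data_loop_speed: float) -> float:
    """Illustrative reading of 'model scaling x data scaling speed = progress rate'.

    Both inputs are hypothetical unitless relative factors. Because the form is a
    product, a stalled factor (0) stalls progress however large the other one is.
    """
    return model_scaling * data_loop_speed


manual = loop_iterations(120)  # "over 5 days" per loop, taken as 120 h (assumption)
fm = loop_iterations(12)       # FM-powered pipeline: 12 h per loop

# Hold the model fixed (factor 1.0) and use the loop-count ratio as data speed.
print(manual, fm, progress_rate(1.0, fm / manual))
```

With the model held fixed, the shorter loop gives ten times as many iterations per window. The product form also shows why each factor counts as a bottleneck: if either drops to zero, progress stops.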

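The Data Flywheel: From 200,000 to 1,000,000 Vehicles

Data challenge: the industry has adopted end-to-end approaches, yet L5 timelines keep slipping because the data closed loop spins too slowly.
Data quality bottleneck: traditional pipelines depend on manual work, which is inefficient and does not scale.
FM-driven data loop: the Foundation Model (FM) drives data mining, annotation, training, and evaluation, shortening a 5-day process to 12 hours.
FM performance: the FM outperforms traditional methods on Recall@P=90% (recall at 90% precision) and on action-scoring correlation.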
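Recall@P=90% is the recall a classifier achieves at the operating threshold where precision is still at least 90%. The talk does not say how DeepRoute.ai computes it. The sketch below uses the common definition: sweep the score thresholds and keep the best recall among those that meet the precision target. The inputs could be, for example, FM scores for flagging a clip as high-value. The function name and the tie-handling choice (all samples sharing a score are admitted together) are assumptions.

```python
from typing import Sequence


def recall_at_precision(scores: Sequence[float], labels: Sequence[int],
                        min_precision: float = 0.9) -> float:
    """Highest recall over thresholds whose precision is >= min_precision.

    A sample counts as predicted positive when score >= threshold.
    Returns 0.0 if no threshold reaches the precision target.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(1 for y in labels if y)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best, tp, fp, i = 0.0, 0, 0, 0
    while i < len(pairs):
        # Admit every sample tied at this score so the threshold is well defined.
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        if precision >= min_precision:
            best = max(best, tp / total_pos)
    return best


# Hypothetical scores and ground-truth labels for five clips.
example = recall_at_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0])
```

In this toy example the top two clips are both true positives, so recall reaches 2/3 while precision is still 100%. Lowering the threshold any further admits a negative and drops precision below 90%.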

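Simulation: Bridging the Gap Between Virtual and Real

Long-tail scenario simulation: low-frequency, high-risk scenarios can only be collected when they happen to occur naturally, so simulation is used to supply them realistically.
3D scene reconstruction: NVIDIA 3DGUT is used to repair image artifacts, ensuring training-grade simulation fidelity.
Synthetic data generation: NVIDIA Cosmos style transfer is combined with DiPIR insertive editing to systematically generate "impossible-to-capture dangers" as inputs.
Reinforcement-learning optimization: FM-driven policy optimization reduces the need for human labeling and improves the safety and precision of decisions.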
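The post-training slide describes the FM Driver producing several rollouts of reasoning (L) plus action (A). The FM Critic scores each rollout for reasoning-action consistency, and a rule-based reward is added. The sketch below shows one way such scores could be combined and turned into per-rollout learning signals. The source does not expand "RLCS" or give a formula, so several pieces are assumptions standing in for the real components:

- the weighted sum of critic and rule scores,
- the group-relative normalization (the scheme used in GRPO),
- the action format,
- the `toy_critic` and `toy_rule` functions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List


@dataclass
class Rollout:
    reasoning: str       # L: reasoning text from the FM Driver
    action: List[float]  # A: hypothetical format, per-step longitudinal accel (m/s^2)


RewardFn = Callable[[Rollout], float]


def combined_reward(r: Rollout, critic: RewardFn, rule: RewardFn, w_critic: float = 0.5) -> float:
    """Weighted sum of critic consistency score and rule-based reward (weights assumed)."""
    return w_critic * critic(r) + (1.0 - w_critic) * rule(r)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward against its sibling rollouts (GRPO-style, assumed)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


def toy_critic(r: Rollout) -> float:
    # Hypothetical stand-in for the FM Critic: 1.0 when the stated intent
    # ("decelerate" or not) matches the sign of the first accel command.
    says_decel = "decelerate" in r.reasoning.lower()
    does_decel = r.action[0] < 0
    return 1.0 if says_decel == does_decel else 0.0


def toy_rule(r: Rollout) -> float:
    # Hypothetical rule-based reward: zero if any |accel| exceeds 3 m/s^2.
    return 0.0 if max(abs(a) for a in r.action) > 3.0 else 1.0


rollouts = [
    Rollout("Decelerate: cyclist merging into lane", [-1.5, -1.0]),  # consistent, within limit
    Rollout("Decelerate: cyclist merging into lane", [0.5, 0.2]),    # says brake, accelerates
    Rollout("Keep speed", [-4.0, 0.0]),                              # inconsistent and too harsh
]
rewards = [combined_reward(r, toy_critic, toy_rule) for r in rollouts]
advantages = group_relative_advantages(rewards)
```

The first rollout states an intent that matches its action and stays within the toy acceleration limit, so it gets the largest positive advantage. In a real policy-gradient update, these advantages would weight the log-probabilities of each rollout's tokens.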

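Unified Foundation Model: Powering a Suite of Products

40B VLA foundation model: vision + language + action, supporting different product lines (e.g., robotaxis).
Data accumulation: 1.3 billion+ km of real-world driving; 20k+ cumulative installations of the urban NOA solution.
Product validation and iteration: iterating on the foundation model improves VLA model performance and yields efficient VA models (no less than 100 TOPS).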
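The talk's Part 01 slide on real-time inference credits KV caching ("history never recomputed") with reaching a 10-15 Hz closed-loop control target. The sketch below illustrates the idea with a single head in a single layer. Each new step appends its key and value to the cache and attends over everything stored, and the result matches recomputing attention over the full prefix. Queries, keys, and values are supplied directly rather than coming from learned projections. `KVCache`, `attend`, and `recompute_all` are hypothetical names, and this is not DeepRoute.ai's implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Step = Tuple[Vector, Vector, Vector]  # (query, key, value) for one time step


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    logits = [sum(qi * ki for qi, ki in zip(q, k)) * scale for k in keys]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


class KVCache:
    """Append-only key/value store for one attention head in one layer."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def step(self, q: Vector, k: Vector, v: Vector) -> Vector:
        # Only the newest step's key/value are added; earlier ones are reused as-is.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)


def recompute_all(steps: List[Step]) -> List[Vector]:
    """Reference without caching: rebuild the full prefix at every step."""
    outputs = []
    for t in range(len(steps)):
        prefix = steps[: t + 1]
        outputs.append(attend(steps[t][0], [k for _, k, _ in prefix], [v for _, _, v in prefix]))
    return outputs


steps: List[Step] = [
    ([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),
    ([0.0, 1.0], [0.0, 1.0], [3.0, 4.0]),
    ([1.0, 1.0], [1.0, 1.0], [5.0, 6.0]),
]
cache = KVCache()
cached = [cache.step(q, k, v) for q, k, v in steps]
reference = recompute_all(steps)
```

The saving is that keys and values for earlier steps are computed once and then reused. Each step's attention still reads every cached entry, so both memory use and per-step attention cost grow with history length.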

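Conclusions

Data scaling is key: an FM-driven data closed loop improves both data quality and processing speed.
Model scaling is the foundation: the 40B VLA foundation model supports large-scale data processing and multimodal fusion.
Combining simulation with reality: 3D reconstruction and synthetic data generation make long-tail scenario simulation more realistic.
Unified foundation model: it powers a suite of products and accelerates progress toward L5 autonomous driving.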

Tongyi Cao | CTO, DeepRoute.ai

200,000 Vehicles Shipped. Our New Starting Point.

Scaling Law: The Inevitable Path to L5 Autonomous Driving

Today's Talk
01 Model Scaling: 40B VLA Foundation Model, a unified driver, analyst, and critic
02 Data Scaling: the data flywheel from 200K to 1M vehicles
03 Simulation: bridging the Sim-to-Real gap

Autonomous Driving Is a Scaling Problem
• Every module in your system is data-driven: end-to-end, no hand-crafted rules.
• You have a large enough fleet to close the loop on all takeover data: every takeover is a gold sample.
• Your model has enough capacity to absorb all scenarios: small models saturate, and more data stops helping.
If all three hold → you can solve every takeover → that's L5.

The Data Closed Loop: Everyone Knows It
While True:
1. Collect and label data
2. Train and deploy model
3. Find failure mode
 a. Curate failure samples
 b. Train trigger classifier
 c. Deploy trigger → collect
Reality Check
• The industry has adopted end-to-end...
• But L5 timelines keep slipping
• The loop is correct, but it spins too slowly

Two Bottlenecks That Determine the Speed of Progress
Model Scaling Bottleneck
• Adding data is not helping
• Open-source / third-party LLMs hit ceilings
• With prompt engineering, SOTA multimodal LLMs solve 90% of failure cases at decision level and 70% at motion level (within a 5% error margin)
Data Scaling Bottleneck
• Boring data hurts
• Rule-based mining, small dedicated models, human review

Our Answer: One FM for Both Model and Data
The Foundation Model is both the product AND the engine of the data pipeline.
Model Scaling
• 40B VLA Foundation Model
• Driver, Analyst, Critic
Data Scaling
• Data quantity: disengagement data from 200K vehicles
• Data quality: FM-powered pipeline (5 days → 12 hrs)

PART 01 Model Scaling
40B VLA Foundation Model: Vision + Language + Action
Pretrain: Why Video Prediction?
Midtrain: Act, Then Explain
Part A: V + A (large scale), interleaved as v₀ a₀ v₁ a₁ v₂ a₂ ...
• "You are a reasoning driver". Data: V → L (key event description) + A
• "You are an Analyst/Critic". Data: V + A → L (action extraction, explanation, judgement), e.g. "Decelerate: cyclist merging into lane"
• "You are a Driver"

Real-Time Inference: VLA with KV Cache Acceleration
• KV cache: history is never recomputed
• MTP target: 10-15 Hz real-time closed-loop control

VLA model mass production

PART 02 Data Scaling
The Data Flywheel: From 200K to 1M Vehicles
Total Time: Over 5 Days (100+ Hours)
Every critical node depends on human capability. Experience-driven, rule-based, manually coordinated: it works, but it doesn't scale.

FM Powers Every Node of the Data Loop
• Disengagement Diagnosis: FM categorizes every takeover as non-AD / map / perception / behavior
• Data Mining: FM discovers high-value clips: near-misses, rare scenarios, edge cases
• CoT Annotation: FM generates Chain-of-Thought reasoning traces → feeds VLA midtraining
• FM scores trajectory quality (comfort, safety, human-likeness) → release gate

Our FM Outperforms on Driving Tasks

PART 03 Simulation
Bridging the Sim-to-Real Gap
Long-Tail Scenarios: The Final Piece
Auto-fix artifacts, clipping, and floaters. Ensures training-grade simulation fidelity.

Synthetic Data: Cosmos + DiPIR
Cosmos Style Transfer (NVIDIA)
• Input: real 7-view video
• Output: all-weather variants (night / rain / snow / glare)
• Geometry & semantic labels preserved
• One daytime case → train night / snow scenarios
DiPIR Insertive Editing (DeepRoute)
• Insert 3D simulated objects into real data
• Pedestrians / cyclists / animals as sudden obstacles
• Lighting + shadows + ISP style auto-matched
• Systematically generate "impossible-to-capture dangers"

Posttrain: Refine via RL
RL Policy Optimization (RLCS)
• FM Driver generates L (reasoning) + A (action) across multiple rollouts
• Reduces the need for human labeling
• FM Critic evaluates reasoning-action consistency, plus a rule-based reward
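The midtraining slides list three data formats for a single model:

- V + A as an interleaved sequence (v₀ a₀ v₁ a₁ ...),
- V → L + A for the reasoning driver (key event description plus action),
- V + A → L for the analyst/critic (action extraction, explanation, judgement).

The sketch below turns one clip into each format. The `Clip` fields, the string tokens, and the choice to place L before A in the reasoning-driver target are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Clip:
    frames: List[str]                        # V: one visual token per step (hypothetical)
    actions: List[str]                       # A: one action token per step (hypothetical)
    event_description: Optional[str] = None  # L for the reasoning-driver format
    analysis: Optional[str] = None           # L for the analyst/critic format


def driver_sample(clip: Clip) -> List[str]:
    """V + A: interleave frames and actions as v0 a0 v1 a1 ..."""
    if len(clip.frames) != len(clip.actions):
        raise ValueError("each frame needs exactly one action")
    seq: List[str] = []
    for v, a in zip(clip.frames, clip.actions):
        seq.extend([v, a])
    return seq


def reasoning_driver_sample(clip: Clip) -> Dict[str, List[str]]:
    """V -> L + A: condition on frames, predict the event description, then the actions."""
    if clip.event_description is None:
        raise ValueError("reasoning-driver samples need an event description")
    return {"input": list(clip.frames), "target": [clip.event_description, *clip.actions]}


def critic_sample(clip: Clip) -> Dict[str, List[str]]:
    """V + A -> L: condition on the interleaved drive, predict the analysis text."""
    if clip.analysis is None:
        raise ValueError("critic samples need an analysis")
    return {"input": driver_sample(clip), "target": [clip.analysis]}


clip = Clip(
    frames=["v0", "v1"],
    actions=["a0", "a1"],
    event_description="Cyclist merging into lane",         # hypothetical label
    analysis="Decelerate: cyclist merging into lane",
)
```

The same clip feeds all three roles, which is one plausible reading of how a single model can serve as driver, analyst, and critic.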
Unified Foundation Model Powering a Broad Suite of Products
1,000,000: Racing Toward One Million Vehicles Shipped
• Bridging Sim-to-Real for the long tail
• FM-powered closed loop: 200K → 1M
• One unified 40B VLA brain
