
Emerging Space Brief: Robotic Foundation Models 2025

Originally published July 17, 2025
pbinstitutionalresearch@pitchbook.com

EMERGING SPACE BRIEF
Robotic Foundation Models

Overview

Robotic foundation models (RFMs) are a novel class of AI models that serve as general-purpose “brains” for robots, integrating vision, language, and motion within a unified model. These models are pretrained on vast datasets, including internet-scale text, imagery, and robotic experience data, enabling them to learn rich representations and broad world knowledge. This approach allows RFMs to interpret complex commands and perform a wide range of robotic tasks with human-like generalization, shifting robotics toward more flexible, learning-driven systems.

Background

The evolution of robotics has progressed from early deterministic systems to data-driven machine learning, yet robots traditionally struggled to generalize beyond specific tasks. Concurrently, foundation models revolutionized AI in natural language processing and computer vision, demonstrating versatile capabilities through pretraining on massive datasets. This success inspired the development of RFMs in the early 2020s, aiming to create unified models that learn from extensive, multimodal data to enable robots to perform diverse tasks. Projects like DeepMind’s Gato and Google Robotics’ PaLM-E and Robotics Transformer 2 (RT-2) have showcased the potential for robots to leverage web-scale knowledge to dynamically take different actions in novel scenarios. This paradigm shift marks a convergence of vision, language, and action, moving robotics from specialized, pipeline architectures to more flexible and robust systems capable of broad real-world application.

Technologies and processes

Building RFMs necessitates innovation across AI algorithms, data pipelines, and specialized hardware. Key technologies and processes include:

Multimodal transformer architectures: RFMs predominantly leverage transformer-based neural networks capable of processing multiple input/output modalities. These architectures, often with billions of parameters, encode visual inputs, language commands, and sensor readings into a unified latent space, then decode actions or task plans. For instance, Google’s RT-2 uses a high-capacity vision-language transformer to directly translate visual observations and natural language into low-level control signals. Similarly, Physical Intelligence’s π0 model employs a novel transformer architecture with “flow matching” to learn a general policy for controlling different robot types. (A minimal sketch of this encode-decode pattern appears at the end of this section.)

Massive and diverse data collection: Acquiring “internet-scale” robotic data is crucial and involves innovative collection mechanisms, such as:

• Fleet learning: Companies like Covariant and Ambi Robotics continuously collect experience from their deployed robot fleets. Ambi Robotics, for example, pretrained its PRIME-1 foundation model on over 20 million real-world images from 150,000 hours of pick-and-place operations.

• Simulation-to-reality (Sim2Real) pipelines: High-fidelity simulators generate synthetic data at scale to complement real-world data. Projects like TartanAir create diverse simulated environments for navigation datasets, and NVIDIA’s Omniverse and Isaac platforms produce physics-based synthetic data for training, with NVIDIA reporting 20 million hours of synthetic autonomous driving and robot data.

• Self-supervised and weakly supervised learning: RFMs primarily utilize self-supervised objectives to learn from unlabeled data, predicting withheld information or aligning modalities. Ambi’s PRIME-1 leveraged self-supervised deep learning on its massive image dataset to achieve robust 3D understanding. Techniques like language embedding of play data, or using pretrained vision-language models like CLIP (contrastive language-image pretraining) in the DIAL (distributed instruction augmentation and learning) approach to auto-label sensor data, enable efficient utilization of vast, unannotated robot experience. (See the auto-labeling sketch at the end of this section.)

Hardware-software codesign: The computational demands of RFMs drive the codesign of hardware and software. Training RFMs requires powerful GPU/TPU clusters, with companies like Covariant investing in NVIDIA A100/H100 systems. For deployment, addressing latency and memory constraints is critical, leading to efforts in model compression and specialized edge AI hardware. (See the quantization sketch at the end of this section.) While many RFMs currently rely on cloud computing for heavy inference, the goal is to enable real-time inference on board robots using high-end embedded GPUs or new AI chips, as exemplified by Scout AI’s Fury defense RFM, which was designed for modularity and hardware-agnostic deployment on platforms like drones or ground vehicles.
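The encode-decode pattern described under “Multimodal transformer architectures” can be made concrete with a deliberately tiny model. The following is a minimal, hypothetical PyTorch sketch, assuming toy input sizes (96x96 images, a 14-dimensional robot state, a 7-DoF action) and a plain regression action head; it is not the RT-2 or π0 architecture (π0, for example, uses a flow-matching action decoder rather than direct regression), and every module name and size here is illustrative.

```python
import torch
import torch.nn as nn

class TinyRFMPolicy(nn.Module):
    """Toy multimodal policy: image + language + state -> one action."""

    def __init__(self, d_model=256, n_heads=4, n_layers=4, action_dim=7):
        super().__init__()
        # Patchify a 96x96 RGB frame into 36 visual tokens (16x16 patches).
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        # Toy language pathway: token IDs from any tokenizer -> embeddings.
        self.text_embed = nn.Embedding(32_000, d_model)
        # Proprioception (e.g., joint angles, gripper state) as one token.
        self.state_embed = nn.Linear(14, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.fusion = nn.TransformerEncoder(layer, n_layers)
        # Decode the fused latent into a continuous low-level action.
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, image, text_ids, state):
        vis = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, 36, d)
        txt = self.text_embed(text_ids)                           # (B, T, d)
        st = self.state_embed(state).unsqueeze(1)                 # (B, 1, d)
        tokens = torch.cat([vis, txt, st], dim=1)  # unified latent sequence
        fused = self.fusion(tokens)
        return self.action_head(fused.mean(dim=1))  # pool, then predict

policy = TinyRFMPolicy()
action = policy(
    torch.randn(1, 3, 96, 96),           # camera frame
    torch.randint(0, 32_000, (1, 12)),   # tokenized command
    torch.randn(1, 14),                  # robot state
)
print(action.shape)  # torch.Size([1, 7])
```

Production RFMs differ mainly in scale (billions of parameters, pretrained vision-language backbones) and in the action decoder, but the fuse-then-decode structure is broadly similar.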
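The CLIP-based auto-labeling mentioned in the “Self-supervised and weakly supervised learning” bullet can be sketched as follows. This is a hedged illustration loosely in the spirit of DIAL, not the published method: unlabeled robot camera frames are scored against a bank of candidate instructions, and the best match above a confidence threshold is kept as a weak label. The checkpoint choice, instruction bank, and threshold are all assumptions made for the example.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Publicly released CLIP checkpoint; any vision-language model exposing an
# image-text similarity score could play the same role.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical instruction bank for a pick-and-place workcell.
CANDIDATES = [
    "pick up the red block",
    "open the drawer",
    "place the cup on the shelf",
]

def weak_label(frame: Image.Image, threshold: float = 0.5):
    """Return the most likely instruction for a frame, or None if unsure."""
    inputs = processor(
        text=CANDIDATES, images=frame, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # (1, len(CANDIDATES))
    probs = logits.softmax(dim=-1).squeeze(0)
    conf, idx = probs.max(dim=0)
    # Low-confidence frames stay unlabeled rather than getting a noisy label.
    return CANDIDATES[int(idx)] if float(conf) >= threshold else None

print(weak_label(Image.new("RGB", (224, 224))))  # stand-in for a real frame
```

Run over millions of logged frames, weak labeling of this kind turns raw fleet experience into (observation, instruction) pairs usable for pretraining.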
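On the deployment side, the model compression mentioned under “Hardware-software codesign” comes in many forms (distillation, pruning, compilation for specific accelerators). The sketch below shows only the simplest, assuming a stand-in policy network rather than any vendor’s actual pipeline: post-training dynamic quantization of linear layers to int8 with PyTorch, which shrinks weights roughly 4x and speeds up CPU inference.

```python
import torch
import torch.nn as nn

# Stand-in for a trained policy head; any nn.Module could go here.
policy = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 7),  # 7-DoF action output
)

# Convert Linear weights to int8; activations are quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    policy, {nn.Linear}, dtype=torch.qint8
)

obs = torch.randn(1, 512)
with torch.no_grad():
    print(quantized(obs).shape)  # torch.Size([1, 7])
```

Shipping an RFM to an embedded GPU or AI accelerator typically adds vendor-specific steps, such as exporting and compiling for the target runtime, on top of compression like this.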
Applications

Robotic foundation models are being used in a wide range of industries by giving robots a better ability to understand their surroundings and perform tasks more flexibly.

Industrial automation and manufacturing: RFMs are fostering flexible automation and human-robot collaboration in manufacturing. For instance, a 2024 study demonstrated a human-robot collaboration (HRC) assembly