Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, Cheston Tan

Abstract—There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", where AI algorithms and agents no longer learn from datasets of images, videos or text curated primarily from the internet. Instead, they learn through interactions with their environments from an egocentric perception similar to humans. Consequently, there has been substantial growth in the demand for embodied AI simulators to support various embodied AI research tasks. This growing interest in embodied AI is beneficial to the greater pursuit of Artificial General Intelligence (AGI), but there has not been a contemporary and comprehensive survey of this field. This paper aims to provide an encyclopedic survey of the field of embodied AI, from its simulators to its research, covering nine current embodied AI simulators and three main research tasks.

Index Terms—Embodied AI, Computer Vision, 3D Simulators.

I. INTRODUCTION

RECENT advances in deep learning, reinforcement learning, computer graphics and robotics have garnered growing interest in developing general-purpose AI systems. As a result, there has been a shift from "internet AI", which focuses on learning from datasets of images, videos and text curated from the internet, towards "embodied AI", which enables artificial agents to learn through interactions with their surrounding environments. Embodied AI is the belief that true intelligence can emerge from the interactions of an agent with its environment [1]. But for now, embodied AI is about incorporating traditional intelligence concepts from vision, language, and reasoning into an artificial embodiment to help solve AI problems.

The growing interest in embodied AI has led to significant progress in embodied AI simulators that aim to faithfully replicate the physical world. These simulated worlds serve as virtual testbeds in which to train and test embodied AI frameworks before deploying them into the real world. Embodied AI simulators also facilitate the collection of task-based datasets [2], [3], which are tedious to collect in the real world because replicating the same settings as in the virtual world requires an extensive amount of manual labor.

While there have been several earlier surveys, there is a scarcity of contemporary, comprehensive survey papers on this emerging field. To address this, we propose this survey of embodied AI, from its simulators to its research tasks. This paper covers the following nine embodied AI simulators developed over the past four years: DeepMind Lab [12], AI2-THOR [13], CHALET [14], VirtualHome [15], VRKitchen [16], Habitat-Sim [17], iGibson [18], SAPIEN [19], and ThreeDWorld [20]. The chosen simulators are designed for general-purpose intelligence tasks, unlike game simulators [21], which are only used for training reinforcement learning agents.

Embodied AI simulators have given rise to a series of potential embodied AI research tasks, such as visual exploration, visual navigation and embodied QA. We will focus on these three tasks, since most existing papers [11], [22], [23] in embodied AI either focus on these tasks or make use of modules introduced for these tasks to build models for more complex tasks such as audio-visual navigation. These three tasks are also connected in increasing complexity: visual exploration is a very useful component in visual navigation [22], [24].

Game-based scenes are constructed from 3D assets, while world-based scenes are constructed from real-world scans of the objects and the environment. A 3D environment constructed entirely out of 3D assets often has built-in physics features and well-segmented object classes, compared to a 3D mesh of an environment made from real-world scanning. The clear object segmentation of 3D assets makes it easy to model them as articulated objects with movable joints.

This work was supported by the Agency for Science, Technology and Research (A*STAR), Singapore under its AME Programmatic Funding Scheme (Award #A18A2b0046) and the National Research Foundation, Singapore under its NRF-ISF Joint Call (Award NRF2015-NRF-ISF001-2541).
J. Duan was with the School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore 639798, Singapore (e-mail: duan0038@e.ntu.edu.sg).
S. Yu was with the Singapore University of Technology and Design, Singapore 487372, Singapore (e-mail: samsonyu@sutd.edu.sg).
H.L. Tan, H. Zhu, and C. Tan are with the Institute for Infocomm Research, A*STAR, Singapore 138632, Singapore (e-mail: {duan jiafei, hltan, zhuh, cheston-tan}@i2r.a-star.edu.sg).
Manuscript accepted December 4, 2021, IEEE-TETCI.
arXiv:2103.04918v8 [cs.AI] 5 Jan 2022.
©2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.