
Generative Augmented Reality: Paradigms, Technologies, and Future Applications


Chen Liang¹,⁴, Jiawen Zheng¹, Yufeng Zeng¹, Yi Tan³, Hengye Lyu¹, Yuhui Zheng¹, Zisu Li², Yueting Weng⁴, Jiaxin Shi⁴ and Hanwang Zhang³

¹The Hong Kong University of Science and Technology (Guangzhou)
²The Hong Kong University of Science and Technology
³Nanyang Technological University
⁴XMax.AI Ltd.

Contact: chenliang2@hkust-gz.edu.cn, jiaxin@xmax.ai, hanwangzhang@ntu.edu.sg

arXiv:2511.16783v1 [cs.HC] 20 Nov 2025

Abstract

This paper introduces Generative Augmented Reality (GAR) as a next-generation paradigm that reframes augmentation as a process of world re-synthesis rather than world composition by a conventional AR engine. GAR replaces the conventional AR engine's multi-stage modules with a unified generative backbone, where environmental sensing, virtual content, and interaction signals are jointly encoded as conditioning inputs for continuous video generation. We formalize the computational correspondence between AR and GAR, survey the technical foundations that make real-[...] building, and mixed-reality ecosystems.

1 Introduction

Augmented Reality (AR) emerged in response to the long-standing goal of blending digital content with physical environments, grounded in users' real-world perception and action. Early formulations, such as Thomas and David [1992]'s work on overlaying digital instructions for aircraft assembly and Milgram and Kishino [1994]'s Reality–Virtuality continuum, situated AR as an intermediate blend between virtual reality and physical reality. As advances in sensing, spatial tracking, and real-time rendering [Azuma, 1997a] made [...]

However, as technological progress elevates expectations for content fidelity, interaction precision, and naturalistic responsiveness in AR, the compositional paradigm underlying conventional AR architectures reveals inherent constraints. [...] pipelines. This structure makes it difficult to synthesize high-fidelity interactions such as fluid material behaviors, complex mechanical dynamics, and even the responsiveness of living creatures. Scaling toward broader expressive spaces often increases authoring burden and system fragility: producing high-fidelity 3D assets demands substantial manual [...]

In parallel, the rapid advancement of generative models, particularly diffusion-based video generation models [Ho et al., 2022; Kong et al., 2024], has introduced a fundamentally different way of constructing visual experience. These models can produce temporally coherent, semantically grounded videos that depict, and extend beyond, both physical and imaginary world content, driven by high-level conditions such as textual intent [Luo et al., 2023], motion cues [Bai et al., 2025], reference frames [Hu, 2024], or behavioral signals [Guo et al., 2025]. Rather than treating scenes as fixed backdrops for augmentation, generative video models represent reality as a learnable, extendable process, where physi[...]

This paper presents a forward-looking conceptual and technical survey of Generative Augmented Reality as a computational framework for next-generation spatial computing. Our contributions are as follows:

• We formalize the computational transition from compositional AR pipelines to generative world re-synthesis, providing a comparative formulation of their perceptual grounding, control flow, asset management, and rendering mechanisms (see the sketch after this list).
• We survey the enabling technologies underlying GAR, including streaming video generation models, compu[...]
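To preview that comparative formulation, one plausible way to write the per-frame contrast is given below. The notation is ours, not necessarily the paper's: A for authored assets, p_t for the tracked pose, I_t for the camera frame, R for the renderer, G_θ for the generative backbone, and c_t for the joint conditioning signal.

```latex
% Conventional AR: per-frame composition of pre-authored assets,
% followed by explicit rendering over the camera frame
y_t = \mathcal{R}\big(\mathrm{Compose}(\mathcal{A},\, p_t),\; I_t\big)

% GAR: continuous re-synthesis by a single generative backbone,
% conditioned jointly on sensing, content intent, and interaction
y_t = G_\theta\big(y_{<t},\, \mathbf{c}_t\big),
\qquad
\mathbf{c}_t = \mathrm{Enc}\big(I_t,\ \mathrm{intent},\ \mathrm{interaction}_t\big)
```

In the first equation the pipeline factors into independently engineered modules (tracking, asset management, rendering); in the second, those responsibilities collapse into one model whose conditioning input carries the environmental, content, and interaction signals described in the abstract.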
2 Generative Augmented Reality: The Next Generation of Spatial Computing

In this section, we present the paradigm of Generative Augmented Reality (GAR) in the context of the rapid development of generative video models. GAR rethinks the pathways to achieve augmentation of reality, representing a shift in the [...]

To ground this paradigm, we first revisit the fundamentals of traditional augmented reality (AR), outline its technology stack and implementation hierarchy, and then explain how GAR transforms this architecture into a model-driven [...]

2.1 Concept of Augmented Reality

The conceptual basis of AR was first formalized by Milgram et al. [1995] through the Reality–Virtuality Continuum, which positioned AR within a spectrum ranging from purely physical to fully virtual environments. Later, Azuma [1997a] defined AR by three essential characteristics: 1) the combination of real and virtual content, 2) real-time interactivity, and 3) accurate three-dimensional registration.

Building on these principles, Craig [2013] and Billinghurst et al. [2015] summarized AR as a multidisciplinary synthesis of computer vision, graphics, sensing, and interaction, designed to enable spatial coherence between the physical and virtual worlds. These frameworks define AR as a [...]

More recently, Mendoza-Ramírez et al. [2023] highlight advances in semantic anchoring and adaptive context modeling that extend AR beyond geometric registration, while Auda et al. [2023] frame AR within cross-reality systems, emphasizing embodied and context-driven interaction. Together, these recent perspectives expand foundational def[...]

2.2 Traditional Augmented Reality Architecture and Technical Stacks
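The excerpt of Section 2.2 breaks off here. As an illustrative companion to the architectural contrast above, the sketch below shows how the two paradigms differ as frame loops. It is a minimal, hypothetical illustration under our own naming: none of these classes or functions come from the paper or from any AR/ML toolkit, and the generator step is a stub standing in for a streaming video-generation backbone.

```python
"""Minimal, hypothetical contrast between a compositional AR frame loop
and a GAR-style generative loop. All names are illustrative stand-ins."""

import numpy as np

H, W = 480, 640  # stand-in frame resolution


def sense_environment(t: int) -> dict:
    """Stand-in for camera capture plus SLAM: a frame and a 6-DoF pose."""
    frame = np.zeros((H, W, 3), dtype=np.uint8)  # camera image placeholder
    pose = np.eye(4)                             # tracked camera pose
    return {"frame": frame, "pose": pose, "time": t}


# --- Conventional AR: explicit multi-stage composition -------------------
def ar_frame(sensing: dict, assets: list) -> np.ndarray:
    """Overlay pre-authored assets onto the camera frame. Tracking,
    registration, and rendering are separate, explicit stages."""
    out = sensing["frame"].copy()
    for asset in assets:
        # Registration stub: a real system would project the asset into
        # screen space using sensing["pose"]; we use a fixed position.
        u, v = 100, 100
        out[v:v + asset.shape[0], u:u + asset.shape[1]] = asset
    return out


# --- GAR: unified generative re-synthesis ---------------------------------
def generator_step(conditioning: dict) -> np.ndarray:
    """Stub for one rollout step of a streaming video-generation model."""
    return np.zeros((H, W, 3), dtype=np.uint8)


def gar_frame(history: list, sensing: dict, intent: str,
              interaction: dict) -> np.ndarray:
    """Re-synthesize the next frame: sensing, content intent, and
    interaction signals are jointly encoded as one conditioning input."""
    conditioning = {
        "context": history[-4:],     # recent frames for temporal coherence
        "sensing": sensing,          # environmental grounding
        "intent": intent,            # e.g. a text prompt for new content
        "interaction": interaction,  # e.g. gaze, touch, controller state
    }
    return generator_step(conditioning)


if __name__ == "__main__":
    assets = [np.full((32, 32, 3), 255, dtype=np.uint8)]  # one white square
    history: list = []
    for t in range(3):
        s = sense_environment(t)
        composed = ar_frame(s, assets)  # AR path: compose then render
        generated = gar_frame(history or [s["frame"]], s,  # GAR path
                              intent="a glass of water tipping over",
                              interaction={"touch": None})
        history.append(generated)
        print(t, composed.shape, generated.shape)
```

The point of the sketch is structural: in the AR path, tracking, registration, and rendering remain separately engineered stages operating on pre-authored assets, while in the GAR path all signals are folded into a single conditioning dictionary consumed by one generator call, mirroring the unified generative backbone described in the abstract.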