
Mapping AI Risk Mitigations: Evidence Scan and Preliminary AI Risk Mitigation Taxonomy


Alexander K. Saeri1,2,*, Sophia Lloyd George1,3, Jess Graham, Clelia D. Lacarriere1, Peter Slattery1, Michael Noetel2, Neil Thompson
1 MIT FutureTech, 2 The University of Queensland, 3 Cambridge Boston Alignment Initiative

Abstract

Organizations and governments that develop, deploy, use, and govern AI must coordinate on effective risk mitigation. However, the landscape of AI risk mitigation frameworks is fragmented, uses inconsistent terminology, and has gaps in coverage. This paper introduces a preliminary AI Risk Mitigation Taxonomy to organize AI risk mitigations and provide a common frame of reference. The Taxonomy was developed through a rapid evidence scan of 13 AI risk mitigation frameworks published between 2023 and 2025, which were extracted into a living database of 831 distinct AI risk mitigations. The mitigations were iteratively clustered and coded to create the Taxonomy. The preliminary AI Risk Mitigation Taxonomy organizes mitigations into four categories: (1) Governance & Oversight: formal organizational structures and policy frameworks that establish human oversight mechanisms and decision protocols; (2) Technical & Security: technical, physical, and engineering safeguards that secure AI systems and constrain model behaviors; (3) Operational Process: processes and management frameworks governing AI system deployment,

1 Introduction

To address risks from increasingly capable Artificial Intelligence (AI), effective mitigations must be developed and implemented. For this task, many actors, from researchers to industry leaders, must coordinate.

However, as awareness and concern about AI risks have increased (Center for AI Safety, 2023; Bengio et al., 2025), the field has become more fragmented and less coordinated (Slattery et al., 2024). Organizations that develop, deploy, use, and govern AI have generated a variety of proposed mitigations, safeguards, and governance mechanisms to address risks (e.g., NIST, 2024; Eisenberg, 2025).
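The clustering and coding of extracted mitigations into top-level categories can be illustrated with a toy sketch. This is not the authors' method: the keyword lists, the `code_mitigation` helper, and the example statements are assumptions made here for illustration; only the category names come from the Taxonomy described above (the fourth category is truncated in this extract, so only three are shown).

```python
# Toy illustration (NOT the authors' pipeline): assigning extracted mitigation
# statements to top-level Taxonomy categories by keyword coding. The keyword
# lists and helper name are hypothetical; category names come from the paper.
CATEGORY_KEYWORDS = {
    "Governance & Oversight": ["oversight", "policy", "board", "accountability"],
    "Technical & Security": ["encrypt", "access control", "guardrail", "sandbox"],
    "Operational Process": ["deployment", "incident", "monitoring", "audit"],
}

def code_mitigation(description: str) -> str:
    """Return the first category whose keywords appear in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return category
    return "Uncategorized"

print(code_mitigation("Establish a board-level AI oversight committee"))
# Governance & Oversight
```

In practice the paper describes iterative clustering and coding by the researchers, not a fixed keyword rule; the sketch only shows the shape of the mapping from mitigation statements to categories.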
Frameworks, standards, and other documents approach mitigations from different disciplinary or practice backgrounds, and use diverging terminology, different theories, and inconsistent classifications. Some focus on adapting established mitigations from cybersecurity or other safety-critical domains.

This fragmented landscape has theoretical and practical consequences. A lack of shared definitions and structures makes incremental scientific progress challenging. Reinvention and duplication also lead to fragmentation and confusion. For example, "red teaming" can include many different methods to evaluate many different threat models, with little consensus on who should perform it (Feffer, 2024). Without an accessible, pragmatic shared understanding of risk mitigations, these actors struggle to develop, implement, and coordinate mitigations. As noted by the U.S.–EU Trade and

These challenges are compounded by the rapid and accelerating pace of AI development and adoption. The share of organizations using AI in at least one business function quadrupled from 20% in 2017 to 80% in 2024 (Singla et al., 2024). Adoption of highly capable general-purpose AI agents tripled between Q1 (11%) and Q2 (33%) of 2025 alone (KPMG, 2025). This expansion significantly increases the number of stakeholders who must implement mitigations. It also increases

To address this gap, we conducted an evidence scan of public AI risk mitigation frameworks, with the aim of identifying, extracting, and systematizing mitigations across policy, technical, and risk management reports. We used methods adapted from evidence synthesis approaches (Khangura, 2012) and framework synthesis approaches (Carroll et al., 2011; 2013) to identify and extract mitigations into a publicly accessible AI Risk Mitigation Database. These mitigations were then iteratively clustered and coded to create the preliminary AI Risk Mitigation Taxonomy.

The major contribution of this work is in creating a common frame of reference for AI risk mitigations.
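Extracting mitigations from many frameworks into a database of distinct entries implies collapsing overlapping statements. A minimal sketch of one way such deduplication could work, assuming a hypothetical normalization rule and example statements (none of this is the authors' actual pipeline):

```python
# Illustrative sketch (NOT the authors' pipeline): collapsing near-duplicate
# mitigation statements extracted from multiple frameworks into distinct
# entries, as when 13 frameworks yield 831 distinct mitigations.
import re

def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", no_punct).strip()

def distinct_mitigations(extracted: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized statement."""
    seen, out = set(), []
    for item in extracted:
        key = normalize(item)
        if key not in seen:
            seen.add(key)
            out.append(item)
    return out

raws = [
    "Red-team models before release.",
    "Red team models before release",   # near-duplicate after normalization
    "Maintain an incident response plan.",
]
print(len(distinct_mitigations(raws)))  # 2
```

Real evidence-synthesis deduplication typically also involves human judgment about whether two differently worded mitigations are substantively the same; the sketch captures only the mechanical part.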
Both the Database and the Taxonomy are released publicly on the AI Risk Initiative website (airisk.mit.edu) for iteration, feedback, and use, because (1) we observe growing demand for a common frame of reference for AI risk mitigations.

The preliminary AI Risk Mitigation Database and Taxonomy together provide an empirical and conceptual foundation for a more coordinated, comprehensive approach to mitigating AI risks. They are intended to support a wide range of actors and stakeholders in identifying, developing, and implementing AI risk mitigations.

2 Methods

2.1 Definitions

• We define artificial intelligence (AI) as "systems or machines capable of performing tasks that typically require human intelligence" (Bengio et al., 2025).
• We define AI risk as "the possibility of an unfortunate occurrence that may emerge from the development, deployment or use of AI", after the Society for Risk Analysis (Aven et al.).

We therefore define an AI risk mitigation as "an action that reduces the likelihood or impact of an unfortunate occurrence that may emerge from the development, deployment, or use of systems or machines capable of performing tasks that typically require human intelligence".

2.2 Overview of approach

Our overall approach was a rapid evidence scan, a modified type of evidence synthesis (Khangura, 2012).
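Under the definitions above, each entry in a mitigation database can be modeled as a small record. A minimal sketch, assuming hypothetical field names (the actual Database schema is not specified in this extract):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mitigation:
    """One extracted mitigation; field names are illustrative assumptions."""
    description: str        # the mitigation statement itself
    category: str           # a top-level Taxonomy category
    source_framework: str   # which of the 13 scanned frameworks it came from
    year: int               # frameworks were published 2023-2025

m = Mitigation(
    description="Conduct pre-deployment red teaming of frontier models",
    category="Operational Process",
    source_framework="Example Framework (hypothetical)",
    year=2024,
)
assert 2023 <= m.year <= 2025  # publication window from the evidence scan
```

A frozen dataclass is a natural fit here because database entries are extracted records that should not be mutated after coding.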