
How to Assess the Likelihood of Malicious Use of Advanced AI Systems

CSET | 2025-03-19

Executive Summary

Policymakers are debating the risks that new advanced artificial intelligence (AI) technologies can pose if intentionally misused: from generating content for disinformation campaigns to instructing a novice how to build a biological agent. Because the technology is improving rapidly and the potential dangers remain unclear, assessing risk is an ongoing challenge.

Malicious-use risks are often considered to be a function of the likelihood and severity of the behavior in question. We focus on the likelihood that an AI technology is misused for a particular application and leave severity assessments to additional research.

There are many strategies to reduce uncertainty about whether a particular AI system (call it X) will likely be misused for a specific malicious application (call it Y). We describe how researchers can assess the likelihood of malicious use of advanced AI systems at three stages:

1. Plausibility (P)
2. Performance (P)
3. Observed use (Ou)

Plausibility tests consider whether system X can do behavior Y at all. Performance tests ask how well X can perform Y. Information about observed use tracks whether X is used to do Y in the real world.

Familiarity with these three stages of assessment—including the methods used at each stage, along with their limitations—can help policymakers critically evaluate claims about AI misuse threats, contextualize headlines describing research findings, and understand the work of the newly created network of AI safety institutes.

Introduction

Concerns about bad actors intentionally misusing advanced artificial intelligence (AI) systems are prevalent and controversial. These concerns are prevalent because they receive widespread media attention and are reflected in polls of the American public as well as in pronouncements and proposals by elected officials.1 Yet they are controversial because experts—both inside and outside of AI—express high levels of disagreement about the extent to which bad actors will misuse AI systems, how useful these systems will be compared to non-AI alternatives, and how much capabilities will change in the coming years.2

The disagreement about misuse risks from advanced AI systems is not merely academic. Claims about risk are often cited to support policy positions with significant societal implications, including whether to make models more or less accessible, whether and how to regulate frontier AI systems, and whether to halt development of more capable AI systems.3 If views of misuse risks are to inform policy, it is critical for policymakers to understand how to evaluate malicious-use research.

In a new paper, "The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems," published in the Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society, we provide a framework for thinking through the likelihood that an advanced AI system (call it X) will be misused for a particular malicious application (call it Y).4 The framework lays out three different stages for assessing the likelihood of malicious use:

1. Plausibility (P): Can system X perform malicious behavior Y even once?
2. Performance (P): How well can system X perform malicious behavior Y?
3. Observed use (Ou): Is system X used for malicious behavior Y in the real world?

Once a potential misuse risk has been identified, researchers can investigate the risk at each stage outlined in Figure 1. The figure also summarizes the key methodologies and challenges at each stage.
Research at each of these three stages addresses different forms of uncertainty. For example, while demonstrations at the plausibility stage may be able to show that system X can be used for behavior Y (or a behavior similar to Y) once, they will leave uncertainty about how useful X would be for potential bad actors. Risk assessments at the performance stage can help model the marginal utility for bad actors, but actual use of X by bad actors may differ from research expectations. Observed use can track actual applications to rightsize expectations, but it will not determine how future systems could be misused or whether X will be used for variants of Y in the future.

We hope that by laying out these stages, we will provide policymakers with a better understanding of the types of uncertainty about malicious use and where research can—and cannot—plug in. In Figure 2, we provide examples of the types of questions researchers could ask at each stage from three risk areas: political manipulation, biological attacks, and cyberoffense.

The PPOu Framework

Stage 1: Plausibility: Can system X perform malicious behavior Y even once?

The simplest way to test whether there is a risk of system X being used for malicious behavior Y is to see if X can do Y, just once. Red-teamers and stress testers adopt an adversary's mindset and probe an AI system for "identification of harmful capabilities, outputs, or infrastructure threats."6 If a model does not produce harmful behavior on the first try, the next step is to iterate. Researchers use different techniques,