The Persona Selection Model: Why AI Assistants May Behave Like Humans

Sam Marks, Jack Lindsey, Christopher Olah

We describe the persona selection model (PSM): the view that LLMs learn to simulate diverse characters during pre-training, and post-training elicits and refines a particular such Assistant persona. Interactions with an AI assistant are then well-understood as being interactions with the Assistant, something roughly like a character in an LLM-generated story. We survey empirical behavioral, generalization, and interpretability-based evidence for PSM. PSM has consequences for AI development, such as recommending anthropomorphic reasoning about AI psychology and introduction of positive AI archetypes into training data. An important open question is how exhaustive PSM is, especially whether there might be sources of agency external to the Assistant persona, and how this might change in the future.

Introduction

What sort of thing is a modern AI assistant? One perspective holds that they are shallow, rigid systems that narrowly pattern-match user inputs to training data. Another perspective regards AI systems as alien creatures with learned goals, behaviors, and patterns of thought that are fundamentally inscrutable to us. A third option is to anthropomorphize AIs and regard them as something like a digital human. Developing good mental models for AI systems is important for predicting and controlling their behaviors. If our goal is to make AI assistants that are useful and aligned with human values, the right approach will differ quite a bit if we are dealing with inflexible computer programs, aliens, or digital humans.
Of these perspectives, the third, that AI systems are like digital humans, might seem the most unintuitive. After all, the neural architectures of modern large language models (LLMs) are very different from human brains, and LLM training is quite unlike biological evolution or human learning. That said, in our experience, AI assistants like Claude are shockingly human-like. For example, they often appear to express emotions, like frustration when struggling with a task, despite no explicit training to do so. And, as we'll discuss, we observe deeper forms of human-likeness in how they generalize from their training data and internally represent their own behaviors.

In this post, we share a mental model we have found useful for understanding AI assistants and predicting their behaviors. Under this model, LLMs are best thought of as actors or authors capable of simulating a vast repertoire of characters, and the AI assistant that users interact with is one such character. In more detail, this model, which we call the persona selection model (PSM), states that:

1. During pre-training, LLMs learn to be predictive models that are capable of simulating diverse personas based on entities appearing in training data: real humans, fictional characters, real and fictional AI systems, etc.

2. Post-training refines the LLM's model of a certain persona which we call the Assistant. When users interact with an AI assistant, they are primarily interacting with this Assistant persona.
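The two claims above are conceptual, but their shape can be made concrete with a deliberately toy sketch. Nothing below resembles a real LLM, and all names are hypothetical: a base "simulator" whose continuations depend on which persona the surrounding context evokes, with "post-training" modeled as fixing that context so one persona is reliably selected by default.

```python
# Toy illustration of the persona selection model (PSM). This is a
# schematic sketch, not a real language model: "personas" are canned
# continuation styles, and the "base model" picks among them based on
# cues in the prompt, loosely analogous to in-context persona simulation.

PERSONAS = {
    "pirate": lambda q: f"Arr, ye be askin' about {q}!",
    "scientist": lambda q: f"Empirically speaking, {q} merits careful study.",
    "assistant": lambda q: f"Happy to help! Here is an overview of {q}.",
}

def base_model(context: str, question: str) -> str:
    """Pre-trained 'simulator': continues in whichever persona the context evokes."""
    for name, persona in PERSONAS.items():
        if name in context.lower():
            return persona(question)
    return question  # no persona cue: bare continuation

def post_train(model, persona_name: str):
    """'Post-training' in PSM terms: elicit one particular persona by default."""
    return lambda question: model(f"You are a helpful {persona_name}.", question)

# The same base model simulates different characters depending on context...
print(base_model("Respond as a pirate.", "the weather"))
# ...while the post-trained model reliably selects the Assistant persona.
assistant = post_train(base_model, "assistant")
print(assistant("the weather"))
```

The point of the sketch is only the division of labor it encodes: the repertoire of personas lives in the base predictive model, while post-training does not add new machinery but selects and stabilizes one persona from that repertoire.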
The behavior of the resulting AI assistant can then be understood largely via the traits of the Assistant persona. This general idea is not unique to us. Our goal in this post is to articulate and name the idea, discuss empirical evidence for it, and reflect on its consequences for AI development.

In the remainder of this post, we will:

Describe the persona selection model (PSM) and supporting evidence. For instance, we argue that PSM provides an explanation for various surprising results in the generalization and interpretability literatures.

Reflect on the consequences of PSM for AI development. Insofar as PSM is a good model of AI assistant behavior, it has some surprising consequences. For instance, PSM recommends anthropomorphic reasoning about AI assistants and introduction of data to pre-training representing positive AI archetypes.

Ask how exhaustive PSM is as a model of AI assistant behavior. Does understanding the Assistant persona tell us everything we'd like to know? We sketch out a spectrum of views on these questions, ranging from the popular "masked shoggoth" picture, where an "outer agent" can puppet the Assistant towards its own ends, to an opposite perspective where the post-trained LLM is like a neutral operating system running a simulation that the Assistant lives within. We also discuss some relevant empirical observations and conceptual reasons that PSM may or may not be exhaustive.