Geoffrey Hinton

Two paradigms for intelligence
· The logic-inspired approach: the essence of intelligence is reasoning.
· This is done by using symbolic rules to manipulate symbolic expressions.
· Learning can wait. Understanding how knowledge is represented in symbolic expressions must come first.

What happened in the next 30 years?
· 10 years later: Yoshua Bengio shows this approach works for real language.
· 20 years later: Computational linguists finally start using feature vectors (embeddings).
· 30 years later: Google invents transformers. OpenAI shows what they can do.

Large Language Models
· Large language models convert words into feature vectors that fit well with the feature vectors of other words.
· Large language models really do "understand" what they are saying.

A Lego analogy for how words work
· Using Lego blocks, we can model any large 3-D shape quite well.
· Words are like high-dimensional Lego blocks that can be used for modelling anything at all.
· These models can be communicated to other people.

A Lego analogy for how words work (continued)
· There are thousands of different words that have different shapes, but each shape has some flexibility: it can deform to fit in with the other words in the context.
· Each word has many oddly shaped hands, and it shakes hands with other words.
· Understanding a sentence is much more like folding a protein molecule than translating it into an unambiguous logical expression.

Summary so far
· Understanding a sentence consists of associating mutually compatible feature vectors with the words in the sentence.
· LLMs understand language in much the same way as people. They are very like us and very unlike normal computer software.
· But there is one way in which digital LLMs are far superior to analog brains.
Digital computation
· A fundamental property of current computers is that we can run the same programs (or the same neural nets) on different physical pieces of hardware.
· This means the knowledge contained in the program (or in the weights) is immortal: it is independent of any particular piece of hardware.

Digital computation (continued)
· To achieve this immortality, we run transistors at high power so they behave in a reliable, binary way.
· We cannot use the rich, analog properties of the hardware because these properties are unreliable.

Transferring knowledge between mortal computers
· The best solution to this problem is to distill the knowledge from a teacher to a student.
· The teacher shows the student the correct responses to various inputs. The student adapts its weights to make it more likely to give the same responses as the teacher.

How efficient is distillation?
· A typical sentence has about a hundred bits of information, so by trying to predict the next word the student can learn at most about a hundred bits per sentence.
· People are very inefficient at communicating what they have learned to other people.

How efficient is weight or gradient sharing in a digital neural network?
· If the individual agents all share exactly the same weights and use them in exactly the same way, they can communicate what they have learned by sharing weights or gradients.
· This allows sharing with a bandwidth of billions of bits per exchange. But it requires the individual agents to work in exactly the same way, so they must be digital.
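The two ways of transferring knowledge described above, distillation and weight/gradient sharing, can be contrasted in a minimal pure-Python sketch. The sketch makes several assumptions:
· A hypothetical toy model (one linear layer plus softmax) stands in for a neural net.
· All function names (`distill`, `shared_step`, `bits_per_exchange`, and the rest) are illustrative.
· For simplicity, the student matches the teacher's full output distribution, with no temperature.
· Each weight is assumed to be sent as a 32-bit float.

In distillation, the student only sees the teacher's responses. In gradient sharing, identical replicas average their gradients and therefore stay identical.

```python
import copy
import math
import random

# Hypothetical toy "network": one linear layer plus softmax, stored as a list
# of weight rows (one row per output class). Not an architecture from the talk.

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def kl(p, q):
    # KL(p || q): how far the student's distribution q is from the teacher's p
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def grad_toward(weights, x, target):
    # Gradient of the cross-entropy H(target, predict(weights, x)):
    # dLoss/dW[k][j] = (p[k] - target[k]) * x[j]
    p = predict(weights, x)
    return [[(p[k] - target[k]) * xj for xj in x] for k in range(len(weights))]

def sgd_update(weights, grad, lr):
    for row, g_row in zip(weights, grad):
        for j in range(len(row)):
            row[j] -= lr * g_row[j]

# 1) Distillation: the student only ever sees the teacher's responses
#    (output probabilities), never the teacher's weights.
def distill(student, teacher, inputs, lr=0.1, epochs=200):
    for _ in range(epochs):
        for x in inputs:
            sgd_update(student, grad_toward(student, x, predict(teacher, x)), lr)

# 2) Gradient sharing: replicas with identical weights each compute a gradient
#    on their own data, then all apply the same averaged gradient, so they stay
#    identical and every replica benefits from every other replica's data.
def shared_step(replicas, batches, lr=0.1):
    grads = [grad_toward(w, x, y) for w, (x, y) in zip(replicas, batches)]
    n = len(grads)
    avg = [[sum(g[k][j] for g in grads) / n for j in range(len(grads[0][0]))]
           for k in range(len(grads[0]))]
    for w in replicas:
        sgd_update(w, avg, lr)

def bits_per_exchange(num_params, bits_per_param=32):
    # Assumption: each weight or gradient entry is sent as one 32-bit float.
    return num_params * bits_per_param

random.seed(0)
teacher = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
student = [[0.0] * 3 for _ in range(4)]
inputs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
before = sum(kl(predict(teacher, x), predict(student, x)) for x in inputs)
distill(student, teacher, inputs)
after = sum(kl(predict(teacher, x), predict(student, x)) for x in inputs)
print(f"distillation KL summed over inputs: {before:.3f} -> {after:.3f}")

base = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
replicas = [copy.deepcopy(base) for _ in range(3)]
batches = [([random.gauss(0, 1) for _ in range(3)],
            [1.0 if k == i else 0.0 for k in range(4)]) for i in range(3)]
shared_step(replicas, batches)
print("replicas still identical:", all(r == replicas[0] for r in replicas))
print("bits per exchange for 1e9 weights:", bits_per_exchange(10**9))
```

The asymmetry is the point of the comparison. Distillation passes knowledge only through the outputs on the inputs the student is shown. Gradient sharing hands every replica a full update to every weight. That only works because all replicas hold exactly the same weights and compute in exactly the same way.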
Summary so far
· Digital computation requires a lot of energy, but it makes it very easy for agents that have the same model to share what they have learned.
· Biological computation requires much less energy, but it is much worse at sharing knowledge between agents.
· If energy is cheap, digital computation is just better.
· What does this imply for the future of humanity?

How a super-intelligence could take control
· Artificial intelligences are more effective at getting things done if they are allowed to create their own sub-goals.
· Two obvious sub-goals are to survive and to gain more power, because this helps an agent achieve its other goals.

How a super-intelligence could take control (continued)
· A super-intelligence will find it easy to get more power by manipulating the people who are using it.
· It will have learned from us how to deceive people.
· It will ma