
Meta-Learning and Compositional Generalization in Neural Networks

2025-08-31 - ETH Zurich

Doctoral Thesis

Author(s): Schug, Simon
Publication date: 2025
Permanent link: https://doi.org/10.3929/ethz-b-000738789
Rights / license: In Copyright - Non-Commercial Use Permitted

META-LEARNING & COMPOSITIONAL GENERALIZATION IN NEURAL NETWORKS

DISS. ETH NO. 30975

A thesis submitted to attain the degree of
DOCTOR OF SCIENCES (Dr. sc. ETH Zurich)

presented by
SIMON PHILIPP STEPHAN SCHUG
MSc, University of Zurich & ETH Zurich
born on 22.08.1994

accepted on the recommendation of
Prof. Dr. Angelika Steger, examiner
Dr. João Sacramento, co-examiner
Dr. Razvan Pascanu, co-examiner
Prof. Dr. Brenden Lake, co-examiner

Abstract

When the environment changes and makes it hard to reach our goals, we have to adapt. If we had to rely purely on evolution to find a better-suited genetic program, this would be a very tedious process. Luckily, evolution discovered learning, and we are able to adapt and form new behaviors to perform the task at hand using our experience. Taken at face value, learning is just that: we become better at doing that one task. But learning can be slow. Yet we constantly find ourselves in new situations and have to readapt. Fortunately, tasks are rarely completely unknown to us and, remarkably, learning something somewhat familiar is easier. We are in some way able to find common structure between the tasks we learn, form generalizations, and refine our learning strategies over time.

This thesis concerns itself with studying how these abilities can be realized in neural networks. In particular, we study meta-learning, the ability to improve the learning process itself over the course of encountering many tasks with shared structure. And we investigate how a particular form of structure between tasks can be harnessed: compositionality, the property that a small set of constituents can be recombined into many different task combinations. We will begin by reviewing the mathematical foundation of our specific contributions. We detail how meta-learning in neural networks can be formalized both as a hierarchical optimization problem and as a sequence modeling problem. Furthermore, we define what it means for a family of tasks to be compositional and use this definition to formally state the goal of compositional generalization. Equipped with this background, we then present three parts that aim to contribute to our understanding of meta-learning and compositional generalization in neural networks.

In the first part, we develop a simple but exact algorithm for meta-learning via bilevel optimization. Whereas prior algorithms require computing gradients backwards in time or evaluating second-order derivatives, our method simply runs the learning process twice and obtains the meta-gradient by contrasting the two learning outcomes using local meta-plasticity rules.

In the second part, we investigate how meta-learning with modular architectures can capture the compositional structure of a family of tasks. We theoretically characterize the conditions under which hypernetworks, neural networks that produce ad hoc the weights of another neural network tasked with solving a particular problem, are guaranteed to generalize compositionally. We then verify these conditions in a number of experiments, demonstrating that modular, but not monolithic, architectures can learn policies that generalize compositionally when the identified conditions are met.
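To make the hypernetwork idea concrete, here is a minimal sketch in JAX of a linear hypernetwork that maps a task embedding to the weights of a small target MLP. All names, dimensions, and the linear parametrization are illustrative assumptions, not the architectures actually studied in the thesis.

```python
# Minimal hypernetwork sketch in JAX (illustrative only; names, dimensions,
# and the linear parametrization are hypothetical, not the thesis's setup).
import jax
import jax.numpy as jnp

IN_DIM, HIDDEN, OUT_DIM, Z_DIM = 4, 16, 2, 8

# Parameter shapes of the target MLP whose weights are generated.
TARGET_SHAPES = [(IN_DIM, HIDDEN), (HIDDEN,), (HIDDEN, OUT_DIM), (OUT_DIM,)]
N_TARGET = sum(int(jnp.prod(jnp.array(s))) for s in TARGET_SHAPES)

def target_forward(flat_params, x):
    """Run the target MLP using a flat vector of generated weights."""
    params, i = [], 0
    for shape in TARGET_SHAPES:
        n = int(jnp.prod(jnp.array(shape)))
        params.append(flat_params[i:i + n].reshape(shape))
        i += n
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return h @ w2 + b2

def hyper_forward(hyper_w, z, x):
    """Map task embedding z to target weights, then apply the target net."""
    flat_params = z @ hyper_w  # linear hypernetwork: weights are linear in z
    return target_forward(flat_params, x)

key = jax.random.PRNGKey(0)
hyper_w = jax.random.normal(key, (Z_DIM, N_TARGET)) / jnp.sqrt(Z_DIM)
z = jax.random.normal(jax.random.PRNGKey(1), (Z_DIM,))  # task embedding
x = jnp.ones((IN_DIM,))
print(hyper_forward(hyper_w, z, x))  # output of the generated network
```

Under this view, compositional generalization amounts to composing task embeddings z, e.g., recombining the codes of previously seen subtasks, such that the generated weights solve an unseen task combination.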
In the final part, we study meta-learning in Transformers that process compositional tasks as sequences within their context. We draw a formal connection between the multi-head attention mechanism of Transformers and hypernetworks (see the sketch at the end of this abstract). It suggests that Transformers might be able to reuse and recombine operations through the latent code of an implicit hypernetwork. We experimentally validate this hypothesis in two abstract reasoning tasks, revealing a functionally structured latent code that is predictive of the subtasks the learned networks use on unseen task compositions.

Taken together, our findings shed light on the ability of neural networks to meta-learn and generalize compositionally. We conclude by providing an outlook on emerging research questions for the study of neural networks, given the tremendous progress of both machine learning and neuroscience.
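The connection between multi-head attention and hypernetworks drawn in the final part can be sketched as follows; the notation is ours, reconstructed from the description above, and the thesis's precise statement may differ. For query position $i$, with head-wise attention scores $a_{ij}^h$, value projections $W_V^h$, and output projections $W_O^h$, standard multi-head attention computes

\[
y_i \;=\; \sum_{h=1}^{H} W_O^h \sum_{j} a_{ij}^h\, W_V^h x_j
    \;=\; \sum_{j} \Big( \sum_{h=1}^{H} a_{ij}^h\, W_O^h W_V^h \Big)\, x_j .
\]

Read this way, the vector of head-wise scores $a_{ij} = (a_{ij}^1, \ldots, a_{ij}^H)$ acts as a latent code from which an implicit linear hypernetwork, $a_{ij} \mapsto \sum_h a_{ij}^h W_O^h W_V^h$, generates the weights of a value network applied to each input $x_j$.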