
Meta-Learning and Compositional Generalization in Neural Networks

2025-08-31 - ETH Zurich

Doctoral Thesis

Author(s): Schug, Simon
Publication date: 2025
Permanent link: https://doi.org/10.3929/ethz-b-000738789
Rights / license: In Copyright - Non-Commercial Use Permitted

META-LEARNING & COMPOSITIONAL GENERALIZATION IN NEURAL NETWORKS

DISS. ETH NO. 30975

A thesis submitted to attain the degree of
DOCTOR OF SCIENCES (Dr. sc. ETH Zurich)

presented by
SIMON PHILIPP STEPHAN SCHUG
MSc, University of Zurich & ETH Zurich
born on 22.08.1994

accepted on the recommendation of
Prof. Dr. Angelika Steger, examiner
Dr. João Sacramento, co-examiner
Dr. Razvan Pascanu, co-examiner
Prof. Dr. Brenden Lake, co-examiner

Abstract

When the environment changes and makes it hard to reach our goals, we have to adapt. If we had to rely purely on evolution to find a better-suited genetic program, this would be a very tedious process. Luckily, evolution discovered learning, and we are able to adapt and form new behaviors to perform the task at hand using our experience. Taken at face value, learning is just that: we become better at doing that one task. But learning can be slow. Yet we constantly find ourselves in new situations and have to readapt. Fortunately, tasks are rarely completely unknown to us and, remarkably, learning something somewhat familiar is easier. We are in some way able to find common structure between the tasks we learn, form generalizations, and refine our learning strategies over time.

This thesis concerns itself with studying how these abilities can be realized in neural networks. In particular, we study meta-learning, the ability to improve the learning process itself over the course of encountering many tasks with shared structure. And we investigate how a particular form of structure between tasks can be harnessed: compositionality, the property that a small set of constituents can be recombined into many different task combinations. We will begin by reviewing the mathematical foundation of our specific contributions. We detail how meta-learning in neural networks can be formalized both as a hierarchical optimization problem and as a sequence modeling problem. Furthermore, we define what it means for a family of tasks to be compositional and use this definition to formally state the goal of compositional generalization. Equipped with this background, we then present three parts that aim to contribute to our understanding of meta-learning and compositional generalization in neural networks.

In the first part, we develop a simple but exact algorithm for meta-learning via bilevel optimization. Whereas prior algorithms require computing gradients backwards in time or evaluating second-order derivatives, our method simply runs the learning process twice and obtains the meta-gradient by contrasting the two learning outcomes using local meta-plasticity rules.

In the second part, we investigate how meta-learning with modular architectures can capture the compositional structure of a family of tasks. We theoretically characterize the conditions under which hypernetworks, neural networks that produce ad hoc the weights of another neural network tasked with solving a particular problem, are guaranteed to generalize compositionally. We then verify these conditions in a number of experiments, demonstrating that modular, but not monolithic, architectures can learn policies that generalize compositionally when the identified conditions are met.
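To make the hypernetwork idea concrete, here is a minimal sketch in JAX of a linear hypernetwork that maps a task embedding to the weights of a small target MLP. All names, dimensions, and the linear parametrization are illustrative assumptions, not the architectures actually studied in the thesis.

```python
# Minimal hypernetwork sketch in JAX (illustrative only; names, dimensions,
# and the linear parametrization are hypothetical, not the thesis's setup).
import jax
import jax.numpy as jnp

IN_DIM, HIDDEN, OUT_DIM, Z_DIM = 4, 16, 2, 8

# Parameter shapes of the target MLP whose weights are generated.
TARGET_SHAPES = [(IN_DIM, HIDDEN), (HIDDEN,), (HIDDEN, OUT_DIM), (OUT_DIM,)]
N_TARGET = sum(int(jnp.prod(jnp.array(s))) for s in TARGET_SHAPES)

def target_forward(flat_params, x):
    """Run the target MLP using a flat vector of generated weights."""
    params, i = [], 0
    for shape in TARGET_SHAPES:
        n = int(jnp.prod(jnp.array(shape)))
        params.append(flat_params[i:i + n].reshape(shape))
        i += n
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return h @ w2 + b2

def hyper_forward(hyper_w, z, x):
    """Map task embedding z to target weights, then apply the target net."""
    flat_params = z @ hyper_w  # linear hypernetwork: weights are linear in z
    return target_forward(flat_params, x)

key = jax.random.PRNGKey(0)
hyper_w = jax.random.normal(key, (Z_DIM, N_TARGET)) / jnp.sqrt(Z_DIM)
z = jax.random.normal(jax.random.PRNGKey(1), (Z_DIM,))  # task embedding
x = jnp.ones((IN_DIM,))
print(hyper_forward(hyper_w, z, x))  # output of the generated network
```

Under this view, compositional generalization amounts to composing task embeddings z, e.g., recombining the codes of previously seen subtasks, such that the generated weights solve an unseen task combination.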
In the final part, we study meta-learning in Transformers that process compositional tasks as sequences within their context. We draw a formal connection between the multi-head attention mechanism of Transformers and hypernetworks (see the sketch at the end of this abstract). It suggests that Transformers might be able to reuse and recombine operations through the latent code of an implicit hypernetwork. We experimentally validate this hypothesis in two abstract reasoning tasks, revealing a functionally structured latent code that is predictive of the subtasks the learned networks use on unseen task compositions.

Taken together, our findings shed light on the ability of neural networks to meta-learn and generalize compositionally. We conclude by providing an outlook on emerging research questions for the study of neural networks, given the tremendous progress of both machine learning and neuroscience.
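The connection between multi-head attention and hypernetworks drawn in the final part can be sketched as follows; the notation is ours, reconstructed from the description above, and the thesis's precise statement may differ. For query position $i$, with head-wise attention scores $a_{ij}^h$, value projections $W_V^h$, and output projections $W_O^h$, standard multi-head attention computes

\[
y_i \;=\; \sum_{h=1}^{H} W_O^h \sum_{j} a_{ij}^h\, W_V^h x_j
    \;=\; \sum_{j} \Big( \sum_{h=1}^{H} a_{ij}^h\, W_O^h W_V^h \Big)\, x_j .
\]

Read this way, the vector of head-wise scores $a_{ij} = (a_{ij}^1, \ldots, a_{ij}^H)$ acts as a latent code from which an implicit linear hypernetwork, $a_{ij} \mapsto \sum_h a_{ij}^h W_O^h W_V^h$, generates the weights of a value network applied to each input $x_j$.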