
Information-Theoretic Foundations for Machine Learning

2025-06-01 - Stanford University

© 2025 by Hong Jun Jeon. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution 3.0 United States License.
http://creativecommons.org/licenses/by/3.0/us/
This dissertation is online at: https://purl.stanford.edu/gx002mv2026

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Benjamin Van Roy, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Dorsa Sadigh, Co-Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Tatsunori Hashimoto

Approved for the Stanford University Committee on Graduate Studies.
Stacey F. Bent, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format.

Information-Theoretic Foundations for Machine Learning

Hong Jun Jeon (1) and Benjamin Van Roy (2,3)
(1) Department of Computer Science, Stanford University
(2) Department of Electrical Engineering, Stanford University
(3) Department of Management Science and Engineering, Stanford University

Abstract

The progress of machine learning over the past decade is undeniable. In retrospect, it is both remarkable and unsettling that this progress was achievable with little to no rigorous theory to guide experimentation. Despite this fact, practitioners have been able to guide their future experimentation via observations from previous large-scale empirical investigations. However, alluding to Plato's Allegory of the Cave, it is likely that the observations which form the field's notion of reality are but shadows representing fragments of that reality.
In this work, we propose a theoretical framework which attempts to answer what exists outside of the cave. To the theorist, we provide a framework which is mathematically rigorous and leaves open many interesting ideas for future exploration. To the practitioner, we provide a framework whose results are simple and provide intuition to guide future investigations across a wide range of learning paradigms. Concretely, we provide a theoretical framework, rooted in Bayesian statistics and Shannon's information theory, which is general enough to unify the analysis of many phenomena in machine learning. Our framework characterizes the performance of an optimal Bayesian learner as it learns from a stream of experience. Unlike existing analyses that weaken with increasing data complexity, our theoretical tools provide accurate insights across diverse machine learning settings. Throughout this work, we derive theoretical results and demonstrate their generality by applying them to derive insights specific to particular settings. These settings range from learning from data which is independently and identically distributed under an unknown distribution, to data which is sequential, to data which exhibits hierarchical structure amenable to meta-learning, and finally to data which is not fully explainable under the learner's beliefs (misspecification). These results are particularly relevant as we strive to understand and overcome increasingly difficult machine learning challenges in this endlessly complex world.

To my family.

Contents

1 Introduction
2 Related Works
  2.1 Frequentist and Bayesian Statistics
  2.2 PAC Learning
  2.3 Information Theory
3 A Framework for Learning
  3.1 Probabilistic Framework and Notation
  3.2 Data Generating Process
  3.3 Error
  3.4 Achievable Error
  Summary
4 Requisite Information Theory
  4.1 Entropy
  4.2 Conditional Entropy
  4.3 Mutual Information
  4.4 Differential Entropy
  4.5 Requisite Results from Information Theory
  Summary
5 Connecting Learning and Information Theory
  5.1 Error is I