
Information-Theoretic Foundations for Machine Learning

2025-06-01 - Stanford University

© 2025 by Hong Jun Jeon. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution 3.0 United States License.
http://creativecommons.org/licenses/by/3.0/us/
This dissertation is online at: https://purl.stanford.edu/gx002mv2026

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Benjamin Van Roy, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Dorsa Sadigh, Co-Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Tatsunori Hashimoto

Approved for the Stanford University Committee on Graduate Studies.
Stacey F. Bent, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format.

Information-Theoretic Foundations for Machine Learning

Hong Jun Jeon (1) and Benjamin Van Roy (2,3)
(1) Department of Computer Science, Stanford University
(2) Department of Electrical Engineering, Stanford University
(3) Department of Management Science and Engineering, Stanford University

Abstract

The progress of machine learning over the past decade is undeniable. In retrospect, it is both remarkable and unsettling that this progress was achievable with little to no rigorous theory to guide experimentation. Despite this fact, practitioners have been able to guide their future experimentation via observations from previous large-scale empirical investigations. However, alluding to Plato's Allegory of the Cave, it is likely that the observations which form the field's notion of reality are but shadows representing fragments of that reality.
In this work, we propose a theoretical framework which attempts to answer what exists outside of the cave. To the theorist, we provide a framework which is mathematically rigorous and leaves open many interesting ideas for future exploration. To the practitioner, we provide a framework whose results are simple and provide intuition to guide future investigations across a wide range of learning paradigms. Concretely, we provide a theoretical framework, rooted in Bayesian statistics and Shannon's information theory, which is general enough to unify the analysis of many phenomena in machine learning. Our framework characterizes the performance of an optimal Bayesian learner as it learns from a stream of experience. Unlike existing analyses that weaken with increasing data complexity, our theoretical tools provide accurate insights across diverse machine learning settings. Throughout this work, we derive theoretical results and demonstrate their generality by applying them to derive insights specific to particular settings. These settings range from learning from data which is independently and identically distributed under an unknown distribution, to data which is sequential, to data which exhibits hierarchical structure amenable to meta-learning, and finally to data which is not fully explainable under the learner's beliefs (misspecification). These results are particularly relevant as we strive to understand and overcome increasingly difficult machine learning challenges in this endlessly complex world.

To my family.

Contents

1 Introduction
2 Related Works
  2.1 Frequentist and Bayesian Statistics
  2.2 PAC Learning
  2.3 Information Theory
3 A Framework for Learning
  3.1 Probabilistic Framework and Notation
  3.2 Data Generating Process
  3.3 Error
  3.4 Achievable Error
  Summary
4 Requisite Information Theory
  4.1 Entropy
  4.2 Conditional Entropy
  4.3 Mutual Information
  4.4 Differential Entropy
  4.5 Requisite Results from Information Theory
  Summary
5 Connecting Learning and Information Theory
  5.1 Error is I