John Duchi
October 22, 2024

Contents

1  Introduction and setting  10
   1.1  Information theory  10
   1.2  Moving to statistics and machine learning  11
   1.3  Outline and chapter discussion  12
   1.4  A remark about measure theory  14

2  An information theory review  15
   2.1  Basics of Information Theory  15
        2.1.1  Definitions  15
        2.1.2  Chain rules and related properties  20
        2.1.3  Data processing inequalities  22
   2.2  General divergence measures and definitions  23
        2.2.1  Partitions, algebras, and quantizers  23
        2.2.2  KL-divergence  24
        2.2.3  f-divergences  26
        2.2.4  Inequalities and relationships between divergences  28
        2.2.5  Convexity and data processing for divergence measures  32
   2.3  First steps into optimal procedures: testing inequalities  33
        2.3.1  Le Cam's inequality and binary hypothesis testing  33
        2.3.2  Fano's inequality and multiple hypothesis testing  35
   2.4  A first operational result: entropy and source coding  37
        2.4.1  The source coding problem  37
        2.4.2  The Kraft-McMillan inequalities  38
        2.4.3  Entropy rates and longer codes  41
   2.5  Bibliography  43
   2.6  Exercises  43

3  Exponential families and statistical modeling  48
   3.1  Exponential family models  48
   3.2  Why exponential families?  50
        3.2.1  Fitting an exponential family model  53
   3.3  Divergence measures and information for exponential families  54
   3.4  Generalized linear models and regression  55
        3.4.1  Fitting a generalized linear model from a sample  58
        3.4.2  The information in a generalized linear model  59
   3.5  Lower bounds on testing a parameter's value  61
   3.6  Deferred proofs  63
        3.6.1  Proof of Proposition 3.2.2  63
   3.7  Bibliography  65
   3.8  Exercises  65

I  Concentration, information, stability, and generalization

4  Concentration Inequalities  67
   4.1  Basic tail inequalities  67
        4.1.1  Sub-Gaussian random variables  69
        4.1.2  Sub-exponential random variables  73
        4.1.3  Orlicz norms  77
        4.1.4  First applications of concentration: random projections  80
        4.1.5  A second application of concentration: codebook generation  81
   4.2  Martingale methods  83
        4.2.1  Sub-Gaussian martingales and Azuma-Hoeffding inequalities  84
        4.2.2  Examples and bounded differences  85
   4.3  Matrix concentration  88
   4.4  Technical proofs  91
        4.4.1  Proof of Theorem 4.1.11  91
        4.4.2  Proof of Theorem 4.1.15  92
        4.4.3  Proof of Theorem 5.1.6  93
        4.4.4  Proof of Proposition 4.3.2  93
   4.5  Bibliography  95
   4.6  Exercises  95

5  E