Charles Marx
June 2025

© 2025 by Charles Thomas Marx. All Rights Reserved.
Re-distributed by Stanford University under license with the author.

This dissertation is online at: https://purl.stanford.edu/sm978sh0523

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Stefano Ermon, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Sanmi Koyejo

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Volodymyr Kuleshov

Approved for the Stanford University Committee on Graduate Studies.

Stacey F. Bent, Vice Provost for Graduate Education

Abstract

Reliable uncertainty quantification is fundamental to the safe and effective deployment of machine learning systems in high-stakes settings, where predictions inform decisions ranging from medical diagnoses to infrastructure management and scientific discovery. This dissertation presents a study of probabilistic prediction with a focus on making uncertainty estimates trustworthy by achieving calibration, wherein predicted probabilities align with the empirical frequencies of events, such as a 90% confidence interval including the observed outcome 90% of the time. I propose interventions to improve calibration across the model lifecycle: training objectives to encourage calibration, post-processing methods to correct miscalibration, and online techniques for adaptively preserving calibration during deployment in nonstationary environments.

The first part addresses post-hoc recalibration. I introduce modular conformal calibration, a general framework that encompasses and extends existing post-hoc uncertainty quantification techniques such as isotonic regression and conformal prediction. This framework identifies a design space for recalibration procedures and provides finite-sample calibration guarantees for any model recalibrated using these strategies. This allows practitioners to trade off between computational cost, meaningful likelihoods, deterministic behavior, and stronger calibration guarantees.

In the second part, I turn to training-time calibration with the goal of encouraging calibration while maintaining sharpness (the degree to which predictions are confident and informative). I propose a class of differentiable calibration measures that serve as regularization objectives, enabling co-optimization of calibration and sharpness during training. These objectives encompass many popular notions of calibration for regression and classification previously enforced only after training, instead incorporating them into standard empirical risk minimization. They also enable task-specific calibration objectives, allowing probabilistic models whose uncertainty estimates are both statistically coherent and aligned with the practical needs of downstream decision-making.
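To make this training-time strategy concrete, the following is a minimal sketch of a differentiable calibration regularizer for Gaussian regression, written in PyTorch. It illustrates the spirit of the objectives above rather than the dissertation's exact formulation: the function names, the grid of quantile levels, the sigmoid temperature tau, and the weight lam are all illustrative choices.

import torch

def soft_quantile_calibration_penalty(mu, sigma, y, levels, tau=0.05):
    # Differentiable proxy for quantile calibration error: for each nominal
    # level p, the (softly counted) fraction of targets falling below the
    # predicted p-quantile should equal p for a calibrated model.
    normal = torch.distributions.Normal(mu, sigma)
    penalty = 0.0
    for p in levels:
        q = normal.icdf(torch.full_like(mu, p))    # per-example p-quantile
        soft_cover = torch.sigmoid((q - y) / tau)  # smooth surrogate for 1{y <= q}
        penalty = penalty + (soft_cover.mean() - p) ** 2
    return penalty / len(levels)

def calibration_regularized_loss(mu, sigma, y, lam=1.0):
    # Negative log-likelihood encourages sharpness; the penalty encourages
    # calibration. lam controls the trade-off between the two.
    nll = -torch.distributions.Normal(mu, sigma).log_prob(y).mean()
    levels = (0.1, 0.25, 0.5, 0.75, 0.9)
    return nll + lam * soft_quantile_calibration_penalty(mu, sigma, y, levels)

Here lam trades off sharpness (the likelihood term) against calibration, and as tau shrinks the sigmoid surrogate approaches the hard coverage indicator, at the cost of less informative gradients.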
The third part investigates calibration under distribution shift, a central challenge in real-world deployments. I consider an online forecasting setting where data may evolve over time or be selected adversarially. Building on Blackwell approachability theory, I develop a general strategy for enforcing calibration guarantees across arbitrary observation sequences under minimal assumptions. This framework supports diverse calibration notions, including distribution and decision calibration, through both oracle-based and computationally tractable algorithms. I further present gradient-based approaches that relax the guarantees while enabling broader applicability. Empirical evaluations demonstrate that these methods maintain calibrated forecasts while achieving vanishing regret with respect to expert predictors.

Collectively, this dissertation provides principled strategies for uncertainty estimation with increased flexibility by enforcing many forms of calibration at each stage of model development. This comprehensive approach to calibration across the model lifecycle enables practitioners to tailor uncertainty quantification to their specific applications while reliably informing decisions in high-stakes settings.

Acknowledgments

First and foremost, I want to thank my advisor, Stefano Ermon. In addition to being one of the smartest people I’ve had the good fortune to work with, Stefano has given me more freedom to work on anything that draws my interest than I could have reasonably hoped for. Volodymyr Kuleshov and Berk Ustun have also been great mentors throughout my PhD, being incredibly generous with their time and ideas. I am grateful to Emma Brunskill, Tatsu Hashimoto, Sanmi Koyejo, Vasilis Syrgkanis, and Omer Reingold for their support at various stages of my PhD, including serving on my committees and hosting thought-provoking research rotations. I would also like to thank t