Davide Gallon¹, Arnulf Jentzen²,³, and Philippe von Wurstemberger⁴,⁵

¹ Applied Mathematics: Institute for Analysis and Numerics, University of Münster, Germany, e-mail: davide.gallon@uni-muenster.de
² School of Data Science and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China, e-mail: ajentzen@cuhk.edu.cn
³ Applied Mathematics: Institute for Analysis and Numerics, University of Münster, Germany, e-mail: ajentzen@uni-muenster.de
⁴ Risklab, Department of Mathematics, ETH Zurich, Switzerland, e-mail: philippe.vonwurstemberger@math.ethz.ch
⁵ School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China, e-mail: philippevw@cuhk.edu.cn

December 3, 2024
arXiv:2412.01371v1 [cs.LG] 2 Dec 2024

Abstract

This article provides a mathematically rigorous introduction to denoising diffusion probabilistic models (DDPMs), sometimes also referred to as diffusion probabilistic models or diffusion models, for generative artificial intelligence. We provide a detailed basic mathematical framework for DDPMs and explain the main ideas behind the training and generation procedures. In this overview article we also review selected extensions and improvements of the basic framework from the literature, such as improved DDPMs, denoising diffusion implicit models, classifier-free diffusion guidance models, and latent diffusion models.

Contents

1 Introduction
2 Denoising diffusion probabilistic models (DDPMs)
  2.1 General framework for DDPMs
  2.2 Training objective in DDPMs
  2.3 A first simplified DDPM generative method
  3.1 Properties of Gaussian distributions
    3.1.1 On Gaussian transition kernels
    3.1.2 Explicit constructions for Gaussian transition kernels
    3.1.3 Bayes rule for Gaussian distributions
    3.1.4 KL divergence between Gaussian distributions
  3.2 Framework for DDPMs with Gaussian noise
  3.3 Distributions of the forward process in DDPMs with Gaussian noise
    3.3.1 Conditional distributions going forward
    3.3.2 Terminal distributions
    3.3.3 Conditional distributions going backwards
  3.4 Reformulated training objective in DDPMs with Gaussian noise
  3.5 DDPM generative method with Gaussian noise
  3.6 Network architectures for the backward process
    3.6.1 UNets
    3.6.2 Time embedding
4 Evaluation of generative models
  4.1 Content variant metrics
    4.1.1 Inception score
    4.1.2 Fréchet inception distance
  4.2 Content invariant metrics
5 Advanced variants and extensions of DDPMs
  5.1 Improved DDPM
  5.2 Denoising Diffusion Implicit Model (DDIM)
    5.2.1 Framework for DDIM
    5.2.2 Distribution for the forward process in DDIM
    5.2.3 Explicit objective function in DDIM
    5.2.4 Generative method
  5.3 Classifier-free diffusion guidance
    5.3.1 Controlling with adaptive group normalization
    5.3.2 Generative method
  5.4 Stable Diffusion
    5.4.1 Controlling with cross attention layer
    5.4.2 Generative method
  5.5 Further state of the art diffusion techniques
    5.5.1 GLIDE
    5.5.2 DALL-E 2 and DALL-E 3
    5.5.3 Imagen

1 Introduction

The goal of generative modelling is to generate new data samples from an unknown underlying distribution based on a dataset of samples from tha