Federal Reserve Board, Washington, D.C.
ISSN 1936-2854 (Print)
ISSN 2767-3898 (Online)

Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm

Matteo Barigozzi and Matteo Luciani

2024-086

Please cite this paper as:
Barigozzi, Matteo, and Matteo Luciani (2024). “Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm,” Finance and Economics Discussion Series 2024-086. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2024.086.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Matteo Barigozzi, Università di Bologna, matteo.barigozzi@unibo.it
Matteo Luciani, Federal Reserve Board, matteo.luciani@frb.gov

This version: October 23, 2024
First draft: November 8, 2017

Abstract

We study estimation of large Dynamic Factor models implemented through the Expectation Maximization (EM) algorithm, jointly with the Kalman smoother. We prove that as both the cross-sectional dimension, $n$, and the sample size, $T$, diverge to infinity: (i) the estimated loadings are $\sqrt{T}$-consistent, asymptotically normal, and equivalent to their Quasi Maximum Likelihood estimates; (ii) the estimated factors are $\sqrt{n}$-consistent, asymptotically normal, and equivalent to their Weighted Least Squares estimates.
Moreover, the estimated loadings are asymptotically as efficient as those obtained by Principal Components analysis, while the estimated factors are more efficient if the idiosyncratic covariance is sparse enough.

Keywords: Approximate Dynamic Factor Model; Expectation Maximization Algorithm; Kalman Smoother; Quasi Maximum Likelihood.

1 Introduction

Factor analysis can be considered a pioneering technique in unsupervised statistical learning (Ghahramani and Hinton, 1996). It originally gained popularity in the early decades of the twentieth century as a dimension-reduction technique used in psychometrics (Spearman, 1904). Since then, it has become a classical method used for the statistical analysis of complex datasets in many human, natural, and social sciences (see, e.g., Lawley and Maxwell, 1971, Chapter 1, and references therein). In the last thirty years, factor analysis has seen significant success in financial and macroeconometrics because it allows one to analyze and predict economic activity by summarizing large panels of economic time series in a simple and effective way (see, e.g., the survey by Stock and Watson, 2016, and references therein).

An $r$-factor model is defined by

$$x_{it} = \mu_i + \lambda_i' F_t + \xi_{it}, \qquad i = 1, \ldots, n, \quad t = 1, \ldots, T, \tag{1}$$

where $x_{it}$ is the observation for the $i$th cross-section at time $t$, $\mu_i$ is a constant, and $F_t$ and $\lambda_i$ are $r$-dimensional latent column vectors of factors and factor loadings, with $r \ll n$. We call $\lambda_i' F_t$ the common component and $\xi_{it}$ the idiosyncratic component. Throughout, we consider the standard case in which all $\{x_{it}\}$ are zero-mean weakly stationary processes or are the result of a transformation to stationarity.

Furthermore, in the case of time series, the factors are likely to be autocorrelated. For example, we can assume simple first-order autoregressive dynamics:

$$F_t = A F_{t-1} + v_t, \tag{2}$$

with $v_t$ being an $r$-dimensional vector of innovations. Likewise, the idiosyncratic components might be autocorrelated. The measurement equation (1) and the state equation (2) form a state-space model or, equivalently, a Dynamic Factor Model (DFM). This is a restricted version of the more general model of Forni et al. (2000), in which factors can also be loaded with lags. Thanks to its simplicity and empirical success, the DFM is the most common approach to factor analysis of high-dimensional time series.

In large-dimensional macroeconomic and financial datasets, the idiosyncratic components are also likely to be cross-correlated. Indeed, although macroeconomic or financial market dynamics are the main drivers of the comovement in these datasets, sectoral and local comovements are non-negligible sources of fluctuations. When the idiosyncratic components are correlated, the factor model is called approximate, as opposed to an exact factor model, which has uncorrelated idiosyncratic components.

In an exact factor model, a small number of variables is enough to estimate the loadings by Quasi Maximum Likelihood (QML), but we cannot consistently estimate the factors (Lawley and Maxwell, 1971). In an approximate factor model, we can disentangle the common and idiosyncratic components only in the extreme case when $n \to \infty$ (Chamberlain and Rothschild, 1983); in other words,