NORMALIZING FLOWS AIDED VARIATIONAL INFERENCE FOR UNCERTAINTY QUANTIFICATION

Bayesian statistics is a powerful tool for quantifying uncertainty when estimating unknown model parameters. The posterior distributions arising from the Bayesian paradigm are often intractable, whether because of complex statistical model choices or the high dimensionality of the parameter space. Historically, Markov Chain Monte Carlo (MCMC) methods have been the preferred approach for sampling from posterior distributions with an unknown normalizing constant. However, MCMC methods run into a number of issues in practice. For instance, they do not always scale well to multimodal distributions defined on a high-dimensional support. Variational Inference (VI) has emerged as a scalable alternative to MCMC for approximating intractable posterior distributions. Recently, Normalizing Flows aided VI (FAVI) has been used for sampling from complex and multimodal posterior distributions to overcome the limitations of existing mean-field and structured VI approaches. FAVI has had a significant impact across fields, in applications such as computer vision, computational biology, and physics-based modelling. Despite this impact, there is limited research on the theoretical properties of the approximate posterior arising from FAVI. The computational cost of FAVI depends heavily on the choice of Normalizing Flow (NF) family, but there is no work quantifying the nature of the approximate posterior from FAVI at a particular complexity of the NF, especially with respect to uncertainty quantification. In this dissertation, we study the properties of the FAVI posterior with a focus on (i) the trade-off between accurate recovery of the posterior samples and the complexity of the selected NF family, and (ii) uncertainty quantification. We first provide background on FAVI and compare it to popular competitors (Mean-Field VI (MF-VI) and MCMC) over some basic statistical applications. 
Our results demonstrate that FAVI lies between MCMC and MF-VI in both statistical accuracy and computational efficiency. In the second part of this dissertation, we use the framework of Bayesian linear regression with 2 predictor variables to rigorously study the optimal Kullback-Leibler divergence between the FAVI approximation with Inverse Auto-regressive Flows (IAF) and the true posterior. We also derive the uncertainty quantification (credible interval coverage) resulting from using FAVI to approximate the posterior, as a function of the correlation between the regression predictors. We contrast this coverage with that of MF-VI (the most popular VI approach in the literature) and find that, given sufficient complexity of the NF, there is virtually no loss in coverage from FAVI relative to the true posterior, regardless of the correlation. On the other hand, the loss in coverage for MF-VI increases monotonically in the correlation. Next, we extend our results to the case of an arbitrary number of regression predictors greater than 2. Our results, presented across complexity levels of the IAF transformations, demonstrate that given sufficient complexity of the IAF, FAVI can completely recover the true posterior. To our knowledge, this is the first theoretical exploration of this kind. Finally, we discuss ongoing research and plans for future work, where we will leverage what we have learned to use FAVI for Bayesian inference in high-dimensional linear models with spike and slab priors. Preliminary results show that FAVI can capture dependencies in the posterior more effectively than MF-VI. FAVI is one among many novel computational tools that have originated in the machine learning literature for scalable Bayesian computation, but there has been little previous work analyzing its statistical properties and reliability for uncertainty quantification. 
By studying the FAVI posterior through a statistical lens, this dissertation bridges some of the gap between machine learning and statistics, and takes strides towards building reliable computational tools for Bayesian inference.
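
As a rough illustration of the MF-VI coverage loss described above, the sketch below uses a simplified textbook setting that is an assumption here and not the dissertation's own derivation: two standardized predictors with correlation rho, known noise variance, a flat prior, and a Gaussian mean-field family fitted by minimizing KL(q || p). Under those assumptions the mean-field marginal standard deviation is the true posterior marginal standard deviation multiplied by sqrt(1 - rho^2), so the true posterior mass inside a nominal mean-field credible interval has a closed form. The function name mfvi_marginal_coverage is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mfvi_marginal_coverage(rho: float, level: float = 0.95) -> float:
    """True posterior mass inside a nominal mean-field credible interval.

    Assumed setting (illustrative only): Gaussian posterior whose precision
    matrix is proportional to [[1, rho], [rho, 1]], approximated by a
    factorized Gaussian that minimizes KL(q || p).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    std_normal = NormalDist()
    # Half-width of the nominal interval, in mean-field standard deviations.
    z = std_normal.inv_cdf(0.5 + level / 2.0)
    # Mean-field sd divided by the true marginal sd under the assumptions above.
    shrink = sqrt(1.0 - rho ** 2)
    # Mass of the true marginal posterior inside the narrower interval.
    return 2.0 * std_normal.cdf(z * shrink) - 1.0


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 0.99):
        print(f"rho = {rho:4.2f}  coverage = {mfvi_marginal_coverage(rho):.3f}")
```

In this simplified setting the coverage equals the nominal level at rho = 0 and falls as the absolute correlation grows, which is consistent with the monotone loss reported for MF-VI; it says nothing about FAVI, whose coverage depends on the flow family chosen.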
