# Ozone example

## Bayesian Inference for Large-Scale Gaussian Models

## Intro

This page describes how to perform Bayesian inference on extremely large Gaussian models. We use Monte-Carlo techniques and Krylov-methods to obtain unbiased estimates of the log-likelihood of the model. We then use Russian roulette to turn those into unbiased estimates of the likelihood itself. Those estimates can then be plugged into Metropolis-Hastings style samplers which are still guaranteed to converge to the exact posterior distribution.

We now give a model description, an overview of the method, illustration of some results, and provide example code for download.

## The Model

This example describes how Russian roulette can be used to perform Bayesian inference on extremely large Gaussian models. We fit a simple stationary model to a set of N=173,405 ozone measurements gathered by an orbiting satellite with a passive sensor that measures back-scattered light. We use the approximate Matérn model on a sphere to describe prior uncertainty. While this model is simple, it suits our illustrative proposes. The model itself is a three stage hierarchical model given as

where Q(κ) is the precision matrix of a Matérn SPDE model defined on a fixed triangulation of the globe and A is a matrix that evaluates piecewise linear basis functions. The parameter κ controls the range over which the correlation between two values of the field is essentially zero. The precision matrix Q is sparse, which allows both low-memory storage and fast matrix-vector products. In this example, the triangulation over which the SPDE model is defined has n=196,002 vertices and this is the dimension of the latent field x. The marginal likelihood of the model given the parameters {τ,κ} can shown to be

We use the above quantity to sample the posterior distribution log π(τ,κ|y).

## Estimating Log-Determinants

The difficult part of the above quantity is the log-determinant log(det(Q)). In high dimensions, the Cholesky factorization becomes infeasible due to the fill-in effect of the factorization (not sparse anymore). We construct an unbiased Monte-Carlo estimate for it using

where z is a vector with element-wise zero-mean unit-variance (e.g. Gaussian). The matrix logarithm log(Q) can be approximated up to machine precision using a rational approximation of the Cauchy contour integral formula. In addition, based on the observation that the elements of log(Q) decay quickly away from the the non-zero elements of Q, an effective way of choosing the random vectors z is to use a graph colouring scheme. This massively reduces the variance of the Monte Carlo estimator.

## Russian Roulette for Pseudo-Marginal MCMC

Given an unbiased estimator for the log-likelihood, log π(y|κ,τ), we use Russian roulette on the exponential function's series expansion

in order to construct an unbiased estimator for the likelihood π(y|κ,τ). This quantity can be directly plugged into an MH-sampler to get posterior samples using the ratio

where q is some proposal distribution over θ:={τ,κ}.

## Exact Bayesian Results

For comparison and using clever engineering, we sample the exact posterior distributions of the ozone model's parameters. Below are plots for the analysis of the MCMC chain and illustrations of the posterior distributions. Note that the posterior distributions are well-behaved i.e. unimodal and not heavy tailed.

## Pseudo-marginal Russian roulette

In contrast to the other cases examined in this paper, we found that the Russian Roulette random walk Metropolis chain failed to converge for this problem: the chain exhibited catastrophic sticking and therefore extremely high autocorrelation. Even after very careful tuning and using thousands of computing node to reduce the variance of the Russian Roulette estimator by averaging, the chain still got stuck at the posterior mode. The fact that it is infeasible to realise a genuinely unbiased estimate of the normalising term for a model of this nature and size may mean that the Russian Roulette framework (and perhaps the entire concept of exact-approximate methods) are not the right approach for this type of model. Nevertheless, the most interesting open question is whether particular parts of the framework, and if so which, cause the described problems.

## Example Code

All presented results have been computed on 1500 synchronised computing nodes on a PBS/SGE cluster. We provide an experimental implementation of the Russian Roulette scheme for the ozone example on github. Codes for the log-determinant estimates are based on an efficient implementation written by Soumyajit De during the Google Summer of Code 2013 within the Shogun Machine Learning Toolbox. Codes for the parallel computing are written by Heiko Strathmann and are experimental.