## Statistical Science Seminars

**Usual time**: Thursdays 16:00 - 17:00

**Location**: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor).

Some seminars are held in different locations at different times. Click on the abstract for more details.

## 14 January: Ragnhild Noven (Imperial College)

### Modelling complex distributions and dependence structures with trawl-type processes

Trawl processes are a class of stationary, continuous-time stochastic processes driven by an independently scattered random measure. They belong to the wider class of so-called Ambit fields, and give rise to a flexible class of models that can accommodate non-Gaussian distributions and a wide range of covariance structures. We review trawl processes and their properties in the context of statistical modelling, and introduce a new representation that enables exact simulation for discrete observations, as well as allowing for Bayesian approaches to parameter estimation. Then we consider two statistical models that exploit the wider trawl process framework, the first being a continuous-time model for rainfall and the second a hierarchical model for extreme values.

## 20 January: Yoshiki Yamagata (National Institute for Environmental Studies, Japan)

### Climate Resilient 3D Urban Design: A “Wise Shrink” Approach

Tokyo is the largest mega-city in the world. There has been massive suburbanization since the 1970s. Even after the bubble economy collapsed, this trend continued until recently. However, the national-level population decline began in 2014, and we are at a turning point where we urgently need to re-design the urban form to ensure the sustainability of the city in the future. As a possible way to achieve this transformation, we are proposing a new urban design concept called “Wise Shrink”. This scenario aims to achieve both resilient and compact urban land use at the same time. The “Wise Shrink” scenario considers optimal land use at the micro-district level by eliminating the trade-offs and highlighting the synergies between climate change mitigation and adaptation policies, as well as disaster risk management, in the process of shrinking the urban extent (residential land use) in a sustainable manner.

## 21 January: Almut Veraart (Imperial College)

### Modelling multivariate serially correlated count data in continuous time

A new continuous-time framework for modelling serially correlated count and integer-valued data is introduced in a multivariate setting. The key modelling component is a multivariate integer-valued trawl process, which is obtained by kernel smoothing of an integer-valued Lévy basis. We discuss various ways of describing both serial and cross-dependence in such a setting, and we study how the corresponding model parameters can be estimated. Simulation studies reveal good finite-sample performance of the proposed methods. Finally, we apply the new modelling framework and estimation procedure to high-frequency financial data.

## 28 January: Claire Miller (University of Glasgow)

### GloboLakes: Calibration, Coherence and Quality

*Claire Miller, Marian Scott, Ruth O’Donnell, Mengyi Gong, Craig Wilkie*

Traditional monitoring of lake water quality has focused on in-depth studies of individual lakes, without considering the global context of environmental change. GloboLakes is a 5-year consortium project funded by the Natural Environment Research Council, UK, to investigate the state of lakes and their response to environmental drivers at a global scale. The project involves: the production of a 20-year time series of observed ecological parameters for approximately 1000 lakes globally from archive satellite data, collation of associated catchment and meteorological data, and in-situ monitoring of selected lakes.

Lakes are sensitive to large-scale environmental pressures, and hence different lakes within a region can be expected to behave similarly through time (temporal coherence). This seminar will describe the ongoing developments in Statistics at the University of Glasgow to investigate this. These include Bayesian spatiotemporal varying-coefficient regression downscaling for calibration, mixed model functional PCA for dimensionality reduction of sparse images, and functional clustering. Applications currently use data from the AATSR and MERIS instruments on the European Space Agency satellite platform, which have been used to estimate lake surface water temperature and ecological properties such as chlorophyll (as an indicator of lake water quality), respectively.

## 04 February: Ayush Bhandari (Massachusetts Institute of Technology)

### A Swiss Army Knife for Sampling Theory, Shift-invariant Subspaces and Sparse Approximation

The year 2016 marks the Claude Shannon centenary. One of his many elegant results is linked with the topic of sampling theory. Seen from an abstract point of view, if a given signal/function is smooth, then sampling theory deals with conditions under which signal reconstruction/approximation is perfect. The constraint that a given signal is bandlimited (or compactly supported in the Fourier domain) is a mathematical construct that somehow measures the smoothness of a function. For bandlimited functions, this topic is well understood and goes by the name of the Nyquist-Shannon sampling theorem. In the past four decades, thanks to the wavelet revolution, considerable advancements have been made in this area, which now incorporates an alternative viewpoint: sampling theory as approximation of functions, covering the case of non-bandlimited as well as sparse signals.

The idea that the Fourier transform of a function forms a cyclic group (four consecutive Fourier transforms of a function produce the same function again) attracted the attention of several mathematicians, including Norbert Wiener. This resulted in the formalization of the fractional Fourier transform or the FrFT domain (parametrized by an additional parameter) and, later, the Special Affine Fourier Transform.
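The cyclic property can be checked numerically in a discrete setting. The sketch below is an illustration only, not material from the talk: it assumes the unitary discrete Fourier transform as a stand-in for the continuous transform, and the helper name `unitary_dft` is hypothetical.

```python
import numpy as np

def unitary_dft(x):
    # Unitary (norm="ortho") DFT, so that no scale factor accumulates
    # when the transform is applied repeatedly.
    return np.fft.fft(x, norm="ortho")

rng = np.random.default_rng(0)
signal = rng.standard_normal(8)

# Apply the transform twice, then twice more.
twice = unitary_dft(unitary_dft(signal))
four_times = unitary_dft(unitary_dft(twice))

# Two applications give the index-reversed signal x[(-n) mod N].
reversed_signal = np.roll(signal[::-1], 1)

print(np.allclose(twice, reversed_signal))
print(np.allclose(four_times, signal))
```

Two applications act as a reflection and four applications act as the identity, which is the order-four cyclic structure that the fractional Fourier transform interpolates with a continuous parameter.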

## 11 February: Konstantina Palla (University of Cambridge)

### A birth-death process for feature allocation

We propose a Bayesian nonparametric prior over feature allocations for sequential data, the birth-death feature allocation process (BDFP).

The BDFP models the evolution of the feature allocation of a set of objects N across a covariate (e.g. time) by creating and deleting features.

A BDFP is exchangeable, projective, stationary and reversible, and its equilibrium distribution is given by the Indian buffet process (IBP).
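As an illustration of the equilibrium distribution only, the following sketch draws a feature allocation from the IBP using its standard sequential ("restaurant") construction. It does not implement the birth-death dynamics of the BDFP; the function name `sample_ibp` and the concentration parameter `alpha` are assumptions made for this example.

```python
import numpy as np

def sample_ibp(n_objects, alpha, seed=0):
    """Draw a binary feature-allocation matrix from the Indian buffet process."""
    rng = np.random.default_rng(seed)
    feature_counts = []  # number of objects currently holding each feature
    rows = []
    for i in range(1, n_objects + 1):
        # Object i takes existing feature k with probability m_k / i.
        row = [bool(rng.random() < m / i) for m in feature_counts]
        # Object i also introduces Poisson(alpha / i) brand-new features.
        n_new = int(rng.poisson(alpha / i))
        feature_counts = [m + taken for m, taken in zip(feature_counts, row)]
        feature_counts += [1] * n_new
        rows.append(row + [True] * n_new)
    # Pad the ragged rows into a rectangular 0/1 matrix.
    allocation = np.zeros((n_objects, len(feature_counts)), dtype=int)
    for i, row in enumerate(rows):
        allocation[i, :len(row)] = row
    return allocation

allocation = sample_ibp(n_objects=10, alpha=2.0)
print(allocation.shape)
```

Each row is an object and each column a feature; every column has at least one entry equal to 1 because a feature only exists once some object has introduced it.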

We also present the Beta Event Process (BEP) and we show that it is the de Finetti mixing distribution underlying the BDFP. This result shows that the BEP plays the role for the BDFP that the Beta process plays for the Indian buffet process. Moreover, we show that the BEP permits simplified inference. The utility of this prior is demonstrated on synthetic and real-world data.

Joint work with David Knowles and Zoubin Ghahramani.

## 25 February: Peter McCullagh (The University of Chicago)

### Empirical phenomena and universal laws

In 1935 Fisher, together with Corbet and Williams, published a study on the relation between the number of species and the number of specimens in random samples.

This very short paper has since been recognized as one of the most influential papers on species diversity in 20th century ecology.

It was a combination of empirical work backed up by a simple theoretical argument pointing to the log-series distribution for species diversity in random samples.

Fisher's work is closely related to more recent mathematical developments on random partitions, such as the Ewens partition and the Chinese restaurant process.

In this talk, I will explain this relationship and also how Fisher's log-series distribution is a consequence of three mathematical axioms: exchangeability, consistency and self-similarity.
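For readers unfamiliar with the Chinese restaurant process, a minimal simulation is sketched below. It is an illustration under assumptions, not the speaker's derivation: tables play the role of species, customers the role of specimens, and the parameter name `theta` and the function name are choices made for this example.

```python
import random

def chinese_restaurant_process(n, theta, seed=0):
    """Return the table sizes (block sizes of a random partition) after n customers."""
    rng = random.Random(seed)
    table_sizes = []
    for seated in range(n):
        # With `seated` customers present, a new table opens
        # with probability theta / (theta + seated).
        if rng.random() < theta / (theta + seated):
            table_sizes.append(1)
        else:
            # Otherwise join an existing table with probability
            # proportional to its current size.
            k = rng.choices(range(len(table_sizes)), weights=table_sizes)[0]
            table_sizes[k] += 1
    return table_sizes

sizes = chinese_restaurant_process(n=1000, theta=2.0)
print(len(sizes), "tables (species) among", sum(sizes), "customers (specimens)")
```

The partition produced this way follows the Ewens distribution with parameter `theta`, which is the random-partition setting that the abstract connects to Fisher's log-series argument.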

If time permits, I will discuss empirical studies of a similar sort, including Fairfield-Smith's work on the variance of spatial averages.

## 10 March: Stephan Huckemann (Georg-August-Universität Göttingen)

### On Statistical Analysis of Shape: Theory and Applications

We introduce Kendall's shape spaces, the "zoo" of non-Euclidean generalizations of the concept of an expected value and of principal components, as well as applications to growth in biometry.

On the theoretical side we will explore some of the complications that arise in an asymptotic theory featuring non-Euclidean phenomena such as smeariness and stickiness.

## 17 March: Kyrylo Chimisov (University of Warwick)

### Adapting the Gibbs Sampler

*joint with Krys Latuszynski and Gareth Roberts*

The popularity of Adaptive MCMC has been fueled on the one hand by its success in applications, and on the other hand, by mathematically appealing and computationally straightforward optimisation criteria for the Metropolis algorithm acceptance rate (and, equivalently, proposal scale). Similarly principled and operational criteria for optimising the selection probabilities of the Random Scan Gibbs Sampler have not been devised to date.

In the present work we close this gap and develop a general purpose Adaptive Random Scan Gibbs Sampler that adapts the selection probabilities. The adaptation is guided by optimising the L2 spectral gap for the target's Gaussian analogue [1,3], gradually, as the target's global covariance is learned by the sampler. The additional computational cost of the adaptation represents a small fraction of the total simulation effort.
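To make the role of the selection probabilities concrete, here is a minimal sketch of a non-adaptive Random Scan Gibbs Sampler for a zero-mean Gaussian target. It assumes fixed selection probabilities and a known precision matrix; the adaptation rule described in the abstract is not implemented, and all names are hypothetical.

```python
import numpy as np

def random_scan_gibbs(precision, selection_probs, n_steps, seed=0):
    """Random scan Gibbs sampling from N(0, precision^{-1})."""
    rng = np.random.default_rng(seed)
    d = precision.shape[0]
    x = np.zeros(d)
    samples = np.empty((n_steps, d))
    for t in range(n_steps):
        # Pick the coordinate to update according to the selection probabilities.
        i = rng.choice(d, p=selection_probs)
        # Full conditional of x_i given the rest is Gaussian with
        # variance 1 / Q_ii and mean -(1 / Q_ii) * sum_{j != i} Q_ij x_j.
        cond_var = 1.0 / precision[i, i]
        off_diagonal_sum = precision[i] @ x - precision[i, i] * x[i]
        cond_mean = -cond_var * off_diagonal_sum
        x[i] = cond_mean + np.sqrt(cond_var) * rng.standard_normal()
        samples[t] = x
    return samples

covariance = np.array([[1.0, 0.5], [0.5, 1.0]])
samples = random_scan_gibbs(np.linalg.inv(covariance), [0.5, 0.5], n_steps=20000)
print(np.corrcoef(samples.T)[0, 1])
```

An adaptive version would replace the fixed list `[0.5, 0.5]` with probabilities that are updated as the sampler learns the target covariance.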

We present a number of moderately- and high-dimensional examples, including Truncated Normals, Bayesian Hierarchical Models and Hidden Markov Models, where significant computational gains are empirically observed for both the Adaptive Gibbs and the Adaptive Metropolis within Adaptive Gibbs versions of the algorithm, and where formal convergence is guaranteed by [2]. We argue that Adaptive Random Scan Gibbs Samplers can be routinely implemented and that substantial computational gains will be observed across many typical Gibbs sampling problems.

*[1] Amit, Y. Convergence properties of the Gibbs sampler for perturbations of Gaussians, "The Annals of Statistics", 1996.*

*[2] Latuszynski, K., Roberts, G. O., and Rosenthal, J. S. Adaptive Gibbs samplers and related MCMC methods, "The Annals of Applied Probability", 2013.*

*[3] Roberts, G. O., and Sahu, S. K. Updating Schemes, Correlation Structure, Blocking and Parameterization for the Gibbs Sampler, "Journal of the Royal Statistical Society", 1997.*

## 22 March: Luis Pericchi (University of Puerto Rico)

### On prior distributions for scales: The Scaled Beta 2

We put forward the Scaled Beta 2 (SBeta2) as a flexible and tractable family for modeling scales, both for hierarchical and non-hierarchical situations, as an alternative to "vague" inverted gamma priors, and as a generalization of some other proposed replacements of the inverted gamma priors.

The combination of normal priors for locations and inverse-gamma priors for variances is widespread. This includes the use of vague normal and inverted-gamma priors as a representation of "prior ignorance".

It is known, however, that far from being quasi non-informative, the "vague" inverted-gamma leads to very low variances of the effects and very strong shrinkage to the general mean. Several priors have been proposed as alternatives, but we claim that the SBeta2 shares their advantages and adds flexibility and tractability, and has a natural motivation. The SBeta2 class of prior distributions has the attractive property that if the variance parameter is in the family, the precision is also in the family. This family of distributions can be obtained in closed form as a gamma scale mixture of gamma distributions, just as the Student distribution can be obtained as a gamma scale mixture of normals in a hierarchical model. The SBeta2 also arises in Objective Model Selection as Intrinsic Priors and as Divergence based priors in diverse situations.
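The gamma scale mixture representation suggests a simple two-stage simulation scheme, sketched below. The parametrization is an assumption made for this example: SBeta2(p, q, b) is taken to be a beta prime distribution with shape parameters p and q, scaled by b, and the function name is hypothetical.

```python
import numpy as np

def sample_sbeta2(p, q, b, size, seed=0):
    """Draw from SBeta2(p, q, b), assumed here to be b times a beta prime(p, q) variable."""
    rng = np.random.default_rng(seed)
    # Stage 1: mixing rate, Gamma with shape q and rate b (scale 1 / b).
    rate = rng.gamma(shape=q, scale=1.0 / b, size=size)
    # Stage 2: given the rate, the draw is Gamma with shape p and that rate.
    return rng.gamma(shape=p, scale=1.0 / rate)

draws = sample_sbeta2(p=2.0, q=5.0, b=1.0, size=100_000)
# Under the assumed parametrization the mean is b * p / (q - 1) when q > 1.
print(draws.mean())
```

Under this assumed parametrization the reciprocal of a draw is again a ratio of independent gamma variables, which matches the closure property for variances and precisions stated in the abstract.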

The SBeta2 unifies and generalizes different proposals in the Bayesian literature, and has numerous theoretical and practical advantages: it is flexible, it can be as heavy-tailed as, or heavier-tailed than, the half-Cauchy, and different behaviors at the origin can be modeled. Furthermore, it is easy to simulate from, and can be embedded in a Gibbs sampling scheme. When coupled with a conditional Cauchy prior for locations, the marginal prior for locations can be found explicitly as proportional to known transcendental functions, and for integer values of the hyper-parameters an analytical closed form exists. Furthermore, for specific choices of the hyper-parameters, the marginal is found to be an explicit "Horseshoe" prior, a class known to have excellent theoretical and practical properties. To our knowledge this is the first closed-form Horseshoe prior obtained. We also show that for certain values of the hyper-parameters the mixture of a normal and a Scaled Beta 2 distribution also gives a closed-form marginal.

A general byproduct is the insight about the duality between priors for estimation versus priors for testing. The Scaled Beta 2 is obtained in different ways as a prior for testing, and at the same time it can be justified for estimation.

Applications include robust hierarchical modeling and meta-analysis, detection of structural breaks in dynamic linear models, and age-period-cohort epidemiological models.

We will briefly discuss different possibilities for generalization to multivariate situations.

This is joint work with Maria Eglee Perez, Isabel Ramirez, Jairo Fuquene, David Torres and Joris Mulder.

## 24 March: Antar Bandyopadhyay (Indian Statistical Institute)

### De-preferential attachment random graphs

*joint work with Subhabrata Sen, Stanford University*

In this talk we will introduce a new model of a growing sequence of random graphs where a new vertex is less likely to join to an existing vertex with high degree and more likely to join to a vertex with low degree. In contrast to the well studied model of *preferential attachment random graphs*, where higher degree vertices are preferred, we will call our model the *de-preferential attachment random graph model*. We will consider two types of de-preferential attachment models, namely *inverse de-preferential*, where the attachment probabilities are inversely proportional to the degree, and *linear de-preferential*, where the attachment probabilities are proportional to c-degree, where c > 0 is a constant. We will give the asymptotic degree distribution for both models and show that the limiting degree distribution has a very thin tail. We will also show that for a fixed vertex v, the degree grows as √(log n) for the inverse de-preferential case and as log n for the linear case, for a graph with n vertices. Some of the results will also be generalized to the case where each new vertex joins to m > 1 existing vertices.
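The inverse de-preferential rule can be simulated in a few lines. The sketch below is an illustration under stated assumptions: the initial graph (two vertices joined by a single edge) is not given in the abstract and is chosen here for convenience, each new vertex attaches to exactly one existing vertex, and the function name is hypothetical.

```python
import random

def inverse_depreferential_graph(n, seed=0):
    """Grow a graph on n >= 2 vertices with attachment probability proportional to 1 / degree."""
    rng = random.Random(seed)
    # Assumed starting configuration: vertices 0 and 1 joined by one edge.
    degrees = [1, 1]
    edges = [(0, 1)]
    for new_vertex in range(2, n):
        # Low-degree vertices receive the largest attachment weights.
        weights = [1.0 / d for d in degrees]
        target = rng.choices(range(new_vertex), weights=weights)[0]
        edges.append((new_vertex, target))
        degrees[target] += 1
        degrees.append(1)
    return degrees, edges

degrees, edges = inverse_depreferential_graph(n=2000)
print("maximum degree:", max(degrees))
```

Comparing the maximum degree across increasing `n` gives an informal feel for the slow degree growth described in the abstract, although a single simulated graph is not evidence for the asymptotic rates.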

## 07 April, 4pm-6pm: Bernd Bischl (Ludwig-Maximilians-Universität München)

### mlr - Machine Learning in R

The mlr package allows data analysts who are neither experts in machine
learning nor seasoned R programmers to specify complex machine learning experiments in short, succinct and scalable
code.

Experienced programmers, on the other hand, get to wield a large, well-designed
toolbox, which they can easily customize and extend to their needs.

In my presentation I will demonstrate how to perform basic mlr operations like
data import, data pre-processing, model building, performance evaluation and
resampling.
Using these basic building blocks, we will focus on more advanced topics like
benchmarking, model selection and hyper-parameter tuning.

We will also demonstrate how to easily parallelize the most time-consuming
operations in common parallel environments.
The course will end with a short demonstration on how to access the new OpenML
server for open machine learning (http://www.openml.org) which
provides a large repository of benchmark data sets and enables reproducible experiments and meta-analysis.

Participants are encouraged to bring their own laptop to follow the examples on the slides.

## 14 April: Samuel Livingstone (University of Bristol)

### Some things we've learned... about Hamiltonian Monte Carlo

Hamiltonian/Hybrid Monte Carlo (HMC) is a sampling method which has existed for almost 30 years, and recently has become very popular among statisticians, primarily because its efficacy has been shown empirically, statistically oriented tutorials have been written, and general purpose software for its implementation is now available. Comparatively little, however, is understood rigorously about the method. In this talk I'll review HMC, along with some basic concepts in Markov chain theory which are relevant to users of Markov chain Monte Carlo methods. I'll then discuss recent work in which we establish fairly general π-irreducibility and geometric ergodicity criteria for the method, giving some basic guidelines on when it should 'work well' for estimating expectations of interest. The results also shed light on how to tune some of the free parameters in the method. If time permits I may also mention some ongoing work on non-Gaussian choices for the distribution of the momentum variable, and how this can either positively or negatively impact performance.
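For orientation, a basic HMC iteration can be sketched as follows. This is a generic textbook-style version with a Gaussian momentum and leapfrog integration, not the speaker's implementation; the standard normal target, the step size and the number of leapfrog steps are assumptions chosen for the example.

```python
import numpy as np

def hmc_sample(log_density, grad_log_density, x0, n_samples,
               step_size=0.2, n_leapfrog=10, seed=0):
    """Hamiltonian Monte Carlo with standard Gaussian momentum."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    for t in range(n_samples):
        # Refresh the momentum, then integrate the dynamics by leapfrog.
        p = rng.standard_normal(x.size)
        x_new, p_new = x.copy(), p.copy()
        p_new += 0.5 * step_size * grad_log_density(x_new)
        for step in range(n_leapfrog):
            x_new += step_size * p_new
            if step < n_leapfrog - 1:
                p_new += step_size * grad_log_density(x_new)
        p_new += 0.5 * step_size * grad_log_density(x_new)
        # Metropolis correction based on the change in total energy.
        current = log_density(x) - 0.5 * p @ p
        proposed = log_density(x_new) - 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < proposed - current:
            x = x_new
        samples[t] = x
    return samples

# Assumed example target: a two-dimensional standard normal.
samples = hmc_sample(lambda x: -0.5 * x @ x, lambda x: -x,
                     x0=np.zeros(2), n_samples=2000)
print(samples.mean(axis=0), samples.var(axis=0))
```

The free parameters mentioned in the abstract correspond here to `step_size` and `n_leapfrog`, and the momentum distribution is the line that draws `p`.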

Based on joint work with Michael Betancourt, Simon Byrne & Mark Girolami.

http://arxiv.org/abs/1601.08057

## 21 April: Arthur Gretton (Gatsby Computational Neuroscience Unit)

## 05 May: Augusto Gerolin (Università di Pisa)

## 12 May: Ajay Jasra (National University of Singapore)

### Multilevel Sequential Monte Carlo Samplers for Normalizing Constants

*joint work with Pierre Del Moral (INRIA/Bordeaux), Kody Law (Oak Ridge) and Yan Zou (NUS)*

This talk considers the sequential Monte Carlo approximation of ratios of normalizing constants associated to posterior distributions which in principle rely on continuum models. Therefore, the Monte Carlo estimation error and the discrete approximation error must be balanced. A multilevel strategy is utilized to substantially reduce the cost to obtain a given error level in the approximation as compared to standard estimators.

Two estimators are considered and relative variance bounds are given.

The theoretical results are numerically illustrated for
the example of identifying a parametrized permeability in an elliptic equation
given point-wise observations of the pressure.