Statistical Science Seminars
Usual time: Thursdays 16:00 - 17:00
Location: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor).
Some seminars are held in different locations at different times. Click on the abstract for more details.
Vector-valued Distribution Regression: A Simple and Consistent Approach
We address the distribution regression problem (DRP): regressing on the domain of probability measures, in the two-stage sampled setup when only samples from the distributions are given. The DRP formulation offers a unified framework for several important tasks in statistics and machine learning, including multi-instance learning (MIL) and point estimation problems without analytical solution. Despite the large number of MIL heuristics, there is essentially no theoretically grounded approach to tackling the DRP in the two-stage sampled case. To the best of our knowledge, the only existing technique with consistency guarantees requires kernel density estimation as an intermediate step (which often scales poorly in practice), and the domain of the distributions to be compact Euclidean. We analyse a simple (analytically computable) ridge regression alternative for the DRP: we embed the distributions into a reproducing kernel Hilbert space, and learn the regressor from the embeddings to the outputs. We show that this scheme is consistent in the two-stage sampled setup under mild conditions, for probability measure inputs defined on separable, topological domains endowed with kernels, with vector-valued outputs belonging to an arbitrary separable Hilbert space. In particular, choosing the kernel on the space of embedded distributions to be linear and the output space to be the real line, we obtain the consistency of set kernels in regression, which was a 15-year-old open question. In our talk we are going to present (i) the main ideas and results of consistency, (ii) concrete kernel constructions on mean embedded distributions, and (iii) two applications (supervised entropy learning, aerosol prediction based on multispectral satellite images) demonstrating the efficiency of our approach.
Joint work with Arthur Gretton (UCL), Barnabas Poczos (CMU) and Bharath K. Sriperumbudur (PSU).
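The two-stage scheme described in this abstract (embed each sampled distribution, then run ridge regression on the embeddings) can be sketched as follows. This is a minimal illustration and not the speakers' implementation: the Gaussian kernel on the samples, the bandwidth, the regularisation constant and the toy task (predicting the scale of a zero-mean Gaussian from a bag of draws) are all assumptions made for the example. The kernel on the embeddings is taken to be linear, which gives the set kernel.

```python
import numpy as np

def gaussian_gram(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of X (n, d) and Y (m, d)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def set_kernel(bags_a, bags_b, bandwidth=1.0):
    """Inner products of empirical mean embeddings.

    Entry (i, j) is the average of k(x, y) over all pairs with x drawn
    from bag i of bags_a and y drawn from bag j of bags_b.
    """
    K = np.empty((len(bags_a), len(bags_b)))
    for i, A in enumerate(bags_a):
        for j, B in enumerate(bags_b):
            K[i, j] = gaussian_gram(A, B, bandwidth).mean()
    return K

def fit_ridge(bags, y, lam=1e-3, bandwidth=1.0):
    """Kernel ridge regression coefficients, solved in closed form."""
    n = len(bags)
    K = set_kernel(bags, bags, bandwidth)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(train_bags, alpha, test_bags, bandwidth=1.0):
    """Predict outputs for new bags of samples."""
    return set_kernel(test_bags, train_bags, bandwidth) @ alpha

# Toy data (hypothetical): each bag holds 50 draws from N(0, s^2); the label is s.
rng = np.random.default_rng(0)
scales = rng.uniform(0.5, 2.0, size=30)
bags = [rng.normal(0.0, s, size=(50, 1)) for s in scales]

alpha = fit_ridge(bags[:25], scales[:25])
pred = predict(bags[:25], alpha, bags[25:])
```

Only samples from each distribution enter the computation, and the regressor is obtained by solving a single linear system, which is what "analytically computable" refers to above.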
Directional inference for vector parameters
We consider inference on a vector-valued parameter of interest in a linear exponential family, in the presence of a finite-dimensional nuisance parameter. Based on higher-order asymptotic theory for likelihood, we propose a directional test whose p-value is computed using one-dimensional integration. The work simplifies and develops earlier research on directional tests for continuous models and on higher-order inference for discrete models, and the examples include contingency tables and logistic regression. Examples and simulations illustrate the high accuracy of the method, which we compare with the usual likelihood ratio test and with an adjusted version due to Skovgaard. In high-dimensional settings, such as covariance selection, the approach works essentially perfectly, whereas its competitors can fail catastrophically. Extensions to non-linear exponential families and to general models are also sketched.
(Joint work with A.C. Davison, D.A.S. Fraser, N. Reid.)
Bayes linear uncertainty analysis for complex physical systems modelled by computer simulators
Most large and complex physical systems are studied by mathematical models, implemented as high dimensional computer simulators. While all such cases differ in physical description, each analysis of a physical system based on a computer simulator involves the same underlying sources of uncertainty. There is a growing field of study which aims to quantify and synthesise all of the uncertainties involved in relating models to physical systems, within the framework of Bayesian statistics, and to use the resultant uncertainty specification to address problems of forecasting and decision making based on the application of these methods. This talk will give an overview of aspects of this emerging methodology, with particular emphasis on the Bayes linear approach to emulation, structural discrepancy modelling, iterative history matching and forecasting. The methodology will be illustrated with examples of current areas of practical application, in particular the analysis of flood models.
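For readers unfamiliar with the Bayes linear approach, its core update can be written in terms of means, variances and covariances only. The notation here is an assumption made for illustration and does not come from the abstract: B denotes the quantities of interest (for example, simulator output at an untried input) and D the observed data or simulator runs. The standard Bayes linear adjusted expectation and adjusted variance are:

```latex
\begin{align*}
\mathrm{E}_D(B) &= \mathrm{E}(B) + \mathrm{Cov}(B, D)\,\mathrm{Var}(D)^{-1}\,\bigl(D - \mathrm{E}(D)\bigr), \\
\mathrm{Var}_D(B) &= \mathrm{Var}(B) - \mathrm{Cov}(B, D)\,\mathrm{Var}(D)^{-1}\,\mathrm{Cov}(D, B).
\end{align*}
```

No full probability distribution is required, only a second-order prior specification, which is one reason the approach is used with expensive, high-dimensional simulators.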
Online Changepoint Detection: A new way of thinking
Online changepoint detection has its origins in statistical process control where, once a changepoint is detected, the process is stopped, the fault is rectified, and monitoring then begins again with the process in control. In modern-day applications such as network traffic and medical monitoring it is infeasible to adopt this strategy. In particular, the out-of-control monitoring is often vital to diagnosis of the problem; instead of fault analysis, monitoring continues throughout the period of change and a second change is indicated when the process returns to the in-control state.
Recent offline changepoint detection literature has demonstrated the importance of considering the changepoints globally and not focusing on detecting a single changepoint in the presence of several. In this talk we will argue that this is also the case for online changepoint detection and discuss what is meant by a "global" view in online detection. This presents several problems as the standard definitions of average run length and detection delay are not clearly applicable. Following consideration of this we show the increased accuracy in future (and past) changepoint detections when taking this viewpoint and demonstrate the method on real world applications.
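As background for the stop-at-first-alarm strategy from statistical process control mentioned above, the following is a minimal sketch of a classical one-sided CUSUM detector. It is not the method presented in the talk; the function name, the in-control mean, the slack and the threshold are all hypothetical choices for illustration.

```python
def cusum_first_alarm(xs, target_mean=0.0, slack=0.5, threshold=5.0):
    """Return the index of the first alarm for an upward mean shift, or None.

    The statistic accumulates deviations above target_mean + slack and is
    reset to zero whenever it would become negative.
    """
    stat = 0.0
    for t, x in enumerate(xs):
        stat = max(0.0, stat + (x - target_mean) - slack)
        if stat > threshold:
            return t  # classical SPC: stop monitoring at the first alarm
    return None

# Noise-free toy series (hypothetical): the mean jumps from 0 to 3 at index 50.
data = [0.0] * 50 + [3.0] * 50
alarm = cusum_first_alarm(data)
```

The detector returns as soon as one change is flagged, so it says nothing about later changes or about a return to the in-control state, which is the limitation the abstract discusses.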
A stable particle filter in high dimensions
We consider the filtering problem in high dimensions, that is, when the hidden state lies in dimension d, with d large. This problem is ubiquitous in financial applications, for instance, in the online estimation of volatility. It is notoriously difficult because the exact numerical procedures it requires, such as particle filters, can have a cost that is exponential in d for the algorithm to be stable in some sense. We develop a new particle filter for a specific class of state-space models in discrete time. This new class of particle filters provides correct Monte Carlo estimates for any fixed d, as do standard particle filters. However, under an i.i.d. structure, we show that in order to achieve some stability properties, this new filter has cost O(nNd^2), where n is the time parameter and N is the number of Monte Carlo samples, which is fixed independently of d. This suggests that some high-dimensional filtering problems that were previously out of reach can be tackled using exact Monte Carlo methods.
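For reference, the standard particle filter that the abstract uses as its baseline can be sketched as below. This is the textbook bootstrap filter, not the new filter from the talk. The one-dimensional AR(1) state with Gaussian observation noise, the parameter names and their values are assumptions chosen to keep the example short.

```python
import numpy as np

def bootstrap_particle_filter(observations, n_particles=1000, phi=0.9,
                              sigma_x=1.0, sigma_y=0.5, seed=0):
    """Filtering means for x_t = phi * x_{t-1} + sigma_x * noise,
    y_t = x_t + sigma_y * noise, using propagate / weight / resample."""
    rng = np.random.default_rng(seed)
    # Start from the stationary distribution of the AR(1) state.
    particles = rng.normal(0.0, sigma_x / np.sqrt(1.0 - phi**2), size=n_particles)
    means = np.empty(len(observations))
    for t, y in enumerate(observations):
        # Propagate every particle through the state transition.
        particles = phi * particles + sigma_x * rng.normal(size=n_particles)
        # Weight by the observation likelihood (log scale for stability).
        log_w = -0.5 * ((y - particles) / sigma_y) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        means[t] = np.sum(w * particles)
        # Multinomial resampling to counter weight degeneracy.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return means

# Hypothetical observation sequence held constant at 5.0.
obs = np.full(20, 5.0)
filtered = bootstrap_particle_filter(obs)
```

In dimension d the weights of such a filter tend to concentrate on very few particles unless the number of particles grows rapidly with d, which is the instability that motivates the talk.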
Likelihood-based inference for max-stable processes: some recent developments
Max-stable processes are an important class of models for extreme values of processes indexed by space and / or time. They are derived by taking suitably scaled limits of normalized pointwise maxima of stochastic processes; in practice therefore one uses them as models for maxima over many repetitions. However, the complicated nature of their dependence structures means that full (i.e., d-dimensional, where a process is observed at d locations) likelihood inference is not straightforward. Recent work has demonstrated that by including information on when the maxima occurred, full likelihood-based inference is possible for some classes of models. However, whilst this approach simplifies the likelihood enough to make the inference feasible, it can also cause or accentuate bias in parameter estimation for processes that are weakly dependent. In this talk I will describe the ideas behind full likelihood inference for max-stable processes, and discuss how this bias can occur. Understanding of the bias issue helps to identify potential solutions, and I will illustrate one possibility that has been successful in a high-dimensional multivariate model.
Inference for infinite mixture models and Gaussian Process mixtures of experts using simple approximate MAP Inference
The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as Gibbs sampling are required. As a result, DPM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference are plentiful. We develop a simplified yet statistically rigorous approximate maximum a-posteriori (MAP) inference algorithm for DPMs. This algorithm is as simple as k-means clustering and performs as well as Gibbs sampling in experiments, while requiring only a fraction of the computational effort. Finally, we demonstrate how this approach can be used to perform inference for infinite mixtures of Gaussian Process experts.
Page last modified on 08 oct 12 11:11