Statistical Science Seminars
Usual time: Thursdays 16:00 - 17:00
Location: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor).
Some seminars are held in different locations at different times. Click on the abstract for more details.
Vector-valued Distribution Regression: A Simple and Consistent Approach
We address the distribution regression problem (DRP): regression on the domain of probability measures in the two-stage sampled setup, where only samples from the distributions are given. The DRP formulation offers a unified framework for several important tasks in statistics and machine learning, including multi-instance learning (MIL) and point estimation problems without analytical solutions. Despite the large number of MIL heuristics, there is essentially no theoretically grounded approach to tackling the DRP in the two-stage sampled case. To the best of our knowledge, the only existing technique with consistency guarantees requires kernel density estimation as an intermediate step (which often scales poorly in practice) and requires the domain of the distributions to be compact Euclidean. We analyse a simple (analytically computable) ridge regression approach to the DRP: we embed the distributions into a reproducing kernel Hilbert space and learn the regressor from the embeddings to the outputs. We show that this scheme is consistent in the two-stage sampled setup under mild conditions, for probability measure inputs defined on separable, topological domains endowed with kernels, with vector-valued outputs belonging to an arbitrary separable Hilbert space. In particular, choosing the kernel on the space of embedded distributions to be linear and the output space to be the real line, we obtain the consistency of set kernels in regression, which was a 15-year-old open question. In our talk we will present (i) the main ideas and results of consistency, (ii) concrete kernel constructions on mean embedded distributions, and (iii) two applications (supervised entropy learning, and aerosol prediction based on multispectral satellite images) demonstrating the efficiency of our approach.
Joint work with Arthur Gretton (UCL), Barnabas Poczos (CMU) and Bharath K. Sriperumbudur (PSU).
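The two-step scheme in the abstract above (embed each sampled distribution, then run ridge regression on the embeddings) can be sketched for the simplest case it mentions: a linear kernel on the mean embeddings (a set kernel) with real-valued outputs. This is only an illustrative sketch, not the speakers' implementation. The Gaussian base kernel, the bandwidth and ridge values, the function names, and the toy task (predicting the centre of a Gaussian from a bag of draws) are all assumptions made for this example.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def set_kernel(bags_a, bags_b, bandwidth):
    """Inner product of empirical mean embeddings: the average of k(x, y)
    over all pairs (x, y) drawn from the two bags."""
    return np.array([[gaussian_gram(a, b, bandwidth).mean() for b in bags_b]
                     for a in bags_a])


def fit(bags, y, bandwidth=1.0, ridge=1e-3):
    """Kernel ridge regression coefficients on the embedded bags."""
    n = len(bags)
    gram = set_kernel(bags, bags, bandwidth)
    return np.linalg.solve(gram + n * ridge * np.eye(n), y)


def predict(train_bags, alpha, test_bags, bandwidth=1.0):
    """Predicted outputs for new bags."""
    return set_kernel(test_bags, train_bags, bandwidth) @ alpha


def toy_experiment(seed=0, n_train=40, n_test=20, bag_size=100):
    """Hypothetical toy task: each bag holds draws from N(c, 1), label is c."""
    rng = np.random.default_rng(seed)

    def make(n):
        centres = rng.uniform(-2.0, 2.0, size=n)
        bags = [rng.normal(c, 1.0, size=(bag_size, 1)) for c in centres]
        return bags, centres

    train_bags, y_train = make(n_train)
    test_bags, y_test = make(n_test)
    alpha = fit(train_bags, y_train)
    pred = predict(train_bags, alpha, test_bags)
    return float(np.sqrt(np.mean((pred - y_test) ** 2)))


if __name__ == "__main__":
    print("test RMSE:", toy_experiment())
```

The regressor never sees the underlying distributions, only the bags of samples, which is the two-stage sampled setting described in the abstract. Replacing the linear kernel on embeddings with a nonlinear one would change only `set_kernel`.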
Directional inference for vector parameters
We consider inference on a vector-valued parameter of interest in a linear exponential family, in the presence of a finite-dimensional nuisance parameter. Based on higher-order asymptotic theory for likelihood, we propose a directional test whose p-value is computed using one-dimensional integration. The work simplifies and develops earlier research on directional tests for continuous models and on higher-order inference for discrete models, and the examples include contingency tables and logistic regression. Examples and simulations illustrate the high accuracy of the method, which we compare with the usual likelihood ratio test and with an adjusted version due to Skovgaard. In high-dimensional settings, such as covariance selection, the approach works essentially perfectly, whereas its competitors can fail catastrophically. Extensions to non-linear exponential families and to general models are also sketched.
(Joint work with A.C. Davison, D.A.S. Fraser, N. Reid.)
Bayes linear uncertainty analysis for complex physical systems modelled by computer simulators
Most large and complex physical systems are studied by mathematical models, implemented as high-dimensional computer simulators. While all such cases differ in physical description, each analysis of a physical system based on a computer simulator involves the same underlying sources of uncertainty. There is a growing field of study which aims to quantify and synthesise all of the uncertainties involved in relating models to physical systems, within the framework of Bayesian statistics, and to use the resulting uncertainty specification to address problems of forecasting and decision making. This talk will give an overview of aspects of this emerging methodology, with particular emphasis on the Bayes linear approach to emulation, structural discrepancy modelling, iterative history matching and forecasting. The methodology will be illustrated with examples from current areas of practical application, in particular the analysis of flood models.
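Two standard building blocks of the Bayes linear approach named in the abstract above can be sketched briefly: the adjusted expectation and variance of a quantity B given data D, and an implausibility measure of the kind used in history matching. This is a minimal sketch of the textbook formulas, not the speaker's code. The function names, argument names, and the split of the variance into emulator, discrepancy and observation-error terms are assumptions made for illustration.

```python
import numpy as np


def bayes_linear_adjust(mean_b, var_b, mean_d, var_d, cov_bd, d_obs):
    """Bayes linear adjustment of B by observed data D.

    adjusted mean = E[B] + Cov[B, D] Var[D]^{-1} (D - E[D])
    adjusted var  = Var[B] - Cov[B, D] Var[D]^{-1} Cov[D, B]
    """
    # gain = Cov[B, D] Var[D]^{-1}, computed with a solve rather than an inverse
    gain = np.linalg.solve(var_d, cov_bd.T).T
    adj_mean = mean_b + gain @ (d_obs - mean_d)
    adj_var = var_b - gain @ cov_bd.T
    return adj_mean, adj_var


def implausibility(z, emulator_mean, emulator_var, discrepancy_var, obs_error_var):
    """Standardised distance between an observation z and the emulator
    prediction at a candidate input; large values suggest the input can be
    ruled out."""
    total_var = emulator_var + discrepancy_var + obs_error_var
    return np.abs(z - emulator_mean) / np.sqrt(total_var)


if __name__ == "__main__":
    # One-dimensional toy numbers, chosen only for illustration.
    m, v = bayes_linear_adjust(
        mean_b=np.array([0.0]), var_b=np.array([[1.0]]),
        mean_d=np.array([0.0]), var_d=np.array([[1.0]]),
        cov_bd=np.array([[0.5]]), d_obs=np.array([2.0]),
    )
    print(m, v)
    print(implausibility(3.0, 1.0, 1.0, 2.0, 1.0))
```

Only means, variances and covariances are needed, with no full probability distribution, which is what makes the adjustment cheap enough to use with expensive simulators.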
A stable particle filter in high dimensions
We consider the filtering problem in high dimensions, that is, when the hidden state lies in dimension d, with d large. This problem is ubiquitous in financial applications, for instance in the online estimation of volatility. It is notoriously difficult because the exact numerical procedures it requires, such as particle filters, can have a cost that is exponential in d if the algorithm is to be stable in some sense. We develop a new particle filter for a specific class of state-space models in discrete time. This new class of particle filters provides correct Monte Carlo estimates for any fixed d, as do standard particle filters. However, under an i.i.d. structure, we show that in order to achieve some stability properties, this new filter has cost O(nNd^2), where n is the time parameter and N is the number of Monte Carlo samples, which is fixed and independent of d. This suggests that exact Monte Carlo methods can be used to tackle some high-dimensional filtering problems that were not previously tractable.
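For context, the difficulty described in the abstract above can be seen with a standard bootstrap particle filter. The sketch below is the textbook algorithm, not the new filter presented in the talk. The linear Gaussian model with independent coordinates, the parameter values, and all function names are assumptions made for illustration. The effective sample size (ESS) returned at each step is a common diagnostic of weight degeneracy.

```python
import numpy as np


def simulate(n_steps, d, a=0.9, sigma_x=1.0, sigma_y=1.0, seed=1):
    """Simulate x_t = a x_{t-1} + noise, y_t = x_t + noise, in dimension d."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, size=d)
    states = np.empty((n_steps, d))
    obs = np.empty((n_steps, d))
    for t in range(n_steps):
        x = a * x + rng.normal(0.0, sigma_x, size=d)
        states[t] = x
        obs[t] = x + rng.normal(0.0, sigma_y, size=d)
    return states, obs


def bootstrap_particle_filter(obs, n_particles, a=0.9, sigma_x=1.0,
                              sigma_y=1.0, seed=0):
    """Standard bootstrap filter: propagate, weight by likelihood, resample."""
    rng = np.random.default_rng(seed)
    n_steps, d = obs.shape
    particles = rng.normal(0.0, sigma_x, size=(n_particles, d))
    means = np.empty((n_steps, d))
    ess = np.empty(n_steps)
    for t in range(n_steps):
        # Propagate every particle through the state dynamics.
        particles = a * particles + rng.normal(0.0, sigma_x,
                                               size=(n_particles, d))
        # Log-weights from the Gaussian observation density (constants dropped).
        log_w = -0.5 * np.sum((obs[t] - particles) ** 2, axis=1) / sigma_y**2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        means[t] = w @ particles          # filtering mean estimate
        ess[t] = 1.0 / np.sum(w**2)       # effective sample size
        # Multinomial resampling.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return means, ess


def mean_ess(d, n_steps=30, n_particles=500):
    """Average ESS over time for a model of dimension d."""
    _, obs = simulate(n_steps, d)
    _, ess = bootstrap_particle_filter(obs, n_particles)
    return float(ess.mean())


if __name__ == "__main__":
    for d in (1, 5, 20):
        print(d, mean_ess(d))
```

With the number of particles held fixed, the average ESS falls sharply as d grows, which is the weight degeneracy that motivates filters designed for high dimensions.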
Page last modified on 08 oct 12 11:11