Statistical Science Seminars
Usual time: Thursdays 16:00 - 17:00
Location: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor).
Some seminars are held in different locations at different times. Click on the abstract for more details.
Likelihood-based inference for max-stable processes: some recent developments
Max-stable processes are an important class of models for extreme values of processes indexed by space and / or time. They are derived by taking suitably scaled limits of normalized pointwise maxima of stochastic processes; in practice therefore one uses them as models for maxima over many repetitions. However, the complicated nature of their dependence structures means that full (i.e., d-dimensional, where a process is observed at d locations) likelihood inference is not straightforward. Recent work has demonstrated that by including information on when the maxima occurred, full likelihood-based inference is possible for some classes of models. However, whilst this approach simplifies the likelihood enough to make the inference feasible, it can also cause or accentuate bias in parameter estimation for processes that are weakly dependent. In this talk I will describe the ideas behind full likelihood inference for max-stable processes, and discuss how this bias can occur. Understanding of the bias issue helps to identify potential solutions, and I will illustrate one possibility that has been successful in a high-dimensional multivariate model.
Inference for infinite mixture models and Gaussian Process mixtures of experts using simple approximate MAP Inference
The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as Gibbs sampling are required. As a result, DPM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference are plentiful. We develop a simplified yet statistically rigorous approximate maximum a-posteriori (MAP) inference algorithm for DPMs. This algorithm is as simple as k-means clustering and performs in experiments as well as Gibbs sampling, while requiring only a fraction of the computational effort. Finally, we demonstrate how this approach can be used to perform inference for infinite mixtures of Gaussian Process experts.
On an alternative class of exact-approximate MCMC algorithms
Consider the standard Metropolis-Hastings (MH) algorithm for a given distribution P on x. This talk is on exact-approximate algorithms that expand the scope of MH to situations where its acceptance ratio r(x, x’) is intractable.
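For orientation, the standard MH algorithm referred to above can be sketched as follows. This is a minimal random-walk illustration of the tractable case only, not the exact-approximate method of the talk. The function name `metropolis_hastings`, the Gaussian proposal and the `step` parameter are assumptions made for the example.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings on the real line (illustrative sketch).

    log_target: log of a density P on x, known up to an additive constant.
    The Gaussian proposal is symmetric, so the acceptance ratio reduces to
    r(x, x') = P(x') / P(x).
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)            # propose x'
        log_r = log_target(x_new) - log_target(x)   # log r(x, x')
        # Accept with probability min(1, r(x, x')); otherwise stay at x.
        if rng.random() < math.exp(min(0.0, log_r)):
            x = x_new
        samples.append(x)
    return samples


# Example: sample from a standard normal, specified only up to a constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The sketch evaluates r(x, x’) exactly at every step. The setting of the talk is the one where this evaluation is not available and only noisy estimates can be computed.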
We present a novel class of exact-approximate MH algorithms. The motivation is the desire to benefit from averaging multiple noisy estimates of r(x, x’) while still preserving detailed balance w.r.t. P.
We show that this is indeed possible with the use of a pair of proposal kernels and asymmetric acceptance ratios. Moreover, the steps within one iteration that increase statistical efficiency at the cost of extra computation are parallelizable.
We will discuss two interesting applications of the methodology (if time permits): a simple extension of the exchange algorithm of Murray et al. (2006) for doubly intractable distributions, and a “multiple jump” version of the reversible jump MCMC of Green (1995) for
(Joint work with Christophe Andrieu and Arnaud Doucet.)
Murray, I., Ghahramani, Z., and MacKay, D. J. C. (2006). MCMC for doubly-intractable distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), pages 359–366.
Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732.
A reduced-variance approach to Monte Carlo integration is presented that exploits tools from Gaussian process regression to improve asymptotic convergence rates. The method, called "control functionals", enables efficient, unbiased estimation using un-normalised densities and is well-suited to challenging contemporary applications of Bayesian statistics.
Joint work with Mark Girolami and Nicolas Chopin.
Rough paths theory and regression analysis
Regression analysis uses data from multiple observations to develop a functional relationship between explanatory variables and response variables. It is central to much of modern statistics and econometrics, and to machine learning. In this talk, we consider the special case where the explanatory variable is a stream of information, and the response is also potentially a stream. We provide an approach based on identifying carefully chosen features of the stream, which allows linear regression to be used to characterise the functional relationship between explanatory variables and the conditional distribution of the response. The methods used to develop and justify this approach, such as the signature of a stream and the shuffle product of tensors, are standard tools in the theory of rough paths; they seem appropriate in this context of regression as well, and provide a surprisingly unified and non-parametric approach.
To illustrate the approach we consider the problem of using data to predict the conditional distribution of the near future of a stationary, ergodic time series and compare it with probabilistic approaches based on first fitting a model. We believe our reduction of this regression problem for streams to a linear problem is clean, systematic, and efficient in minimizing the effective dimensionality. The clear gradation of finite dimensional approximations increases its usefulness. Although the approach is non-parametric, it presents itself in computationally tractable and flexible restricted forms in examples we considered. Popular techniques in time series analysis such as AR, ARCH and GARCH can be seen to be special cases of our approach, but it is not clear if they are always the best or most informative choices.
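As a rough illustration of the kind of stream features mentioned above, the first two levels of the signature of a piecewise-linear path can be computed segment by segment using Chen's identity. This is a sketch under stated assumptions and not the speakers' implementation: the function name `signature_level2` is hypothetical, the stream is assumed to be given as a list of points joined by straight lines, and the truncation at level 2 is chosen only to keep the example short.

```python
def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: list of points (equal-length tuples), joined by straight lines.
    Returns (s1, s2), where s1[i] is the total increment in coordinate i and
    s2[i][j] is the iterated integral of dX^i followed by dX^j.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        # Chen's identity for appending one linear segment:
        # new S2 = S2 + S1 (x) delta + (delta (x) delta) / 2
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2


# Example: move one unit right, then one unit up.
s1, s2 = signature_level2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
levy_area = 0.5 * (s2[0][1] - s2[1][0])
```

The shuffle product appears here in its simplest form: s2[i][j] + s2[j][i] equals s1[i] * s1[j], so products of signature terms are again linear combinations of signature terms. That property is what makes a linear regression on these features plausible.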
Detecting multiple change-points in panel data
In this paper, we propose a method for detecting multiple change-points in the mean of (possibly) high-dimensional panel data. CUSUM statistics have been widely adopted for change-point detection in both univariate and multivariate data. For the latter, it is of particular interest to exploit the cross-sectional structure and achieve simultaneous change point detection across the panels, by searching for change-points from the aggregation of multiple series of CUSUM statistics, each of which is computed on a single panel. For panel data of high dimensions, the detectability of a change-point is influenced by several factors, such as its sparsity across the panels, the magnitude of jumps at the change-point and the unbalancedness of its location, and having a method that handles a wide range of change-point configurations without any prior knowledge is vital in panel data analysis.
The Sparsified Binary Segmentation and the Double CUSUM Binary Segmentation represent determined efforts in this direction. We investigate under which conditions the two binary segmentation methods attain consistent change-point detection in terms of both the total number and the locations of detected change-points, and conduct a comparative simulation study in which their good performance is demonstrated.
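The CUSUM statistic on which both methods build can be sketched for a single series as follows. This shows only the basic single-panel statistic for one change in the mean, not the Sparsified or Double CUSUM Binary Segmentation. The function names `cusum` and `estimate_change_point` are assumptions made for the example.

```python
import math


def cusum(series):
    """CUSUM statistics for a single change in the mean of one series.

    For each split point b = 1, ..., T-1 returns
    sqrt(b * (T - b) / T) * |mean(x_1..x_b) - mean(x_{b+1}..x_T)|.
    """
    T = len(series)
    if T < 2:
        raise ValueError("need at least two observations")
    total = sum(series)
    left = 0.0
    stats = []
    for b in range(1, T):
        left += series[b - 1]
        mean_left = left / b
        mean_right = (total - left) / (T - b)
        stats.append(math.sqrt(b * (T - b) / T) * abs(mean_left - mean_right))
    return stats


def estimate_change_point(series):
    """Return (b, statistic) for the split point maximising the CUSUM statistic."""
    stats = cusum(series)
    b = max(range(len(stats)), key=stats.__getitem__) + 1
    return b, stats[b - 1]


# Example: the mean shifts from 0 to 1 after the fourth observation.
b_hat, stat = estimate_change_point([0, 0, 0, 0, 1, 1, 1, 1])
```

In a panel setting, one such series of statistics is computed per panel and the series are then aggregated across panels. The choice of aggregation is where the methods discussed in the talk differ.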
Sun and Lemons: Getting over informational asymmetries in the California Solar Power Market
Using detailed data on approximately 125,000 solar photovoltaic systems installed in California between 2007 and 2014, I argue that the adoption of solar panels from Chinese manufacturers and the introduction of a leasing model for solar systems are closely intertwined. First, cheaper Chinese panels allowed a leasing model to be profitable for contractors. But an asymmetric information problem exists in the market for solar panels. Solar panels are long-lived productive assets, where quality is important but costly for individual consumers to verify. Consumers can instead be expected to rely on brands and observed reliability. This led to a barrier to entry for cheaper panels from new, primarily Chinese manufacturers. The adoption of a leasing model by several large local installers solved the asymmetric information problem and led to the adoption of Chinese panels and in turn lower overall system prices.
An application of Holonomic Gradient method for inference in Directional and Shape Statistics
The holonomic gradient method is an elegant implementation of ODE techniques for efficiently evaluating normalising constants. In the talk we show how to apply this approach to Fisher-Bingham distributions defined on the sphere. The normalising constants of these distributions are not available in closed form, and their accurate evaluation is essential for statistical inference. This enables us to further develop a new algorithm for shape analysis.
This is joint work with Tomonari Sei, Keio University, Japan.
Page last modified on 09 feb 15 21:07