Statistical Science Seminars
Usual time: Thursdays 16:00 - 17:00
Location: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor).
Some seminars are held in different locations at different times. Click on the abstract for more details.
Likelihood-based inference for max-stable processes: some recent developments
Max-stable processes are an important class of models for extreme values of processes indexed by space and/or time. They are derived by taking suitably scaled limits of normalized pointwise maxima of stochastic processes; in practice, therefore, one uses them as models for maxima over many repetitions. However, the complicated nature of their dependence structures means that full (i.e., d-dimensional, where a process is observed at d locations) likelihood inference is not straightforward. Recent work has demonstrated that by including information on when the maxima occurred, full likelihood-based inference is possible for some classes of models. However, whilst this approach simplifies the likelihood enough to make the inference feasible, it can also cause or accentuate bias in parameter estimation for processes that are weakly dependent. In this talk I will describe the ideas behind full likelihood inference for max-stable processes, and discuss how this bias can occur. Understanding the bias issue helps to identify potential solutions, and I will illustrate one possibility that has been successful in a high-dimensional multivariate model.
Inference for infinite mixture models and Gaussian Process mixtures of experts using simple approximate MAP Inference
The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so computationally intensive techniques such as Gibbs sampling are required. As a result, DPM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference are plentiful. We develop a simplified yet statistically rigorous approximate maximum a-posteriori (MAP) inference algorithm for DPMs. This algorithm is as simple as k-means clustering and performs as well as Gibbs sampling in experiments, while requiring only a fraction of the computational effort. Finally, we demonstrate how this approach can be used to perform inference for infinite mixtures of Gaussian Process experts.
On an alternative class of exact-approximate MCMC algorithms
Consider the standard Metropolis-Hastings (MH) algorithm for a given distribution P on x. This talk is on exact-approximate algorithms that expand the scope of MH to situations where its acceptance ratio r(x, x’) is intractable.
We present a novel class of exact-approximate MH algorithms. The motivation is the desire to benefit from averaging multiple noisy estimates of r(x, x’) while still preserving detailed balance w.r.t. P.
We show that this is indeed possible with the use of a pair of proposal kernels and asymmetric acceptance ratios. Moreover, the steps within one iteration that increase statistical efficiency at the cost of extra computation are parallelizable.
We will discuss two interesting applications of the methodology (if time permits). They are a simple extension of the exchange algorithm of Murray et al. (2006) for doubly intractable distributions, and a “multiple jump” version of the reversible jump MCMC of Green (1995) for
(Joint work with Christophe Andrieu and Arnaud Doucet.)
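As a point of reference for the abstract above, the standard MH algorithm it starts from can be sketched as follows. This is a minimal random-walk version in which the acceptance ratio r(x, x’) is tractable; it is not the exact-approximate scheme of the talk. The function name, the Gaussian proposal, the step size and the standard normal target are all assumptions made for illustration.

```python
import math
import random


def metropolis_hastings(log_p, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for an unnormalised log-density log_p."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the proposal densities cancel
        # and r(x, x') reduces to P(x') / P(x).
        x_new = x + rng.gauss(0.0, step)
        log_r = log_p(x_new) - log_p(x)
        # Accept with probability min(1, r(x, x')).
        if rng.random() < math.exp(min(0.0, log_r)):
            x = x_new
        samples.append(x)
    return samples


# Hypothetical target: a standard normal, known only up to a constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

When r(x, x’) cannot be evaluated, the line computing log_r is exactly the step that has to be replaced by an estimate, which is the situation the talk addresses.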
Murray, I., Ghahramani, Z., and MacKay, D. J. C. (2006). MCMC for doubly-intractable distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), pages 359–366.
Green, P. (1995). Reversible jump Markov chain Monte Carlo for Bayesian model determination. Biometrika, 82(4):711–732.
A reduced-variance approach to Monte Carlo integration is presented that exploits tools from Gaussian process regression to improve asymptotic convergence rates. The method, called "control functionals", enables efficient, unbiased estimation using un-normalised densities and is well-suited to challenging contemporary applications of Bayesian statistics.
Joint work with Mark Girolami and Nicolas Chopin.
Rough paths theory and regression analysis
Regression analysis aims to use data from multiple observations to develop a functional relationship between explanatory variables and response variables; this is important for much of modern statistics and econometrics, and for machine learning. In this talk, we consider the special case where the explanatory variable is a stream of information, and the response is also potentially a stream. We provide an approach based on identifying carefully chosen features of the stream, which allows linear regression to be used to characterise the functional relationship between explanatory variables and the conditional distribution of the response. The methods used to develop and justify this approach, such as the signature of a stream and the shuffle product of tensors, are standard tools in the theory of rough paths; they seem appropriate in this context of regression as well, and provide a surprisingly unified and non-parametric approach.
To illustrate the approach we consider the problem of using data to predict the conditional distribution of the near future of a stationary, ergodic time series and compare it with probabilistic approaches based on first fitting a model. We believe our reduction of this regression problem for streams to a linear problem is clean, systematic, and efficient in minimizing the effective dimensionality. The clear gradation of finite dimensional approximations increases its usefulness. Although the approach is non-parametric, it presents itself in computationally tractable and flexible restricted forms in examples we considered. Popular techniques in time series analysis such as AR, ARCH and GARCH can be seen to be special cases of our approach, but it is not clear if they are always the best or most informative choices.
Detecting multiple change-points in panel data
In this paper, we propose a method for detecting multiple change-points in the mean of (possibly) high-dimensional panel data. CUSUM statistics have been widely adopted for change-point detection in both univariate and multivariate data. For the latter, it is of particular interest to exploit the cross-sectional structure and achieve simultaneous change point detection across the panels, by searching for change-points from the aggregation of multiple series of CUSUM statistics, each of which is computed on a single panel. For panel data of high dimensions, the detectability of a change-point is influenced by several factors, such as its sparsity across the panels, the magnitude of jumps at the change-point and the unbalancedness of its location, and having a method that handles a wide range of change-point configurations without any prior knowledge is vital in panel data analysis.
The Sparsified Binary Segmentation and the Double CUSUM Binary Segmentation represent determined efforts in this direction. We investigate under which conditions the two binary segmentation methods attain consistent change-point detection in terms of both the total number and the locations of detected change-points, and conduct a comparative simulation study in which their good performance is demonstrated.
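To make the idea of aggregating per-panel CUSUM statistics concrete, here is a minimal sketch for a single change in mean. It sums the CUSUM series across panels pointwise, which is a plain aggregation chosen for illustration and not the Sparsified or Double CUSUM rule of the talk. The function names and the toy data (five panels of length 100, a mean shift of 3 after the first 60 observations in three of them) are assumptions.

```python
import math
import random


def cusum(x):
    """CUSUM statistics for one change in mean, one value per split b = 1..n-1."""
    n = len(x)
    total = sum(x)
    left = 0.0
    stats = []
    for b in range(1, n):
        left += x[b - 1]
        mean_left = left / b
        mean_right = (total - left) / (n - b)
        # Weighted absolute difference between the means before and after b.
        stats.append(math.sqrt(b * (n - b) / n) * abs(mean_left - mean_right))
    return stats


def aggregate_changepoint(panels):
    """Sum the per-panel CUSUM series pointwise and return the best split."""
    per_panel = [cusum(x) for x in panels]
    aggregated = [sum(column) for column in zip(*per_panel)]
    b_hat = max(range(len(aggregated)), key=aggregated.__getitem__) + 1
    return b_hat, aggregated


# Toy data: only panels 0, 1 and 2 change, after the first 60 observations.
rng = random.Random(1)
panels = [
    [rng.gauss(3.0 if (t >= 60 and j < 3) else 0.0, 1.0) for t in range(100)]
    for j in range(5)
]
b_hat, aggregated = aggregate_changepoint(panels)
```

Here b_hat is the estimated number of observations before the change. A plain sum like this dilutes a change that affects only a few panels, which is one reason the sparsity of a change across panels matters for detectability.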
Sun and Lemons: Getting over informational asymmetries in the California Solar Power Market
Using detailed data on approximately 125,000 solar photovoltaic systems installed in California between 2007 and 2014, I argue that the adoption of solar panels from Chinese manufacturers and the introduction of a leasing model for solar systems are closely intertwined. First, cheaper Chinese panels allowed a leasing model to be profitable for contractors. But an asymmetric information problem exists in the market for solar panels. Solar panels are long-lived productive assets, where quality is important but costly for individual consumers to verify. Consumers can instead be expected to rely on brands and observed reliability. This led to a barrier to entry for cheaper panels from new, primarily Chinese manufacturers. The adoption of a leasing model by several large local installers solved the asymmetric information problem and led to the adoption of Chinese panels and in turn lower overall system prices.
An application of Holonomic Gradient method for inference in Directional and Shape Statistics
The holonomic gradient method is an elegant implementation of ODE techniques for efficiently evaluating normalising constants. In the talk we show how to apply this approach to Fisher-Bingham distributions defined on the sphere. Such normalising constants are not available in closed form, and their accurate evaluation is essential for performing statistical inference with these distributions. This enables us to further develop a new algorithm for shape analysis.
This is joint work with Tomonari Sei, Keio University, Japan.
A Tale of Two Market Microstructures: Spillovers of Informed Trading and Liquidity For Cross Listed Chinese A
The dual-listed Chinese A and B shares reflect the uniqueness of perfect market segmentation. Under a series of regulatory changes that the China Securities Regulatory Commission (CSRC) has imposed over more than the last decade, we observe how these interventions affect this novel underlying structure. We utilize rational expectations theory and establish a multivariate time-varying empirical framework to capture the change in information structure reflected in both volatility and liquidity provision. Our results suggest that volatility variation appears to be consistent with the various deregulations, while the liquidity results are opposite to what is predicted from the information theory perspective. In particular, we find that, after the 2001 B share deregulation, whilst the number of informed traders in B shares increased, the variance of B share liquidity traders' order submissions dropped significantly and remained low. However, a substantial liquidity rise was identified in the variance of the counterpart A share liquidity traders' order submissions in 2003, when the CSRC allowed qualified foreign institutional traders to access A shares.
Model Assumptions and Truth in Statistics
The way statistics is usually taught and statistical analyses are usually presented can easily seem mysterious. For example, statisticians use methods that assume the data to be normally distributed, and they insist that this assumption needs to be tested for the analysis to be justified, but the same statisticians would also state on another day that they don't believe any such model assumption to be true anyway. Does it make sense to test an assumption that we don't believe to hold regardless of the outcome of the test, and does it make sense to build our methodology on such assumptions?
As a statistician with a long-standing interest in philosophy, I built my own way of understanding the reasons for what statisticians do, and of why these reasons sometimes work well but are sometimes less convincing. This understanding is influenced by constructivist philosophy and by a framework for relating mathematical models to the reality that we observe and that we deal with in science. That's what this presentation is about.
I will discuss how Frequentist and Bayesian statistics can be understood in terms of what way of thinking about the modelled phenomena they imply, and in what way what is misleadingly called "test of the model assumptions" can inform statistical analyses. This involves taking the unbridgeable gap between models and the modelled reality seriously, and explaining the use of models without referring to them as being supposedly "true". I will also discuss the idea of "approximating" reality by statistical models (Davies, 2014), and mention some practical implications for evaluating the quality of statistical methods in my core statistical research area, which is Cluster Analysis (finding groups in data).
This presentation was originally given at UCL Science and Technology Studies for an audience more at home in philosophy than in statistics, so expect it to be rather informal (there is also some overlap with a presentation that I gave here a few years ago).
L. Davies: Data Analysis and Approximate Models. CRC Press (2014).
C. Hennig: A Constructivist View of the Statistical Quantification of Evidence. Constructivist Foundations 5 (2009).
C. Hennig: Mathematical Models and Reality - a Constructivist Perspective. Foundations of Science 15, 29-49 (2010).
C. Hennig and T. F. Liao: How to find an appropriate clustering for mixed type variables with application to socioeconomic stratification. Journal of the Royal Statistical Society, Series C 62, 309-369 (2013), with discussion.
Multiple Change-points Estimation by Empirical Bayesian Information Criteria and Gibbs Sampling
We have developed a new method to estimate multiple change-points that may exist in a sequence of observations. The method consists of a specific empirical Bayesian information criterion (emBIC) to assess the fitness and virtue of each candidate configuration of change-points, and a specific stochastic search algorithm, induced by Gibbs sampling, to find the optimal change-point configuration. It is shown that emBIC can significantly improve over BIC, which is known to have a tendency to over-detect multiple change-points.
The use of the stochastic search induced by Gibbs sampling enables one to find the optimal change-points configuration with high probability and without going through an exhaustive search that is mostly computationally infeasible. Simulation studies and real data examples are presented to illustrate and assess the proposed method.
Algorithmic Design for Big Data: The ScaLE Algorithm
This talk will introduce a new methodology for the systematic error-free Monte Carlo simulation of target distributions. This new method has remarkably good scalability properties as the size of the data set increases (it has sub-linear cost, and potentially no cost), and therefore avoids the drawbacks of Markov chain Monte Carlo methods and is a natural candidate for “Big Data” inference. This is joint work with Paul Fearnhead (Lancaster), Adam Johansen (Warwick) and Gareth Roberts (Warwick).
False positive rates: how do you interpret a single test that gives P = 0.047?
It is widely believed that P values from null hypothesis testing overstate the evidence against the null.
Simulation of t tests suggests that, if you declare that you’ve made a discovery when you have observed P = 0.047 then you’ll be wrong at least 30% of the time, and quite probably more often: see http://rsos.royalsocietypublishing.org/content/1/3/140216
This problem will be discussed from several points of view: e.g. do we look at P = 0.047 or at P ≤ 0.047? Is the point null sensible? To what extent do the conclusions depend on objective Bayesian arguments?
The results of simulations are consistent with the work of J. Berger & Sellke on calibration of P values, and with the work of Valen Johnson on uniformly most-powerful Bayesian tests. On the other hand, one-sided tests with distributed rather than point null give different results.
Finally, I ask what should be done about the teaching of statistics, in the light of these results. Could it be that what’s taught in introductory statistics courses has contributed to the crisis in reproducibility in some sorts of science?
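The kind of simulation referred to in this abstract can be sketched as below. This is a simplified stand-in and not the author's simulation: it uses a two-sided z test with known variance in place of the t test, and it draws the test statistic directly instead of simulating raw data. The prevalence of real effects (0.5), the effect size (1 SD), the group size (16) and the P value window (0.045 to 0.05) are all assumed values chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def false_positive_rate(n_sims=400_000, prevalence=0.5, effect_sd=1.0,
                        n_per_group=16, window=(0.045, 0.05), seed=0):
    """Fraction of tests with P inside `window` for which the null was true."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    # Mean of the z statistic for two groups of equal size when a real
    # effect of `effect_sd` standard deviations is present.
    shift = effect_sd / math.sqrt(2.0 / n_per_group)
    false_hits = 0
    true_hits = 0
    for _ in range(n_sims):
        real = rng.random() < prevalence
        z = rng.gauss(shift if real else 0.0, 1.0)
        p = 2.0 * (1.0 - phi(abs(z)))  # two-sided P value
        if window[0] <= p <= window[1]:
            if real:
                true_hits += 1
            else:
                false_hits += 1
    return false_hits / (false_hits + true_hits)


fpr = false_positive_rate()
```

Conditioning on a narrow window of P values, as here, addresses the "P = 0.047" question; replacing the window with (0.0, 0.047) addresses the "P ≤ 0.047" question, and the two can give quite different answers.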
Estimating Reproducibility in Genome-Wide Association Studies
Genome-wide association studies (GWAS) are widely used to discover genetic variants associated with diseases. To control false positives, all findings from GWAS need to be verified with additional evidence, even for associations discovered in a high-power study. A replication study, which uses independent samples, is a common verification method. An association is regarded as a true positive with high confidence when it can be identified in both the primary study and the replication study. Currently, there is no systematic study of the behavior of positives in the replication study when the positive results of the primary study are treated as prior information.
In this paper, two probabilistic measures named Reproducibility Rate (RR) and False Irreproducibility Rate (FIR) are proposed to quantitatively describe the behavior of positive associations (identified in the primary study) in the replication study. RR is a conditional probability measuring how likely a positive association is to be positive again in the replication study. This can be used to guide the design of the replication study, and to check the consistency between the results of the primary study and those of the replication study. FIR, by contrast, measures how likely a positive association is to still be true even when it is negative in the replication study. This can be used to generate a list of potentially true associations among the irreproducible findings. Estimation methods for these two measures are given. Simulated and real datasets are used to show that our estimation results have high accuracy and good prediction performance.
Page last modified on 09 feb 15 21:07