Statistical Science Seminars
Usual time: Thursdays 16:00 - 17:00
Location: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor).
Some seminars are held in different locations at different times. Click on the abstract for more details.
Spatial statistics with Markov properties
In spatial statistics, Gaussian Markov random fields on graphs and lattices have traditionally been viewed as a completely separate approach from continuous covariance functions, and also separate from many other methods developed to handle large data sets and non-stationary phenomena. In this talk, I will show some fundamental connections between several of these approaches, illustrate how Markov models based on stochastic partial differential equations (SPDEs) can be used, and discuss some current and future challenges.
On random geometric subdivisions
I will present several models of random geometric subdivisions, similar to that of Diaconis and Miclo (Combinatorics, Probability and Computing, 2011), in which a triangle is split into six smaller triangles by its medians, one of these parts is randomly selected as the new triangle, and the process continues ad infinitum. I will show that in a similar model the limiting shape of an indefinite subdivision of a quadrilateral is a parallelogram. I will also show that the geometric subdivisions of a triangle by angle bisectors converge (but only weakly) to a non-atomic distribution, and that the geometric subdivisions of a triangle obtained by choosing uniform random points on its sides converge to a flat triangle, similarly to the result of the paper mentioned above.
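The median-based subdivision described above can be simulated in a few lines. The following is only an illustrative sketch, not the speaker's code: the function names, the flatness measure (area divided by the squared longest side) and the rescaling step are assumptions made here for the example.

```python
import math
import random


def midpoint(p, q):
    """Midpoint of the segment pq."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def median_children(tri):
    """Split a triangle into the six triangles cut out by its three medians."""
    a, b, c = tri
    g = ((a[0] + b[0] + c[0]) / 3.0, (a[1] + b[1] + c[1]) / 3.0)  # centroid
    mab, mbc, mca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, mab, g), (mab, b, g), (b, mbc, g),
            (mbc, c, g), (c, mca, g), (mca, a, g)]


def area(tri):
    """Unsigned area via the cross product of two edge vectors."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0


def longest_side_sq(tri):
    a, b, c = tri
    return max((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
               for p, q in ((a, b), (b, c), (c, a)))


def flatness(tri):
    """Scale-invariant shape measure (an assumption of this sketch):
    sqrt(3)/4 for an equilateral triangle, close to 0 for a flat one."""
    return area(tri) / longest_side_sq(tri)


def normalize(tri):
    """Translate and rescale so the longest side has length 1.
    This keeps the shape and avoids floating-point underflow."""
    a = tri[0]
    scale = math.sqrt(longest_side_sq(tri))
    return tuple(((p[0] - a[0]) / scale, (p[1] - a[1]) / scale) for p in tri)


def random_subdivision(tri, steps, rng):
    """Repeatedly replace the triangle by one of its six children, chosen uniformly."""
    for _ in range(steps):
        tri = normalize(rng.choice(median_children(tri)))
    return tri


if __name__ == "__main__":
    rng = random.Random(0)
    start = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))
    for steps in (0, 10, 20, 40):
        print(steps, flatness(random_subdivision(start, steps, rng)))
```

Running the script prints the flatness measure after an increasing number of steps, which gives a rough numerical feel for how the shape of the selected triangle evolves.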
High-dimensional change point detection with sparse alternatives
We consider the problem of detecting a change in mean in a sequence of Gaussian vectors. It is assumed that the change in mean occurs only in a subset of unknown size of the vector components while the other components remain unchanged. We propose a non-parametric test for detecting the change in mean that is adaptive to the number of changing components. Under high-dimensional assumptions on the vector dimension and on the sequence length we show minimax rate-optimality of the test. The testing procedure is applied to segmentation of audio signals.
(Joint work with Zaid Harchaoui, INRIA Grenoble).
Estimation of Extreme Quantiles for Functions of Dependent Random Variables
Motivated by a concrete risk management problem in the financial industry, we propose a new method for estimating the extreme quantiles of a function of several dependent random variables. In contrast to the conventional approach based on extreme value theory, we do not impose the condition that the tail of the underlying distribution admits an approximate parametric form, and, furthermore, our estimation makes use of the full observed data. The proposed method is semiparametric, as no parametric forms are assumed for any of the marginal distributions. Instead, we select appropriate bivariate copulas to model the joint dependence structure, taking advantage of recent developments in constructing high-dimensional vine copulas. A sample quantile of a large bootstrap sample drawn from the fitted joint distribution is then taken as the estimate of the extreme quantile. This estimator is proved to be consistent as long as the quantile to be estimated is not too extreme. The reliable and robust performance of the proposed method is further illustrated by simulation.
Inference for generalized linear mixed models with sparse structure
Generalized linear mixed models are a natural and widely used class of models, but one in which the likelihood often involves an integral of very high dimension. Because of this intractability, it is common to conduct inference by using an approximation to the likelihood in place of the true likelihood. However, existing approximations to the likelihood often fail in models which have sparse structure, in that the data only provide a small amount of information on each random effect, which can result in misleading inference for the model parameters. I will introduce a new approximation method, which exploits the structure of the integrand of the likelihood to reduce the cost of finding a good approximation to the likelihood in models with sparse structure. I will demonstrate the method for models for tournaments between pairs of players, and for models with nested random-effect structure.
A Kernel Independence Test for Random Processes
A non-parametric approach to the problem of testing the independence of two random processes will be presented. The test statistic is the Hilbert-Schmidt Independence Criterion (HSIC), which was used previously in testing independence for i.i.d. pairs of variables. The asymptotic behaviour of HSIC will be established when computed from samples drawn from random processes. We will show that earlier bootstrap procedures which worked in the i.i.d. case will fail for random processes, and an alternative consistent estimate of the p-values will be proposed. Tests on artificial data and real-world forex data indicate that the new test procedure discovers dependence which is missed by linear approaches, while the earlier bootstrap procedure returns an elevated number of false positives.
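For readers unfamiliar with the statistic named above, the following is a minimal sketch of the standard biased empirical HSIC, trace(KHLH)/n^2, for scalar samples. The Gaussian kernel, the fixed bandwidth and all function names are assumptions of this example. It computes the statistic only and does not implement the p-value procedure for random processes proposed in the talk.

```python
import math
import random


def gaussian_gram(xs, bandwidth):
    """Gram matrix of the Gaussian kernel exp(-(a - b)^2 / (2 * bandwidth^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs]
            for a in xs]


def center(K):
    """Double-centre a symmetric Gram matrix, i.e. compute H K H with H = I - 11^T / n."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic_biased(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2, computed in O(n^2) time."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    Kc = center(gaussian_gram(xs, bandwidth))
    L = gaussian_gram(ys, bandwidth)
    # trace(H K H L) equals the sum of elementwise products for symmetric matrices
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / n ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(200)]
    y_dependent = [v * v + 0.1 * rng.gauss(0.0, 1.0) for v in x]  # nonlinear dependence
    y_independent = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print("dependent:  ", hsic_biased(x, y_dependent))
    print("independent:", hsic_biased(x, y_independent))
```

The dependent pair in the demo is built so that its linear correlation is close to zero, which is the kind of dependence a kernel statistic is designed to pick up. Turning the statistic into a valid test requires a null distribution, and for dependent samples that is precisely the issue the abstract addresses.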
Density estimation in infinite dimensional exponential families
In this work, we consider the problem of estimating densities in an infinite dimensional exponential family indexed by functions in a reproducing kernel Hilbert space. Since standard techniques like maximum likelihood estimation (MLE) or pseudo MLE (based on the method of sieves) do not yield practically useful estimators, we propose an estimator based on the minimization of Fisher divergence, which involves solving a simple linear system. We show that the proposed estimator is consistent, and provide convergence rates under smoothness assumptions (precisely, under the assumption that the true parameter or function that generates the data generating distribution lies in the image of a certain covariance operator). We also empirically demonstrate that the proposed method outperforms the standard non-parametric kernel density estimator.
Joint work with Kenji Fukumizu, Arthur Gretton and Aapo Hyvarinen.
Consider the problem of estimating, for fixed $w$ and $A$, the evaluation $\langle w,x\rangle$ of an unknown signal $x$ observed via the process $b = Ax + \varepsilon$. A number of scenarios, including rank-1 matrix completion, can be reduced to this setting.
If $A$ is very large, then inverting it (or computing a Moore-Penrose pseudo inverse) is computationally unattractive. In this talk, I will discuss how to exploit the algebraic and combinatorial structure of $A$ (when present) to obtain variance-minimizing (MVUE) estimators much more efficiently. The optimal estimators arise from a general scheme which allows smooth tradeoffs between estimator quality and computational complexity.
This is joint work with Franz J. Király.
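As a point of reference for the abstract above, the classical baseline that requires the pseudo-inverse can be written out explicitly. This is a textbook sketch under assumptions not stated in the abstract: $A$ has full column rank, and the noise satisfies $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Cov}(\varepsilon) = \sigma^2 I$. It is not the estimator scheme presented in the talk.

$$
\hat{\theta} \;=\; \langle w, A^{+} b \rangle \;=\; w^{\top} (A^{\top} A)^{-1} A^{\top} b,
\qquad
\mathbb{E}[\hat{\theta}] \;=\; \langle w, x \rangle,
\qquad
\operatorname{Var}(\hat{\theta}) \;=\; \sigma^{2}\, w^{\top} (A^{\top} A)^{-1} w .
$$

Under these assumptions the Gauss-Markov theorem says that $\hat{\theta}$ has minimum variance among linear unbiased estimators of $\langle w, x\rangle$. Its cost is dominated by solving a linear system in $A^{\top} A$, which is the computation the abstract describes as unattractive for very large $A$.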
Modeling surfaces and stratified spaces
We look at models that use geometry and topology to describe surfaces and mixtures of subspaces of different dimensions. We introduce a statistic, the persistent homology transform (PHT), to model surfaces and shapes. This statistic is a collection of persistence diagrams -- multiscale topological summaries used extensively in topological data analysis. We use the PHT to represent shapes and execute operations such as computing distances between shapes or classifying shapes. We prove that the map from the space of simplicial complexes in three dimensions into the space spanned by this statistic is injective. This implies that the statistic is a sufficient statistic for distributions on the space of “smooth” shapes. We also show that a variant of this statistic, the Euler Characteristic Transform (ECT), admits a simple exponential family formulation which is of use in providing likelihood-based inference for shapes and surfaces. We illustrate the utility of this statistic on simulated and real data.
We introduce a Bayesian model for inferring mixtures of subspaces of different dimensions. The key challenge in such a model is specifying prior distributions over subspaces of different dimensions. We address this challenge by embedding subspaces or Grassmann manifolds into a sphere of relatively low dimension and specifying priors on the sphere. We provide an efficient sampling algorithm for the posterior distribution of the model parameters. We also prove posterior consistency of our procedure. The utility of this approach is demonstrated with applications to real and simulated data.
Page last modified on 08 oct 12 11:11