Statistical Science Seminars
Usual time: Thursdays 16:00 - 17:00
Location: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor).
Some seminars are held in different locations at different times. Click on the abstract for more details.
Optimising pseudo-marginal random walk Metropolis algorithms
Pseudo-marginal MCMC algorithms provide a general recipe for circumventing the need for target density evaluation when calculating the Metropolis-Hastings acceptance probability. Remarkably, replacing the density with an unbiased stochastic estimator thereof still leads to a Markov chain with the desired stationary distribution. We examine the pseudo-marginal random walk Metropolis algorithm and its overall efficiency in terms of both speed of mixing and computational time. Under a frequently encountered regime we identify the optimal acceptance rate and variance of the stochastic estimator. We also provide guidance for more general regimes and close with a surprising conjecture: that in certain regimes choosing the estimator with the largest noise can be optimal.
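As a rough illustration of the mechanism described in this abstract, the following minimal sketch runs a pseudo-marginal random walk Metropolis chain. Everything specific in it is an assumption made for illustration and is not taken from the talk: the standard normal target, the log-normal multiplicative noise with mean one, and the names `log_target`, `noisy_log_estimate` and `pseudo_marginal_rwm`.

```python
import math
import random


def log_target(x):
    # Unnormalised log-density of a standard normal (illustrative target only).
    return -0.5 * x * x


def noisy_log_estimate(x, noise_sd, rng):
    # Log of an unbiased density estimate pi(x) * W, where W is log-normal
    # with E[W] = 1 (hypothetical noise model, chosen for simplicity).
    z = rng.gauss(0.0, 1.0)
    return log_target(x) + noise_sd * z - 0.5 * noise_sd ** 2


def pseudo_marginal_rwm(n_iter, step, noise_sd, seed=0):
    rng = random.Random(seed)
    x = 0.0
    log_est = noisy_log_estimate(x, noise_sd, rng)
    samples = []
    accepted = 0
    for _ in range(n_iter):
        # Gaussian random walk proposal.
        prop = x + step * rng.gauss(0.0, 1.0)
        log_est_prop = noisy_log_estimate(prop, noise_sd, rng)
        # The estimate stored for the current state is reused, not refreshed;
        # this is what keeps the stationary distribution exact.
        log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1]
        if log_u < log_est_prop - log_est:
            x, log_est = prop, log_est_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter
```

Increasing `noise_sd` in this sketch makes each estimate cheaper in spirit but lowers the acceptance rate, which is the kind of trade-off between mixing and computational cost that the abstract refers to.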
High-Dimensional Incremental Divisive Clustering under Population Drift
Clustering is a central problem in data mining and statistical pattern recognition with a long and rich history. The advent of Big Data has introduced important challenges to existing clustering methods in the form of high-dimensional, high-frequency, time-varying streams of data.
To date, research on Big Data clustering has focused almost exclusively on addressing individual aspects of the problem in isolation, largely ignoring whether and how the proposed methods can be extended to address the overall problem. We will discuss an incremental divisive clustering approach for high-dimensional data whose storage requirements are low and, more importantly, independent of the stream size, and which can identify changes in the population distribution that require a revision of the clustering result.
Stochastic Claims Reserving: Chain Ladder, Double Chain Ladder and Actuarial Practice
This seminar will consider the problem of setting reserves against future claims in general (non-life) insurance. It will focus on the uncertainty of these estimates and the implications for capital setting and solvency requirements. The seminar will consider the approaches taken in practice and show how relatively simple statistical modelling can be used. The advantage of simple approaches is that they are likely to be understood and used, and they can be used relatively widely and consistently. The disadvantages will also be examined and a new approach set out, which aims to retain as much simplicity as possible while addressing some of the inadequacies of the commonly used techniques.
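For readers unfamiliar with the chain ladder method named in the title, here is a minimal sketch of the classical point-estimate calculation on a cumulative run-off triangle. The triangle values and the function names `development_factors` and `chain_ladder_reserves` are hypothetical, and the sketch gives only the reserve estimate, not the uncertainty measures that the seminar concentrates on.

```python
def development_factors(triangle):
    # triangle[i][j] is the cumulative claims amount for accident year i
    # at development period j; later accident years have shorter rows.
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        # Use only the rows in which both periods j and j + 1 are observed.
        rows = [r for r in triangle if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors


def chain_ladder_reserves(triangle):
    factors = development_factors(triangle)
    reserves = []
    for row in triangle:
        # Project the latest observed cumulative amount to ultimate.
        ultimate = row[-1]
        for f in factors[len(row) - 1:]:
            ultimate *= f
        reserves.append(ultimate - row[-1])
    return reserves


# Hypothetical cumulative triangle with three accident years.
example_triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 165.0],
    [120.0],
]
example_reserves = chain_ladder_reserves(example_triangle)
```

In this toy triangle the development factors are 315/210 = 1.5 and 165/150 = 1.1, so the most recent accident year is projected from 120 to 120 × 1.5 × 1.1 = 198.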
Fused Community Detection
Community detection is one of the most widely studied problems in network research. In an undirected graph, communities are regarded as tightly-knit groups of nodes with comparatively few connections between them. Popular existing techniques, such as spectral clustering and variants thereof, rely heavily on the edges being sufficiently dense and the community structure being relatively obvious. These are often not satisfactory assumptions for large-scale real-world datasets. We therefore propose a new community detection method, called fused community detection (fcd), which is designed particularly for sparse networks and situations where the community structure may be opaque. The spirit of fcd is to take advantage of the edge information, which we exploit by borrowing sparse recovery techniques from regression problems. Our method is supported by both theoretical results and numerical evidence. The algorithms are implemented in the R package fcd, which is available on CRAN. This is joint work with Dr. Yang Feng (Columbia University) and Prof. Richard Samworth (University of Cambridge).
Recent Results on the Eigenvalues of Random Matrices with Application to Wireless Communications and to MANOVA
The increasing demand for wireless communications has recently generated interest in multiple-input-multiple-output (MIMO) systems, realized by multiple antennas. Such systems can provide great advantages due to the presence of multipath propagation, which causes the elements of the channel gain matrix to fluctuate randomly. The channel gain matrix can be well modeled by a random matrix. In particular, the Shannon capacity of MIMO systems depends on the distribution of the eigenvalues of Hermitian matrices, whose dimensions are related to the number of transmitting and receiving antenna elements. In several practical situations, the elements of the channel matrix can be modeled as complex Gaussian random variables, and the wireless system performance is related to the distribution of the eigenvalues of Wishart matrices or complex Gaussian quadratic forms.
In this talk, we present recent results on the distribution of the eigenvalues of complex Wishart matrices and related quadratic forms, with applications to wireless MIMO systems and to spectrum sensing. The case of real Wishart and multivariate Beta matrices is also discussed, with new results on the distribution of Roy's statistic for MANOVA.
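To make the link between capacity and eigenvalues concrete, the standard textbook expression for the capacity of a MIMO channel can be written as follows. The notation is an assumption introduced here rather than taken from the talk: $H$ is the $n_r \times n_t$ channel gain matrix, $\rho$ is the signal-to-noise ratio, power is split equally across the $n_t$ transmit antennas, the channel is known at the receiver only, and $\lambda_1, \dots, \lambda_{n_r}$ are the eigenvalues of $H H^{\dagger}$.

```latex
C \;=\; \log_2 \det\!\left( I_{n_r} + \frac{\rho}{n_t}\, H H^{\dagger} \right)
  \;=\; \sum_{i=1}^{n_r} \log_2\!\left( 1 + \frac{\rho}{n_t}\, \lambda_i \right)
  \quad \text{bits/s/Hz}
```

When the entries of $H$ are independent complex Gaussian variables, $H H^{\dagger}$ is a complex Wishart matrix (singular if $n_r > n_t$), which is why the eigenvalue distributions discussed above govern system performance.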
EEG/MEG source reconstruction using 'LDA beamforming' and signal-space projection
In EEG/MEG research, beamforming has been used in conjunction with head models to estimate source activity stemming from regions-of-interest by inverting the linear model. I present two novel approaches to inverse modelling of sources without knowledge of a head model. First, linear discriminant analysis (LDA), mostly used for the classification of mental states, can also be applied to reconstruct the time course of discriminatory brain activity. The optimization problems in LDA and LCMV beamforming are shown to be equivalent. Second, multi-component signal-space projection (MSSP) allows for the recovery of several signals of interest and the explicit modelling of noise sources. Empirical results on the analysis of single-trial ERP latencies are shown. In conclusion, LDA beamforming and MSSP are purely data-driven approaches that can complement LCMV beamforming and other classical source reconstruction approaches, particularly when a head model is not available or sensor positions have not been registered.
Estimation in the Presence of Many Nuisance Parameters: Composite Likelihood and Plug-in Likelihood
We consider the incidental parameters problem in this paper, i.e. the estimation of a small number of parameters of interest in the presence of a large number of nuisance parameters. By assuming that the observations are taken from a multiple strictly stationary process, two estimation methods, namely the maximum composite quasi-likelihood estimation (MCQLE) and the maximum plug-in quasi-likelihood estimation (MPQLE), are considered. For the MCQLE, we profile out nuisance parameters based on lower-dimensional marginal likelihoods, while the MPQLE is based on some initial estimators for nuisance parameters. The asymptotic normality of both the MCQLE and the MPQLE is established under the assumption that the number of nuisance parameters and the number of observations go to infinity together, and both estimators for the parameters of interest enjoy the standard root-$n$ convergence rate. Simulation with a spatial-temporal model illustrates the finite sample properties of the two estimation methods.
Page last modified on 08 oct 12 11:11