

Statistical Science Seminars

Usual time: Thursdays 14:00-15:00

Location: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor). Some seminars are held at different locations and at different times.


03 October 2019: Prof. Estate Khmaladze (Victoria University of Wellington, New Zealand)

Distribution-free testing of parametric regression models

Consider the regression empirical process. The limit distribution of this process, and therefore that of the test statistics based on it, depends on the distribution of the covariates and on the particular model for the regression function. In Khmaladze and Koul (2004), the innovation martingale part of the regression process was introduced, and this martingale can easily be scaled to a standard Brownian motion. Thus “all” test statistics based on the scaled innovation martingale are asymptotically distribution-free.

In this talk, we will present a different and extremely simple approach: we will “rotate” the regression empirical process into another process with limit distribution free from covariates and from the particular model.

The idea behind this rotation can be traced to K. Pearson's 1900 paper. The chi-square statistic that Pearson proposed is invariant under the group of rotations; however, it is not a maximal invariant. We will introduce a maximal invariant and show why this step leads to a simple and, we hope, efficient approach to distribution-free theory.
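As a minimal numerical illustration of the invariance mentioned above (not the speaker's maximal-invariant construction), Pearson's statistic can be written as the squared length of the standardized vector Y with Y_k = (O_k - n p_k)/sqrt(n p_k), so any rotation of Y leaves it unchanged. The counts, cell probabilities and helper names below are hypothetical, and a rotated Y need not correspond to another vector of counts; the point is only that the statistic sees Y through its length.

```python
import math
import random

def pearson_chi2(counts, probs):
    """Pearson's X^2 = sum_k (O_k - n*p_k)^2 / (n*p_k)."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))

def standardized_residuals(counts, probs):
    """Y_k = (O_k - n*p_k) / sqrt(n*p_k), so that X^2 = sum_k Y_k^2."""
    n = sum(counts)
    return [(o - n * p) / math.sqrt(n * p) for o, p in zip(counts, probs)]

def givens_rotate(vec, i, j, theta):
    """Rotate coordinates i and j of vec by angle theta (an orthogonal map)."""
    out = list(vec)
    c, s = math.cos(theta), math.sin(theta)
    out[i] = c * vec[i] - s * vec[j]
    out[j] = s * vec[i] + c * vec[j]
    return out

def random_rotation(vec, rng, n_steps=20):
    """Apply a product of random plane rotations; Euclidean length is preserved."""
    for _ in range(n_steps):
        i, j = rng.sample(range(len(vec)), 2)
        vec = givens_rotate(vec, i, j, rng.uniform(0.0, 2.0 * math.pi))
    return vec

counts = [12, 18, 35, 35]     # hypothetical observed cell counts (n = 100)
probs = [0.1, 0.2, 0.3, 0.4]  # hypothetical null cell probabilities
y = standardized_residuals(counts, probs)
z = random_rotation(y, random.Random(0))
x2 = pearson_chi2(counts, probs)
# X^2, ||Y||^2 and ||rotated Y||^2 all agree: 2.0583 2.0583 2.0583
print(round(x2, 4), round(sum(v * v for v in y), 4), round(sum(v * v for v in z), 4))
```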


10 October 2019: Prof. Xiaoxu Li (Lanzhou University of Technology, China)

Small-sample image classification based on deep learning

In the fields of machine learning and computer vision, the rapid development of deep learning has led to breakthroughs in large-sample classification tasks in recent years. However, learning a deep neural network that generalizes well from only a small number of training samples remains a persistent challenge. Humans, in contrast, can easily learn the concept of a class from a small amount of data rather than from millions of examples. Moreover, many types of real-world data are scarce and expensive to collect and label. Motivated by these facts, research on deep learning with small samples has become increasingly prevalent in the machine learning and computer vision communities, for example research on one-shot classification, few-shot classification and classification with small training sets. This talk will introduce the background, current research status and challenges of small-sample image classification based on deep learning, as well as some of our work on this topic.


11 October 2019 (12:30-13:30): Prof. Carmen Molina-Paris (University of Leeds)

Stochastic descriptors to study the fate of naive T cell clonotypes in the periphery

The population of naive T cells in the periphery is best described by determining both its T cell receptor diversity, or number of clonotypes, and the sizes of its clonal subsets. In this talk, we make use of a previously introduced mathematical model of naive T cell homeostasis to study the fate and potential of naive T cell clonotypes in the periphery. This is achieved by introducing several new stochastic descriptors for a given naive T cell clonotype, such as its maximum clonal size, the time to reach this maximum, the number of proliferation events required to reach this maximum, the rate of contraction of the clonotype on its way to extinction, and the time to a given number of proliferation events. I will show that two fates can be identified for the dynamics of the clonotype: extinction in the short term, if the clonotype experiences too hostile a peripheral environment, or establishment in the periphery in the long term. In the second case, the probability mass function for the maximum clonal size is bimodal, with one mode near one and the other far away from it. Our model also indicates that the fate of a recent thymic emigrant (RTE) during its journey in the periphery has a clear stochastic component, where the probability of extinction cannot be neglected, even in a friendly but competitive environment. On the other hand, more deterministic behaviour can be expected for the potential long-term size of the clonotype seeded by the RTE, once it escapes extinction.
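The descriptors above come from a specific model of naive T cell homeostasis that is not reproduced here. As a much simpler stand-in, the sketch below assumes a linear birth-death process (each cell divides at a hypothetical rate `birth_rate` and dies at rate `death_rate`) started from a single cell, and records three of the named descriptors for one simulated clone: maximum clonal size, time to reach it, and number of proliferation events needed to reach it.

```python
import random

def clone_descriptors(birth_rate, death_rate, rng, max_events=100_000):
    """Simulate a linear birth-death clone from one cell (hypothetical stand-in model).

    Assumes birth_rate + death_rate > 0. Returns
    (max_size, time_to_max, divisions_to_max, extinction_time),
    where extinction_time is None if the clone is still alive after max_events.
    """
    size, t, divisions = 1, 0.0, 0
    max_size, time_to_max, divisions_to_max = 1, 0.0, 0
    p_birth = birth_rate / (birth_rate + death_rate)
    for _ in range(max_events):
        if size == 0:
            break  # extinction is absorbing
        t += rng.expovariate(size * (birth_rate + death_rate))  # time to next event
        if rng.random() < p_birth:
            size += 1
            divisions += 1
            if size > max_size:  # record the first time a new maximum is reached
                max_size, time_to_max, divisions_to_max = size, t, divisions
        else:
            size -= 1
    return max_size, time_to_max, divisions_to_max, (t if size == 0 else None)

# Subcritical example (hypothetical rates), so extinction is certain.
print(clone_descriptors(birth_rate=0.9, death_rate=1.0, rng=random.Random(7)))
```

Repeating the simulation many times gives empirical distributions of these descriptors, which is the kind of object the talk studies for its own model.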


17 October 2019: Dr. Vadim Shcherbakov (Royal Holloway, University of London)

Linear competition processes on general graphs

A competition process is a multivariate analogue of the classical birth-and-death process; its name comes from the original motivation of modelling competition between populations. In my talk, I will discuss the asymptotic behaviour of a version of the process in which the interactions are induced by the adjacency matrix of a given finite graph. In the absence of interaction, the process is merely a collection of independent linear birth processes (Yule processes); in our case, a component also decreases at a rate proportional to the sum of its neighbouring components, and zero is an absorbing state for each component (we say that the component becomes extinct). We prove that, with probability one, eventually only a random subset of the process's components survives, and this subset corresponds to a so-called independent set of vertices of the graph. The dynamics of the model bear a striking resemblance to those of multi-type branching processes, which allows us to adapt ideas from Athreya's well-known method for studying the long-term behaviour of branching processes. The talk is based on joint work with S. Volkov.
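The dynamics described above can be sketched with a standard Gillespie (stochastic simulation) loop. This is a minimal sketch under an assumed parameterisation, not the speakers' code: component i grows by one at rate `birth_rate * x_i` and shrinks by one at rate `kill_rate` times the sum of its neighbours, and an extinct component never changes again. The graph, rates and helper names are hypothetical.

```python
import random

def simulate_competition(adjacency, initial, birth_rate, kill_rate, rng, max_events=10_000):
    """Gillespie sketch of a linear competition process on a graph.

    Assumed rates (hypothetical parameterisation):
      x_i -> x_i + 1 at rate birth_rate * x_i
      x_i -> x_i - 1 at rate kill_rate * sum_{j ~ i} x_j, only while x_i > 0
    Returns the final configuration and the elapsed time.
    """
    x = list(initial)
    t = 0.0
    for _ in range(max_events):
        events, weights = [], []
        for i, xi in enumerate(x):
            if xi == 0:
                continue  # zero is absorbing: no births and no further deaths
            events.append((i, +1))
            weights.append(birth_rate * xi)
            events.append((i, -1))
            weights.append(kill_rate * sum(x[j] for j in adjacency[i]))
        total = sum(weights)
        if total == 0:
            break  # nothing can change any more
        t += rng.expovariate(total)  # exponential waiting time to the next event
        i, step = rng.choices(events, weights=weights)[0]  # pick event proportional to rate
        x[i] += step
    return x, t

def is_independent(adjacency, vertices):
    """True if no two of the given vertices are adjacent."""
    vs = set(vertices)
    return all(j not in vs for i in vs for j in adjacency[i])

path = {0: [1], 1: [0, 2], 2: [1]}  # hypothetical 3-vertex path graph
x, t = simulate_competition(path, [5, 5, 5], birth_rate=1.0, kill_rate=1.0, rng=random.Random(0))
survivors = [i for i, v in enumerate(x) if v > 0]
print(x, survivors, is_independent(path, survivors))
```

The theorem concerns eventual behaviour, so after a finite number of simulated events the surviving set need not yet be independent.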


24 October 2019: Dr. Qiuju Li (University College London)

A joint modelling approach for longitudinal and semi-competing risks data to accommodate informative drop-out and death

In longitudinal studies, both drop-out and death can truncate observation of a longitudinal outcome. We propose a new likelihood-based approach to dealing with both informative drop-out and death by jointly modelling the longitudinal outcome and the semi-competing event times of drop-out and death, with the associations characterized by shared random effects. In addition to inference for the unconditional model of the longitudinal outcome, an important feature of our approach is that the longitudinal outcome profile conditional on being alive (i.e., inference for the mortal cohort) can be conveniently obtained in closed form. Both maximum likelihood and Bayesian approaches can be used for estimation. The proposed methods are illustrated by an application to inference on different longitudinal profiles of CD4 cell counts for patients in the HIV Epidemiology Research Study (HERS).

31 October 2019: Dr. Heather Battey (Imperial College London)

High dimensional inference

Statistical analysis when the number of unknown parameters is comparable with the number of independent observations may demand modification of maximum-likelihood-based methods (Bartlett, 1937). There are comparable difficulties with Bayesian analyses based on high-dimensional “flat” priors. For an extreme example from a different perspective, see Stein (1956). This discursive talk will cover a number of perspectives on this situation, including the implications of sparsity and the role of different types of parameters.

Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London, Series A, 160, 268-282.
Stein, C. (1956). Inadmissibility of the usual estimator of the mean of a multivariate normal distribution. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 197-206.
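To make the Stein (1956) example concrete, the sketch below compares, by Monte Carlo, the squared-error risk of the usual estimator X of a p-dimensional normal mean with that of a shrinkage estimator. The shrinkage rule used is the later James-Stein form, (1 - (p - 2)/||X||^2) X, not the construction in the 1956 paper itself; the true mean, dimension and number of replications are hypothetical choices.

```python
import random

def james_stein(x):
    """James-Stein shrinkage of one observation x ~ N_p(theta, I), assuming p >= 3."""
    p = len(x)
    norm_sq = sum(v * v for v in x)
    factor = 1.0 - (p - 2) / norm_sq
    return [factor * v for v in x]

def monte_carlo_risk(theta, n_rep, rng):
    """Average squared-error loss of the usual estimator (x itself) and of James-Stein."""
    risk_usual = risk_js = 0.0
    for _ in range(n_rep):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]  # one draw from N_p(theta, I)
        js = james_stein(x)
        risk_usual += sum((a - t) ** 2 for a, t in zip(x, theta))
        risk_js += sum((a - t) ** 2 for a, t in zip(js, theta))
    return risk_usual / n_rep, risk_js / n_rep

theta = [0.5] * 10  # hypothetical true mean in p = 10 dimensions
print(monte_carlo_risk(theta, 5000, random.Random(1)))
```

The usual estimator has risk p (here 10) whatever theta is, while the shrinkage estimator's estimated risk is noticeably smaller in this setting, which is the phenomenon behind the inadmissibility result.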


21 November 2019: Dr. Daniel Kious (University of Bath)

Random walk on the simple symmetric exclusion process

In joint work with Marcelo R. Hilário and Augusto Teixeira, we investigate the long-term behaviour of a random walker evolving on top of the simple symmetric exclusion process (SSEP) at equilibrium. At each jump, the random walker is subject to a drift that depends on whether it is sitting on top of a particle or a hole. The asymptotic behaviour is expected to depend on the density ρ in [0, 1] of the underlying SSEP.

Our first result is a law of large numbers (LLN) for the random walker for all densities ρ except at most two values ρ₋ and ρ₊ in [0, 1], at which the speed (as a function of the density) may jump from, or to, 0.

Second, we prove that, for any density corresponding to a non-zero speed regime, the fluctuations are diffusive and a Central Limit Theorem holds. For the special case in which the density is 1/2 and the jump distributions on an empty site and on an occupied site are symmetric to each other, we prove an LLN with zero limiting speed.

Our main results extend to environments given by a family of independent simple symmetric random walks in equilibrium.
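The model described above can be explored with a crude discrete-time simulation. This is a hypothetical sketch, not the authors' setting: the SSEP lives on a large ring started from a Bernoulli(ρ) product configuration and is advanced between walker jumps by swapping the contents of randomly chosen neighbouring sites (the stirring representation of the SSEP), and the walker steps right with probability `p_occupied` on a particle or `p_empty` on a hole, otherwise left. The function returns the empirical speed X_n/n, the quantity the LLN is about.

```python
import random

def walk_on_ssep(ring_size, density, p_occupied, p_empty, n_steps, swaps_per_step, rng):
    """Discrete-time sketch of a random walk on a simple symmetric exclusion process.

    Hypothetical simplifications: ring of ring_size sites (should be large compared
    with the distance travelled, to mimic Z), Bernoulli(density) initial configuration,
    and swaps_per_step random neighbour exchanges between consecutive walker jumps.
    Returns the empirical speed pos / n_steps.
    """
    eta = [1 if rng.random() < density else 0 for _ in range(ring_size)]
    pos = 0  # walker position on Z; the environment is read modulo ring_size
    for _ in range(n_steps):
        for _ in range(swaps_per_step):
            i = rng.randrange(ring_size)
            j = (i + 1) % ring_size
            eta[i], eta[j] = eta[j], eta[i]  # exchange contents across edge (i, j)
        # drift depends on whether the walker sits on a particle or a hole
        p_right = p_occupied if eta[pos % ring_size] else p_empty
        pos += 1 if rng.random() < p_right else -1
    return pos / n_steps

print(walk_on_ssep(10_000, 0.3, 0.8, 0.2, 5_000, 50, random.Random(0)))
```

Plotting the returned speed against the density for fixed jump probabilities gives a rough empirical picture of the speed as a function of ρ, though the discretisation means it should not be read as evidence about the theorems themselves.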

28 November 2019: Dr. Angela Noufaily (University of Warwick)

Comparison of statistical algorithms for daily syndromic surveillance aberration detection

Public health authorities can provide more effective and timely interventions to protect populations during health events if they have effective multi-purpose surveillance systems. These systems rely on aberration detection algorithms to identify potential threats within large datasets. Ensuring the algorithms are sensitive, specific and timely is crucial for protecting public health. 

We evaluate the performance of three detection algorithms widely used for syndromic surveillance: the ‘rising activity, multilevel mixed effects, indicator emphasis’ (RAMMIE) method and the improved quasi-Poisson regression-based method known as ‘Farrington Flexible’, both currently used at Public Health England (PHE), and the ‘Early Aberration Reporting System’ (EARS) method used at the US Centers for Disease Control and Prevention. We model the wide range of data structures encountered within the daily syndromic surveillance systems used by PHE, and undertake extensive simulations to identify which algorithms work best across different types of syndromes and different outbreak sizes. This is the first evaluation of RAMMIE since its introduction. Performance metrics are computed and compared in the presence of a range of simulated outbreak types added to baseline data.
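For readers unfamiliar with EARS, the sketch below implements the C1 variant as it is commonly described: each day's count is compared with the mean and standard deviation of the preceding seven days, and an alarm is raised when the standardised excess exceeds 3. The window length, threshold, standard-deviation floor and example counts are assumptions for illustration; the settings evaluated in the study may differ, and EARS also has C2 and C3 variants not shown here.

```python
import statistics

def ears_c1(counts, baseline=7, threshold=3.0, min_sd=1e-8):
    """Sketch of the EARS C1 statistic for a daily count series.

    For each day t with a full baseline, compare counts[t] with the mean and
    sample standard deviation of the previous `baseline` days. Returns a list
    of (day_index, c1_statistic, alarm_flag).
    """
    results = []
    for t in range(baseline, len(counts)):
        window = counts[t - baseline:t]
        mean = statistics.mean(window)
        sd = max(statistics.stdev(window), min_sd)  # floor avoids division by zero
        c1 = (counts[t] - mean) / sd
        results.append((t, c1, c1 > threshold))
    return results

daily = [10, 12, 9, 11, 10, 13, 11, 30]  # hypothetical daily syndrome counts
for day, stat, alarm in ears_c1(daily):
    print(day, round(stat, 2), alarm)
```

Because C1 uses only a short moving baseline, it needs no long history, which is one reason it is popular for daily data; the regression-based methods compared in the talk instead model longer-term structure such as seasonality.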

12 December 2019: Prof. Dani Gamerman (Federal University of Rio de Janeiro, Brazil)

Spatiotemporal point processes: regression, model specifications and future directions

Point processes are among the most commonly encountered observation processes in Spatial Statistics. Model-based inference for them depends on the likelihood function. In the standard setting of Poisson processes, the likelihood depends on the entirely unknown intensity function through its integral over the observation region, and cannot be computed analytically. A number of approximation techniques have been proposed to handle this difficulty. In this talk, we review recent work on exact solutions that solve this infinite-dimensional problem without resorting to approximations. The presentation concentrates mainly on discrete time but also considers continuous time. The solutions are based on model specifications that impose smoothness constraints on the intensity function. We also review approaches to including a regression component and different ways to accommodate it while accounting for additional heterogeneity. Applications are provided to illustrate the results. Finally, we discuss possible extensions to account for discontinuities and/or jumps in the intensity function. Joint work with Flávio B. Gonçalves, Guido A. Moreira, Jony A. Pinto Jr, Marina S. Paez and Edna A. Reis.
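For reference, the standard likelihood of a Poisson process with intensity function λ(·), observed on a region S with events at locations s_1, ..., s_n, is given below. The integral of the unknown, infinite-dimensional intensity is the term that generally has no closed form, which is the difficulty the abstract refers to.

```latex
L(\lambda \mid s_1, \dots, s_n)
  = \exp\left( - \int_{S} \lambda(s) \, \mathrm{d}s \right) \prod_{i=1}^{n} \lambda(s_i)
```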


Affiliated Seminars