A seminar series covering a broad range of applied and methodological topics in Statistical Science.
Talks take place in hybrid format.
Usual time: Thursdays 14:00-15:00 (to be followed by Departmental Tea in the common room).
Location: For the rest of this academic year, talks will usually take place in B09 (1-19 Torrington Place) and on Zoom. Please use the contact information below to join the mailing list, so you will receive location updates and links to the talks.
Contact info: stats-seminars-join at ucl dot ac dot uk
Please subscribe to our YouTube channel to view some recent talks from the series.
Programme for 2023/24
- 5 October 2023: Somak Dutta (Iowa State University) - Bayesian variable selection with embedded screening
Abstract: During the last few decades, substantial research has been devoted to identifying the important covariates in ultra-high dimensional linear regression, where the number of covariates is in the lower exponential order of the sample size. While variable screening focuses on identifying a smaller subset of covariates that includes the important ones with overwhelmingly large probability, variable selection focuses on identifying only the truly important ones. Because variable selection is typically computationally costly, a screening step is often performed first to reduce the number of candidate covariates. In this talk, we propose two novel methodologies. First, we introduce a sequential Bayesian rule that incorporates prior information on the true model size and effect sizes for variable screening. Second, we propose a scalable variable selection method that embeds variable screening in its algorithm, thus providing scalability and alleviating the need for a two-stage method. Our theoretical investigations relax some of the conditions for screening consistency and selection consistency in the ultra-high dimensional setup. We illustrate our methods using a dataset with nearly half a million covariates.
This talk is based on several joint works with Dr. Vivekananda Roy and PhD students Dongjin Li and Run Wang.
- 12 October 2023: Kalliopi Mylona (King's College London) - Multi-objective optimal split-plot experiments: a case study
Abstract: The current work considers an application of a multi-objective optimal split-plot design methodology in a pharmaceutical experiment. The aim of the experimentation was to explore the response surface with respect to various experimental factors, with two randomisation levels, as well as to provide good quality predictions. Due to the nature of the experimental process and the resource limitations, it was desirable to protect the inference against potentially present, but inestimable, model terms. We incorporated previously developed optimality criteria in a single compound criterion. The design search was carried out using a point-exchange algorithm, and a stratum-by-stratum approach, with several weight combinations on the component criteria. We discuss the particularities of this specific example, the choice of the design that was used to run the experiment, and some interesting results that have been obtained.
This is a joint work with Olga Egorova, Aleksandra Olszewska and Ben Forbes.
- 19 October 2023: Mike Smith (University of Melbourne) - Variational Inference for Cutting Feedback in Misspecified Models
Abstract: Bayesian analyses combine information represented by different terms in a joint Bayesian model. When one or more of the terms is misspecified, it can be helpful to restrict the use of information from suspect model components to modify posterior inference. This is called “cutting feedback”, and both the specification and computation of the posterior for such “cut models” is challenging. In this paper, we define cut posterior distributions as solutions to constrained optimization problems, and propose variational methods for their computation. These methods are faster than existing Markov chain Monte Carlo (MCMC) approaches by an order of magnitude. It is also shown that variational methods allow for the evaluation of computationally intensive conflict checks that can be used to decide whether or not feedback should be cut. Our methods are illustrated in examples, including an application where recent methodological advances that combine variational inference and MCMC within the variational optimization are used.
- 26 October 2023: Antoine Dahlqvist (University of Sussex) - Free independence and random matrices
Abstract: The aim of this talk is to discuss and motivate the following principle: the notion of free independence occurs, possibly in different forms, when considering independent random matrices of size NxN, for N large. Our main motivation here will be to predict the spectrum of a random (additive) perturbation X of a matrix with known spectrum. In the early 1990s, Dan Voiculescu introduced the theory of free probability (motivated by operator algebras) and showed that the notion of free independence applies to independent random matrices of large size, under unitary invariance assumptions. When the matrices are Hermitian, the limit empirical measure of the spectrum of X can then be studied using the notion of free convolution of measures. I will report on two generalisations of this result, based on arXiv:2205.01926 and arXiv:1805.07045. 1/ The first pertains to eigenvector empirical spectral distributions (also known as local density of states) in place of empirical spectral distributions. When X is the perturbation of a finite rank matrix, this allows us to partly recover results on outlier eigenvalues, such as the BBP phase transition. 2/ Voiculescu's result is known to require a form of unitary invariance assumption. For instance, it does not cover adjacency matrices of Erdős-Rényi graphs in the sparse regime or Wigner matrices with exploding moments. Surprisingly, under much weaker invariance assumptions that cover these latter cases, a variation of the notion of free independence can nonetheless be proved.
- 2 November 2023: Antonio Linero (The University of Texas at Austin) - In nonparametric and high-dimensional models, Bayesian ignorability is an informative prior
Abstract: In problems with large amounts of missing data one must model two distinct data generating processes: the outcome process, which generates the response, and the missing data mechanism, which determines the data we observe. Under the ignorability condition of Rubin (1976), however, likelihood-based inference for the outcome process does not depend on the missing data mechanism, so that only the former needs to be estimated; partially because of this simplification, ignorability is often used as a baseline assumption. We study the implications of Bayesian ignorability in the presence of high-dimensional nuisance parameters and argue that ignorability is typically incompatible with sensible prior beliefs about the amount of selection bias. We show that, for many problems, ignorability directly implies that the prior on the selection bias is tightly concentrated around zero. This is demonstrated on several models of practical interest, and the effect of ignorability on the posterior distribution is characterized for high-dimensional linear models with a ridge regression prior. We then show how to build high-dimensional models which encode sensible beliefs about the selection bias, and show that under certain narrow circumstances ignorability is less problematic.
- 9 November 2023: Vanda Inacio De Carvalho (The University of Edinburgh) - Induced nonparametric ROC surface regression
Abstract: The receiver operating characteristic (ROC) surface is a popular tool for evaluating the discriminatory ability of diagnostic tests, measured on a continuous scale, when there are three ordered disease groups. Motivated by the impact that covariates may have on the diagnostic accuracy, and to safeguard against model misspecification, we develop a flexible model for conducting inference about the covariate-specific ROC surface and its functionals. Specifically, we postulate a location-scale regression model for the test outcomes in each of the three disease groups where the mean and variance functions are estimated through penalised-splines, while the distribution of the error term is estimated via a smoothed version of the empirical cumulative distribution function of the standardised residuals. Our simulation study shows that our approach successfully recovers both the true covariate-specific volume under the surface and the optimal pair of thresholds used for classification in a variety of conceivable scenarios. Our methods are motivated by and applied to data derived from an Alzheimer's disease study and we seek to assess the accuracy of several potential biomarkers to distinguish between individuals with normal cognition, mild cognitive impairment, and dementia and how this discriminatory ability may change with age and gender.
- 16 November 2023: Alexander Modell (Imperial College London) - Spectral approaches to representation learning for network data
Abstract: Analysis of network data, describing relationships, interactions and dependencies between entities, often begins with representation learning: the process of mapping these entities into a vector space in a way which preserves salient information in the data. Exploratory analysis of these representations can reveal patterns and latent structures, such as communities, and they may serve as inputs to learning algorithms such as clustering, regression, classification and neighbour recommendation. Spectral embedding, in which representations are constructed from the eigenvectors of a specially designed matrix, has emerged as a simple yet effective approach which is both highly scalable and interpretable. In the first part of this talk, I will provide a statistical lens on spectral embedding, elucidating how the eigenvectors of different matrices extract different information from the network and exploring model-based explanations of the geometric patterns it produces. In particular, I will focus on spectral embedding with the random walk Laplacian matrix, and show how, unlike other popular matrix constructions, it produces representations which are agnostic to node degrees. In the second part of this talk, I will present a framework for representation learning for dynamic network data describing instantaneous interactions between entities which occur in continuous time. The framework produces continuously evolving vector trajectories which reflect the continuously evolving structural roles of the nodes in the network and allows nodes to be meaningfully compared at different points in time.
This talk is based on joint works with Patrick Rubin-Delanchy, Nick Whiteley, Ian Gallagher and Emma Ceccherini.
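As a toy illustration of the idea (not code from the talk), the sketch below embeds a small graph using the top eigenvectors of the random walk transition matrix D^{-1}A. The function name and the example graph are invented for illustration; the degree rescaling in the last step is what makes these representations agnostic to node degrees.

```python
import numpy as np

def random_walk_spectral_embedding(A, d):
    """Embed graph nodes into R^d using the top eigenvectors of the
    random walk transition matrix P = D^{-1} A (equivalently, of the
    random walk Laplacian I - P). Illustrative sketch; A is assumed
    to be a symmetric adjacency matrix with no isolated nodes."""
    deg = A.sum(axis=1)
    # P is similar to the symmetric matrix S = D^{-1/2} A D^{-1/2},
    # so its eigenvectors come from a symmetric eigenproblem.
    S = A / np.sqrt(np.outer(deg, deg))
    evals, evecs = np.linalg.eigh(S)
    top = np.argsort(-evals)[:d]
    # Eigenvectors of P are D^{-1/2} times those of S.
    return evecs[:, top] / np.sqrt(deg)[:, None]

# Toy graph: two triangles joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
X = random_walk_spectral_embedding(A, 2)
```

On this toy graph the second embedding coordinate separates the two triangles, the kind of latent community structure the abstract mentions.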
- 23 November 2023: Pantelis Samartsidis (MRC-BSU, University of Cambridge) - A modularized Bayesian factor analysis model for policy evaluation
Abstract: The problem of estimating the effect of an intervention or policy from time-series observational data on multiple units arises frequently in many fields of applied research, such as epidemiology, econometrics and political science. In this talk, we propose a Bayesian causal factor analysis model for estimating intervention effects in such a setting. The model includes a regression component to adjust for observed potential confounders, and its latent component can account for certain forms of unobserved confounding. Further, it can deal with outcomes of mixed type (continuous, binomial, count) and increase efficiency in the estimates of the causal effects by jointly modelling multiple outcomes affected by the intervention. In policy evaluation problems, it is often of interest to study structure in the estimated effects. We therefore extend our approach to model effect heterogeneity. Specifically, we demonstrate that modelling effect heterogeneity is not straightforward in causal factor analysis, due to non-identifiability. We then demonstrate how this problem can be circumvented using a modularization approach that prevents post-intervention data from informing a subset of the model parameters. An MCMC algorithm for posterior inference is proposed, and the method is used to evaluate the impact of Local Tracing Partnerships on the effectiveness of England's Test and Trace programme for COVID-19.
- 30 November 2023: Miguel De Carvalho (University of Edinburgh) - Time-changing multivariate extremes
Abstract: In this talk, I will overview recent developments in nonstationary multivariate extremes. After providing motivation and background, I will introduce a Bayesian time-varying model that learns about the dynamics governing joint extreme values over time. The proposed model relies on dual measures of time-varying extremal dependence, which are modelled via a suitable class of generalized linear models conditional on a large threshold. The application of the proposed methods to some of the world's most important stock markets reveals complex patterns of extremal dependence over the last 30 years, including passages from asymptotic dependence to asymptotic independence.
- 7 December 2023: Thomas Burnett (University of Bath) - Adaptive enrichment trials: What are the benefits?
Abstract: When planning a Phase III clinical trial, suppose a certain subset of patients is expected to respond particularly well to the new treatment. Adaptive enrichment designs make use of interim data in selecting the target population for the remainder of the trial, either continuing with the full population or restricting recruitment to the subset of patients. We define a multiple testing procedure that maintains strong control of the familywise error rate, while allowing for the adaptive sampling procedure. We derive the Bayes optimal rule for deciding whether or not to restrict recruitment to the subset after the interim analysis and present an efficient algorithm to facilitate simulation-based optimisation, enabling the construction of Bayes optimal rules in a wide variety of problem formulations. We compare adaptive enrichment designs with traditional non-adaptive designs in a broad range of examples and draw clear conclusions about the potential benefits of adaptive enrichment.
- 14 December 2023: Pierre Alquier (ESSEC Business School) - Rates of convergence in Bayesian meta-learning
Abstract: The rate of convergence of Bayesian learning algorithms is determined by two conditions: the behavior of the loss function around the optimal parameter (the Bernstein condition), and the probability mass given by the prior to neighborhoods of the optimal parameter. In meta-learning, we face multiple learning tasks that are independent but are still expected to be related in some way. For example, the optimal parameters of all the tasks can be close to each other. It is then tempting to use the past tasks to build a better prior, which we use to solve future tasks more efficiently. From a theoretical point of view, we hope to improve the prior mass condition in future tasks and, thus, the rate of convergence. In this paper, we prove that this is indeed the case. Interestingly, we also prove that we can learn the optimal prior at a fast rate of convergence, regardless of the rate of convergence within the tasks (in other words, the Bernstein condition is always satisfied for learning the prior, even when it is not satisfied within tasks).
This is joint work with Charles Riou (University of Tokyo and RIKEN AIP) and Badr-Eddine Chérief-Badellatif (CNRS). The preprint is available on arXiv: https://arxiv.org/abs/2302.11709
- 11 January 2024: Sam Power (University of Bristol) - A State-Space Perspective on Modelling and Inference for Online Skill Rating
Abstract: In the quantitative analysis of competitive sports, a fundamental task is to estimate the skills of the different agents (‘players’) involved in a given competition based on the outcome of pairwise comparisons (‘matches’) between said players, often in an online setting. In this talk, I will discuss recent work in which we advocate for the adoption of the state-space modelling paradigm in solving this problem. This perspective facilitates the decoupling of modelling from inference, enabling a more focused approach to the development and critique of model assumptions, while also fostering the development of general-purpose inference tools. I will first describe some illustrative model classes which arise in this framework, before turning to a careful discussion of inference and computation strategies for these models. A key challenge throughout is to develop methodology which scales gracefully to problems with a large number of players and a high frequency of matches. I then conclude by describing some real-data applications of our approach, demonstrating how this framework facilitates a practical workflow across different sports. This is joint work with Samuel Duffield (Normal Computing) and Lorenzo Rimella (Lancaster University).
- 18 January 2024: Hans Kersting (Yahoo! Research) - The beneficial role of stochastic noise in SGD
Abstract: The data sets used to train modern machine-learning models are often huge, e.g. millions of images. This makes it too expensive to compute the true gradient over the full data set. In each gradient descent (GD) step, a stochastic gradient is thus computed over a subset ("mini-batch") of the data. The resulting stochastic gradient descent (SGD) algorithm, and its variants, is the main workhorse of modern machine learning. Until recently, most machine-learning researchers would have preferred to use GD if they could, and considered SGD only as a fast approximation to GD. But new research suggests that the stochasticity in SGD is part of the reason why SGD works so well. In this talk, we investigate multiple theories on the advantages of the noise in SGD, including better generalization in flatter minima (‘implicit bias’) and faster escapes from difficult parts of the landscape (such as saddle points and local minima). We highlight how correlating noise can help optimization and zoom in on the question of which noise structure would be optimal for SGD.
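A minimal sketch of the mechanism the talk studies, on an invented toy least-squares problem: each step uses a gradient computed on a random mini-batch, so the iterates follow GD on average plus stochastic noise, and they converge to a noisy neighbourhood of the optimum rather than the exact minimiser.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: loss(w) = mean((X @ w - y) ** 2) / 2.
n, p = 1000, 5
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = X @ w_true + 0.1 * rng.normal(size=n)

def minibatch_grad(w, idx):
    """Gradient of the loss estimated on the rows in idx (the mini-batch)."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(p)
lr, batch = 0.1, 32
for step in range(500):
    idx = rng.choice(n, size=batch, replace=False)  # sample a mini-batch
    w -= lr * minibatch_grad(w, idx)                # noisy descent step

# The residual fluctuation of w around w_true is the SGD noise whose
# role (implicit bias, escaping saddles) the talk examines.
```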
- 25 January 2024: Saifuddin (Saif) Syed (University of Oxford) - Scalable Bayesian inference with annealing algorithms
Abstract: Monte Carlo (MC) methods are the most widely used tools in Bayesian statistics for making inferences from complex posterior distributions. The performance of classical MC algorithms is fragile for challenging problems where the posterior is high-dimensional with well-separated modes. Annealing is a technique that adds robustness by introducing a tractable reference distribution (e.g. the prior), where inference is robust, and a path of distributions that continuously interpolates to the intractable posterior distribution. An annealing algorithm specifies how to traverse this path to transform inferences from the reference into inferences on the target. I will show how to design and analyse a popular annealing algorithm called parallel tempering (PT), which improves the robustness of MCMC algorithms using parallel computing. I will identify near-optimal tuning guidelines for PT and an efficient black-box implementation that scales to GPUs. Finally, I will demonstrate how to leverage the analysis of PT to establish the efficiency of a family of annealing algorithms, including annealed importance sampling and sequential Monte Carlo samplers.
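A minimal one-dimensional parallel tempering sketch (invented for illustration, not the speaker's implementation): several Metropolis chains target densities along a geometric path between a broad Gaussian reference and a bimodal target, and occasional swaps between adjacent temperatures let the cold chain cross between well-separated modes.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    """Toy bimodal posterior: equal mixture of N(-4, 1) and N(4, 1)."""
    return np.logaddexp(-0.5 * (x + 4) ** 2, -0.5 * (x - 4) ** 2)

def log_ref(x):
    """Broad Gaussian reference playing the role of the prior."""
    return -0.5 * (x / 10.0) ** 2

def log_anneal(x, beta):
    """Annealing path: geometric interpolation reference -> target."""
    return beta * log_target(x) + (1 - beta) * log_ref(x)

betas = [1.0, 0.5, 0.25, 0.12, 0.06]   # beta = 1 is the target chain
K = len(betas)
x = np.zeros(K)
samples = []
for it in range(20000):
    # Random-walk Metropolis update within each tempered chain.
    for k in range(K):
        prop = x[k] + rng.normal(scale=2.0)
        if np.log(rng.random()) < log_anneal(prop, betas[k]) - log_anneal(x[k], betas[k]):
            x[k] = prop
    # Propose swapping the states of a random adjacent pair of chains.
    k = rng.integers(K - 1)
    log_acc = (log_anneal(x[k + 1], betas[k]) + log_anneal(x[k], betas[k + 1])
               - log_anneal(x[k], betas[k]) - log_anneal(x[k + 1], betas[k + 1]))
    if np.log(rng.random()) < log_acc:
        x[k], x[k + 1] = x[k + 1], x[k]
    samples.append(x[0])               # retain only the target (cold) chain
samples = np.array(samples)
```

Without the swap moves, the cold chain would be stuck in one mode; with them, both modes are visited in roughly equal proportion.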
- 8 February 2024: Mona Azadkia (LSE) - A Simple Measure of Conditional Dependence
Abstract: We propose a coefficient of conditional dependence between two random variables, Y and Z, given a set of other variables X_1, ..., X_p, based on an i.i.d. sample. The coefficient has a long list of desirable properties, the most important of which is that, under absolutely no distributional assumptions, it converges to a limit in [0, 1], where the limit is 0 if and only if Y and Z are conditionally independent given X_1, ..., X_p, and is 1 if and only if Y is equal to a measurable function of Z given X_1, ..., X_p. Using this statistic, we devise a new variable selection algorithm called Feature Ordering by Conditional Independence (FOCI), which is model-free, has no tuning parameters, and is provably consistent under sparsity assumptions. In this talk, we explore recent advances on this measure.
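The conditional coefficient builds on the unconditional rank correlation of Chatterjee (2020), which shares the same 0-to-1 behaviour. A minimal sketch of that unconditional building block on simulated data (the function name and the toy examples are invented for illustration; the conditional version additionally uses nearest neighbours in the X variables):

```python
import numpy as np

def xi_coefficient(z, y):
    """Chatterjee's rank coefficient xi_n(Y | Z) for continuous data
    without ties: near 0 under independence, near 1 when Y is a
    (noiseless) measurable function of Z."""
    n = len(z)
    order = np.argsort(z)                     # sort the pairs by Z
    r = np.argsort(np.argsort(y[order])) + 1  # ranks of Y in that order
    return 1 - 3 * np.abs(np.diff(r)).sum() / (n ** 2 - 1)

rng = np.random.default_rng(0)
z = rng.normal(size=5000)
xi_indep = xi_coefficient(z, rng.normal(size=5000))  # independent pair
xi_func = xi_coefficient(z, np.cos(z))               # Y a function of Z
```

Note the asymmetry: the coefficient measures how well Y is determined by Z, not the reverse, which is what makes it useful for ordering features in FOCI.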
- 15 February 2024: Jiwoon Park (Air Force Academy of South Korea) - Plateau of the hierarchical phi4 model
Abstract: Statistical physics serves as a systematic exploration of deriving macroscopic laws from microscopic foundations, playing a pivotal role in elucidating phenomena across natural science and engineering. The renormalization group method stands out as a cornerstone in this discipline, celebrated among physicists as the standard approach. However, its comprehension within mathematical physics remains a challenge. In this presentation, I delve into the hierarchical phi4 model—a synthetic construct designed for implementing the renormalization group method. While artificial, this model captures essential physical characteristics shared with more realistic counterparts like the O(N)-spin model. I aim to shed light on the historical context of the phi4 model, exploring its evolution and significance, and addressing intriguing questions such as the plateau of the spin-spin correlation. Additionally, I will discuss recent developments in the method, drawing attention to an insightful exploration detailed in arXiv:2306.00896. Through these insights, I aim to bridge the gap in understanding the renormalization group method from a mathematical physics perspective.
- 22 February 2024: David Phillippo (University of Bristol) - Multilevel network meta-regression for population-adjusted treatment comparisons based on individual and aggregate level data
Abstract: Network meta-analysis (NMA) and indirect comparisons combine aggregate data from multiple randomised controlled trials to estimate relative treatment effects, assuming that any effect modifiers are balanced across populations. Population adjustment methods aim to relax this assumption, using individual patient data available from one or more studies to adjust for differences in effect modifiers between populations. However, current approaches have several limitations: matching-adjusted indirect comparison and simulated treatment comparison can only be used in a two-study scenario and can only provide estimates in the aggregate study population, and current meta-regression approaches incur aggregation bias. We propose multilevel network meta-regression (ML-NMR), a general method for synthesising individual and aggregate data in networks of any size, extending the standard NMA framework. An individual-level regression model is defined, and aggregate study data are incorporated appropriately by integrating this model over the covariate distributions of the respective studies, which avoids aggregation bias. We take a general numerical approach using quasi-Monte Carlo integration, accounting for correlations between covariates using copulae. Crucially for decision making, estimates may be provided in any target population with a given covariate distribution. We then further generalise ML-NMR to cases where the aggregate-level likelihood has no known closed form. Most notably, this includes survival and time-to-event outcomes, which make up the large majority of population adjustment analyses to date. We illustrate the approach with examples and compare the results to those obtained using previous methods. A user-friendly R package, multinma, is available for performing ML-NMR analyses.
- 29 February 2024: Raiha Browning (University of Warwick) - Using AMIS to obtain subnational estimates of onchocerciasis transmission parameters to inform disease control efforts
Abstract: Onchocerciasis, also known as river blindness, is a neglected tropical disease (NTD) caused by the parasitic filarial nematode Onchocerca volvulus, and is transmitted through the bites of Simulium blackflies. Understanding the key epidemiological parameters driving disease transmission is crucial for developing effective strategies to achieve the goals set out in the WHO’s 2021–2030 Roadmap for NTDs. Adaptive multiple importance sampling (AMIS) is an extension of importance sampling whereby, at each iteration, the proposal distribution is updated using the framework of multiple mixture estimators, and the importance weights from all simulations (past and present) are recomputed. We use an extension of AMIS to fit a transmission model to geostatistical prevalence maps for onchocerciasis in West, Central and East African countries, obtaining sub-national estimates of disease prevalence at a fine spatial scale without the need to run the transmission model many times. This ultimately produces a weighted sample of the transmission model parameters for each location or pixel in the map, which can be used to project and compare future outcomes of disease prevalence under various intervention scenarios.
This is joint work with Matt Dixon (ICL), Martin Walker (RVC), Maria-Gloria Basáñez (ICL), Simon Spencer (Warwick) and Déirdre Hollingsworth (Oxford).
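A minimal one-dimensional sketch of the AMIS idea (toy Gaussian target and proposals, invented for illustration, not the epidemiological model of the talk): at each iteration, new draws are made from the current proposal, all draws so far are reweighted against the mixture of every proposal used, and the proposal is refit to the weighted moments.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    """Unnormalised toy target: proportional to a N(3, 0.5^2) density."""
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def log_normal_pdf(x, mu, sig):
    return -0.5 * ((x - mu) / sig) ** 2 - np.log(sig) - 0.5 * np.log(2 * np.pi)

mus, sigs = [0.0], [5.0]   # initial broad proposal
draws = []
for t in range(10):
    draws.append(rng.normal(mus[-1], sigs[-1], size=500))
    x = np.concatenate(draws)
    # Reweight ALL draws so far against the mixture of every proposal
    # used (the deterministic-mixture weights characteristic of AMIS).
    log_mix = np.logaddexp.reduce(
        np.stack([log_normal_pdf(x, m, s) for m, s in zip(mus, sigs)]), axis=0
    ) - np.log(len(mus))
    w = np.exp(log_target(x) - log_mix)
    w /= w.sum()
    # Refit the proposal to the weighted mean and standard deviation.
    mu = (w * x).sum()
    sig = np.sqrt((w * (x - mu) ** 2).sum())
    mus.append(mu)
    sigs.append(sig)
```

The proposal moments drift from the broad initial guess towards those of the target, and the final weighted sample is exactly the kind of parameter sample the abstract describes producing per map pixel.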
- 7 March 2024: Eftychia Solea (QMUL) - High-dimensional Nonparametric Functional Graphical Models via the Functional Additive Partial Correlation Operator
Abstract: This article develops a novel approach for estimating a high-dimensional and nonparametric graphical model for functional data. Our approach is built on a new linear operator, the functional additive partial correlation operator, which extends the partial correlation matrix to both the nonparametric and functional settings. We show that its nonzero elements can be used to characterize the graph, and we employ sparse regression techniques for graph estimation. Moreover, the method does not rely on any distributional assumptions and does not require the computation of multi-dimensional kernels, thus avoiding the curse of dimensionality. We establish both estimation consistency and graph selection consistency of the proposed estimator, while allowing the number of nodes to grow with the increasing sample size. Through simulation studies, we demonstrate that our method performs better than existing methods in cases where the Gaussian or Gaussian copula assumption does not hold. We also demonstrate the performance of the proposed method by a study of an electroencephalography data set to construct a brain network.
- 14 March 2024: Ciara Pike-Burke (Imperial College London) - Trading-Off Payments and Accuracy in Online Classification
Abstract: We consider online binary classification where, in each round, before making a prediction, the learner can choose to ask a number of stochastic experts for their advice. In contrast to the standard experts problem, we investigate the case where each expert needs to be paid before they provide their advice, and where the amount we pay them directly influences the accuracy of their prediction through some unknown productivity function. In each round, the learner must decide how much to pay each expert and then make a prediction. They incur a cost equal to a weighted sum of the prediction error and the upfront payments for all experts. We introduce an online learning algorithm and analyse its total cost compared to that of a predictor which knows the productivity of all experts in advance. In order to achieve this result, we combine Lipschitz bandits and online classification with surrogate losses.
Joint work with: Dirk van der Hoeven, Hao Qiu, Nicolo Cesa-Bianchi
- 21 March 2024: Daniela Castro Camilo (University of Glasgow) - A bivariate spatial extreme mixture model for unreplicated heavy metal soil contamination
Abstract: Geostatistical models for multivariate applications such as heavy metal soil contamination typically work under Gaussian assumptions and may result in underestimated extreme values and misleading risk assessments. A more suitable framework for analysing extreme values is extreme value theory (EVT). However, EVT relies on time replications, which are generally unavailable in geochemical datasets. Therefore, using EVT to map soil contamination requires adaptation to the usual single-replicate data framework of soil surveys. We propose a bivariate spatial extreme mixture model for the body and tail of contaminant pairs, where the tails are described using a stationary generalised Pareto distribution. We demonstrate the performance of our model using a simulation study and through modelling bivariate soil contamination in the Glasgow conurbation. Model results are given in terms of maps of predicted marginal concentrations and probabilities of joint exceedance of soil guideline values. Marginal concentration maps show areas of elevated lead levels along the Clyde River and elevated levels of chromium around the south and southeast villages, such as East Kilbride and Wishaw. The joint probability maps show higher probabilities of joint exceedance to the south and southeast of the city centre, following known legacy contamination regions in the Clyde River basin.