Statistical Science Seminars 2023-24
A seminar series covering a broad range of applied and methodological topics in Statistical Science.
Talks take place in hybrid format.
Usual time: Thursdays 14:00-15:00 (to be followed by Departmental Tea in the common room).
Location: For the rest of this academic year, talks will usually take place in B09 (1-19 Torrington Place) and Zoom. Please use the contact information below to join the mailing list, so you will receive location updates and links to the talks.
Contact info: stats-seminars-join at ucl dot ac dot uk
Recent talks
Please subscribe to our YouTube channel to view some recent talks from the series.
Programme for 2023/24
Abstract: During the last few decades, substantial research has been devoted to identifying the important covariates in ultra-high dimensional linear regression, where the number of covariates can grow exponentially with the sample size. While variable screening focuses on identifying a smaller subset of covariates that includes the important ones with overwhelmingly large probability, variable selection focuses solely on identifying the truly important ones. Because variable selection is typically computationally costly, a screening step is often performed first to reduce the number of potential covariates. In this talk, we propose two novel methodologies. We first introduce a sequential Bayesian rule that incorporates prior information on the true model size and effect sizes for variable screening. We then propose a scalable variable selection method that embeds variable screening in its algorithm, thus providing scalability and alleviating the need for a two-stage procedure. Our theoretical investigations relax some conditions for screening consistency and selection consistency in the ultra-high dimensional setup. We illustrate our methods using a dataset with nearly half a million covariates.
This talk is based on several joint works with Dr. Vivekananda Roy and PhD students Dongjin Li and Run Wang.
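As an illustration of the screening idea, a simple marginal-association rule (in the spirit of sure independence screening; a generic sketch, not the sequential Bayesian rule proposed in the talk) retains the important covariates with high probability while discarding most of the rest:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 200, 5000, 5           # n samples, p covariates, s truly active
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0                   # the first s covariates are the important ones
y = X @ beta + rng.standard_normal(n)

# Score each covariate by the magnitude of its marginal association with y
score = np.abs(X.T @ (y - y.mean())) / np.linalg.norm(X, axis=0)
keep = np.argsort(score)[::-1][: n // 2]   # screen down to the top n/2 covariates

# With overwhelming probability the screened set contains all active covariates
print(set(range(s)).issubset(set(keep)))
```

A selection method can then be run on the much smaller screened set, which is the computational point of the two-stage approach.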
Abstract: The current work considers an application of a multi-objective optimal split-plot design methodology in a pharmaceutical experiment. The aim of the experimentation was to explore the response surface with respect to various experimental factors, with two randomisation levels, as well as to provide good quality predictions. Due to the nature of the experimental process and the resource limitations, it was desirable to protect the inference against potentially present, but inestimable, model terms. We incorporated previously developed optimality criteria in a single compound criterion. The design search was carried out using a point-exchange algorithm, and a stratum-by-stratum approach, with several weight combinations on the component criteria. We discuss the particularities of this specific example, the choice of the design that was used to run the experiment, and some interesting results that have been obtained.
This is a joint work with Olga Egorova, Aleksandra Olszewska and Ben Forbes.
Abstract: Bayesian analyses combine information represented by different terms in a joint Bayesian model. When one or more of the terms is misspecified, it can be helpful to restrict the use of information from suspect model components to modify posterior inference. This is called “cutting feedback”, and both the specification and computation of the posterior for such “cut models” is challenging. In this paper, we define cut posterior distributions as solutions to constrained optimization problems, and propose variational methods for their computation. These methods are faster than existing Markov chain Monte Carlo (MCMC) approaches by an order of magnitude. It is also shown that variational methods allow for the evaluation of computationally intensive conflict checks that can be used to decide whether or not feedback should be cut. Our methods are illustrated in examples, including an application where recent methodological advances that combine variational inference and MCMC within the variational optimization are used.
Abstract: The receiver operating characteristic (ROC) surface is a popular tool for evaluating the discriminatory ability of diagnostic tests, measured on a continuous scale, when there are three ordered disease groups. Motivated by the impact that covariates may have on the diagnostic accuracy, and to safeguard against model misspecification, we develop a flexible model for conducting inference about the covariate-specific ROC surface and its functionals. Specifically, we postulate a location-scale regression model for the test outcomes in each of the three disease groups where the mean and variance functions are estimated through penalised-splines, while the distribution of the error term is estimated via a smoothed version of the empirical cumulative distribution function of the standardised residuals. Our simulation study shows that our approach successfully recovers both the true covariate-specific volume under the surface and the optimal pair of thresholds used for classification in a variety of conceivable scenarios. Our methods are motivated by and applied to data derived from an Alzheimer's disease study and we seek to assess the accuracy of several potential biomarkers to distinguish between individuals with normal cognition, mild cognitive impairment, and dementia and how this discriminatory ability may change with age and gender.
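The volume under the ROC surface (VUS) mentioned above has a simple empirical counterpart: the proportion of correctly ordered triples across the three groups. A minimal sketch on synthetic biomarker data (an assumed toy setup, not the penalised-spline estimator of the talk):

```python
import numpy as np

def empirical_vus(y1, y2, y3):
    """Empirical volume under the ROC surface for three ordered disease
    groups: the proportion of triples with y1 < y2 < y3, an estimate of
    P(Y1 < Y2 < Y3). A value of 1/6 corresponds to a useless marker."""
    a = y1[:, None, None]
    b = y2[None, :, None]
    c = y3[None, None, :]
    return np.mean((a < b) & (b < c))

rng = np.random.default_rng(4)
normal = rng.normal(0.0, 1.0, 200)     # e.g. normal cognition
mci = rng.normal(1.5, 1.0, 200)        # mild cognitive impairment
dementia = rng.normal(3.0, 1.0, 200)   # dementia
print(empirical_vus(normal, mci, dementia))
```

The covariate-specific analogue replaces these pooled samples with model-based distributions at each covariate value.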
Abstract: Analysis of network data, describing relationships, interactions and dependencies between entities, often begins with representation learning: the process of mapping these entities into a vector space in a way which preserves salient information in the data. Exploratory analysis of these representations can reveal patterns and latent structures, such as communities, and they may serve as inputs to learning algorithms such as clustering, regression, classification and neighbour recommendation. Spectral embedding, in which representations are constructed from the eigenvectors of a specially designed matrix, has emerged as a simple yet effective approach which is both highly scalable and interpretable. In the first part of this talk, I will provide a statistical lens into spectral embedding, elucidating how the eigenvectors of different matrices extract different information from the network and exploring model-based explanations of the geometric patterns it produces. In particular, I will focus on spectral embedding with the random walk Laplacian matrix, and show how, unlike other popular matrix constructions, it produces representations which are agnostic to node degrees. In the second part of this talk, I will present a framework for representation learning for dynamic network data describing instantaneous interactions between entities which occur in continuous time. The framework produces continuously evolving vector trajectories which reflect the changing structural roles of the nodes in the network and allows nodes to be meaningfully compared at different points in time.
This talk is based on joint works with Patrick Rubin-Delanchy, Nick Whiteley, Ian Gallagher and Emma Ceccherini.
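The basic spectral embedding recipe can be sketched in a few lines on a toy two-community graph (a generic illustration of embedding with the random walk Laplacian, not the speaker's exact construction):

```python
import numpy as np

def random_walk_spectral_embedding(A, d):
    """Embed each node into R^d using eigenvectors of the random walk
    Laplacian L_rw = I - D^{-1} A, computed via the symmetric normalised
    Laplacian for numerical stability."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalised Laplacian: L_sym = I - D^{-1/2} A D^{-1/2}
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)
    # Eigenvectors of L_rw are D^{-1/2} times those of L_sym; skip the
    # trivial constant eigenvector and keep the next d
    return d_inv_sqrt[:, None] * vecs[:, 1 : d + 1]

# Two triangles joined by a single bridge edge: a toy two-community graph
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
X = random_walk_spectral_embedding(A, d=1)
print(np.sign(X[:, 0]))  # the sign of the leading coordinate separates the two communities
```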
The problem of estimating the effect of an intervention/policy from time-series observational data on multiple units arises frequently in many fields of applied research, such as epidemiology, econometrics and political science. In this talk, we propose a Bayesian causal factor analysis model for estimating intervention effects in such a setting. The model includes a regression component to adjust for observed potential confounders and its latent component can account for certain forms of unobserved confounding. Further, it can deal with outcomes of mixed type (continuous, binomial, count) and increase efficiency in the estimates of the causal effects by jointly modelling multiple outcomes affected by the intervention. In policy evaluation problems, it is often of interest to study structure in the estimated effects. We therefore extend our approach to model effect heterogeneity. Specifically, we demonstrate that modelling effect heterogeneity is not straightforward in causal factor analysis, due to non-identifiability. We then demonstrate how this problem can be circumvented using a modularization approach that prevents post-intervention data from informing a subset of the model parameters. An MCMC algorithm for posterior inference is proposed and the method is used to evaluate the impact of Local Tracing Partnerships on the effectiveness of England's Test and Trace programme for COVID-19.
Abstract: In this talk, I will overview recent developments in nonstationary multivariate extremes. After providing motivation and background, I will introduce a Bayesian time-varying model that learns about the dynamics governing joint extreme values over time. The proposed model relies on dual measures of time-varying extremal dependence, that are modelled via a suitable class of generalized linear models conditional on a large threshold. The application of the proposed methods to some of the world’s most important stock markets reveals complex patterns of extremal dependence over the last 30 years, including passages from asymptotic dependence to asymptotic independence.
Abstract: When planning a Phase III clinical trial, suppose a certain subset of patients is expected to respond particularly well to the new treatment. Adaptive enrichment designs make use of interim data in selecting the target population for the remainder of the trial, either continuing with the full population or restricting recruitment to the subset of patients. We define a multiple testing procedure that maintains strong control of the familywise error rate, while allowing for the adaptive sampling procedure. We derive the Bayes optimal rule for deciding whether or not to restrict recruitment to the subset after the interim analysis and present an efficient algorithm to facilitate simulation-based optimisation, enabling the construction of Bayes optimal rules in a wide variety of problem formulations. We compare adaptive enrichment designs with traditional non-adaptive designs in a broad range of examples and draw clear conclusions about the potential benefits of adaptive enrichment.
Abstract: The rate of convergence of Bayesian learning algorithms is determined by two conditions: the behavior of the loss function around the optimal parameter (the Bernstein condition), and the probability mass given by the prior to neighborhoods of the optimal parameter (the prior mass condition).
In meta-learning, we face multiple learning tasks that are independent but are still expected to be related in some way. For example, the optimal parameters of all the tasks can be close to each other. It is then tempting to use the past tasks to build a better prior, which we can use to solve future tasks more efficiently. From a theoretical point of view, we hope to improve the prior mass condition in future tasks, and thus, the rate of convergence. In this paper, we prove that this is indeed the case.
Interestingly, we also prove that we can learn the optimal prior at a fast rate of convergence, regardless of the rate of convergence within the tasks (in other words, Bernstein condition is always satisfied for learning the prior, even when it is not satisfied within tasks).
This is joint work with Charles Riou (University of Tokyo and RIKEN AIP) and Badr-Eddine Chérief-Badellatif (CNRS). The preprint is available on arXiv: https://arxiv.org/abs/2302.11709
Abstract: Statistical physics serves as a systematic exploration of deriving macroscopic laws from microscopic foundations, playing a pivotal role in elucidating phenomena across natural science and engineering. The renormalization group method stands out as a cornerstone in this discipline, celebrated among physicists as the standard approach. However, its comprehension within mathematical physics remains a challenge. In this presentation, I delve into the hierarchical phi4 model—a synthetic construct designed for implementing the renormalization group method. While artificial, this model captures essential physical characteristics shared with more realistic counterparts like the O(N)-spin model. I aim to shed light on the historical context of the phi4 model, exploring its evolution and significance, and addressing intriguing questions such as the plateau of the spin-spin correlation. Additionally, I will discuss recent developments in the method, drawing attention to an insightful exploration detailed in arXiv:2306.00896. Through these insights, I aim to bridge the gap in understanding the renormalization group method from a mathematical physics perspective.
Onchocerciasis, also known as river blindness, is a neglected tropical disease (NTD) caused by the parasitic filarial nematode Onchocerca volvulus, and is transmitted through the bites of Simulium blackflies. Understanding the key epidemiological parameters driving disease transmission is crucial for developing effective strategies to achieve the goals set out in the WHO’s 2021–2030 Roadmap for NTDs. Adaptive multiple importance sampling (AMIS) is an extension of importance sampling whereby, at each iteration, the proposal distribution is updated using the framework of multiple mixture estimators, and the importance weights from all simulations (past and present) are recomputed. We use an extension of AMIS to fit a transmission model to geostatistical prevalence maps for onchocerciasis in West, Central and East African countries, obtaining sub-national estimates of disease prevalence at a fine spatial scale without the need to run the transmission model many times. This ultimately produces a weighted sample of the transmission model parameters for each location or pixel in the map, which can be used to project and compare future outcomes of disease prevalence under various intervention scenarios.
This is joint work with Matt Dixon (ICL), Martin Walker (RVC), Maria-Gloria Basáñez (ICL), Simon Spencer (Warwick) and Déirdre Hollingsworth (Oxford).
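The AMIS recipe described above can be sketched on a toy one-dimensional target (a generic illustration with an assumed Student-t proposal family, not the authors' transmission-model implementation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy target: unnormalised log-density of a N(3, 0.5^2) "posterior"
log_target = lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2

# AMIS loop: at each iteration draw from the current proposal, then
# recompute ALL importance weights under the mixture of every proposal
# used so far (the multiple mixture estimator)
T, N = 8, 500
mus, sigs = [0.0], [3.0]          # initial proposal parameters
samples = np.array([])
for t in range(T):
    x = stats.t.rvs(df=3, loc=mus[-1], scale=sigs[-1], size=N, random_state=rng)
    samples = np.concatenate([samples, x])
    mix = np.mean([stats.t.pdf(samples, df=3, loc=m, scale=s)
                   for m, s in zip(mus, sigs)], axis=0)
    w = np.exp(log_target(samples)) / mix
    w /= w.sum()
    # Moment-match the next proposal to the current weighted sample
    mu = np.sum(w * samples)
    sig = np.sqrt(np.sum(w * (samples - mu) ** 2))
    mus.append(mu); sigs.append(sig)

print(mus[-1], sigs[-1])   # should approach the target's mean 3 and sd 0.5
```

The key feature is that no sample is ever discarded: past draws are reweighted as the proposal adapts, which is what makes the approach economical when each model run is expensive.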
Abstract: This article develops a novel approach for estimating a high-dimensional and nonparametric graphical model for functional data. Our approach is built on a new linear operator, the functional additive partial correlation operator, which extends the partial correlation matrix to both the nonparametric and functional settings. We show that its nonzero elements can be used to characterize the graph, and we employ sparse regression techniques for graph estimation. Moreover, the method does not rely on any distributional assumptions and does not require the computation of multi-dimensional kernels, thus avoiding the curse of dimensionality. We establish both estimation consistency and graph selection consistency of the proposed estimator, while allowing the number of nodes to grow with the increasing sample size. Through simulation studies, we demonstrate that our method performs better than existing methods in cases where the Gaussian or Gaussian copula assumption does not hold. We also demonstrate the performance of the proposed method by a study of an electroencephalography data set to construct a brain network.
We consider online binary classification where in each round, before making a prediction, the learner can choose to ask a number of stochastic experts for their advice. In contrast to the standard experts problem, we investigate the case where each expert needs to be paid before they provide their advice, and the amount we pay them directly influences the accuracy of their prediction through some unknown productivity function. In each round, the learner must decide how much to pay each expert and then make a prediction. They incur a cost equal to a weighted sum of the prediction error and upfront payments for all experts. We introduce an online learning algorithm and analyse its total cost compared to that of a predictor which knows the productivity of all experts in advance. In order to achieve this result, we combine Lipschitz bandits and online classification with surrogate losses.
Joint work with: Dirk van der Hoeven, Hao Qiu, Nicolo Cesa-Bianchi
Geostatistical models for multivariate applications such as heavy metal soil contamination work under Gaussian assumptions and may result in underestimated extreme values and misleading risk assessments. A more suitable framework to analyse extreme values is extreme value theory (EVT). However, EVT relies on time replications, which are generally unavailable in geochemical datasets. Therefore, using EVT to map soil contamination requires adaptation to the usual single-replicate data framework of soil surveys. We propose a bivariate spatial extreme mixture model to model the body and tail of contaminant pairs, where the tails are described using a stationary generalised Pareto distribution. We demonstrate the performance of our model using a simulation study and through modelling bivariate soil contamination in the Glasgow conurbation.
Model results are given in terms of maps of predicted marginal concentrations and probabilities of joint exceedance of soil guideline values. Marginal concentration maps show areas of elevated lead levels along the Clyde River and elevated levels of chromium around the south and southeast villages, such as East Kilbride and Wishaw. The joint probability maps show higher probabilities of joint exceedance to the south and southeast of the city centre, following known legacy contamination regions in the Clyde River basin.
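The generalised-Pareto tail component of such a model can be illustrated with a univariate peaks-over-threshold fit (a generic scipy sketch on synthetic data, not the bivariate spatial model of the talk):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated contaminant concentrations (log-normal body with a heavy tail)
conc = rng.lognormal(mean=1.0, sigma=0.75, size=5000)

# Model the tail above a high threshold with a generalised Pareto distribution
u = np.quantile(conc, 0.95)
exceedances = conc[conc > u] - u
shape, loc, scale = stats.genpareto.fit(exceedances, floc=0.0)

# Probability of exceeding a hypothetical guideline value g:
# P(X > g) = P(X > u) * P(X - u > g - u | X > u)
g = 2 * u
p_exceed = 0.05 * stats.genpareto.sf(g - u, shape, loc=0.0, scale=scale)
print(shape, scale, p_exceed)
```

The spatial bivariate model extends this idea by letting the threshold and GPD parameters vary over space and by coupling the tails of the two contaminants.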
We introduce parametric models for noisy directional data, in which a radial noise with magnitude $\sigma^2$ makes the observations deviate from their theoretical hyperspherical sample space, namely a hypersphere centered at $\theta$ and with radius $r$. We consider inference --- hypothesis testing, point estimation, and confidence zone estimation --- on the location parameter $\theta$, in a framework where both $r$ and $\sigma^2$ remain unspecified. We introduce several asymptotic scenarios in which the radius of the hypersphere and, most importantly, the noise magnitude may depend on the sample size $n$ in an essentially arbitrary way. This allows us to consider very diverse cases, in which the a priori information that the data belong to a hypersphere is more and more, or on the contrary less and less, relevant. We base our investigation on Le Cam's asymptotic theory of statistical experiments and aim at a full understanding of the resulting limiting experiments. The corresponding contiguity rates, that characterize how easy/hard inference on $\theta$ is, reveal rather counter-intuitive results in some scenarios. We build locally asymptotically optimal tests and estimators, that turn out to be adaptively optimal across all asymptotic scenarios. We show that, in standard asymptotic scenarios, classical procedures that would ignore the hyperspherical a priori information are rate-consistent but do not achieve efficiency bounds, and that, in non-standard asymptotic scenarios, such classical procedures are not even rate-consistent. We investigate the finite-sample relevance of our results through Monte Carlo exercises.
This is joint work with Diego Bolon (Universidade de Santiago de Compostela) and Thomas Verdebout (Université libre de Bruxelles).
Traditional probability theory is often cumbersome, especially when applied to complex problems arising in modern machine learning and statistics. To address this, there has been recent interest in reorganising probability theory using techniques from category theory, which provides a variety of tools for abstracting away low-level details in order to focus on higher-level structure of interest. In this talk, I will provide an introduction to this topic that assumes no previous familiarity with category theory. I will then present a novel application to the task of parameterising a stochastic neural network that is equivariant with respect to the action of a group. The resulting procedure is flexible and compositional, and relies on minimal assumptions about the structure of the group action. Moreover, much of the underlying theory can be expressed visually in terms of string diagrams, and in a way that closely matches a computer implementation.
The analytical implementation of international anti-doping programs for protecting clean sport competitions is based on a univariate Bayesian framework, called ADAPTIVE. This is intended to identify individual reference ranges outside of which an observation may indicate doping abuse, and relies on simultaneous analysis of different markers, without accounting for their relationship.
This work extends the ADAPTIVE method to a multivariate testing framework, making use of copula models to couple the marginal distribution of biomarkers with their dependence structure. After introducing the proposed copula-based hierarchical model, I will discuss our approach to inference, grounded in a Bayesian spirit, and present a conformal method for constructing predictive reference regions. As a conformal measure, we use the posterior predictive density of the multidimensional biomarkers of individual athletes. Focusing on the haematological module of the Athlete Biological Passport (ABP), we evaluate the proposed framework in both data-driven simulations and real data.
This is a joint work with Brunero Liseo.
The field of record linkage is focused on matching information from the same entity across diverse sources without unique identifiers. Record linkage is gaining importance in applications ranging from medical record enhancement to the study of population mobility between censuses or surveys. Conventional record linkage models primarily concentrate on direct individual matching, often disregarding valuable group-level information inherent in the data. Motivated by recent research indicating enhanced performance when incorporating group information into the matching process, we propose a novel model-based approach that facilitates the joint estimation of individual and household match status, while also estimating the feature matching probabilities, given the match status of both individuals and their households. To illustrate the methodology we use the Italian Survey of Household Income and Wealth from 2014 and 2016. Our results, which account for different initialization methods, demonstrate a notable improvement in the $F_1$ score, with values around 80% when household information is considered, compared to approximately 46% for methods directly matching individuals without leveraging group information. Additionally, our findings underscore the model's robustness, as it consistently yields favorable outcomes across various initialization methods and in the presence of implemented blocking strategies. This work is in collaboration with Thais Pacheco Menezes and Michael Fop from the School of Mathematics and Statistics, University College Dublin.
In recent years, interest in spatial statistics has increased significantly. However, for large data sets, statistical computations for spatial models have remained a challenge, as it is extremely difficult to store a large covariance or an inverse covariance matrix and compute its inverse, determinant, or Cholesky decomposition. In this talk, we shall focus on spatial mixed models and discuss a new algorithm for fast matrix-free conditional sampling for their inference. This new algorithm relies on `rectangular' square roots of the inverse covariance matrices and covers a large class of spatial models, including models based on Gaussian conditional and intrinsic autoregressions, and fractional Gaussian fields. We shall show that the algorithm outperforms sparse Cholesky and other existing conditional simulation methods. We demonstrate the usefulness of this algorithm by analyzing groundwater arsenic contamination in Bangladesh, and by analyzing environmental bioassays from the New York-New Jersey harbor area. Part of this work is done in collaboration with Somak Dutta at Iowa State University.
Bio:
Debashis Mondal is an associate professor at the Department of Statistics and Data Science at Washington University in St. Louis. Prior to joining Washington University, he was on the statistics faculty at Oregon State University and the University of Chicago. He received his PhD in statistics from the University of Washington.
Mondal's research interests include spatial statistics, computational science, and machine learning, with applications in ecology (including microbial ecology) and the environmental sciences. He is a recipient of the NSF CAREER Award, the Young Researcher Award and the inaugural Junior Service Award of the International Indian Statistical Association, and is an elected member of the International Statistical Institute.
Local variable selection aims to discover localized effects by assessing the impact of covariates on outcomes within specific regions defined by other covariates. We outline some challenges of local variable selection in the presence of non-linear relationships and model misspecification. Specifically, we highlight a potential drawback of common semi-parametric methods: even slight model misspecification can result in a high rate of false positives. To address these shortcomings, we propose a methodology based on orthogonal cut splines that achieves consistent local variable selection in high-dimensional scenarios. Our approach offers simplicity, handles both continuous and discrete covariates, and provides theory for high-dimensional covariates and model misspecification. We discuss settings with either independent or dependent data. Our proposal allows including adjustment covariates that do not undergo selection, enhancing flexibility in modeling complex scenarios. We illustrate its application in simulation studies with both independent and functional data, as well as with two real datasets. One dataset evaluates salary gaps associated with discrimination factors at different ages, while the other examines the effects of covariates on brain activation over time. The approach is implemented in the R package mombf. A pre-print is available at https://arxiv.org/abs/2401.10235
Statistical modelling of complex dependencies in extreme events requires meaningful sparsity structures in multivariate extremes. In this context, two perspectives on conditional independence and graphical models have recently emerged: one that focuses on threshold exceedances and multivariate Pareto distributions, and another that focuses on max-linear models and directed acyclic graphs. What connects these notions is the exponent measure that lies at the heart of each approach. In this work we develop a notion of conditional independence defined directly on the exponent measure (and even more generally on measures that explode at the origin) that builds a bridge between these approaches. We characterize this conditional independence in various ways through kernels and factorization of a modified density, including a Hammersley-Clifford type theorem for undirected graphical models. As opposed to the classical conditional independence, our notion is intimately connected to the support of the measure. Structural max-linear models turn out to form a Bayesian network with respect to our new form of conditional independence. Our general theory unifies and extends recent approaches to graphical modeling in the fields of extreme value analysis and Lévy processes. Our results for the corresponding undirected and directed graphical models lay the foundation for new statistical methodology in these areas.
This is joint work with Sebastian Engelke and Jevgenijs Ivanovs.
This work arose as a follow-up from the RSS discussion meeting Graphical Models for Extremes by Engelke & Hitz (https://www.youtube.com/watch?v=bDwlDtyoJQc). A preprint is available at https://arxiv.org/abs/2211.15769.
Kernel discrepancies, such as the maximum mean discrepancy (MMD) and kernel Stein discrepancy (KSD), have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. While kernel-based discrepancies are more computationally tractable than the classical distances of probability theory, their successful application in each of the above settings requires them to: (i) separate a target distribution from other probability measures or even (ii) control weak convergence to the target.
In this seminar I will introduce kernels, motivate their associated MMDs and KSDs, which respectively leverage the target’s topological and differential information, and discuss some of the results on convergence control we obtained in [1-3].
[1] Simon-Gabriel, Carl-Johann, et al. "Metrizing weak convergence with maximum mean discrepancies." Journal of Machine Learning Research 24.184 (2023): 1-20.
[2] Barp, Alessandro, et al. "Targeted separation and convergence with kernel discrepancies." To appear in Journal of Machine Learning Research (2024)
[3] Kanagawa, Heishiro, et al. "Controlling moments with kernel Stein discrepancies." Submitted to Annals Of Statistics.
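As a concrete illustration, the (unbiased) MMD between two samples under a Gaussian kernel can be computed in a few lines (a generic sketch; the kernel and bandwidth choice here are assumptions):

```python
import numpy as np

def mmd_squared(X, Y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples X and Y under a Gaussian
    (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    def k(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2 * bandwidth**2))
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    # Drop diagonal terms from the within-sample sums for unbiasedness
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 2))          # samples from the target
Y = rng.standard_normal((500, 2))          # same distribution
Z = rng.standard_normal((500, 2)) + 2.0    # shifted distribution
print(mmd_squared(X, Y))  # near zero: same distribution
print(mmd_squared(X, Z))  # clearly positive: distributions differ
```

Whether such a discrepancy separates the target from all alternatives, or controls weak convergence, depends on the kernel; these are exactly the properties studied in [1-3].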
One day symposium
Master protocols are a new class of clinical trial designs that allow treatment arms to enter and leave the trial and/or (sub)-populations to be added over time. They include platform, basket and umbrella trials and have proven particularly popular during the COVID-19 pandemic, during which no fewer than 58 trials have been registered as platform trials. A common feature of these designs is the desire to answer several research questions within a single protocol. One of the questions arising from this feature that has generated a lot of discussion in the literature is the need (or lack of need) to control error rates, as well as the most appropriate type of error control. Another core feature of master protocols is the desire to maximise the utility of information in the study by using the same information to answer multiple research questions (e.g. a shared control group in platform trials or information borrowing in basket trials).
In this talk I will begin by reflecting on the discussion around error rates and introduce different possible testing strategies that could be considered for Master protocols and platform trials in particular. I will then introduce possible strategies to borrow information across different research questions and discuss implications of such strategies on the sample size requirements of the study. Throughout this presentation I will highlight areas where further research is necessary to enable Master protocols to unleash their full potential.
TBC
Due to the fast growth of data that are measured on a continuous scale, functional data analysis has undergone many developments in recent years. Regression models with a functional response involving functional covariates, also called "function-on-function" models, are thus becoming very common. Studying this type of model in the presence of heterogeneous data can be particularly useful in various practical situations. In this work we develop a Function-on-Function Mixture of Experts (FFMoE) regression model. Like most inference approaches for functional data models, we use basis expansions (B-splines) both for covariates and parameters. A regularized inference approach is also proposed; it accurately smooths functional parameters in order to provide interpretable estimators. Numerical studies on simulated data illustrate the good performance of FFMoE as compared with competitors. The usefulness of the proposed model is illustrated on two data sets: the reference Canadian weather data set, in which precipitation is modeled according to temperature, and a cycling data set, in which the power developed is explained by the speed, the cyclist's heart rate and the slope of the road.