

Statistical Science Seminars

A seminar series covering a broad range of applied and methodological topics in Statistical Science.

*** All talks will take place online until further notice ***

Usual time: Thursdays 14:00-15:00

Location: Zoom

Please email thomas dot bartlett dot 10 at ucl dot ac dot uk to join the mailing list, and receive the links to the talks.

Recent talks

Please subscribe to our YouTube channel to view some recent talks from the series.


Upcoming talks


18 Feb 2021: Jim Griffin (UCL) - Survival regression models with dependent Bayesian nonparametric priors

I will present a novel Bayesian nonparametric model for regression in survival analysis. The model builds on the neutral to the right model of Doksum (1974) and on the Cox proportional hazards model of Kim and Lee (2003). The use of a vector of dependent Bayesian nonparametric priors allows us to efficiently model the hazard as a function of covariates whilst allowing non-proportionality. Properties of the model and inference schemes will be discussed. The method will be illustrated using simulated and real data.

Joint work with Alan Riva-Palacio (UNAM, Mexico) and Fabrizio Leisen (University of Nottingham)

25 Feb 2021: William Aeberhard (ETH Zurich / Swiss Data Science Center) - Robust Fitting and Smoothing Parameter Selection for Generalized Additive Models for Location, Scale and Shape

The validity of estimation and smoothing parameter selection for the wide class of generalized additive models for location, scale and shape (GAMLSS) relies on the correct specification of a likelihood function. Deviations from such assumptions are known to mislead inference and can hinder penalization schemes meant to ensure some degree of smoothness for non-parametric (additive, non-linear) effects. We propose a general approach to achieve robustness in fitting GAMLSSs by limiting the contribution of observations with low log-likelihood values. Robust selection of the smoothing parameters can be carried out by minimizing information criteria that naturally arise from the robustified likelihood or via an extended Fellner–Schall method, the latter being particularly advantageous in applications with multiple smoothing parameters. We also address the challenge of tuning robust estimators for models with non-parametric effects by introducing a novel median downweighting proportion criterion. This enables a fair comparison with existing robust estimators for the special case of generalized additive models, where our estimator competes favorably. The overall good performance of our proposal is illustrated by further simulations in the GAMLSS setting and by an application to functional magnetic resonance brain imaging using bivariate smoothing splines.

Joint work with Eva Cantoni (University of Geneva), Giampiero Marra (University College London), and Rosalba Radice (City, University of London)

4 Mar 2021: Swati Chandna (Birkbeck) - Nonparametric regression for multiple heterogeneous networks

Network data, and particularly collections of heterogeneous networks with covariate information, are commonly observed in a wide variety of applications. This has led to a growing interest in probabilistic models which not only offer generative mechanisms but are also easily estimable using existing methods. In the setting where multiple networks are observed on the same set of nodes, it is key to understand how interactions between nodes evolve within the collection. To answer questions under this setting, we propose a natural extension of the graphon model to simultaneously allow node-level as well as network-level heterogeneity, via a new multi-graphon function. We show how information from multiple networks can be leveraged to allow the use of standard nonparametric regression techniques for estimation of the multi-graphon function, without necessarily restricting to communities or network histogram estimators as in the existing literature. Applications to two real network datasets illustrate this approach.

Joint work with P.A. Maugis (Google).

11 Mar 2021: Roberto Cerina (Maastricht University) - Converting the Twitter API into an Online Panel via Human Intelligence to Measure Public Opinion

We use Human Intelligence provided by Mechanical Turks to convert the Twitter streaming API into a structured online panel, to monitor opinion over the 2020 election campaign in the US. We leverage opinion about the horse-race of both Twitter users and Turks, as they can be obtained simultaneously with a single survey instrument, to produce our estimates. A Probability Machine is trained on the resulting data, and predictions from this learner are stratified according to a stratification frame made up of likely voters, provided to us by 0ptimus Analytics. Results are aggregated up to the state level to produce state-level estimates of opinion. A correction factor to account for online-selection is calculated by comparison with publicly available data and added to the estimates. The results are close to FiveThirtyEight's election-day forecasts, though they do present challenges especially in the least competitive states. This paper attempts to build the fundamentals - methods and data - to train machines to regularly estimate public opinion for a range of topics, anywhere in the world, anytime - similarly to a 'Google Trends' for behavioural and opinion data.

18 Mar 2021: Javier Rubio (KCL)


(Title and abstract TBC)

25 Mar 2021: Anatol Wegner (UCL)


(Title and abstract TBC)

29 Apr 2021: Andrew Wade (Durham)


(Title and abstract TBC)

6 May 2021: David Colquhoun (UCL) - The FPR50: a simple, but rough, solution to the p value wars?

It’s remarkable that statisticians are still at war about how best to decide whether the difference between the means of two independent samples is a result of sampling error alone or whether it’s real.

Most experimenters with no access to professional statistical advice calculate a p value, which they then misinterpret as the probability that their results have occurred by chance. Journals continue to advise this procedure. It is to these people that my suggestion is aimed. They will, rightly, continue to ask whether their results have occurred by chance, and the only way to improve practice is to provide them with an alternative that's simple enough for them to understand.

One simple alternative is to calculate the likelihood ratio, as a measure of the evidence provided by the experiment about the relative plausibility of H0 and H1. This is entirely deductive and frequentist and thus uncontroversial.

If these odds are expressed as a probability, this probability can be interpreted as the posterior probability of H1, for the case where the prior odds are 1, a quantity that I propose should be called the FPR50, the false positive risk when prior P(H1)=0.5. I suggest that this should be cited, along with the p value and confidence interval, to give a better idea of the possible false positive risk.

This is equivalent, in Bayesian terms, to using a prior distribution with the densities concentrated on the null, and on the observed difference. I shall try to justify the use of this simplification, and the use of a skeptical point null hypothesis.
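The odds-to-probability conversion described above is simple arithmetic: with prior odds of 1, the posterior odds of H1 equal the likelihood ratio, so the false positive risk is one divided by one plus the likelihood ratio. A minimal sketch (the function name is illustrative, not from the talk):

```python
def fpr50(lr_h1_vs_h0: float) -> float:
    """False positive risk when the prior P(H1) = 0.5.

    With prior odds of 1, the posterior odds of H1 equal the likelihood
    ratio LR = P(data | H1) / P(data | H0), so the posterior probability
    of H0 -- the FPR50 -- is 1 / (1 + LR).
    """
    return 1.0 / (1.0 + lr_h1_vs_h0)

# A likelihood ratio of 3 in favour of H1 still leaves a 25% false positive risk.
print(fpr50(3.0))  # 0.25
print(fpr50(1.0))  # 0.5: the evidence does not discriminate at all
```

Note that an apparently respectable likelihood ratio can still correspond to a substantial FPR50, which is the point of citing it alongside the p value and confidence interval.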

As EJ Wagenmakers said

“At least Bayesians attempt to find an approximate answer to the right question, instead of struggling to interpret an exact answer to the wrong question.”

13 May 2021: Benjamin Eltzner (Universität Göttingen)


(Title and abstract TBC)

20 May 2021: Richard Samworth (Cambridge) - USP: an independence test that improves on Pearson's chi-squared and the G-test

We introduce the U-Statistic Permutation (USP) test of independence in the context of discrete data displayed in a contingency table. Either Pearson's chi-squared test of independence or the generalised likelihood ratio test (G-test) is typically used for this task, but we argue that these tests have serious deficiencies, both in terms of their inability to control the size of the test and their power properties. By contrast, the USP test is guaranteed to control the size of the test at the nominal level for all sample sizes, has no issues with small (or zero) cell counts, and is able to detect distributions that violate independence in only a minimal way. The test statistic is derived from a U-statistic estimator of a natural population measure of dependence, and we prove that this is the unique minimum variance unbiased estimator of this population quantity.

In the last one-third of the talk, I will show how this is a special case of a much more general methodology and theory for independence testing.
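The finite-sample size guarantee comes from permutation calibration: permuting one margin breaks any dependence, so the permutation p-value has exact size under the null. A minimal sketch in that spirit, using a simple plug-in estimate of the dependence measure sum over cells of (p_ij - p_i. p_.j)^2 as the statistic, rather than the exact U-statistic of the talk (function names are illustrative):

```python
import numpy as np

def dependence_stat(x, y, levels_x, levels_y):
    """Plug-in estimate of sum_ij (p_ij - p_i. * p_.j)^2 from paired
    categorical samples x, y coded as integers 0..levels-1."""
    n = len(x)
    p = np.zeros((levels_x, levels_y))
    for a, b in zip(x, y):
        p[a, b] += 1.0
    p /= n
    # Compare joint cell probabilities with the product of the marginals.
    return float(np.sum((p - np.outer(p.sum(axis=1), p.sum(axis=0))) ** 2))

def permutation_independence_test(x, y, levels_x, levels_y, n_perm=999, seed=0):
    """Permutation p-value: permuting y destroys any dependence with x,
    so the test controls its size at the nominal level for any n."""
    rng = np.random.default_rng(seed)
    observed = dependence_stat(x, y, levels_x, levels_y)
    count = sum(
        dependence_stat(x, rng.permutation(y), levels_x, levels_y) >= observed
        for _ in range(n_perm)
    )
    return (1 + count) / (1 + n_perm)
```

Unlike the chi-squared approximation, this calibration needs no large-sample argument and is unaffected by small or zero cell counts.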

27 May 2021: Matt Graham (Newcastle / Turing)


(Title and abstract TBC)

10 Jun 2021: Amanda Turner (Lancaster)


(Title and abstract TBC)

17 Jun 2021: Cédric Archambeau (Amazon)

(Title and abstract TBC)

24 Jun 2021, 16:00-17:00: Amy Willis (University of Washington)


(Title and abstract TBC)

Affiliated Seminars