Statistical Science Seminars
Usual time: Thursdays 16:00 - 17:00
Location: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor).
Some seminars are held in different locations at different times. Click on the abstract for more details.
On random geometric subdivisions
I will present several models of random geometric subdivisions, similar to the one studied by Diaconis and Miclo (Combinatorics, Probability and Computing, 2011), in which a triangle is split into 6 smaller triangles by its medians, one of these parts is selected uniformly at random as the new triangle, and the process continues ad infinitum. I will show that in a similar model the limiting shape of an indefinite subdivision of a quadrilateral is a parallelogram. I will also show that the geometric subdivisions of a triangle by angle bisectors converge (but only weakly) to a non-atomic distribution, and that the geometric subdivisions of a triangle obtained by choosing uniformly random points on its sides converge to a flat triangle, similarly to the result of the paper mentioned above.
Discretisation schemes for level sets of planar Gaussian fields
Gaussian random fields are prevalent throughout mathematics and the sciences, for instance in physics (wave-functions of high energy electrons), astronomy (cosmic microwave background radiation) and probability theory (connections to SLE, random tilings etc). Despite this, the geometry of such fields, for instance the connectivity properties of level sets, is poorly understood. In this talk I will discuss methods of extracting geometric information about level sets of a planar Gaussian random field through discrete observations of the field. In particular, I will present recent work that studies three such discretisation schemes, each tailored to extract geometric information about the level set to a different level of precision, along with some applications.
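To fix ideas, here is a toy version of the crudest kind of discretisation: sample a smooth planar Gaussian field on a grid (approximated here by Gaussian-smoothed white noise, my own choice, not necessarily the fields from the talk) and record only the sign of the field at each vertex. Sign changes along grid edges then locate crossings of the zero level set.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)

# A smooth planar Gaussian field, approximated by convolving white
# noise with a Gaussian kernel of width sigma (grid units).
n, sigma = 256, 8.0
field = gaussian_filter(rng.standard_normal((n, n)), sigma)
field /= field.std()

# Crude discretisation: keep only the sign of the field at grid
# vertices.  Each sign change along a grid edge marks a crossing of
# the zero level set; the count is a rough proxy for its total length.
signs = np.sign(field)
crossings = (np.count_nonzero(signs[1:, :] != signs[:-1, :])
             + np.count_nonzero(signs[:, 1:] != signs[:, :-1]))
print(crossings)
```

Finer schemes would record more than the sign (e.g. field values, or values of derivatives), trading observation cost for geometric precision, which is the kind of trade-off the abstract refers to.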
BASiCS: Bayesian Analysis of Single Cell Sequencing data
Recently, single-cell mRNA sequencing (scRNA-seq) has emerged as a novel tool for quantifying gene expression profiles of individual cells. These assays can provide novel insights into a tissue's function and regulation. However, besides experimental issues, statistical analysis of scRNA-seq data is itself a challenge. In particular, a prominent feature of scRNA-seq experiments is strong measurement error. This is reflected by (i) technical dropouts, where a gene is expressed in a cell but its expression is not captured through sequencing, and (ii) poor correlation between expression measurements of technical replicates. Critically, these effects must be taken into account in order to reveal biological findings that are not confounded by technical variation.
In this talk I introduce BASiCS (Bayesian Analysis of
Single-Cell Sequencing data) [1,2], an integrative approach to jointly infer
biological and technical effects in scRNA-seq datasets. It builds upon a
Bayesian hierarchical modelling framework, based on a Poisson formulation.
BASiCS uses a vertical integration approach, exploiting a set of
"gold-standard" genes in order to quantify technical artifacts.
Additionally, it provides a probabilistic decision rule to identify (i) key
drivers of heterogeneity within a population of cells and (ii) changes in gene
expression patterns between multiple populations (e.g. experimental conditions
or cell types). More recently, we extended BASiCS to experimental designs where
gold-standard genes are not available, using a horizontal integration framework
where technical variation is quantified through the borrowing of information
from observations across multiple groups of samples (e.g. sequencing batches
that are not confounded with the biological effect of interest). Control
experiments validate our method's performance and a case study suggests that
novel biological insights can be revealed.
Our method is implemented in R and
available at https://github.com/catavallejos/BASiCS.
[1] Vallejos, Marioni and Richardson (2015). PLoS Computational Biology.
[2] Vallejos, Richardson and Marioni (2016). Genome Biology.
Clever Hans, Clever Algorithms: Are your machine learnings learning what you think?
In machine learning, generalisation is the aim and overfitting is the bane; but just because one avoids the latter does not guarantee the former. Of particular importance in some applications of machine learning is the "sanity" of the models learnt. In this talk, I discuss one discipline in which model sanity is essential -- machine music listening -- and how several hundred research publications may have unknowingly built, tuned, tested, compared and advertised "horses" instead of solutions. The true cautionary tale of the horse-genius Clever Hans provides the most appropriate illustration, but also suggests ways forward.
B. L. Sturm, "A simple method to determine if a music information retrieval system is a 'horse'," IEEE Trans. Multimedia, vol. 16, no. 6, pp. 1636-1644, 2014.
Asymptotic rigidity at zero temperature for large particle systems with power-law interactions
We consider the model of a gas of N particles in d-dimensional Euclidean space, which have inverse-power-law interactions with exponent s<d. The most celebrated case is the case of Coulomb potentials for which s=d-2 (where for d=2, s=0 we consider logarithmic interactions instead). Our particles are confined to a compact set and in the limit of N going to infinity, we study the asymptotic behavior of the energy.
The leading-order term is quadratic in N and described by a mean-field energy on probability measures. I will describe a strategy for controlling the next-order term, which grows like the (1+s/d)-th power of N. This lower-order term is expressed in terms of an energy W on "micro-scale configurations" of the particles.
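The leading-order N² scaling is easy to check numerically. The following sketch computes the Riesz pairwise energy of N uniform points in the unit square for s = 1 < d = 2 and shows that the energy divided by N² is roughly constant (the next-order term of order N^{1+s/d}, which the talk is actually about, is not resolved by this crude experiment):

```python
import numpy as np

rng = np.random.default_rng(3)

d, s = 2, 1.0   # Riesz exponent s = 1 < d = 2

def energy(points):
    """Pairwise Riesz energy: sum over i < j of |x_i - x_j|^(-s)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(points), k=1)
    return (dist[iu] ** (-s)).sum()

for n in (100, 400, 1600):
    pts = rng.random((n, d))
    print(n, energy(pts) / n ** 2)   # roughly constant as n grows
```

The near-constant ratio is the mean-field term: E(N)/N² approaches half the double integral of |x-y|^{-s} against the (here uniform) limiting measure.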
As the temperature of the gas tends to zero, the gas "crystallizes"
on minimizers of W, with a conjectural drop of
complexity. I will present a study of the asymptotics of minimizers in which, using the energy W, we produce a first quantification of this rigidity phenomenon by proving hyperuniformity and quantitative equidistribution of the minimizing configurations.
Possible extensions in several directions, including related numerical conjectures, will be presented.
The talk is based on joint papers with S. Rota-Nodari and S. Serfaty.
Improving grid-based Bayesian methods
In some cases, computational benefit can be gained by exploring the hyper-parameter space using a deterministic set of grid points instead of a Markov chain. We view this as a numerical integration problem and make three unique contributions. First, we explore the space using low discrepancy point sets instead of a grid. This allows for accurate estimation of marginals of any shape at a much lower computational cost than a grid-based approach, and thus makes it possible to extend the computational benefit to a hyper-parameter space with higher dimensionality (10 or more). Second, we propose a new, quick and easy method to estimate the marginal using a least squares polynomial, and prove the conditions under which this polynomial will converge to the true marginal. Our results are valid for a wide range of point sets including grids, random points and low discrepancy points. Third, we show that further accuracy and efficiency can be gained by taking into consideration the functional decomposition of the integrand, and illustrate how this can be done using anchored f-ANOVA on weighted spaces.
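The grid-versus-low-discrepancy comparison at the heart of the first contribution can be illustrated with a toy integration problem (a generic smooth integrand below stands in for a marginal posterior; the specific function and budgets are my own choices). At the same budget of 1024 evaluations in 5 dimensions, a product grid resolves only 4 levels per coordinate, while a scrambled Sobol point set spreads the budget evenly through the whole cube:

```python
from math import erf, pi, sqrt

import numpy as np
from scipy.stats import qmc

d = 5
f = lambda x: np.exp(-np.sum(x ** 2, axis=1))
truth = (sqrt(pi) / 2 * erf(1.0)) ** d   # exact integral over [0,1]^5

# Product grid: midpoint rule with 4 points per axis -> 4^5 = 1024 points.
axis = (np.arange(4) + 0.5) / 4
grid = np.stack(np.meshgrid(*[axis] * d), axis=-1).reshape(-1, d)
grid_est = f(grid).mean()

# Scrambled Sobol sequence: the same budget of 1024 points, but with
# low discrepancy in all of [0,1]^5 rather than 4 levels per coordinate.
sob = qmc.Sobol(d=d, scramble=True, seed=0).random(1024)
sobol_est = f(sob).mean()

print(abs(grid_est - truth), abs(sobol_est - truth))
```

The gap widens quickly with dimension, since a grid's per-axis resolution shrinks as the budget is spread over more coordinates, which is why the abstract targets hyper-parameter spaces of dimension 10 or more.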