Statistical Science Seminars
Usual time: Thursdays 16:00 - 17:00
Location: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor).
Some seminars are held in different locations at different times. Click on the abstract for more details.
18 April: Zoubin Ghahramani (University of Cambridge)
Bayesian nonparametric modelling of networks
Network data and more generally relational data encoding the pairwise relations between objects appear in many fields. For instance in biology, a protein network connects interacting partners, while in a social network, links among people indicate social relationships. The problems of analysing, understanding and modelling such networks have attracted interest from many research communities. I will briefly review some probabilistic approaches to modelling networks. The key idea behind many such models is that each object has certain latent features, and that observed links in the network depend on these latent features. Probabilistic inference allows one to discover the potentially unbounded number of latent features (including discovering communities as a special case), predict missing links, and generally learn about the statistical properties of the networks. Many of these models can be cast within the theoretical framework of exchangeable arrays established by Aldous, Hoover and Kallenberg. I will describe our work on a general network model (the Random Function Model) that instantiates this theory using Gaussian processes, and relate it to existing models. I will also discuss our work on the Infinite Latent Attribute (ILA) model which allows for a highly structured nonparametric latent variable representation of nodes in a network. Finally, I will describe our Latent Feature Propagation model for dynamic networks. What ties these models together is the idea that rich latent representations underlie the structure of networks, and that these can be discovered via Bayesian inference.
Joint work with Creighton Heaukulani, David A. Knowles, James Lloyd, Peter Orbanz, Konstantina Palla, and Dan Roy.
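As a rough illustration of the latent-feature idea in the abstract above, here is a minimal sketch of a finite, parametric toy model in which each node carries binary latent features and link probabilities depend on pairs of features. It is not the Random Function Model, the ILA model, or the Latent Feature Propagation model. The function name, the logistic link, the fixed number of features, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def sample_latent_feature_network(n_nodes, n_features, feature_prob=0.3, seed=0):
    """Sample a toy undirected network from a finite latent-feature model.

    Hypothetical illustration: the nonparametric models in the abstract allow
    an unbounded number of features; here the number is fixed in advance.
    """
    rng = random.Random(seed)

    # Z[i][k] = 1 if node i possesses latent feature k.
    Z = [[1 if rng.random() < feature_prob else 0 for _ in range(n_features)]
         for _ in range(n_nodes)]

    # W[k][l] is a symmetric interaction weight between features k and l.
    W = [[0.0] * n_features for _ in range(n_features)]
    for k in range(n_features):
        for l in range(k, n_features):
            W[k][l] = W[l][k] = rng.gauss(0.0, 2.0)

    # A[i][j] = 1 if nodes i and j are linked; no self-links.
    A = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            # The link score sums the weights of all feature pairs that are active.
            score = sum(Z[i][k] * W[k][l] * Z[j][l]
                        for k in range(n_features)
                        for l in range(n_features))
            prob = 1.0 / (1.0 + math.exp(-score))  # logistic link (assumed)
            A[i][j] = A[j][i] = 1 if rng.random() < prob else 0
    return Z, W, A


Z, W, A = sample_latent_feature_network(n_nodes=10, n_features=3)
```

Inference runs in the opposite direction: given only the adjacency matrix `A`, one would try to recover the features `Z` and weights `W`, for example to predict missing links.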
2 May: Siegfried Hörmann (Université Libre de Bruxelles)
Dynamic functional principal components
Data in many fields of science are sampled from processes that can most naturally be described as functional. Examples include growth curves, temperature curves, curves of financial transaction data and patterns of pollution data. Functional data analysis (FDA) is concerned with the statistical analysis of such data. Since these are intrinsically infinite dimensional objects, tools for dimension reduction are desirable. Functional principal component analysis (FPCA) plays a leading role here. It is a key tool in many important empirical and theoretical problems.
A problem with classical FPCA is that it operates in a static way and does not take into account any possible serial dependence of the functional observations. Such dependence occurs quite frequently, e.g. if the data consist of a continuous time process which has been cut into segments (e.g. days). Though cross-sectionally uncorrelated for a fixed observation, the classical FPC-score vectors have non-diagonal cross-correlations at non-zero lags. This means that we cannot analyse them componentwise (as in the i.i.d. case), but need to consider them as vector time series, which are less easy to handle and interpret. In particular, a functional principal component with a small eigenvalue, and hence negligible instantaneous impact on some observation, may have a major impact on the lagged values. In a time series context, classical static FPCs thus do not lead to an adequate dimension reduction technique, as they do in the i.i.d. case. This motivates the development of dynamic functional principal components. The idea is to transform the (possibly infinite dimensional) functional time series into a vector time series (of low dimension 3 or 4, say), where the individual component processes are mutually uncorrelated and explain a larger part of the dynamics and variability of the original process.
In this talk we will propose such a dynamic version of FPCA and study its properties. An empirical analysis and a real data example will be given.
This talk is based on joint work with Łukasz Kidziński and Marc Hallin.
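The abstract's point about static principal component scores can be illustrated with a small finite-dimensional sketch. It uses a two-dimensional time series rather than functional data, and it shows only the problem with static PCA, not the dynamic FPCA method proposed in the talk. The simulated process, its coefficients, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def simulate_series(n, seed=0):
    """Toy 2-d series whose second coordinate echoes the first one step later."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    series = []
    for t in range(1, n + 1):
        x = shocks[t]
        y = 0.3 * shocks[t - 1] + 0.1 * rng.gauss(0.0, 1.0)
        series.append((x, y))
    return series


def mean(values):
    return sum(values) / len(values)


def correlation(u, v):
    """Sample correlation of two equally long sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def static_pc_scores(series):
    """Static PCA scores for 2-d data via the closed-form 2x2 eigen-rotation."""
    xs = [p[0] for p in series]
    ys = [p[1] for p in series]
    mx, my = mean(xs), mean(ys)
    xc = [x - mx for x in xs]
    yc = [y - my for y in ys]
    n = len(series)
    a = sum(x * x for x in xc) / n               # var(x)
    c = sum(y * y for y in yc) / n               # var(y)
    b = sum(x * y for x, y in zip(xc, yc)) / n   # cov(x, y)
    # Rotation angle that diagonalises the lag-0 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    first = [cos_t * x + sin_t * y for x, y in zip(xc, yc)]    # larger eigenvalue
    second = [-sin_t * x + cos_t * y for x, y in zip(xc, yc)]  # smaller eigenvalue
    return first, second


scores_1, scores_2 = static_pc_scores(simulate_series(5000))
lag0 = correlation(scores_1, scores_2)            # same time point
lag1 = correlation(scores_1[:-1], scores_2[1:])   # first score at t-1, second at t
print(f"lag-0 correlation: {lag0:.3f}, lag-1 correlation: {lag1:.3f}")
```

In this toy setup the two score series are uncorrelated at lag 0 by construction, yet the low-variance second component is strongly correlated with the previous value of the first, so the components cannot be analysed separately.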
9 May: David Choi (Carnegie Mellon University)
Consistency of co-clustering exchangeable array data
We analyze the problem of partitioning a 0-1 array or bipartite graph into subgroups (also known as co-clustering), under a relatively mild assumption that the data is generated by a general nonparametric process. Our main application is the analysis of a simple clustering model for networks, the stochastic co-blockmodel, when the data is not assumed to be generated (even approximately) by the model. Our result suggests that the stochastic co-blockmodel and other community detection algorithms may be robust to model misspecification. This is joint work with Patrick Wolfe (arXiv:1212.4093).
David Choi is an assistant professor at Carnegie Mellon University, in the Heinz College of public policy and information systems. His research focuses on theoretical statistics for social network data. David holds a PhD in electrical engineering from Stanford University.
16 May: Brendan Pass (University of Alberta)
Optimal transportation with infinitely many marginals
We formulate and study the problem of aligning a continuum of marginals as efficiently as possible. In our formulation, we look for the stochastic process with prescribed single time marginals which minimizes the expectation of a certain functional. This problem is a natural extension of a multi-marginal optimal transportation problem studied by Gangbo and Swiech (1998). In this talk, we prove existence, uniqueness and characterization results.
26 June: Dimitris Fouskakis (National Technical University of Athens)
Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models
In the context of the expected-posterior prior (EPP) approach to Bayesian variable selection in linear models, we combine ideas from power-prior and unit-information-prior methodologies to simultaneously (a) produce a minimally-informative prior and (b) diminish the effect of training samples. The result is that in practice our power-expected-posterior (PEP) methodology is sufficiently insensitive to the size n* of the training sample, due to PEP's unit-information construction, that one may take n* equal to the full-data sample size n and dispense with training samples altogether. This promotes stability of the resulting Bayes factors, removes the arbitrariness arising from individual training-sample selections, and greatly increases computational speed, allowing many more models to be compared within a fixed CPU budget. We focus on Gaussian linear models and develop our PEP method under two different baseline prior choices: the independence Jeffreys (or reference) prior, yielding the J-PEP posterior, and the Zellner g-prior, leading to Z-PEP. We find that, under the reference baseline prior, the asymptotics of PEP Bayes factors are equivalent to those of Schwarz's BIC, ensuring consistency of the PEP approach to model selection. We compare the performance of our method, in simulation studies and a real example involving prediction of air-pollutant concentrations from meteorological covariates, with that of a variety of previously-defined variants on Bayes factors for objective variable selection.
Our PEP prior, due to its unit-information structure, leads to a variable-selection procedure that (1) is systematically more parsimonious than the basic EPP with minimal training sample, while sacrificing no desirable performance characteristics to achieve this parsimony; (2) is robust to the size of the training sample, thus enjoying the advantages described above arising from the avoidance of training samples altogether; and (3) identifies maximum-a-posteriori models that achieve good out-of-sample predictive performance. Moreover, PEP priors are diffuse even when n is not much larger than the number of covariates p, a setting in which EPPs can be far more informative than intended.
Page last modified on 08 oct 12 11:11

