Statistical Science Seminars
Usual time: Thursdays 16:00 - 17:00
Location: Room 102, Department of Statistical Science, 1-19 Torrington Place (1st floor).
Some seminars are held at different times or in different locations; details are given with each abstract.
Mixing time of the exclusion process on hypergraphs
We introduce a process defined on hypergraphs and study its mixing time. This process can be viewed as an extension of the exclusion process to hypergraphs. Using the chameleon process, a tool introduced by Morris and further developed by Oliveira, we prove, for any hypergraph within a certain class, an upper bound on the mixing time of the exclusion process in terms of the mixing time of a simple random walk on the same hypergraph. This is joint work with Stephen Connor.
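The dynamics described above can be sketched in a few lines. The following is a minimal simulation, not taken from the talk: it assumes the hypergraph exclusion process updates by picking a hyperedge uniformly at random and applying a uniform random permutation to the occupancies of its vertices (on ordinary graphs, where every hyperedge has two vertices, this reduces to the familiar exclusion process up to a lazy factor). All function names are illustrative.

```python
import random

def exclusion_step(occ, hyperedges, rng):
    """One step of the (hypergraph) exclusion process: pick a hyperedge
    uniformly at random and shuffle the occupancies of its vertices."""
    edge = list(rng.choice(hyperedges))
    vals = [occ[v] for v in edge]
    rng.shuffle(vals)
    for v, val in zip(edge, vals):
        occ[v] = val

def simulate(n_vertices, hyperedges, k_particles, steps, seed=0):
    """Run the process from the configuration with particles on vertices
    0..k-1. The number of particles is conserved at every step."""
    rng = random.Random(seed)
    occ = [1] * k_particles + [0] * (n_vertices - k_particles)
    for _ in range(steps):
        exclusion_step(occ, hyperedges, rng)
    return occ
```

The mixing-time question in the abstract concerns how many such steps are needed before the law of the configuration is close to uniform over all placements of k particles.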
A Hierarchical Bayesian Model for Inference of Copy Number Variants and Their Association with Gene Expression
Cancer is the result of a dynamic interplay at different molecular levels (DNA, mRNA and protein). Elucidating the association between two or more of these levels would enable the identification of biological relationships and lead to improvements in cancer diagnosis and treatment. For this purpose, the development of statistical methodologies able to identify these relationships is crucial. In this talk, I present a model for the integration of high-throughput data from different sources. In particular, I focus on combining transcriptomic data (gene expression profiling) with genomic data collected on the same subjects. At the DNA level, I focus on measuring copy number variation (CNV) using comparative genomic hybridization (CGH) arrays. I specify a measurement error model that relates the gene expression levels to latent copy number states. Selection of relevant associations is performed using selection priors that explicitly incorporate dependence information across adjacent copy number states. Copy number states are related to the observed surrogate CGH measurements via a hidden Markov model, which captures their characteristic state persistence. Posterior inference is carried out through Markov chain Monte Carlo techniques. To tackle the computational burden, I develop an algorithm that efficiently explores the space of all possible associations. The contribution of the methodology is twofold: to infer copy number variants and, simultaneously, their association with gene expression. The performance of the method is shown on simulated data, and I also illustrate an application to data from a prostate cancer study.
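The hidden Markov layer of the model can be illustrated with a toy version. The sketch below is not the talk's measurement-error model; it assumes, for illustration only, Gaussian emissions for the CGH measurements around state-specific means and a "sticky" transition matrix encoding state persistence, and it computes the marginal likelihood by the scaled forward algorithm.

```python
import numpy as np

def gauss(y, mu, sigma):
    """Gaussian emission density of observation y under each state."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def forward_loglik(y, mu, sigma, trans, init):
    """Scaled forward algorithm: log-likelihood of measurements y under
    an HMM whose hidden states are copy-number levels with Gaussian
    emissions (mean mu[s], sd sigma[s]) and persistent transitions."""
    alpha = init * gauss(y[0], mu, sigma)
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, len(y)):
        alpha = (alpha @ trans) * gauss(y[t], mu, sigma)
        c = alpha.sum()
        ll += np.log(c)
        alpha = alpha / c
    return ll
```

With a sticky transition matrix, sequences that stay in one copy-number state for long runs receive higher likelihood than sequences that switch at every probe, which is exactly the state persistence the abstract refers to.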
Estimating counterfactual means of static and dynamic interventions in critical care
A growing body of work in causal inference focuses on estimating the effects of longitudinal interventions using observational data. Here, standard regression approaches cannot adjust for time-varying confounders, because those confounders can themselves be affected by the treatment. Inverse probability of treatment weighting and parametric g-computation are specialized methods that can consistently estimate the causal effects of longitudinal interventions, provided their underlying models are correctly specified.
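To fix ideas, here are single-time-point analogues of the two methods just named, applied to a hypothetical simulated confounded study (the longitudinal versions iterate these ideas over treatment times; this is only a sketch, and all names and the data-generating model are illustrative).

```python
import numpy as np

def fit_logistic(X, a, iters=25):
    """Logistic regression for the treatment mechanism, via Newton's method."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        b += np.linalg.solve((X.T * (p * (1 - p))) @ X, X.T @ (a - p))
    return b

def ipw_ate(Y, A, L):
    """Inverse probability of treatment weighting (Hajek form):
    weight each subject by the inverse of their propensity score."""
    X = np.column_stack([np.ones_like(A), L])
    p = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, A)))
    w1, w0 = A / p, (1 - A) / (1 - p)
    return np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)

def gcomp_ate(Y, A, L):
    """Parametric g-computation: fit an outcome model E[Y | A, L],
    then average its predictions with A set to 1 and to 0 for everyone."""
    X = np.column_stack([np.ones_like(A), A, L])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    X1, X0 = X.copy(), X.copy()
    X1[:, 1], X0[:, 1] = 1.0, 0.0
    return np.mean(X1 @ beta) - np.mean(X0 @ beta)
```

Both estimators recover the causal contrast here because their working models match the data-generating mechanism; each is inconsistent if its own model is misspecified, which is the motivation for the double-robust approach described next.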
An alternative approach is targeted maximum likelihood estimation (TMLE), which combines the estimates of the treatment mechanism and the outcome, and is double-robust, i.e. consistent if at least one of the two components is correctly specified. In order to minimize residual bias due to mis-specification of these components, TMLE is often coupled with data-adaptive estimation. In spite of the flexibility of the TMLE framework for longitudinal settings, its uptake in applied work has been limited.
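The TMLE recipe for a single time point can be sketched as follows; this is a simplified illustration with parametric working models and a linear fluctuation for a continuous outcome, not the longitudinal, data-adaptive procedure of the talk.

```python
import numpy as np

def tmle_ate(Y, A, L):
    """Point-treatment TMLE sketch for a continuous outcome: combine an
    initial outcome model with a propensity model via a one-step
    fluctuation along the 'clever covariate'."""
    n = len(Y)
    # initial outcome model Q(A, L), fitted by least squares
    X = np.column_stack([np.ones(n), A, L])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    X1, X0 = X.copy(), X.copy()
    X1[:, 1], X0[:, 1] = 1.0, 0.0
    Q, Q1, Q0 = X @ beta, X1 @ beta, X0 @ beta
    # treatment mechanism g(L) = P(A = 1 | L), logistic via Newton
    Z = np.column_stack([np.ones(n), L])
    b = np.zeros(Z.shape[1])
    for _ in range(25):
        g = 1.0 / (1.0 + np.exp(-Z @ b))
        b += np.linalg.solve((Z.T * (g * (1 - g))) @ Z, Z.T @ (A - g))
    g = 1.0 / (1.0 + np.exp(-Z @ b))
    # targeting step: regress the residual on the clever covariate H
    H = A / g - (1 - A) / (1 - g)
    eps = np.sum(H * (Y - Q)) / np.sum(H * H)
    # updated counterfactual predictions give the targeted ATE estimate
    return np.mean((Q1 + eps / g) - (Q0 - eps / (1 - g)))
```

The targeting step is what distinguishes TMLE from plain g-computation: the update along H removes first-order bias from the outcome model whenever the treatment mechanism is correctly specified, which is the source of the double-robustness.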
This presentation aims to demonstrate the feasibility of this approach in an evaluation of a critical care intervention: nutritional support for children admitted to the intensive care unit. In the context of this study, I will define the intervention-specific mean parameter, distinguish between static and dynamic treatment regimes, state the identifying assumptions, and provide step-by-step guidance on estimation using parametric and data-adaptive methods.
Robustness and Efficiency of Covariate Adjusted Linear Instrumental Variable Methods
Instrumental variables (IVs) provide an approach for consistent inference on causal effects even in the presence of unmeasured confounding. Such methods have, for instance, been used in the context of Mendelian randomisation, as well as in pharmaco-epidemiological settings. In these and other applications, it is common that covariates are available, even if they are deemed insufficient to adjust for all confounding. Because IVs allow inference when there is unobserved confounding, analysts often assume that even observed confounders or covariates need not, or should not, be taken into account. However, this is not generally the case. With a view to the role of covariates, we contrast two-stage least squares estimators, generalized method of moments estimators and variants thereof with methods more common in biostatistics that use G-estimation in so-called structural mean models.
When using covariates, there are structural aspects to be considered, e.g. whether the covariates are prior to or potentially affected by the instruments. But in addition, one has to worry even more about efficiency versus model misspecification when modelling covariates. We discuss this for the IV procedures mentioned above, especially for linear instrumental variable models. Our results motivate adaptive procedures that guarantee efficiency improvements through covariate adjustment, without the need for covariate selection strategies. Besides theoretical findings, simulation results will be shown to provide numerical insight.
(This is joint work with Stijn Vansteelandt)
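As a baseline for the estimators contrasted above, two-stage least squares with optional covariate adjustment can be sketched as follows (this is the standard textbook estimator, not the adaptive procedures of the talk; the simulated data in the usage are purely illustrative).

```python
import numpy as np

def tsls(Y, A, Z, C=None):
    """Two-stage least squares: stage 1 regresses the exposure A on the
    instrument Z (and covariates C); stage 2 regresses the outcome Y on
    the fitted exposure (and C). Returns the exposure coefficient."""
    n = len(Y)
    base = np.ones((n, 1)) if C is None else np.column_stack([np.ones(n), C])
    S1 = np.column_stack([base, Z])
    gamma, *_ = np.linalg.lstsq(S1, A, rcond=None)
    Ahat = S1 @ gamma                       # first-stage fitted exposure
    S2 = np.column_stack([base, Ahat])
    beta, *_ = np.linalg.lstsq(S2, Y, rcond=None)
    return beta[-1]
```

In a linear model with a valid instrument, both the unadjusted and the covariate-adjusted estimator are consistent; the talk's question is when adjustment improves efficiency and how to guarantee that improvement without a covariate selection step.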
Multiple imputation in Cox regression when there are time-varying effects of exposures
Cox regression is the most widely used method to study associations between exposures and times-to-event. It is often of interest to study whether there are time-varying effects of exposures; these can be investigated using the extended Cox model in which the log hazard ratio is modeled as a function of time. This is also a popular way of testing the proportional hazards assumption.
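The extended Cox model can be made concrete with a small sketch of its partial log-likelihood. The log-time form of the coefficient below is one common but hypothetical choice of the time function, and setting its slope to zero recovers the ordinary proportional-hazards model, so testing that slope is a test of proportional hazards.

```python
import numpy as np

def extended_cox_loglik(time, event, x, b0, b1):
    """Partial log-likelihood of an extended Cox model with a
    time-varying log hazard ratio beta(t) = b0 + b1 * log(t)
    for a single exposure x; b1 = 0 gives the standard Cox model."""
    ll = 0.0
    for i in range(len(time)):
        if event[i]:
            bt = b0 + b1 * np.log(time[i])   # log HR at this event time
            at_risk = time >= time[i]        # risk set at time[i]
            ll += bt * x[i] - np.log(np.sum(np.exp(bt * x[at_risk])))
    return ll
```

Each event contributes the exposure of the failing subject against the risk set at that moment, with the coefficient evaluated at the event time; maximising over (b0, b1) fits the time-varying effect.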
Missing data on explanatory variables are common, and multiple imputation (MI) is a popular approach to handling them. The imputation model should accommodate the form of the analysis model, and White and Royston (Stat Med 2009) derived an approximate imputation model suitable for missing exposures in Cox regression. Another approach to imputing missing data under the Cox model was described by Bartlett et al. (Stat Meth Med Res 2014), which uses rejection sampling to draw imputed values from the correct distribution. However, no MI methods have been devised that handle time-varying effects of exposures.
In this talk I will show how the imputation model of White and Royston can be extended to accommodate time-varying effects of exposures, and also describe a simple extension to the method of Bartlett et al. Using simulations, we have shown that the proposed methods perform well, giving improvements relative to the complete case analysis. The methods also give approximately correct type I errors in the test for proportional hazards. Failure to account for time-varying effects in the imputation results in biased estimates and incorrect tests for proportional hazards.
I will also discuss some further work in which the time-varying effect is modelled using fractional polynomials rather than a pre-specified function. The methods will be illustrated using data from the Rotterdam Breast Cancer study.