Current PhD Students
Note: email addresses end in @ucl.ac.uk
Investigating Multivariate Tail Dependence in Currency Carry Trade Portfolios via Copula Models
My initial research focuses on modelling the dependence structure in currency carry trade portfolios and analysing the degree of multivariate tail dependence present via mixture copula models. Novel portfolio optimisation techniques are also being explored.
Modelling non-stationary extremes with application to climate prediction
This research project will develop methodology for making inferences about the extremes of non-stationary processes. This is important because many datasets exhibit some kind of non-stationarity. A standard approach identifies extreme events by setting a high threshold. Early work will compare the efficiency of estimation of different threshold selection strategies. In particular a recent proposal to set a covariate-dependent threshold using quantile regression will be explored. This is expected to result in more efficient estimation of covariate effects than choosing a constant threshold, because threshold exceedances will cover a wider range of covariate values. This will be achieved using theoretical asymptotic considerations and using stochastic simulation. Bias in estimation will also be considered. The findings will be used to inform analyses of climate extremes and predictions of climate extremes from General Circulation Models. Later in the project the variability in climate predictions between different climate models, different model runs from different climate models and over different socio-economic emissions scenarios will be studied. Economic climate change impacts will also be considered.
Gaussian Process Related Latent Model For Longitudinal Study: Model and Algorithm
My current research develops a computational methodology for Gaussian process related latent models in longitudinal studies.
Extending the 'isolation-with-migration' (IM) model (coalescent theory)
My main research goal is to extend ML estimation of population parameters (based on pairwise differences data from a large number of independent genetic loci) to the following cases: asymmetric migration rates; unequal subpopulation sizes and/or growing/decreasing populations; a model with additional stages; samples of m >= 2 genes from n >= 2 subpopulations; more complex mutation models.
- Hilde Wilkinson-Herbots
- Ziheng Yang
Quantifying uncertainty in climate models, under a Bayesian probabilistic framework
The notion of a "multi-model ensemble" of climate simulators is now widely used in climate science. In particular, information from many different climate simulators is used collectively to describe and predict the climate. The statistical framework described in Chandler (2011) regards the properties of the projections from the different simulators as parameters which describe the output of a statistical model (which has a common form for all the different simulators). Those parameters are centred on the true climate plus the shared discrepancies of simulators from the true climate. Under a Bayesian probabilistic framework, a posterior distribution of the true climate, given simulator projections, is derived using the above ideas. The aim of my work is to further develop this framework. At the moment, I am applying it to surface air temperature data and constructing a fully Bayesian version of it, by considering different choices of hyperprior distributions, as well as techniques for obtaining hyperprior distribution parameters.
Implementing Bayesian Inference for Ion Channel Modelling
His PhD is focussed on the application of Bayesian methodology in discriminating between and parameterising models of ligand-gated ion-channels.
- Mark Girolami
- Lucia Sivilotti
Probabilistic Models for Adaptive Content Creation
My research focuses on the development of structured prediction models to build document templates and learn to customise texts or sentences according to user preferences and habits. Conditional language models that generate human-readable text based on the specific target application and device, together with appropriate algorithms for generating small pieces of text such as introductory sentences, will also be developed. This project draws upon recent advances in Natural Language Processing tools, Machine Learning algorithms and Stochastic Optimization techniques in developing intelligent document creation tools. The research is being carried out in collaboration with the Xerox Research Centre Europe, represented by Dr Cedric Archambeau and Dr Guillaume Bouchard.
- Mark Girolami
- Cedric Archambeau
- Guillaume Bouchard
Estimation of Semiparametric Trivariate Models in the Presence of Endogeneity and Sample Selection
Endogeneity and sample selection bias are both caused by unobserved confounders. The former arises when unobserved variables are associated with both the response and a variable of interest, while the latter occurs when the outcome of interest is observed only for a restricted, non-randomly selected sample of the population. Statistical models that ignore these issues can give biased and inconsistent parameter estimates. The proposed PhD research will build on a new, well-founded theoretical and computational tool for flexibly fitting a model structured within a three-equation latent variable framework that can account for the two issues mentioned above.
Inference for networks
Beate is a PhD student under the supervision of Professor Patrick J. Wolfe. Her PhD is focussed on statistical estimation and inference in networks. Beate studied mathematics at the University of Bremen in Germany, with a specialization in statistics.
Statistical signal processing and machine learning for network traffic anomaly detection
Networked systems are increasingly being targeted by sophisticated cyber-criminals and hostile nations. The effects of cyber-espionage alone are estimated to cost around 0.2-1.2% of national income. Since current methods for detecting attacks are primarily based on prior knowledge of attack scenarios, previously unseen (“zero-day”) attacks are very hard to mitigate or even detect.
My project aims to build on recent machine learning approaches for attack detection by extending sparse structure learning techniques to model the correlation structure of network traffic in non-stationary, low-flow, non-Gaussian regimes.
Bayesian health economic modelling of human papilloma virus vaccination
The aim of this project will be to assess the cost-effectiveness of a quadrivalent (an innovative mixture of four HPV type-specific) vaccination strategy. A first analysis in the study BEST I, a co-operation between UCL and Sanofi Pasteur, has been based on four cohorts of females, aged 12, 15, 18 and 25. In this strategy, herd immunity could not be accounted for because just female cohorts were observed and the model was not dynamic. During the PhD research, several other intervention strategies will be evaluated. As a first step, cohorts of males will be included in the model to evaluate the changes in cost-effectiveness if boys are also vaccinated. Virus transmission between the two sexes is simulated with a sexual mixing matrix according to certain infection probabilities per sexual partnership with respect to sexual mixing corresponding to age. A full Bayesian Markov model based on 23 health states will be used to estimate lifetime HPV related events.
Modelling spatio-temporal trends in the Continuous Plankton Recorder Dataset using Sparse Principal Component Analysis
Vicky is a PhD student at CoMPLEX and works on interdisciplinary models, applying statistical techniques to biological problems. Her PhD is a joint project with the UCL Department of Statistics and the Sir Alister Hardy Foundation for Ocean Science in Plymouth.
- Sofia Olhede
- David Murrell
Structural Modelling and Multivariate Oscillations
The research focuses on methods for understanding stochastic oscillations and on developing oscillatory models in time series analysis. Non-stationary and multivariate extensions will be considered, with applications in various physical fields.
Models for data that are missing not at random in health studies
Missing data due to attrition occur in almost all longitudinal trials and observational studies. None of the standard statistical models, such as multilevel or marginal models, are valid when data are missing not at random (MNAR). To account for MNAR, model-based approaches have been proposed which can broadly be divided into two classes: pattern mixture models and selection models. These models make untestable assumptions and cannot be fitted easily using standard statistical software. One of the main objectives is to evaluate fully the performance of the existing models, to identify the best method, and to extend it if necessary. Based on this work, recommendations on analysing MNAR data will be made to applied statisticians.
- Rumana Omar
- Andrew Copas
New statistical inverse methods for multispectral photoacoustic imaging
Photoacoustic imaging (PAI) is a novel hybrid imaging modality based on the use of laser generated ultrasound that combines the high absorption contrast and specificity of optical imaging with the high spatial resolution of ultrasound imaging. Multispectral PAI, acquired at multiple optical wavelengths, provides 3D structural, functional and molecular information of living biological tissue. If quantitative concentration distributions of the tissue’s chromophores can be accurately recovered, it can offer high potential in preclinical/clinical applications, such as cancer and brain/skin disorders. However, concentration recovery is an ill-posed inverse problem, mainly due to its nonlinear nature, its entanglement of spatial dependence and spectral correlation, and its large number of unknown spatial and spectral parameters to be estimated.
Statistical methods for infrared spectroscopic clinical diagnostics
My research focuses on the development of probabilistic models that describe the heterogeneity of different stages between diseased and healthy tissues. This project explores techniques and tools from Gaussian processes, Bayesian inference, approximate inference methods, and Markov chain Monte Carlo methods. I collaborate with UCL Hospital.
Nonparametric quantile regression
My current research interests include: nonparametric and semiparametric estimation methods, quantile regression, time series analysis, asymptotic theory, spatial and spatio-temporal modelling.
Pattern-clustering method for longitudinal data - heroin users receiving methadone
My current research develops a novel clustering algorithm for grouping patients by their methadone dosages that captures the time series nature of the data, in order to help clinical investigators understand behaviours among subgroups.
Spatial uncertainties in a tsunami model: from bathymetry to run-ups
Tsunami modelling, uncertainty analysis, functional data analysis.
Markov Chain Monte Carlo Methods
I'm interested in Markov chain Monte Carlo methods (MCMC), and properties of MCMC estimators. I've worked on devising ways to explore the state space of a high-dimensional distribution with a highly non-linear correlation structure using a Markov chain, and how incorporating some ideas from differential geometry can aid efficient exploration. I'm now assessing the ergodic properties of chains produced by different methods (based on Langevin diffusions and Hamiltonian flow), which can lead to central limit theorems for estimators. I'm also particularly interested in quantitative and non-asymptotic bounds on Markov chain estimators, and efficient exploration of multi-modal distributions.
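As a rough illustration of a Langevin-based sampler of the kind mentioned above, here is a minimal sketch of the Metropolis-adjusted Langevin algorithm (MALA). The one-dimensional standard normal target, the step size, and all function names are assumptions chosen for illustration; this is not the method developed in the project.

```python
import math
import random


def log_target(x):
    # Toy target (assumption): standard normal log-density, up to a constant.
    return -0.5 * x * x


def grad_log_target(x):
    # Gradient of the toy log-density above.
    return -x


def log_proposal(x_to, x_from, step):
    # Log-density (up to a constant) of the Langevin proposal x_from -> x_to.
    mean = x_from + 0.5 * step * grad_log_target(x_from)
    return -((x_to - mean) ** 2) / (2.0 * step)


def mala(n_samples, step=0.5, x0=0.0, seed=0):
    """Run MALA and return (samples, acceptance rate)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # One Euler step of the Langevin diffusion plus Gaussian noise.
        mean = x + 0.5 * step * grad_log_target(x)
        y = mean + math.sqrt(step) * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings correction keeps the target distribution invariant.
        log_alpha = (log_target(y) + log_proposal(x, y, step)
                     - log_target(x) - log_proposal(y, x, step))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples
```

Calling `mala(20000)` returns the chain and the fraction of accepted proposals; for this toy target the sample mean and variance should sit near 0 and 1.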
A systems approach to the analysis of ARC syndrome
The aim of the project is to identify molecular pathways implicated in the rare genetic disorder Arthrogryposis, Renal Dysfunction and Cholestasis (ARC) syndrome. The syndrome is caused by germline mutations in either VPS33B or VIPAR, proteins which are implicated in maintaining cell polarity. A number of biochemical pathways have been linked with progression of the disease and the project will include utilising the existing knowledge accumulated in the Gissen Lab to develop dynamic models of signalling pathways. Further to this, advanced Markov chain Monte Carlo methods will be employed and developed as required in the Girolami group to carry out statistical evaluation of the models.
- Mark Girolami
- Paul Gissen
Financial and physical risk management for energy projects
I am working on risk management problems, applying methods of stochastic programming. I investigate how owners with different levels of risk aversion can run a CHP plant optimally. In my next projects I will study how to manage financial and physical risk in nuclear plants and during transmission capacity expansion.
- Afzal Siddiqui
- Bert De Reyck
Practical use of multiple imputation
Missing data are a pervasive problem in medical research. In clinical trials, we fail to follow up some patients, and we worry that this can cause bias. In observational studies, we fail to record all the data we need to predict disease outcomes, and we end up with data sets that are hard to analyse well.
Multiple imputation is a popular and flexible technique for handling missing data. The missing values are imputed stochastically in a way that reflects the uncertainty about the missing data. This is repeated more than once and the estimates from each individual 'completed' dataset are combined using a set of rules known as Rubin's rules.
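The combining step can be sketched as follows for a single scalar parameter. The function name and the example numbers are hypothetical; the formulas are the standard Rubin's rules for the pooled estimate and its total variance.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, total


# Illustrative numbers only, not taken from any study.
estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the extra uncertainty due to the missing data; with identical estimates across imputations the total variance reduces to the within-imputation variance.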
This project aims to further develop the methods used to impute missing data. Although these are well understood in some simple settings, they are much less well understood in the sort of practical settings in which multiple imputation is being applied, with data sets containing tens or hundreds of variables, mis-specified imputation models, clustered data, and complex analysis models.
- Ian White
- Patrick Royston
Mixtures of linear mixed effects models
Linear mixed effects models are well suited to modelling clustered data, for example repeated measurements on people over time. In such data the clusters often exhibit large differences in both the mean level of response and the relationship between the response and time. For some of these data the assumption that the data follow a single normal distribution may not hold, and instead the distribution may be a finite mixture of normal distributions. For example, in clinical trials there may be two distinct sub-populations that can be classed as responders or non-responders. In these circumstances mixtures of linear mixed effects models may offer advantages over a standard linear mixed effects model. My research will focus upon some theoretical/methodological aspects of these models, as well as searching for evidence of mixture distributions in datasets, in particular for repeated measurements data from clinical trials.
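One common way to write such a model is sketched below. The notation is an assumption introduced here for illustration and is not taken from the project: y_i is the response vector for cluster i, X_i and Z_i are its fixed- and random-effects design matrices, and component k has mixing weight pi_k, fixed effects beta_k, random-effects covariance D_k and residual variance sigma_k^2.

```latex
% Marginal density of cluster i under a K-component mixture of linear mixed
% effects models (assumed notation; K = 2 for responders / non-responders).
\[
  f(y_i) \;=\; \sum_{k=1}^{K} \pi_k \,
  \mathcal{N}\!\left( y_i \,;\, X_i \beta_k,\; Z_i D_k Z_i^{\top} + \sigma_k^{2} I \right),
  \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
\]
```

With K = 1 this reduces to the marginal distribution of a standard linear mixed effects model.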
Advanced MCMC methods in Hilbert space
The subject of my PhD primarily involves the development of advanced Markov Chain Monte Carlo (MCMC) algorithms to solve difficult stochastic problems.
Statistical Methods in Finance
Quantification of uncertainty for landslide-generated tsunami models
The project revolves around the improvement of a complex computational tsunami model, with a main focus on landslide-generated tsunamis. Since the simulation of tsunamis is very computationally expensive, the project will involve the building of an emulator, which is a statistical representation of the model that can quickly provide an accurate statistical prediction of the model's output. Using the emulator, one can also quantify uncertainties and produce a sensitivity analysis of the model.
How much data are required to validate a risk model?
Risk models are becoming increasingly important in health care, and are typically used to provide information to both clinicians and patients, and to facilitate fairer, risk-adjusted comparisons between healthcare providers. Despite the importance of these models, relatively little research has been conducted into how many patients are required to both develop a new risk model and validate an existing model. Risk models developed or validated using small datasets may be inaccurate which may have implications for patient care. The focus of my research is to ascertain the amount of data required to develop and validate a reliable risk model.
Advanced Monte Carlo Methods for Risk and Insurance
Statistics and computational intelligence
My objective is to establish links between statistical techniques and computational-intelligence approaches.
Assessment of the potential and implications of analysing multiple outcomes collected in clinical trials simultaneously using multivariate modelling
My research explores the benefits of analysing clinical trials data with multiple outcomes using multivariate modelling compared to standard methods. I will develop practical guidance, sample size calculation and easy-to-use software for multivariate models.
Flexible sample selection count data modelling
Sample selection occurs when observations are not from a random sample of the population. Instead, individuals may have selected themselves (or have been selected by others) into (or out of) the sample, based on a combination of observed and unobserved characteristics. This problem can be addressed using a sample selection model. Several variants of this model and related estimation procedures have been developed. This research will focus on developing estimation methods for count data that can account for non-random sample selection and flexibly model covariate effects.
Quantification of Prediction Uncertainty for Principal Components Regression and Partial Least Squares Regression
My research project investigates the quantification of prediction uncertainty for the factor-based approaches principal components regression and partial least squares regression, which are commonly used to derive linear prediction equations from high-dimensional data. An application of particular interest is near infrared spectroscopy.
Page last modified on 14 nov 12 15:45