Current PhD Students
Note: email addresses end in @ucl.ac.uk
Investigating Multivariate Tail Dependence in Currency Carry Trade Portfolios via Copula Models
My initial research focuses on modelling the dependence structure in currency carry trade portfolios and analysing the degree of multivariate tail dependence present via mixture copula models. Novel portfolio optimisation techniques are also being explored.
Modelling non-stationary extremes with application to climate prediction
This research project will develop methodology for making inferences about the extremes of non-stationary processes. This is important because many datasets exhibit some kind of non-stationarity. A standard approach identifies extreme events by setting a high threshold. Early work will compare the efficiency of estimation under different threshold selection strategies. In particular, a recent proposal to set a covariate-dependent threshold using quantile regression will be explored. This is expected to result in more efficient estimation of covariate effects than choosing a constant threshold, because threshold exceedances will cover a wider range of covariate values. The comparison will be made using theoretical asymptotic considerations and stochastic simulation. Bias in estimation will also be considered. The findings will be used to inform analyses of climate extremes and predictions of climate extremes from General Circulation Models. Later in the project, the variability in climate predictions between different climate models, different model runs from different climate models and over different socio-economic emissions scenarios will be studied. Economic climate change impacts will also be considered.
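The expected benefit of a covariate-dependent threshold can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data are synthetic (a linear trend plus Gumbel noise), and a per-bin empirical quantile is used as a crude stand-in for quantile regression. All function names are hypothetical and are not taken from the project.

```python
import numpy as np


def simulate(n=5000, slope=5.0, seed=0):
    """Synthetic non-stationary data: the location of y increases with covariate x."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    y = slope * x + rng.gumbel(0.0, 1.0, n)
    return x, y


def constant_threshold(y, p=0.95):
    """The same high threshold for every observation: the overall p-quantile of y."""
    return np.full_like(y, np.quantile(y, p))


def binned_quantile_threshold(x, y, p=0.95, n_bins=10):
    """Covariate-dependent threshold: the p-quantile of y within equal-width bins of x.

    Assumption: this binning is a simple stand-in for quantile regression.
    """
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.digitize(x, edges[1:-1])  # bin index in 0 .. n_bins - 1
    u = np.full_like(y, np.nan)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            u[mask] = np.quantile(y[mask], p)
    return u


def covariate_spread(x, exceed):
    """Standard deviation of the covariate among threshold exceedances."""
    return float(np.std(x[exceed]))


if __name__ == "__main__":
    x, y = simulate()
    exc_const = y > constant_threshold(y)
    exc_cov = y > binned_quantile_threshold(x, y)
    print("spread of x, constant threshold:", covariate_spread(x, exc_const))
    print("spread of x, covariate-dependent threshold:", covariate_spread(x, exc_cov))
```

With a constant threshold the exceedances cluster at large covariate values, whereas the covariate-dependent threshold spreads them across the whole covariate range, which is the property the project expects to improve the estimation of covariate effects.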
Extending the 'isolation-with-migration' (IM) model (coalescent theory)
My main research goal is to extend ML estimation of population parameters (based on pairwise differences data from a large number of independent genetic loci) to the following cases: asymmetric migration rates; unequal subpopulation sizes and/or growing/decreasing populations; a model with additional stages; samples of m >= 2 genes from n >= 2 subpopulations; more complex mutation models.
- Hilde Wilkinson-Herbots
- Ziheng Yang
Quantifying uncertainty in climate models, under a Bayesian probabilistic framework
The notion of a "multi-model ensemble" of climate simulators is now widely used in climate science. In particular, information from many different climate simulators is collectively used to describe and predict the climate. The statistical framework described in Chandler (2011) regards the properties of the projections from the different simulators as parameters which describe the output of a statistical model (which has a common form for all the different simulators). Those parameters are centred on the true climate plus the shared discrepancies of simulators from the true climate. Under a Bayesian probabilistic framework, a posterior distribution of the true climate, given simulator projections, is derived using the above ideas. The aim of my work is to further develop this framework. At the moment, I am applying it to surface air temperature data and constructing a fully Bayesian version of it, by considering different choices of hyperprior distributions, as well as techniques for obtaining hyperprior distribution parameters.
Polycystic ovary syndrome (PCOS) is recognised as one of the most common endocrine abnormalities, affecting millions of women worldwide. The economic burden of PCOS on health care is high and accounts for a large percentage of total NHS expenditure.
The first aim of this study is to estimate and compare the prevalence of PCOS among different ethnic groups according to different diagnostic criteria. Our prevalence estimation will be based on both a Bayesian hierarchical modelling strategy and data extracted from a database called THIN. We will compare the results obtained from both methods and then establish the prevalence of PCOS in our population. Furthermore, based on our prevalence results, the epidemiology of PCOS-related diseases will be evaluated. In addition, we will investigate treatments for various PCOS-related diseases and their associated costs, so that the financial burden that the PCOS population represents for the NHS can be worked out. Finally, we will use cost-effectiveness analysis to compare current and potential alternative treatments for PCOS, so that optimal clinical decisions can be established for this complex syndrome.
Implementing Bayesian Inference for Ion Channel Modelling
His PhD is focussed on the application of Bayesian methodology in discriminating between and parameterising models of ligand-gated ion-channels.
- Mark Girolami
- Lucia Sivilotti
- Mr Dieter Girmes
Probabilistic Models for Adaptive Content Creation
My research focuses on the development of structured prediction models to build document templates and learn to customise texts or sentences according to user preferences and habits. Conditional language models that generate human-readable text for the specific target application, and device-appropriate algorithms for generating small pieces of text such as introductory sentences, will also be developed. This project draws upon recent advances in Natural Language Processing tools, Machine Learning algorithms and Stochastic Optimization techniques in developing intelligent document creation tools. The research is being carried out in collaboration with the Xerox Research Centre, Europe, represented by Dr Cedric Archambeau and Dr Guillaume Bouchard.
- Mark Girolami
- Cedric Archambeau
- Guillaume Bouchard
Estimation of Semiparametric Trivariate Models in the Presence of Endogeneity and Sample Selection
Endogeneity and sample selection bias are both caused by unobserved confounders. The former arises when unobserved variables are associated with both the response and a variable of interest, while the latter occurs when the outcome of interest is observed only for a restricted, non-randomly selected sample of the population. Statistical models that ignore these issues can yield biased and inconsistent parameter estimates. The proposed PhD research will build on a new, well-founded theoretical and computational tool for flexibly fitting a model structured within a three-equation latent variable framework that can account for the two issues mentioned above.
Inference for networks
My main research focus at present is to derive new methodology for statistical network analysis – an important example of statistical inference for big data. My current work focuses on the basic questions of how, and whether, a given network can be partitioned naturally into communities. I take a statistical perspective on this question, because we need practical tools to distinguish true community structure from noise.
Statistical signal processing and machine learning for network traffic anomaly detection
Networked systems are increasingly being targeted by sophisticated cyber-criminals and hostile nations. The effects of cyber-espionage alone are estimated to cost around 0.2-1.2% of national income. Since current methods for detecting attacks are primarily based on prior knowledge of attack scenarios, previously unseen (“zero-day”) attacks are very hard to mitigate or even detect.
My project aims to build on recent machine learning approaches for attack detection by extending sparse structure learning techniques to model the correlation structure of network traffic in non-stationary, low-flow, non-Gaussian regimes.
Bayesian health economic modelling of human papilloma virus vaccination
The aim of this project is to assess the cost-effectiveness of a quadrivalent vaccination strategy (based on an innovative vaccine covering four HPV types). A first analysis in the study BEST I, a co-operation between UCL and Sanofi Pasteur, was based on four cohorts of females, aged 12, 15, 18 and 25. In that analysis, herd immunity could not be accounted for because only female cohorts were observed and the model was not dynamic. During the PhD research, several other intervention strategies will be evaluated. As a first step, cohorts of males will be included in the model to evaluate the changes in cost-effectiveness if boys are also vaccinated. Virus transmission between the two sexes is simulated using an age-dependent sexual mixing matrix together with infection probabilities per sexual partnership. A full Bayesian Markov model based on 23 health states will be used to estimate lifetime HPV-related events.
A modelling framework for estimation of benefit in uncontrolled clinical studies
Pharmaceuticals are most commonly licensed on the basis of randomised controlled trials. Occasionally, however, products are licensed without comparative data, on the basis of studies in which all patients received the investigational drug.
My research programme involves identifying drugs licensed on this basis, reviewing how economic modelling has previously been performed for this type of problem, investigating the problem further using Bayesian statistics, and suggesting further methods that may be applicable.
Ultimately the aim of the project is to create an algorithm for the appropriate use of modelling techniques when only uncontrolled data is available. This will include previously published methods, and any further work performed within the PhD.
Bayesian Computations for the Expected Value of Partial Perfect Information for Health Economic Evaluation using Gaussian Processes and INLA
Decision theory is a widely applicable area of mathematics which aids informed choice, given a range of options, by properly quantifying the uncertainties on available knowledge. I will focus on the economic cost of decisions in the health service. Typically, there are different treatments for conditions, with different costs and success rates. All treatments undergo trials but our knowledge of the effectiveness of treatments is inevitably subject to uncertainty. Further trials reduce this uncertainty but have an economic cost. There are substantial benefits, in outcomes and costs, to reliably predicting the cost of obtaining information to reduce uncertainty.
The traditional methods of determining the cost of this information are complex to implement and computationally intensive. Recently, researchers have developed approximation methods that are simpler and cheaper. Therefore, the cost of further trials can be included in the decision-making process. This project will further this field, allowing health service planners to use the cost of obtaining information in decisions. Specifically, I will develop approximation methods using Gaussian Processes and INLA, comparing these with approximation methods that are already available.
Non-parametric inference for stochastic differential equations
Structural Modelling and Multivariate Oscillations
The research focuses on methods for understanding stochastic oscillations and on developing oscillatory models in time series analysis. Non-stationary and multivariate extensions will be considered, with applications in various physical fields.
Models for data that are missing not at random in health studies
Missing data due to attrition occur in almost all longitudinal trials and observational studies. None of the standard statistical models, such as multilevel or marginal models, are valid when data are missing not at random (MNAR). To account for MNAR, model-based approaches have been proposed, which can broadly be divided into two classes: pattern mixture models and selection models. These models make untestable assumptions and cannot be fitted easily using standard statistical software. One of the main objectives is to evaluate fully the performance of the existing models, to identify the best method and to extend it if necessary. Based on this work, recommendations on analysing MNAR data will be made to applied statisticians.
- Rumana Omar
- Andrew Copas
New statistical inverse methods for multispectral photoacoustic imaging
Photoacoustic imaging (PAI) is a novel hybrid imaging modality based on the use of laser-generated ultrasound that combines the high absorption contrast and specificity of optical imaging with the high spatial resolution of ultrasound imaging. Multispectral PAI, acquired at multiple optical wavelengths, provides 3D structural, functional and molecular information on living biological tissue. If quantitative concentration distributions of the tissue’s chromophores can be accurately recovered, multispectral PAI has high potential in preclinical and clinical applications, such as cancer and brain/skin disorders. However, concentration recovery is an ill-posed inverse problem, mainly due to its nonlinear nature, its entanglement of spatial dependence and spectral correlation, and the large number of unknown spatial and spectral parameters to be estimated.
Multivariate analysis of X-ray diffraction data for investigating chemical structure and identifying materials
X-ray diffraction is a powerful tool for determining the chemical structure of materials. Techniques used to identify material structure in real-world applications, such as medical diagnostics, result in less well-defined diffraction spectra than those observed in a controlled laboratory setting. The aim of the research is to find and develop statistical methods for identifying the important structural information of materials from the less well-defined diffraction data obtained from systems used in practice.
Statistical methods for infrared spectroscopic clinical diagnostics
My research focuses on the development of probabilistic models that describe the heterogeneity of different stages between diseased and healthy tissues. This project explores techniques and tools from Gaussian processes, Bayesian inference, approximate inference methods and Markov chain Monte Carlo methods. I collaborate with UCL Hospital.
Pattern-clustering method for longitudinal data - heroin users receiving methadone
My current research aims to develop a novel clustering algorithm that groups patients by their methadone dosages and captures the time series nature of the data, in order to help clinical investigators understand behaviours among subgroups.
Spatial uncertainties in a tsunami model: from bathymetry to run-ups
Tsunami modelling, uncertainty analysis, functional data analysis.
Markov Chain Monte Carlo Methods
I'm interested in Markov chain Monte Carlo methods (MCMC), and properties of MCMC estimators. I've worked on devising ways to explore the state space of a high-dimensional distribution with a highly non-linear correlation structure using a Markov chain, and how incorporating some ideas from differential geometry can aid efficient exploration. I'm now assessing the ergodic properties of chains produced by different methods (based on Langevin diffusions and Hamiltonian flow), which can lead to central limit theorems for estimators. I'm also particularly interested in quantitative and non-asymptotic bounds on Markov chain estimators, and efficient exploration of multi-modal distributions.
A systems approach to the analysis of ARC syndrome
The aim of the project is to identify molecular pathways implicated in the rare genetic disorder Arthrogryposis, Renal Dysfunction and Cholestasis (ARC) syndrome. The syndrome is caused by germline mutations in either VPS33B or VIPAR, proteins which are implicated in maintaining cell polarity. A number of biochemical pathways have been linked with progression of the disease and the project will include utilising the existing knowledge accumulated in the Gissen Lab to develop dynamic models of signalling pathways. Further to this, advanced Markov chain Monte Carlo methods will be employed and developed as required in the Girolami group to carry out statistical evaluation of the models.
- Mark Girolami
- Paul Gissen
Financial and physical risk management for energy projects
I am working on risk management problems applying methods of stochastic programming. I investigate how owners with different risk aversions can run a CHP plant optimally and in my next projects I will study how to manage financial and physical risk in nuclear plants and during transmission capacity expansion.
- Afzal Siddiqui
- Bert De Reyck
Mixtures of linear mixed effects models
Linear mixed effects models are well suited to modelling clustered data, for example repeated measurements on people over time. In such data the clusters often exhibit large differences in both the mean level of response and the relationship between the response and time. For some of these data the assumption that the data follow a single normal distribution may not hold, and instead the distribution may be a finite mixture of normal distributions. For example, in clinical trials there may be two distinct sub-populations that can be classed as responders or non-responders. In these circumstances mixtures of linear mixed effects models may offer advantages over a standard linear mixed effects model. My research will focus upon some theoretical/methodological aspects of these models, as well as searching for evidence of mixture distributions in datasets, in particular in repeated measurements data from clinical trials.
- Mr Dieter Girmes
Advanced MCMC methods in Hilbert space
The subject of my PhD primarily involves the development of advanced Markov Chain Monte Carlo (MCMC) algorithms to solve difficult stochastic problems.
Bayesian Nonparametric modelling of phenomena in retail analytics
I'm focusing on using Bayesian Nonparametric methods to improve the pricing models used in retail analytics. I have a particular interest in Dirichlet and Gaussian processes. My broad aim is to use nonparametric methods to model predictive densities that change smoothly and flexibly in response to changes in the data.
Full Bayesian methods to model utility measures using mixture of distributions
I use full Bayesian methods to model utility measures using mixtures of distributions, in the context of health economic evaluation. Utility measures, such as QALYs, are usually modelled using a single distribution, but observed data often show multimodality. Therefore, it may be necessary to use mixture models to properly account for this.
Quantification of uncertainty for landslide-generated tsunami models
The project revolves around the improvement of a complex computational tsunami model, with a main focus on landslide-generated tsunamis. Since the simulation of tsunamis is very computationally expensive, the project will involve the building of an emulator, which is a statistical representation of the model that can quickly provide an accurate statistical prediction of the model's output. Using the emulator, one can also quantify uncertainties and produce a sensitivity analysis of the model.
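The idea of an emulator can be sketched with a zero-mean Gaussian process fitted to a handful of runs of a simulator. This is a minimal illustration, not the project's method: the function `toy_simulator` is a hypothetical, cheap stand-in for the expensive tsunami model, the kernel hyperparameters are fixed by assumption rather than estimated, and all names are illustrative.

```python
import numpy as np


def toy_simulator(x):
    """Hypothetical cheap stand-in for an expensive simulator with one input."""
    return np.sin(3.0 * x) + 0.5 * x


def sq_exp_kernel(a, b, length=0.3, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)


def fit_emulator(x_train, y_train, length=0.3, variance=1.0, jitter=1e-6):
    """Condition a zero-mean Gaussian process on the simulator runs."""
    K = sq_exp_kernel(x_train, x_train, length, variance)
    K += jitter * np.eye(len(x_train))  # small diagonal term for numerical stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return {"x": x_train, "L": L, "alpha": alpha,
            "length": length, "variance": variance}


def predict(emulator, x_new):
    """Return the emulator's predictive mean and variance at new inputs."""
    Ks = sq_exp_kernel(x_new, emulator["x"],
                       emulator["length"], emulator["variance"])
    mean = Ks @ emulator["alpha"]
    v = np.linalg.solve(emulator["L"], Ks.T)
    var = emulator["variance"] - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)


if __name__ == "__main__":
    x_train = np.linspace(0.0, 2.0, 12)   # a few "expensive" runs
    emulator = fit_emulator(x_train, toy_simulator(x_train))
    x_new = np.linspace(0.0, 2.0, 5)
    mean, var = predict(emulator, x_new)
    print("emulator mean:", mean)
    print("simulator    :", toy_simulator(x_new))
    print("predictive sd:", np.sqrt(var))
```

Once fitted, the emulator can be evaluated at many inputs at negligible cost, and its predictive variance gives a measure of the uncertainty introduced by replacing the simulator with the emulator.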
How much data are required to validate a risk model?
Risk models are becoming increasingly important in health care, and are typically used to provide information to both clinicians and patients, and to facilitate fairer, risk-adjusted comparisons between healthcare providers. Despite the importance of these models, relatively little research has been conducted into how many patients are required to both develop a new risk model and validate an existing model. Risk models developed or validated using small datasets may be inaccurate which may have implications for patient care. The focus of my research is to ascertain the amount of data required to develop and validate a reliable risk model.
Advanced Monte Carlo Methods for Risk and Insurance
Statistical analysis for highly structured data
This project will develop novel statistical learning techniques to understand and classify highly structured information such as that found in hyperspectral remote sensing data. Unlike previous work, the approach will attempt to manage the rich, dual spatial-spectral nature of the data by considering recent advances in spatial visual attention modelling and statistical spectral band selection weighting developed by the supervisory team and collaborators.
Statistics and computational intelligence
My objective is to establish links between statistical techniques and computational-intelligence approaches.
Assessment of the potential and implications of analysing multiple outcomes collected in clinical trials simultaneously using multivariate modelling
My research explores the benefits of analysing clinical trials data with multiple outcomes using multivariate modelling compared to standard methods. I will develop practical guidance, sample size calculation and easy-to-use software for multivariate models.
Flexible sample selection count data modelling
Sample selection occurs when observations are not from a random sample of the population. Instead, individuals may have selected themselves (or have been selected by others) into (or out of) the sample, based on a combination of observed and unobserved characteristics. This problem can be addressed using a sample selection model. Several variants of this model and related estimation procedures have been developed. This research will focus on developing estimation methods for count data that can account for non-random sample selection and flexibly model covariate effects.
Page last modified on 14 nov 12 15:45