
UCL's Centre for Data Intensive Science

Available PhD Projects

The Centre primarily carries out research in STFC's flagship Data Intensive Science (DIS) areas, High Energy Physics and Astronomy, which have been at the forefront of DIS research for several decades and provide an ideal training ground.

You can find below a provisional list of the studentship projects that will likely be available for the September 2024 intake. Applicants should keep in mind that the projects outlined below are a starting point for a conversation on the exact research project that will be undertaken during the PhD. Projects will be assigned after you have accepted a place in the CDT, and students will be able to discuss their final project choice and topic with prospective supervisors at that point. For further details on any of the projects, please contact the project supervisor.

Additional projects may be added to the list prior to the interviews. If there is an area in which you would really like to undertake a project but do not see it listed below, please get in touch with us: dis-cdt-phd-admissions@live.ucl.ac.uk

COLLIDER/ATLAS - What happened one picosecond after the Big Bang? (Novel ML and edge-AI techniques to search for Higgs pair production at the LHC)

Supervisor: Prof Nikos Konstantinidis & Prof Andreas Demosthenous

The Electroweak Phase Transition, one of the most dramatic and defining moments in the evolution of our Universe, happened around 1 ps after the Big Bang. The shape of the Higgs field potential is intricately linked to that moment. In this project, you will continue the quest for the discovery of Higgs pair production, the most sensitive process giving direct access to the Higgs potential. The project will give you the freedom to take the directions that best suit your interests and skills, from employing novel ML techniques (such as normalising flows and diffusion models) in the offline analysis of data from the latest LHC running period, to developing and implementing novel ML algorithms on FPGAs for the future hardware triggering system of ATLAS in the High-Lumi LHC era.

COLLIDER/ATLAS - Transformers at the Energy Frontier

Supervisor: Dr Gabriel Facini & Prof. Tim Scanlon

The UCL Group has pioneered the use of graph neural networks (https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PUBNOTES/ATL-PHYS-PUB-2022-027/) and transformers (https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PUBNOTES/ATL-PHYS-PUB-2023-021/) for the identification of b-hadrons (think Higgs physics).

We offer an interdisciplinary PhD project that is a unique opportunity to work on cutting-edge research with the ATLAS Collaboration at CERN. The project's main goal is to develop methods to improve the current state-of-the-art transformer-based models GN2 and GN2X, used by ATLAS to classify events produced in the Large Hadron Collider (LHC).

The GN2 family of models leverages auxiliary task learning, improving performance on the main task by learning multiple tasks simultaneously. Relationships between tasks and advanced auxiliary-learning techniques are an area of active development, so there is considerable potential to improve the state of the art. You will also be welcome to propose your own research ideas.
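
To give a concrete flavour of the auxiliary-task idea, here is a minimal sketch in PyTorch. All names, dimensions and the auxiliary target are invented for illustration; this is not the actual GN2 code (in GN2 itself the auxiliary targets are track-level quantities, e.g. track origins, but the principle is the same):

```python
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Toy multi-task model: a shared encoder feeds a main head
    (jet flavour) and an auxiliary head (hypothetical track origin)."""
    def __init__(self, n_features=16, n_flavours=3, n_origins=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.flavour_head = nn.Linear(64, n_flavours)  # main task
        self.origin_head = nn.Linear(64, n_origins)    # auxiliary task

    def forward(self, x):
        h = self.encoder(x)
        return self.flavour_head(h), self.origin_head(h)

model = MultiTaskTagger()
criterion = nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch: 32 jets, each summarised by 16 invented features.
x = torch.randn(32, 16)
y_flavour = torch.randint(0, 3, (32,))
y_origin = torch.randint(0, 5, (32,))

optimiser.zero_grad()
flav_logits, orig_logits = model(x)
# Weighted sum of main and auxiliary losses: the auxiliary term shapes
# the shared representation, which is the point of the technique.
loss = criterion(flav_logits, y_flavour) + 0.5 * criterion(orig_logits, y_origin)
loss.backward()
optimiser.step()
```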

You will have the chance to gain valuable practical experience with auxiliary task learning, which has also proved useful in reinforcement learning, computer vision, NLP and robotics. Moreover, GN2 will be crucial for discovering new physics in future LHC runs, so you would directly contribute to uncovering the secrets of our universe!

There are several analyses within ATLAS where these techniques can be deployed. A preferred option is http://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/HIGG-2021-08/, which looks for physics beyond the reach of the LHC in the Higgs sector by exploiting the Higgs boost to probe higher scales of new physics. However, the student will be given options to choose from, including long-lived particle searches and heavy resonances decaying to a Higgs boson.

Beyond that, we work on side projects for fun, such as exploring the use of LLMs within the collaboration to build custom-knowledge chatbots, understand plots (images), and generally accelerate the advancement of science.

We invite those with the ambition to learn and the motivation to work at the bleeding edge to apply.

If you would like more information, reach out for a chat. We can discuss your interests and see where that takes us. A student working on a project they believe in is the best kind of student to work with. Rather than giving you more details of my vision, I invite you to meet and build a vision together.

COLLIDER/ATLAS - Opening the doors to knowledge

Supervisor: Dr Gabriel Facini (& Prof. Jon Butterworth)

At UCL, we're not just advancing the field of particle physics; we're pushing to change the way research is conducted by making information more accessible. This touches on the ideas of open source, but also on the theory-experiment connection.

Why Open Source Matters:
Facilitating Cutting-Edge Algorithm Sharing: By working on open source platforms, you'll bridge the gap between different collaborations, such as ATLAS and future colliders. This initiative fosters the seamless transfer of groundbreaking algorithms, accelerating scientific discovery.
Cross-Community Collaboration: Your work will extend beyond the confines of particle physics, inviting talents from diverse fields like computer science to contribute their expertise. Our goal is to contribute to an ecosystem where innovative minds can experiment on reconstruction algorithms and collaborate without the complexities of joining large collaborations.
Experiment-Theory Knowledge Transfer: For a new theory to be tested, it often takes an experimentalist to pick it up, bring it to the collaboration, work on the topic and publish it. That can take years. By publishing more information in a form readily usable by theorists, they can perform a first-order test of their models within days to weeks.

Your Role and Projects (you will take on two of the three below):
Contribute to Open Data Detector (ODD) & A Common Tracking Software Package (ACTS): You will be instrumental in enhancing the general infrastructure of ODD and ACTS, ensuring these platforms remain at the forefront of particle physics research.
Develop and Port Advanced ML Techniques: Leverage your skills to develop cutting-edge machine learning techniques (various types of transformer networks). These techniques are not just academic exercises but tools designed to unravel the mysteries of the universe.
Measure processes in extreme regions of phase space: Experimentalists often give theorists very useful information (unfolded data), but usually only in very specific regions of phase space. The goal is to provide the same level of information in different regions of phase space, i.e. the extremes, or the true energy frontier. (In collaboration with Jon Butterworth)

Physics Projects:
Flavour tagging / Boosted Higgs / LLP / Tau Tagging:
There is a rich and interesting set of physics results that are underpinned by high-performance ML tagging algorithms (tagging = classification). This ranges from di-Higgs production, probing how the universe evolved; to boosted Higgs physics, looking for new heavy particles that could explain Dark Matter (DM), the matter-antimatter asymmetry or the Higgs boson mass itself; to long-lived particles, which can again connect to DM or Dark Energy. There are many routes to interesting physics from this position and we can follow any you like.
Measurements for interpretation: Several interesting measurements can be defined based on a lack of knowledge of specific processes. Alternatively, one can drop into one of a large number of existing projects and enable the measurement of specific regions that fit our criteria. (In collaboration with Jon Butterworth)

COLLIDER/ATLAS - Anomaly detection in ATLAS using machine learning

Supervisor: Prof Mario Campanelli & Prof Nikos Konstantinidis

After more than a decade of unsuccessful searches for new physics at the LHC, attention is shifting more and more towards model-independent searches that exploit the rapid advances in machine learning of recent years. Detecting anomalies means searching, in a novel and model-independent way, for events that differ from the bulk of Standard Model collisions recorded by the detector. Instead of searching for a specific model of new physics, events are grouped into classes using unsupervised techniques, mapping the emitted particles to a lower-dimensional space in which the regions corresponding to known physics are identified using Monte Carlo simulations. Anomalous events lying outside the known boundaries may indicate unforeseen phenomena. The work will also explore the possibility of performing these searches at trigger level for the high-luminosity upgrade programme.

The work will be done in the framework of the ATLAS experiment, and will use autoencoders and unsupervised classifiers to search for new physics in hadronic events. Anomaly detection is a rapidly growing field of research, in which the search for new physics is performed not by targeting specific models, but by looking for outliers in data and having the machine learn the various event topologies. This approach can find events not foreseen by any existing model or, in the more conservative case, indicate detector issues. The student will start by applying these techniques to data taken by ATLAS during Run 3, and then move on to studying the feasibility of anomaly detection in the high-luminosity upgrade of the trigger system.
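
As a minimal illustration of the autoencoder approach described above (the event features here are random stand-ins; a real analysis would use calibrated physics objects):

```python
import torch
import torch.nn as nn

class EventAutoencoder(nn.Module):
    """Toy autoencoder for anomaly detection: events that the model
    reconstructs poorly are flagged as anomalous."""
    def __init__(self, n_features=20, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                     nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = EventAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Train on "Standard Model" events only (random stand-ins here).
sm_events = torch.randn(1024, 20)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(sm_events), sm_events)
    loss.backward()
    opt.step()

# Score unseen events by reconstruction error: large error = anomalous.
new_events = torch.randn(8, 20)
with torch.no_grad():
    scores = ((model(new_events) - new_events) ** 2).mean(dim=1)
print(scores)
```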

COLLIDER/PHENOMENOLOGY - Unveiling anomalies at the Large Hadron Collider with cutting-edge precision calculations

Supervisor: Dr Christian Gutschow & Prof Jonathan Butterworth

Harnessing the combined strength of CPU- and GPU-accelerated computing infrastructure, this project will develop an automated tool chain that delivers unparalleled coverage of Standard Model predictions to facilitate direct comparisons with published collider measurements, ensuring that even the most subtle variations in the data are not overlooked. Get ready to uncover the anomalies that may redefine our understanding of the universe.

DARK MATTER/LZ - Searching for Dark Matter with the LZ Experiment

Supervisor: Dr Amy Cottle & Prof Chamkaur Ghag

The LZ experiment is at the forefront in the quest to observe galactic dark matter and has now entered uncharted electroweak parameter space, with the ability to discover or provide constraints on the foremost dark matter theories. The successful applicant will be working at the cutting edge of LZ’s flagship dark matter search, currently being led by the UCL group. Projects will focus on applying advanced data science and machine learning techniques to the exploration of new data being presently acquired by LZ, in such areas as event classification, pulse characterisation and background rejection. The candidate will play an active role in new physics results from LZ, with the course of the studentship well-aligned with the data-taking schedule and foreseen science output of the experiment.

NEUTRINO OSCILLATIONS - Machine Learning re-analysis of MINOS/MINOS+ neutrino oscillation data

Supervisor: Prof Jennifer Thomas & Prof Ryan Nichol

Neutrino oscillations are being studied around the world, with new experiments planned to come online in the next decade. The existing experiments are struggling to make improved measurements owing to the very low event rate measured in the detectors. This is because neutrinos have a very low interaction probability, and for that reason the detectors have to be very large. After the experiments have been running for some years, the statistical improvement year on year becomes very modest, and radical analysis improvements are called for to make better measurements. To this end, the existing experiments have already incorporated machine learning into their analyses. Furthermore, experiments are pooling their neutrino events to try to improve the precision of the oscillation parameters.

MINOS was the gold-standard long-baseline experiment, taking data between 2006 and 2016. Its results are presently being combined with those of the NOvA experiment to improve the precision of the oscillation parameters. However, the MINOS data have not been subject to any machine learning improvements and are ripe for this kind of treatment.

The project would entail using existing Monte Carlo and data samples, upgrading the analysis and storage of those data to use modern tools, and studying the improvements in event reconstruction that can be achieved using modern machine learning techniques. This would likely result in a new publication within the time frame of the PhD.

NEUTRINO/COSMO/ASTRO/P-ONE - New frontiers in multi-messenger neutrino astronomy with machine learning imaging

Supervisor: Dr Matteo Agostini & Prof Nikos Konstantinidis

Neutrinos from the edge of the observable universe are revolutionising our understanding of astrophysical systems at the ultimate energy and gravitational frontiers. Giant neutrino telescopes will soon come online, posing extraordinary analysis and computational challenges. This project aims to tackle these challenges by employing statistical and machine-learning techniques developed within a cross-disciplinary framework, ultimately applying them to extract physics results from the Pacific Ocean Neutrino Experiment (P-ONE).

ASTRO - The Galactic Genome Project -- stellar characterisation with the Gaia satellite and machine learning

Supervisor: Prof Jay Farihi & Dr Jason Sanders

In 2022, the Gaia satellite released over 200 million low-resolution spectra for all stars in the sky brighter than 17th magnitude. These unprecedented XP spectra contain the chemical fingerprint (like DNA) of each star, which can be used to deduce rare and exotic events such as nearby supernovae, neutron star mergers, stellar cannibalism and even planet ingestion(!). The project aims to identify exotic stellar DNA within the Gaia XP data. The data volume, the availability of multiple examples, and the uniformity of these space-based spectra make the project well suited to machine learning techniques. In addition to the identification and cataloguing of chemically peculiar sources, the project offers the opportunity for further scientific exploitation by the student. Groups of stars that are prime candidates for machine learning identification are carbon stars, other sources with marked chemical depletions (metal poverty) or enhancements (nitrogen), and white dwarfs. Such groups of stars contain key signatures that address many fundamental questions about the Milky Way, Galactic chemical evolution, and the very oldest stars in the Universe.

ASTRO/COSMOLOGY/ASTROSTATISTICS - Probabilistic deep learning for cosmology and beyond

Supervisor: Prof Jason McEwen

In the proposed project we will develop probabilistic deep learning approaches, in which probabilistic components are integral parts of deep learning models, and, conversely, statistical analysis techniques in which deep learning components play an integral role. For further details see: http://www.jasonmcewen.org/opportunities/#phd-project-1
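
As one very simple example of building a probabilistic component into a deep model, here is a Monte Carlo dropout sketch (one of the simplest such approaches, shown purely for illustration; the project itself will develop considerably more principled methods):

```python
import torch
import torch.nn as nn

class DropoutRegressor(nn.Module):
    """Toy regressor whose dropout layers stay active at prediction
    time, so repeated forward passes sample a crude predictive
    distribution (MC dropout)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(5, 64), nn.ReLU(), nn.Dropout(p=0.2),
            nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x)

model = DropoutRegressor()
model.train()  # keep dropout stochastic even when "predicting"

x = torch.randn(10, 5)
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])
mean, std = samples.mean(dim=0), samples.std(dim=0)  # prediction + uncertainty
```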

ASTRO/COSMOLOGY/ASTROSTATISTICS - Geometric generative AI for cosmology and beyond 

Supervisor: Prof Jason McEwen

In this project we will develop generative geometric deep learning techniques for the analysis of cosmological data observed over the celestial sphere. The focus of the project is two-fold. First, further foundations of geometric deep learning on the sphere will be developed, including new types of spherical deep learning layers and architectures, in order to address open problems in the field, such as scalability, interpretability and generative models. Second, geometric deep learning techniques on the sphere will be applied to the analysis of cosmological data on the CMB and cosmic shear, in particular from Euclid and the Rubin Observatory, in order to better understand the nature of dark matter and dark energy.

ASTRO/EXOPLANETS - Machine learning spectral assignments

Supervisor: Prof Jonathan Tennyson & Prof Sergey Yurchenko

The ExoMol project (www.exomol.com) provides laboratory data (largely computed) on the spectroscopic properties of molecules for studies of exoplanets and other hot astronomical atmospheres. Molecular spectra contain a wealth of information about the molecule concerned and about the environment in which the spectra are recorded. They are both rich and complicated, making them difficult and time-consuming to unravel. For example, the high-profile assignment of the spectrum of water vapour in the Sun (actually in sunspots) by the UCL group only managed to assign (attach quantum numbers to) about 20% of the observed lines. The project will develop machine learning tools to allow such spectra to be fully assigned. Various algorithms will be explored, including a bootstrap procedure in which partial assignments are fed back into the learning algorithm to give an improved model allowing further assignments. There is a wealth of astronomically important spectral data to which such a procedure could be applied.
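
A minimal sketch of such a bootstrap (self-training) loop, with invented features, labels and confidence threshold standing in for real line lists:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Invented stand-ins: each line is summarised by a few features
# (e.g. wavenumber, intensity, local line density); only a minority
# start with quantum-number assignments, as for the sunspot spectrum.
X = rng.normal(size=(5000, 6))
y = (X[:, :3] > 0).astype(int) @ np.array([1, 2, 4])  # 8 learnable "assignments"
labelled = rng.random(5000) < 0.2                     # ~20% of lines assigned

X_lab, y_lab = X[labelled], y[labelled]
X_unlab = X[~labelled]

# Bootstrap: feed confident new assignments back into the training set
# and refit, so each round can assign further lines.
for round_ in range(5):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_lab, y_lab)
    proba = clf.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.9   # accept only confident assignments
    if not confident.any():
        break
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, clf.classes_[proba[confident].argmax(axis=1)]])
    X_unlab = X_unlab[~confident]
    print(f"round {round_}: assigned {confident.sum()} more lines")
```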

ASTRO/EXOPLANETS + MACHINE LEARNING - Exploring the interplay between uncertainty quantification and interpretability of machine learning models

Supervisor: Dr Nikos Nikolaou & Prof Ingo Waldmann

The project will investigate the question: 'Can we make machine learning models more interpretable by improving how they quantify uncertainty over their predictions?' Machine learning (ML) algorithms are driving innovation across domains, exoplanetary science and biomedicine among them. Two important yet often overlooked aspects are model interpretability and uncertainty quantification. Most popular ML algorithms (e.g. deep neural networks, ensembles) are notorious for being 'black boxes', but several model interpretability methods (of varying degrees of reliability) have been developed to allow us to understand their inner workings, to uncover hidden biases in the models (or the data), to increase trustworthiness and adoption, to inspect when and how they fail, and to uncover new domain knowledge. Similarly, popular ML algorithms are known to suffer from poor uncertainty estimation, which is crucial for cost-sensitive applications (common in biomedicine) and for prioritising objects for further analysis in large datasets (common in exoplanetary science). Regression models often fail to provide cohesive confidence intervals, and classification models often ignore class membership probabilities or obtain systematically unreliable estimates thereof. Techniques to address these issues include conformal prediction and probabilistic calibration.

In this project we will investigate whether these two weaknesses are connected and whether, by addressing one, we can improve upon the other. We will explore this interplay in the context of ML models for selected exoplanetary science and biomedical applications; more specifically, for inferring atmospheric parameters from spectra obtained from extrasolar planets (exoplanet characterisation) and for histological whole slide imaging (WSI) classification.
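
To give a flavour of one of the techniques mentioned above, here is a minimal sketch of split conformal prediction for regression; the data and model are invented stand-ins for, say, inferring an atmospheric parameter from an exoplanet spectrum:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Toy regression stand-in for "spectrum -> atmospheric parameter".
X = rng.normal(size=(2000, 10))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=2000)

X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Split conformal prediction: calibrate the interval half-width on
# held-out residuals so that ~90% of future points are covered,
# without any distributional assumptions on the model errors.
alpha = 0.1
residuals = np.abs(y_cal - model.predict(X_cal))
n = len(residuals)
q = np.quantile(residuals, np.ceil((n + 1) * (1 - alpha)) / n)

x_new = rng.normal(size=(5, 10))
pred = model.predict(x_new)
lower, upper = pred - q, pred + q   # distribution-free ~90% intervals
```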

COSMOLOGY/GRAVITATIONAL WAVES - Deep learning for dark matter and dark sirens

Supervisor: Prof Benjamin Joachimi & Dr Niall Jeffrey

In this project we will apply machine learning and data science techniques, such as data compression via geometric deep learning and inference via neural density estimation, to constrain cosmology through the cosmic dark matter distribution traced by large galaxy surveys and gravitational wave sources.

EXOPLANETS/EXTRAGALACTIC ASTRO/COSMOLOGY - Causal Machine Learning in Astrophysics and Beyond

Supervisor: Prof Ofer Lahav & Dr Nikos Nikolaou

Recent advancements in Machine Learning (ML) have enabled the efficient training of powerful statistical models from large amounts of high-dimensional data in various application domains, Astronomy included. Yet current learning systems still operate almost exclusively on the level of statistical associations/correlations among the observed variables. The next big step in the field will involve causal modelling: moving beyond simply capturing statistical associations to modelling cause-and-effect relationships among the underlying variables. Causal discovery aims to identify causal structure from data ('Variable A has a causal effect on variable B.') and causal inference to predict the results of intervening on variables ('What if I do X?') or, going a step further, of asking counterfactual questions ('What if I had done Y instead?').

This CDT PhD project will be among the first to explore applications of causal ML algorithms in astronomy, particularly in the study of exoplanets and galaxies. In the exoplanetary literature, causal ML methods (half-sibling regression) have so far been applied for decoupling observations from instrument systematics only in the context of exoplanet detection from transit light curves. The project will apply and extend these methods to exoplanet characterisation (inferring exoplanet atmospheric parameters from spectra) for simulated data from the Ariel mission (data available) and beyond. In extragalactic astronomy, causal ML methods have previously been applied by members of the project team to simulations (IllustrisTNG) to understand the effect of environment on star formation in galaxies ('nature vs. nurture'). Key findings include that local density suppresses star formation at redshift z < 1, while the situation is reversed at higher redshift, and that the halo mass acts as a confounder. This project will explore the application of these methodologies to real data from DES & DESI (data available) and Euclid & Rubin-LSST (data expected during the PhD phase). There is synergy among the sub-projects; for instance, when exploring real galaxy data, the methods for decoupling instrument systematics from observations (exoplanet application) can also be used. This is an ambitious project with the potential to advance both causal ML and the two application subareas of Astronomy.
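
To illustrate the half-sibling regression idea mentioned above, here is a deliberately simple sketch (synthetic light curves and a ridge-regression predictor; real pipelines are considerably more careful):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
n_time, n_stars = 500, 50

# Shared instrument systematic affecting all stars (invented toy drift).
systematic = np.cumsum(rng.normal(size=n_time)) * 0.05
flux = rng.normal(scale=0.01, size=(n_time, n_stars)) + systematic[:, None]

# Inject a transit-like dip into the target star only.
target = flux[:, 0].copy()
target[200:220] -= 0.1

# Half-sibling regression: the other stars see the same instrument but
# not the same planet, so a prediction of the target from its
# "half-siblings" captures the systematics and not the transit.
siblings = flux[:, 1:]
model = Ridge(alpha=1.0).fit(siblings, target)
detrended = target - model.predict(siblings)
# "detrended" now shows the transit dip on a roughly flat baseline.
```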

INVERSE PROBLEMS/MACHINE LEARNING/ASTRO - Inverse Problems in Imaging with Uncertainty Quantification

Supervisor: Prof Marta Betcke & Prof Jason McEwen

Inverse problems are ubiquitous in medicine, science and engineering whenever the quantity of interest can only be accessed indirectly via some, usually lossy, measurement procedure. Images/volumes provide a convenient way of visualising spatial distributions, and sophisticated imaging techniques capture data which allow for the reconstruction of high-resolution images/volumes, resulting in high-dimensional inverse problems. The advances of the last two decades, from the convex optimisation formulation of compressed sensing to, more recently, the adaptation of machine learning techniques to the solution of inverse problems, have delivered methods capable of impressive-looking reconstructions [1,2].

Many of these successful approaches fall into the category of variational reconstruction methods and admit an interpretation as a maximum a posteriori (MAP) estimate in the Bayesian sense. However, neither analytical priors (e.g. total variation) nor, in particular, learned priors (priors or denoisers learned on imaging training sets) offer insights into the uncertainties in the produced reconstructions. In some inverse problems, e.g. X-ray or photoacoustic tomography, the forward operator preserves the singularities, which allows limited analysis when using priors that focus on singularities in the image, such as total variation, directional total variation and sparsity in directional frames such as directional wavelets or curvelets [1].

The need became even more apparent with the widening gap in image quality between the 'black box' learned priors and denoisers and the classical approaches. In particular, recent developments in generative modelling, such as diffusion priors, produce very visually convincing images, which drives the interest in adapting such priors for imaging applications. However, many imaging applications, for instance in the medical or security sectors, require complete transparency and interpretability of the reconstructions, which these methodologies cannot at present offer.

To address these shortcomings, research on uncertainty quantification for inverse problems has been gaining momentum recently; however, the dimensionality of imaging inverse problems is a serious obstacle, for example for sampling-based uncertainty quantification. The goal of this project is to tackle the dimensionality challenge and develop efficient numerical methods for uncertainty quantification for large-scale imaging inverse problems.

We will focus on the frontier of reconstruction methods, which combine state-of-the-art hybrid model- and data-driven reconstruction techniques with generative modelling, and will draw on insights from Monte Carlo methods, (stochastic) differential equations, optimisation, multi-scale methods and numerical analysis to develop efficient methods with provable guarantees. We will be motivated by applications in medical and preclinical imaging, such as X-ray computed tomography, photoacoustic tomography and MRI, as well as applications in Astronomy, such as large radio telescope surveys.
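
To make the variational/MAP formulation above concrete, here is a toy sketch for the simplest possible case: denoising with an identity forward operator and a smoothed total variation prior (all numbers are arbitrary; real problems involve non-trivial forward operators and far larger scales):

```python
import numpy as np

def tv_grad(u, eps=1e-3):
    """Gradient of a smoothed (isotropic) total variation prior."""
    dx = np.diff(u, axis=1, append=u[:, -1:])
    dy = np.diff(u, axis=0, append=u[-1:, :])
    mag = np.sqrt(dx**2 + dy**2 + eps)
    px, py = dx / mag, dy / mag
    # negative divergence of the normalised gradient field
    div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
    return -div

rng = np.random.default_rng(3)
truth = np.zeros((64, 64)); truth[16:48, 16:48] = 1.0   # piecewise-constant scene
data = truth + rng.normal(scale=0.3, size=truth.shape)  # noisy "measurement"

# MAP estimate for Gaussian noise + TV prior:
#   minimise 0.5 * ||u - data||^2 + lam * TV(u)   by gradient descent
u, lam, step = data.copy(), 0.15, 0.2
for _ in range(200):
    u -= step * ((u - data) + lam * tv_grad(u))
```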

An ideal candidate for the proposed research programme will have a degree in mathematics or equivalent (statistics, computer science, physics or engineering degrees with a high mathematical content, i.e. linear algebra, analysis, multivariable calculus, numerical analysis, ordinary and partial differential equations, statistics) and programming experience in at least one of Python, Matlab or C++. In addition, a background in machine learning, optimisation and image processing would be very beneficial.

[1] Bolin Pan and Marta M. Betcke, On Learning the Invisible in Photoacoustic Tomography with Flat Directionally Sensitive Detector, SIAM Journal on Imaging Sciences, 16(2):770-801, 2023.
[2] Matthijs Mars, Marta M. Betcke and Jason D. McEwen, Learned Interferometric Imaging for the SPIDER Instrument, arXiv:2301.10260, 2023.

INVERSE PROBLEMS - Fusion of magnetic induction tomography with X-ray CT for detection and classification of concealed threats and fine defects

Supervisor: Prof Simon Arridge & Prof Marta Betcke

Magnetic induction tomography (MIT) allows reconstruction of conductivity, an important material parameter when looking for threats, e.g. sharps or guns, in luggage, packages or containers; such routine security tests are performed at large scale worldwide in the context of the transit of goods and people. However, the requirement on penetration depth, for instance into cargo containers which act as electromagnetic shields, restricts the use of high frequencies for probing, ultimately leading to relatively low-resolution reconstructions [1].

The objective of this project is to overcome the low-resolution limitation and tap into the full potential of this modality. To this end, we propose to combine MIT with a high-resolution modality such as X-ray CT to perform joint reconstruction of conductivity, and possibly also magnetic and electric permeabilities, along with the X-ray-relevant material parameters, such as linear attenuation in the simplest model, with other parameters possible using more sophisticated non-linear X-ray transport models. Under the assumption that the discontinuities in the electromagnetic material parameters correlate with those in the linear attenuation, complete-data X-ray CT will provide us with sharp and accurate contours. These contours can be used to regularise the more severely ill-posed MIT reconstruction problem using methods such as parallel level sets [2] or directional total variation [3]. Richer X-ray data, such as dual-energy or scatter measurements, could provide additional constraints, for instance on the range of values of the conductivity parameters, which can be combined with the aforementioned total variation functionals via infimal convolution to further constrain the MIT reconstruction. This could possibly relax the requirements on complete X-ray attenuation data, which would be beneficial in the case of, e.g., large containers to which we may have one-sided access only.
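
The following toy sketch illustrates the structure-guidance idea in its simplest form: a weighted quadratic smoothness penalty whose weights are derived from CT edges. This is a crude stand-in for the parallel level set and directional total variation functionals of [2,3], and all data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy scene: a conductive "threat" inside a container.
truth = np.zeros((64, 64)); truth[20:44, 28:40] = 1.0
ct = truth + rng.normal(scale=0.02, size=truth.shape)   # sharp CT image
mit = truth + rng.normal(scale=0.5, size=truth.shape)   # noisy low-quality MIT map

def edge_weights(ref, beta=25.0):
    """Small weights across reference edges, ~1 in smooth regions."""
    gx = np.diff(ref, axis=1, append=ref[:, -1:])
    gy = np.diff(ref, axis=0, append=ref[-1:, :])
    return np.exp(-beta * np.sqrt(gx**2 + gy**2))

w = edge_weights(ct)

# Structure-guided smoothing: penalise MIT gradients only where the CT
# image is smooth, so the MIT map inherits the sharp CT contours.
u, lam, step = mit.copy(), 1.0, 0.1
for _ in range(300):
    gx = np.diff(u, axis=1, append=u[:, -1:])
    gy = np.diff(u, axis=0, append=u[-1:, :])
    # gradient of 0.5 * lam * sum(w * (gx^2 + gy^2)) via weighted divergence
    div = (w * gx - np.roll(w * gx, 1, axis=1)) \
        + (w * gy - np.roll(w * gy, 1, axis=0))
    u -= step * ((u - mit) - lam * div)
```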

[1] Alexander J. Hiles, Novel algorithms for magnetic induction tomography with applications in security screening, PhD thesis, 2020.

[2] Ehrhardt, M. J., Thielemans, K., Pizarro, L., Atkinson, D., Ourselin, S., Hutton, B. F. and Arridge, S. R., Joint reconstruction of PET-MRI by exploiting structural similarity, Inverse Problems, 31(1):015001, 2014.

[3] Ehrhardt, M. J. and Betcke, M. M., Multi-contrast MRI reconstruction with structure-guided total variation, SIAM Journal on Imaging Sciences, 9(3):1084-1106, 2016.
