Jools Clarke
I’m excited about bringing interpretable machine learning methods to long-standing fields in a way that can foster new discoveries and be beneficial for everyone.
10 June 2024
Project title: Investigating performance, interpretability & resource efficiency trade-offs of machine learning models for exoplanet data
Research Group: Astrophysics
Supervisor(s): Dr Nikos Nikolaou
Introduction:
I'm an astrophysicist and machine learning student currently pursuing my PhD with the Data Intensive Science cohort and Exoplanet Group. My research is centered on leveraging advanced machine learning techniques to streamline exoplanet spectral retrieval for ESA's upcoming ARIEL mission. Alongside my research, I teach data analysis at UCL Observatory and collaborate with industry partners at MediaTek Research on developing novel optimizers for image classification and object detection. I am particularly excited about introducing interpretable machine learning methods to long-standing fields, with the aim of fostering new discoveries and benefiting everyone involved.
Project description:
Machine learning (ML) algorithms are driving innovation across domains, exoplanetary science among them. However, training ML models and using them to perform inference, especially with modern deep learning architectures, can be very demanding in terms of data, memory & computation. This, in turn, translates to high data collection costs & energy requirements. The recent trend towards larger training sets coupled with greater model complexity, although it improves predictive performance, is proving inefficient and unsustainable.
Innovation is driven by efficiency, not small performance gains at a massive cost. Consequently, research is focusing on making ML models more resource-efficient to train or deploy for useful predictions, aiming to minimise model size and training set size without compromising performance. Techniques like transfer learning, weight pruning, and active learning offer different efficiency benefits in various scenarios. Simultaneously, the interpretability of ML models, especially in scientific applications where gaining insights is crucial, is becoming increasingly important. The objective of this project is to explore the trade-offs between performance, interpretability, and efficiency of ML algorithms in exoplanetary science, addressing key questions in exoplanet detection and characterisation while advancing ML efficiency and interpretability.
Proposed subtasks for the CDT PhD project:
1. Exploration of performance vs. interpretability vs. efficiency trade-offs in inferring atmospheric characteristics from exoplanetary spectra. The goal is to use two or more resource-efficient training approaches involving active or transfer learning to improve efficiency without compromising performance. Appropriate model interpretability methods would then be applied to investigate the degree to which the resulting models rely on different aspects of the data.
2. As above, but applied to the task of predicting joint distributions of planetary parameters.
3. Refinement of resource-efficiency approaches used to jointly optimise for all 3 objectives under investigation (low cost, high predictive performance & high interpretability).
First year group project: MediaTek Research
Placement: