The 67th CREST Open Workshop on AI-powered Software Engineering
30 June 2025–01 July 2025, 10:00 am–5:00 pm
Event Information
Open to
- All
Availability
- Sold out
Organiser
-
Dr. He Ye, Prof. Justyna Petke, Prof. Federica Sarro, Carol Hanna, David Williams – CREST Centre, SSE Group, Department of Computer Science, UCL, UK
In today's rapidly evolving tech landscape, AI is revolutionising the way we design, develop, test, and maintain software. In this workshop on AI-powered Software Engineering, we will dive deep into cutting-edge techniques and tools that are redefining development workflows. Designed to spark critical discussions and foster collaboration, this workshop invites participants to explore the transformative impact of AI on modern software engineering practices.
All talks at this workshop are by invitation only. Talks will be a maximum of 20 minutes long with plenty of time for questions and discussion. We also hope that the workshop will foster and promote collaboration, and there will be time set aside to support this.
Participants are expected to attend the whole event in person, since the workshop is interactive and discursive. There is no registration fee, thanks to the kind support of a grant from Meta. Light lunches will be included, along with the usual refreshments, all at no charge.
Policy on Student Registrations
We welcome registrations from PhD students, where the student is pursuing a programme of research for which the COW will provide intellectual benefit and/or from whom the workshop and its other attendees will gain benefit. We do not normally expect to register students other than those on PhD level programmes of study. For example, those students taking a course at the equivalent of UK masters or bachelors level would not, ordinarily, be considered eligible to register for COW. However, we are willing to consider exceptional cases, where a masters or bachelors student has a clear contribution to make to the topic of the COW. In all cases, students must have the approval of their supervisor/advisor for their attendance at the COW and their consent to the terms of registration. This is why we ask that students seeking to register for a COW also supply the contact details of their supervisor.
Cancellation Fee
Please appreciate that numbers are limited and catering needs to be booked in advance, so registration followed by non-attendance will cause difficulties. For this reason, though the workshop is entirely free of charge, there will be a cancellation fee of £100 for those who register but subsequently fail to attend.
Schedule
Day 1 - 30th June 2025
10:00 Welcome & Introductions
10:45 Satish Chandra, Google
11:15 Baptiste Rozière, MistralAI
- Code Assistants: From Completion to Agents
This presentation explores how large language models can be trained to power code assistants. We will discuss key applications such as in-IDE code completion and agentic capabilities. The attendees will get an overview of the use of LLMs for code assistants, and some insight into pre-training and post-training methodologies.
11:45 Group Photo
11:50 Lunch
13:20 Joost Noppen, British Telecom (BT) Digital
- Beyond Code Prediction: Working on the Bigger Challenges in Software Engineering with AI
Recording: https://mediacentral.ucl.ac.uk/Player/2jCEJ8JJ
Over the last few years, AI has arrived in the software engineering workplace, and for many developers it has already become a vital tool in their toolbox. What started out as more comprehensive code completion has quickly turned into conversational learning and problem solving, and into more autonomous supportive functions such as identifying bugs, proposing patches, and raising merge requests. It therefore seems a suitable time to look forward to what is next for AI in the software engineering space. What are the new (or old) bottlenecks we still experience in the workplace, and what are the profound challenges we face in resolving them? In a whistlestop tour mixing technical challenges, such as architecture and quality, with organisational ones, such as team composition and conformance, this talk will examine the next generation of opportunities we can now see emerging, in an attempt to kick off a long-term research agenda for AI in software engineering.
13:50 He Ye, University College London (UCL)
- Constructing a Unified Knowledge Graph from Codebases
Recording: https://mediacentral.ucl.ac.uk/Player/1F1bb9Jc
14:20 Tea/Coffee Break
14:50 Mark Harman, Meta/University College London (UCL)
- Mutation-Guided LLM-based Test Generation at Meta
Recording: https://mediacentral.ucl.ac.uk/Player/H52fejcj
This talk will cover Meta's work on the Automated Compliance Hardening (ACH) tool, which uses mutation testing to guide Assured LLM-based Software Engineering. ACH generates relatively few mutants (i.e. simulated faults) compared to traditional mutation testing. Instead, it focuses on generating currently undetected faults that are specific to an issue of concern. From these currently uncaught faults, ACH generates tests that can catch them, thereby 'killing' the mutants and consequently hardening the platform against regressions. ACH also deploys an LLM-based equivalent-mutant detection agent that achieves a precision of 0.79 and a recall of 0.47 (rising to 0.95 and 0.96 with simple pre-processing). ACH was used in Messenger and WhatsApp test-a-thons, where engineers accepted 73% of its tests and judged 36% to be relevant. The talk will review Assured LLMSE, LLM-based test generation, and mutation testing work at Meta. Slides are based on the FSE 2025 industry track talk and on the Mutation 2025, EuroSTAR 2025, and FSE 2025 keynotes.
15:20 Phil McMinn, University of Sheffield
- AI for Test Suite "Health"
Recording: https://mediacentral.ucl.ac.uk/Player/2GJ6IBBd
15:50 Breakout Session
17:00 Day 1 Closing Remarks
Day 2 - 1st July 2025
10:00 Pastries
10:30 Jie Zhang, King's College London (KCL)
- Benchmarking and Improving the Efficiency of Automatically Generated Code
Recording: https://mediacentral.ucl.ac.uk/Player/dEIj72cJ
Large Language Models (LLMs) are increasingly becoming integral to modern software development workflows. While recent research has extensively evaluated the correctness of code generated by these models, the efficiency of the generated code has received significantly less attention. Yet, code efficiency is critical for building scalable, high-performance, and sustainable systems, particularly in resource-constrained environments such as mobile devices and embedded systems. In this talk, I will introduce our recent efforts to benchmark the efficiency of code generated by LLMs and explore techniques to improve it. These include prompt engineering strategies and fine-tuning methods designed to guide LLMs toward producing more efficient code without compromising correctness.
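As a toy illustration of the kind of measurement involved (not the benchmark from the talk; both functions are hypothetical), one can check that two functionally equivalent snippets agree on correctness yet differ in efficiency:

```python
import timeit

# Two equivalent implementations of the sum 0 + 1 + ... + (n-1).
def sum_loop(n):
    total = 0
    for i in range(n):    # linear time
        total += i
    return total

def sum_formula(n):
    return n * (n - 1) // 2   # same result, constant time

# Correctness first, then wall-clock comparison.
assert sum_loop(10_000) == sum_formula(10_000)
t_loop = timeit.timeit(lambda: sum_loop(10_000), number=100)
t_formula = timeit.timeit(lambda: sum_formula(10_000), number=100)
print(t_formula < t_loop)  # the constant-time variant is faster
```

Benchmarking LLM-generated code follows the same pattern at scale: verify correctness on a test suite, then measure runtime (or energy) against a reference solution.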
11:00 Miltos Allamanis, Google Deepmind
- Execution-driven Feedback for Disproving Program Properties with LLMs
AI-powered software tools have made remarkable progress, but their outputs often require painstaking validation by humans or partial oracles, such as unit tests, which provide limited checks. To address these challenges and enhance AI’s autonomy and output quality, we explore how LLMs can ensure that some program properties hold.
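The execution-driven idea can be sketched as follows (illustrative only; in the setting described, an LLM would propose the candidate counterexamples, and execution serves as the oracle):

```python
def f(x):
    return abs(x) + 1

# Claimed property: f(x) > 1 for all integers x (false: f(0) == 1).
def property_holds(x):
    return f(x) > 1

def disprove(candidates):
    # Execute each candidate input; any failing input disproves the property.
    for x in candidates:
        if not property_holds(x):
            return x
    return None

print(disprove([-2, -1, 0, 1, 2]))  # 0 disproves the claim
```

Because the counterexample is checked by actually running the program, the result needs no human validation, however the candidate was produced.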
11:30 Yutian Tang, University of Glasgow
- LLM-CompDroid: Repairing Configuration Compatibility Bugs in Android Apps with LLMs
Recording: https://mediacentral.ucl.ac.uk/Player/HDIA6GBa
XML configurations are essential to Android UI development but often introduce compatibility bugs across different API levels, leading to inconsistent visuals and system crashes. In this talk, I will present our investigation into using large language models (LLMs) to detect and repair these bugs. While LLMs have limitations, we found they excel at resolving complex issues where traditional tools struggle. Building on this, I will introduce LLM-CompDroid, our hybrid framework that combines LLMs with existing techniques. LLM-CompDroid-GPT-3.5 and LLM-CompDroid-GPT-4 outperform the state-of-the-art tool ConfFix by over 9.8% in key accuracy metrics, offering a promising step toward more reliable Android applications.
12:00 Lunch
13:00 Gunel Jahangirova, King's College London (KCL)
- Comparative Analysis of Carbon Footprint in Manual vs. LLM-Assisted Code Development
Recording: https://mediacentral.ucl.ac.uk/Player/5hcF2FH6
Large Language Models (LLMs) have significantly transformed various domains, including software development. These models assist programmers in generating code, potentially increasing productivity and efficiency. However, the environmental impact of utilising these AI models is substantial, given their high energy consumption during both training and inference stages. This research aims to compare the energy consumption of manual software development versus an LLM-assisted approach, using Codeforces as a simulation platform for software development. The goal is to quantify the environmental impact and propose strategies for minimising the carbon footprint of using LLMs in software development. Our results show that LLM-assisted code generation leads on average to 32.72 higher carbon footprint than the manual one. Moreover, there is a significant correlation between task complexity and the difference in the carbon footprint of the two approaches.
13:30 Carol Hanna, University College London (UCL)
- AI-Powered Advances in Genetic Improvement
Recording: https://mediacentral.ucl.ac.uk/Player/a55dj2hi
Genetic Improvement (GI) has long leveraged search-based techniques to automatically optimize software by evolving existing code. Recent breakthroughs in artificial intelligence, particularly in machine learning, natural language processing, and code generation, have opened new avenues for enhancing the effectiveness, efficiency, and usability of GI.
In this talk, I present our recent research that integrates AI into multiple stages of the GI pipeline. We explore the use of large language models as mutation operators, including both code replacement and context-aware masking, to generate high-quality edits. Additionally, we investigate reinforcement learning approaches for dynamic operator selection during search, replacing static heuristics with adaptive, reward-driven policies for both functional and non-functional software properties. We also demonstrate how AI models can complement traditional GI in helping to bridge the gap between automation and human understanding.
This growing synergy between AI and GI marks a significant step forward in the field. By blending the principled structure of classic GI with the generative and adaptive power of modern AI, we can begin to realize tools that are both more powerful and more accessible to developers.
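A toy sketch of an LLM-style mutation operator inside a GI loop (illustrative only; `llm_mutate` is a hypothetical stand-in for a real LLM call that would fill a masked code region with a context-aware edit):

```python
import random

def llm_mutate(program):
    # Assumption: an LLM would fill the <MASK> slot with a plausible edit;
    # here we simply sample from a fixed pool of candidate expressions.
    candidates = ["a + b", "a * b", "max(a, b)"]
    return program.replace("<MASK>", random.choice(candidates))

def fitness(program):
    # Reward variants that compute a + b correctly on a small test set.
    f = eval("lambda a, b: " + program.split("return ")[1])
    return sum(f(a, b) == a + b for a, b in [(1, 2), (3, 4), (0, 5)])

# Generate mutants of a masked template and keep the fittest.
template = "return <MASK>"
best = max((llm_mutate(template) for _ in range(20)), key=fitness)
print(best)
```

The search loop is classic GI (mutate, evaluate fitness, select); only the mutation operator changes when an LLM replaces a hand-crafted edit grammar.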
14:00 Breakout Session
15:15 Tea/Coffee Break
15:45 Haoxiang Jia, Peking University
- Automated Repair of Ambiguous Natural Language Requirements
Recording: https://mediacentral.ucl.ac.uk/Player/bBDcf6h5
In an era where large language models are increasingly relied upon for code generation, a critical yet underexplored challenge threatens their effectiveness: the ambiguity inherent in natural language requirements. When developers provide vague or multi-interpretable specifications to AI coding assistants, the resulting code often fails to meet intended functionality. While existing methods focus on improving models' reasoning capabilities or asking users for clarification, these approaches often generate lengthy reasoning analysis or irrelevant queries that burden developers without addressing the underlying issue. In this talk, I will introduce our novel approach that automatically repairs requirements to eliminate ambiguity at its source, demonstrating how minimal, targeted modifications to natural language specifications can improve code generation quality across different models without requiring human intervention.
16:15 Paul Baker & Rebecca Moussa, Chase (J.P. Morgan)
- Improving Software Engineering Productivity and Efficiency Through the Systematic Deployment of LLMs
Recording: https://mediacentral.ucl.ac.uk/Player/fa7aEJ41
Within Chase (UK) we have been looking to increase the productivity and efficiency of software engineering through the systematic deployment of LLMs and ML-based tools and processes. In doing so, we have learnt a lot about the practical deployment of such tools and their usage. This is a subject that is not often covered within the literature, and we hope to present key learnings from the industrial use of these technologies to achieve real impact.
16:45 Day 2 Closing Remarks
Attendees
| Name | Affiliation |
|---|---|
| Ahmed Zaki | Imperial College London, UK |
| Aldeida Aleti | Monash University, Australia |
| Asif Tamuri | University College London, UK |
| Baptiste Rozière | MistralAI, France |
| Carol Hanna | University College London, UK |
| Shrimoyee Chaki | University College London, UK |
| Dave Williams | University College London, UK |
| David Clark | University College London, UK |
| Davide Yi Xian Hu | Politecnico di Milano, Italy |
| DongGyun Han | Royal Holloway, University of London, UK |
| Earl Barr | University College London, UK |
| Enrique Alba | University of Malaga, Spain |
| Facundo Molina | IMDEA Software Institute, Spain |
| Federica Sarro | University College London, UK |
| German Anorve Pons | Durham University, UK |
| Giordano d'Aloisio | University of L'Aquila, Italy |
| Giovanni Pinna | University of Trieste, Italy |
| Giuseppe Destefanis | Brunel University of London, UK |
| Gregory Gay | Chalmers University of Technology and University of Gothenburg, Sweden |
| Gunel Jahangirova | King's College London, UK |
| Haoxiang Jia | Peking University, China |
| He Ye | University College London, UK |
| Illaria Pia la Torre | University College London, UK |
| James Hetherington | University College London, UK |
| Jerffeson Teixeira de Souza | State University of Ceara, Brazil |
| Jie Zhang | King's College London, UK |
| Joost Noppen | British Telecom, UK |
| Jordi Armengol Estape | Meta, UK |
| José Miguel Rojas | University of Sheffield, UK |
| Justyna Petke | University College London, UK |
| Lorenzo Cavallaro | University College London, UK |
| Mark Harman | Meta/University College London, UK |
| Martin Shepperd | Brunel University of London, UK |
| Matias Martinez | Universitat Politècnica de Catalunya, Spain |
| Matthew Hague | Royal Holloway, University of London, UK |
| Michael Konstantinou | University of Luxembourg, Luxembourg |
| Mike Papadakis | University of Luxembourg, Luxembourg |
| Miltos Allamanis | Google DeepMind, UK |
| Myra Cohen | Iowa State University, US |
| Nelly Bencomo | Durham University, UK |
| Nick Louloudakis | University of Edinburgh, UK |
| Paul Baker | JP Morgan, UK |
| Peter O'Hearn | Meta/University College London, UK |
| Phil McMinn | University of Sheffield, UK |
| Qunying Song | University College London, UK |
| Satish Chandra | Google, US |
| Serkan Kirbas | Bloomberg LP, UK |
| Shifat Sahariar Bhuiyan | Università della Svizzera italiana, Switzerland |
| Thanatad Songpetchmongkol | University College London, UK |
| Bill Langdon | University College London, UK |
| Wesley Xu | University College London, UK |
| Yixin Bian | University College London, UK |
| Yongcheng Huang | Delft University of Technology, Netherlands |
| Yutian Tang | University of Glasgow, UK |
| Zakaria Senousy | University College London, UK |
| Zhang Lyuye | Nanyang Technological University, China |
| Zhou Yang | Singapore Management University/University of Alberta, Canada |
