Centre for Research on Evolution, Search and Testing (CREST)

The 67th CREST Open Workshop on AI-powered Software Engineering

30 June 2025–01 July 2025, 10:00 am–5:00 pm

Event Information

Open to

All

Availability

Sold out

Organiser

Dr. He Ye, Prof. Justyna Petke, Prof. Federica Sarro, Carol Hanna, David Williams – CREST Centre, SSE Group, Department of Computer Science, UCL, UK

In today's rapidly evolving tech landscape, AI is revolutionising the way we design, develop, test, and maintain software. In this workshop on AI-powered Software Engineering, we will take a deep dive into cutting-edge techniques and tools that are redefining development workflows. Designed to spark critical discussions and foster collaboration, this workshop invites participants to explore the transformative impact of AI on modern software engineering practices.

All talks at this workshop are by invitation only. Talks will be a maximum of 20 minutes long with plenty of time for questions and discussion. We also hope that the workshop will foster and promote collaboration, and there will be time set aside to support this.

Participants are expected to attend the whole event in person, since the workshop is interactive and discursive. There is no registration fee, thanks to the kind support of a grant from Meta. Light lunches will be included, along with the usual refreshments, all at no charge.

Policy on Student Registrations

We welcome registrations from PhD students, where the student is pursuing a programme of research for which the COW will provide intellectual benefit and/or from whom the workshop and its other attendees will gain benefit. We do not normally expect to register students other than those on PhD level programmes of study. For example, those students taking a course at the equivalent of UK masters or bachelors level would not, ordinarily, be considered eligible to register for COW. However, we are willing to consider exceptional cases, where a masters or bachelors student has a clear contribution to make to the topic of the COW. In all cases, students must have the approval of their supervisor/advisor for their attendance at the COW and their consent to the terms of registration. This is why we ask that students seeking to register for a COW also supply the contact details of their supervisor.

Cancellation Fee

Please appreciate that numbers are limited and catering needs to be booked in advance, so registration followed by non-attendance will cause difficulties. For this reason, though the workshop is entirely free of charge, there will be a cancellation fee of £100 for those who register but subsequently fail to attend.


Schedule

Day 1 - 30th June 2025

10:00 Welcome & Introductions

10:45 Satish Chandra, Google

AI in Developer Productivity in Industry

More about this speaker.

11:15 Baptiste Rozière, MistralAI 

Code Assistants: From Completion to Agents

This presentation explores how large language models can be trained to power code assistants. We will discuss key applications such as in-IDE code completion and agentic capabilities. The attendees will get an overview of the use of LLMs for code assistants, and some insight into pre-training and post-training methodologies.

More about this speaker.

11:45 Group Photo

11:50 Lunch

13:20 Joost Noppen, British Telecom (BT) Digital

Beyond Code Prediction: Working on the Bigger Challenges in Software Engineering with AI

Recording: https://mediacentral.ucl.ac.uk/Player/2jCEJ8JJ

Over the last few years, AI has arrived in the software engineering workplace, and for many developers it has already become a vital tool in their toolbox. What started out as more comprehensive code completion has quickly turned into conversational learning and problem solving, and into supportive, more autonomous functions such as identifying bugs, proposing patches, and raising merge requests. It therefore seems a suitable time to look forward to what is next for AI in the software engineering space. What are the new (or old) bottlenecks we still experience in the workplace, and what are the profound challenges we face in resolving them? In a whistlestop tour mixing technical challenges, such as architecture and quality, with organisational ones, such as team composition and conformance, this talk will examine the next generation of opportunities we can now see emerging, in an attempt to kick off a long-term research agenda for AI in software engineering.

More about this speaker.

13:50 He Ye, University College London (UCL) 

Constructing a Unified Knowledge Graph from Codebases

Recording: https://mediacentral.ucl.ac.uk/Player/1F1bb9Jc

 

More about this speaker.

14:20 Tea/Coffee Break

14:50 Mark Harman, Meta/University College London (UCL)

Mutation-Guided LLM-based Test Generation at Meta

Recording: https://mediacentral.ucl.ac.uk/Player/H52fejcj

This talk will cover Meta's work on the Automated Compliance Hardening (ACH) tool, which uses mutation testing to guide Assured LLM-based Software Engineering. ACH generates relatively few mutants (i.e. simulated faults) compared to traditional mutation testing. Instead, it focuses on generating currently undetected faults that are specific to an issue of concern. From these currently uncaught faults, ACH generates tests that can catch them, thereby "killing" the mutants and consequently hardening the platform against regressions. ACH also deploys an LLM-based equivalent-mutant detection agent that achieves a precision of 0.79 and a recall of 0.47 (rising to 0.95 and 0.96 with simple pre-processing). ACH was used in Messenger and WhatsApp test-a-thons, where engineers accepted 73% of its tests, judging 36% to be relevant. The talk will review Assured LLMSE, LLM-based test generation, and mutation testing work at Meta. Slides are based on the FSE 2025 industry track talk, and on the Mutation 2025, EuroSTAR 2025, and FSE 2025 keynotes.

More about this speaker.

15:20 Phil McMinn, University of Sheffield

AI for Test Suite "Health"

Recording: https://mediacentral.ucl.ac.uk/Player/2GJ6IBBd

 

More about this speaker.

15:50 Breakout Session

17:00 Day 1 Closing Remarks

Day 2 - 1st July 2025

10:00 Pastries

10:30 Jie Zhang, King's College London (KCL)

Benchmarking and Improving the Efficiency of Automatically Generated Code

Recording: https://mediacentral.ucl.ac.uk/Player/dEIj72cJ

 

Large Language Models (LLMs) are increasingly becoming integral to modern software development workflows. While recent research has extensively evaluated the correctness of code generated by these models, the efficiency of the generated code has received significantly less attention. Yet, code efficiency is critical for building scalable, high-performance, and sustainable systems, particularly in resource-constrained environments such as mobile devices and embedded systems. In this talk, I will introduce our recent efforts to benchmark the efficiency of code generated by LLMs and explore techniques to improve it. These include prompt engineering strategies and fine-tuning methods designed to guide LLMs toward producing more efficient code without compromising correctness.
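The efficiency concern raised above can be made concrete with a small measurement harness: two functionally equivalent candidates (hypothetical stand-ins for generated code) that differ only in runtime cost, compared with the standard-library `timeit` module:

```python
import timeit

# Two functionally equivalent "generated" candidates for membership
# lookup; correctness is identical, efficiency is not.
data_list = list(range(10_000))
data_set = set(data_list)

def lookup_list(x):
    return x in data_list  # O(n) linear scan

def lookup_set(x):
    return x in data_set   # O(1) hash lookup on average

assert lookup_list(9_999) == lookup_set(9_999)  # same behaviour

t_list = timeit.timeit(lambda: lookup_list(9_999), number=2_000)
t_set = timeit.timeit(lambda: lookup_set(9_999), number=2_000)
print(t_set < t_list)  # the set version is typically much faster
```

A benchmark along these lines is how one can score generated code on efficiency without compromising the correctness check.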

More about this speaker.

11:00 Miltos Allamanis, Google DeepMind

Execution-driven Feedback for Disproving Program Properties with LLMs

AI-powered software tools have made remarkable progress, but their outputs often require painstaking validation by humans or partial oracles, such as unit tests, which provide limited checks. To address these challenges and enhance AI’s autonomy and output quality, we explore how LLMs can ensure that some program properties hold.
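One way to picture execution-driven disproof is a counterexample search over concrete inputs, whose falsifying input could then serve as feedback to an LLM. This sketch is purely illustrative, not the system described in the talk:

```python
# Disproving a claimed program property by execution (illustrative).

def f(x):
    return x * x

def claimed_property(x):
    # Claim under test: squaring never decreases a number.
    return f(x) >= x

def find_counterexample(prop, candidates):
    """Execute the property on concrete inputs; return a falsifying
    input (evidence the claim is wrong), or None if all pass."""
    for x in candidates:
        if not prop(x):
            return x
    return None

# The claim holds for integers but fails strictly between 0 and 1.
cex = find_counterexample(claimed_property, [i / 10 for i in range(-20, 20)])
print(cex)  # 0.1
```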

More about this speaker.

11:30 Yutian Tang, University of Glasgow

LLM-CompDroid: Repairing Configuration Compatibility Bugs in Android Apps with LLMs

Recording: https://mediacentral.ucl.ac.uk/Player/HDIA6GBa

 

XML configurations are essential to Android UI development but often introduce compatibility bugs across different API levels, leading to inconsistent visuals and system crashes. In this talk, I will present our investigation into using large language models (LLMs) to detect and repair these bugs. While LLMs have limitations, we found they excel at resolving complex issues where traditional tools struggle. Building on this, I will introduce LLM-CompDroid, our hybrid framework that combines LLMs with existing techniques. LLM-CompDroid-GPT-3.5 and GPT-4 outperform the state-of-the-art tool ConfFix by over 9.8% in key accuracy metrics, offering a promising step toward more reliable Android applications.

More about this speaker.

12:00 Lunch

13:00 Gunel Jahangirova, King's College London (KCL)

Comparative Analysis of Carbon Footprint in Manual vs. LLM-Assisted Code Development

Recording: https://mediacentral.ucl.ac.uk/Player/5hcF2FH6

 

Large Language Models (LLMs) have significantly transformed various domains, including software development. These models assist programmers in generating code, potentially increasing productivity and efficiency. However, the environmental impact of utilising these AI models is substantial, given their high energy consumption during both training and inference stages. This research aims to compare the energy consumption of manual software development versus an LLM-assisted approach, using Codeforces as a simulation platform for software development. The goal is to quantify the environmental impact and propose strategies for minimising the carbon footprint of using LLMs in software development. Our results show that LLM-assisted code generation leads on average to a 32.72 higher carbon footprint than the manual approach. Moreover, there is a significant correlation between task complexity and the difference in the carbon footprint of the two approaches.

More about this speaker.

13:30 Carol Hanna, University College London (UCL)

AI-Powered Advances in Genetic Improvement

Recording: https://mediacentral.ucl.ac.uk/Player/a55dj2hi

 

Genetic Improvement (GI) has long leveraged search-based techniques to automatically optimize software by evolving existing code. Recent breakthroughs in artificial intelligence, particularly in machine learning, natural language processing, and code generation, have opened new avenues for enhancing the effectiveness, efficiency, and usability of GI.
 
In this talk, I present our recent research that integrates AI into multiple stages of the GI pipeline. We explore the use of large language models as mutation operators, including both code replacement and context-aware masking to generate high quality edits. Additionally, we investigate reinforcement learning approaches for dynamic operator selection during search, replacing static heuristics with adaptive, reward-driven policies for both functional and non-functional software properties. We also demonstrate how AI models can complement traditional GI in helping to bridge the gap between automation and human understanding.
 
This growing synergy between AI and GI marks a significant step forward in the field. By blending the principled structure of classic GI with the generative and adaptive power of modern AI, we can begin to realize tools that are both more powerful and more accessible to developers.
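The shape of such a GI pipeline can be sketched as a tiny mutate-and-select loop. In the research described above, the hard-coded edit in `mutate()` would instead come from a large language model rewriting a masked line; every name and edit here is a hypothetical illustration, not the speaker's pipeline:

```python
import random

# Minimal genetic-improvement loop with an LLM stand-in as the
# mutation operator (entirely illustrative).

def mutate(program):
    """Mask one line and splice in a candidate edit. An LLM could
    propose this replacement from the surrounding context."""
    i = random.randrange(len(program))
    return program[:i] + ["y = x * 2"] + program[i + 1:]

def fitness(program):
    """Score a candidate by executing it against a tiny test."""
    env = {"x": 3}
    try:
        exec("\n".join(program), env)
    except Exception:
        return 0
    return int(env.get("y") == 6)  # 1 iff the test passes

random.seed(0)
best = ["y = x + 1"]  # seed program: fails the test (y == 4)
for _ in range(20):   # simple hill-climbing search
    candidate = mutate(best)
    if fitness(candidate) >= fitness(best):
        best = candidate

print(fitness(best))  # 1: an edit satisfying the test was found
```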

More about this speaker.

14:00 Breakout Session

15:15 Tea/Coffee Break

15:45 Haoxiang Jia, Peking University

Automated Repair of Ambiguous Natural Language Requirements

Recording: https://mediacentral.ucl.ac.uk/Player/bBDcf6h5

 

In an era where large language models are increasingly relied upon for code generation, a critical yet underexplored challenge threatens their effectiveness: the ambiguity inherent in natural language requirements. When developers provide vague or multi-interpretable specifications to AI coding assistants, the resulting code often fails to meet intended functionality. While existing methods focus on improving models' reasoning capabilities or asking users for clarification, these approaches often generate lengthy reasoning analysis or irrelevant queries that burden developers without addressing the underlying issue. In this talk, I will introduce our novel approach that automatically repairs requirements to eliminate ambiguity at its source, demonstrating how minimal, targeted modifications to natural language specifications can improve code generation quality across different models without requiring human intervention.

More about this speaker.

16:15 Paul Baker & Rebecca Moussa, Chase (J.P. Morgan)

Improving Software Engineering Productivity and Efficiency Through the Systematic Deployment of LLMs

Recording: https://mediacentral.ucl.ac.uk/Player/fa7aEJ41

 

Within Chase (UK) we have been looking to increase the productivity and efficiency of software engineering through the systematic deployment of LLMs and ML based tools and processes. In doing so, we have learnt a lot about the practical deployment of such tools and their usage. This is a subject that is not often covered within the literature, and we hope to present key learnings from the industrial use of these technologies to achieve real impact.

More about Paul Baker.

More about Rebecca Moussa.

16:45 Day 2 Closing Remarks

Attendees

Name – Affiliation

Ahmed Zaki – Imperial College London, UK
Aldeida Aleti – Monash University, Australia
Asif Tamuri – University College London, UK
Baptiste Rozière – MistralAI, France
Carol Hanna – University College London, UK
Shrimoyee Chaki – University College London, UK
Dave Williams – University College London, UK
David Clark – University College London, UK
Davide Yi Xian Hu – Politecnico di Milano, Italy
DongGyun Han – Royal Holloway, University of London, UK
Earl Barr – University College London, UK
Enrique Alba – University of Malaga, Spain
Facundo Molina – IMDEA Software Institute, Spain
Federica Sarro – University College London, UK
German Anorve Pons – Durham University, UK
Giordano d'Aloisio – University of L'Aquila, Italy
Giovanni Pinna – University of Trieste, Italy
Giuseppe Destefanis – Brunel University of London, UK
Gregory Gay – Chalmers University of Technology and University of Gothenburg, Sweden
Gunel Jahangirova – King's College London, UK
Haoxiang Jia – Peking University, China
He Ye – University College London, UK
Illaria Pia la Torre – University College London, UK
James Hetherington – University College London, UK
Jerffeson Teixeira de Souza – State University of Ceara, Brazil
Jie Zhang – King's College London, UK
Joost Noppen – British Telecom, UK
Jordi Armengol Estape – Meta, UK
José Miguel Rojas – University of Sheffield, UK
Justyna Petke – University College London, UK
Lorenzo Cavallaro – University College London, UK
Mark Harman – Meta/University College London, UK
Martin Shepperd – Brunel University of London, UK
Matias Martinez – Universitat Politècnica de Catalunya, Spain
Matthew Hague – Royal Holloway, University of London, UK
Michael Konstantinou – University of Luxembourg, Luxembourg
Mike Papadakis – University of Luxembourg, Luxembourg
Miltos Allamanis – Google DeepMind, UK
Myra Cohen – Iowa State University, US
Nelly Bencomo – Durham University, UK
Nick Louloudakis – University of Edinburgh, UK
Paul Baker – JP Morgan, UK
Peter O'Hearn – Meta/University College London, UK
Phil McMinn – University of Sheffield, UK
Qunying Song – University College London, UK
Satish Chandra – Google, US
Serkan Kirbas – Bloomberg LP, UK
Shifat Sahariar Bhuiyan – Università della Svizzera italiana, Switzerland
Thanatad Songpetchmongkol – University College London, UK
Bill Langdon – University College London, UK
Wesley Xu – University College London, UK
Yixin Bian – University College London, UK
Yongcheng Huang – Delft University of Technology, Netherlands
Yutian Tang – University of Glasgow, UK
Zakaria Senousy – University College London, UK
Zhang Lyuye – Nanyang Technological University, China
Zhou Yang – Singapore Management University/University of Alberta, Canada