The 65th CREST Open Workshop - Automated Program Repair and Genetic Improvement

18 September 2023–19 September 2023, 10:00 am–5:00 pm

Event Information

Open to: All
Availability: Sold out
Organiser: Dr Justyna Petke, Dr Sergey Mechtaev, Dr Maria Kechagia, Nikhil Parasaram, CREST Centre, SSE Group, Department of Computer Science, UCL, UK

crest-admin@cs.ucl.ac.uk

Program repair has the potential to reduce the significant manual effort that developers devote to finding and fixing software bugs. Recent years have witnessed a dramatic growth of research in program repair. Researchers have proposed a large number of techniques aimed at addressing fundamental challenges of program repair, such as scalability and test-overfitting, and have successfully deployed program repair in industry. One technique used in the repair field is genetic improvement (GI), which uses automated search to improve existing software. Aside from bug fixing, GI has been used to optimise other software properties, such as runtime, memory and energy consumption. It has also been used for other kinds of improvement, such as specialising and porting. The goal of this workshop is to reflect on the progress that the research community has made in recent years in these two closely related fields, share experience in research and deployment, and identify key challenges that need to be addressed in future research.

All talks at this workshop are by invitation only. Talks will be a maximum of 20 minutes long with plenty of time for questions and discussion. We also hope that the workshop will foster and promote collaboration, and there will be time set aside to support this.

Participants are expected to attend the whole event in person, since the workshop is interactive and discursive. There is no registration fee, thanks to the kind support of the UKRI EPSRC fellowship grant "Automated Software Specialisation Using Genetic Improvement". Light lunches will be included, along with the usual refreshments, all at no charge.

Registration of interest is closed.

Policy on Student Registrations

We welcome registrations from PhD students, where the student is pursuing a programme of research for which the COW will provide intellectual benefit and/or from whom the workshop and its other attendees will gain benefit. We do not normally expect to register students other than those on PhD level programmes of study. For example, those students taking a course at the equivalent of UK masters or bachelors level would not, ordinarily, be considered eligible to register for COW. However, we are willing to consider exceptional cases, where a masters or bachelors student has a clear contribution to make to the topic of the COW. In all cases, students must have the approval of their supervisor/advisor for their attendance at the COW and their consent to the terms of registration. This is why we ask that students seeking to register for a COW also supply the contact details of their supervisor.

Cancellation Fee

Please appreciate that numbers are limited and catering needs to be booked in advance, so registration followed by non-attendance will cause difficulties. For this reason, though the workshop is entirely free of charge, there will be a cancellation fee of £100 for those who register but subsequently fail to attend.

Location: 66-72 Gower Street, Room G.01

Schedule

Day 1 - 18th September 2023

10:00 Welcome and Introductions

10:30 Dr. Saemundur Haraldsson, University of Stirling

Will programming become an obsolete skill?

Every significant technological advance that increases productivity and/or efficiency in any industry has made at least a few people worry about their jobs. Our work on automatic software improvement, including APR and GI, is no exception.

In this talk I will discuss the evolution of software improvement technologies and their integration and acceptance in industry through my eyes as a software developer, an academic, and an educator.

I will pose some questions of interest to academia, to industry and to the education of the workforce, reflecting on a limited historical perspective to speculate on what lies ahead given the recent rapid deployment of powerful generative-AI tools.

How might our work shape the workforce of the future?

Video: https://mediacentral.ucl.ac.uk/Player/gf219aHc

11:00 Nikhil Parasaram, University College London

Rete: Learning Namespace Representation for Program Repair

A key challenge of automated program repair is finding correct patches in the vast search space of candidate patches. Real-world programs define large namespaces of variables that contribute considerably to the search space explosion. Existing program repair approaches neglect information about the program namespace, which makes them inefficient and increases the chance of test-overfitting.

We propose Rete, a new program repair technique that learns project-independent information about the program namespace and uses it to navigate the search space of patches. Rete uses a neural network to extract project-independent information about variable CDU chains (def-use chains augmented with control flow). It then ranks patches by jointly ranking variables and the patch templates into which the variables are inserted.

We evaluated Rete on 142 bugs extracted from two datasets, ManyBugs and BugsInPy. Our experiments demonstrate that Rete generates six new correct patches that fix bugs that previous tools did not repair, an improvement of 31% and 59% over the existing state of the art.
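The idea of jointly ranking variables and patch templates can be illustrated with a small sketch. It assumes that a model has already assigned a score to each candidate variable and to each patch template, and it combines the two by multiplication. The names `rank_patches` and `Candidate`, the `{var}` placeholder in templates and the product rule are all hypothetical choices for illustration, not Rete's actual scoring.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    variable: str
    template: str
    score: float

    def render(self) -> str:
        # Instantiate the template with the chosen variable
        # (the {var} placeholder is a hypothetical convention).
        return self.template.format(var=self.variable)


def rank_patches(
    var_scores: dict[str, float], template_scores: dict[str, float]
) -> list[Candidate]:
    """Rank every (variable, template) pair by a joint score.

    Assumption: the joint score is the product of the per-variable and
    per-template scores; Rete's real model may combine them differently.
    """
    candidates = [
        Candidate(var, tpl, v_score * t_score)
        for (var, v_score), (tpl, t_score) in product(
            var_scores.items(), template_scores.items()
        )
    ]
    # Highest joint score first; ties broken by name so the order is deterministic.
    return sorted(candidates, key=lambda c: (-c.score, c.variable, c.template))
```

For example, with variable scores `{"len": 0.7, "idx": 0.3}` and template scores `{"if ({var} < 0) return;": 0.6, "{var} += 1;": 0.4}`, the top-ranked candidate renders as `if (len < 0) return;`. Ranking pairs rather than variables and templates separately lets a strong variable compensate for a weaker template, and vice versa.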

Video: https://mediacentral.ucl.ac.uk/Player/CDIBGEhG

11:30 Dr. Jie Zhang, King's College London

Quantifying the Threat to Empirical Software Engineering Validity from LLM Non-determinism: A Comprehensive Study of ChatGPT

There has been a recent explosion of research on Large Language Models (LLMs) for software engineering tasks, in particular code generation. However, results from LLMs can be highly unstable: they may nondeterministically return very different code for the same prompt. Non-determinism is a potential threat to scientific conclusion validity. When non-determinism is high, scientific conclusions simply cannot be relied upon unless researchers change their behaviour to control for it in their empirical analyses. In this talk, I will introduce our empirical study, which demonstrates that non-determinism is indeed high, thereby underlining the need for this behavioural change. We chose to study ChatGPT because it is already highly prevalent in the code generation research literature. We report results from a study of 829 code generation problems from three code generation benchmarks (CodeContests, APPS and HumanEval). Our results reveal high degrees of non-determinism: the ratio of problems with zero equal test output among code candidates is 72.73%, 60.40% and 65.85% for CodeContests, APPS and HumanEval, respectively. In addition, we find that setting the temperature to 0 does not guarantee determinism in code generation, although it does yield less non-determinism than the default configuration (temperature=1). These results confirm that there is, currently, a significant threat to scientific conclusion validity. To put LLM-based research on firmer scientific foundations, researchers need to take non-determinism into account when drawing their conclusions.
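The headline metric can be made concrete with a short sketch. It assumes one reading of "zero equal test output": a problem is counted when there is no test input on which all code candidates generated for the same prompt produce identical output. The data layout and the function names are hypothetical, and the study's exact definition may differ.

```python
def has_zero_equal_test_output(outputs: list[list[str]]) -> bool:
    """outputs[c][t] is the output of candidate c on test input t.

    Returns True when no test input yields identical output across all
    candidates (an assumed reading of the metric, for illustration only).
    """
    num_tests = len(outputs[0])
    return all(
        len({candidate[t] for candidate in outputs}) > 1 for t in range(num_tests)
    )


def zero_equal_ratio(problems: list[list[list[str]]]) -> float:
    """Fraction of problems whose candidates never fully agree on any test input."""
    flagged = sum(has_zero_equal_test_output(p) for p in problems)
    return flagged / len(problems)
```

Sampling the same prompt several times per problem, once at temperature 0 and once at the default temperature, and feeding the collected outputs into `zero_equal_ratio` gives a per-benchmark figure comparable in spirit to the percentages above.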

Video: https://mediacentral.ucl.ac.uk/Player/11AI61di

12:00 Lunch

13:00 Dr. Serkan Kirbas, Bloomberg

Automatic Program Repair in Bloomberg

During this session, Serkan will share findings and observations from Automatic Program Repair (APR) work at Bloomberg. He will focus on the software engineers’ experience and practical aspects of getting automatically-generated code changes accepted and used in industry. Furthermore, he will discuss the results of qualitative research at Bloomberg, demonstrating the importance of the timing and the presentation of fixes.

Video: https://mediacentral.ucl.ac.uk/Player/e5dbAHeA

13:30 Dr. Vesna Nowack, Imperial College London

14:00 Sebastian Schweikl, University of Passau

14:30 Refreshments

15:00 Dr. Gunel Jahangirova, King's College London

Repairing DNN Architecture: Are We There Yet?

As Deep Neural Networks (DNNs) are rapidly being adopted within large software systems, software developers are increasingly required to design, train, and deploy such models into the systems they develop. Consequently, testing and improving the robustness of these models have received a lot of attention lately. However, relatively little effort has been made to address the difficulties developers experience when designing and training such models: if the evaluation of a model shows poor performance after the initial training, what should the developer change? We survey and evaluate existing state-of-the-art techniques that can be used to repair model performance, using a benchmark of both real-world mistakes developers made while designing DNN models and artificial faulty models generated by mutating the model code. The empirical evaluation shows that a random baseline is comparable with, or sometimes outperforms, existing state-of-the-art techniques. However, for larger and more complicated models, all repair techniques fail to find fixes. Our findings call for further research to develop more sophisticated techniques for Deep Learning repair.

Video: https://mediacentral.ucl.ac.uk/Player/b72b75EI

15:30 Dr. David Clark, University College London

Causing the Repair of Hyperproperties

Hyperproperties are currently attracting interest from the programming languages research community, both in terms of formal, semantics-based analysis and in terms of testing and dynamic analysis. People I work with have been developing methods to automatically detect hyperproperty violations, then measure the extent of interference using information theory, and finally apply GI to repair the violations.

Video: https://mediacentral.ucl.ac.uk/Player/G6Jei12a

16:00 Discussion/Breakout

17:00 Close

Day 2 - 19th September 2023

10:00 Refreshments

10:30 Dr. Claire Le Goues, Carnegie Mellon University

Automatic repair of client code in light of evolving APIs

Modern software engineering revolves around the use of third-party libraries and APIs. Changing or upgrading libraries that a client project depends on is tedious and error-prone, to the point that many developers simply don't. In this talk, I will discuss our recent work on inferring and applying code transformations to automatically update client code in light of evolving libraries. These techniques rely on a careful interplay between powerful advances in ML/NLP models to manage the search space, and more traditional symbolic approaches to support transformation correctness and quality.

Video: https://mediacentral.ucl.ac.uk/Player/abh9GfiD

11:00 Prof. Tegawendé F. Bissyandé, University of Luxembourg

11:30 Dr. Matias Martinez, Universitat Politècnica de Catalunya-Barcelona Tech

Energy Consumption of Automated Program Repair: From Search-based to Large Language Model-based Repair Tools

Automated program repair (APR) aims to automate the process of repairing software bugs in order to reduce the cost of maintaining software programs. The success (measured by the accuracy metric) of APR approaches has been increasing in recent years. However, no previous work has considered the energy impact of repairing bugs automatically using APR. The field of green software research aims to measure the energy consumption required to develop, maintain and use software products. This paper combines, for the first time, the APR and green software research fields. Our main goal is to lay the foundation for measuring the energy consumption of the APR activity. We measure the energy consumption of ten traditional program repair tools for Java and ten Large Language Models (LLMs) fine-tuned on source code, as they try to repair real bugs from Defects4J, a set of real buggy programs. The initial results from this experiment show the existing trade-off between energy consumption and the ability to correctly repair bugs: some APR tools are capable of achieving higher accuracy while spending less energy than other tools.
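One way to read the energy/accuracy trade-off is as a Pareto comparison: a tool is worth considering only if no other tool repairs at least as many bugs while spending no more energy. The sketch below computes that Pareto front. `ToolResult`, its fields and the use of a Pareto front are illustrative assumptions, not the study's own analysis, and any tool names fed to it are hypothetical.

```python
from typing import NamedTuple


class ToolResult(NamedTuple):
    name: str
    energy_joules: float  # total energy measured for the repair run
    correct_fixes: int    # number of bugs correctly repaired


def dominates(a: ToolResult, b: ToolResult) -> bool:
    """True if a uses no more energy, fixes no fewer bugs, and is strictly better on one axis."""
    no_worse = a.energy_joules <= b.energy_joules and a.correct_fixes >= b.correct_fixes
    strictly_better = a.energy_joules < b.energy_joules or a.correct_fixes > b.correct_fixes
    return no_worse and strictly_better


def pareto_front(results: list[ToolResult]) -> list[ToolResult]:
    """Tools not dominated by any other tool, kept in input order."""
    return [r for r in results if not any(dominates(o, r) for o in results)]
```

A tool that falls off the front is beaten on both axes by some other tool; tools on the front represent genuine trade-offs, where gaining accuracy costs energy.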

Video: https://mediacentral.ucl.ac.uk/Player/E7giiFd6

12:00 Lunch

13:00 Dr. Michele Tufano, Microsoft

InferFix: End-to-End Program Repair with LLMs

The software development life cycle is profoundly influenced by bugs: their introduction, identification, and eventual resolution account for a significant portion of software cost. This has motivated software engineering researchers and practitioners to propose different approaches for automating the identification and repair of software defects. Large language models have been adapted to the program repair task through few-shot demonstration learning and instruction prompting, treating this as an infilling task. However, these models have only focused on learning general bug-fixing patterns for uncategorized bugs mined from public repositories. In this paper, we propose InferFix: a transformer-based program repair framework paired with a state-of-the-art static analyzer to fix critical security and performance bugs. InferFix combines a Retriever (a transformer encoder model pretrained via a contrastive learning objective, which aims at searching for semantically equivalent bugs and corresponding fixes) and a Generator (a large language model, Codex Cushman, finetuned on supervised bug-fix data with prompts augmented via bug type annotations and semantically similar fixes retrieved from an external non-parametric memory). To train and evaluate our approach, we curated InferredBugs, a novel, metadata-rich dataset of bugs extracted by executing the Infer static analyzer on the change histories of thousands of Java and C# repositories. Our evaluation demonstrates that InferFix outperforms strong LLM baselines, with a top-1 accuracy of 65.6% for generating fixes in C# and 76.8% in Java. We discuss the deployment of InferFix alongside Infer at Microsoft, which offers an end-to-end solution for detection, classification, and localization of bugs, as well as fixing and validation of candidate patches, integrated in the continuous integration pipeline to automate the software development workflow.
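The retrieval-augmented prompting step (retrieve semantically similar bug/fix pairs, then add them and the bug type to the prompt) can be sketched as follows. The embeddings are plain Python lists standing in for the Retriever's encoder output, similarity is plain cosine similarity, and the prompt layout with `//` comments is a hypothetical format; none of this is InferFix's actual encoder, memory or prompt.

```python
import math

# Each memory entry: (embedding, buggy snippet, fixed snippet). Hypothetical layout.
MemoryEntry = tuple[list[float], str, str]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query: list[float], memory: list[MemoryEntry], k: int = 2) -> list[MemoryEntry]:
    """Return the k stored bug/fix pairs whose embeddings are closest to the query."""
    return sorted(memory, key=lambda m: cosine(query, m[0]), reverse=True)[:k]


def build_prompt(bug_type: str, buggy_code: str, examples: list[MemoryEntry]) -> str:
    """Augment the repair prompt with the bug type annotation and retrieved fixes."""
    parts = [f"// Bug type: {bug_type}"]
    for _, bug, fix in examples:
        parts.append(f"// Similar bug:\n{bug}\n// Its fix:\n{fix}")
    parts.append(f"// Buggy code:\n{buggy_code}\n// Fixed code:")
    return "\n".join(parts)
```

The bug type string would come from the static analyzer's report (Infer uses issue types such as `NULL_DEREFERENCE`), and the prompt ends at the point where the Generator is expected to continue with the fix.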

Video: https://mediacentral.ucl.ac.uk/Player/fIh6IgFD

13:30 Prof. Leon Moonen, Simula Research Laboratory

SEIDR - Fully Autonomous Programming with Large Language Models

Current approaches to program synthesis with Large Language Models (LLMs) exhibit a "near miss syndrome": they tend to generate programs that semantically resemble the correct answer (as measured by text similarity metrics or human evaluation), but achieve a low or even zero accuracy as measured by unit tests due to small imperfections, such as the wrong input or output format.

To address this challenge, we have recently proposed SEIDR (Synthesize, Execute, Instruct, Debug and Rank), an approach in which a draft solution is generated first, followed by an iterative program repair loop addressing the failed tests. To effectively apply this approach to instruction-driven LLMs, one needs to determine which prompts perform best as instructions for LLMs, and strike a balance between repairing unsuccessful programs and replacing them with newly generated ones.

We explore these trade-offs empirically, comparing replace-focused, repair-focused, and hybrid debug strategies, as well as different template-based and model-based prompt-generation techniques. We use OpenAI Codex as the LLM and Program Synthesis Benchmark 2 as a database of problem descriptions and tests for evaluation. The resulting framework outperforms both conventional usage of Codex without the repair phase and traditional genetic programming approaches.
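The Synthesize, Execute, Instruct, Debug and Rank loop can be sketched as a small beam search that mixes repairing the best candidates with drafting fresh ones. The `generate` and `repair` callables stand in for LLM calls and are hypothetical, as are the test format, the instruction wording, and the parameters `beam_width` and `max_iters`; this is a sketch of the general idea under those assumptions, not SEIDR's implementation.

```python
from typing import Callable, Optional

Test = tuple[str, str]  # (input, expected output): a hypothetical test format


def pass_rate(program: str, tests: list[Test], run: Callable[[str, str], str]) -> float:
    """Execute: fraction of tests whose actual output matches the expected output."""
    return sum(run(program, inp) == exp for inp, exp in tests) / len(tests)


def first_failure(program: str, tests: list[Test], run: Callable[[str, str], str]) -> Optional[Test]:
    """Return the first failing test, or None if every test passes."""
    return next(((inp, exp) for inp, exp in tests if run(program, inp) != exp), None)


def synthesize_and_repair(
    description: str,
    tests: list[Test],
    generate: Callable[[str], str],     # hypothetical LLM call: description -> program
    repair: Callable[[str, str], str],  # hypothetical LLM call: (program, instruction) -> program
    run: Callable[[str, str], str],     # executes a program on one test input
    beam_width: int = 2,
    max_iters: int = 5,
) -> str:
    """A hybrid repair/replace loop in the spirit of SEIDR, not its actual implementation."""
    # Synthesize: draft initial candidates from the problem description.
    population = [generate(description) for _ in range(beam_width)]
    for _ in range(max_iters):
        # Execute and Rank: keep the beam_width candidates with the highest pass rate.
        population.sort(key=lambda p: pass_rate(p, tests, run), reverse=True)
        population = population[:beam_width]
        if pass_rate(population[0], tests, run) == 1.0:
            return population[0]
        children = []
        for program in population:
            failure = first_failure(program, tests, run)
            if failure is None:
                continue
            inp, expected = failure
            # Instruct: describe the failing test in natural language.
            instruction = (
                f"For input {inp!r} the program returned {run(program, inp)!r}, "
                f"but {expected!r} was expected. Fix the program."
            )
            # Debug: ask the model for a repaired version.
            children.append(repair(program, instruction))
        # Replace: add one fresh draft so the search mixes repair and replacement.
        children.append(generate(description))
        population = children + population
    # Budget exhausted: return the best candidate in the final population.
    return max(population, key=lambda p: pass_rate(p, tests, run))


if __name__ == "__main__":
    # Toy stand-ins: "programs" are names of Python functions, not real LLM output.
    toy = {"identity": lambda x: x, "increment": lambda x: x + 1}
    result = synthesize_and_repair(
        "return x + 1",
        tests=[("1", "2"), ("5", "6")],
        generate=lambda desc: "identity",
        repair=lambda prog, instr: "increment",
        run=lambda prog, inp: str(toy[prog](int(inp))),
    )
    print(result)  # increment
```

Shifting the balance between `repair` calls and fresh `generate` calls per iteration is one way to move between the repair-focused, replace-focused and hybrid strategies mentioned above.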

Video: https://mediacentral.ucl.ac.uk/Player/c8bg6h48

14:00 Prof. Lin Tan, Purdue University

14:30 Refreshments

15:00 Prof. Mark Harman, Meta/University College London

Large Language Models for Software Engineering: Survey and Open Problems

This talk reviews the forthcoming survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE), which will appear in the proceedings of the ICSE 2023 Future of Software Engineering track.

It sets out open research challenges for the application of LLMs to technical problems faced by software engineers.

LLMs' emergent properties bring novelty and creativity with applications right across the spectrum of Software Engineering activities including coding, design, requirements, repair, refactoring, performance improvement, documentation and analytics.

However, these very same emergent properties also pose significant technical challenges; we need techniques that can reliably weed out incorrect solutions, such as hallucinations.

Our survey reveals the pivotal role hybrid techniques (traditional SE plus LLMs) have to play in the development and deployment of reliable, efficient and effective LLM-based SE.

The talk will focus on the way in which LLM-based code generation naturally fits within an overall genetic improvement framework.

Video: https://mediacentral.ucl.ac.uk/Player/49IGB22H

15:30 Dr. Lingming Zhang, University of Illinois Urbana-Champaign

Automated Program Repair in the Era of Large Language Models

Automated Program Repair (APR) aims to help developers automatically patch software systems. Existing traditional and learning-based APR techniques typically rely on high-quality bug-fixing datasets to craft repair templates or to directly predict potential patches based on Neural Machine Translation (NMT). However, such bug-fixing datasets can be extremely hard to construct, are limited in size, and may contain various irrelevant or noisy commits or changes. As a result, existing APR techniques struggle to fix complicated bugs that are unseen in, or hard to generalize from, such bug-fixing datasets.

In this talk, I will discuss AlphaRepair, the first approach to reformulate the APR problem as a cloze (or infilling) task based on recent Large Language Models (LLMs) trained on billions of text/code tokens. Our main insight is that instead of modeling what a repair edit should look like (i.e., an NMT task), we can directly use LLMs to predict what the correct code is based on the surrounding context of the buggy code (i.e., a cloze task). Such cloze-style APR can completely free APR from historical bug fixes and leverage the massive pre-training corpora of LLMs for multi-lingual APR. Our AlphaRepair study also demonstrates that LLMs can outperform existing APR techniques that have been studied for over a decade. Lastly, I will also briefly discuss our other work on LLM-based APR, including ChatRepair, a conversational APR approach based on the very recent ChatGPT model.
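The cloze reformulation can be sketched in a few lines: the buggy line is replaced by a mask token, a model proposes replacement lines for the mask from the surrounding context, and each proposal is spliced back in and checked against the tests. The mask token `<mask>`, the `predict` and `passes_tests` callables and the single-line granularity are assumptions for illustration; AlphaRepair's actual input encoding and patch ranking are not reproduced here.

```python
from typing import Callable, Iterable, Iterator

MASK = "<mask>"  # assumed mask token; the real token depends on the model's tokenizer


def _indent(line: str) -> str:
    """Leading whitespace of a line."""
    return line[: len(line) - len(line.lstrip())]


def make_cloze(lines: list[str], buggy_line: int, mask: str = MASK) -> str:
    """Replace the buggy line with a mask token, keeping the surrounding context."""
    masked = lines[:buggy_line] + [_indent(lines[buggy_line]) + mask] + lines[buggy_line + 1 :]
    return "\n".join(masked)


def apply_infill(lines: list[str], buggy_line: int, infill: str) -> list[str]:
    """Substitute a predicted line for the buggy one, keeping its indentation."""
    return lines[:buggy_line] + [_indent(lines[buggy_line]) + infill.strip()] + lines[buggy_line + 1 :]


def cloze_repair(
    lines: list[str],
    buggy_line: int,
    predict: Callable[[str], Iterable[str]],    # hypothetical model call: cloze -> candidate lines
    passes_tests: Callable[[list[str]], bool],  # hypothetical test harness
) -> Iterator[list[str]]:
    """Yield patched programs whose infilled line makes the tests pass."""
    cloze = make_cloze(lines, buggy_line)
    for infill in predict(cloze):
        patched = apply_infill(lines, buggy_line, infill)
        if passes_tests(patched):
            yield patched
```

For the program `["int inc(int x) {", "  return x - 1;", "}"]` with line 1 flagged as buggy, `make_cloze` produces the context with `  <mask>` in place of the return statement. No bug-fix history is needed: the model only has to predict plausible code for the masked position.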

Video: https://mediacentral.ucl.ac.uk/Player/c3e1IdFD

16:00 Discussion/Breakout

17:00 Close

Attendees

Dr Alexandros Tasos, Bayforest Technologies

Prof Bill Langdon, University College London

Dr Sandy Brownlee, University of Stirling

Dr Ezekiel Soremekun, Royal Holloway, University of London

Prof Leon Moonen, Simula Research Laboratory

Dr Saemundur Haraldsson, University of Stirling

Dr David Kelly, King’s College London

Dr Zhenpeng Chen, University College London

Daniel Blackwell, University College London

Dr Derek Jones, Knowledge Software

Dr Mike Papadakis, University of Luxembourg

Elisa Braconaro, Università degli Studi di Padova

Dr Eleonora Losiouk, Università degli Studi di Padova

Carol Hanna, University College London

Ilaria Pia la Torre, University College London

Prof Darrell Whitley, Colorado State University

Prof Gabriela Ochoa, University of Stirling

Dr DongGyun Han, Royal Holloway, University of London

Dr Jie Zhang, King's College London

Dr Justyna Petke, University College London

Prof Federica Sarro, University College London

Dr Maria Kechagia, University College London

Prof Mark Harman, Meta and University College London

Dr Michele Tufano, Microsoft

Dr Sergey Mechtaev, University College London

Dr Serkan Kirbas, Bloomberg

Dr Vesna Nowack, Imperial College London

Sebastian Schweikl, University of Passau

Dr Gunel Jahangirova, King's College London

Dr David Clark, University College London

Dr Claire Le Goues, Carnegie Mellon University

Prof Tegawendé F. Bissyandé, University of Luxembourg

Dr Matias Martinez, Universitat Politècnica de Catalunya-Barcelona Tech

Prof Lin Tan, Purdue University

Nikhil Parasaram, University College London

Dr Lingming Zhang, University of Illinois Urbana-Champaign

Andre Silva, KTH

Han Fu, KTH and Ericsson

Shuyin Ouyang, King's College London

Yonghao Wu, King's College London

GDPR notice

CREST statement on the use of personal data in our research
