The 63rd CREST Open Workshop - Genetic Improvement and Software Specialisation
27 March 2023–28 March 2023, 10:00 am–5:00 pm
- Sold out
Dr Justyna Petke, Prof Federica Sarro, Prof William Langdon, Dr Giovani Guizzo, James Callan, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Room 103, Engineering Front Building, Torrington Place, London, WC1E 7JE
The topic of this workshop is Genetic Improvement (GI) and Software Specialisation. GI uses automated search to improve existing software. Aside from bug fixing, GI has been used to optimise software properties such as runtime, memory and energy consumption, and to specialise software for particular application domains. Given GI's use in improving software performance through software specialisation, we invite experts in related areas, e.g., software performance, compiler optimisation, parameter tuning, and optimisation for particular application domains. The goal of this workshop is to reflect on the progress that the research community has made over recent years, share experience in research and deployment, and identify key challenges that need to be addressed in future research.
Location: Engineering Front Building, Room 103
Day 1 - 27th March 2023
10:30 Welcome and Introductions
11:00 Prof. Myra Cohen, Iowa State University
- Keeping Secrets: A Journey in Multi-Objective Genetic Improvement
Information leaks in software can unintentionally reveal private data, yet they are hard to detect and fix. While there are some static approaches to find leaks, they require specialist knowledge and may not guide the developer towards a successful patch. At the same time dynamic detection and repair remains elusive. In this talk I discuss our journey to design a genetic improvement approach for reducing information leakage. Our solution, LeakReducer, first leverages hyper-testing to find and quantify unwanted flow of information and then repairs the leakage via genetic improvement. Along the way, we learned that (a) traditional security testing approaches like fuzzing may be unsuited for finding leakage, and (b) program correctness without context is subjective. In fact, I argue that genetic improvement is fundamentally a multi-objective endeavor where it may not be possible to satisfy both functional and non-functional objectives as specified. This leads to the ultimate question of what it means for a program to be correct in the context of genetic improvement.
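The multi-objective view in this abstract can be illustrated with a minimal sketch: each candidate patch is scored on two objectives to be minimised (functional error and information leakage, as in the talk's framing), and no single patch may win on both. The patch names and scores below are hypothetical, not LeakReducer's output.

```python
# Minimal Pareto-front sketch of multi-objective genetic improvement.
# Objectives are (functional_error, leakage), both minimised; the scores
# are hypothetical illustrations.

def dominates(a, b):
    """True if score tuple a is at least as good as b on every objective
    and strictly better on at least one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(patches):
    """Return the patches not dominated by any other patch."""
    return [p for p in patches
            if not any(dominates(q["scores"], p["scores"])
                       for q in patches if q is not p)]

patches = [
    {"name": "p1", "scores": (0.0, 0.9)},  # fully correct, leaks a lot
    {"name": "p2", "scores": (0.2, 0.1)},  # slightly wrong, leaks little
    {"name": "p3", "scores": (0.3, 0.5)},  # dominated by p2
]
front = pareto_front(patches)
print(sorted(p["name"] for p in front))
```

Here p1 and p2 survive: neither dominates the other, which is exactly the situation where "correct" becomes a judgment call rather than a single optimum.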
11:30 Prof. Bill Langdon, University College London
- GI success stories and what the future holds
Software is and will remain the dominant technology of the third millennium. Without programs, computer hardware is useless. The world is increasingly automated, but despite many advances in software engineering tools, the production and maintenance of software remains labour intensive. The dream of fully automated programming remains distant, but search-based optimisation is increasingly being applied to improve existing software.
GI was demonstrated to give a 70-fold speed-up for a state-of-the-art bioinformatics tool (Bowtie2) on a particular task. A GPU-based tool, BarraCUDA, was the first to accept GI changes into production, and the automatically optimised code changes have been downloaded many thousands of times. Another state-of-the-art tool, RNAfold, has been evolved in several ways, including speeding up code using parallel hardware and tuning data parameters to give more accurate answers. Again these GI changes have been accepted into the standard release and downloaded many thousands of times, including by people generating better RNA probes to detect COVID-19.
Software is an engineering material. It is plastic and robust. It can be reformed and re-used in many ways. Sometimes things go wrong, but in many cases software can be and is used, and delivers economic benefits, despite containing many errors (bugs). We have started using information theory to give insights into why hand-written software is in practice robust, the difficulty of testing, and the implications for test oracle placement. The average impact of errors and runtime perturbations falls exponentially with nesting depth, so even increasing the number of IID test cases increases the visibility of disruptions only logarithmically.
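The depth-versus-testing trade-off described above can be illustrated with a back-of-envelope model (a simplification for this programme, not the actual information-theoretic analysis): if a perturbation propagates past each enclosing nesting level with probability p, its chance of still being observable at depth d is p**d, and the chance that at least one of n independent tests sees it is 1 - (1 - p**d)**n.

```python
# Toy model: exponential decay of perturbation visibility with nesting
# depth versus the slow gain from adding more IID test cases.

def detection_prob(p, depth, n_tests):
    visible = p ** depth                  # visibility decays exponentially with depth
    return 1 - (1 - visible) ** n_tests  # chance at least one test observes it

# At depth 10 with p = 0.5, even large test suites detect the
# perturbation only slowly as n grows.
for n in (10, 100, 1000):
    print(n, round(detection_prob(0.5, 10, n), 4))
```

Doubling the depth squares the per-test visibility, so the test budget needed to keep detection probability constant grows exponentially with depth.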
Almost all software is written in high-level languages, and genetic improvement is mostly applied directly to human-written program source code. However, researchers have shown that Java byte code, assembly code and even binary machine code can also be automatically evolved. Recent experiments (EuroGP 2023) have shown that GI can improve LLVM intermediate representation (IR), even exceeding compiler optimisations, for two industrial open-source programs from Google (OLC) and Uber (H3).
13:00 Dr. Markus Wagner, Monash University
- CryptOpt and Socialz — search tools for specialised assembly code and for diverse community interaction
In this presentation, I will ever so briefly outline two projects: (1) CryptOpt (https://arxiv.org/abs/2211.10665) is the first compilation pipeline that specialises high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanised proof (in Coq). We apply randomised search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. The overall prototype is quite practical, e.g. producing new fastest-known implementations for the relatively new Intel i9 12G, of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1. (2) Socialz (https://arxiv.org/abs/2302.08664) aims to provide anyone with the capability to perform comprehensive social testing, thereby improving the reliability and security of online social networks used around the world. Socialz is a novel approach to social fuzz testing that (i) characterises real users of a social network, (ii) diversifies their interaction using evolutionary computation across multiple, non-trivial features, and (iii) collects performance data as these interactions are executed.
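The "repeated automatic benchmarking on target CPUs" step that CryptOpt relies on can be sketched in miniature: to choose between two candidate implementations of the same function, time each one several times and keep the one with the lower median, since individual timings on real hardware are noisy. The candidate functions below are hypothetical stand-ins, not cryptographic code.

```python
# Sketch of choosing between candidates by repeated benchmarking.
import statistics
import timeit

def candidate_a(n):
    return sum(i * i for i in range(n))

def candidate_b(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def best_of(candidates, arg, repeats=5):
    """Return the name of the candidate with the lowest median runtime."""
    scores = {
        f.__name__: statistics.median(
            timeit.repeat(lambda: f(arg), number=100, repeat=repeats))
        for f in candidates
    }
    return min(scores, key=scores.get)

print(best_of([candidate_a, candidate_b], 1000))
```

Taking the median rather than a single measurement is the key point: it makes comparisons between semantically equivalent candidates robust to scheduling and cache noise on the target machine.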
13:30 Dr. Oliver Krauss, University of Applied Sciences Upper Austria
- Pattern Mining and Genetic Improvement in Compilers and Interpreters
Source code can be improved through the process of genetic improvement, which involves creating numerous variants of the same software. By using pattern mining, we can identify recurring patterns in the code that are responsible for non-functional properties or bugs. In this talk, we'll explore ways to identify patterns in software variants and apply them in genetic improvement to optimize a software's runtime performance.
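The pattern-mining idea above can be sketched very simply: mine patterns (here, token bigrams rather than the AST patterns real approaches use) that occur in slow variants but never in fast ones, and flag them as candidates for genetic improvement. The variants below are hypothetical token sequences, not output of any real miner.

```python
# Toy pattern miner: bigrams unique to slow software variants.
from collections import Counter

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

fast_variants = [["for", "i", "in", "range", "cache", "lookup"]]
slow_variants = [["for", "i", "in", "range", "recompute", "all"],
                 ["while", "true", "recompute", "all"]]

slow_counts = Counter(b for v in slow_variants for b in bigrams(v))
fast_counts = Counter(b for v in fast_variants for b in bigrams(v))

# Patterns seen only in slow variants are improvement candidates.
suspicious = [b for b, c in slow_counts.items() if fast_counts[b] == 0]
print(suspicious)
```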
14:00 Dr. Luca Traini, University of L'Aquila
- Towards Effective Java Performance Evaluation: Are we there yet?
Performance evaluation is a crucial activity in modern Software Engineering (SE). Software development processes rely on performance evaluation to assess the impact of software revisions, and state-of-the-art SE techniques, including Genetic Improvement (GI), incorporate performance evaluation as a key step of their methodologies. As a consequence, inadequate performance evaluation can hinder release velocity of software development processes, or inadvertently introduce performance issues in production. Similarly, the effectiveness of state-of-the-art SE techniques, such as GI, can be severely affected by a suboptimal performance evaluation. Unfortunately, conducting effective performance evaluation presents daunting challenges, especially when assessing software subject to just-in-time compilation, such as Java software. In this regard, our recent empirical study on Java performance evaluation exposes potential pitfalls and shortcomings in current practices and state-of-the-art techniques. Our findings demonstrate that such approaches often fall short in providing effective performance evaluation, leading to significant consequences, including prolonged execution times and inaccurate results. This is a joint work with Vittorio Cortellessa, Daniele Di Pompeo and Michele Tucci. Open-Access article: https://doi.org/10.1007/s10664-022-10247-x
14:30 Tea/coffee break
15:00 Dr. Sandy Brownlee, University of Stirling
- Lost weekends with Gin: recent updates and results for the Java GI toolbox
Gin is a Java-based toolbox for experimentation in genetic improvement of software, first released in 2017. This presentation will share some recent developments with Gin designed to keep it up to date with modern Java. We will also cover some preliminary results exploring the search space associated with several different approaches to modifying code.
15:30 Dr. Aymeric Blot, Université du Littoral Côte d'Opale
- Magpie: Machine Automated General Performance Improvement via Evolution of Software
Magpie is a powerful tool for automated software improvement that provides a unified framework for exploring the space of possible software improvements. With its support for program transformations, parameter tuning, and compiler optimizations, Magpie enables researchers and software engineers to improve both the functional and non-functional properties of software. In this presentation, we will introduce Magpie and demonstrate its capabilities and effectiveness through examples and case studies. We will also discuss how Magpie can simplify the software improvement process by isolating the search process from the specific improvement technique. Overall, this presentation will provide an introduction to Magpie and demonstrate how it can be used to improve software in various domains. The framework is freely available online at https://github.com/bloa/magpie
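The "isolating the search process from the specific improvement technique" idea above can be sketched as a local-search loop that interacts with the software only through two operations: apply an edit and measure fitness. The fitness function, parameter names, and edit operator below are hypothetical stand-ins for illustration, not Magpie's actual API.

```python
# Sketch of a unified improvement loop: the search sees only
# "apply an edit, measure fitness", so the same loop can drive
# parameter tuning or program transformation.
import random

random.seed(0)  # deterministic for the example

def fitness(params):
    # Stand-in for running the software and timing it; lower is better.
    return (params["threads"] - 4) ** 2 + (params["buffer_kb"] - 64) ** 2

def random_edit(params):
    """Perturb one randomly chosen parameter by +/- 1."""
    edited = dict(params)
    key = random.choice(list(edited))
    edited[key] += random.choice([-1, 1])
    return edited

current = {"threads": 1, "buffer_kb": 60}
best = fitness(current)
for _ in range(500):
    candidate = random_edit(current)
    score = fitness(candidate)
    if score <= best:  # accept non-worsening edits
        current, best = candidate, score
print(current, best)
```

Swapping `random_edit` for a statement-level mutation and `fitness` for a test-plus-timing harness changes the improvement technique without touching the search loop, which is the separation the abstract describes.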
Day 2 - 28th March 2023
10:30 Dr. Andrea Rendl, Satalia
- Challenges in Developing Optimisation Algorithms for Industry
Optimisation algorithms calculate the best decisions to optimise a set of objectives, such as minimising costs and maximising customer/employee satisfaction. For example, an optimisation algorithm can calculate optimal tours for a delivery company. At Satalia, we create custom-built optimisation algorithms for clients in sectors such as transportation, logistics, and workforce planning. However, building optimisation algorithms for an industrial client comes with several challenges. First, the algorithms must be very fast and scale well, since the search space grows exponentially with problem size. Second, the algorithms must be easily adaptable, since company goals and constraints can change quickly, and the algorithm needs to reflect those changes. Third, the software itself must be easy to maintain, since typically many different experts need to work on the code over its life cycle. These three goals of speed, adaptability, and maintainability often conflict with one another, and it is a challenge to find a sweet spot where all are satisfied, in order to build a sustainable, economically viable optimisation product.
In this talk we will outline the main challenges using some examples and discuss how we currently try to mitigate the biggest risks in developing custom-built optimisation algorithms for industry.
11:00 Dr. Serkan Kirbas, Bloomberg
- Automatic Program Repair in Industry: Findings from Bloomberg
During this session, Serkan will share observations and findings from Automatic Program Repair (APR) work at Bloomberg. He will focus on the software engineers’ experience and practical aspects of getting automatically-generated code changes accepted and used in industry. Furthermore, he will discuss the results of qualitative research at Bloomberg, demonstrating the importance of the timing and the presentation of fixes.
11:30 Prof. Mark Harman, Meta/University College London
- Software Improvement Research Challenges: An Industrial Perspective
There have been rapid recent developments in automated software test design, repair and program improvement. For search-based approaches to specialisation and improvement, advances in automated software testing go hand-in-hand with advances in improvement. Advances in artificial intelligence also have great potential to tackle software engineering automation problems, including specialisation and improvement. In this talk I will highlight open research problems and challenges from an industrial perspective. This perspective draws on experience at Meta Platforms, which has been actively involved in software engineering research and development for over a decade. There are many exciting opportunities for research to achieve the widest and deepest impact on software practice. With this talk, I want to engage with the scientific community, especially on problems in search-based automated software improvement for performance optimisation. This talk is partly based on a forthcoming invited paper and keynote talk at the International Conference on Software Testing (ICST 2023). The ICST keynote talk will be given by Nadia Alshahwan and Mark Harman. The paper is by Nadia Alshahwan, Mark Harman and Alexandru Marginean. Thanks to the many Meta engineers, managers and leadership for their assistance, support and work on deploying automated software testing and improvement techniques.
13:00 Dr. Vesna Nowack, Imperial College London
- Human Factors in Automatically Generated Software
Recently we have seen a significant increase in the number of tools that generate software and help developers in programming. Adopting these tools could change software developers’ daily activities and transform their work practices. For example, Automatic Program Repair (APR) generates bug fixes by applying different techniques (like genetic improvement) and might reduce the manual effort of fixing bugs. To understand the benefits of APR, it is vital that we consider how software engineers feel about APR and the impact it may have on developers’ work.
In this talk, I will show our analysis of human factors in 260 articles in APR literature to understand how developers are considered in APR research. Over half of the reviewed articles were motivated by a problem faced by developers, but fewer than 7% of the reviewed articles included a human study. Our results suggest that software developers are often talked about in APR literature, but are rarely talked with.
To understand developers' general attitudes to APR or developers' current bug-fixing practices, we carried out a survey of 386 software developers. Our findings show that developers derive satisfaction and benefit from bug fixing and that they prefer being kept in the loop (for example, choosing between different fixes or validating fixes) as opposed to a fully automated process. This suggests that APR should consider the involvement of developers, as well as what information is presented to developers alongside fixes.
13:30 James Callan, University College London
- Improving the Non-Functional Properties of Android Apps with Genetic Improvement
Due to the limited hardware on which Android applications generally run, non-functional properties are particularly important for both developers and users. While Genetic Improvement (GI) has been shown to improve non-functional properties in traditional desktop domains, it has only rarely been applied to Android apps. This talk will present both the successes and failures we have had in applying GI in the Android domain, attempting to improve frame rate, responsiveness, execution time, memory consumption, and bandwidth usage of apps. The talk will also explore the challenges faced when applying GI to Android apps compared with the traditional domain, and the largest problems still faced when applying GI to Android.
14:00 Dr. Giovani Guizzo, University College London
- SE4AI: Using Regression Test Selection to speed-up GI
Although GI has been proven to be effective in improving functional and non-functional properties of software, it still demands a great deal of computational resources. The main reason for GI's elevated cost is the execution of (potentially thousands of) test cases to validate new patches. In this presentation, I will show how we used a classic SE technique to improve the execution time of GI by selecting only the subset of relevant test cases that can potentially reveal bugs in the patched code, i.e., using Regression Test Selection (RTS) techniques. This presentation shows a fine example of how SE can be applied to AI in order to solve problems, not only the other way around.
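The regression-test-selection step described above can be sketched in a few lines: given per-test coverage information, a patch only needs to re-run the tests that exercise the functions it modified. The test names, function names, and coverage map below are hypothetical examples, not taken from any real RTS tool.

```python
# Minimal regression test selection: re-run only tests whose coverage
# intersects the set of functions a patch modified.

coverage = {  # test name -> functions it executes (hypothetical)
    "test_parse": {"parse", "tokenize"},
    "test_eval":  {"evaluate", "parse"},
    "test_io":    {"read_file", "write_file"},
}

def select_tests(changed_functions, coverage):
    """Return, sorted, the tests that execute any changed function."""
    return sorted(t for t, funcs in coverage.items()
                  if funcs & changed_functions)

patch_touches = {"parse"}
print(select_tests(patch_touches, coverage))
```

For a patch touching only `parse`, two of the three tests are selected and `test_io` is skipped; across thousands of GI patch evaluations, such savings compound into the speed-ups the talk describes.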
14:30 Tea/coffee break
15:00 Dr. Maria Kechagia, University College London
- Green AI: Do Deep Learning Frameworks Have Different Costs?
The use of Artificial Intelligence (AI), and more specifically of Deep Learning (DL), in modern software systems is nowadays widespread and continues to grow. At the same time, its usage is energy demanding, contributes to increased CO2 emissions, and has a great financial cost as well. Even though there are many studies that examine the capabilities of DL, only a few focus on its green aspects, such as energy consumption. This paper aims at raising awareness of the costs incurred when using different DL frameworks. To this end, we perform a thorough empirical study to measure and compare the energy consumption and runtime performance of six different DL models written in the two most popular DL frameworks, namely PyTorch and TensorFlow. We use a well-known benchmark of DL models, DeepLearningExamples, created by NVIDIA, to compare both the training and inference costs of DL. Finally, we manually investigate the functions of these frameworks that took most of the time to execute in our experiments. The results of our empirical study reveal that there is a statistically significant difference between the costs incurred by the two DL frameworks in 94% of the cases studied. While TensorFlow achieves significantly better energy and runtime performance than PyTorch, with large effect sizes in 100% of the cases for the training phase, PyTorch exhibits significantly better energy and runtime performance than TensorFlow in the inference phase for 66% of the cases, always with large effect sizes. Such a large difference in performance costs does not, however, seem to affect the accuracy of the models produced, as both frameworks achieve comparable scores under the same configurations. Our manual analysis of the documentation and source code of the functions examined reveals that such a difference in performance costs is under-documented in these frameworks.
This suggests that developers need to improve the documentation of their DL frameworks and the source code of the functions used in these frameworks, as well as enhance existing DL algorithms.
15:30 Dr. Max Hort, Simula Research Laboratory
- Multi-objective search for gender-fair and semantically correct word embeddings
Fairness is a crucial non-functional requirement of modern software systems that rely on the use of Artificial Intelligence (AI) to make decisions regarding our daily lives in application domains such as justice, healthcare and education. In fact, these algorithms can exhibit unwanted discriminatory behaviours that create unfair outcomes when the software is used, such as giving privilege to one group of users over another (e.g., males vs. females). Mitigating algorithmic bias during the development life cycle of AI-enabled software is crucial given that any bias in these algorithms is inherited by the software systems using them. However, previous work has shown that mitigating bias can impact the performance of such systems. Therefore, we propose herein a novel use of soft computing for improving AI-enabled software fairness. Specifically, we exploit multi-objective search, as opposed to previous work optimising fairness only, to strike an optimal balance between reducing gender bias and improving semantic correctness of word embedding models, which are at the core of many AI-enabled systems.
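The tension between the two objectives above can be illustrated with a toy sketch (not the authors' actual method): gender bias measured as the projection of a word vector onto a "gender direction", and semantic correctness measured as similarity to the word's original vector. Removing more of the gender component reduces bias but moves the vector away from its original meaning. The two-dimensional vectors below are hypothetical illustrations.

```python
# Toy debiasing trade-off for a single word embedding.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

gender_dir = (1.0, 0.0)  # e.g. a normalised (he - she) direction
word = (0.6, 0.8)        # hypothetical embedding of a profession word

def debias(v, g, strength):
    """Remove a fraction `strength` of v's component along direction g."""
    proj = dot(v, g)
    return tuple(x - strength * proj * gx for x, gx in zip(v, g))

for s in (0.0, 0.5, 1.0):
    v = debias(word, gender_dir, s)
    bias = abs(dot(v, gender_dir))                    # objective 1: minimise
    semantic = dot(v, word) / (norm(v) * norm(word))  # objective 2: maximise
    print(s, round(bias, 3), round(semantic, 3))
```

As the debiasing strength grows, bias falls monotonically while similarity to the original vector also falls, which is exactly the conflict that motivates searching for a balanced trade-off rather than optimising fairness alone.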
|Dr Andrea Rendl, Satalia|
|Dr Aymeric Blot, Université du Littoral Côte d'Opale|
|Prof Bill Langdon, University College London|
|Carol Hanna, University College London|
|Dr David Clark, University College London|
|Dr David Kelly, King's College London|
|Dr Derek Jones, Knowledge Software|
|Dr DongGyun Han, Royal Holloway, University of London|
|Prof Federica Sarro, University College London|
|Fraser Garrow, Edinburgh Centre for Robotics|
|Prof George Magoulas, Birkbeck College, University of London|
|Dr Giovani Guizzo, University College London|
|Dr Gunel Jahangirova, King's College London|
|Dr Hector Menendez, King's College London|
|James Callan, University College London|
|Jeongju Sohn, University of Luxembourg|
|Dr Jie Zhang, King's College London|
|Dr Justyna Petke, University College London|
|Dr Kelly Androutsopoulos, Middlesex University|
|Prof Leon Moonen, Simula Research Laboratory|
|Dr Luca Traini, University of L'Aquila|
|Dr Maria Kechagia, University College London|
|Prof Mark Harman, Meta|
|Dr Markus Wagner, Monash University|
|Dr Max Hort, Simula Research Laboratory|
|Prof Myra Cohen, Iowa State University|
|Dr Oliver Krauss, University of Applied Sciences Upper Austria|
|Dr Sandy Brownlee, University of Stirling|
|Dr Sarah L. Thomson, University of Stirling|
|Prof Sarfraz Khurshid, University of Texas at Austin|
|Dr Sergey Mechtaev, University College London|
|Dr Serkan Kirbas, Bloomberg|
|Prof Tracy Hall, Lancaster University|
|Dr Vesna Nowack, Imperial College London|
|Dr Yue Jia, Meta|