Notes from the Wired

AutoCodeRover: Autonomous Program Improvement

Published: April 17, 2024

Paper Title: AutoCodeRover: Autonomous Program Improvement
Link to Paper: https://arxiv.org/abs/2404.05427
Date: April 8, 2024
Paper Type: LLM, Code-generation
Short Abstract:
LLMs are already used extensively to ease the workload of developers. In this paper, the authors propose AutoCodeRover, a framework for automatically solving GitHub issues. The framework combines LLMs with code search algorithms, and it works on abstract syntax trees instead of raw files.

1. Introduction

Automating software development has long been a vision of engineers. Specifically, the authors of the paper focus on bug fixing and feature addition.

Their method works as follows:

2. Datasets

For evaluation they use SWE-bench, a benchmark that tests LLMs on real-life, end-to-end software engineering tasks. The benchmark consists of ca. 2000 real-life tasks collected from 12 popular Python projects. Each instance consists of a GitHub issue and the corresponding GitHub pull request that resolves the issue.

SWE-bench-lite consists of 300 task instances sampled from SWE-bench and is more lightweight to evaluate on.

3. Motivating Example

The authors showcase the capabilities of AutoCodeRover using a Django issue that is classified as “New feature”.

AutoCodeRover operates in two stages:

The first thing the model does is collect relevant code: it infers relevant names from the issue, e.g. “ModelChoiceField”, and searches for them in an abstract syntax tree. In step 1, it finds relevant classes using the tools search_class and search_method_in_class. In step 2, the signatures of the classes and the implementations of the methods are returned. In step 3, the agent receives the implementations of further methods. In step 4, a software patch is written.

4. Method

4.1 Overview

We have:

Stages:

The LLM decides for itself which APIs to use, based on the available information, and iteratively refines its API calls.

In the patch generation stage, a separate LLM crafts a patch, which needs to be in a specified format. If the patch doesn’t work or the format is not correct, the agent goes into a retry loop.
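The retry loop can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the names generate_patch, looks_like_patch, ask_llm and applies are hypothetical, and since these notes do not spell out the required patch format, a minimal unified-diff shape is assumed as a stand-in.

```python
from typing import Callable, Optional


def looks_like_patch(text: str) -> bool:
    """Hypothetical format check (assumption: a minimal unified-diff shape)."""
    lines = text.splitlines()
    has_old = any(line.startswith("--- ") for line in lines)
    has_new = any(line.startswith("+++ ") for line in lines)
    has_hunk = any(line.startswith("@@") for line in lines)
    return has_old and has_new and has_hunk


def generate_patch(
    ask_llm: Callable[[str], str],
    context: str,
    applies: Callable[[str], bool],
    max_retries: int = 3,
) -> Optional[str]:
    """Ask for a patch until one is well-formed and applies, or give up."""
    feedback = ""
    for _ in range(max_retries):
        candidate = ask_llm(context + feedback)
        if not looks_like_patch(candidate):
            feedback = "\nThe previous attempt was not in the required format."
            continue
        if not applies(candidate):
            feedback = "\nThe previous patch could not be applied."
            continue
        return candidate
    return None  # retry budget exhausted


# A scripted stand-in for the LLM: the first reply is malformed, the second is fine.
replies = iter([
    "Sure, here is the fix!",
    "--- a/forms.py\n+++ b/forms.py\n@@ -1 +1 @@\n-old\n+new\n",
])
prompts = []


def fake_llm(prompt: str) -> str:
    prompts.append(prompt)
    return next(replies)


patch = generate_patch(fake_llm, "issue text and retrieved code", applies=lambda p: True)
print(patch)
```

Feeding the failure reason back into the next prompt is a design choice of this sketch; a simpler loop could just re-ask with the same prompt.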

4.2 Context Retrieval APIs

In typical software projects, issues often contain “hints” about which parts of the codebase are relevant:

These hints can then be used in combination with an API call to find relevant code:

API name                          Description                                   Output
search_class(cls)                 Search for class cls in the codebase.         Signature of the searched class.
search_class_in_file(cls, f)      Search for class cls in file f.               Signature of the searched class.
search_method(m)                  Search for method m in the codebase.          Implementation of the searched method.
search_method_in_class(m, cls)    Search for method m in class cls.             Implementation of the searched method.
search_method_in_file(m, f)       Search for method m in file f.                Implementation of the searched method.
search_code(c)                    Search for code snippet c in the codebase.    Region of code surrounding the searched snippet.
search_code_in_file(c, f)         Search for code snippet c in file f.          Region of code surrounding the searched snippet.
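To make the table more concrete, here is a minimal sketch of how two of these APIs could be built on Python's standard ast module. It is an assumption about one possible implementation, not the paper's code: the function bodies, the extra tree and source parameters, and the toy ModelChoiceField class are all made up for illustration.

```python
import ast
import textwrap

# Toy source to search in; this is NOT Django's real ModelChoiceField.
SOURCE = textwrap.dedent('''
    class ModelChoiceField:
        """A toy stand-in class."""

        def to_python(self, value):
            if value is None:
                return None
            return str(value)

        def validate(self, value):
            return value is not None
''')


def search_class(tree, cls):
    """Return a signature view of class cls: class header plus method headers."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == cls:
            lines = [f"class {node.name}:"]
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    args = ", ".join(a.arg for a in item.args.args)
                    lines.append(f"    def {item.name}({args}): ...")
            return "\n".join(lines)
    return None  # class not found


def search_method_in_class(tree, source, m, cls):
    """Return the source text of method m defined directly in class cls."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == cls:
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and item.name == m:
                    return ast.get_source_segment(source, item)
    return None  # class or method not found


tree = ast.parse(SOURCE)
signature = search_class(tree, "ModelChoiceField")
implementation = search_method_in_class(tree, SOURCE, "to_python", "ModelChoiceField")
print(signature)
print(implementation)
```

Returning only the signature for a class and the full body for a method mirrors the Output column above: class-level searches stay cheap in tokens, and the agent asks for implementations only where it needs them.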

General iterative pipeline:

In each stratum, the LLM agent is prompted to select the set of API invocations it needs, based on the current context.
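A rough sketch of this stratified loop is shown below. The names retrieve_context, select_calls and max_strata are hypothetical, the LLM is replaced by a scripted selector, and stopping when the selector returns no calls is an assumption of this sketch rather than something stated in these notes.

```python
from typing import Callable, Dict, List, Tuple

Call = Tuple[str, Tuple[str, ...]]  # (API name, positional arguments)


def retrieve_context(
    issue: str,
    select_calls: Callable[[List[str]], List[Call]],
    apis: Dict[str, Callable[..., str]],
    max_strata: int = 5,
) -> List[str]:
    """Grow the context stratum by stratum until no more calls are requested."""
    context = [issue]
    for _ in range(max_strata):
        calls = select_calls(context)  # in the real system an LLM picks these
        if not calls:  # assumption: an empty selection means "enough context"
            break
        for name, args in calls:
            result = apis[name](*args)
            context.append(f"{name}{args} -> {result}")
    return context


# Toy APIs that return placeholder strings instead of real search results.
toy_apis = {
    "search_class": lambda cls: f"signature of {cls}",
    "search_method_in_class": lambda m, cls: f"body of {cls}.{m}",
}


def scripted_selector(context: List[str]) -> List[Call]:
    """Stand-in for the LLM: the selection depends on what was already retrieved."""
    if len(context) == 1:
        return [("search_class", ("ModelChoiceField",))]
    if len(context) == 2:
        return [("search_method_in_class", ("to_python", "ModelChoiceField"))]
    return []


context = retrieve_context("Issue mentioning ModelChoiceField", scripted_selector, toy_apis)
for entry in context:
    print(entry)
```

The point of the strata is that each selection sees the results of the previous one, so the second call above can target a method that only became known after the class signature was returned.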

5. Evaluation

6. Conclusion

AutoCodeRover is an interesting direction: it applies ideas from papers like Toolformer and from LLM agents to a new application.