CodeCureAgent is an autonomous LLM-based agent designed for automated static analysis warning repair.
It can classify and fix arbitrary SonarQube rule violations in Java code.
The repository is structured as follows:
- code_cure_agent: CodeCureAgent code, experiment setup and experiment output
- code_cure_agent/agent_core: Main parts of the CodeCureAgent implementation. The implemented tools are at: code_cure_agent/agent_core/commands
- code_cure_agent/agent_config_and_prompt_files: Prompt files used for the two sub-agents, and the agent configuration
- code_cure_agent/evaluation_results: Results of our evaluation on 1000 warnings. Contains markdown files with aggregated evaluation results, csv files with data on all warning runs, plots, and all log files created during the experiment run.
- code_cure_agent/evaluation_results/evaluation_outputs: This holds the log files of the full evaluation. It is split into multiple experiment batches.
The most interesting files in this log output are those in the subfolder code_cure_agent/evaluation_results/evaluation_outputs/experiment_X/run_summaries. For each warning run, these show details about the warning and the classification and fix results, including a diff of the changes made for successful fixes. (Multi-file fix example: code_cure_agent/evaluation_results/evaluation_outputs/experiment_1/run_summaries/6_summary.diff)
- code_cure_agent/experimental_setups: Contains the evaluation dataset and utility scripts for evaluating your own experiment
- code_cure_agent/experimental_setups/evaluation_dataset/evaluation_dataset_filled_up_to_1000_input_file.csv: This is the input file with the 1000 warnings used for running our evaluation.
- comparative_study: All files and results of the comparison to baselines Sorald and CORE.
Before you start using CodeCureAgent, ensure that your system meets the following requirements:
- Docker: Version 20.10 or higher. For installation instructions, see the Docker documentation.
- VS Code: VS Code provides an easy way to interact with CodeCureAgent using Dev Containers (see the instructions below).
- OpenAI Token and Credits:
- Create an account on the OpenAI website and purchase credits to use the API.
- Generate an API token on the same website.
- Disk Space:
- At least 40GB of available disk space on your machine. The code itself does not take 40GB; however, the dependencies may take up to 8GB, and the files generated from running on different instances may use more, so 40GB is a safe estimate.
- Internet Access: Required while running CodeCureAgent to connect to OpenAI's API.
- Ensure you have the Dev Containers extension installed in VS Code. You can install it from the Visual Studio Code Marketplace.
- Clone the CodeCureAgent repository:

  ```
  git clone https://github.com/sola-st/CodeCureAgent.git
  ```
- Open the repository folder in VS Code.
- When prompted by VS Code to "Reopen in Container," click it. If not prompted, open the Command Palette (Ctrl+Shift+P) and select "Dev Containers: Reopen in Container."
  VS Code will now build and start the Dev Container, setting up the environment for you. This will take a while.
  After the Dev Container is built, it will continue to run further setup steps in the terminal. Wait until this has completed as well.
  If the Dev Container opened in less than a few minutes, it likely failed to create the container properly. In that case, rebuild the container by opening the Command Palette (Ctrl+Shift+P) and selecting "Dev Containers: Rebuild in Container."
- Within your VS Code terminal, move to the folder code_cure_agent:

  ```
  cd code_cure_agent
  ```
Inside the Dev Container terminal, configure your OpenAI API key by running:

```
python3.10 set_api_key.py
```

The script will prompt you to paste your API token.
CodeCureAgent takes a csv file as input, where each line specifies a single warning of a SonarQube rule in a single Java file in a single Git repository.
For an example of how the input file should look, see evaluation_dataset_filled_up_to_1000_input_file.csv.
You can create your own by following the steps described in IV. Running and Evaluating your own Experiment.
To execute CodeCureAgent on an input file, run the following from the code_cure_agent folder:

```
./run_on_dataset.sh ./experimental_setups/evaluation_dataset/evaluation_dataset_filled_up_to_1000_input_file.csv hyperparams.json
```

The first argument is the csv input file to run on. The second argument specifies the hyperparameter settings.
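For example, to run on your own input file instead, pass it as the first argument in the same way (the input file path below is a hypothetical placeholder):

```
# Hypothetical input file path; replace it with your own csv file.
./run_on_dataset.sh ./experimental_setups/my_experiment/my_input_file.csv hyperparams.json
```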
You can open the hyperparams.json file to review or customize its parameters (explained further in the customization section).
If you only care about CodeCureAgent itself and not about the comparison to the baselines, you can delete the comparative_study folder to improve VS Code responsiveness.
- CodeCureAgent goes through the input file line by line.
- For each line, CodeCureAgent checks out the project at the given URL and commit.
- It then initiates the autonomous repair process, first classifying the warning as a true positive or a false positive and then fixing or suppressing the warning accordingly.
- Logs detailing each step performed will be displayed in your terminal.
All utility scripts must be run from the folder code_cure_agent.
- Create a .csv file with one line per Git repository and three columns (no header):
  - URL of the Git repository
  - CommitID that you want to run on. If you want to use the most recent commit on the master/main branch, set the commitID to 'MASTER'.
  - targetJavaVersion: the Java version the project compiles to. It is used to configure the SonarQube analyzer with the correct rules and can be inferred automatically with the script code_cure_agent/experimental_setups/infer_target_java_version_of_projects.py.

  For an example, see evaluation_dataset_repos_list_with_java_versions.csv (a short sketch of the expected row format is also shown after this list). Currently, CodeCureAgent only supports Maven projects that can be built with Maven 3.6.3 by running a simple

  ```
  mvn clean package
  ```
- Use the Sorald mining tool to mine SonarQube warnings on the repositories specified in the file.
  Example usage (run from code_cure_agent on the evaluation_dataset_repos_list_with_java_versions.csv file):

  ```
  java -jar ./sorald/sorald.jar mine \
    --git-repos-list ./experimental_setups/evaluation_dataset/evaluation_dataset_repos_list_with_java_versions.csv \
    --miner-output-file ./experimental_setups/evaluation_dataset/mining_results/evaluation_dataset_out.txt \
    --stats-output-file ./experimental_setups/evaluation_dataset/mining_results/evaluation_dataset_mining_result.json \
    --temp-dir ./experimental_setups/evaluation_dataset/temp \
    --stats-on-git-repos \
    --rule-parameters ./sonarqube_quality_profile/quality_profile_rule_parameters.json \
    --handled-rules
  ```
  Remove the --handled-rules flag if you want to mine all warnings supported by the used SonarQube version.
  If you only want to mine specific rules, pass the IDs of the rules via --rule-keys; to mine only specific types of rules, use --rule-types.
  For our experiments we use only rules that are part of the SonarWay quality profile, by using the keys from code_cure_agent/sonarqube_quality_profile/quality_profile_rule_keys.txt.
  After running the mining tool, the output is saved in a json file; in the example above, this is evaluation_dataset_mining_result.json.
- Finally, you can create your csv input file from the json report by using code_cure_agent/experimental_setups/prepare_experiment_input_file.py.
  Provide the previously created json report as the first argument to this script. Also, --rule-violations-mode must be set to single.
  Additionally, you can provide the path the csv file is to be saved to via --target-csv-file-path.
  Example:

  ```
  python3 ./experimental_setups/prepare_experiment_input_file.py ./experimental_setups/evaluation_dataset/mining_results/evaluation_dataset_mining_result.json \
    --target-csv-file-path ./experimental_setups/evaluation_dataset/mining_results/evaluation_dataset_input_file_all_violations.csv \
    --rule-violations-mode single
  ```
- Optionally, you can sample from the input file to run only on some of the warnings, using code_cure_agent/experimental_setups/sample_rule_violations_from_input_file.py.
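To make the repository-list format of the first step concrete, here is the sketch referenced above. The repository URLs and the commit hash are hypothetical placeholders; the three columns are the ones described in the first step (URL, commitID, targetJavaVersion):

```
https://github.com/example-org/example-maven-project.git,MASTER,11
https://github.com/example-org/legacy-maven-project.git,1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b,8
```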
CodeCureAgent saves the output in multiple files.
- When running CodeCureAgent, the primary logs are created in the folder code_cure_agent/experimental_setups/experiment_X, where experiment_X increments automatically with each run of the command ./run_on_dataset.sh.
- The folder is structured into the subfolders classification, fix_fp, fix_tp and tasks.
- Most interesting are the classification_result files in the classification folder and the prompt_history files in the prompt_history subfolders of classification, fix_fp and fix_tp.
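Put together, the log layout described above looks roughly as follows (a sketch derived only from the folder names mentioned in this README; an actual experiment_X folder may contain additional files and subfolders):

```
code_cure_agent/experimental_setups/experiment_X/
├── classification/
│   ├── classification_result files
│   └── prompt_history/
├── fix_fp/
│   └── prompt_history/
├── fix_tp/
│   └── prompt_history/
└── tasks/
```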
Within the code_cure_agent/experimental_setups folder, several scripts are available to calculate evaluation results from one or multiple executed CodeCureAgent runs.
All scripts are expected to be run from the code_cure_agent folder.
1. Create evaluation results file
   After running one or multiple experiments, logs are located in the folders code_cure_agent/experimental_setups/experiment_X.
   If you do not want to run your own experiments, but instead want to calculate evaluation results on the log files of our experiment runs, copy all folders and files from code_cure_agent/evaluation_results/evaluation_outputs to code_cure_agent/experimental_setups.
   The script code_cure_agent/experimental_setups/write_experiment_results_to_csv_file.py can be used to extract the experiment run results from the experiment logs into a csv file.
   By default, the evaluation results are appended to the csv file code_cure_agent/evaluation_results/evaluation_results.csv.
2. Create extended evaluation results file with further info
   An extended version of the evaluation results file can be created with the code_cure_agent/experimental_setups/extend_evaluation_results_with_more_stats.py script. It expects the previously created evaluation results file as input.
   By default, the extended evaluation results are written to the csv file code_cure_agent/evaluation_results/evaluation_results_extended.csv.
3. Aggregate results into a Markdown
   The evaluation results can be aggregated into a Markdown file that presents the relevant stats.
   Use the script code_cure_agent/experimental_setups/calculate_stats_from_evaluation_results.py for this.
   It expects the extended evaluation results csv file as its first argument.
   By default, the Markdown is written to code_cure_agent/evaluation_results/analysis_results_overview.md.
   See an example result Markdown here: code_cure_agent/evaluation_results/analysis_results_overview_all.md
4. Create summaries for each fixed/unfixed warning
   You can create summaries that show the most important information on a CodeCureAgent run on a warning, including a diff of all changes made. Use the script code_cure_agent/experimental_setups/create_warning_summaries.py. It requires that the extended evaluation results csv file has been created before (see 2.). The summaries are added to the experiment logs (code_cure_agent/experimental_setups/experiment_X) in a subfolder run_summaries.
5. Manually inspect a repaired warning
   We provide a further script, code_cure_agent/experimental_setups/show_next_warning_for_manual_inspection.py, that can be used to quickly open the relevant files for a specified warning, including a VS Code diff between the unfixed and fixed versions of the warning.
   The instanceID of the warning to be looked at can be provided via the option --id-to-show.
   This script also requires that logs are located in the folders code_cure_agent/experimental_setups/experiment_X (see 1.).
6. Create plots
   We provide further Jupyter notebooks for creating plots, including a Venn diagram.
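As a rough end-to-end sketch, the post-processing steps above could be chained as shown below. How each script receives its input file is an assumption here (check the scripts themselves for the exact arguments); the default output locations are the ones listed above.

```
# Run from the code_cure_agent folder. The arguments below are assumptions;
# the default output paths are taken from the descriptions above.

# 1. Extract run results from the experimental_setups/experiment_X folders into a csv file
python3 ./experimental_setups/write_experiment_results_to_csv_file.py

# 2. Extend the evaluation results with further stats
python3 ./experimental_setups/extend_evaluation_results_with_more_stats.py \
  ./evaluation_results/evaluation_results.csv

# 3. Aggregate the extended results into a Markdown overview
python3 ./experimental_setups/calculate_stats_from_evaluation_results.py \
  ./evaluation_results/evaluation_results_extended.csv

# 4. Add per-warning run_summaries to the experiment log folders
python3 ./experimental_setups/create_warning_summaries.py
```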
- Budget Control Strategy: Defines how the agent views the remaining cycles, suggested fixes, and minimum required fixes:
  - FULL-TRACK: Puts the maximum, consumed, and remaining budget in the prompt (default for our experiments).
  - NO-TRACK: Suppresses budget information.

  Example configuration:

  ```
  "budget_control": {
    "name": "FULL-TRACK",
    "params": {
      "#fixes": 4  // The agent should suggest at least 4 patches within the given budget; the number is updated based on agent progress (4 is the default).
    }
  }
  ```
- Repetition Handling: The default setting restricts repetitions.

  ```
  "repetition_handling": "RESTRICT",
  ```
- Cycle Limits: Control the maximum allowed cycles (budget) of the different sub-agents. Default for our experiments:

  ```
  "classification_cycles_limit": 20,
  "fix_cycles_limit": 40
  ```
- Threshold of cycles left after which write_fix is prioritized: Sets the number of cycles remaining before the cycle budget is exhausted at which the prompt is modified to force the agent to use write_fix. Default for our experiments:

  ```
  "prioritize_write_fix_cycle_threshold": 5
  ```
In the run_on_dataset.sh file, locate the line:

```
./run.sh --ai-settings agent_config_and_prompt_files/ai_settings.yaml --model-version gpt-4.1-mini-2025-04-14 -m json_file --experiment-file "$2"
```

Change the --model-version value to one of the following supported models:
- gpt-3.5-turbo-0125
- gpt-4-turbo-2024-04-09
- gpt-4o-mini-2024-07-18
- gpt-4o-2024-08-06
- gpt-4.1-nano-2025-04-14
- gpt-4.1-mini-2025-04-14
- gpt-4.1-2025-04-14
Reasoning models are not supported by the used OpenAI API version.
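For example, to run with gpt-4o-2024-08-06 instead, the line would read:

```
./run.sh --ai-settings agent_config_and_prompt_files/ai_settings.yaml --model-version gpt-4o-2024-08-06 -m json_file --experiment-file "$2"
```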
For our experiments, we utilized CodeCureAgent on a dataset of 1000 warnings, successfully creating plausible fixes for 968 of them.
The experiment input files are located in code_cure_agent/experimental_setups/evaluation_dataset.
Most relevant here is the file code_cure_agent/experimental_setups/evaluation_dataset/evaluation_dataset_filled_up_to_1000_input_file.csv, which is the input file to CodeCureAgent.
All log files from running the experiment on the 1000 warnings are located in code_cure_agent/evaluation_results/evaluation_outputs (split into multiple batches of experiment runs).
The extracted and aggregated evaluation results, as described in IV. Running and Evaluating your own Experiment: 3. Scripts for Evaluation above, are located in code_cure_agent/evaluation_results.
- Run CodeCureAgent on the code_cure_agent/experimental_setups/evaluation_dataset/evaluation_dataset_filled_up_to_1000_input_file.csv input file as described in III. Run CodeCureAgent.
- Post-process the created log files as described in IV. Running and Evaluating your own Experiment: 3. Scripts for Evaluation to receive aggregated results (a Markdown file and plots).
We ran Sorald on the same dataset of 1000 warnings; Sorald supports 62 of these warnings.
The scripts and results are found in comparative_study/sorald_comparison.
Refer to the dedicated README for more information: comparative_study/sorald_comparison/README.md.
We ran CORE on the same dataset of 1000 warnings.
The scripts and results are found in comparative_study/core_comparison.
Refer to the dedicated README for more information: comparative_study/core_comparison/README.md.
