LLM-Bisect

LLM-Bisect is an automated tool for identifying Bug-Inducing Commits (BICs) in the Linux kernel, combining Large Language Models (LLMs) with git-bisect-style history analysis.

Overview

When a security vulnerability is patched in the Linux kernel, understanding when the vulnerability was introduced is crucial for:

  • Determining which kernel versions are affected
  • Assessing the severity and impact of the vulnerability
  • Identifying patterns in how vulnerabilities are introduced

LLM-Bisect automates this process by:

  1. Analyzing the vulnerability patch to identify critical code changes
  2. Tracing the history of changed functions and lines using git blame
  3. Using LLMs to determine which historical commit introduced the vulnerability

Features

  • Critical Line Extraction: Automatically identifies the most security-relevant changes in a patch
  • Multi-strategy Candidate Finding: Uses both function-based and line-based history tracing
  • LLM-powered Analysis: Leverages models like GPT-4/o1 to understand code semantics
  • Accuracy Evaluation: Built-in tools for evaluating results against ground truth

Installation

# Clone the repository
git clone https://github.com/your-org/llm-bisect.git
cd llm-bisect

# Install dependencies
pip install -r requirements.txt

API Key Configuration

LLM-Bisect requires an OpenAI API key to function. You can configure it using one of the following methods:

Method 1: Environment Variable (Recommended for Production)

Set the OPENAI_API_KEY environment variable:

export OPENAI_API_KEY="sk-your-api-key-here"

Add this to your ~/.bashrc or ~/.zshrc to persist across sessions:

echo 'export OPENAI_API_KEY="sk-your-api-key-here"' >> ~/.bashrc
source ~/.bashrc

Method 2: .env File (Recommended for Development)

Create a .env file in the project root directory:

# .env
OPENAI_API_KEY=sk-your-api-key-here
LLM_BISECT_KERNEL_PATH=/path/to/linux/kernel
LLM_BISECT_MODEL=o1

The application automatically loads this file using python-dotenv.

Important: Add .env to your .gitignore to avoid committing secrets:

echo ".env" >> .gitignore

Method 3: Programmatic Configuration

Set the API key in your Python code:

import os
os.environ["OPENAI_API_KEY"] = "sk-your-api-key-here"

from llm_bisect import bisect_vulnerability
result = bisect_vulnerability("commit-hash")

Environment Variables Reference

Variable                Description                           Default
OPENAI_API_KEY          Your OpenAI API key (required)        -
LLM_BISECT_KERNEL_PATH  Path to the Linux kernel repository   /home/zzhan173/repos/linux
LLM_BISECT_STORE_PATH   Path to store intermediate results    ./cases/
LLM_BISECT_MODEL        Default LLM model to use              o1
LLM_BISECT_DEBUG        Enable debug logging                  false
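A settings loader following the table above might look like the sketch below: optional variables fall back to their defaults, while a missing OPENAI_API_KEY fails fast. The function name and error type are assumptions for illustration, not the actual Config implementation.

```python
import os

# Defaults taken from the environment-variable table above
DEFAULTS = {
    "LLM_BISECT_KERNEL_PATH": "/home/zzhan173/repos/linux",
    "LLM_BISECT_STORE_PATH": "./cases/",
    "LLM_BISECT_MODEL": "o1",
    "LLM_BISECT_DEBUG": "false",
}

def load_settings(environ=os.environ):
    """Resolve settings from the environment, applying documented defaults.

    Hypothetical helper -- the real tool uses its Config class.
    """
    if "OPENAI_API_KEY" not in environ:
        # The API key has no default; fail before any LLM call is attempted
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {key: environ.get(key, default) for key, default in DEFAULTS.items()}
```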

Getting an OpenAI API Key

  1. Go to OpenAI Platform
  2. Sign up or log in to your account
  3. Navigate to API Keys section
  4. Click Create new secret key
  5. Copy the key (it starts with sk-)

Note: The o1 model requires a paid OpenAI account with sufficient credits.

Quick Start

Single Commit Analysis

from llm_bisect import bisect_vulnerability

# Analyze a vulnerability patch commit
result = bisect_vulnerability("6ca575374dd9a507cdd16dfa0e78c2e9e20bd05f")

if result.success:
    print(f"Bug-inducing commit: {result.inducing_commit}")
else:
    print(f"Analysis failed: {result.error}")

Command Line Interface

# Analyze a single commit
python -m llm_bisect.cli bisect -c 6ca575374dd9a507cdd16dfa0e78c2e9e20bd05f

# Batch processing
python -m llm_bisect.cli bisect -i commits.json -o results.json

# Evaluate results
python -m llm_bisect.cli evaluate results.json -g ground_truth.json

Using the Main Module

# Run analysis
python -m llm_bisect.main 6ca575374dd9a507cdd16dfa0e78c2e9e20bd05f --model o1

# With verbose output
python -m llm_bisect.main 6ca575374dd9a507cdd16dfa0e78c2e9e20bd05f -v

Configuration

Configure LLM-Bisect using the Config class:

from llm_bisect import Config, set_config

config = Config()
config.paths.kernel_repo = "/path/to/linux"
config.llm.model = "o1"
config.llm.temperature = 0.0
config.bisect.max_candidates = 50

set_config(config)

Project Structure

llm_bisect/
├── __init__.py          # Package exports
├── config.py            # Configuration management
├── main.py              # Main entry point
├── cli.py               # Command-line interface
├── utils/               # Utility modules
│   ├── git_utils.py     # Git operations
│   ├── patch_parser.py  # Patch parsing
│   └── file_utils.py    # File I/O
├── llm/                 # LLM integration
│   ├── client.py        # API client
│   ├── prompts.py       # Prompt templates
│   └── analyzers.py     # Analysis functions
├── bisect/              # Bisection logic
│   ├── candidate_finder.py  # Candidate identification
│   └── bisector.py      # Main bisection algorithm
└── analysis/            # Code analysis
    ├── critical_lines.py    # Critical line extraction
    ├── function_parser.py   # C function parsing
    └── evaluator.py         # Accuracy evaluation

How It Works

1. Critical Line Extraction

LLM-Bisect first analyzes the patch to identify which code changes are most relevant to fixing the vulnerability:

from llm_bisect.analysis import extract_critical_lines

lines = extract_critical_lines(patch_commit, use_llm=True)
for line in lines:
    print(f"{line.filename}:{line.absolute_line}")

2. Candidate Commit Identification

Two strategies are used to find potential bug-inducing commits:

  • Function-based: Traces the history of functions that were modified
  • Line-based: Uses git blame to find when specific lines were introduced
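The line-based strategy can be sketched as a parse of `git blame --porcelain` output: every result line is preceded by a header beginning with the 40-hex commit hash and its line numbers, so mapping critical lines to the commits that last touched them is a one-pass scan. The function below is a sketch with invented names, not the real candidate finder.

```python
import re

# Header line of each `git blame --porcelain` entry:
# "<40-hex sha> <line-in-original> <line-in-result> [<group-size>]"
ENTRY_RE = re.compile(r"^([0-9a-f]{40}) \d+ (\d+)")

def candidate_commits(porcelain_output, critical_lines):
    """Map each critical line number to the commit that last touched it.

    `porcelain_output` is the text of `git blame --porcelain <file>` run on
    the pre-patch file; the distinct commits returned form the line-based
    candidate set. Sketch only -- not the real LLM-Bisect API.
    """
    by_line = {}
    for raw in porcelain_output.splitlines():
        m = ENTRY_RE.match(raw)
        if m:
            by_line[int(m.group(2))] = m.group(1)
    return {ln: by_line[ln] for ln in critical_lines if ln in by_line}
```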

3. LLM Analysis

The LLM evaluates each candidate commit to determine if it could have introduced the vulnerability:

from llm_bisect.llm import determine_if_commit_introduces_vulnerability

is_inducing = determine_if_commit_introduces_vulnerability(
    patch_commit, 
    candidate_commit
)

4. Result Selection

When multiple candidates remain, the LLM selects the most likely inducing commit by comparing them.
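One plausible shape for this comparison step is a single prompt that enumerates the surviving candidates and asks the model to pick one, plus a parser that maps the reply back to a commit hash. Both functions below are hypothetical; the project's actual templates live in llm/prompts.py.

```python
import re

def build_selection_prompt(patch_summary, candidates):
    """Assemble a comparison prompt over the remaining candidates.

    `candidates` is a list of (sha, subject) pairs. Hypothetical prompt
    shape, not the real LLM-Bisect template.
    """
    body = "\n".join(
        f"{i + 1}. {sha}: {subject}"
        for i, (sha, subject) in enumerate(candidates)
    )
    return (
        "A patch fixes this vulnerability:\n"
        f"{patch_summary}\n\n"
        "Which ONE of these commits most likely introduced it?\n"
        f"{body}\n\n"
        "Reply with 'Answer: <number>'."
    )

def parse_selection(reply, candidates):
    """Map the model's 'Answer: N' reply back to a candidate sha."""
    m = re.search(r"Answer:\s*(\d+)", reply)
    if not m:
        return None
    idx = int(m.group(1)) - 1
    return candidates[idx][0] if 0 <= idx < len(candidates) else None
```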

API Reference

Core Functions

bisect_vulnerability(patch_commit: str) -> BisectResult

Main function to find the bug-inducing commit for a given patch.

extract_critical_lines(commit: str, use_llm: bool = True) -> List[CriticalLine]

Extract security-relevant lines from a patch.

evaluate_bisect_accuracy(results_path: str, ground_truth_path: str) -> EvaluationResult

Evaluate bisect results against ground truth.

Classes

Bisector

Main class for performing bisection analysis.

Config

Configuration container with sections for paths, LLM settings, and bisect parameters.

BisectResult

Result container with inducing_commit, candidates, success, and error fields.

Accuracy Evaluation

LLM-Bisect includes tools for evaluating accuracy:

from llm_bisect.analysis import Evaluator

evaluator = Evaluator(ground_truth_path="ground_truth.json")
result = evaluator.evaluate_commit_accuracy(my_results)

print(f"Accuracy: {result.accuracy:.2%}")
print(f"Correct: {result.correct}/{result.total}")
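The accuracy figure above reduces to a simple comparison once results and ground truth share a key. The sketch below assumes both are mappings from patch-commit hash to inducing-commit hash; the evaluator's actual on-disk JSON layout may differ.

```python
def commit_accuracy(results, ground_truth):
    """Fraction of evaluated cases where the predicted BIC matches ground truth.

    Assumed shape: both arguments map patch-commit hash -> inducing-commit
    hash. Illustrative only -- not the real Evaluator implementation.
    """
    scored = {patch for patch in results if patch in ground_truth}
    if not scored:
        return 0.0
    correct = sum(1 for p in scored if results[p] == ground_truth[p])
    return correct / len(scored)
```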

Requirements

  • Python 3.8+
  • Git
  • Access to a Linux kernel repository
  • OpenAI API key (or compatible LLM API)

License

MIT License

Citation

If you use LLM-Bisect in your research, please cite:

@software{llm_bisect,
  title = {LLM-Bisect: Automated Bug-Inducing Commit Identification},
  year = {2024},
  url = {https://github.com/your-org/llm-bisect}
}

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.
