LLM-Bisect is an automated tool for identifying Bug-Inducing Commits (BICs) in the Linux kernel using Large Language Models (LLMs) combined with git bisect techniques.
When a security vulnerability is patched in the Linux kernel, understanding when the vulnerability was introduced is crucial for:
- Determining which kernel versions are affected
- Assessing the severity and impact of the vulnerability
- Identifying patterns in how vulnerabilities are introduced
LLM-Bisect automates this process by:
- Analyzing the vulnerability patch to identify critical code changes
- Tracing the history of changed functions and lines using git blame
- Using LLMs to determine which historical commit introduced the vulnerability
- Critical Line Extraction: Automatically identifies the most security-relevant changes in a patch
- Multi-strategy Candidate Finding: Uses both function-based and line-based history tracing
- LLM-powered Analysis: Leverages models like GPT-4/o1 to understand code semantics
- Accuracy Evaluation: Built-in tools for evaluating results against ground truth
```bash
# Clone the repository
git clone https://github.com/your-org/llm-bisect.git
cd llm-bisect

# Install dependencies
pip install -r requirements.txt
```

LLM-Bisect requires an OpenAI API key to function. You can configure it using one of the following methods:
Set the `OPENAI_API_KEY` environment variable:

```bash
export OPENAI_API_KEY="sk-your-api-key-here"
```

Add this to your `~/.bashrc` or `~/.zshrc` to persist across sessions:

```bash
echo 'export OPENAI_API_KEY="sk-your-api-key-here"' >> ~/.bashrc
source ~/.bashrc
```

Create a `.env` file in the project root directory:
```bash
# .env
OPENAI_API_KEY=sk-your-api-key-here
LLM_BISECT_KERNEL_PATH=/path/to/linux/kernel
LLM_BISECT_MODEL=o1
```

The application automatically loads this file using python-dotenv.

Important: Add `.env` to your `.gitignore` to avoid committing secrets:

```bash
echo ".env" >> .gitignore
```

Set the API key in your Python code:
```python
import os
os.environ["OPENAI_API_KEY"] = "sk-your-api-key-here"

from llm_bisect import bisect_vulnerability
result = bisect_vulnerability("commit-hash")
```

| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key (required) | - |
| `LLM_BISECT_KERNEL_PATH` | Path to Linux kernel repository | `/home/zzhan173/repos/linux` |
| `LLM_BISECT_STORE_PATH` | Path to store intermediate results | `./cases/` |
| `LLM_BISECT_MODEL` | Default LLM model to use | `o1` |
| `LLM_BISECT_DEBUG` | Enable debug logging | `false` |
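These variables can be read with standard-library calls alone; the sketch below shows one way to consume them with the defaults from the table (the helper name `load_env_config` is illustrative, not part of the package):

```python
import os

def load_env_config():
    """Read LLM-Bisect settings from the environment, falling back to the documented defaults."""
    return {
        "kernel_path": os.getenv("LLM_BISECT_KERNEL_PATH", "/home/zzhan173/repos/linux"),
        "store_path": os.getenv("LLM_BISECT_STORE_PATH", "./cases/"),
        "model": os.getenv("LLM_BISECT_MODEL", "o1"),
        # Treat any case variant of "true" as enabled
        "debug": os.getenv("LLM_BISECT_DEBUG", "false").lower() == "true",
    }

config = load_env_config()
print(config["model"])
```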
- Go to the OpenAI Platform
- Sign up or log in to your account
- Navigate to the API Keys section
- Click Create new secret key
- Copy the key (it starts with `sk-`)
Note: The o1 model requires a paid OpenAI account with sufficient credits.
```python
from llm_bisect import bisect_vulnerability

# Analyze a vulnerability patch commit
result = bisect_vulnerability("6ca575374dd9a507cdd16dfa0e78c2e9e20bd05f")

if result.success:
    print(f"Bug-inducing commit: {result.inducing_commit}")
else:
    print(f"Analysis failed: {result.error}")
```

```bash
# Analyze a single commit
python -m llm_bisect.cli bisect -c 6ca575374dd9a507cdd16dfa0e78c2e9e20bd05f

# Batch processing
python -m llm_bisect.cli bisect -i commits.json -o results.json

# Evaluate results
python -m llm_bisect.cli evaluate results.json -g ground_truth.json
```

```bash
# Run analysis
python -m llm_bisect.main 6ca575374dd9a507cdd16dfa0e78c2e9e20bd05f --model o1

# With verbose output
python -m llm_bisect.main 6ca575374dd9a507cdd16dfa0e78c2e9e20bd05f -v
```

Configure LLM-Bisect using the `Config` class:
```python
from llm_bisect import Config, set_config

config = Config()
config.paths.kernel_repo = "/path/to/linux"
config.llm.model = "o1"
config.llm.temperature = 0.0
config.bisect.max_candidates = 50
set_config(config)
```

```
llm_bisect/
├── __init__.py             # Package exports
├── config.py               # Configuration management
├── main.py                 # Main entry point
├── cli.py                  # Command-line interface
├── utils/                  # Utility modules
│   ├── git_utils.py        # Git operations
│   ├── patch_parser.py     # Patch parsing
│   └── file_utils.py       # File I/O
├── llm/                    # LLM integration
│   ├── client.py           # API client
│   ├── prompts.py          # Prompt templates
│   └── analyzers.py        # Analysis functions
├── bisect/                 # Bisection logic
│   ├── candidate_finder.py # Candidate identification
│   └── bisector.py         # Main bisection algorithm
└── analysis/               # Code analysis
    ├── critical_lines.py   # Critical line extraction
    ├── function_parser.py  # C function parsing
    └── evaluator.py        # Accuracy evaluation
```
LLM-Bisect first analyzes the patch to identify which code changes are most relevant to fixing the vulnerability:

```python
from llm_bisect.analysis import extract_critical_lines

lines = extract_critical_lines(patch_commit, use_llm=True)
for line in lines:
    print(f"{line.filename}:{line.absolute_line}")
```

Two strategies are used to find potential bug-inducing commits:
- Function-based: Traces the history of functions that were modified
- Line-based: Uses git blame to find when specific lines were introduced
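To illustrate the line-based strategy, the sketch below parses `git blame --porcelain` output to recover which commit last touched each line. The parser and sample are hypothetical, not the package's actual implementation:

```python
import re

# Abridged sample of `git blame --porcelain` output for a two-line file.
# Each group starts with "<sha> <orig-line> <final-line> [count]"; the
# file content itself is on tab-prefixed lines.
SAMPLE = """\
1111111111111111111111111111111111111111 1 1 1
author Alice
\tint len = strlen(buf);
2222222222222222222222222222222222222222 2 2 1
author Bob
\tmemcpy(dst, buf, len);
"""

def blame_commits(porcelain):
    """Return the blamed commit hash for each content line, in file order."""
    commits = []
    current = None
    for line in porcelain.splitlines():
        m = re.match(r"^([0-9a-f]{40}) \d+ \d+", line)
        if m:
            current = m.group(1)      # header names the commit for the next content line
        elif line.startswith("\t"):
            commits.append(current)   # tab-prefixed lines are the actual file content
    return commits

print(blame_commits(SAMPLE))
```

In the real tool, these hashes would become the candidate set that the LLM then evaluates.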
The LLM evaluates each candidate commit to determine whether it could have introduced the vulnerability:

```python
from llm_bisect.llm import determine_if_commit_introduces_vulnerability

is_inducing = determine_if_commit_introduces_vulnerability(
    patch_commit,
    candidate_commit
)
```

When multiple candidates remain, the LLM selects the most likely inducing commit by comparing them.
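One way to realize that comparison step is to assemble a single prompt listing the patch and every remaining candidate, then ask the model to pick one. The function name and prompt wording below are illustrative, not the package's actual prompts:

```python
def build_comparison_prompt(patch_diff, candidates):
    """Assemble a prompt asking the model to choose the most likely bug-inducing commit.

    `candidates` maps commit hash -> that commit's diff text.
    """
    parts = [
        "The following patch fixes a security vulnerability:",
        patch_diff,
        "Which of these candidate commits most likely introduced the bug?",
    ]
    for sha, diff in candidates.items():
        parts.append(f"--- candidate {sha} ---\n{diff}")
    parts.append("Answer with the commit hash only.")
    return "\n\n".join(parts)

prompt = build_comparison_prompt(
    "- memcpy(dst, src, len);\n+ memcpy(dst, src, min(len, dst_size));",
    {"abc123": "+ memcpy(dst, src, len);"},
)
print(prompt.splitlines()[0])
```

Constraining the answer format ("the commit hash only") keeps the response trivially machine-parseable.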
- `bisect_vulnerability()` — Main function to find the bug-inducing commit for a given patch.
- `extract_critical_lines()` — Extract security-relevant lines from a patch.
- `Evaluator` — Evaluate bisect results against ground truth.
- `Bisector` — Main class for performing bisection analysis.
- `Config` — Configuration container with sections for paths, LLM settings, and bisect parameters.
- Result object — Container with `inducing_commit`, `candidates`, `success`, and `error` fields.
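The result container's described fields suggest a simple record type; a minimal sketch follows (the class name `BisectResult` is an assumption, not confirmed by the source):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BisectResult:
    """Outcome of a bisection run, mirroring the fields described above."""
    success: bool
    inducing_commit: Optional[str] = None                # hash of the bug-inducing commit, if found
    candidates: List[str] = field(default_factory=list)  # commits that were considered
    error: Optional[str] = None                          # failure reason when success is False

result = BisectResult(success=True, inducing_commit="abc123", candidates=["abc123", "def456"])
print(result.inducing_commit)
```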
LLM-Bisect includes tools for evaluating accuracy:

```python
from llm_bisect.analysis import Evaluator

evaluator = Evaluator(ground_truth_path="ground_truth.json")
result = evaluator.evaluate_commit_accuracy(my_results)
print(f"Accuracy: {result.accuracy:.2%}")
print(f"Correct: {result.correct}/{result.total}")
```

- Python 3.8+
- Git
- Access to a Linux kernel repository
- OpenAI API key (or compatible LLM API)
MIT License
If you use LLM-Bisect in your research, please cite:

```bibtex
@software{llm_bisect,
  title = {LLM-Bisect: Automated Bug-Inducing Commit Identification},
  year = {2024},
  url = {https://github.com/your-org/llm-bisect}
}
```

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.