HintPilot: LLM-based Compiler Hint Synthesis for Code Optimization

HintPilot is the research code release for the paper "HintPilot: LLM-based Compiler Hint Synthesis for Code Optimization". Instead of asking an LLM to rewrite source code directly, the project has the LLM synthesize compiler hints such as attributes and pragmas, grounds them in compiler documentation, and validates them with execution feedback.
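
To make "compiler hints" concrete, here is a minimal sketch of inserting an attribute and a pragma above chosen lines of a C function. It is not the repository's insertion backend (that lives under constructor/hints/); the `Hint` class, `apply_hints` function, and the example plan are hypothetical names introduced for illustration. The two hints shown, `__attribute__((hot))` and `#pragma GCC unroll 4`, are standard GCC hints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    line: int  # 1-based line in the original source that the hint precedes
    text: str  # hint text, e.g. "#pragma GCC unroll 4"


def apply_hints(source: str, hints: list[Hint]) -> str:
    """Insert each hint on its own line directly above its target line."""
    lines = source.splitlines()
    original_length = len(lines)
    # Work bottom-up so earlier insertions do not shift later targets.
    for hint in sorted(hints, key=lambda h: h.line, reverse=True):
        if not 1 <= hint.line <= original_length:
            raise ValueError(f"line {hint.line} is outside the source")
        target = lines[hint.line - 1]
        # Reuse the target line's indentation for the inserted hint.
        indent = target[: len(target) - len(target.lstrip())]
        lines.insert(hint.line - 1, indent + hint.text)
    return "\n".join(lines) + "\n"


C_SOURCE = """\
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}
"""

# Hypothetical hint plan: mark the function hot and unroll its loop.
PLAN = [
    Hint(line=1, text="__attribute__((hot))"),
    Hint(line=3, text="#pragma GCC unroll 4"),
]

if __name__ == "__main__":
    print(apply_hints(C_SOURCE, PLAN), end="")
```

The source logic is untouched; only annotations are added, which is what separates this interface from direct code rewriting.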

The repository follows the paper pipeline:

1. Build a knowledge base of semantics-preserving compiler hints from compiler documentation.
2. Extract structured insertion sites from source code.
3. Retrieve relevant hint descriptions and examples with RAG.
4. Generate candidate hint plans with an LLM.
5. Compile, profile, and iteratively refine the program using execution feedback.
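
The final step, execution-guided refinement, can be sketched as a loop that keeps the fastest candidate that still builds and passes its tests. This is a simplified illustration under stated assumptions, not the logic in agent/agent_parallel.py: `refine`, `propose`, and `measure` are hypothetical names, and the feedback strings stand in for whatever the real pipeline sends back to the LLM.

```python
from typing import Callable, Optional


def refine(
    baseline: float,
    propose: Callable[[str], str],
    measure: Callable[[str], Optional[float]],
    rounds: int = 3,
) -> tuple[Optional[str], float]:
    """Keep the fastest candidate that builds and passes its tests.

    propose(feedback) returns a candidate (here: annotated source text).
    measure(candidate) returns a runtime in seconds, or None when the
    candidate fails to compile or fails its tests.
    """
    best, best_time = None, baseline
    feedback = "initial attempt"
    for _ in range(rounds):
        candidate = propose(feedback)
        runtime = measure(candidate)
        if runtime is None:
            feedback = "candidate failed to build or failed tests"
        elif runtime < best_time:
            best, best_time = candidate, runtime
            feedback = f"improved to {runtime:.3f}s; try further hints"
        else:
            feedback = f"no speedup ({runtime:.3f}s vs {best_time:.3f}s)"
    return best, best_time
```

If no candidate beats the baseline, the function returns `None` with the baseline time, so the unmodified program remains the result.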

What Is Included

Core hint synthesis and refinement pipeline under agent/
Hint insertion, validation, and dataset construction utilities under constructor/
The compiler-hint knowledge base under knowledge/
Parser helpers under tools/

What Is Not Included

Large benchmark archives and local experiment outputs.
API keys and provider-specific credentials.

Repository Layout

agent/
   evaluation/      Benchmark runners for HumanEval-CPP and PolyBench tasks
   llm/             Prompts for hint generation and test-case generation
      clients/      Online, LangChain, and local vLLM client integrations
   preprocessing/   Dataset preprocessing utilities
   tools/           Parsing, construction, profiling, and verification helpers

constructor/
   hints/           Hint insertion backends for GCC and Clang
   validate/        Validation helpers including Alive2-based checks

knowledge/
   gcc_final/       Hint documentation and examples used by RAG
   */               Structured hint metadata grouped by scope

tools/
   parser/          Compiler-parser support code and parser binaries

Requirements

Linux
Python 3.10+
GCC or Clang toolchain
Optional: LLVM/Alive2 for semantic validation
Optional: vLLM for local model serving

Setup

Create an environment and install the Python dependencies declared in pyproject.toml:

python -m venv .venv
source .venv/bin/activate
pip install -e .

Most scripts expect the repository root and agent/ to be on PYTHONPATH:

export PYTHONPATH="$PWD:$PWD/agent:$PYTHONPATH"
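
When working from a notebook or another Python process where the shell export is not in effect, the same two entries can be added to `sys.path` directly. A minimal sketch, assuming the layout described above; `add_repo_paths` is a hypothetical helper, not part of the repository.

```python
import sys
from pathlib import Path


def add_repo_paths(repo_root: str) -> list[str]:
    """Put the repository root and its agent/ directory on sys.path.

    Returns the entries that were actually added.
    """
    root = Path(repo_root).resolve()
    wanted = [str(root), str(root / "agent")]
    added = [p for p in wanted if p not in sys.path]
    # Prepend in reverse so the final order matches the export above:
    # repository root first, then agent/.
    for p in reversed(added):
        sys.path.insert(0, p)
    return added
```

Calling `add_repo_paths(".")` from the repository root mirrors the export; calling it a second time adds nothing.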

If you use an online LLM provider, copy .env.example to .env and set the relevant API variables.

Quick Start

1. Run the Main Optimization Pipeline on One File

python agent/run.py \
   --src agent/examples/test.c \
   --out /tmp/test.attr.c \
   --model qwen3-coder-plus \
   --test_key cpp_stress_test
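
To run the same command over several source files, a small wrapper can build the argument list for each one. This is a sketch that uses only the flags shown above; `run_cmd` and `run_all` are hypothetical helpers, and the `<name>.attr<ext>` output naming is an assumption modeled on the `/tmp/test.attr.c` example.

```python
import subprocess
import sys
from pathlib import Path


def run_cmd(src, out_dir, model="qwen3-coder-plus", test_key="cpp_stress_test"):
    """Build the agent/run.py argument list for one source file."""
    src = Path(src)
    # Assumed naming scheme: test.c -> test.attr.c in out_dir.
    out = Path(out_dir) / f"{src.stem}.attr{src.suffix}"
    return [
        sys.executable, "agent/run.py",
        "--src", str(src),
        "--out", str(out),
        "--model", model,
        "--test_key", test_key,
    ]


def run_all(sources, out_dir):
    """Run the pipeline on each file in turn, stopping at the first failure."""
    for src in sources:
        subprocess.run(run_cmd(src, out_dir), check=True)
```

`run_all` must be called from the repository root, since the script path is relative.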

2. Preprocess HumanEval-CPP

python agent/preprocessing/process_humaneval_cpp.py \
   --data-path data/humaneval-cpp.jsonl \
   --out-name humaneval_cpp \
   --test-key cpp_stress_test

3. Run Batch Evaluation

python agent/evaluation/batch_evaluation.py \
   --client online \
   --suite humaneval \
   --results-dir ./outputs \
   --model-name qwen3-coder-plus \
   --model-path qwen3-coder-plus \
   --opt 3 \
   --data-path ./agent/preprocessing/data/humaneval_cpp_pure_processed.json \
   --dataset-name HumanEval_CPP \
   --test-key __HUMANEVAL_CPP__

For PolyBench, preprocess the benchmark first, then pass --suite polybench together with --polybench-src-root.

Environment Variables

Common variables used across the repository:

API_KEY, BASE_URL: online model backend used by OnlineClient
OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL: OpenAI-compatible endpoints
DEEPSEEK_API_KEY, DEEPSEEK_API_BASE, DEEPSEEK_MODEL: LangChain RAG backend
VLLM_OPENAI_BASE_URL, VLLM_DEVICE, VLLM_TARGET_DEVICE: local or served vLLM configuration
RAG_KNOWLEDGE, RAG_TOPK, RAG_RETRIEVER_K: retrieval configuration
TOOL_GCC_PATH: optional override for the parser binary used by preprocessing helpers
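
Before a long run, it can help to check that the variables for the chosen backend are set. The sketch below groups the variable names listed above; the grouping keys and the `missing_vars` helper are hypothetical, and which variables a given script actually requires is an assumption that should be checked against the code.

```python
import os
from typing import Mapping

# Variable names copied from the list above. The backend labels are
# illustrative, not identifiers used by the repository.
BACKENDS = {
    "online": ("API_KEY", "BASE_URL"),
    "openai": ("OPENAI_API_KEY", "OPENAI_BASE_URL", "OPENAI_MODEL"),
    "deepseek": ("DEEPSEEK_API_KEY", "DEEPSEEK_API_BASE", "DEEPSEEK_MODEL"),
}


def missing_vars(backend: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the variables for `backend` that are unset or empty in `env`."""
    return [name for name in BACKENDS[backend] if not env.get(name)]


if __name__ == "__main__":
    for backend in BACKENDS:
        missing = missing_vars(backend)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{backend}: {status}")
```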

Reproducing the Paper

The code maps to the paper as follows:

Knowledge base construction: knowledge/ and the retrieval helpers in agent/llm/clients/langchain_client.py
Structured context extraction: parser integration in agent/tools/parser.py
RAG-based hint synthesis: agent/rag_runner.py and prompt generation in agent/llm/
Execution-guided self-refinement: agent/agent_parallel.py, agent/tools/construct_tool.py, and agent/tools/profiler.py
Evaluation: agent/evaluation/
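
As an illustration of the retrieval step in this mapping, the toy sketch below ranks hint descriptions by keyword overlap with a query and returns the top k. It deliberately swaps in plain keyword matching for whatever retriever agent/llm/clients/langchain_client.py uses; `tokenize`, `top_k`, the stopword list, and the three sample descriptions are all made up for the example.

```python
import re

STOPWORDS = {"a", "the", "be", "that", "of", "as", "to", "with"}


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens with common stopwords removed."""
    return set(re.findall(r"[a-z0-9_]+", text.lower())) - STOPWORDS


def top_k(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k hint names whose descriptions share words with the query."""
    q = tokenize(query)
    scored = []
    for name, description in docs.items():
        overlap = len(q & tokenize(description))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically by name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]


# Paraphrased one-line descriptions of three GCC hints (sample data only).
DOCS = {
    "hot": "function attribute marking a function as a hot spot of the program",
    "cold": "function attribute marking a function as unlikely to be executed",
    "unroll": "pragma requesting that the following loop be unrolled n times",
}

if __name__ == "__main__":
    print(top_k("loop with a fixed trip count that could be unrolled", DOCS))
```

The retrieved descriptions would then be placed in the prompt so that the LLM proposes hints that exist in the compiler's documentation.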

Notes For Open-Source Use

This is a research codebase, not a production-hardened package.
Some scripts still assume that external compiler tooling and benchmark datasets have already been prepared locally.
Large experimental outputs should be written outside the repository or into ignored directories such as results/.

Citation

If you use this repository, please cite the HintPilot paper.

@misc{jiang2026hintpilotllmbasedcompilerhint,
      title={HintPilot: LLM-based Compiler Hint Synthesis for Code Optimization}, 
      author={Hanyun Jiang and Peisen Yao and Kaiyue Li and Tingting Lin and Chengpeng Wang and Kui Ren},
      year={2026},
      eprint={2604.15041},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2604.15041}, 
}
