HintPilot is the research code release for the paper "HintPilot: LLM-based Compiler Hint Synthesis for Code Optimization". The project studies a different optimization interface from direct code rewriting: instead of asking an LLM to rewrite source code, it synthesizes compiler hints such as attributes and pragmas, grounds them with compiler documentation, and validates them with execution feedback.
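A compiler hint steers the optimizer without changing what the program computes. As a minimal illustration, and not HintPilot's own insertion backend (that lives under constructor/hints/), the hypothetical helper insert_loop_pragma places GCC's loop-unrolling pragma (#pragma GCC unroll, available since GCC 8) before each for loop it finds with a regex. A real tool should use parser-extracted insertion sites instead, because a regex can be fooled by comments, strings, and macros.

```python
import re


def insert_loop_pragma(c_source: str, pragma: str = "#pragma GCC unroll 4") -> str:
    """Insert `pragma` on its own line before every `for` loop header.

    Hypothetical helper for illustration only: loops are located with a regex,
    not with a C parser.
    """
    out_lines = []
    for line in c_source.splitlines():
        match = re.match(r"^(\s*)for\s*\(", line)
        if match:
            # Reuse the loop's indentation so the hinted source stays readable.
            out_lines.append(f"{match.group(1)}{pragma}")
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


src = """int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
"""
print(insert_loop_pragma(src))
```

Because the pragma only affects code generation, deleting the inserted lines recovers the original program, which is what makes this kind of edit safer than a free-form rewrite.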
The repository follows the paper pipeline:
- Build a knowledge base of semantics-preserving compiler hints from compiler documentation.
- Extract structured insertion sites from source code.
- Retrieve relevant hint descriptions and examples with RAG.
- Generate candidate hint plans with an LLM.
- Compile, profile, and iteratively refine the program using execution feedback.
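The last step can be pictured as a propose-evaluate loop. The sketch below is a simplified, hypothetical version of that idea, not the code in agent/agent_parallel.py: propose stands in for the LLM call and receives all feedback gathered so far, and evaluate stands in for compiling, running tests, and profiling. Candidates that fail to build or change behaviour are discarded, and the fastest remaining one is kept only if it beats the baseline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    plan: str    # human-readable description of which hints go where
    source: str  # program text with the hints applied


@dataclass
class Outcome:
    compiled: bool
    tests_passed: bool
    runtime_s: float


def refine(
    propose: Callable[[List[Tuple[str, Outcome]]], List[Candidate]],
    evaluate: Callable[[Candidate], Outcome],
    baseline_runtime_s: float,
    rounds: int = 3,
) -> Tuple[Optional[Candidate], float]:
    """Return the fastest candidate that compiles, passes tests, and beats the baseline.

    `propose` and `evaluate` are hypothetical hooks for the LLM and the
    compile/test/profile harness respectively.
    """
    best: Optional[Candidate] = None
    best_time = baseline_runtime_s
    feedback: List[Tuple[str, Outcome]] = []
    for _ in range(rounds):
        for cand in propose(feedback):
            outcome = evaluate(cand)
            feedback.append((cand.plan, outcome))  # visible to later proposals
            # A hint plan must not break the build or change observable behaviour.
            if not (outcome.compiled and outcome.tests_passed):
                continue
            if outcome.runtime_s < best_time:
                best, best_time = cand, outcome.runtime_s
    return best, best_time


# Toy usage with canned outcomes instead of a real compiler and profiler.
FAKE_OUTCOMES = {
    "unroll inner loop": Outcome(compiled=True, tests_passed=True, runtime_s=0.80),
    "plan that fails to compile": Outcome(compiled=False, tests_passed=False, runtime_s=0.0),
    "plan that changes output": Outcome(compiled=True, tests_passed=False, runtime_s=0.50),
}


def propose(feedback):
    return [Candidate(plan, source="") for plan in FAKE_OUTCOMES]


best, t = refine(propose, lambda c: FAKE_OUTCOMES[c.plan], baseline_runtime_s=1.0, rounds=1)
print(best.plan, t)
```

The "plan that changes output" entry is faster on paper but is still rejected, which reflects the design choice of treating test failures as hard constraints rather than trading correctness for speed.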
The repository includes:
- Core hint synthesis and refinement pipeline under agent/
- Hint insertion, validation, and dataset construction utilities under constructor/
- The compiler-hint knowledge base under knowledge/
- Parser helpers under tools/
The repository does not include:
- Large benchmark archives and local experiment outputs.
- API keys and provider-specific credentials.
agent/
  evaluation/       Benchmark runners for HumanEval-CPP and PolyBench tasks
  llm/              Prompts for hint generation and test-case generation
    clients/        Online, LangChain, and local vLLM client integrations
  preprocessing/    Dataset preprocessing utilities
  tools/            Parsing, construction, profiling, and verification helpers
constructor/
  hints/            Hint insertion backends for GCC and Clang
  validate/         Validation helpers, including Alive2-based checks
knowledge/
  gcc_final/        Hint documentation and examples used by RAG
  */                Structured hint metadata grouped by scope
tools/
  parser/           Compiler-parser support code and parser binaries
- Linux
- Python 3.10+
- GCC or Clang toolchain
- Optional: LLVM/Alive2 for semantic validation
- Optional: vLLM for local model serving
Create an environment and install the Python dependencies declared in pyproject.toml:
python -m venv .venv
source .venv/bin/activate
pip install -e .
Most scripts expect the repository root and agent/ to be on PYTHONPATH:
export PYTHONPATH="$PWD:$PWD/agent:$PYTHONPATH"
If you use an online LLM provider, copy .env.example to .env and set the relevant API variables.
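If you want to load .env from your own driver script without adding a dependency such as python-dotenv, a minimal loader for simple KEY=VALUE lines could look like the sketch below. The helper name load_dotenv_file is hypothetical, and the repository's scripts may read .env differently; multi-line values and variable interpolation are deliberately not supported.

```python
import os
from pathlib import Path


def load_dotenv_file(path: str = ".env", override: bool = False) -> dict:
    """Parse simple KEY=VALUE lines and export them to os.environ (hypothetical helper).

    Blank lines and '#' comments are skipped, an optional leading 'export ' is
    accepted, and one pair of matching quotes around the value is stripped.
    Existing variables are kept unless override=True. Returns the parsed pairs.
    """
    parsed = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Not overriding existing variables by default lets a value exported in the shell take precedence over the file, which matches the usual convention for .env loaders.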
To synthesize hints for a single source file:
python agent/run.py \
  --src agent/examples/test.c \
  --out /tmp/test.attr.c \
  --model qwen3-coder-plus \
  --test_key cpp_stress_test
To preprocess the HumanEval-CPP dataset:
python agent/preprocessing/process_humaneval_cpp.py \
  --data-path data/humaneval-cpp.jsonl \
  --out-name humaneval_cpp \
  --test-key cpp_stress_test
To run the batch evaluation on the processed HumanEval-CPP data:
python agent/evaluation/batch_evaluation.py \
  --client online \
  --suite humaneval \
  --results-dir ./outputs \
  --model-name qwen3-coder-plus \
  --model-path qwen3-coder-plus \
  --opt 3 \
  --data-path ./agent/preprocessing/data/humaneval_cpp_pure_processed.json \
  --dataset-name HumanEval_CPP \
  --test-key __HUMANEVAL_CPP__
For PolyBench, preprocess the benchmark first, then pass --suite polybench together with --polybench-src-root.
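When sweeping several models or optimization levels, it can help to build the batch evaluation command programmatically. The sketch below only assembles the flags documented above into an argv list; batch_eval_cmd is a hypothetical helper, and the check that PolyBench runs carry --polybench-src-root mirrors the note above rather than the script's own validation.

```python
import shlex
from typing import List, Optional


def batch_eval_cmd(
    suite: str,
    model_name: str,
    data_path: str,
    dataset_name: str,
    test_key: str,
    model_path: Optional[str] = None,
    results_dir: str = "./outputs",
    opt_level: int = 3,
    client: str = "online",
    polybench_src_root: Optional[str] = None,
) -> List[str]:
    """Build an argv list for agent/evaluation/batch_evaluation.py (hypothetical helper)."""
    cmd = [
        "python", "agent/evaluation/batch_evaluation.py",
        "--client", client,
        "--suite", suite,
        "--results-dir", results_dir,
        "--model-name", model_name,
        # The online example above passes the same value for name and path.
        "--model-path", model_path or model_name,
        "--opt", str(opt_level),
        "--data-path", data_path,
        "--dataset-name", dataset_name,
        "--test-key", test_key,
    ]
    if suite == "polybench":
        if not polybench_src_root:
            raise ValueError("--suite polybench also requires --polybench-src-root")
        cmd += ["--polybench-src-root", polybench_src_root]
    return cmd


cmd = batch_eval_cmd(
    suite="humaneval",
    model_name="qwen3-coder-plus",
    data_path="./agent/preprocessing/data/humaneval_cpp_pure_processed.json",
    dataset_name="HumanEval_CPP",
    test_key="__HUMANEVAL_CPP__",
)
print(shlex.join(cmd))  # or run it with subprocess.run(cmd, check=True)
```

Passing the list directly to subprocess.run avoids shell quoting problems with paths that contain spaces; shlex.join is only used to print a copy-pasteable command.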
Common variables used across the repository:
- API_KEY, BASE_URL: online model backend used by OnlineClient
- OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL: OpenAI-compatible endpoints
- DEEPSEEK_API_KEY, DEEPSEEK_API_BASE, DEEPSEEK_MODEL: LangChain RAG backend
- VLLM_OPENAI_BASE_URL, VLLM_DEVICE, VLLM_TARGET_DEVICE: local or served vLLM configuration
- RAG_KNOWLEDGE, RAG_TOPK, RAG_RETRIEVER_K: retrieval configuration
- TOOL_GCC_PATH: optional override for the parser binary used by preprocessing helpers
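Numeric variables such as RAG_TOPK are easy to mistype, so a driver script may want to read them once and fail early. The sketch below is a hypothetical reader for the three retrieval variables; the RagSettings class, the default values, and the interpretation of RAG_KNOWLEDGE as a knowledge-source location are assumptions, not the repository's configuration code.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class RagSettings:
    knowledge: Optional[str]  # RAG_KNOWLEDGE (assumed to name a knowledge source)
    topk: int                 # RAG_TOPK
    retriever_k: int          # RAG_RETRIEVER_K


def _int_env(name: str, default: int) -> int:
    """Read an integer environment variable, falling back to `default` if unset or empty."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError as exc:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from exc


def rag_settings_from_env(default_topk: int = 5, default_retriever_k: int = 20) -> RagSettings:
    """Collect the retrieval variables; the defaults are placeholders, not the repo's."""
    return RagSettings(
        knowledge=os.environ.get("RAG_KNOWLEDGE"),
        topk=_int_env("RAG_TOPK", default_topk),
        retriever_k=_int_env("RAG_RETRIEVER_K", default_retriever_k),
    )
```

Raising with the variable name in the message turns a silent misconfiguration into an immediate, self-explaining error before any LLM calls are made.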
The code maps to the paper as follows:
- Knowledge base construction: knowledge/ and the retrieval helpers in agent/llm/clients/langchain_client.py
- Structured context extraction: parser integration in agent/tools/parser.py
- RAG-based hint synthesis: agent/rag_runner.py and prompt generation in agent/llm/
- Execution-guided self-refinement: agent/agent_parallel.py, agent/tools/construct_tool.py, and agent/tools/profiler.py
- Evaluation: agent/evaluation/
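To make the retrieval step concrete, the sketch below ranks a few hint descriptions against a query describing an insertion site. It is a dependency-free stand-in using bag-of-words cosine similarity, not the LangChain retriever the repository uses, and the hypothetical HINT_DOCS entries are short paraphrases of the GCC manual rather than content read from knowledge/gcc_final/.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def _tokens(text: str) -> Counter:
    """Lower-case word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_hints(query: str, docs: Dict[str, str], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k hint names whose descriptions are most similar to the query."""
    q = _tokens(query)
    scored = [(name, _cosine(q, _tokens(desc))) for name, desc in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


HINT_DOCS = {
    "#pragma GCC unroll": "unroll the following loop by the given factor",
    "__attribute__((hot))": "function is a hot spot and is optimized more aggressively",
    "__attribute__((cold))": "function is unlikely to be executed",
    "#pragma GCC ivdep": "assert there are no loop carried dependencies that prevent vectorization of the loop",
}
print(retrieve_hints("small fixed trip count loop, consider unroll", HINT_DOCS))
```

Only the top-k descriptions are placed in the prompt, which keeps the context short and limits the LLM to hints that actually exist in the knowledge base.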
- This is a research codebase, not a production-hardened package.
- Some scripts still assume external compiler tooling and benchmark datasets are prepared locally.
- Large experimental outputs should be written outside the repository or into ignored directories such as results/.
If you use this repository, please cite the HintPilot paper.
@misc{jiang2026hintpilotllmbasedcompilerhint,
title={HintPilot: LLM-based Compiler Hint Synthesis for Code Optimization},
author={Hanyun Jiang and Peisen Yao and Kaiyue Li and Tingting Lin and Chengpeng Wang and Kui Ren},
year={2026},
eprint={2604.15041},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2604.15041},
}