Welcome to PROVEN-GNN (PROgram Vulnerability Examination Network using Graph Neural Networks), a competition designed to compare human-built Graph Neural Network (GNN) solutions against Large Language Model (LLM)-based approaches.
The objective is to classify code functions as vulnerable or non-vulnerable using graph representations of source code.
| Aspect | Details |
|---|---|
| Task Type | Binary Graph Classification |
| Evaluation Metric | Macro F1-Score |
| Dataset | Inspired by DiverseVul |
- Original Dataset: DiverseVul
- Citation: Chen, Yizheng, et al. "DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning-Based Vulnerability Detection." Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses (RAID), 2023.
The dataset was adapted by generating graph representations for a subset of code functions. It includes 2,487 training graphs and 622 test graphs. The class distribution is imbalanced, with vulnerable code representing only 29% of the data.
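With only 29% of graphs labeled vulnerable, an unweighted loss will favor the majority class. One common remedy is to up-weight the positive class; the sketch below derives such a weight from the stated split (the constants come from the paragraph above, the remedy itself is a suggestion, not part of the official baseline):

```python
# Hedged sketch: derive a positive-class weight from the stated class
# distribution (2,487 training graphs, 29% vulnerable).
n_total = 2487
n_vuln = round(0.29 * n_total)   # ~721 vulnerable graphs
n_safe = n_total - n_vuln        # ~1766 non-vulnerable graphs
pos_weight = n_safe / n_vuln     # up-weights the minority (vulnerable) class
print(round(pos_weight, 2))      # -> 2.45
```

A weight of roughly 2.45 could then be passed, for example, to `torch.nn.BCEWithLogitsLoss(pos_weight=...)` during training.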
We used Joern to construct Code Property Graphs (CPG), which combine:
- Abstract Syntax Tree (AST)
- Control Flow Graph (CFG)
- Program Dependence Graph (PDG)
- Control Dependence Graph (CDG)
- Data Dependence Graph (DDG)
The following figure shows an example of a generated CPG:
Each node carries two feature groups:

- **Node Type (15 features)**: encodes the type of node (e.g., method, function call, variable declaration).
- **Code Embedding (512 features)**: a dense embedding representing the code snippet associated with the node.

Edge features represent the type of relationship between nodes (AST, CFG, PDG, CDG, DDG).
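As a concrete picture of this layout, the sketch below builds a tiny graph with the described feature shapes. The edge-type integer codes and the random node contents are hypothetical, used only to illustrate the 15 + 512 = 527 node-feature dimensions; they do not reflect the official data loader:

```python
import numpy as np

# Hypothetical 4-node graph with the feature layout described above.
num_nodes = 4
node_type = np.zeros((num_nodes, 15))
node_type[np.arange(num_nodes), [2, 5, 5, 9]] = 1.0  # one-hot node types (indices assumed)
code_emb = np.random.default_rng(0).normal(size=(num_nodes, 512))  # dense code embeddings
x = np.concatenate([node_type, code_emb], axis=1)    # 527 features per node

edge_index = np.array([[0, 1, 1, 2],   # source nodes
                       [1, 2, 3, 3]])  # target nodes
# Assumed edge-type encoding: 0=AST, 1=CFG, 2=PDG, 3=CDG, 4=DDG
edge_type = np.array([0, 1, 4, 2])

print(x.shape)  # -> (4, 527)
```

The same arrays map directly onto graph-library containers such as PyTorch Geometric's `Data(x=..., edge_index=..., edge_attr=...)`.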
```
PROVEN-GNN
├── data/
│   ├── public/
│   │   ├── train_data.csv (to be downloaded)
│   │   ├── test_data.csv (to be downloaded)
│   │   ├── test_ids.csv
│   │   ├── sample_submission.csv
│   │   └── README.md
├── competition/
│   ├── config.yaml
│   ├── update_leaderboard.py
│   ├── evaluate.py
│   ├── render_leaderboard.py
│   └── validate_submission.py
├── submissions/ (place submission files here)
│   └── README.md
├── leaderboard/
│   ├── leaderboard.csv
│   └── leaderboard.md
├── docs/
│   ├── leaderboard.html
│   ├── leaderboard.css
│   └── leaderboard.js
└── .github/workflows/
    ├── score_submission.yml
    └── publish_leaderboard.yml
```
```bash
git clone https://github.com/abdksm/PROVEN-GNN.git
cd PROVEN-GNN
pip install -r starter_code/requirements.txt
cd data/public
pip install gdown
gdown --id 1kUNwo7WjVpJ2D1GPsotiNO5FJnCqt--9 -O train_data.parquet
gdown --id 1xhg62LTAJm5ityBKiXKv8Rsg0eSl9TJC -O test_data.parquet
cd ../../starter_code
python baseline.py
```

`predictions.csv` must follow this format:

```
id,y_pred
0,1
1,0
2,1
...
```

`metadata.json` must follow this format:

```json
{
  "team": "example_team",
  "run_id": "example_run_id",
  "type": "human",
  "model": "GAT",
  "notes": "Additional notes"
}
```

`type` must be one of: `"human"`, `"llm-only"`, `"human+llm"`.
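Both files can be written with the Python standard library. The sketch below is illustrative (the prediction values and metadata fields mirror the examples above; your own pipeline would supply real predictions):

```python
import csv
import json

# Illustrative (id, y_pred) pairs -- replace with real model output.
predictions = [(0, 1), (1, 0), (2, 1)]

with open("predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "y_pred"])  # required header
    writer.writerows(predictions)

metadata = {
    "team": "example_team",
    "run_id": "example_run_id",
    "type": "human",  # one of "human", "llm-only", "human+llm"
    "model": "GAT",
    "notes": "Additional notes",
}
with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

Place both resulting files in the `submissions/` directory before running the encryption step.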
You should be in the repository root, with `predictions.csv` inside the `submissions/` directory:

```bash
# For Windows users
.\extra\encrypt_win.ps1 submissions\predictions.csv

# For Linux users
bash extra/encrypt_linux.sh submissions/predictions.csv
```

- Fork this repository
- Train your model and generate predictions
- Add `predictions.csv` and `metadata.json` to the `submissions/` directory
- Run the encryption script (a critical step!)
- Create a Pull Request
- GitHub Actions automatically evaluates your submission
- Results are posted as a comment and added to the leaderboard
The evaluation metric is Macro F1-Score:
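Macro F1 averages the per-class F1 scores, so the minority vulnerable class counts as much as the majority class. A minimal reference implementation for the binary case (illustrative only, not the official `evaluate.py`):

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 = 2PR / (P + R) for a single class label."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (binary case)."""
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2

y_true = [0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 1, 0]
print(round(macro_f1(y_true, y_pred), 4))  # -> 0.5833
```

This matches `sklearn.metrics.f1_score(y_true, y_pred, average="macro")`.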
Rankings are sorted by descending score.
After submission, scores are automatically added to:
- `leaderboard/leaderboard.csv`
- `leaderboard/leaderboard.md`
An interactive leaderboard is available here:
- ❌ No external or private datasets
- ❌ No manual labeling of test data
- ❌ No modification of evaluation scripts
- ✅ Unlimited offline training is allowed
⚠️ Only one submission per user is allowed
Violations may result in disqualification.
```bibtex
@misc{proven_gnn_2025,
  title={PROVEN-GNN: Program Vulnerability Examination Network},
  author={Abderrahmane Kasmi},
  year={2025},
  url={https://github.com/abdksm/PROVEN-GNN}
}
```

This project is licensed under the MIT License. See the LICENSE file for details.
