A Graph Neural Network (GNN) framework for reverse engineering architecturally diverse, fully optimised gate-level netlists by classifying each gate into its functional module.
Segregate analyzes optimised netlists and classifies gates into functional categories (Adder, Multiplier, Subtractor, Comparator, Mux) using Graph Neural Networks. The project processes fully optimised Verilog netlists, converts them to graph representations, and trains a GNN model for multilabel node classification.
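Multilabel here means a gate can carry more than one category at once, so each node's target is a vector of five binary flags rather than a single class index. A minimal sketch of that multi-hot encoding follows; the class ordering and the `encode_labels` / `decode_labels` helpers are illustrative assumptions, not the project's actual label layout.

```python
# Category order is an assumption for illustration; the project may use another.
CLASSES = ["Adder", "Multiplier", "Subtractor", "Comparator", "Mux"]


def encode_labels(categories):
    """Return a multi-hot vector with a 1 for every category the gate belongs to."""
    unknown = set(categories) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [1 if name in categories else 0 for name in CLASSES]


def decode_labels(vector):
    """Return the category names whose flag is set in a multi-hot vector."""
    return [name for name, flag in zip(CLASSES, vector) if flag]


# A gate labelled as part of both an adder and a mux.
print(encode_labels({"Adder", "Mux"}))   # [1, 0, 0, 0, 1]
print(decode_labels([0, 1, 0, 1, 0]))    # ['Multiplier', 'Comparator']
```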
Segregate/
├── data/ # Data directory
│ ├── graphs.pt # Processed graph data (generated)
│ ├── netlist/ # Original Verilog netlists
│ ├── netlist_partially_labelled/ # Partially labeled netlists
│ ├── graphs_partially_labelled/ # Intermediate graph data
│ └── reports/ # Cross-probing reports
├── train/ # Training module
│ ├── train.py # Main training script
│ ├── utils.py # Utility functions
│ ├── layers.py # GNN layer implementations
│ └── settings.yml # Training configuration
├── scripts/ # Processing scripts
│ ├── netlist_to_graph_re_multilabel.pl # Netlist to graph conversion
│ └── theCircuit.pm # Circuit processing module
├── checkpoints/ # Model checkpoints (generated)
├── complete_labelling_and_save_graph.py
├── rename_using_crossprobings.py
└── graph_parser_parallel.sh
- Ubuntu 24.04
- Conda or Miniconda
- GNU Parallel (for parallel netlist processing)
- CUDA-compatible GPU (optional, for faster training)
conda create -n segregate python=3.9.23
conda activate segregate
# Install PyTorch & PyTorch-Geometric
pip install torch==2.7.1 torch-geometric==2.6.1
# Install other required packages
pip install PyYAML==6.0.3 scikit-learn==1.6.1 scipy==1.13.1 networkx==3.2.1 numpy==1.23.0 matplotlib==3.9.4
Install GNU Parallel for parallel netlist processing:
sudo apt update
sudo apt install parallel
Download the dataset from the following link: https://drive.google.com/file/d/1oPA04XU9hf3NjbU7Fup_anTrvr-KcnDQ/view?usp=sharing
Extract the downloaded data archive in the project root directory:
unzip data.zip
Execute the rename script to process cross-probing reports and generate partially labeled netlists:
python rename_using_crossprobings.py --netlist_dir ./data/netlist --report_dir ./data/reports
This script will:
- Process cross-probing reports from the data/reports/ directory
- Generate partially labeled netlists in data/netlist_partially_labelled/
Before running the parser, set the number of parallel jobs to match your available CPU cores. Edit graph_parser_parallel.sh and set MAX_JOBS for your system (the default is 8; up to 40 is recommended on high-core systems).
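One way to choose a value is to read the core count and clamp it to the recommended ceiling. The sketch below does that; `suggest_max_jobs` is a hypothetical helper, and the printed `MAX_JOBS=` line only illustrates the value to set, since the exact assignment syntax inside graph_parser_parallel.sh is not shown here.

```python
import os


def suggest_max_jobs(cap=40):
    """Pick a job count no larger than the CPU core count or the cap."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(cores, cap))


print(f"MAX_JOBS={suggest_max_jobs()}")
```

Leaving a core or two free can keep the machine responsive while the parser runs.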
Important: Update the hardcoded path in the Perl script. Edit scripts/netlist_to_graph_re_multilabel.pl and change line 6:
require "/workspace/ckarfa/innocent/Segregate/scripts/theCircuit.pm"
to your actual project path:
require "/your/actual/path/to/Segregate/scripts/theCircuit.pm"
Note: Ensure GNU Parallel is installed (see the installation commands above).
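If you prefer not to edit the file by hand, the path change can be scripted. The following is a minimal sketch, assuming the require statement is the only line that references theCircuit.pm; `patch_require` and the example project root are hypothetical.

```python
import re
from pathlib import Path


def patch_require(perl_source, project_root):
    """Point the theCircuit.pm require statement at <project_root>/scripts."""
    module_path = (Path(project_root) / "scripts" / "theCircuit.pm").as_posix()
    pattern = re.compile(r'require\s+"[^"]*theCircuit\.pm"')
    replacement = f'require "{module_path}"'
    # A function replacement avoids backslash escapes being interpreted in the path.
    new_source, count = pattern.subn(lambda _match: replacement, perl_source)
    if count == 0:
        raise ValueError("no require line for theCircuit.pm found")
    return new_source


old_line = 'require "/workspace/ckarfa/innocent/Segregate/scripts/theCircuit.pm";\n'
print(patch_require(old_line, "/home/user/Segregate"), end="")
```

To apply it to the real file, read scripts/netlist_to_graph_re_multilabel.pl with `Path.read_text()`, pass the text through `patch_require`, and write the result back with `Path.write_text()`.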
Make the graph parser executable and run it:
chmod +x graph_parser_parallel.sh
./graph_parser_parallel.sh
This script will:
- Process Verilog netlists using the Perl script scripts/netlist_to_graph_re_multilabel.pl
- Convert netlists to graph representations
- Create intermediate graph files with features, adjacency matrices, and labels
- Generate corresponding graph data in data/graphs_partially_labelled/
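To make the netlist-to-graph idea concrete, here is a toy sketch in Python: gates become nodes, driver-to-load connections become directed edges, and each node gets a one-hot gate-type feature. The netlist dictionary, the feature choice, and `build_graph` are all assumptions for illustration; the Perl script's real feature set and file format may differ.

```python
# Toy netlist: gate name -> (gate type, names of the gates driving its inputs).
# Both the gates and this dictionary format are hypothetical.
netlist = {
    "g1": ("AND", []),
    "g2": ("XOR", []),
    "g3": ("OR", ["g1", "g2"]),
    "g4": ("INV", ["g3"]),
}


def build_graph(netlist):
    """Return node names, one-hot gate-type features, and a dense adjacency matrix."""
    names = sorted(netlist)
    index = {name: i for i, name in enumerate(names)}
    gate_types = sorted({gate_type for gate_type, _ in netlist.values()})

    # One feature row per node: 1 in the column of its gate type, 0 elsewhere.
    features = [[1 if netlist[n][0] == t else 0 for t in gate_types] for n in names]

    # adjacency[i][j] == 1 means gate i drives an input of gate j.
    size = len(names)
    adjacency = [[0] * size for _ in range(size)]
    for name, (_, drivers) in netlist.items():
        for driver in drivers:
            adjacency[index[driver]][index[name]] = 1
    return names, features, adjacency


names, features, adjacency = build_graph(netlist)
for name, row in zip(names, adjacency):
    print(name, row)
```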
Run the complete labelling script to finalize the graph data:
python complete_labelling_and_save_graph.py
This script will:
- Process all intermediate graph files
- Complete the labelling process
- Create the final data/graphs.pt file containing all processed graphs
python -m train.train --epochs 500 --batch_size 2 --device cuda
- --epochs: Number of training epochs (default: 100)
- --batch_size: Training batch size (default: 2)
- --device: Device to use (cuda, cpu)
- --dir_saver: Directory to save model checkpoints (default: "checkpoints")
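For reference, the documented options map naturally onto an `argparse` parser. The sketch below mirrors the flags and defaults listed above; it is not the parser from train/train.py, and the default for `--device` is an assumption because the README does not state one.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented training options."""
    parser = argparse.ArgumentParser(description="Sketch of the documented training options")
    parser.add_argument("--epochs", type=int, default=100, help="number of training epochs")
    parser.add_argument("--batch_size", type=int, default=2, help="training batch size")
    # The default device is an assumption; the README only lists the allowed values.
    parser.add_argument("--device", choices=["cuda", "cpu"], default="cuda", help="device to use")
    parser.add_argument("--dir_saver", default="checkpoints", help="checkpoint directory")
    return parser


args = build_parser().parse_args(["--epochs", "500", "--batch_size", "2", "--device", "cuda"])
print(args.epochs, args.batch_size, args.device, args.dir_saver)  # 500 2 cuda checkpoints
```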
The GNN model consists of:
- Attention-based Graph Convolution Layers: Adapted from GraphSAINT
- Sequential Concatenation: Combines outputs from multiple graph layers (JK-concat-style)
- MLP Classifier: 4-layer fully connected network:
  - Input: 2048 dimensions (concatenated graph features)
  - Hidden layers: 1024 → 512 → 256
  - Output: 5 classes (multilabel classification)
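The 2048-dimensional input is consistent with the settings in train/settings.yml (dim: 256, arch: "1-1-1-1", aggr: concat) if one assumes the GraphSAINT convention that a layer of order k with concat aggregation outputs (k + 1) × dim features. The sketch below works through that arithmetic and counts the MLP's trainable parameters, assuming a bias on every linear layer and no normalisation parameters; both helpers are hypothetical and the project's layers.py may differ.

```python
def concat_width(arch, dim, aggr):
    """Width after concatenating every layer's output.

    Assumes GraphSAINT-style layers: order k with 'concat' yields (k + 1) * dim.
    """
    orders = [int(order) for order in arch.split("-")]
    per_layer = [dim * (order + 1) if aggr == "concat" else dim for order in orders]
    return sum(per_layer)


def mlp_param_count(widths):
    """Count weights plus biases for a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


width = concat_width("1-1-1-1", 256, "concat")
widths = [width, 1024, 512, 256, 5]
print(width)                    # 2048
print(mlp_param_count(widths))  # 2755589
```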
Training parameters can be configured in train/settings.yml:
train_params:
  lr: 0.01 # Learning rate
  weight_decay: 0.0 # L2 regularization
  dropout: 0.1 # Dropout rate
  n_classes: 5 # Number of output classes
arch_gcn:
  dim: 256 # Hidden dimension
  aggr: concat # Aggregation method
  loss: sigmoid # Loss function type
  arch: "1-1-1-1" # GNN convolution layers
  act: relu # Activation function
  bias: norm # Bias type
  attention: 4 # Number of attention heads
Trained models are saved in the checkpoints/ directory:
- conv_layers_best_model.pth: Graph convolution layers
- classifier_best_model.pth: MLP classifier layers
- Use GPU for faster training: --device cuda
- Adjust batch size based on available memory