
innocentmchry/Segregate


Segregate

A Graph Neural Network (GNN) framework for reverse engineering architecturally diverse, fully optimised gate-level netlists by classifying each gate according to the functional module it belongs to.

Overview

Segregate analyses optimised netlists and uses Graph Neural Networks to classify their gates into functional categories (Adder, Multiplier, Subtractor, Comparator, Mux). The project takes fully optimised Verilog netlists, converts them to graph representations, and trains a GNN model for multilabel node classification, so a single gate can be assigned more than one category.
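In multilabel node classification each gate carries one target entry per class, and several entries can be 1 at once (for instance, a gate that optimisation has shared between two modules could be labelled with both). A minimal sketch of that multi-hot encoding, assuming the class order listed above; `CLASSES` and `encode_labels` are hypothetical names, not taken from the Segregate code:

```python
# Hypothetical class order, assumed from the list in the Overview.
CLASSES = ["Adder", "Multiplier", "Subtractor", "Comparator", "Mux"]


def encode_labels(modules):
    """Return a multi-hot vector with a 1 for every class the gate belongs to."""
    unknown = set(modules) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1 if name in modules else 0 for name in CLASSES]


if __name__ == "__main__":
    # A gate shared by an adder and a comparator is positive for both classes.
    print(encode_labels({"Adder", "Comparator"}))  # [1, 0, 0, 1, 0]
```

Unlike single-label classification, the vector is not required to sum to 1; an all-zero vector is also valid.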

Project Structure

Segregate/
├── data/                                   # Data directory
│   ├── graphs.pt                           # Processed graph data (generated)
│   ├── netlist/                            # Original Verilog netlists
│   ├── netlist_partially_labelled/         # Partially labeled netlists
│   ├── graphs_partially_labelled/          # Intermediate graph data
│   └── reports/                            # Cross-probing reports
├── train/                                  # Training module
│   ├── train.py                            # Main training script
│   ├── utils.py                            # Utility functions
│   ├── layers.py                           # GNN layer implementations
│   └── settings.yml                        # Training configuration
├── scripts/                                # Processing scripts
│   ├── netlist_to_graph_re_multilabel.pl   # Netlist to graph conversion
│   └── theCircuit.pm                       # Circuit processing module
├── checkpoints/                            # Model checkpoints (generated)
├── complete_labelling_and_save_graph.py
├── rename_using_crossprobings.py
└── graph_parser_parallel.sh

Prerequisites

System Requirements

  • Ubuntu 24.04
  • Conda or Miniconda
  • GNU Parallel (for parallel netlist processing)
  • CUDA-compatible GPU (optional, for faster training)

Setup and Usage

Step 0: Environment Setup

Create Conda Environment

conda create -n segregate python=3.9.23
conda activate segregate

Install Dependencies

# Install PyTorch & PyTorch-Geometric
pip install torch==2.7.1 torch-geometric==2.6.1

# Install other required packages
pip install PyYAML==6.0.3 scikit-learn==1.6.1 scipy==1.13.1 networkx==3.2.1 numpy==1.23.0 matplotlib==3.9.4
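To confirm the environment matches the pins above, a small check can compare installed versions using the standard library's `importlib.metadata`. This is a sketch, not part of the repository; `PINS`, `find_mismatches` and `installed_version` are hypothetical names, and any local suffix such as `+cu126` that PyTorch's own wheel index adds is ignored in the comparison:

```python
from importlib import metadata

# Versions pinned in the install commands above.
PINS = {
    "torch": "2.7.1",
    "torch-geometric": "2.6.1",
    "PyYAML": "6.0.3",
    "scikit-learn": "1.6.1",
    "scipy": "1.13.1",
    "networkx": "3.2.1",
    "numpy": "1.23.0",
    "matplotlib": "3.9.4",
}


def find_mismatches(pins, lookup):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `lookup` maps a distribution name to its installed version string, or
    returns None when the package is missing. A local version suffix such as
    "+cu126" is stripped before comparing.
    """
    problems = {}
    for name, expected in pins.items():
        found = lookup(name)
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            problems[name] = (expected, found)
    return problems


def installed_version(name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    mismatches = find_mismatches(PINS, installed_version)
    for name, (expected, found) in sorted(mismatches.items()):
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    print("all pins satisfied" if not mismatches else f"{len(mismatches)} mismatch(es)")
```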

Install System Dependencies

Install GNU Parallel for parallel netlist processing:

sudo apt update
sudo apt install parallel

Step 1: Download & Extract Data

Download the dataset from the following link: https://drive.google.com/file/d/1oPA04XU9hf3NjbU7Fup_anTrvr-KcnDQ/view?usp=sharing

Extract the downloaded data archive in the project root directory:

unzip data.zip
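The next two steps read from data/netlist and data/reports, so a quick check after extraction can catch a wrong extraction directory early. A minimal sketch, assuming these two directories are the inputs shipped in data.zip (the other directories under data/ are generated later); `missing_inputs` is a hypothetical helper:

```python
from pathlib import Path

# Inputs read by Steps 2 and 3; assumed to come from data.zip.
REQUIRED_DIRS = ["data/netlist", "data/reports"]


def missing_inputs(project_root):
    """Return the required input directories that are absent or empty."""
    root = Path(project_root)
    missing = []
    for rel in REQUIRED_DIRS:
        path = root / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_inputs(".")
    print("inputs look complete" if not problems else f"missing or empty: {problems}")
```

Run it from the project root, where `unzip` was executed.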

Step 2: Process Cross-probing Reports

Execute the rename script to process cross-probing reports and generate partially labeled netlists:

python rename_using_crossprobings.py --netlist_dir ./data/netlist --report_dir ./data/reports

This script will:

  • Process cross-probing reports from the data/reports/ directory
  • Generate partially labeled netlists in data/netlist_partially_labelled/
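After this step it can be worth confirming that every input netlist produced a partially labelled counterpart. The sketch below assumes that the rename script keeps each netlist's file name and that netlists use the `.v` extension; both are assumptions, so adjust `pattern` or the comparison if your outputs are named differently. `unmatched_netlists` is a hypothetical helper:

```python
from pathlib import Path


def unmatched_netlists(src_dir, out_dir, pattern="*.v"):
    """Return names of netlists in src_dir with no same-named file in out_dir.

    Assumes outputs keep the input file name (not documented by the project).
    """
    produced = {p.name for p in Path(out_dir).glob(pattern)}
    return sorted(p.name for p in Path(src_dir).glob(pattern) if p.name not in produced)


if __name__ == "__main__":
    src, out = "data/netlist", "data/netlist_partially_labelled"
    if not Path(src).is_dir():
        print(f"{src} not found; run this from the project root")
    else:
        missing = unmatched_netlists(src, out)
        print("every netlist has an output" if not missing else f"no output for: {missing}")
```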

Step 3: Parse Netlists to Graphs

Before running the parser, set the number of parallel jobs to suit your available CPU cores: edit graph_parser_parallel.sh and set MAX_JOBS (the default is 8; up to 40 is recommended on high-core-count systems).
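If you are unsure what value to use, the helper below suggests one from the CPUs available to the current process, leaving one core free and capping the result at the README's recommended maximum of 40. The one-core reserve is an assumption, as are the helper names; each job runs as a separate process, so on machines with little RAM a lower value may be safer:

```python
import os


def available_cpus():
    """CPUs this process may use (respects CPU affinity on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def suggest_max_jobs(cpus, reserve=1, cap=40):
    """Leave `reserve` CPUs free and never exceed `cap`; always at least 1."""
    if not cpus or cpus < 1:
        return 1
    return max(1, min(cpus - reserve, cap))


if __name__ == "__main__":
    print(f"suggested MAX_JOBS: {suggest_max_jobs(available_cpus())}")
```

The printed number is only a suggestion to copy into graph_parser_parallel.sh by hand.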

Important: Update the hardcoded path in the Perl script. Edit scripts/netlist_to_graph_re_multilabel.pl and change line 6:

require "/workspace/ckarfa/innocent/Segregate/scripts/theCircuit.pm";

to your actual project path:

require "/your/actual/path/to/Segregate/scripts/theCircuit.pm";
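Instead of editing the line by hand, a short script can rewrite the path to point at the current checkout. This sketch assumes the statement has the form shown above and that you run it from the project root; it overwrites the Perl script in place, so keep a copy (or rely on git) first. `patch_require` is a hypothetical helper:

```python
import re
from pathlib import Path

# Matches a `require "...theCircuit.pm"` statement regardless of the old path.
REQUIRE_RE = re.compile(r'require\s+"[^"]*/theCircuit\.pm"')


def patch_require(perl_source, project_root):
    """Point the theCircuit.pm require at <project_root>/scripts/theCircuit.pm."""
    new_path = (Path(project_root).resolve() / "scripts" / "theCircuit.pm").as_posix()
    # A function replacement avoids re treating backslashes in the path as escapes.
    patched, count = REQUIRE_RE.subn(lambda m: f'require "{new_path}"', perl_source, count=1)
    if count == 0:
        raise ValueError("no theCircuit.pm require statement found")
    return patched


if __name__ == "__main__":
    script = Path("scripts/netlist_to_graph_re_multilabel.pl")
    if script.is_file():
        script.write_text(patch_require(script.read_text(), "."))
        print(f"updated {script}")
    else:
        print(f"{script} not found; run this from the project root")
```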

Note: Ensure GNU Parallel is installed (see Step 0 for installation instructions).

Make the graph parser executable and run it:

chmod +x graph_parser_parallel.sh
./graph_parser_parallel.sh

This script will:

  • Process Verilog netlists using the Perl script scripts/netlist_to_graph_re_multilabel.pl
  • Convert netlists to graph representations
  • Create intermediate graph files with features, adjacency matrices, and labels
  • Generate corresponding graph data in data/graphs_partially_labelled/
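The Perl script's exact graph construction is not described here. A common way to turn a gate-level netlist into a graph, shown below purely as an illustrative assumption, makes each gate a node and adds a directed edge from the gate that drives a net to every gate that reads it. The toy gate list, `gates_to_edges` and `adjacency` are hypothetical and do not reflect the parser's actual input or output format:

```python
# Toy gate list: (instance name, output net, input nets). Hypothetical format.
GATES = [
    ("g1", "n1", ["a", "b"]),
    ("g2", "n2", ["a", "b"]),
    ("g3", "s", ["n1", "c"]),
    ("g4", "co", ["n1", "n2"]),
]


def gates_to_edges(gates):
    """Directed edges driver -> reader for every net driven by one gate and read by another."""
    driver = {out: name for name, out, _ in gates}
    edges = []
    for name, _, inputs in gates:
        for net in inputs:
            if net in driver:  # primary inputs (a, b, c) have no driving gate
                edges.append((driver[net], name))
    return edges


def adjacency(gates, edges):
    """Dense 0/1 adjacency matrix with rows and columns in gate-list order."""
    index = {name: i for i, (name, _, _) in enumerate(gates)}
    adj = [[0] * len(gates) for _ in gates]
    for src, dst in edges:
        adj[index[src]][index[dst]] = 1
    return adj


if __name__ == "__main__":
    edges = gates_to_edges(GATES)
    print(edges)  # [('g1', 'g3'), ('g1', 'g4'), ('g2', 'g4')]
    for row in adjacency(GATES, edges):
        print(row)
```

Per-node features (for example, gate type) and per-node label vectors would then be stored alongside this adjacency.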

Step 4: Complete Graph Labelling

Run the complete labelling script to finalize the graph data:

python complete_labelling_and_save_graph.py

This script will:

  • Process all intermediate graph files
  • Complete the labelling process
  • Create the final data/graphs.pt file containing all processed graphs

Training

python -m train.train --epochs 500 --batch_size 2 --device cuda

Available Training Options

  • --epochs: Number of training epochs (default: 100)
  • --batch_size: Training batch size (default: 2)
  • --device: Device to use (cuda, cpu)
  • --dir_saver: Directory to save model checkpoints (default: "checkpoints")
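For readers who want to wrap or script training runs, the documented options can be expressed as an `argparse` parser. This is a hypothetical mirror of the options listed above, not the project's own parser; in particular, the default for `--device` is an assumption because the README does not state one:

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented train.train options."""
    parser = argparse.ArgumentParser(description="Segregate training options (sketch)")
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--batch_size", type=int, default=2)
    # The README lists cuda and cpu; the "cpu" default here is an assumption.
    parser.add_argument("--device", choices=["cuda", "cpu"], default="cpu")
    parser.add_argument("--dir_saver", default="checkpoints")
    return parser


if __name__ == "__main__":
    # Parse the same arguments as the training command shown above.
    print(build_parser().parse_args(["--epochs", "500", "--batch_size", "2", "--device", "cuda"]))
```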

Model Architecture

The GNN model consists of:

  1. Attention-based Graph Convolution Layers: Adapted from GraphSAINT
  2. Sequential Concatenation: Concatenates the outputs of all graph layers (Jumping Knowledge, JK-concat style)
  3. MLP Classifier: 4-layer fully connected network:
    • Input: 2048 dimensions (concatenated graph features)
    • Hidden layers: 1024 → 512 → 256
    • Output: 5 classes (multilabel classification)
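The 2048-dimensional classifier input can be reproduced from the configuration in train/settings.yml under GraphSAINT-style assumptions that are not verified against train/layers.py: with `aggr: concat`, a layer of order k concatenates k + 1 branches (self plus neighbour hops) of width `dim`, and attention heads split `dim` internally without changing the layer's output width. The sketch also counts the MLP's parameters, assuming every linear layer has a bias; the helper names are hypothetical:

```python
def conv_output_widths(arch, dim, aggr="concat"):
    """Per-layer output width for an arch string such as "1-1-1-1" (assumed semantics)."""
    orders = [int(tok) for tok in arch.split("-")]
    return [(order + 1) * dim if aggr == "concat" else dim for order in orders]


def mlp_param_count(widths):
    """Weights plus biases of a stack of fully connected layers."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))


if __name__ == "__main__":
    layers = conv_output_widths("1-1-1-1", dim=256)
    jk_width = sum(layers)  # concatenating all layer outputs
    print(layers, jk_width)  # [512, 512, 512, 512] 2048
    print(mlp_param_count([jk_width, 1024, 512, 256, 5]))  # 2755589
```

Four order-1 layers of width 2 × 256 = 512 give 4 × 512 = 2048, which matches the input dimension stated above.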

Configuration

Training parameters can be configured in train/settings.yml:

train_params:
  lr: 0.01                    # Learning rate
  weight_decay: 0.0           # L2 regularization
  dropout: 0.1                # Dropout rate
  n_classes: 5                # Number of output classes

arch_gcn:
  dim: 256                    # Hidden dimension
  aggr: concat                # Aggregation method
  loss: sigmoid               # Loss function type
  arch: "1-1-1-1"             # GNN convolution layers
  act: relu                   # Activation function
  bias: norm                  # Bias type
  attention: 4                # Number of attention heads
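The setting `loss: sigmoid` suggests that each of the five outputs goes through its own sigmoid and is scored with binary cross-entropy, which is the usual choice for multilabel classification (in PyTorch, `torch.nn.BCEWithLogitsLoss`). A dependency-free sketch of that loss and of per-class prediction, under that assumption; the 0.5 decision threshold and the helper names are also assumptions:

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes, computed from raw logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    terms = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(terms) / len(terms)


def predict(logits, threshold=0.5):
    """Independent per-class decisions: a gate may receive several labels or none."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    # sigmoid(x) >= t is equivalent to x >= log(t / (1 - t)), avoiding exp overflow.
    cutoff = math.log(threshold / (1.0 - threshold))
    return [1 if x >= cutoff else 0 for x in logits]


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.3, -3.0, 5.0]  # one gate, five classes
    targets = [1, 0, 0, 0, 1]
    print(round(bce_with_logits(logits, targets), 4))
    print(predict(logits))  # [1, 0, 1, 0, 1]
```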

Model Checkpoints

Trained models are saved in the checkpoints/ directory:

  • conv_layers_best_model.pth: Graph convolution layers
  • classifier_best_model.pth: MLP classifier layers

Performance Tips

  • Use GPU for faster training: --device cuda
  • Reduce --batch_size if training runs out of memory; increase it if memory allows

License

Citation

About

Scalable and Expressive GNN for reverse engineering of Gate Level Netlist
