FSM-Based Vulnerability Detection in Tokenized Assembly:

A Case Study on CWE-457 (Use of Uninitialized Variable)
A presentation discussing the approach, methodology, and results can be found here.

⚠️*Note:*⚠️
This repository documents Phase 1 of an ongoing multi-stage research effort, which is expected to form the foundation of my future dissertation work. Due to the academic and proprietary nature of this research, the full source code and FSM implementation are not publicly available at this time. However, the repository includes the associated research paper to outline the methodology, experimental design, and key findings. The primary purpose of this repository is to share the scope of work, technical direction, and contributions to the field of binary vulnerability detection. Additional materials (e.g., visualizations, presentation slides, and selected data samples) may be added in the future, depending on publication timelines and review clearance.

Project Overview

This project explores the feasibility of using a Finite State Machine (FSM) approach to detect vulnerabilities directly from disassembled binary code. Specifically, it targets the detection of CWE-457: Use of Uninitialized Variable without relying on source code availability.

The workflow includes:

Disassembly normalization and abstraction
Tokenization of critical memory operations
FSM simulation of variable initialization and usage
Full dataset evaluation using precision, recall, and F1 metrics

This approach demonstrates how binary-level static analysis can emulate compiler or source-aware vulnerability detection in a purely compiled environment.
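As a rough illustration of the tokenization and FSM steps in the workflow above, the sketch below extracts stack-slot stores and loads from a few AT&T-syntax instructions and runs a small per-slot state machine over them. The token names (`STORE`, `LOAD`), the regular expressions, and the three states are assumptions made for illustration only; the project's actual token vocabulary and FSM are not published in this repository.

```python
import re
from enum import Enum


class State(Enum):
    UNINIT = "uninit"    # slot has not been written yet
    INIT = "init"        # slot was written before any read
    FLAGGED = "flagged"  # slot was read before any write


# Hypothetical patterns for objdump-style AT&T syntax, for example:
#   movl $0x5,-0x4(%rbp)   -> store to slot -0x4
#   mov  -0x4(%rbp),%eax   -> load from slot -0x4
SLOT = r"(-0x[0-9a-f]+)\(%rbp\)"
STORE_RE = re.compile(r"^\s*mov\w*\s+[^,]+,\s*" + SLOT + r"\s*$")
LOAD_RE = re.compile(r"^\s*mov\w*\s+" + SLOT + r"\s*,\s*\S+\s*$")


def tokenize(lines):
    """Turn instruction strings into (kind, slot) tokens; other lines are ignored."""
    tokens = []
    for line in lines:
        m = STORE_RE.match(line)
        if m:
            tokens.append(("STORE", m.group(1)))
            continue
        m = LOAD_RE.match(line)
        if m:
            tokens.append(("LOAD", m.group(1)))
    return tokens


def run_fsm(tokens):
    """Track one state per stack slot; FLAGGED and INIT are both final."""
    states = {}
    for kind, slot in tokens:
        if states.get(slot, State.UNINIT) is State.UNINIT:
            states[slot] = State.INIT if kind == "STORE" else State.FLAGGED
    return states


bad = ["sub $0x10,%rsp", "mov -0x4(%rbp),%eax"]         # read with no prior write
good = ["movl $0x5,-0x4(%rbp)", "mov -0x4(%rbp),%eax"]  # write, then read

bad_states = run_fsm(tokenize(bad))
good_states = run_fsm(tokenize(good))
```

This sketch scans instructions linearly and ignores control flow, address-taking (`lea`), and calls that initialize a variable through a pointer, all of which a real detector would need to handle.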

Dataset Preparation

Note: This project assumes access to a compiled disassembly dataset from the Juliet Test Suite (CWE-457 subset).

However, a complete automated preparation pipeline is provided for future use and reproducibility.

Clone the Juliet Test Suite Linux port by Alexander Richardson:

git clone https://github.com/arichardson/juliet-test-suite-c.git
cd juliet-test-suite-c

Build the test cases into ELF binaries:

python3 juliet.py 457 --generate --make -k --run | tee full_build.log

This compiles only the CWE-457 test cases from the Juliet C/C++ suite (excluding Windows-specific ones). To build every available CWE instead, run:

python3 juliet.py --all --generate --make -k --run | tee full_build.log

This compiles all available Juliet C/C++ test cases (excluding Windows-specific ones).

Run the provided generate_dataset.py script:

python3 generate_dataset.py

This script:

Disassembles all binaries automatically using objdump
Normalizes the disassembly format
Generates structured outputs:
- dataset.csv: Basic metadata table
- dataset.jsonl: Full samples with disassembly + tokenized form
- disasm/: Organized disassembly files by CWE and good/bad labels
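The disassembly and normalization steps might look roughly like the sketch below. It is not the project's `generate_dataset.py`: the function names, the normalization rules (dropping addresses and raw bytes, stripping trailing comments, collapsing whitespace), and the JSONL field names are all assumptions for illustration. Only `objdump -d` and the Python standard library are used.

```python
import json
import re
import subprocess

# An `objdump -d` instruction line has three tab-separated columns:
#   "  401136:\t55                   \tpush   %rbp"
INSN_RE = re.compile(r"^\s*[0-9a-f]+:\t[0-9a-f ]+\t(.+)$")


def disassemble(binary_path):
    """Return the raw `objdump -d` text for one ELF binary."""
    result = subprocess.run(
        ["objdump", "-d", binary_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def normalize(raw):
    """Keep only instruction text, without addresses, bytes, or comments."""
    insns = []
    for line in raw.splitlines():
        m = INSN_RE.match(line)
        if not m:
            continue  # section headers, symbol labels, byte-only lines
        text = m.group(1).split("#", 1)[0]        # drop trailing "# ..." comments
        text = re.sub(r"\s+", " ", text).strip()  # collapse column padding
        if text:
            insns.append(text)
    return insns


def to_jsonl_record(name, label, insns):
    """Serialize one sample as a single JSON line (hypothetical field names)."""
    return json.dumps({"name": name, "label": label, "disasm": insns})
```

A caller would pass the output of `disassemble(path)` to `normalize` and append one `to_jsonl_record(...)` line per binary to `dataset.jsonl`.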

This script prepares clean, reusable datasets to support both:

FSM-based vulnerability analysis (this project)
Future LLM or machine learning-based vulnerability classification

Project Setup

Create and activate a Python virtual environment:

python3 -m venv my_venv
source my_venv/bin/activate

Install project dependencies:

pip install --upgrade pip
pip install -r "requirements.txt"

Alternatively, the setup_env.sh script automates these steps. It will:

Create a virtual environment (if not already created)
Install all required Python libraries listed in requirements.txt

Requirements

Python 3.9 or newer is recommended.

Key third-party libraries:

numpy
pandas

Standard-library modules used (no installation needed):

argparse
glob
os
re

Full list is available in requirements.txt.

Running the Project

Once dependencies are installed, open and run the provided FSM_Binary_Analyzer.ipynb notebook for training and evaluation.

Alternatively, you can run the FSM evaluation script standalone via the CLI for full dataset testing:

python3 -m fsm_binary_analyzer.tests.functional.test_fsm --mode full
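For reference, the precision, recall, and F1 metrics used in the evaluation follow their standard definitions. The sketch below computes them from ground-truth and predicted labels (1 = vulnerable, 0 = safe). The function name and the sample labels are made up for illustration and are not results from this project.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive (vulnerable) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Illustrative labels only: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Returning 0.0 when a denominator is zero is one common convention; another is to report the metric as undefined.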

Notes

The project uses a lightweight dataset derived from the Juliet Test Suite.

Acknowledgments

📄 License

This project is licensed under the Creative Commons BY-NC-ND 4.0 License.
You may view, share, or cite this work with attribution. Commercial use and distribution of modified versions are not permitted.

