A Case Study on CWE-457 (Use of Uninitialized Variable)
A presentation covering the approach, methodology, and results can be found here.
⚠️ *Note:*⚠️
This repository documents Phase 1 of an ongoing multi-stage research effort, which is expected to form the foundation of my future dissertation work. Due to the academic and proprietary nature of this research, the full source code and FSM implementation are not publicly available at this time. However, the repository includes the associated research paper to outline the methodology, experimental design, and key findings. The primary purpose of this repository is to share the scope of work, technical direction, and contributions to the field of binary vulnerability detection. Additional materials (e.g., visualizations, presentation slides, and selected data samples) may be added in the future, depending on publication timelines and review clearance.
This project explores the feasibility of using a Finite State Machine (FSM) approach to detect vulnerabilities directly from disassembled binary code. Specifically, it targets the detection of CWE-457: Use of Uninitialized Variable without relying on source code availability.
The workflow includes:
- Disassembly normalization and abstraction
- Tokenization of critical memory operations
- FSM simulation of variable initialization and usage
- Full dataset evaluation using precision, recall, and F1 metrics
This approach demonstrates how binary-level static analysis can emulate compiler or source-aware vulnerability detection in a purely compiled environment.
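Because the FSM implementation itself is not published, the following is only a minimal illustrative sketch of the tokenize-then-simulate idea described above, not the method used in this project. It assumes x86-64 AT&T syntax as printed by objdump, treats each `%rbp`-relative stack slot as one variable, and scans instructions linearly without following control flow. The state names, regular expressions, and the `find_uninitialized_reads` function are all hypothetical.

```python
import re
from enum import Enum


class SlotState(Enum):
    """States a stack slot can be in (hypothetical three-state FSM)."""
    UNINIT = "uninit"    # never written
    INIT = "init"        # written at least once
    ESCAPED = "escaped"  # address taken; may be written through a pointer


# Assumed token patterns: a slot is a %rbp-relative operand such as -0x8(%rbp).
SLOT = r"(-0x[0-9a-f]+\(%rbp\))"
WRITE_RE = re.compile(r"^mov\w*\s+\S+,\s*" + SLOT + r"$")  # mov src,SLOT
READ_RE = re.compile(r"^mov\w*\s+" + SLOT + r",\s*\S+$")   # mov SLOT,dst
LEA_RE = re.compile(r"^lea\w*\s+" + SLOT + r",\s*\S+$")    # lea SLOT,dst


def find_uninitialized_reads(instructions):
    """Return (index, slot) pairs for reads of slots still in UNINIT.

    Linear scan only: branches and loops are ignored in this sketch.
    """
    states = {}
    findings = []
    for index, instruction in enumerate(instructions):
        text = instruction.strip()
        match = WRITE_RE.match(text)
        if match:
            states[match.group(1)] = SlotState.INIT
            continue
        match = LEA_RE.match(text)
        if match:
            # Conservatively stop tracking once the address is taken.
            states[match.group(1)] = SlotState.ESCAPED
            continue
        match = READ_RE.match(text)
        if match:
            slot = match.group(1)
            if states.get(slot, SlotState.UNINIT) is SlotState.UNINIT:
                findings.append((index, slot))
    return findings


bad = ["push %rbp", "mov %rsp,%rbp", "mov -0x4(%rbp),%eax", "pop %rbp", "ret"]
good = ["push %rbp", "mov %rsp,%rbp", "movl $0x5,-0x4(%rbp)",
        "mov -0x4(%rbp),%eax", "pop %rbp", "ret"]
print(find_uninitialized_reads(bad))   # flags the read at index 2
print(find_uninitialized_reads(good))  # no findings
```

A realistic analyzer would also need to handle other read and write instructions, other addressing modes, and per-path state across branches; the sketch leaves those out to keep the state transitions visible.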
Note: This project assumes access to a compiled disassembly dataset from the Juliet Test Suite (CWE-457 subset).
However, a complete automated preparation pipeline is provided for future use and reproducibility.
- Clone the Juliet Test Suite Linux port by Alexander Richardson:
git clone https://github.com/arichardson/juliet-test-suite-c.git
cd juliet-test-suite-c
- Build the test cases into ELF binaries:
python3 juliet.py -457 --generate --make -k --run | tee full_build.log
- This compiles only CWE-457 from the Juliet C/C++ test cases (excluding Windows-specific ones).
python3 juliet.py --all --generate --make -k --run | tee full_build.log
- This compiles all available Juliet C/C++ test cases (excluding Windows-specific ones).
- Run the provided generate_dataset.py script:
python3 generate_dataset.py
This script:
- Disassembles all binaries automatically using objdump
- Normalizes disassembly format
- Generates structured outputs:
  - dataset.csv: Basic metadata table
  - dataset.jsonl: Full samples with disassembly + tokenized form
  - disasm/: Organized disassembly files by CWE and good/bad labels
This script prepares clean, reusable datasets to support both:
- FSM-based vulnerability analysis (this project)
- Future LLM or machine learning-based vulnerability classification
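The disassemble, normalize, and export steps above could look roughly like the sketch below. It is not the contents of generate_dataset.py. It assumes the tab-separated instruction lines that `objdump -d` prints, and the helper names (`disassemble`, `normalize_instruction`, `build_record`) and the JSONL field names are hypothetical.

```python
import json
import re
import subprocess


def disassemble(binary_path):
    """Return the raw `objdump -d` listing for one binary."""
    result = subprocess.run(
        ["objdump", "-d", binary_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def normalize_instruction(line):
    """Reduce one objdump line to 'mnemonic operands', or None if it is
    not an instruction line (headers, blank lines, byte continuations)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:
        return None
    text = parts[2]
    text = re.sub(r"\s+#.*$", "", text)     # drop trailing "# addr <sym>" comments
    text = re.sub(r"\s*<[^>]*>", "", text)  # drop <symbol+offset> annotations
    text = re.sub(r"\s+", " ", text).strip()
    return text or None


def build_record(name, label, listing):
    """Build one JSONL-ready sample (field names are assumptions)."""
    instructions = []
    for line in listing.splitlines():
        normalized = normalize_instruction(line)
        if normalized is not None:
            instructions.append(normalized)
    return {
        "file": name,
        "label": label,
        "disassembly": instructions,
        "tokens": [ins.split(" ", 1)[0] for ins in instructions],
    }


sample_listing = (
    "0000000000401136 <main>:\n"
    "  401136:\t55                   \tpush   %rbp\n"
    "  401137:\t48 89 e5             \tmov    %rsp,%rbp\n"
    "  40113a:\te8 f1 fe ff ff       \tcall   401030 <printf@plt>\n"
)
record = build_record("example_bad", "bad", sample_listing)
print(json.dumps(record))
```

In a full pipeline, `disassemble` would be called once per ELF binary and each record written as one line of dataset.jsonl; the sample listing here stands in for that output so the normalization can be seen without a compiled binary.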
- Create and activate a Python virtual environment:
python3 -m venv my_venv
source my_venv/bin/activate
- Install project dependencies:
pip install --upgrade pip
pip install -r "requirements.txt"
The setup_env.sh script will:
- Create a virtual environment (if not already created)
- Install all required Python libraries listed in requirements.txt
Python 3.9 or newer is recommended.
Key libraries:
- numpy
- pandas
- argparse (standard library)
- glob (standard library)
- os (standard library)
- re (standard library)
The full list is available in requirements.txt.
Once dependencies are installed:
Open and run the provided FSM_Binary_Analyzer.ipynb notebook for training and evaluation.
Alternatively, you can run the FSM evaluation script standalone via CLI for full dataset testing:
python3 -m fsm_binary_analyzer.tests.functional.test_fsm --mode full
The project uses a lightweight dataset derived from the Juliet Test Suite.
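For readers unfamiliar with the metrics reported by the full-dataset evaluation, the sketch below shows how precision, recall, and F1 are conventionally computed from good/bad labels. It is a generic illustration, not the project's evaluation code. The function name and the convention that `1` means a vulnerable ("bad") sample are assumptions, and the sample values are made up for the example rather than taken from any result.

```python
def precision_recall_f1(labels, predictions):
    """Compute precision, recall, and F1 for binary labels.

    Assumed convention: 1 = vulnerable ("bad"), 0 = safe ("good").
    """
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Illustrative values only.
labels = [1, 1, 1, 0, 0]
predictions = [1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(labels, predictions)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```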
This project is licensed under the Creative Commons BY-NC-ND 4.0 License.
You may view, share, or cite this work with attribution. Commercial use, redistribution, or modification is not permitted.