IoT Identification

Project Overview

The aim of this project is to develop a machine learning model to identify an IoT device based on DNS logs from a Wi-Fi access point.

The repository proposes two mathematically equivalent Random Forest classifiers, achieving an accuracy of 97%. The first is a multiclass Random Forest classifier, whereas the second is an array of binary Random Forest classifiers. The purpose of the second model is to simplify adding classes without retraining the entire model.

Installation

Prerequisites

Docker and Docker Compose
(Optional) VS Code + Dev Containers extension

Clone the repo

git clone https://github.com/SafeNetIoT/iot_identification.git
cd iot_identification

Start the dev environment

docker compose up --build

This runs the same container used in production and CI.
Your code is mounted into /app, so changes persist.

VS Code Users

Using VS Code Dev Containers gives you a fully pre-configured, reproducible development environment (automatic Python setup, debugging, and dependency management) without installing anything locally.

Install the Dev Containers extension.
Open the repo in VS Code.
Click “Reopen in Container”.

Quickstart

This project provides two types of ML models:

Binary Model – an array of independent binary classifiers (one per device)
Multiclass Model – a single classifier with one class per device

Architecturally, each individual model is the same (Random Forest).
The difference lies in how models are organized.

The Binary Model array exists to make it easy to add new devices without retraining everything from scratch.
Although it consists of multiple models, we refer to the structure simply as the Binary Model.
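The organization described above can be sketched in a few lines. This is only an illustration, not the repository's implementation: the BinaryArraySketch class, the Scorer type, and the decision rule (pick the highest-scoring device, or return None if no score reaches a threshold) are all assumptions made for the example. Each scorer stands in for one trained binary Random Forest.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical stand-in for one trained binary classifier: it maps a feature
# vector to the estimated probability that the sample belongs to its device.
Scorer = Callable[[Sequence[float]], float]


class BinaryArraySketch:
    """Illustrative array of independent one-per-device binary classifiers."""

    def __init__(self, threshold: float = 0.5) -> None:
        # Assumed rule: a device is reported only if its score reaches this value.
        self.threshold = threshold
        self.scorers: Dict[str, Scorer] = {}

    def add_device(self, label: str, scorer: Scorer) -> None:
        # Registering a new device leaves every existing scorer untouched.
        self.scorers[label] = scorer

    def predict(self, features: Sequence[float]) -> Optional[str]:
        if not self.scorers:
            return None
        scores = {label: scorer(features) for label, scorer in self.scorers.items()}
        best = max(scores, key=lambda label: scores[label])
        return best if scores[best] >= self.threshold else None


array = BinaryArraySketch()
array.add_device("device_A", lambda x: 0.9 if x[0] > 1.0 else 0.1)
array.add_device("device_B", lambda x: 0.8 if x[0] <= 1.0 else 0.2)
print(array.predict([2.0]))
```

Because each device has its own entry, adding a device only adds one scorer. A multiclass model, by contrast, would need to be refit with the new class included.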

Because the Binary Model is the default and most commonly used, the rest of this documentation focuses on that architecture.

Dataset Structure

After running the setup steps, a data/raw/ directory should exist, containing the device-specific data.

Each device must have its own subdirectory, named after the desired class label and containing one or more .pcap files.
Intermediate directories (e.g., by date) are optional, because the program recursively searches for .pcap files.

Example structure:

data/raw/
    device_A/
        2024-01-01/session1.pcap
        2024-01-02/session2.pcap
    device_B/
        capture1.pcap
        capture2.pcap

Important: Each capture session must follow the “on/off” experimental structure described in
https://inria.hal.science/hal-04777603v1/document
and sessions must be kept isolated.
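To check a dataset against the layout above before training, a small helper along these lines may be useful. It is a sketch and not part of the repository: find_pcaps is a hypothetical name, and the helper only mirrors the rule stated above (top-level folder name is the class label, .pcap files are found recursively).

```python
from pathlib import Path
from typing import Dict, List


def find_pcaps(raw_dir: str) -> Dict[str, List[Path]]:
    """Map each device label (top-level folder name) to its .pcap files."""
    root = Path(raw_dir)
    captures: Dict[str, List[Path]] = {}
    for device_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob descends through optional intermediate folders such as dates.
        captures[device_dir.name] = sorted(device_dir.rglob("*.pcap"))
    return captures


if __name__ == "__main__":
    raw = Path("data/raw")
    if raw.is_dir():
        for label, files in find_pcaps(str(raw)).items():
            print(f"{label}: {len(files)} capture(s)")
```

A device folder that maps to an empty list has no usable captures and is worth fixing before training.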

Training the Binary Model Array (Slow Pipeline)

The BinaryModel class supports end-to-end training of all binary Random Forest models:

from src.ml.binary_model import BinaryModel

manager = BinaryModel()
manager.slow_train()

This trains one model per device and stores them inside a uniquely generated output directory within models/.

To customize the output directory:

manager = BinaryModel(output_directory="your/directory")
manager.slow_train()

Training also writes evaluation metrics to z_evaluation.json.
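The metrics file can be inspected with the standard library. The sketch below assumes that z_evaluation.json is written inside the model output directory and makes no assumption about its schema; load_evaluation is a hypothetical helper, not a function provided by the repository.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_evaluation(model_dir: str) -> Optional[Any]:
    """Return the parsed z_evaluation.json from model_dir, or None if absent."""
    # Assumption: the metrics file sits directly inside the output directory.
    path = Path(model_dir) / "z_evaluation.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    metrics = load_evaluation("your/directory")
    if metrics is None:
        print("No z_evaluation.json found")
    else:
        # Pretty-print whatever structure the training run produced.
        print(json.dumps(metrics, indent=2))
```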

Adding a Device to the Binary Model Array (Fast Pipeline)

A new device can be incorporated into the Binary Model without retraining every model.

To add a device:

Load the existing model array
Set the preprocessing cache
Call add_device()

Example:

from src.ml.binary_model import BinaryModel

manager = BinaryModel(
    output_directory="models/2025-11-27/binary_model",
    loading_dir="models/2025-11-27/binary_model"
)

manager.load_model()
manager.set_cache()
manager.add_device("alexa_swan_kettle2", "data/raw/alexa_swan_kettle/")

This:

Extracts features for the new device
Trains a new binary classifier
Saves the new pickle file
Updates z_evaluation.json

Model Under Test

To conveniently use a specific trained model, set:

model_under_test = "path/to/best/model"

in config.py.

This reduces boilerplate during prediction.
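For orientation, a settings object exposing this value could look like the sketch below. This is an assumption about the shape of config.py, inferred only from the settings.model_under_test usage in the prediction example. The Settings class is hypothetical and the actual file may be organized differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Path to the trained model array that prediction code should load.
    # Placeholder value; point it at your own best model directory.
    model_under_test: str = "path/to/best/model"


# Hypothetical module-level instance, imported as `from config import settings`.
settings = Settings()
```

Keeping the path in one place means prediction scripts do not need to hard-code a dated model directory.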

Prediction Using the Binary Model Array

To load a previously trained model and run predictions on a .pcap file:

from src.ml.binary_model import BinaryModel

manager = BinaryModel(loading_dir="models/2025-11-27/binary_model")
manager.predict("your/data.pcap")

To predict directly from scapy packets:

from scapy.all import sniff
from src.ml.binary_model import BinaryModel
from config import settings

packets = sniff(iface="eth0", timeout=180)
model = BinaryModel(loading_dir=settings.model_under_test)

device = model.predict(packets)
if device is not None:
    print("Detected device:", device)

Limitations and Further Research

Potential overfitting in certain cases
Data drift
Model degradation as new classes are added (Binary Model)
