Unsupervised Manifold Learning for High-Dimensional Network Signatures

Overview

This repository implements a Topology-Preserving Manifold Discovery framework for high-dimensional, sparse telecom signal spaces. By leveraging Self-Organizing Maps (SOM), the project projects sparse DPI (Deep Packet Inspection) bitmaps and behavioral metrics into a low-dimensional latent space to identify structural invariants and perform non-convex quantization error minimization.

The objective is to map complex subscriber traffic signatures to optimal Resource Profiles (Service Classes) while maintaining the topological relationships of the input manifold.

Mathematical Formulation

1. Manifold Projection (SOM Dynamics)

The framework utilizes a competitive learning process to minimize the quantization error between the input vector $x$ and the weight vectors $w$:

Best Matching Unit (BMU) Selection: $$c(t) = \arg\min_j | x(t) - w_j(t) |_2$$
Weight Adaptation: $$w_j(t + 1) = w_j(t) + \alpha(t) \cdot h_{c,j}(r(t)) \cdot ( x(t) - w_j(t) )$$
Neighborhood Decay (Gaussian): $$h_{c,j}(r) = \exp\left( -\frac{r^2}{2\sigma(t)^2} \right)$$

2. Feature Sparsification Pipeline

To handle high-dimensional categorical metadata (DPI policies, content-types), we implement a Bitmap Encoding pipeline that transforms discrete logs into sparse binary vectors, preserving feature cardinality while enabling Euclidean distance metrics in the latent space.

Comparison of Dimensionality Reduction Techniques

Method	Complexity	Topology Preservation	Scalability
SOM (O(n · grid))	Non-linear Manifolds	High (Local & Global)	Batch-Friendly
PCA	Linear Projection	Low (Linear Only)	O(d²n)
t-SNE	Non-linear	High (Local Only)	O(n²) - Memory Intensive

Project Structure

├── models2/ # Serialized Model Artifacts (Joblib)

├── results/ # U-Matrix and Quantization Error Diagnostics

├── Dockerfile # Reproducible Research Environment

├── notebook.ipynb # Derivations, Training, and Fidelity Analysis

├── batch_projection_pipeline.py # Batch Manifold Projection Pipeline

└── requirements.txt # System Dependencies

Implementation Highlights

Scalable Pipeline: Designed to process 106+ high-dimensional records via batch-processing with periodic SQLite commits.
Structural Fidelity: Includes automated computation of Quantization Error and Topographic Error to validate manifold stability.
Reproducibility: Fully Dockerized environment to ensure consistent execution across heterogeneous computational nodes.

Usage

Build the research environment:

docker build -t net-manifold-som .

Run batch inference on network signatures:

python batch_projection_pipeline.py

Model Artifacts & Manifold Geometry

The system relies on pre-trained topological weights and scalar parameters stored in the models2/ directory. These artifacts define the manifold space used to represent and analyze network signatures.

Required Artifacts

File Name	Description
`som_model.joblib`	Trained MiniSom object containing the learned topological grid
`som_weights1.npy`	Raw NumPy weight matrix for high-speed, vectorized manifold projection
`centroid_feature_map.joblib`	Mapping of SOM nodes to latent regime assignments
`scaler.joblib`	`StandardScaler` parameters used for signal normalization
`medians.joblib`	Feature-wise medians for robust imputation of missing telemetry
`policy_to_bit.joblib`	Encoding dictionary for bitmap transformation of `DpiPolicy`
`numeric_cols.joblib`	Definitive feature ordering to ensure consistent manifold projection

Key Research Artifacts

Key Research Artifacts: Manifold Interpretation

The following visualizations illustrate the topological preservation and feature-space correlation discovered by the SOM:

1. Global Weight Distribution

The latent manifold reveals three distinct regime clusters (1, 2, 3) representing different classes of network traffic signatures.

2. Feature Component Planes (C-Planes)

By decomposing the manifold into individual feature planes (IpProtocol, Bytes from Client, etc.), we observe the non-linear correlations that drive cluster formation. This is critical for understanding how specific system constraints (e.g., bandwidth vs. protocol type) influence the latent representation.

3. Multivariate Latent Projections

Three-feature combination visualizations showing the interaction between high-dimensional features across the 2D lattice, confirming the stability of the topology-preserving mapping.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Unsupervised Manifold Learning for High-Dimensional Network Signatures

Overview

Mathematical Formulation

1. Manifold Projection (SOM Dynamics)

2. Feature Sparsification Pipeline

Comparison of Dimensionality Reduction Techniques

Project Structure

Model Artifacts & Manifold Geometry

Required Artifacts

Key Research Artifacts: Manifold Interpretation

1. Global Weight Distribution

2. Feature Component Planes (C-Planes)

3. Multivariate Latent Projections

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
results		results
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
batch_projection_pipeline.py		batch_projection_pipeline.py
inference_service.py		inference_service.py
notebook.ipynb		notebook.ipynb
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Unsupervised Manifold Learning for High-Dimensional Network Signatures

Overview

Mathematical Formulation

1. Manifold Projection (SOM Dynamics)

2. Feature Sparsification Pipeline

Comparison of Dimensionality Reduction Techniques

Project Structure

Model Artifacts & Manifold Geometry

Required Artifacts

Key Research Artifacts: Manifold Interpretation

1. Global Weight Distribution

2. Feature Component Planes (C-Planes)

3. Multivariate Latent Projections

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages