This project applies data mining methods to the UNSW-NB15 network intrusion detection dataset.
The main task is multiclass classification: predicting the traffic class of each network record.
Target variable:
attack_cat
Normal
Generic
Exploits
Fuzzers
DoS
Reconnaissance
Analysis
Backdoors
Shellcode
Worms
- DO NOT commit raw dataset files.
- DO NOT commit processed dataset files.
- Each member should download the raw dataset locally and run notebooks 01 and 02 to generate their own local processed files.
Dataset: UNSW-NB15
Required files (place in data/raw/):
UNSW-NB15_1.csv
UNSW-NB15_2.csv
UNSW-NB15_3.csv
UNSW-NB15_4.csv
NUSW-NB15_features.csv
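Before running the notebooks, it can help to confirm that all five files are in place. The following is a minimal sketch, not part of the repository: the function name `missing_raw_files` and the constant `REQUIRED_RAW_FILES` are hypothetical, and the `data/raw` path assumes the script runs from the repository root.

```python
from pathlib import Path

# File names copied from the required-files list; the "NUSW" spelling
# of the features file is kept exactly as listed.
REQUIRED_RAW_FILES = [
    "UNSW-NB15_1.csv",
    "UNSW-NB15_2.csv",
    "UNSW-NB15_3.csv",
    "UNSW-NB15_4.csv",
    "NUSW-NB15_features.csv",
]


def missing_raw_files(raw_dir):
    """Return the required file names that are not present in raw_dir."""
    raw_dir = Path(raw_dir)
    return [name for name in REQUIRED_RAW_FILES if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    missing = missing_raw_files("data/raw")
    if missing:
        print("Missing from data/raw:", ", ".join(missing))
    else:
        print("All required raw files are present.")
```

An empty result means the raw folder is complete; otherwise the returned names show which downloads are still needed.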
nid-data-mining/
├── data/
│ ├── raw/ # original dataset files (the required files listed above), not committed
│ ├── processed/ # generated train/test files, not committed
│ └── outputs/ # small result tables
├── notebooks/ # Jupyter notebooks for analysis
├── reports/ # figures and report materials
├── README.md
├── requirements.txt
└── .gitignore
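Because the contents of data/raw/ and data/processed/ are not committed, those folders may be absent after a fresh clone (git does not track empty directories). The sketch below creates them if needed. The helper name `ensure_data_dirs` is hypothetical, and it assumes the layout shown in the tree above.

```python
from pathlib import Path


def ensure_data_dirs(repo_root="."):
    """Create data/raw and data/processed under repo_root if they are absent.

    Returns the list of sub-paths that were actually created.
    """
    created = []
    for sub in ("data/raw", "data/processed"):
        target = Path(repo_root) / sub
        if not target.is_dir():
            # parents=True also creates the data/ folder when it is missing.
            target.mkdir(parents=True)
            created.append(sub)
    return created
```

Calling `ensure_data_dirs()` from the repository root is safe to repeat: existing folders are left untouched.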
- Clone the repo
git clone <repo-url>
cd nid-data-mining
- Download the original dataset files and place them in data/raw/
UNSW-NB15_1.csv
UNSW-NB15_2.csv
UNSW-NB15_3.csv
UNSW-NB15_4.csv
NUSW-NB15_features.csv
- Create or activate your Python environment, change into the repo folder, then install dependencies (from Anaconda Prompt or any terminal you prefer).
pip install -r requirements.txt
- Before starting work, update your local repo
git pull origin main
- Create a new branch
git checkout -b your-branch-name
- Run Jupyter Notebook from the project folder or from the notebooks/ folder.
cd ..\YOUR-DIRECTORY\nid-data-mining\notebooks
jupyter notebook
- Run notebooks in order:
01_data_understanding.ipynb
02_baseline_preprocessing.ipynb --> generates local processed data in data/processed/
- After making changes:
- DO NOT commit raw dataset files.
- DO NOT commit processed dataset files.
git add FILE-NAME
git commit -m "Add class distribution analysis"
git push origin your-branch-name
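As an extra guard against committing dataset files by accident, staged paths can be checked before running git commit. This is only a sketch under stated assumptions: the names `PROTECTED_PREFIXES`, `dataset_paths`, and `staged_files` are hypothetical and not part of the repository, and `staged_files` assumes git is on the PATH and the script runs inside the repo.

```python
import subprocess

# Folders whose contents should never be committed, per the rules above.
PROTECTED_PREFIXES = ("data/raw/", "data/processed/")


def dataset_paths(paths):
    """Return the paths that live under a protected data folder."""
    flagged = []
    for path in paths:
        # Normalize Windows separators so both path styles are caught.
        normalized = path.replace("\\", "/")
        if normalized.startswith(PROTECTED_PREFIXES):
            flagged.append(path)
    return flagged


def staged_files():
    """List files currently staged in git, relative to the repository root."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Running `dataset_paths(staged_files())` after git add should give an empty list; any path it returns can be unstaged with `git restore --staged FILE-NAME` before committing.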