- Implementing a Honeypot, Honeynets, a Simple Malware Classifier, and the RSA Algorithm; Learning Python's Socket Library
- Malware Classification using ML
This project uses machine learning to classify software samples as either malware or legitimate software. The dataset is the "MalwareData.csv" file. The project consists of the following key steps:
Data Preparation: The dataset is loaded into a pandas DataFrame and divided into two subsets: "legit", containing legitimate software samples, and "mal", containing malware samples.
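The data preparation step could be sketched as follows. This is not the project's exact code. It assumes the CSV is pipe-separated and that its label column is named `legitimate` (1 = legitimate, 0 = malware). Adjust both to match your copy of the file. The helper names `load_dataset` and `split_legit_mal` are hypothetical, and a tiny in-memory sample stands in for the real file.

```python
import io

import pandas as pd

# Assumptions (not confirmed by the project description): the file is
# pipe-separated and the label column is "legitimate" (1 = legit, 0 = malware).
LABEL_COL = "legitimate"


def load_dataset(path_or_buffer, sep="|"):
    """Load the dataset into a pandas DataFrame (hypothetical helper)."""
    return pd.read_csv(path_or_buffer, sep=sep)


def split_legit_mal(df, label_col=LABEL_COL):
    """Return (legit, mal) subsets, selected by label value rather than row position."""
    legit = df[df[label_col] == 1]
    mal = df[df[label_col] == 0]
    return legit, mal


# Tiny in-memory stand-in for MalwareData.csv, used only for illustration.
sample = io.StringIO(
    "Name|SizeOfCode|legitimate\n"
    "a.exe|1024|1\n"
    "b.exe|2048|0\n"
    "c.exe|512|1\n"
)
df = load_dataset(sample)
legit, mal = split_legit_mal(df)
print(len(legit), len(mal))  # 2 1
```

With the real data you would call `load_dataset("MalwareData.csv")`. Selecting rows by label value instead of slicing by index means the split still works if the file's rows are reordered.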
Exploratory Data Analysis: The columns and initial rows of the dataset are examined to understand the structure and content of the data.
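The exploration step amounts to printing the column names and the first few rows. Here is a minimal sketch. The helper name `inspect_structure` and the toy DataFrame are illustrative only.

```python
import pandas as pd


def inspect_structure(df, n=5):
    """Print the column names and the first n rows, then return both."""
    columns = list(df.columns)
    head = df.head(n)
    print("Columns:", columns)
    print(head)
    return columns, head


# Toy frame for illustration; with the real data, pass the loaded DataFrame.
demo = pd.DataFrame(
    {"Name": ["a.exe", "b.exe"], "SizeOfCode": [1024, 2048], "legitimate": [1, 0]}
)
cols, head = inspect_structure(demo, n=1)
```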
Feature Selection: An ExtraTreesClassifier model is trained on the dataset to estimate the importance of each feature. SelectFromModel then keeps only the features whose importance meets its threshold. This reduces the feature set to the most informative features, with the aim of improving accuracy.
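The feature-selection step could be sketched as follows. This is not the project's code. It assumes a `legitimate` label column and drops non-numeric columns (such as file names or hashes, if present) so the trees receive numbers only. The helper `select_features` is hypothetical, and synthetic data from `make_classification` stands in for the real file. When `threshold` is left unset, SelectFromModel uses the mean importance for tree ensembles and keeps every feature at or above it.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel


def select_features(df, label_col="legitimate", random_state=0):
    """Fit ExtraTrees, then keep features whose importance is >= the mean importance."""
    # Drop the label and any non-numeric columns (e.g. names or hashes).
    X = df.drop(columns=[label_col]).select_dtypes(include="number")
    y = df[label_col]
    trees = ExtraTreesClassifier(n_estimators=100, random_state=random_state).fit(X, y)
    # prefit=True reuses the already-fitted trees instead of refitting them.
    selector = SelectFromModel(trees, prefit=True)
    kept = X.columns[selector.get_support()]
    return X[kept], y, trees


# Synthetic stand-in for the malware dataset (illustration only).
X_syn, y_syn = make_classification(
    n_samples=300, n_features=10, n_informative=3, random_state=0
)
demo = pd.DataFrame(X_syn, columns=[f"f{i}" for i in range(10)])
demo["Name"] = "sample.exe"  # non-numeric column, removed by select_dtypes
demo["legitimate"] = y_syn
X_sel, y, trees = select_features(demo)
print("Kept features:", list(X_sel.columns))
```

One design choice to consider: fitting the selector on the training split only, rather than the whole dataset, keeps information from the test set out of the selection.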
Feature Importance: The features are ranked by importance, and the top contributors to the classification are identified and printed.
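Ranking can be done by sorting the importance scores in descending order. The sketch below assumes a fitted tree ensemble, whose `feature_importances_` you would pass in along with the column names. The function `rank_features` and the toy scores are illustrative.

```python
import numpy as np


def rank_features(importances, names, top_k=None):
    """Return (name, importance) pairs sorted from most to least important."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1]  # indices in descending importance
    if top_k is not None:
        order = order[:top_k]
    return [(names[i], float(importances[i])) for i in order]


# Toy scores standing in for a fitted model's feature_importances_.
ranking = rank_features([0.1, 0.5, 0.4], ["f0", "f1", "f2"], top_k=2)
for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name} ({score:.3f})")
```

With a fitted ExtraTreesClassifier `trees` and feature frame `X`, the call would be `rank_features(trees.feature_importances_, X.columns)`.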
Model Training: The dataset is split into training and testing sets using the train_test_split function. A RandomForestClassifier model is trained on the selected features.
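Here is a minimal sketch of the split-and-train step. The 80/20 split, `stratify=y`, and `n_estimators=50` are illustrative assumptions, not settings taken from the project. Stratifying keeps the legitimate/malware ratio similar in both splits. The helper `train_random_forest` is hypothetical, and synthetic data stands in for the selected features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_random_forest(X, y, test_size=0.2, random_state=0):
    """Split the data, then fit a RandomForestClassifier on the training part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    clf = RandomForestClassifier(n_estimators=50, random_state=random_state)
    clf.fit(X_train, y_train)
    return clf, X_train, X_test, y_train, y_test


# Synthetic stand-in for the selected malware features (illustration only).
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)
clf, X_train, X_test, y_train, y_test = train_random_forest(X_demo, y_demo)
```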
Model Evaluation: The trained model is evaluated by computing its score on the test data. For a scikit-learn classifier, score returns the mean accuracy.
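As a quick check of what the score means, the sketch below shows that a classifier's `score` matches `accuracy_score` computed on its test-set predictions. Synthetic data and the model settings are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (illustration only).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

score = clf.score(X_test, y_test)  # mean accuracy on the test set
manual = accuracy_score(y_test, clf.predict(X_test))  # the same quantity, computed explicitly
print(f"Accuracy: {score * 100:.2f}%")
```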
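Confusion Matrix Analysis: A confusion matrix is generated by comparing the predicted labels with the true labels from the test set. It shows how many samples of each class were classified correctly. Treating malware as the positive class, it also shows the false positives (legitimate software flagged as malware) and false negatives (malware classified as legitimate).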
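The sketch below turns a confusion matrix into false-positive and false-negative rates. It assumes the label encoding 1 = legitimate, 0 = malware, which is not confirmed by the project description, and treats malware as the positive class. Passing `labels` explicitly fixes the row and column order, so the four cells can be unpacked reliably. The function `malware_error_rates` and the toy labels are illustrative.

```python
from sklearn.metrics import confusion_matrix

# Assumed label encoding: 1 = legitimate, 0 = malware (positive class).
LEGIT, MAL = 1, 0


def malware_error_rates(y_true, y_pred):
    """Return the confusion matrix plus false-positive and false-negative rates.

    Assumes both classes are present in y_true.
    """
    # Rows are true classes, columns are predicted classes, both ordered [legit, malware].
    cm = confusion_matrix(y_true, y_pred, labels=[LEGIT, MAL])
    tn, fp, fn, tp = cm.ravel()
    fpr = fp / (fp + tn)  # share of legitimate files wrongly flagged as malware
    fnr = fn / (fn + tp)  # share of malware wrongly passed as legitimate
    return cm, fpr, fnr


# Toy labels for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
cm, fpr, fnr = malware_error_rates(y_true, y_pred)
print(cm)
print(f"False positive rate: {fpr:.2%}, false negative rate: {fnr:.2%}")
```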
Gradient Boosting Classifier: Another machine learning model, GradientBoostingClassifier, is trained on the dataset, and its accuracy score is evaluated.
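A GradientBoostingClassifier is trained and scored the same way as the random forest. In this sketch, synthetic data and `n_estimators=50` are illustrative choices, not the project's settings. For binary classification, the fitted model holds one regression tree per boosting stage.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (illustration only).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

gbc = GradientBoostingClassifier(n_estimators=50, random_state=0)
gbc.fit(X_train, y_train)
gb_score = gbc.score(X_test, y_test)  # mean accuracy, as for the random forest
print(f"GradientBoosting accuracy: {gb_score * 100:.2f}%")
```

Evaluating both models on the same test split is what makes their accuracy scores directly comparable.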
The project demonstrates how machine learning can be applied to malware classification. By analyzing feature importance and training classification models, it shows how legitimate software can be distinguished from malware samples. The accuracy scores of the RandomForestClassifier and GradientBoostingClassifier models give a measure of their effectiveness.