Machine-learning & signature-based static malware scanner
Jackal is a lightweight static malware scanner that provides dual-layer threat detection. It offers:
- File and Folder Scanning: Users can scan a single file or recursively scan a directory and all of its subfolders.
- Static Analysis: Jackal never executes the files it analyzes. Instead, it inspects static features such as metadata and byte patterns, which lets it scan files quickly and safely without running a dangerous file or requiring a sandboxed environment.
- Machine Learning Detection: For Windows PE files, Jackal uses a trained machine learning model to identify malicious files.
- Signature-Based Detection: Using YARA rules, Jackal can scan a wide variety of file types, including executables, documents, and scripts, for known malware signatures.
- Threat Summary: After scanning, Jackal provides a summary showing how many files were scanned, how many threats each engine detected, and the corresponding file paths.
- Modern GUI: A GUI built with CustomTkinter simplifies malware scanning by letting users select files or folders, choose between detection modes, and view scan summaries in real time.
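The file/folder scanning and threat summary features can be pictured with a small sketch. This is not Jackal's actual code: the function names `collect_files` and `summarize`, the report layout, and the placeholder engine in the usage note are all assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def collect_files(target: Path) -> List[Path]:
    """Return the file itself, or every regular file under a directory."""
    if target.is_file():
        return [target]
    # rglob("*") walks the directory and all of its subfolders.
    return sorted(p for p in target.rglob("*") if p.is_file())


def summarize(files: Iterable[Path],
              engines: Dict[str, Callable[[Path], bool]]) -> dict:
    """Run every engine on every file and tally detections per engine.

    `engines` maps an engine name to a hypothetical callable that
    returns True when it considers the file a threat.
    """
    files = list(files)
    flagged = defaultdict(list)
    for path in files:
        for name, is_threat in engines.items():
            if is_threat(path):
                flagged[name].append(str(path))
    return {"scanned": len(files), "threats": dict(flagged)}
```

A call such as `summarize(collect_files(Path("samples")), {"demo": lambda p: p.suffix == ".exe"})` returns the number of files scanned plus, per engine, the paths it flagged. The `demo` engine here is only a placeholder and performs no real detection.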
Jackal uses a machine learning model trained on a malware dataset containing features extracted from PE (Portable Executable) files. To improve accuracy and to exclude dynamic features, the 20 most important static features were identified using feature importance analysis performed with a Random Forest classifier implemented in Scikit-learn.
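As a rough illustration of this kind of feature importance analysis, the sketch below ranks features with a Random Forest in Scikit-learn. It assumes synthetic data from `make_classification` in place of the PE dataset, and the helper name `top_features`, the feature names, and the hyperparameters are hypothetical rather than the values used to train Jackal's model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def top_features(X, y, names, k=20, seed=0):
    """Fit a Random Forest and return the k highest-ranked features."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    # argsort is ascending, so reverse it and keep the first k indices.
    order = np.argsort(importances)[::-1][:k]
    return [(names[i], float(importances[i])) for i in order]


# Synthetic stand-in for a table of PE features (assumption, not real data).
X, y = make_classification(n_samples=500, n_features=40,
                           n_informative=8, random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]

ranked = top_features(X, y, names, k=20)
for name, score in ranked[:5]:
    print(f"{name}: {score:.4f}")
```

The ranked list can then be filtered by hand to drop any feature that cannot be obtained without running the file.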
From these, the 18 most important features that could be extracted statically were selected. The model was retrained on only these features using Scikit-learn, and a feature extractor was then developed to extract them from unknown PE files, allowing the model to make predictions on new input at runtime. Because the model relies on static PE features, the ML scanner only supports Windows executable formats such as .exe, .dll, .sys, and .scr.
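Because file extensions can be renamed, one way to decide whether a file is eligible for the ML scanner is to check the PE header itself. The sketch below is an assumption about how such a gate could work, not a description of Jackal's extractor: it relies only on the documented PE layout (an `MZ` DOS header whose field at offset 0x3C points to the `PE\0\0` signature), and the function name `looks_like_pe` is hypothetical.

```python
import struct
from pathlib import Path


def looks_like_pe(path: Path) -> bool:
    """Return True if the file starts with a DOS header that points to a PE signature."""
    with open(path, "rb") as fh:
        header = fh.read(64)
        # A DOS header is 64 bytes long and begins with the magic "MZ".
        if len(header) < 64 or header[:2] != b"MZ":
            return False
        # e_lfanew: 4-byte little-endian offset of the PE signature, stored at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
        fh.seek(pe_offset)
        return fh.read(4) == b"PE\x00\x00"
```

Files that fail this check could be skipped by the ML engine and left to the signature scanner.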
Jackal uses YARA rules for signature-based detection. The engine scans files against a set of anti-malware YARA rules from the YARA Forge repository. The rule set is designed to detect many malware families, such as:
- Remote Access Trojans (RATs)
- Backdoors
- Wipers
- Downloaders
- Trojans
- Information Stealers
- Credential Harvesters
- APT Toolkits
- Ransomware
- Miscellaneous Threats
Unlike the machine learning model, the YARA scanner can analyze a wider range of file types, such as documents, scripts, and executables.
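Since the YARA binary is a prerequisite, one plausible way to drive signature scanning is to call the `yara` command and parse its output, which by default prints one `RULE_NAME FILE_PATH` pair per match. The sketch below assumes that approach. How Jackal actually invokes YARA is not stated here, and the function names and the rule and file names in the usage note are hypothetical.

```python
import subprocess
from collections import defaultdict
from typing import Dict, List


def parse_yara_output(stdout: str) -> Dict[str, List[str]]:
    """Group the CLI's 'RULE_NAME FILE_PATH' lines by file path."""
    hits = defaultdict(list)
    for line in stdout.splitlines():
        # Rule names contain no spaces, so split on the first space only;
        # this keeps file paths that contain spaces intact.
        rule, sep, path = line.strip().partition(" ")
        if sep and path:
            hits[path].append(rule)
    return dict(hits)


def scan_with_yara(rules_file: str, target: str) -> Dict[str, List[str]]:
    """Run the yara binary on a file or directory and return matches per file."""
    # -r tells the CLI to recurse into subdirectories.
    result = subprocess.run(
        ["yara", "-r", rules_file, target],
        capture_output=True, text=True, check=False,
    )
    return parse_yara_output(result.stdout)
```

For example, `parse_yara_output("Demo_RAT /tmp/a.exe\nDemo_Stealer /tmp/a.exe")` yields `{"/tmp/a.exe": ["Demo_RAT", "Demo_Stealer"]}`. A file is treated as a threat as soon as at least one rule matches it.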
Python 3.9 or newer
YARA binary
- Windows: https://github.com/VirusTotal/yara/releases or choco install yara
- Mac: brew install yara (requires Homebrew)
- Linux (Ubuntu): sudo apt install yara
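Both requirements can be checked before installing. The snippet below is a convenience sketch, not part of the project: the helper name `missing_prerequisites` is hypothetical, and it assumes the YARA binary is installed under the command name `yara`, as the install commands above suggest.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def missing_prerequisites(
    python_version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a human-readable list of unmet requirements (empty if all are met)."""
    problems = []
    if python_version < (3, 9):
        problems.append("Python 3.9 or newer is required")
    # shutil.which returns None when the command is not on PATH.
    if which("yara") is None:
        problems.append("the yara binary was not found on PATH")
    return problems


for problem in missing_prerequisites():
    print("Missing:", problem)
```

The two parameters exist only so the check can be exercised without changing the real environment.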
Step by step
git clone https://github.com/Kayetan17/static-malware-detector.git
cd static-malware-detector
pip install -r requirements.txt
python main.py   # run the GUI
This project is licensed under the GPLv3 License.
YARA rules used by this project were sourced from the YARA Forge project and fall under the GPLv3 license. As such, this project as a whole is also distributed under GPLv3.
See LICENSE for full terms.
- PE Malware Dataset: Malware Dataset https://www.kaggle.com/datasets/amauricio/pe-files-malwares/data
- Signature Scanning: YARA https://virustotal.github.io/yara/
- YARA Rules: YARA Forge repository https://github.com/YARAHQ/yara-forge
- ML Tools:
  - Scikit-learn https://scikit-learn.org/
  - Pandas https://pandas.pydata.org/
- GUI:
  - GUI framework: CustomTkinter https://github.com/TomSchimansky/CustomTkinter
  - GUI font: Lemon Milk https://www.dafont.com/lemon-milk.font
- Static Feature Extraction: pefile https://github.com/erocarrera/pefile
Kayetan Protas - kayetanp@gmail.com
Project Link: https://github.com/Kayetan17/static-malware-detector.git