This project is a Web Application Vulnerability Scanner that combines machine learning techniques with heuristic checks to identify potential security vulnerabilities in web applications. The tool crawls websites, analyzes responses, and detects common web vulnerabilities such as SQL injection, XSS, directory listings, and exposed configuration files.
- Machine Learning Detection: Uses a Logistic Regression model trained on TF-IDF features to identify potential vulnerabilities
- Heuristic Checks: Detects common issues like directory listings and exposed config files
- Web Crawling: Recursively crawls websites up to a specified depth
- Robots.txt Compliance: Respects website crawling policies
- Retry Mechanism: Handles temporary network issues with exponential backoff
- GUI Interface: User-friendly graphical interface with progress tracking
- Visual Reporting: Generates pie charts showing vulnerability severity distribution
- Remediation Guidance: Provides references and video tutorials for found vulnerabilities
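The retry mechanism with exponential backoff mentioned above can be sketched with `requests` and `urllib3`'s `Retry` helper. This is a minimal illustration, not the scanner's actual code; the retry count, backoff factor, and status list are illustrative values:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries=3, backoff=1.0):
    """Build a requests Session that retries transient failures
    (connection errors and 5xx/429 responses) with exponentially
    increasing waits between attempts."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,          # wait grows exponentially per retry
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "HEAD"],  # only retry idempotent requests
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

Mounting the adapter on both schemes means every request made through the session inherits the retry policy, so the crawler code never has to handle transient failures explicitly.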
- Algorithm: Logistic Regression with class balancing
- Feature Extraction: TF-IDF vectorization of request/response text
- Additional Features: SQL keywords and XSS payload counts
- Accuracy: Reported during model training (see console output)
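A classifier of this shape can be sketched with scikit-learn as a TF-IDF + Logistic Regression pipeline. The toy dataset, analyzer choice, and n-gram range below are illustrative stand-ins (the project trains on its own synthetic data, and the extra SQL-keyword/XSS-count features are omitted here for brevity):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative dataset -- a stand-in for the real training data.
samples = [
    "GET /search?q=' OR 1=1 --",                    # SQL injection attempt
    "GET /search?q=<script>alert(1)</script>",      # reflected XSS attempt
    "GET /index.html",                              # benign request
    "GET /about.html",                              # benign request
]
labels = [1, 1, 0, 0]  # 1 = suspicious, 0 = benign

model = make_pipeline(
    # Character n-grams are robust to the odd tokenization of attack strings.
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    # class_weight="balanced" compensates for skewed benign/malicious counts.
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(samples, labels)
```

Once fitted, `model.predict(["GET /page?id=..."])` returns a 0/1 label per request, and `model.predict_proba` gives a confidence score that could drive a severity rating.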
- SQL Injection: Detects common SQL keywords in requests
- XSS: Identifies script tags and JavaScript URIs
- Directory Listing: Checks for "Index of /" in responses
- Exposed Configs: Looks for .env and web.config files
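These heuristics reduce to simple string and regex checks. The function name, patterns, and severity labels below are illustrative, not the scanner's actual implementation:

```python
import re

# Keywords commonly seen in SQL injection probes (illustrative subset).
SQL_KEYWORDS = re.compile(r"\b(union|select|insert|drop|or 1=1)\b", re.I)
# Script tags and javascript: URIs typical of XSS payloads.
XSS_PATTERN = re.compile(r"<script\b|javascript:", re.I)


def check_response(url, body):
    """Return a list of (issue, severity) tuples found by simple
    heuristic checks on a URL and its response body."""
    findings = []
    if "Index of /" in body:
        findings.append(("Directory listing enabled", "Medium"))
    if url.endswith((".env", "web.config")):
        findings.append(("Exposed configuration file", "High"))
    if SQL_KEYWORDS.search(url):
        findings.append(("Possible SQL injection vector", "High"))
    if XSS_PATTERN.search(body):
        findings.append(("Possible reflected XSS", "High"))
    return findings
```

Heuristics like these complement the ML model: they are cheap, explainable, and catch clear-cut issues even when the classifier's training data is thin.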
- Python 3.x
- Libraries:
- requests (with retry mechanism)
- BeautifulSoup (HTML parsing)
- scikit-learn (machine learning)
- matplotlib (visualization)
- tkinter (GUI)
- Pillow / PIL (image handling)
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/web-vulnerability-scanner.git
  cd web-vulnerability-scanner
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  If `requirements.txt` isn't provided, install the packages manually:

  ```bash
  pip install requests beautifulsoup4 scikit-learn matplotlib pillow
  ```

- Run the application:

  ```bash
  python new2.py
  ```
- Enter the target URL in the input field
- Set crawling parameters:
- Max Depth: How many levels deep to crawl (default: 2)
- Max Pages: Maximum number of pages to scan (default: 100)
- Click "Start Scan"
- View results in the output panel, including:
- Detected vulnerabilities with severity levels
- Remediation advice
- Video tutorial links
- Visual severity distribution chart
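The bounded crawl controlled by Max Depth and Max Pages can be sketched as a breadth-first traversal. This is a simplified sketch: the injectable `fetch` callable (any `url -> HTML` function, e.g. a requests session) is an assumption made for testability, and robots.txt handling (which the real scanner performs) is omitted:

```python
from collections import deque
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def crawl(start_url, fetch, max_depth=2, max_pages=100):
    """Breadth-first crawl bounded by link depth and total page count.
    `fetch` is any callable mapping a URL to its HTML text."""
    host = urlparse(start_url).netloc
    seen, queue, pages = {start_url}, deque([(start_url, 0)]), []
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        pages.append(url)
        html = fetch(url)
        if depth >= max_depth:
            continue  # do not expand links past the depth limit
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"])
            # Stay on the same host and skip already-seen pages.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

In the real scanner, each fetched page would also be passed through the heuristic checks and the ML classifier before moving on, and `urllib.robotparser` can supply the robots.txt compliance the feature list describes.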
(Include actual screenshots of the application in action here)
```
web-vulnerability-scanner/
├── new2.py           # Main application file (enhanced version)
├── s.py              # Alternative version
├── p.jpg             # Background image for GUI
├── README.md         # This file
└── requirements.txt  # Python dependencies
```
- The current model is trained on a small synthetic dataset. For production use, it should be trained on real-world data.
- Only basic vulnerability types are detected. More sophisticated attacks may be missed.
- The crawler may not handle all website structures perfectly.
- JavaScript-heavy sites may not be fully analyzed.
- Expand training dataset with real-world examples
- Add more vulnerability types (CSRF, SSRF, etc.)
- Implement authenticated scanning
- Add command-line interface version
- Support for scanning REST APIs
- Export reports in multiple formats (PDF, HTML)
This tool should only be used on:
- Websites you own
- Websites where you have explicit permission to test

Never use this tool for unauthorized security testing. Unauthorized scanning may be illegal in many jurisdictions.
Contributions are welcome! Please open an issue or submit a pull request for any:
- Bug fixes
- New features
- Documentation improvements
- Dataset enhancements
- OWASP for vulnerability references
- scikit-learn developers
- Python community for excellent libraries