An interactive web application for visualizing and understanding Support Vector Machine (SVM) algorithms through real-time parameter manipulation and multiple data input methods. This educational tool was developed as part of a dissertation research project to provide intuitive insights into machine learning concepts.
- Real-time Decision Boundaries: Visualize how SVM decision boundaries change with different parameters
- ROC Curves: Dynamic ROC curve generation with AUC scoring
- Confusion Matrix: Live confusion matrix updates for performance evaluation
- Contour Plots: Filled contours showing decision function values across the feature space
- Kernel Selection: Choose from RBF, Linear, Polynomial, and Sigmoid kernels
- Cost Parameter (C): Interactive sliders for regularization strength (0.01 - 10,000)
- Gamma Parameter: Control kernel coefficient for RBF, Polynomial, and Sigmoid
- Degree Parameter: Polynomial kernel degree control (2-10)
- Threshold Adjustment: Real-time decision threshold modification
- Scikit-learn Datasets: Pre-loaded Moons, Circles, and Linearly Separable datasets
- File Upload: Support for CSV and Excel file uploads with column mapping
- Hand-drawn Data: Interactive canvas for drawing custom data points
- Responsive Design: Bootstrap-based responsive interface
- Real-time Updates: Instant visualization updates without page refresh
- Performance Timing: Built-in performance monitoring and display
- Professional UI: Clean, modern interface with intuitive controls
- Python 3.9 or higher
- pip package manager
- Clone the repository:
    git clone https://github.com/your-username/dissertation.git
    cd dissertation
- Install dependencies:
    pip install -r requirements.txt
- Run the application:
    python app.py
- Open your browser and navigate to http://localhost:8050 to access the application
# Build the Docker image
docker build -t svm-visualization .
# Run the container
docker run -p 8050:8050 svm-visualization
dissertation/
├── app.py                      # Main Dash application
├── app.ipynb                   # Jupyter notebook version with detailed comments
├── requirements.txt            # Python dependencies
├── Procfile                    # Heroku deployment configuration
├── LICENSE                     # MIT license
├── README.md                   # This file
├── Dissertation_Report.pdf     # Academic research report
├── assets/                     # Static assets
│   ├── logo.png                # Application logo
│   ├── favicon.ico             # Browser favicon
│   ├── custom.css              # Custom styling
│   └── canvas_bg.png           # Canvas background
└── utils/                      # Utility modules
    ├── charting.py             # Plotting and visualization functions
    ├── handle_func.py          # File upload and data handling
    ├── modeling.py             # SVM model implementation
    ├── sampling.py             # Dataset generation and splitting
    └── split_components.py     # UI component definitions
The application follows a modular architecture with clear separation of concerns:
- Dash Framework: Interactive web interface with real-time updates
- Bootstrap Components: Responsive design and modern UI elements
- Plotly Graphics: High-quality interactive visualizations
- Multiple Input Handlers: Support for various data sources
- Data Validation: Ensures data integrity and format compliance
- Train/Test Splitting: Automated dataset partitioning
- Scikit-learn Integration: Robust SVM implementation
- Parameter Optimization: Real-time model retraining
- Performance Metrics: Comprehensive evaluation tools
- Gunicorn WSGI: Production-ready web server
- Heroku Integration: Cloud deployment configuration
- Static Asset Management: Optimized resource delivery
- Click "SELECT DATA" button
- Choose "Scikit-learn Datasets" tab
- Select dataset type (Moons, Circles, or Linearly Separable)
- Adjust sample size (100-500) and noise level (0-1)
- Set test size ratio (0.1-0.5)
- Click "SAVE" to generate data
- Click "SELECT DATA" button
- Choose "Upload Data" tab
- Drag and drop or select your CSV/Excel file
- Map columns to X, Y, and class variables
- Set test size ratio
- Click "SAVE" to process data
- Click "SELECT DATA" button
- Choose "Hand Drawn Datapoints" tab
- Draw points on the canvas (use Toggle to switch classes)
- Set test size ratio
- Click "SAVE" to use drawn data
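Each of the three input paths above ends by setting a test size ratio. As a rough illustration of what that ratio controls, here is a standard-library sketch of a shuffled split. The function name `split_points` and the fixed seed are assumptions made for this example; the application itself may well rely on scikit-learn's `train_test_split` instead.

```python
import random

def split_points(X, y, test_size=0.3, seed=0):
    """Shuffle the samples and hold out a `test_size` fraction for testing."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = max(1, round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Ten hypothetical 2-D points with alternating class labels.
X = [(float(i), float(i % 3)) for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_points(X, y, test_size=0.3)
```

A larger ratio leaves fewer points for training, which tends to make the reported test accuracy more stable but the fitted boundary less so.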
- RBF (Radial Basis Function): Good for non-linear, complex boundaries
- Linear: Best for linearly separable data
- Polynomial: Effective for polynomial decision boundaries
- Sigmoid: Neural network-like behavior
- Low C (0.01-1): More tolerant to misclassification, smoother boundary
- High C (100-10000): Less tolerant to misclassification, complex boundary
- Low Gamma (0.00001-0.1): Far-reaching influence, smoother boundary
- High Gamma (1-100): Close influence, more complex boundary
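To make the kernel and gamma guidance above more concrete, the sketch below writes out the four kernel functions in the form scikit-learn documents them, using only the standard library. The helper names and the `coef0` defaults are choices made for this illustration, not code taken from this repository.

```python
import math

def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, gamma=1.0, degree=3, coef0=0.0):
    return (gamma * dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)

# Similarity between two points that are one unit apart.
a, b = (0.0, 0.0), (1.0, 0.0)
low_gamma = rbf_kernel(a, b, gamma=0.01)   # close to 1: far-reaching influence
high_gamma = rbf_kernel(a, b, gamma=10.0)  # close to 0: very local influence
```

With a low gamma, distant points still look similar to each other, so each training point influences a wide region and the boundary stays smooth. With a high gamma, similarity drops off quickly and the boundary can bend around individual points.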
- Use the threshold knob to adjust decision boundary
- Click "RESET THRESHOLD" to auto-calculate optimal threshold
- Training Data: Circles showing training points with accuracy
- Test Data: Triangles showing test points with accuracy
- Decision Boundary: Black line separating classes
- Contour Colors: Background showing decision function values
- AUC Score: Area Under Curve indicating model performance
- Diagonal Line: Random classifier baseline
- Curve Shape: Higher curves indicate better performance
- True Positive/Negative: Correct predictions
- False Positive/Negative: Incorrect predictions
- Color Intensity: Darker colors indicate higher values
This application serves as an educational tool for understanding:
- Support Vector Machines: Visual understanding of SVM decision boundaries
- Kernel Methods: Interactive exploration of different kernel functions
- Bias-Variance Tradeoff: Observe overfitting/underfitting through parameter changes
- Model Evaluation: Real-time performance metrics and visualization
- Data Preprocessing: Experience with data loading and cleaning
- Feature Engineering: Understanding of 2D feature spaces
- Model Selection: Hands-on parameter tuning experience
- Performance Evaluation: Comprehensive metrics interpretation
- Interactive Dashboards: Modern web application development
- Real-time Computing: Live data processing and visualization
- Modular Design: Clean code architecture and separation of concerns
# Install dependencies
pip install -r requirements.txt
# Run development server
python app.py
# Access at http://localhost:8050
# Login to Heroku
heroku login
# Create new Heroku app
heroku create your-app-name
# Deploy to Heroku
git push heroku main
# Open deployed app
heroku open
Set the following environment variables for production:
- PORT: Application port (default: 8050)
- DEBUG: Debug mode (set to False for production)
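One possible way to read these two variables at startup is sketched below. The helper name `read_settings` and the set of strings treated as true are assumptions for this example; how `app.py` actually consumes `PORT` and `DEBUG` is not shown in this README.

```python
import os

def read_settings(environ=os.environ):
    """Return (port, debug) from environment variables with safe defaults."""
    port = int(environ.get("PORT", "8050"))
    # Treat only explicit truthy strings as enabling debug mode.
    debug = environ.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"}
    return port, debug

# A plain dict stands in for the real environment in this demonstration.
port, debug = read_settings({"PORT": "5000", "DEBUG": "False"})
```

The resulting values could then be handed to whatever call starts the Dash server. Defaulting `DEBUG` to off keeps a misconfigured production deployment from exposing the debugger.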
Trains SVM model with specified parameters.
- Parameters: cost, kernel, degree, gamma, data
- Returns: Trained model, decision function, mesh grids
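The "decision function" and "mesh grids" in the return value are what drive the contour plot. The sketch below evaluates a kernel SVM decision function on a regular grid in pure Python. The support vectors and coefficients are hand-picked for illustration rather than learned, and every name here is hypothetical; the real function presumably delegates training to scikit-learn.

```python
import math

def rbf(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision_function(point, support_vectors, dual_coefs, intercept, gamma):
    """f(x) = sum_i coef_i * K(sv_i, x) + b; the sign gives the predicted class."""
    return sum(c * rbf(sv, point, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + intercept

def mesh_grid(x_min, x_max, y_min, y_max, steps):
    """Return grid coordinates as two lists (xs, ys) of evenly spaced values."""
    xs = [x_min + i * (x_max - x_min) / (steps - 1) for i in range(steps)]
    ys = [y_min + j * (y_max - y_min) / (steps - 1) for j in range(steps)]
    return xs, ys

# Hand-picked values for illustration only; a real model learns these.
support_vectors = [(-1.0, 0.0), (1.0, 0.0)]
dual_coefs = [-1.0, 1.0]   # negative class on the left, positive on the right
intercept = 0.0

xs, ys = mesh_grid(-2.0, 2.0, -2.0, 2.0, steps=5)
# Row j, column i holds f(xs[i], ys[j]); this is what a contour plot colors.
grid = [[decision_function((x, y), support_vectors, dual_coefs, intercept, gamma=0.5)
         for x in xs] for y in ys]
```

In this toy setup the decision boundary is the vertical line x = 0, where the two kernel terms cancel and the decision function is zero.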
Generates prediction visualization with decision boundaries.
- Parameters: data, model, threshold
- Returns: Plotly figure object
Creates ROC curve with AUC score.
- Parameters: data, model
- Returns: Plotly figure object
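For readers curious about the numbers behind such a plot, here is a minimal pure-Python sketch of computing ROC points and the AUC from decision scores. It is not the repository's implementation, which may simply call scikit-learn; the function names `roc_points` and `auc` are chosen for this example.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by sweeping the threshold over the scores."""
    positives = sum(1 for label in y_true if label == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one sample of each class")
    # Visit samples from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a tied score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
area = auc(fpr, tpr)  # 0.75 for this small example
```

An AUC of 0.5 corresponds to the diagonal baseline of a random classifier, and 1.0 to a perfect ranking of positives above negatives.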
Generates confusion matrix heatmap.
- Parameters: data, model, threshold
- Returns: Plotly figure object
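The role of the threshold parameter can be illustrated with a small pure-Python sketch that counts the four confusion matrix cells from decision scores. The function below and its sample scores are hypothetical; the application renders the result as a Plotly heatmap, which is omitted here.

```python
def confusion_matrix(y_true, scores, threshold=0.0):
    """Count TN, FP, FN, TP after labeling scores above `threshold` as positive."""
    tn = fp = fn = tp = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score > threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    # Rows are true classes, columns are predicted classes.
    return [[tn, fp], [fn, tp]]

y_true = [0, 0, 1, 1, 1]
scores = [-1.2, 0.3, -0.1, 0.8, 1.5]   # hypothetical decision-function values
default_cm = confusion_matrix(y_true, scores, threshold=0.0)  # [[1, 1], [1, 2]]
strict_cm = confusion_matrix(y_true, scores, threshold=0.5)   # [[2, 0], [1, 2]]
```

Raising the threshold from 0.0 to 0.5 removes the false positive in this example without changing the true positives, which is the kind of trade-off the threshold knob lets users explore.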
Parses uploaded CSV/Excel files.
- Parameters: file contents, filename, header flag, columns
- Returns: Pandas DataFrame or column names
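Dash's upload component delivers file contents as a string of the form `data:<type>;base64,<payload>`. The sketch below decodes such a string for the CSV case using only the standard library. It returns plain lists instead of a Pandas DataFrame, does not handle Excel files, and the name `parse_csv_upload` is an assumption for this example rather than the repository's function.

```python
import base64
import csv
import io

def parse_csv_upload(contents, filename, has_header=True):
    """Decode an uploaded CSV payload and return (column_names, rows)."""
    if not filename.lower().endswith(".csv"):
        raise ValueError("this sketch only handles CSV files")
    # Split off the "data:<type>;base64," prefix from the encoded payload.
    _, encoded = contents.split(",", 1)
    text = base64.b64decode(encoded).decode("utf-8")
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    if has_header:
        return rows[0], rows[1:]
    # Without a header row, fall back to positional column names.
    columns = [f"col_{i}" for i in range(len(rows[0]))]
    return columns, rows

# Build a payload the way a browser upload would encode it.
raw = "x,y,label\n0.5,1.2,0\n1.5,0.3,1\n"
contents = "data:text/csv;base64," + base64.b64encode(raw.encode("utf-8")).decode("ascii")
columns, rows = parse_csv_upload(contents, "points.csv")
```

The returned column names are what a column-mapping step would offer to the user when assigning the X, Y, and class variables.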
Processes hand-drawn canvas data.
- Parameters: JSON canvas data
- Returns: Feature matrix X, labels y
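As a rough idea of what this conversion might involve, the sketch below turns a JSON string of drawn objects into a feature matrix and labels. The payload schema (an `objects` list with `left`, `top`, and `stroke` keys), the color-to-class mapping, and the function name are all assumptions for illustration; the canvas component's real output may differ.

```python
import json

def parse_canvas_points(json_data, canvas_height, class_colors):
    """Turn drawn objects into a feature matrix X and label vector y."""
    X, y = [], []
    for obj in json.loads(json_data).get("objects", []):
        label = class_colors.get(obj.get("stroke"))
        if label is None:
            continue  # skip objects drawn in an unrecognized color
        # Canvas y grows downward, so flip it to get a conventional plot axis.
        X.append([float(obj["left"]), float(canvas_height - obj["top"])])
        y.append(label)
    return X, y

# Hypothetical payload: two points, one per class, plus a stray object.
json_data = json.dumps({"objects": [
    {"left": 10, "top": 20, "stroke": "red"},
    {"left": 30, "top": 40, "stroke": "blue"},
    {"left": 50, "top": 60, "stroke": "green"},
]})
X, y = parse_canvas_points(json_data, canvas_height=100,
                           class_colors={"red": 0, "blue": 1})
```

Flipping the vertical coordinate matters because a point drawn near the top of the canvas would otherwise appear near the bottom of the resulting scatter plot.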
We welcome contributions to improve this educational tool! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests if applicable
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow PEP 8 style guidelines
- Add docstrings to new functions
- Include tests for new features
- Update documentation as needed
- Ensure backwards compatibility
- Additional Algorithms: Implement other ML algorithms (Decision Trees, Neural Networks)
- 3D Visualizations: Extend to 3D feature spaces
- Data Export: Add functionality to export results
- Mobile Optimization: Improve mobile user experience
- Performance: Optimize for larger datasets
This project was developed as part of dissertation research focusing on:
- Visualization Effectiveness: Studying the impact of interactive visualizations on ML education
- User Experience: Analyzing how UI design affects learning outcomes
- Parameter Understanding: Investigating intuitive methods for teaching hyperparameter effects
- User Studies: Conducted with computer science students
- A/B Testing: Compared with traditional teaching methods
- Performance Metrics: Measured learning outcomes and engagement
- Interactive visualizations significantly improve parameter understanding
- Real-time feedback enhances learning retention
- Multi-modal data input increases engagement
- Dissertation Report - Complete academic analysis and findings
This project is licensed under the MIT License - see the LICENSE file for details.
- Commercial use allowed
- Modification allowed
- Distribution allowed
- Private use allowed
- License and copyright notice required
- No warranty provided
- Dash Framework: For the interactive web interface
- Plotly: For beautiful, interactive visualizations
- Scikit-learn: For robust machine learning algorithms
- Bootstrap: For responsive design components
- Research supervisors and academic advisors
- Computer science department resources
- Student participants in user studies
- Open source contributors and maintainers
- Stack Overflow community for troubleshooting
- Dash and Plotly communities for technical guidance
Pranav Pai
- Email: paipranav01@gmail.com
- LinkedIn: https://www.linkedin.com/in/pranav-pai/
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Academic Inquiries: Contact via university email
If you use this work in your research, please cite:
@mastersthesis{yourname2024,
title={Visual Explanation of Support Vector Machines: An Interactive Educational Tool},
author={Your Name},
year={2024},
school={Your University},
type={Master's Thesis}
}
Made with ❤️ for machine learning education