CodeBug Analyzer is built with a modular architecture that separates concerns and allows for extensibility. The application follows the MVC (Model-View-Controller) pattern with Flask as the web framework.
┌─────────────────┐         ┌───────────────┐         ┌─────────────────┐
│                 │         │               │         │                 │
│  Web Interface  ├────────►│  Controllers  ├────────►│     Models      │
│                 │         │               │         │                 │
└────────┬────────┘         └───────┬───────┘         └────────┬────────┘
         │                          │                          │
         │                          ▼                          │
         │                  ┌───────────────┐                  │
         │                  │               │                  │
         └─────────────────►│   Services    │◄─────────────────┘
                            │               │
                            └───────┬───────┘
                                    │
                                    ▼
                            ┌──────────────┐
                            │              │
                            │  Analyzers   │
                            │              │
                            └──────────────┘
The database structure is defined using SQLAlchemy models:
- Repository: Stores information about analyzed repositories
  - Fields: id, url, name, last_analyzed, status
  - Relationships: scans (one-to-many)
- Scan: Represents an analysis session of a repository
  - Fields: id, repository_id, timestamp, total_files, analyzed_files, total_bugs
  - Relationships: bugs (one-to-many), language_stats (one-to-many)
- Bug: Stores details about identified bugs
  - Fields: id, scan_id, file_path, line_number, bug_type, severity, description, code_snippet, recommendation, language
- LanguageStats: Tracks statistics about language usage
  - Fields: id, scan_id, language, file_count, line_count, bug_count
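To make the relationships concrete, the sketch below mirrors the four models as plain Python dataclasses. It only illustrates the shape of the data: the actual models are SQLAlchemy classes, and the field types and default values chosen here (including the `"pending"` status) are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Bug:
    id: int
    scan_id: int
    file_path: str
    line_number: int
    bug_type: str
    severity: str
    description: str
    code_snippet: str = ""
    recommendation: str = ""
    language: str = ""


@dataclass
class LanguageStats:
    id: int
    scan_id: int
    language: str
    file_count: int = 0
    line_count: int = 0
    bug_count: int = 0


@dataclass
class Scan:
    id: int
    repository_id: int
    timestamp: datetime
    total_files: int = 0
    analyzed_files: int = 0
    total_bugs: int = 0
    # One-to-many relationships, modelled here as plain lists.
    bugs: List[Bug] = field(default_factory=list)
    language_stats: List[LanguageStats] = field(default_factory=list)


@dataclass
class Repository:
    id: int
    url: str
    name: str
    last_analyzed: Optional[datetime] = None
    status: str = "pending"  # assumed value, not taken from the source
    scans: List[Scan] = field(default_factory=list)  # one-to-many
```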
Services handle the business logic of the application:
- repository.py: Manages repository operations (cloning, cleaning up)
  - Functions: clone_repository, cleanup_repository, get_repository_name, list_files
- analyzer.py: Coordinates the analysis process
  - Functions: analyze_repository
- language_detector.py: Identifies programming languages
  - Functions: detect_language, count_lines, analyze_language_stats
- report_generator.py: Creates summary reports
  - Functions: generate_report
- individual_report_generator.py: Generates detailed reports for individual bugs
  - Functions: generate_individual_bug_reports, generate_bug_report, ensure_repo_directory
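As an illustration of what a service module might look like, here is a minimal sketch of the three functions listed for language_detector.py. The function names come from the list above; the signatures, the extension map, and the returned dictionary shape are assumptions rather than the project's actual code.

```python
from collections import defaultdict
from pathlib import Path

# Assumed extension map; the real detector may cover more languages.
EXTENSION_LANGUAGES = {
    ".py": "Python",
    ".js": "JavaScript",
    ".ts": "TypeScript",
    ".go": "Go",
}


def detect_language(file_path):
    """Return a language name based on the file extension, or None."""
    return EXTENSION_LANGUAGES.get(Path(file_path).suffix.lower())


def count_lines(file_path):
    """Count the lines in a file, ignoring undecodable bytes."""
    with open(file_path, encoding="utf-8", errors="ignore") as handle:
        return sum(1 for _ in handle)


def analyze_language_stats(file_paths):
    """Aggregate file and line counts per detected language."""
    stats = defaultdict(lambda: {"file_count": 0, "line_count": 0})
    for path in file_paths:
        language = detect_language(path)
        if language is None:
            continue  # unrecognized files are skipped
        stats[language]["file_count"] += 1
        stats[language]["line_count"] += count_lines(path)
    return dict(stats)
```

Detecting by extension is cheap and predictable, but it cannot classify extensionless files; a content-based check would be needed for those.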
Language-specific analyzers implement bug detection logic:
- common_analyzer.py: Checks common issues across all languages
  - Functions: analyze_common_issues, check_file_size, get_code_snippet
- python_analyzer.py: Python-specific bug detection
  - Classes: PythonAstVisitor (extends ast.NodeVisitor)
  - Functions: analyze_python_file
- javascript_analyzer.py: JavaScript/TypeScript bug detection
  - Functions: analyze_javascript_file, check_for_strict_equality
- go_analyzer.py: Go language bug detection
  - Functions: analyze_go_file
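The AST-based approach used by the Python analyzer can be sketched as follows. The class name `PythonAstVisitor` and its `ast.NodeVisitor` base come from the list above; the two checks shown (bare `except:` and mutable default arguments), the finding fields, the severity labels, and the helper `analyze_python_source` are illustrative assumptions, not the project's actual rules.

```python
import ast


class PythonAstVisitor(ast.NodeVisitor):
    """Collect findings while walking a module's syntax tree."""

    def __init__(self):
        self.bugs = []

    def _report(self, node, bug_type, severity):
        self.bugs.append({
            "line_number": node.lineno,
            "bug_type": bug_type,    # assumed label
            "severity": severity,    # assumed severity
        })

    def visit_ExceptHandler(self, node):
        # A handler without an exception type is a bare `except:`.
        if node.type is None:
            self._report(node, "bare_except", "medium")
        self.generic_visit(node)

    def visit_FunctionDef(self, node):
        # Mutable literals used as defaults are shared between calls.
        for default in node.args.defaults + node.args.kw_defaults:
            if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                self._report(default, "mutable_default", "low")
        self.generic_visit(node)


def analyze_python_source(source):
    """Hypothetical helper: parse source text and return the findings."""
    visitor = PythonAstVisitor()
    visitor.visit(ast.parse(source))
    return visitor.bugs
```

A file-based entry point such as analyze_python_file would presumably read the file and delegate to a helper like this one.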
┌────────────────┐       ┌────────────────┐       ┌────────────────┐
│  Repository    │       │     Scan       │       │      Bug       │
├────────────────┤       ├────────────────┤       ├────────────────┤
│ id             │       │ id             │       │ id             │
│ url            │       │ repository_id  ├───┐   │ scan_id        ├───┐
│ name           │       │ timestamp      │   │   │ file_path      │   │
│ last_analyzed  │◄──────┤ total_files    │   │   │ line_number    │   │
│ status         │       │ analyzed_files │   │   │ bug_type       │   │
└────────────────┘       │ total_bugs     │   │   │ severity       │   │
                         └────────────────┘   │   │ description    │   │
                                              │   │ code_snippet   │   │
                                              │   │ recommendation │   │
                                              │   │ language       │   │
                                              │   └────────────────┘   │
                                              │                        │
                                              │   ┌────────────────┐   │
                                              │   │ LanguageStats  │   │
                                              │   ├────────────────┤   │
                                              │   │ id             │   │
                                              └───┤ scan_id        │   │
                                                  │ language       │   │
                                                  │ file_count     │   │
                                                  │ line_count     │   │
                                                  │ bug_count      │◄──┘
                                                  └────────────────┘
- User Submits Repository URL
  - Form submission handled by routes.py (analyze function)
  - URL validation and repository creation in database
- Repository Cloning
  - services/repository.py clones the GitHub repository
  - Clone is saved to a temporary directory
- Code Analysis
  - services/analyzer.py coordinates the analysis process
  - Language detection for each file
  - Language-specific analyzers process files
  - Bugs are identified and stored in the database
- Report Generation
  - Summary report is generated with statistics
  - Charts and visualizations are created for the web interface
  - Individual bug reports can be generated on demand
- Results Display
  - Web interface shows comprehensive report
  - Users can browse bugs, filter by severity
  - Report includes charts for visualization
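The code analysis step in this flow amounts to walking the cloned directory and dispatching each file to an analyzer for its language. The sketch below shows one way that coordination could look. The `ANALYZERS` registry, the stand-in `flag_long_lines` analyzer, and the return shape are hypothetical; only the function name `analyze_repository` and the counters total_files, analyzed_files, and total_bugs come from this document.

```python
import os


def flag_long_lines(file_path, source, limit=120):
    """Stand-in analyzer: report lines longer than `limit` characters."""
    return [
        {"file_path": file_path, "line_number": number, "bug_type": "long_line"}
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# Hypothetical registry mapping file extensions to analyzer callables.
ANALYZERS = {".py": flag_long_lines, ".js": flag_long_lines, ".go": flag_long_lines}


def analyze_repository(repo_dir, analyzers=ANALYZERS):
    """Walk a cloned repository and run the matching analyzer on each file."""
    summary = {"total_files": 0, "analyzed_files": 0, "total_bugs": 0}
    bugs = []
    for root, dirs, files in os.walk(repo_dir):
        dirs[:] = [d for d in dirs if d != ".git"]  # skip Git metadata
        for name in files:
            summary["total_files"] += 1
            analyzer = analyzers.get(os.path.splitext(name)[1].lower())
            if analyzer is None:
                continue  # no analyzer registered for this file type
            path = os.path.join(root, name)
            # Files are read one at a time to keep memory usage bounded.
            with open(path, encoding="utf-8", errors="ignore") as handle:
                bugs.extend(analyzer(path, handle.read()))
            summary["analyzed_files"] += 1
    summary["total_bugs"] = len(bugs)
    return summary, bugs
```

In the application itself the findings would be written to the database as Bug rows rather than returned in a list.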
The application generates detailed reports for each identified bug in JSON format. These reports include:
{
"general_info": {
"vulnerability_title": "Bug type in filename",
"description": "Detailed description of the bug",
"target": {
"repository": "Repository name",
"repository_url": "GitHub URL",
"file_path": "Path to affected file",
"line_number": 42,
"language": "Programming language"
},
"vulnerability_category": "Bug type",
"timestamp": "ISO timestamp"
},
"severity": {
"level": "critical|high|medium|low|info",
"score": "CVSS-like score range",
"vector": "CVSS vector string"
},
"details": {
"description": "Detailed explanation",
"code_snippet": "Problematic code",
"recommendation": "How to fix the issue"
},
"validation": {
"steps": [
"Step 1: Navigate to file",
"Step 2: Go to line number",
"Step 3: Observe the problem"
]
}
}
The application defines the following routes:
- / (index): Home page with repository submission form
- /analyze (POST): Handles repository analysis request
- /results/<scan_id>: Displays analysis results
- /scan/<scan_id>/generate-reports: Generates individual bug reports
- /scan/<scan_id>/reports: Displays the list of generated reports
- /results/: Serves individual report files
- /api/scans: JSON API for scan listing
- /api/scan/<scan_id>/bugs: JSON API for bugs in a scan
- Repository URLs are validated before cloning
- Only GitHub repositories are supported to prevent potential security issues
- Temporary files are cleaned up after analysis
- User sessions are secured with a secret key
- Database connections use connection pooling and reconnection
- Large repositories are analyzed file by file to manage memory usage
- Database queries use pagination for bug listing
- Report generation is done on-demand for individual bug reports
- Images and assets are cached by the browser
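The first two points in this list, validating repository URLs and accepting only GitHub, could be implemented along the following lines. This is a conservative sketch under stated assumptions: the function name `is_valid_github_url` is hypothetical, and requiring `https` and exactly an owner/repository path are choices made here, not rules confirmed by the source.

```python
import re
from urllib.parse import urlparse

# Exactly two path segments (owner/repo), optionally ending in ".git".
_GITHUB_PATH = re.compile(r"^/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+?(?:\.git)?/?$")


def is_valid_github_url(url):
    """Return True only for https://github.com/<owner>/<repo> style URLs."""
    parsed = urlparse(url.strip())
    return (
        parsed.scheme == "https"
        and parsed.netloc.lower() == "github.com"
        and _GITHUB_PATH.match(parsed.path) is not None
    )
```

Comparing the parsed host exactly, instead of searching the raw string for "github.com", rejects look-alike hosts and non-HTTP schemes such as `file://` before anything is passed to Git.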
The application can be deployed using various methods:
- Clone the repository
- Install dependencies:
  pip install -r requirements.txt
- Set environment variables
- Run with gunicorn:
  gunicorn --bind 0.0.0.0:5000 main:app

To run with Docker instead:
- Build the Docker image:
  docker build -t codebug-analyzer .
- Run the container:
  docker run -p 5000:5000 -e DATABASE_URL=<your-db-url> codebug-analyzer
The application is compatible with platforms like Heroku:
# Procfile
web: gunicorn main:app
To add support for a new language:
- Create a new analyzer in the analyzers directory (e.g., rust_analyzer.py)
- Implement the language-specific bug detection logic
- Update services/language_detector.py to recognize the new language
- Integrate the new analyzer in services/analyzer.py

To add new bug detection rules to an existing analyzer:
- Identify the appropriate analyzer for the language
- Add new detection logic as functions in the analyzer
- Update the analyzer's main function to call the new detection logic
- Add test cases to validate the new detection rules
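A detection rule written as a standalone function might look like the sketch below, modelled on the check_for_strict_equality function listed for the JavaScript analyzer. The signature, the regular expression, and the shape of the findings are assumptions. Because it matches line by line, it will also flag `==` inside string literals.

```python
import re

# Matches == or != that is not part of === or !==.
_LOOSE_EQUALITY = re.compile(r"(?<![=!])(==|!=)(?!=)")


def check_for_strict_equality(source):
    """Flag lines that use loose equality operators in JavaScript source."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith("//"):
            continue  # skip single-line comments
        match = _LOOSE_EQUALITY.search(line)
        if match:
            operator = match.group(1)
            findings.append({
                "line_number": number,
                "bug_type": "loose_equality",  # assumed label
                "recommendation": f"Use {operator}= instead of {operator}",
            })
    return findings
```

The analyzer's main function would then call this rule and merge its findings with those of the other checks, and a test case would pin down both the flagged and the ignored lines.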
| Variable | Description | Default |
|---|---|---|
| DATABASE_URL | Database connection URL | sqlite:///code_analyzer.db |
| SESSION_SECRET | Secret key for Flask sessions | Random value |
| DEBUG | Enable/disable debug mode | True |
| REPO_TEMP_DIR | Directory for temporary repository clones | temp_repos/ |
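
Reading these variables with the defaults from the table could be done as in this sketch. The function name `load_config` and the set of strings treated as true for DEBUG are assumptions; the variable names and default values are taken from the table.

```python
import os
import secrets


def load_config(environ=os.environ):
    """Build a settings dictionary from environment variables."""
    return {
        "DATABASE_URL": environ.get("DATABASE_URL", "sqlite:///code_analyzer.db"),
        # Falls back to a random value when no secret is provided.
        "SESSION_SECRET": environ.get("SESSION_SECRET") or secrets.token_hex(32),
        "DEBUG": environ.get("DEBUG", "True").strip().lower() in ("1", "true", "yes"),
        "REPO_TEMP_DIR": environ.get("REPO_TEMP_DIR", "temp_repos/"),
    }
```

Note that a randomly generated secret changes on every restart, which invalidates existing sessions, so production deployments would normally set SESSION_SECRET explicitly and set DEBUG to a false value.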
- Python 3.11+
- Flask
- Flask-SQLAlchemy
- GitPython
- gunicorn (for production)
- psycopg2-binary (for PostgreSQL support)
- Repository cloning fails
  - Ensure the GitHub repository is public
  - Check internet connectivity
  - Verify Git is installed and in the PATH
- Analysis takes too long
  - Large repositories may require more time
  - Consider limiting analysis to specific directories
  - Ensure sufficient system resources
- Database errors
  - Check DATABASE_URL environment variable
  - Ensure database server is running
  - Verify database user has proper permissions
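
Some of these checks can be automated. For instance, the "Git is installed and in the PATH" check can be run from Python before cloning is attempted. The helper name `find_executable` below is hypothetical and not part of the application.

```python
import shutil


def find_executable(name="git"):
    """Return the full path of an executable on the PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable("git")
    print(f"git found at {path}" if path else "git is not on the PATH")
```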