CodeBug Analyzer - Technical Documentation

Architecture Overview

CodeBug Analyzer is built with a modular architecture that separates concerns and allows for extensibility. The application follows the MVC (Model-View-Controller) pattern with Flask as the web framework.

High-Level Components

┌─────────────────┐         ┌───────────────┐         ┌─────────────────┐
│                 │         │               │         │                 │
│  Web Interface  ├────────►│  Controllers  ├────────►│     Models      │
│                 │         │               │         │                 │
└────────┬────────┘         └───────┬───────┘         └────────┬────────┘
         │                          │                          │
         │                          ▼                          │
         │                  ┌───────────────┐                  │
         │                  │               │                  │
         └─────────────────►│   Services    │◄─────────────────┘
                            │               │
                            └───────┬───────┘
                                    │
                                    ▼
                             ┌──────────────┐
                             │              │
                             │  Analyzers   │
                             │              │
                             └──────────────┘

Core Components

1. Models

The database structure is defined using SQLAlchemy models:

Repository: Stores information about analyzed repositories
- Fields: id, url, name, last_analyzed, status
- Relationships: scans (one-to-many)
Scan: Represents an analysis session of a repository
- Fields: id, repository_id, timestamp, total_files, analyzed_files, total_bugs
- Relationships: bugs (one-to-many), language_stats (one-to-many)
Bug: Stores details about identified bugs
- Fields: id, scan_id, file_path, line_number, bug_type, severity, description, code_snippet, recommendation, language
LanguageStats: Tracks statistics about language usage
- Fields: id, scan_id, language, file_count, line_count, bug_count

2. Services

Services handle the business logic of the application:

repository.py: Manages repository operations (cloning, cleaning up)
- Functions: clone_repository, cleanup_repository, get_repository_name, list_files
analyzer.py: Coordinates the analysis process
- Functions: analyze_repository
language_detector.py: Identifies programming languages
- Functions: detect_language, count_lines, analyze_language_stats
report_generator.py: Creates summary reports
- Functions: generate_report
individual_report_generator.py: Generates detailed reports for individual bugs
- Functions: generate_individual_bug_reports, generate_bug_report, ensure_repo_directory
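
As an illustration of what language_detector.py might do, here is a minimal sketch of extension-based detection and line counting. The function names come from the list above, but the signatures and the extension table are assumptions, not the project's actual code.

```python
from pathlib import Path
from typing import Optional

# Assumed extension table; the real detector may recognize more languages.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".js": "JavaScript",
    ".ts": "TypeScript",
    ".go": "Go",
}


def detect_language(file_path: str) -> Optional[str]:
    """Return a language name based on the file extension, or None if unknown."""
    return EXTENSION_TO_LANGUAGE.get(Path(file_path).suffix.lower())


def count_lines(text: str) -> int:
    """Count the lines in a file's contents."""
    return len(text.splitlines())
```

Files with an unrecognized extension return None, so a caller can skip them or hand them only to the common analyzer.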

3. Analyzers

Language-specific analyzers implement bug detection logic:

common_analyzer.py: Checks common issues across all languages
- Functions: analyze_common_issues, check_file_size, get_code_snippet
python_analyzer.py: Python-specific bug detection
- Classes: PythonAstVisitor (extends ast.NodeVisitor)
- Functions: analyze_python_file
javascript_analyzer.py: JavaScript/TypeScript bug detection
- Functions: analyze_javascript_file, check_for_strict_equality
go_analyzer.py: Go language bug detection
- Functions: analyze_go_file
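
The Python analyzer is described as an ast.NodeVisitor subclass. The sketch below shows how one such check could look: it flags bare except clauses. The class name, the bug_type value, and the severity are hypothetical; only the field names are taken from the Bug model.

```python
import ast
from typing import Dict, List


class BareExceptVisitor(ast.NodeVisitor):
    """Collects one bug record per bare `except:` clause."""

    def __init__(self) -> None:
        self.bugs: List[Dict[str, object]] = []

    def visit_ExceptHandler(self, node: ast.ExceptHandler) -> None:
        # A handler with no exception type catches everything,
        # including KeyboardInterrupt and SystemExit.
        if node.type is None:
            self.bugs.append({
                "line_number": node.lineno,
                "bug_type": "bare_except",  # hypothetical identifier
                "severity": "medium",       # hypothetical severity
                "description": "Bare except clause hides unexpected errors",
            })
        # Keep walking so nested try blocks are checked as well.
        self.generic_visit(node)


def find_bare_excepts(source: str) -> List[Dict[str, object]]:
    """Parse Python source and return the bugs found by the visitor."""
    visitor = BareExceptVisitor()
    visitor.visit(ast.parse(source))
    return visitor.bugs
```

Working on the syntax tree rather than raw text means that the word "except" inside a string or comment is not reported.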

Database Schema

┌────────────────┐       ┌────────────────┐       ┌────────────────┐
│  Repository    │       │     Scan       │       │      Bug       │
├────────────────┤       ├────────────────┤       ├────────────────┤
│ id             │       │ id             │       │ id             │
│ url            │       │ repository_id  ├───┐   │ scan_id        ├───┐
│ name           │       │ timestamp      │   │   │ file_path      │   │
│ last_analyzed  │◄──────┤ total_files    │   │   │ line_number    │   │
│ status         │       │ analyzed_files │   │   │ bug_type       │   │
└────────────────┘       │ total_bugs     │   │   │ severity       │   │
                         └────────────────┘   │   │ description    │   │
                                              │   │ code_snippet   │   │
                                              │   │ recommendation │   │
                                              │   │ language       │   │
                                              │   └────────────────┘   │
                                              │                        │
                                              │   ┌────────────────┐   │
                                              │   │  LanguageStats │   │
                                              │   ├────────────────┤   │
                                              │   │ id             │   │
                                              └───┤ scan_id        │   │
                                                  │ language       │   │
                                                  │ file_count     │   │
                                                  │ line_count     │   │
                                                  │ bug_count      │◄──┘
                                                  └────────────────┘
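
To make the relationships concrete, the same tables can be written as SQLite DDL. This is a sketch only: the project defines its schema through SQLAlchemy models, and the table names, column types, and constraints below are assumptions derived from the field lists.

```python
import sqlite3

# Assumed DDL; column types and NOT NULL constraints are illustrative.
SCHEMA = """
CREATE TABLE repository (
    id            INTEGER PRIMARY KEY,
    url           TEXT NOT NULL,
    name          TEXT,
    last_analyzed TEXT,
    status        TEXT
);
CREATE TABLE scan (
    id             INTEGER PRIMARY KEY,
    repository_id  INTEGER NOT NULL REFERENCES repository(id),
    timestamp      TEXT,
    total_files    INTEGER,
    analyzed_files INTEGER,
    total_bugs     INTEGER
);
CREATE TABLE bug (
    id             INTEGER PRIMARY KEY,
    scan_id        INTEGER NOT NULL REFERENCES scan(id),
    file_path      TEXT,
    line_number    INTEGER,
    bug_type       TEXT,
    severity       TEXT,
    description    TEXT,
    code_snippet   TEXT,
    recommendation TEXT,
    language       TEXT
);
CREATE TABLE language_stats (
    id         INTEGER PRIMARY KEY,
    scan_id    INTEGER NOT NULL REFERENCES scan(id),
    language   TEXT,
    file_count INTEGER,
    line_count INTEGER,
    bug_count  INTEGER
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all four tables on the given connection."""
    conn.executescript(SCHEMA)
```

Both bug and language_stats reference scan through scan_id, and scan references repository through repository_id, which matches the one-to-many relationships listed under Models.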

Flow of Operation

User Submits Repository URL
- Form submission handled by routes.py (analyze function)
- URL validation and repository creation in database
Repository Cloning
- services/repository.py clones the GitHub repository
- Clone is saved to a temporary directory
Code Analysis
- services/analyzer.py coordinates the analysis process
- Language detection for each file
- Language-specific analyzers process files
- Bugs are identified and stored in the database
Report Generation
- Summary report is generated with statistics
- Charts and visualizations are created for the web interface
- Individual bug reports can be generated on demand
Results Display
- Web interface shows comprehensive report
- Users can browse bugs and filter them by severity
- Report includes charts for visualization
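
The clone, analyze, and clean-up steps above can be sketched as a single function. This is a simplified outline under stated assumptions: run_scan and iter_files are hypothetical names, and cloning and per-file analysis are passed in as callables so the sketch does not depend on Git or on the real analyzers.

```python
import os
import tempfile
from typing import Callable, Dict, Iterator, List


def iter_files(root: str) -> Iterator[str]:
    """Yield file paths under root one at a time, skipping the .git directory."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into .git.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            yield os.path.join(dirpath, name)


def run_scan(
    url: str,
    clone: Callable[[str, str], None],
    analyze_file: Callable[[str], List[Dict]],
) -> List[Dict]:
    """Clone into a temporary directory, analyze each file, then clean up."""
    bugs: List[Dict] = []
    # The directory is removed when the block exits, even if analysis raises.
    with tempfile.TemporaryDirectory(prefix="repo_") as workdir:
        clone(url, workdir)
        for path in iter_files(workdir):
            bugs.extend(analyze_file(path))
    return bugs
```

In the real application the collected bugs would be written to the database rather than returned as a list.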

Individual Bug Reports

The application generates detailed reports for each identified bug in JSON format. These reports include:

{
  "general_info": {
    "vulnerability_title": "Bug type in filename",
    "description": "Detailed description of the bug",
    "target": {
      "repository": "Repository name",
      "repository_url": "GitHub URL",
      "file_path": "Path to affected file",
      "line_number": 42,
      "language": "Programming language"
    },
    "vulnerability_category": "Bug type",
    "timestamp": "ISO timestamp"
  },
  "severity": {
    "level": "critical|high|medium|low|info",
    "score": "CVSS-like score range",
    "vector": "CVSS vector string"
  },
  "details": {
    "description": "Detailed explanation",
    "code_snippet": "Problematic code",
    "recommendation": "How to fix the issue"
  },
  "validation": {
    "steps": [
      "Step 1: Navigate to file",
      "Step 2: Go to line number",
      "Step 3: Observe the problem"
    ]
  }
}
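
A report with this shape could be assembled from a bug record roughly as follows. build_bug_report and serialize_report are hypothetical helpers, and the wording of the validation steps is an assumption. The severity score and vector are left as None because this document does not specify how they are derived.

```python
import json
import posixpath
from datetime import datetime, timezone
from typing import Any, Dict


def build_bug_report(bug: Dict[str, Any], repo_name: str, repo_url: str) -> Dict[str, Any]:
    """Map a bug record (keys follow the Bug model) onto the report layout."""
    file_name = posixpath.basename(bug["file_path"])
    return {
        "general_info": {
            "vulnerability_title": f"{bug['bug_type']} in {file_name}",
            "description": bug["description"],
            "target": {
                "repository": repo_name,
                "repository_url": repo_url,
                "file_path": bug["file_path"],
                "line_number": bug["line_number"],
                "language": bug["language"],
            },
            "vulnerability_category": bug["bug_type"],
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
        "severity": {
            "level": bug["severity"],
            "score": None,   # scoring rule not specified in this document
            "vector": None,  # vector format not specified in this document
        },
        "details": {
            "description": bug["description"],
            "code_snippet": bug["code_snippet"],
            "recommendation": bug["recommendation"],
        },
        "validation": {
            "steps": [
                f"Step 1: Navigate to {bug['file_path']}",
                f"Step 2: Go to line {bug['line_number']}",
                "Step 3: Observe the problem",
            ]
        },
    }


def serialize_report(report: Dict[str, Any]) -> str:
    """Render the report as indented JSON text."""
    return json.dumps(report, indent=2)
```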

Routes

The application defines the following routes:

/ (index): Home page with repository submission form
/analyze (POST): Handles repository analysis request
/results/<scan_id>: Displays analysis results
/scan/<scan_id>/generate-reports: Generates individual bug reports
/scan/<scan_id>/reports: Displays the list of generated reports
/results/: Serves individual report files
/api/scans: JSON API for scan listing
/api/scan/<scan_id>/bugs: JSON API for bugs in a scan

Security Considerations

Repository URLs are validated before cloning
Only GitHub repositories are supported, which limits cloning to a single known host
Temporary files are cleaned up after analysis
User sessions are secured with a secret key
Database connections use connection pooling with automatic reconnection
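
As an example of the first two points, URL validation might look like the sketch below. The function name and the exact rules (HTTPS only, host exactly github.com, an owner/repository path) are assumptions; the application's own checks may differ.

```python
import re
from urllib.parse import urlparse

# Path must be /owner/repo, optionally ending in .git or a trailing slash.
_GITHUB_PATH = re.compile(r"^/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+?(?:\.git)?/?$")


def is_valid_github_url(url: str) -> bool:
    """Accept only https://github.com/<owner>/<repo> style URLs."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse can raise on malformed input such as a broken IPv6 host.
        return False
    return (
        parsed.scheme == "https"
        # Exact match rejects lookalike hosts and embedded credentials.
        and parsed.netloc.lower() == "github.com"
        and _GITHUB_PATH.match(parsed.path) is not None
    )
```

Comparing the whole network location, rather than checking whether the URL merely contains "github.com", rejects hosts such as github.com.evil.example.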

Performance Optimization

Large repositories are analyzed file by file to manage memory usage
Database queries use pagination for bug listing
Report generation is done on-demand for individual bug reports
Images and assets are cached by the browser
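
Paginated bug listing can be illustrated with a LIMIT/OFFSET query. The sketch uses the standard sqlite3 module instead of SQLAlchemy so that it is self-contained; fetch_bug_page, the bug table name, and the default page size of 20 are assumptions.

```python
import sqlite3
from typing import List, Tuple


def fetch_bug_page(
    conn: sqlite3.Connection, scan_id: int, page: int, per_page: int = 20
) -> List[Tuple]:
    """Return one page of bugs for a scan; pages are numbered from 1."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    offset = (page - 1) * per_page
    cursor = conn.execute(
        # A stable ORDER BY keeps rows from shifting between pages.
        "SELECT id, file_path, severity FROM bug "
        "WHERE scan_id = ? ORDER BY id LIMIT ? OFFSET ?",
        (scan_id, per_page, offset),
    )
    return cursor.fetchall()
```

Only the requested rows are loaded, so memory use stays bounded even when a scan has thousands of bugs.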

Deployment

The application can be deployed using various methods:

Option 1: Traditional Deployment

Clone the repository
Install dependencies: pip install -r requirements.txt
Set environment variables
Run with gunicorn: gunicorn --bind 0.0.0.0:5000 main:app

Option 2: Docker Deployment

Build the Docker image: docker build -t codebug-analyzer .
Run the container: docker run -p 5000:5000 -e DATABASE_URL=<your-db-url> codebug-analyzer

Option 3: Platform as a Service (PaaS)

The application is compatible with platforms like Heroku:

# Procfile
web: gunicorn main:app

Extending the Application

Adding Support for New Languages

Create a new analyzer in the analyzers directory (e.g., rust_analyzer.py)
Implement the language-specific bug detection logic
Update services/language_detector.py to recognize the new language
Integrate the new analyzer in services/analyzer.py
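
One way to carry out these steps is a small registry that maps a language name to its analyzer function. The sketch below is hypothetical throughout: the registry, the decorator, the (file_path, source) signature, and the Rust rule are illustrations, and services/analyzer.py may dispatch to analyzers differently.

```python
from typing import Callable, Dict, List

Analyzer = Callable[[str, str], List[dict]]

# Hypothetical registry; language names match what the detector returns.
ANALYZERS: Dict[str, Analyzer] = {}


def register_analyzer(language: str) -> Callable[[Analyzer], Analyzer]:
    """Decorator that registers a function as the analyzer for a language."""
    def decorator(func: Analyzer) -> Analyzer:
        ANALYZERS[language] = func
        return func
    return decorator


@register_analyzer("Rust")
def analyze_rust_file(file_path: str, source: str) -> List[dict]:
    """Illustrative rule: flag .unwrap() calls, which panic on failure."""
    bugs = []
    for number, line in enumerate(source.splitlines(), start=1):
        if ".unwrap()" in line:
            bugs.append({
                "file_path": file_path,
                "line_number": number,
                "bug_type": "unwrap_call",  # hypothetical identifier
                "language": "Rust",
            })
    return bugs


def analyze_file(language: str, file_path: str, source: str) -> List[dict]:
    """Dispatch to the registered analyzer; unknown languages yield no bugs."""
    analyzer = ANALYZERS.get(language)
    return analyzer(file_path, source) if analyzer else []
```

With a registry like this, adding a language touches only the new analyzer module and the detector, not the dispatch code.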

Adding New Bug Detection Rules

Identify the appropriate analyzer for the language
Add new detection logic as functions in the analyzer
Update the analyzer's main function to call the new detection logic
Add test cases to validate the new detection rules
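
For instance, a rule that flags loose equality in JavaScript could be written as a standalone function and then called from the analyzer's main function. The name find_loose_equality and the bug fields are assumptions, and the check is line-based, so it will also match operators inside strings or comments.

```python
import re
from typing import Dict, List

# Matches == or != that is not part of === or !==.
_LOOSE_EQUALITY = re.compile(r"(?<![=!])(==|!=)(?!=)")


def find_loose_equality(source: str) -> List[Dict[str, object]]:
    """Return one bug record per line that uses == or != in JavaScript source."""
    bugs = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = _LOOSE_EQUALITY.search(line)
        if match:
            operator = match.group(1)
            bugs.append({
                "line_number": number,
                "bug_type": "loose_equality",  # hypothetical identifier
                "code_snippet": line.strip(),
                "recommendation": f"Use {operator}= instead of {operator}",
            })
    return bugs
```

A matching test case would feed the function a few lines with both loose and strict operators and assert that only the loose ones are reported.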

Environment Variables

Variable        Description                                 Default
DATABASE_URL    Database connection URL                     sqlite:///code_analyzer.db
SESSION_SECRET  Secret key for Flask sessions               Random value
DEBUG           Enable/disable debug mode                   True
REPO_TEMP_DIR   Directory for temporary repository clones   temp_repos/
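
These variables and defaults could be read at startup along the following lines. load_config is a hypothetical helper, and the set of strings treated as true for DEBUG is an assumption.

```python
import os
import secrets
from typing import Any, Dict, Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "DATABASE_URL": env.get("DATABASE_URL", "sqlite:///code_analyzer.db"),
        # A random secret changes on every restart, which invalidates sessions.
        "SESSION_SECRET": env.get("SESSION_SECRET") or secrets.token_hex(32),
        "DEBUG": env.get("DEBUG", "True").strip().lower() in _TRUE_VALUES,
        "REPO_TEMP_DIR": env.get("REPO_TEMP_DIR", "temp_repos/"),
    }
```

Because DEBUG defaults to True, a production deployment should set DEBUG=false and provide a fixed SESSION_SECRET explicitly.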

Requirements

Python 3.11+
Flask
Flask-SQLAlchemy
GitPython
gunicorn (for production)
psycopg2-binary (for PostgreSQL support)

Troubleshooting

Common Issues

Repository cloning fails
- Ensure the GitHub repository is public
- Check internet connectivity
- Verify Git is installed and in the PATH
Analysis takes too long
- Large repositories may require more time
- Consider limiting analysis to specific directories
- Ensure sufficient system resources
Database errors
- Check DATABASE_URL environment variable
- Ensure database server is running
- Verify database user has proper permissions
