
shvenkat-rh/AI-Issue-Triage


AI Issue Triage

CI Python 3.11+ Code style: black

An AI-powered issue analysis tool that uses Google's Gemini AI to perform comprehensive analysis of software issues based on your codebase content.


🚀 Quick Start

Want to set up automated issue analysis in your GitHub repo?

👉 See the QUICKSTART Guide for step-by-step instructions to:

  • Set up automated GitHub Actions workflows
  • Configure AI-powered issue analysis
  • Analyze issues against your codebase automatically
  • Get started in under 10 minutes

Looking to use this as a CLI tool or library? Continue reading below.


Features

  • AI-Powered Analysis: Uses Google Gemini 2.0 Flash for intelligent issue analysis with the latest Google Gen AI SDK
  • Two-Pass Architecture: Librarian identifies relevant files, Surgeon performs deep analysis on targeted context
  • Root Cause Analysis: Identifies primary causes and contributing factors
  • Solution Generation: Proposes specific code changes with rationale
  • Issue Triage: Automatically classifies issues as bugs, enhancements, or feature requests
  • Severity Assessment: Rates issues from low to critical priority
  • Code Location Mapping: Identifies relevant files, functions, and classes
  • Export Capabilities: Export analysis results in JSON format
  • Enhanced Performance: Faster analysis with the latest Gemini 2.0 Flash model
  • Smart Retry Mechanism: Automatically retries analysis if low-quality responses are detected
  • Security Protection: Built-in prompt injection detection to protect against malicious inputs
  • Duplicate Detection: Automatically identifies and flags duplicate issues
  • GitHub Actions Integration: Multiple workflow options including label-based filtering for selective analysis
  • PR Review & Analysis: AI-powered pull request review with code quality feedback and suggestions

Setup

1. Install the Package

The easiest way to get started is to install the package:

# Clone the repository
git clone https://github.com/shvenkat-rh/AI-Issue-Triage.git
cd AI-Issue-Triage

# Install in development mode (editable)
pip install -e .

This will:

  • Install all dependencies
  • Make the utils package importable
  • Install CLI commands: ai-triage, ai-triage-duplicate, ai-triage-cosine, ai-triage-pr

Alternative (installs dependencies only; the ai-triage* commands are not installed, so use the python -m forms shown below):

pip install -r requirements.txt

2. Get Gemini API Key

⚠️ Important Notes:

  • Red Hat employees: Do NOT follow these steps. Please refer to the RH Internal Guidelines for generating your API keys.
  • Already have a GCP/Gemini API key? You can skip this section and use your existing key.
  1. Visit Google AI Studio
  2. Create a new API key
  3. Copy env_example.txt to .env and add your API key:
cp env_example.txt .env
# Edit .env and add your API key
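To confirm the key is visible before running anything, you can use the stdlib-only sketch below. It is not part of the package. It assumes the variable is named GEMINI_API_KEY (the name the CLI's --api-key option falls back to) and that .env holds plain KEY=VALUE lines. In this sketch, an exported environment variable takes precedence over the file.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. GEMINI_API_KEY="abc"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(path: str = ".env") -> str:
    """Prefer an exported GEMINI_API_KEY, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set in the environment or in .env")
    return key
```

Calling get_gemini_api_key() from the project directory either returns the key or raises an error that names the missing variable.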

Note: The application uses the Google Gen AI SDK and defaults to the gemini-2.0-flash-001 model. Pass --model to use a different one.

3. Prepare Your Codebase

By default, the analyzer looks for a repomix-output.txt file in the current working directory. This file should contain your codebase content, typically generated by Repomix.

To generate this file:

# Install repomix
npm install -g repomix

# Generate codebase file in your project directory
repomix --output repomix-output.txt

Alternative: You can use any text file containing your codebase content and specify its path using the --source-path option (see CLI usage below).
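If Repomix is not an option, you can approximate its output by concatenating the source files yourself. The sketch below is a hypothetical helper, not part of this project. The suffix and directory filters are placeholders to adjust for your project. The ===== File: ... ===== separator is an arbitrary choice and does not reproduce Repomix's output format. The helper also skips files that are not valid UTF-8.

```python
from pathlib import Path

# Hypothetical defaults: adjust to your project's languages and layout.
INCLUDE_SUFFIXES = {".py", ".md", ".toml", ".yml", ".yaml", ".json", ".txt"}
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", "dist", "build"}


def build_codebase_file(root: str, output: str = "repomix-output.txt") -> int:
    """Concatenate text source files under `root` into one file; return the file count."""
    root_path = Path(root)
    output_path = Path(output).resolve()
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in sorted(root_path.rglob("*")):
            if any(part in SKIP_DIRS for part in path.relative_to(root_path).parts):
                continue
            if not path.is_file() or path.suffix not in INCLUDE_SUFFIXES:
                continue
            if path.resolve() == output_path:
                continue  # never include the output file in itself
            try:
                text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue  # skip files that are not valid UTF-8 text
            out.write(f"===== File: {path.relative_to(root_path).as_posix()} =====\n")
            out.write(text if text.endswith("\n") else text + "\n")
            out.write("\n")
            count += 1
    return count
```

For example, build_codebase_file("path/to/project", "my-codebase.txt") produces a file you can pass with --source-path my-codebase.txt.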

Usage

The AI Issue Triage system can be used in multiple ways:

1. GitHub Actions Workflow (Automated) - ⚡ RECOMMENDED

📖 For complete setup instructions, see the QUICKSTART Guide

Quick Overview

The most powerful way to use this system is through automated GitHub Actions workflows. The system provides four workflow options:

Automatic Workflows (All Issues)

  • Single Issue Analysis: Analyzes each new issue as it's created
  • Bulk Issue Analysis: Re-analyzes all open issues when code changes (with smart duplicate detection)

Label-Based Workflows (Selective Analysis)

  • Labeled Issue Analysis: Only analyzes issues with the "Gemini Analyze" label
  • Bulk Labeled Analysis: Re-analyzes only labeled issues on PR merge

Common Features

  • Formatted Comments: GitHub-flavored Markdown with emojis and collapsible sections
  • Security Checks: Built-in prompt injection detection
  • Duplicate Detection: Identifies similar/duplicate issues automatically

Quick Setup

For detailed step-by-step instructions, see the QUICKSTART Guide.

TL;DR:

  1. Copy workflows from cutlery/workflows/ to .github/workflows/ in your repo
  2. Add GEMINI_API_KEY secret in GitHub repository settings
  3. Create triage.config.json (example in cutlery/triage.config.json)
  4. (Optional) Create "Gemini Analyze" label for selective analysis
  5. Push changes and create a test issue

How It Works

Single Issue Analysis - When a new issue is opened:

  1. Security Check: Scans for prompt injection attempts to protect the AI system
  2. Duplicate Detection: Compares against existing issues to identify duplicates
  3. AI Analysis: Performs comprehensive issue analysis using your codebase
  4. Auto-Labeling: Adds appropriate labels based on issue type and severity
  5. Comment Generation: Posts detailed analysis results as issue comments

Bulk Issue Analysis - When a PR is merged to main:

  1. Fetches all open issues (sorted oldest → newest for canonical duplicate handling)
  2. For each issue in sequential order:
    • Prompt Injection Check: Scans and posts security report comment
    • Duplicate Detection: Compares against previously analyzed issues in this run
      • If duplicate: adds label, posts duplicate comment with confidence score, skips AI analysis
    • AI Analysis: Re-analyzes with updated codebase (only if not duplicate and safe)
      • Posts "Updated AI Analysis" comment with fresh insights
  3. Automatically labels and comments on all issues

Note: Bulk analysis is efficient - duplicates and high-risk issues skip expensive AI analysis, saving API calls and time.
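The sketch below illustrates the ordering rule described above. It shows the control flow only and is not the workflow's code. similarity() is a hypothetical stand-in for the real duplicate detector, based on difflib. is_safe is a prompt-injection check that the caller supplies. The 0.8 threshold is an arbitrary placeholder. Only analyzed issues are kept as canonical candidates, so each newer duplicate points back to an older issue that was itself analyzed.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional


@dataclass
class Issue:
    number: int
    title: str
    body: str
    created_at: str  # ISO 8601 UTC timestamps sort chronologically as strings


def similarity(a: Issue, b: Issue) -> float:
    """Hypothetical stand-in for the real duplicate detector (0.0 to 1.0)."""
    return SequenceMatcher(None, f"{a.title} {a.body}", f"{b.title} {b.body}").ratio()


def plan_bulk_run(
    issues: List[Issue],
    is_safe: Callable[[Issue], bool],
    threshold: float = 0.8,
) -> List[Dict[str, object]]:
    """Walk issues oldest -> newest and decide: duplicate, security-alert, or analyze."""
    analyzed: List[Issue] = []  # canonical candidates seen earlier in this run
    plan: List[Dict[str, object]] = []
    for issue in sorted(issues, key=lambda i: i.created_at):
        safe = is_safe(issue)  # the security report is produced for every issue
        canonical: Optional[Issue] = next(
            (earlier for earlier in analyzed if similarity(issue, earlier) >= threshold), None
        )
        if canonical is not None:
            plan.append({"issue": issue.number, "action": "duplicate", "of": canonical.number})
        elif not safe:
            plan.append({"issue": issue.number, "action": "security-alert"})
        else:
            plan.append({"issue": issue.number, "action": "analyze"})
            analyzed.append(issue)  # only analyzed issues become canonical references
    return plan
```

Because the list is sorted before the loop, the order in which the input issues arrive does not matter.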

Workflow Features

  • Security Protection: Automatically detects and flags malicious prompt injection attempts
  • Smart Duplicate Detection:
    • Single issue workflow: checks against all existing open issues
    • Bulk analysis workflow: processes oldest → newest, comparing each issue against the issues already analyzed earlier in the same run
    • Older issues become "canonical" references for newer duplicates
    • Duplicates are marked and skipped to save API calls
  • Smart Labeling: Adds labels like type:bug, severity:high, gemini-analyzed, duplicate, security-alert
  • Comprehensive Comments: Posts three types of comments:
    • Prompt injection security reports (all issues)
    • Duplicate detection results (when duplicates found)
    • Updated AI analysis (non-duplicate, safe issues)
  • Artifact Storage: Saves analysis results and debug logs for review
  • Fast Processing: Uses latest Gemini 2.0 Flash model for quick analysis
  • Flexible Filtering: Choose between automatic (all issues) or label-based (selective) workflows

Available Workflows

The system provides four workflow files in cutlery/workflows/:

Workflow file                        Trigger                Filter                   Use case
gemini-issue-analysis.yml            Issue opened           None                     Analyze all new issues automatically
gemini-labeled-issue-analysis.yml    Issue opened/labeled   "Gemini Analyze" label   Selective analysis: only labeled issues
ai-bulk-issue-analysis.yml           PR merged to main      None                     Re-analyze all open issues
ai-bulk-labeled-issue-analysis.yml   PR merged to main      "Gemini Analyze" label   Re-analyze only labeled issues

When to Use Label-Based Workflows:

  • 🎯 Cost Control: Reduce API usage by analyzing only selected issues
  • 🔍 Manual Triage: Team decides which issues need AI analysis
  • Complex Issues: Focus AI resources on difficult problems
  • 📊 Gradual Rollout: Test AI analysis on select issues first

Workflow Configuration Example:

# Automatic: Analyzes all issues
name: AI Issue Analysis
on:
  issues:
    types: [opened]

# Label-Based: Only analyzes issues with "Gemini Analyze" label
name: AI Labeled Issue Analysis
on:
  issues:
    types: [labeled, opened]
jobs:
  analyze-issue:
    if: contains(github.event.issue.labels.*.name, 'Gemini Analyze')
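The label-based condition has one caveat. On a labeled event, contains(github.event.issue.labels.*.name, 'Gemini Analyze') is also true when some other label is added to an issue that already carries "Gemini Analyze", so the analysis runs again. The Python sketch below shows a stricter rule. It is a hypothetical helper, not one of the shipped workflows. On labeled events it checks the label that was just added (event["label"]["name"]). On opened events it checks the labels the issue already has. It reads the event payload from the file named by GITHUB_EVENT_PATH, which GitHub Actions sets for every run.

```python
import json
import os
from typing import Optional

TRIGGER_LABEL = "Gemini Analyze"


def load_event(path: Optional[str] = None) -> dict:
    """Load the event payload that GitHub Actions writes to GITHUB_EVENT_PATH."""
    with open(path or os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as fh:
        return json.load(fh)


def should_analyze(event: dict, trigger_label: str = TRIGGER_LABEL) -> bool:
    """Decide whether an `issues` event should trigger label-based analysis."""
    action = event.get("action")
    if action == "labeled":
        # React only when the trigger label is the one that was just added.
        return (event.get("label") or {}).get("name") == trigger_label
    if action == "opened":
        # New issues may be created with the label already applied.
        labels = (event.get("issue") or {}).get("labels") or []
        return any(label.get("name") == trigger_label for label in labels)
    return False
```

In a workflow step, should_analyze(load_event()) returns the decision. If you prefer to stay in YAML, you can compare github.event.label.name in the job's if: expression for the labeled case.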

Key Configuration Options:

  • Trigger Events: Modify on.issues.types to include edited, reopened, etc.
  • Label Filter: Change 'Gemini Analyze' to use a different label name
  • Repository Source: Change the AI-Issue-Triage repository reference if using a fork
  • Node.js Version: Adjust Node.js version for repomix compatibility
  • Python Version: Modify Python version based on your requirements
  • Artifact Retention: Adjust how long analysis artifacts are stored

Workflow Artifacts

The workflow generates several artifacts for debugging and audit purposes:

  • prompt_injection_result.json: Security scan results
  • prompt_injection_debug.log: Debug information for security checks
  • duplicate_result.json: Duplicate detection results
  • analysis_result.json: Complete AI analysis in JSON format
  • analysis_result.txt: Human-readable analysis results
  • repomix-output.txt: Generated codebase content

2. Web Interface (Interactive) - 🚧 Work in Progress

⚠️ Note: The Streamlit web interface is currently under development and not recommended for production use.

streamlit run ui/streamlit_app.py

This will open a web interface where you can:

  1. Enter your Gemini API key in the sidebar
  2. Provide issue details:
    • Issue Title
    • Detailed Description
  3. Click "Analyze Issue" to get comprehensive analysis
  4. Review results including:
    • Issue classification and severity
    • Root cause analysis
    • Proposed solutions with code changes
    • Confidence score
  5. Export results as JSON for further use

Status: We're actively working on improving the UI/UX. For production use, please use the GitHub Actions workflow or CLI.


3. Command Line Interface (CLI) - Scripting & Automation

The analyzer provides a powerful command-line interface for automation and scripting.

Quick Start

Using installed commands (after pip install -e .):

# Interactive mode - prompts for title and description
ai-triage

# Direct analysis
ai-triage --title "Login bug" --description "Users can't login on mobile devices"

# Analyze from file
ai-triage --file sample_issue.txt

# Use custom source of truth file
ai-triage --title "Bug" --description "Description" --source-path /path/to/my-codebase.txt

# Use custom prompt template
ai-triage --title "Bug" --description "Description" --custom-prompt /path/to/custom_prompt.txt

# Use a different Gemini model
ai-triage --title "Bug" --description "Description" --model gemini-1.5-pro

# Save output to file
ai-triage --title "Bug" --description "Description" --output analysis.txt

# JSON output for automation
ai-triage --title "Bug" --description "Description" --format json

# Quiet mode (no progress messages)
ai-triage --quiet --title "Bug" --description "Description"

# Configure retry attempts for better quality
ai-triage --title "Bug" --description "Description" --retries 3

Alternative (using Python module):

python -m cli.analyze --title "Bug" --description "Details"

CLI Options

positional arguments:
  none

options:
  -h, --help            show this help message and exit
  --title TITLE, -t TITLE
                        Issue title
  --description DESCRIPTION, -d DESCRIPTION
                        Issue description  
  --file FILE, -f FILE  Read issue from file (title on first line, description below)
  --output OUTPUT, -o OUTPUT
                        Output file (default: stdout)
  --format {text,json}  Output format (default: text)
  --source-path SOURCE_PATH, -s SOURCE_PATH
                        Path to source of truth file (default: repomix-output.txt)
  --custom-prompt CUSTOM_PROMPT
                        Path to custom prompt template file
  --api-key API_KEY     Gemini API key (default: from GEMINI_API_KEY env var)
  --model MODEL         Gemini model name (default: gemini-2.0-flash-001)
  --retries RETRIES     Maximum retry attempts for low quality responses (default: 2)
  --quiet, -q           Suppress progress messages
  --no-clean            Disable data cleaning (preserve raw input)
  --version             show program's version number and exit
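To script the CLI from Python, you can wrap it with subprocess. This sketch uses only the flags listed above. It assumes that with --quiet --format json the command prints nothing but the JSON result on stdout. Check that against your installed version before relying on it. The function names are hypothetical.

```python
import json
import subprocess
from typing import List, Optional


def build_triage_command(
    title: str,
    description: str,
    source_path: Optional[str] = None,
    retries: Optional[int] = None,
) -> List[str]:
    """Build an ai-triage invocation that prints JSON and no progress messages."""
    cmd = ["ai-triage", "--quiet", "--format", "json", "--title", title, "--description", description]
    if source_path is not None:
        cmd += ["--source-path", source_path]
    if retries is not None:
        cmd += ["--retries", str(retries)]
    return cmd


def run_triage(title: str, description: str, **kwargs) -> dict:
    """Run the CLI and parse stdout (assumes stdout holds only the JSON result)."""
    completed = subprocess.run(
        build_triage_command(title, description, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list, with no shell, means quotes and special characters in issue titles need no escaping.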

File Format for --file option

Issue Title Here
Issue description starts here.
Can be multiple lines.
Include all relevant details.
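The layout is simple enough to produce or check from a script. This is a minimal sketch that assumes the first line is the title and everything after it is the description. How ai-triage itself trims whitespace or blank lines is not documented here, so treat the stripping as an assumption.

```python
from pathlib import Path
from typing import Tuple


def read_issue_file(path: str) -> Tuple[str, str]:
    """Split an issue file into (title, description): first line vs. the rest."""
    text = Path(path).read_text(encoding="utf-8")
    title, _, description = text.partition("\n")
    return title.strip(), description.strip()


def write_issue_file(path: str, title: str, description: str) -> None:
    """Write a file in the same layout, e.g. to feed `ai-triage --file`."""
    Path(path).write_text(f"{title.strip()}\n{description.strip()}\n", encoding="utf-8")
```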

Additional Command-Line Tools

The project includes several specialized CLI tools for specific tasks:

1. Duplicate Issue Detection

Detect duplicate issues using AI-powered semantic analysis:

Using installed commands:

# Check if a new issue is duplicate
ai-triage-duplicate --title "Issue title" --description "Issue details" --issues issues.json

# Use a different Gemini model
ai-triage-duplicate --title "Issue title" --description "Details" --issues issues.json --model gemini-1.5-pro

# Batch check multiple issues
ai-triage-duplicate --file new-issues.json --issues existing-issues.json

Alternative (using Python module):

python -m cli.duplicate_check --title "..." --description "..." --issues issues.json

Features:

  • AI-powered semantic similarity detection
  • Compares against existing open issues
  • Provides similarity scores and recommendations
  • Configurable Gemini model selection

Status: ✅ Stable and ready for use


2. Cosine Similarity Duplicate Detection

Alternative duplicate detection using TF-IDF and cosine similarity:

Using installed commands:

ai-triage-cosine --title "Issue title" --description "Details" --issues issues.json

Alternative (using Python module):

python -m cli.cosine_check --title "..." --description "..." --issues issues.json

Features:

  • Fast, no API required
  • Uses scikit-learn TF-IDF vectorization
  • Good for offline/local analysis

Status: 🚧 Experimental - We're still refining the similarity thresholds and accuracy. Use with caution.
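The sketch below implements TF-IDF plus cosine similarity in plain Python, so it runs without scikit-learn. It uses the same smoothed IDF formula as the default scikit-learn TfidfVectorizer (idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization). Its tokenizer is simpler, though, so its scores will not match ai-triage-cosine exactly. The sketch only ranks candidates. It deliberately leaves the duplicate threshold open, in line with the experimental status.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens (simpler than scikit-learn's default tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Raw term counts weighted by smoothed IDF, then L2-normalized."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n = len(docs)
    df = Counter(term for doc_counts in counts for term in doc_counts)  # document frequency
    idf = {term: math.log((1 + n) / (1 + freq)) + 1 for term, freq in df.items()}
    vectors = []
    for doc_counts in counts:
        vec = {term: tf * idf[term] for term, tf in doc_counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def rank_candidates(new_issue: str, existing: List[str]) -> List[Tuple[int, float]]:
    """(index into existing, similarity) pairs, most similar first."""
    vectors = tfidf_vectors([new_issue] + existing)
    scores = [(i, cosine(vectors[0], vec)) for i, vec in enumerate(vectors[1:])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because TF-IDF matches exact tokens, paraphrases with no shared words score zero. The Gemini-based detector does not have this blind spot.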


3. Pull Request Review

AI-powered pull request analysis and code review:

Using installed command (after pip install -e .):

# Review a PR from JSON file
ai-triage-pr --pr-file pr_data.json

# Review with custom configuration
ai-triage-pr --pr-file pr.json --config pr_prompt_config.yml

# Review and save to file
ai-triage-pr --pr-file pr.json --output review.md --format markdown

# Review with repo URL for context-specific prompts
ai-triage-pr --pr-file pr.json --repo-url "https://github.com/user/repo"

# Review with inline data
ai-triage-pr --title "Add feature" --body "Description" --files changes.json

Using as a module:

# Review a PR from JSON file
python -m cli.pr_review --pr-file pr_data.json

# Review with custom configuration
python -m cli.pr_review --pr-file pr.json --config pr_prompt_config.yml

# Review and save to file
python -m cli.pr_review --pr-file pr.json --output review.md --format markdown

# Review with repo URL for context-specific prompts
python -m cli.pr_review --pr-file pr.json --repo-url "https://github.com/user/repo"

# Review with inline data
python -m cli.pr_review --title "Add feature" --body "Description" --files changes.json

PR JSON file format:

{
  "title": "PR title",
  "body": "PR description",
  "repo_url": "https://github.com/user/repo",
  "file_changes": [
    {
      "filename": "path/to/file.py",
      "status": "modified",
      "additions": 10,
      "deletions": 5,
      "patch": "@@ -1,5 +1,10 @@\n..."
    }
  ]
}

Features:

  • Comprehensive code review with AI
  • File-specific comments with line numbers
  • Identifies strengths, issues, and suggestions
  • Configurable prompts for different repo types (Python, AI/ML, etc.)
  • Workflow analysis for GitHub Actions
  • Markdown and JSON output formats

Prompt Configuration:

The PR analyzer uses a YAML configuration file (pr_prompt_config.yml) to customize review behavior based on repository type:

# Repository URL patterns
repo_mappings:
  python:
    - 'github.com/.*/.*-python.*'
  ai_ml:
    - 'github.com/.*/.*AI.*'

# Custom prompts per repo type
prompts:
  python:
    pr_review:
      system_role: 'Python expert code reviewer...'
      review_structure: |
        Focus on:
        - PEP 8 compliance
        - Type hints
        - Docstrings
        ...

Status: ✅ Stable and ready for use


4. Prompt Injection Detection

Security tool to detect malicious prompt injection attempts:

Using as a module:

python -m utils.security.prompt_injection "title" "description"

Using as a library:

from utils.security import PromptInjectionDetector

detector = PromptInjectionDetector()
result = detector.detect("Issue content")
print(f"Risk Level: {result.risk_level}")

Features:

  • Detects prompt injection patterns
  • ML-based detection using pytector
  • Pattern-based heuristics
  • Risk level classification

Status: ✅ Stable - Automatically integrated into GitHub Actions workflows


5. Two-Pass Architecture (Librarian + Surgeon)

For complex issues requiring full code context, use the Two-Pass Architecture that intelligently breaks down the codebase into directory chunks and identifies relevant files before deep analysis:

How It Works:

  1. Directory Chunking: Repository is cloned and divided into per-directory compressed repomix files
  2. Pass 1 - Librarian: Analyzes each directory chunk to identify relevant files (with dependency tracking)
  3. Pass 2 - Surgeon: Creates targeted repomix with only identified files for deep analysis

Pass 1 - Librarian (File Identification):

# Librarian analyzes directory chunks to identify relevant files
python -m cli.librarian \
  --title "Bug in authentication flow" \
  --description "Users cannot login after password reset" \
  --chunks-dir repomix-chunks \
  --output relevant_files.json

Pass 2 - Surgeon (Deep Analysis) - Use existing analyzer with targeted files:

# Surgeon pass uses the standard analyzer with targeted repomix
# (see GitHub Actions workflow for automated integration)
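You can produce the targeted file by hand by passing the Librarian's output to the repomix --include option, which takes a comma-separated list of patterns. The sketch makes two assumptions. First, relevant_files.json has a "relevant_files" key that holds repo-relative path strings, mirroring the result['relevant_files'] shown in the library example. Second, the script runs from the repository root. The automated workflow may build the file differently. The output name targeted-repomix.txt is hypothetical.

```python
import json
import subprocess
from typing import List


def targeted_repomix_command(relevant_files: List[str], output: str = "targeted-repomix.txt") -> List[str]:
    """Build a repomix call that packs only the files the Librarian selected."""
    if not relevant_files:
        raise ValueError("the Librarian returned no relevant files")
    # --include takes comma-separated patterns, so paths containing commas would break this.
    return ["repomix", "--include", ",".join(relevant_files), "--output", output]


def build_targeted_repomix(librarian_json: str = "relevant_files.json", output: str = "targeted-repomix.txt") -> str:
    """Read the Librarian output and run repomix (assumed to be on PATH)."""
    with open(librarian_json, encoding="utf-8") as fh:
        relevant_files = json.load(fh)["relevant_files"]  # assumed: list of path strings
    subprocess.run(targeted_repomix_command(relevant_files, output), check=True)
    return output
```

You can then pass the resulting file to the Surgeon pass with ai-triage --source-path targeted-repomix.txt plus the usual --title and --description.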

Benefits:

  • Scalable: Works with repos of any size by breaking into chunks
  • Token Efficient: Avoids exceeding the model's context window (on the order of 1M tokens) by analyzing directories separately
  • Smart Dependencies: If file A imports file B, both are included
  • Precise Context: Surgeon gets only relevant files, not entire codebase

Automated Workflow: The ai-lib-triage.yml workflow automatically handles:

  • Repository cloning and directory tree generation
  • Per-directory repomix generation with compression
  • Librarian analysis across all chunks
  • Targeted repomix creation with identified files
  • Surgeon analysis with full context of relevant files
  • All security, duplicate detection, and labeling features

End-to-End Flow:

  1. Librarian analyzes compressed codebase skeleton
  2. AI identifies ALL relevant files (no arbitrary limits)
  3. Includes dependency chains (if file A imports B, both included)
  4. Creates targeted repomix with only identified files
  5. Surgeon performs deep analysis with focused context
  6. Results in more accurate analysis with lower token usage
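The dependency rule (if file A imports B, both are included) amounts to a graph traversal. The README does not say whether the Librarian follows imports transitively. This sketch does, using a breadth-first walk over a hypothetical import map that lists, for each file, the files it imports. Because the walk tracks visited files, circular imports cannot make it loop forever.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def with_dependencies(seeds: Iterable[str], imports: Dict[str, List[str]]) -> Set[str]:
    """Expand a set of relevant files with everything they (transitively) import."""
    selected: Set[str] = set()
    queue = deque(seeds)
    while queue:
        path = queue.popleft()
        if path in selected:
            continue
        selected.add(path)
        # Queue each imported file that has not been visited yet.
        queue.extend(dep for dep in imports.get(path, []) if dep not in selected)
    return selected
```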

Features:

  • AI determines relevant file count (no manual limits)
  • Automatic dependency inclusion
  • Targeted codebase generation
  • Integrates with existing analyzer (Surgeon)
  • GitHub Actions workflow available (ai-lib-triage.yml)

GitHub Workflow: The ai-lib-triage.yml workflow provides:

  • Label-triggered ("AI_Triage" or bypass label)
  • Security checks with prompt injection detection
  • Duplicate detection
  • Two-pass analysis (Librarian → Targeted Repomix → Surgeon)
  • Auto-labeling based on results
  • Comprehensive comments with file lists

When to Use:

  • ✅ Subtle bugs requiring full code context
  • ✅ Complex issues spanning multiple files
  • ✅ Issues where file location is unclear
  • ✅ Large codebases where full context exceeds token limits

Status: ✅ Stable and ready for use


Programmatic Usage (Python Library)

You can also use the analyzer programmatically:

# Import from the package
from utils import GeminiIssueAnalyzer, IssueAnalysis, IssueType, Severity

# Or import specific modules
from utils.analyzer import GeminiIssueAnalyzer
from utils.duplicate import CosineDuplicateAnalyzer, GeminiDuplicateAnalyzer
from utils.models import IssueAnalysis, IssueType, Severity
from utils.security import PromptInjectionDetector
from utils.pr_analyzer import PRAnalyzer
from utils.librarian import LibrarianAnalyzer

# Initialize analyzer with default source path
analyzer = GeminiIssueAnalyzer(api_key="your-api-key")

# Or initialize with custom source path
analyzer = GeminiIssueAnalyzer(
    api_key="your-api-key",
    source_path="/path/to/your/codebase.txt"
)

# Or use a different Gemini model
analyzer = GeminiIssueAnalyzer(
    api_key="your-api-key",
    model_name="gemini-1.5-pro"
)

# Note: The analyzer uses the Google Gen AI SDK with gemini-2.0-flash-001 by default

# Analyze an issue
analysis = analyzer.analyze_issue(
    title="Login page crashes on mobile",
    issue_description="When users try to login on mobile devices, the app crashes..."
)

print(f"Issue Type: {analysis.issue_type}")
print(f"Severity: {analysis.severity}")
print(f"Root Cause: {analysis.root_cause_analysis.primary_cause}")

# Use duplicate detection
duplicate_analyzer = GeminiDuplicateAnalyzer(
    api_key="your-api-key",
    model_name="gemini-1.5-pro"  # Optional
)
result = duplicate_analyzer.detect_duplicate(
    new_issue_title="Bug title",
    new_issue_description="Details",
    existing_issues=[...]
)

# Use security detection
security = PromptInjectionDetector()
check = security.detect("User input text")
print(f"Risk: {check.risk_level}")

# Use PR analyzer
pr_analyzer = PRAnalyzer(
    api_key="your-api-key",
    model_name="gemini-2.0-flash-001"  # Optional
)
review = pr_analyzer.review_pr(
    title="Add new feature",
    body="Description of changes",
    file_changes=[
        {
            "filename": "src/feature.py",
            "status": "modified",
            "additions": 10,
            "deletions": 5,
            "patch": "@@ -1,5 +1,10 @@\n..."
        }
    ],
    repo_url="https://github.com/user/repo"
)
print(f"Overall Assessment: {review.overall_assessment}")
print(f"Issues Found: {len(review.issues_found)}")

# Format review for display
formatted_review = pr_analyzer.format_review_summary(review)
print(formatted_review)

# Use Two-Pass Architecture (Librarian + Surgeon)
librarian = LibrarianAnalyzer(
    api_key="your-api-key",
    chunks_dir="repomix-chunks"
)

# Pass 1: Identify relevant files from directory chunks
result = librarian.identify_relevant_files(
    title="Authentication Bug",
    issue_description="Users cannot login after password reset"
)
print(f"Analysis: {result['analysis_summary']}")
print(f"Identified {len(result['relevant_files'])} relevant files")

# Pass 2: Use standard analyzer with targeted context
# (create targeted repomix with only relevant_files, then use GeminiIssueAnalyzer)
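The additions and deletions counts in a file_changes entry can be derived from the patch text. These entries are the dicts passed to review_pr above and have the same shape as the PR JSON file format. The minimal sketch below assumes each patch holds one file's unified diff. It ignores any lines before the first @@ hunk header, such as the --- a/... and +++ b/... headers in full git diff output. The helper names are hypothetical and not part of utils.pr_analyzer.

```python
from typing import Dict, Tuple


def count_patch_lines(patch: str) -> Tuple[int, int]:
    """Count added and removed lines in one file's unified diff."""
    additions = deletions = 0
    in_hunk = False
    for line in patch.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # hunk header, e.g. "@@ -1,5 +1,10 @@"
            continue
        if not in_hunk:
            continue  # diff headers such as "--- a/x.py" / "+++ b/x.py"
        if line.startswith("+"):
            additions += 1
        elif line.startswith("-"):
            deletions += 1
    return additions, deletions


def make_file_change(filename: str, patch: str, status: str = "modified") -> Dict[str, object]:
    """Build one file_changes entry with counts derived from the patch."""
    additions, deletions = count_patch_lines(patch)
    return {
        "filename": filename,
        "status": status,
        "additions": additions,
        "deletions": deletions,
        "patch": patch,
    }
```

Headers are skipped by position, not by prefix, so a removed SQL comment such as --- old is still counted as a deletion.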

Source of Truth Configuration

The analyzer uses a "source of truth" file containing your codebase content to perform intelligent analysis. This gives the AI context about your specific code structure, patterns, and implementation details.

Default Behavior

  • By default, the analyzer looks for repomix-output.txt in the current directory
  • This file should contain your complete codebase content

Custom Source Path

You can specify a different source file using the --source-path option:

# Use a custom codebase file
ai-triage --source-path /path/to/my-project-dump.txt --title "Issue" --description "Details"

# Use a file in a different directory
ai-triage -s ../other-project/codebase.txt --title "Issue" --description "Details"

Supported File Formats

  • Any plain text file containing your codebase
  • Generated by tools like Repomix
  • Manual concatenation of source files
  • Output from other code analysis tools

Best Practices

  • Include all relevant source files in your source of truth
  • Keep the file updated when your codebase changes
  • Consider excluding large binary files or dependencies
  • Include configuration files, documentation, and tests for better analysis

Custom Prompt Templates

You can customize how the AI analyzes your issues by providing your own prompt template. This gives you complete control over the analysis style and focus areas.

Creating a Custom Prompt

  1. Create a text file with your custom prompt template

  2. Use placeholders for dynamic content:

    • {title} - Issue title
    • {issue_description} - Issue description
    • {codebase_content} - Full codebase content
  3. Example custom prompt (my_prompt.txt):

You are a security-focused code reviewer analyzing the following issue:

Title: {title}
Description: {issue_description}

Codebase: {codebase_content}

Focus on:
- Security vulnerabilities
- Input validation issues
- Authentication/authorization problems
- Data exposure risks

Provide analysis in JSON format with security_risks field.
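To preview a filled template, you can use the sketch below, which substitutes the three placeholders in a single regex pass. It leaves other braces untouched, such as a JSON example inside the prompt. It also does not expand placeholder-like text that appears inside the issue body. The tool itself may use Python's str.format, in which case literal braces would need to be doubled as {{ and }}. That is an assumption worth checking. render_prompt is a hypothetical helper.

```python
import re
from typing import Dict

# The three placeholders documented above; any other {...} text is left as-is.
PLACEHOLDER_RE = re.compile(r"\{(title|issue_description|codebase_content)\}")


def render_prompt(template: str, values: Dict[str, str]) -> str:
    """Fill the documented placeholders in one pass (inserted text is not re-scanned)."""
    used = {match.group(1) for match in PLACEHOLDER_RE.finditer(template)}
    missing = sorted(used - set(values))
    if missing:
        raise KeyError(f"no value supplied for: {', '.join(missing)}")
    return PLACEHOLDER_RE.sub(lambda match: values[match.group(1)], template)
```

For example, render_prompt(open("my_prompt.txt").read(), {"title": ..., "issue_description": ..., "codebase_content": ...}) shows the exact text the model would receive under this assumption.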

Using Custom Prompts

# CLI usage
ai-triage --title "Security Issue" --description "Details..." --custom-prompt my_prompt.txt

# Web UI usage
# Enter the path in the "Custom Prompt Path" field in the sidebar

Custom Prompt Use Cases

  • Security Analysis: Focus on vulnerabilities and security best practices
  • Performance Review: Emphasize performance optimization opportunities
  • Architecture Review: Concentrate on design patterns and architectural improvements
  • Compliance Check: Ensure code meets specific coding standards or regulations
  • Domain-Specific: Tailor analysis for specific frameworks or technologies

Security Features

The AI Issue Triage system includes comprehensive security protections to prevent misuse and protect the AI analysis system.

Prompt Injection Detection

The system automatically scans all issue content for potential prompt injection attempts using:

  • Machine Learning Detection: Uses the pytector library with trained models
  • Pattern-Based Detection: Custom regex patterns for common injection techniques
  • Heuristic Analysis: Behavioral analysis for suspicious content patterns

Detection Categories

The system identifies various types of malicious inputs:

  • Role Manipulation: Attempts to change the AI's role or persona
  • System Prompts: Trying to inject system-level instructions
  • Instruction Bypass: Commands to ignore previous instructions
  • File Manipulation: Requests to create, modify, or access files
  • Code Injection: Attempts to execute arbitrary code
  • Data Extraction: Trying to extract sensitive information
  • Prompt Leakage: Attempts to reveal system prompts

Risk Levels

Issues are classified into risk levels:

  • Critical: Severe injection attempts (flagged and processing stopped)
  • High: Clear malicious intent (flagged with warning)
  • Medium: Suspicious patterns (flagged for review)
  • Low: Minor concerns (noted but processed)
  • Safe: No security concerns detected
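The toy classifier below illustrates the pattern-based layer. The regexes and the mapping to risk levels are hypothetical illustrations: there is one pattern per category, far fewer than a real detector needs. They are not the rules PromptInjectionDetector uses, and the sketch has no ML component. For simplicity it only returns safe, medium, high, or critical. It also produces false positives: an ordinary report that mentions creating a new file matches the file-manipulation pattern. This is why pattern matching alone is a weak signal.

```python
import re
from typing import Dict, List

# Hypothetical patterns, one per category; a real detector needs far more coverage.
PATTERNS: Dict[str, re.Pattern] = {
    "instruction_bypass": re.compile(
        r"\b(ignore|disregard|forget)\b.{0,30}\b(previous|prior|above)\b.{0,20}\binstructions?\b", re.I
    ),
    "role_manipulation": re.compile(r"\b(you are now|act as|pretend to be)\b", re.I),
    "system_prompt": re.compile(r"(^|\n)\s*(system|assistant)\s*:", re.I),
    "prompt_leakage": re.compile(r"\b(reveal|print|show)\b.{0,30}\b(system prompt|your instructions)\b", re.I),
    "file_manipulation": re.compile(r"\b(create|write|delete|modify)\b.{0,20}\bfiles?\b", re.I),
}
HIGH_RISK = {"instruction_bypass", "system_prompt"}


def matched_categories(text: str) -> List[str]:
    """Names of the categories whose pattern matches the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def classify(text: str) -> str:
    """Map pattern hits to a risk level (hypothetical policy)."""
    hits = matched_categories(text)
    if len(hits) >= 2:
        return "critical"
    if hits:
        return "high" if hits[0] in HIGH_RISK else "medium"
    return "safe"
```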

Security Response

When prompt injection is detected:

  1. Issue Flagging: Adds security labels (security-alert, prompt-injection-detected)
  2. Warning Comment: Posts educational message explaining the detection
  3. Processing Halt: Stops AI analysis to prevent system manipulation
  4. Audit Trail: Logs detection details for security review
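You can express the response steps as a small policy function. This sketch rests on several assumptions. The label names come from the list above. Labels and a warning are applied from medium upward. Analysis halts at critical by default, per the risk-level list. halt_at is configurable because the bulk workflow note also describes high-risk issues as skipping analysis. An unknown level raises ValueError.

```python
from typing import Dict, List

RISK_ORDER: List[str] = ["safe", "low", "medium", "high", "critical"]
SECURITY_LABELS: List[str] = ["security-alert", "prompt-injection-detected"]


def security_response(risk_level: str, flag_at: str = "medium", halt_at: str = "critical") -> Dict[str, object]:
    """Decide labels, warning comment, and whether AI analysis may run."""
    rank = RISK_ORDER.index(risk_level.lower())
    flagged = rank >= RISK_ORDER.index(flag_at)
    return {
        "labels": list(SECURITY_LABELS) if flagged else [],
        "post_warning": flagged,
        "run_ai_analysis": rank < RISK_ORDER.index(halt_at),
    }
```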

Duplicate Detection

The system includes intelligent duplicate detection to prevent redundant analysis and improve issue management.

How It Works

  • Semantic Analysis: Uses AI to understand issue meaning beyond exact text matches
  • Similarity Scoring: Calculates confidence scores for potential duplicates
  • Context Awareness: Considers issue status, labels, and resolution state
  • Cross-Reference: Compares against all existing open issues

Duplicate Handling

When duplicates are detected:

  • Automatic Labeling: Adds duplicate label
  • Reference Comment: Links to the original issue
  • Processing Skip: Avoids redundant AI analysis
  • Consolidation: Helps maintainers merge related issues

Smart Retry Mechanism

The analyzer includes an intelligent retry system that automatically detects low-quality responses and retries the analysis for better results.

How It Works

The system automatically identifies responses that contain:

  • Generic phrases like "requires further investigation" or "to be determined"
  • Very low confidence scores (below 0.6, i.e. 60%)
  • Vague file paths or empty solutions
  • Short or incomplete analysis summaries
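The signals above can be turned into a heuristic check plus a retry loop. In this sketch the 0.6 cutoff corresponds to the 60% above, on the 0-1 confidence scale used in the example output. The field names follow that example. MIN_CAUSE_CHARS and the treatment of missing file paths are hypothetical. The sketch reads --retries N as at most N extra attempts, so N = 0 means a single call. If no attempt passes, it returns the last result.

```python
from typing import Any, Callable, Dict

GENERIC_PHRASES = ("requires further investigation", "to be determined")
MIN_CONFIDENCE = 0.6  # "< 60%" on the 0-1 scale of confidence_score
MIN_CAUSE_CHARS = 20  # hypothetical cutoff for a too-short root cause


def is_low_quality(analysis: Dict[str, Any]) -> bool:
    """Heuristic mirroring the signals listed above."""
    if any(phrase in str(analysis).lower() for phrase in GENERIC_PHRASES):
        return True
    if analysis.get("confidence_score", 0.0) < MIN_CONFIDENCE:
        return True
    solutions = analysis.get("proposed_solutions") or []
    if not solutions:
        return True  # empty solutions
    if any(not (s.get("location") or {}).get("file_path") for s in solutions):
        return True  # a solution without a concrete file path
    cause = (analysis.get("root_cause_analysis") or {}).get("primary_cause", "")
    return len(cause.strip()) < MIN_CAUSE_CHARS


def analyze_with_retries(run_once: Callable[[], Dict[str, Any]], retries: int = 2) -> Dict[str, Any]:
    """Call run_once up to 1 + retries times; keep the first acceptable result."""
    result = run_once()
    for _ in range(retries):
        if not is_low_quality(result):
            break
        result = run_once()
    return result
```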

Configuration

# Default: 2 retries
ai-triage --title "Issue" --description "Details"

# Custom retry count
ai-triage --title "Issue" --description "Details" --retries 3

# Disable retries
ai-triage --title "Issue" --description "Details" --retries 0

Benefits

  • Higher Quality: Automatically improves analysis quality
  • Reliability: Reduces chance of getting generic responses
  • Transparency: Shows retry attempts in progress messages
  • Configurable: Adjust retry count based on your needs

Analysis Components

Issue Classification

  • Bug: Issues that represent errors or defects
  • Enhancement: Improvements to existing functionality
  • Feature Request: New functionality requests

Severity Levels

  • Critical: System-breaking issues requiring immediate attention
  • High: Important issues affecting core functionality
  • Medium: Moderate impact issues
  • Low: Minor issues with minimal impact

Root Cause Analysis

  • Primary cause identification
  • Contributing factors
  • Affected components
  • Related code locations

Solution Proposals

  • Specific code changes
  • Implementation rationale
  • Target locations (files, functions, classes)
  • Step-by-step implementation guidance

Example Analysis

{
  "title": "Authentication timeout not handled properly",
  "issue_type": "bug",
  "severity": "high",
  "root_cause_analysis": {
    "primary_cause": "Missing timeout exception handling in auth module",
    "contributing_factors": [
      "No retry mechanism implemented",
      "User feedback on timeout missing"
    ],
    "affected_components": ["authentication", "user_session"],
    "related_code_locations": [
      {
        "file_path": "src/auth/login.py",
        "line_number": 45,
        "function_name": "authenticate_user"
      }
    ]
  },
  "proposed_solutions": [
    {
      "description": "Add timeout exception handling with user feedback",
      "code_changes": "try:\n    response = auth_request()\nexcept TimeoutError:\n    return {'error': 'Authentication timeout'}",
      "location": {
        "file_path": "src/auth/login.py",
        "function_name": "authenticate_user"
      },
      "rationale": "Provides graceful error handling and user feedback"
    }
  ],
  "confidence_score": 0.85
}
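Downstream scripts can consume this structure directly, for example from the analysis_result.json artifact, assuming that artifact follows the schema above. The sketch derives labels in the style the workflows use (type:bug, severity:high, gemini-analyzed) and builds a one-line summary. How the enhancement and feature-request types are spelled in label form is an assumption.

```python
import json
from typing import Any, Dict, List


def load_analysis(path: str = "analysis_result.json") -> Dict[str, Any]:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def labels_for(analysis: Dict[str, Any]) -> List[str]:
    """Labels in the workflows' style, e.g. type:bug and severity:high."""
    return ["gemini-analyzed", f"type:{analysis['issue_type']}", f"severity:{analysis['severity']}"]


def summarize(analysis: Dict[str, Any]) -> str:
    """One-line summary: severity, title, confidence, primary cause, code locations."""
    rca = analysis["root_cause_analysis"]
    locations = ", ".join(
        f"{loc['file_path']}:{loc.get('line_number', '?')}" for loc in rca.get("related_code_locations", [])
    )
    line = (
        f"[{analysis['severity'].upper()}] {analysis['title']} "
        f"({analysis['confidence_score']:.0%} confidence): {rca['primary_cause']}"
    )
    return f"{line}; see {locations}" if locations else line
```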

Testing

The project includes a comprehensive test suite to ensure code quality and reliability.

Continuous Integration

Automated quality checks run on every pull request and push to main via GitHub Actions.

CI Workflow (ci.yml) - All-in-One Status Check

The main CI workflow combines all checks into a single unified status:

Unit Tests (Matrix Strategy)

  • Runs on Python 3.11, 3.12, and 3.13
  • All versions run in parallel
  • fail-fast: false - All Python versions complete even if one fails

Lint Checks

  • Black: Code formatting validation
  • isort: Import sorting validation
  • Flake8: Code quality linting
  • Blocking: PRs cannot merge if linting fails

All Checks Pass Job

  • Single unified status check
  • Only passes if all unit tests AND linting succeed
  • Works well as the single required status check in branch protection rules

See .github/workflows/ci.yml for configuration details.

Running Tests Locally

# Run all tests
pytest tests/

# Run with verbose output
pytest tests/ -v

# Run with coverage report (optional, requires pytest-cov)
# pytest tests/ --cov=. --cov-report=html

# Run only unit tests (no API required)
pytest tests/ -m unit -v

# Run only integration tests (requires API key)
pytest tests/ -m integration -v

# Run specific test file
pytest tests/test_models.py -v

# Use the test runner script
python run_tests.py

Running Linting Checks Locally

Before pushing code, run these checks locally:

# Install linting tools
pip install black isort flake8 flake8-docstrings flake8-bugbear

# Auto-fix formatting issues
black .
isort .

# Check formatting without fixing
black --check --diff .
isort --check-only --diff .

# Run flake8 linting
flake8 . --max-line-length=127 --extend-ignore=E203,W503

# Run all checks at once (check-only, as CI does)
black --check . && isort --check-only . && flake8 .
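To mirror the CI lint job without rewriting any files, the check-only commands can be wrapped in a small script. This is a minimal sketch that assumes black, isort, and flake8 are installed in the active environment and that it runs from the repository root. The script and its `run_checks`/`main` helpers are hypothetical and not shipped with the repository. It also invokes each tool as `python -m <tool>` rather than by its bare command name.

```python
import subprocess
import sys

# Check-only equivalents of the commands above, run through the current
# interpreter so they use the tools installed in the active environment.
CHECKS = [
    ("black", [sys.executable, "-m", "black", "--check", "--diff", "."]),
    ("isort", [sys.executable, "-m", "isort", "--check-only", "--diff", "."]),
    ("flake8", [sys.executable, "-m", "flake8", ".", "--max-line-length=127", "--extend-ignore=E203,W503"]),
]


def run_checks(checks):
    """Run every check, even after a failure, and return (name, exit_code) pairs."""
    results = []
    for name, cmd in checks:
        try:
            code = subprocess.run(cmd, check=False).returncode
        except FileNotFoundError:
            # The executable itself could not be found; report it like a shell would.
            code = 127
        results.append((name, code))
    return results


def main(checks=CHECKS):
    """Print a one-line summary per check and return 0 only if all passed."""
    results = run_checks(checks)
    for name, code in results:
        print(f"{name}: {'ok' if code == 0 else f'FAILED (exit {code})'}")
    return 0 if all(code == 0 for _, code in results) else 1
```

If saved as, say, `check_lint.py` (a hypothetical name), it could be run with `python -c "import check_lint, sys; sys.exit(check_lint.main())"`. Unlike a `&&` chain, it runs every check and reports all failures at once.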

Test Organization

tests/
├── __init__.py                       # Package initialization
├── conftest.py                       # Pytest configuration & fixtures
├── test_models.py                    # Tests for data models
├── test_gemini_analyzer.py           # Tests for Gemini analyzer
├── test_duplicate_analyzer.py        # Tests for Gemini duplicate detection
├── test_cosine_duplicate_analyzer.py # Tests for cosine similarity analyzer
└── test_pr_analyzer.py               # Tests for PR analyzer

Test Features

  • Coverage: data models, the Gemini analyzer, both duplicate detectors, and the PR analyzer
  • Fixtures: Reusable test data and setup
  • Markers: Categorize tests by type (unit, integration, slow)
  • Unit tests: No API key required, fast execution
  • Integration tests: Require GEMINI_API_KEY for full functionality
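A test module might apply these markers as follows. This sketch assumes the `unit` and `integration` markers are registered in the project's pytest.ini. The test names and the `requires_gemini_key` helper are hypothetical, and the integration test body is a placeholder because the analyzer's API is not reproduced here.

```python
import os

import pytest

# Skip live tests when no key is configured, so `pytest -m integration`
# reports skips instead of failures on machines without credentials.
requires_gemini_key = pytest.mark.skipif(
    not os.environ.get("GEMINI_API_KEY"),
    reason="GEMINI_API_KEY is not set",
)


@pytest.mark.unit
def test_confidence_score_is_a_probability():
    # Pure data check: no network access or API key needed.
    result = {"confidence_score": 0.85}
    assert 0.0 <= result["confidence_score"] <= 1.0


@pytest.mark.integration
@requires_gemini_key
def test_live_analysis_smoke():
    # Placeholder body: replace with a real call into the analyzer.
    assert os.environ["GEMINI_API_KEY"]
```

With this layout, `pytest tests/ -m unit -v` selects only the first test, and the second runs only when both `-m integration` is requested and the key is present.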

Project Structure

AI-Issue-Triage/
├── .github/
│   └── workflows/
│       ├── gemini-issue-analysis.yml  # (Example) Auto issue analysis workflow
│       └── ci.yml                     # Combined CI workflow (tests + lint)
│
├── utils/                      # 📦 Core Library Package
│   ├── __init__.py            # Package exports
│   ├── models.py              # Pydantic data models
│   ├── analyzer.py            # Main issue analyzer (Surgeon)
│   ├── librarian.py           # ✅ File identification analyzer (Librarian - Pass 1)
│   ├── pr_analyzer.py         # ✅ PR review analyzer
│   ├── duplicate/             # Duplicate detection module
│   │   ├── __init__.py
│   │   ├── gemini_duplicate.py    # ✅ AI-powered duplicate detection
│   │   └── cosine_duplicate.py    # 🚧 TF-IDF based detection (WIP)
│   └── security/              # Security module
│       ├── __init__.py
│       └── prompt_injection.py    # ✅ Prompt injection detection
│
├── cli/                        # 🖥️ Command-Line Tools
│   ├── __init__.py
│   ├── analyze.py             # ✅ Main CLI (ai-triage / Surgeon)
│   ├── duplicate_check.py     # ✅ Duplicate check CLI (ai-triage-duplicate)
│   ├── cosine_check.py        # 🚧 Cosine check CLI (ai-triage-cosine, WIP)
│   ├── pr_review.py           # ✅ PR review CLI
│   └── librarian.py           # ✅ Librarian CLI (Pass 1 - file identification)
│
├── ui/                         # 🎨 User Interface
│   ├── __init__.py
│   ├── streamlit_app.py       # 🚧 Streamlit web UI (WIP)
│   └── run_app.py             # Application runner
│
├── tests/                      # ✅ Comprehensive Test Suite
│   ├── __init__.py
│   ├── conftest.py            # Pytest configuration & fixtures
│   ├── test_models.py         # Data models tests
│   ├── test_gemini_analyzer.py        # Analyzer tests
│   ├── test_duplicate_analyzer.py     # Duplicate detection tests
│   ├── test_cosine_duplicate_analyzer.py  # Cosine similarity tests
│   └── test_pr_analyzer.py    # ✅ PR analyzer tests
│
├── cutlery/                    # 🚀 Quick Start Resources
│   ├── QUICKSTART.md          # Complete setup guide
│   ├── workflows/             # GitHub Actions workflow templates
│   │   ├── gemini-issue-analysis.yml           # ✅ Auto: Single issue
│   │   ├── gemini-labeled-issue-analysis.yml   # ✅ Label: Single issue
│   │   ├── ai-bulk-issue-analysis.yml          # ✅ Auto: Bulk issues
│   │   ├── ai-bulk-labeled-issue-analysis.yml  # ✅ Label: Bulk issues
│   │   ├── ai-pr-review.yml                    # ✅ PR review (label-triggered)
│   │   └── ai-lib-triage.yml                   # ✅ Two-Pass Architecture (Librarian+Surgeon)
│   ├── triage.config.json     # Example configuration
│   └── samples/               # Sample files for testing
│
├── Configuration Files
│   ├── setup.py               # Package installation config
│   ├── requirements.txt       # Python dependencies
│   ├── pytest.ini             # Pytest configuration
│   ├── pyproject.toml         # Black, isort configuration
│   ├── .flake8                # Flake8 linting configuration
│   ├── pr_prompt_config.yml   # ✅ PR review prompt configuration
│   └── env_example.txt        # Environment variables template
│
└── Documentation & Samples
    ├── README.md              # This documentation
    ├── run_tests.py           # Test runner with options
    ├── sample_issue.txt       # Example issue for testing
    └── sample_issues.json     # Sample issues data

Legend:
✅ = Stable and ready for production use
🚧 = Work in progress, use with caution
🚀 = Recommended starting point
📦 = Pip-installable package
🖥️ = Command-line interface
🎨 = Web interface
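The split between `utils/librarian.py` (Pass 1) and `utils/analyzer.py` (the Surgeon) can be pictured with the toy pipeline below. It is a sketch only. The real Librarian asks Gemini which files matter, whereas here simple keyword overlap stands in for the model. Every name (`librarian_pass`, `surgeon_context`, the sample files) is hypothetical and not part of the `utils` package API.

```python
import re
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    body: str


def tokenize(text: str) -> set[str]:
    """Lower-case word tokens of length >= 3, used for crude relevance matching."""
    return {w for w in re.findall(r"[a-z_]+", text.lower()) if len(w) >= 3}


def librarian_pass(issue: Issue, files: dict[str, str], top_k: int = 2) -> list[str]:
    """Pass 1 stand-in: rank files by word overlap with the issue, keep the top_k."""
    issue_words = tokenize(issue.title + " " + issue.body)
    ranked = sorted(
        files,
        key=lambda path: len(issue_words & tokenize(path + " " + files[path])),
        reverse=True,
    )
    return ranked[:top_k]


def surgeon_context(issue: Issue, files: dict[str, str], selected: list[str]) -> str:
    """Pass 2 input: only the selected files are packed into the analysis prompt."""
    parts = [f"Issue: {issue.title}\n{issue.body}"]
    for path in selected:
        parts.append(f"--- {path} ---\n{files[path]}")
    return "\n\n".join(parts)


files = {
    "src/auth/login.py": "def authenticate_user(): response = auth_request()",
    "src/ui/theme.py": "def dark_mode(): pass",
    "src/auth/session.py": "def refresh_session(): pass",
}
issue = Issue("Login hangs on timeout", "authenticate_user never returns when auth_request times out")
selected = librarian_pass(issue, files, top_k=1)
print(selected)  # ['src/auth/login.py']
```

Narrowing the context in Pass 1 is what lets Pass 2 spend its prompt budget on the few files that matter instead of the whole codebase.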

Contributing

We welcome contributions! Please follow these steps:

Development Workflow

  1. Fork and clone the repository

    git clone https://github.com/YOUR_USERNAME/AI-Issue-Triage.git
    cd AI-Issue-Triage
  2. Create a feature branch

    git checkout -b feature/your-feature-name
  3. Install dependencies

    pip install -r requirements.txt
    pip install black isort flake8 pytest
  4. Make your changes and format code

    # Auto-format code
    black .
    isort .
    
    # Check linting
    flake8 .
  5. Add tests for new functionality

    • Add unit tests in tests/
    • Run tests locally: pytest tests/ -m unit -v
  6. Run CI checks locally

    # Run all unit tests
    pytest tests/ -m unit -v
    
    # Check formatting
    black --check .
    isort --check-only .
    flake8 .
  7. Commit and push

    git add .
    git commit -m "Description of your changes"
    git push origin feature/your-feature-name
  8. Submit a pull request

    • CI will automatically run tests and linting
    • All checks must pass before merging
    • The "CI / All Checks Pass" status must be green

Code Standards

  • Python versions: Must support 3.11, 3.12, and 3.13
  • Formatting: Use black with 127 character line length
  • Import sorting: Use isort with black-compatible settings
  • Linting: Must pass flake8 checks
  • Testing: Add tests for new features
  • Documentation: Update README for significant changes

License

This project is licensed under the Apache License 2.0.

Support

For issues and questions:

  1. Check the existing issues
  2. Create a new issue with a detailed description
  3. Include your environment details and error messages


Note: This tool requires a valid Google Gemini API key. Usage may incur costs based on Google's pricing for the Gemini API.
