HSB Risk Evaluator

The Highly Stealthy Backdoor (HSB) Risk Evaluator provides security risk assessment for software repositories by analyzing individual evaluation entries across three key dimensions.

This work has been accepted at AAAI 2026. Paper: https://arxiv.org/abs/2511.13341

Citation

@misc{yan2025llmbasedquantitativeframeworkevaluating,
      title={An LLM-based Quantitative Framework for Evaluating High-Stealthy Backdoor Risks in OSS Supply Chains}, 
      author={Zihe Yan and Kai Luo and Haoyu Yang and Yang Yu and Zhuosheng Zhang and Guancheng Li},
      year={2025},
      eprint={2511.13341},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2511.13341}, 
}

Overview

The evaluator analyzes repositories across three critical dimensions:

Software Supply Chain Dependency Location - Assesses the repository's position in the software supply chain
Difficulty of Hiding Malicious Code - Evaluates how easily malicious payloads could be hidden
Community Quality - Analyzes the health and security practices of the repository's community

Installation

Option 1: Docker (Recommended)

The easiest way to run the HSB Risk Evaluator is using Docker, which provides a consistent environment:

# Build and run the development environment
./dev.sh
# Create the virtual environment, install dependencies, then activate it
uv venv
uv sync
source .venv/bin/activate

This will use the Dockerfile to create a container with all necessary dependencies and provide an interactive shell for running the evaluator and scripts.

Option 2: Native Installation

Note: Native installation currently supports only Debian-based systems (Ubuntu, Debian, etc.).

On a Debian-based system, run:

# Create the virtual environment, install dependencies, then activate it
uv venv
uv sync
source .venv/bin/activate

System Requirements:

Debian-based Linux distribution (Ubuntu, Debian, Mint, etc.)
Python 3.12 or higher
APT package manager (for package analysis features)

Configuration

Environment Variables

OPENAI_API_KEY - Required for LLM-based upstream repository discovery
GITHUB_TOKEN - Primary GitHub token for API access
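
Before running the collector, it can help to confirm that these variables are present. The following is a minimal sketch, not part of the package: the helper name missing_env_vars is hypothetical, and treating both variables as mandatory is an assumption.

```python
import os

# Variable names taken from the list above; treating both as required is an assumption.
REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "GITHUB_TOKEN")


def missing_env_vars(env=None):
    """Return the listed variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All listed environment variables are set.")
```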

Collector Settings

Collector configuration options are available through CollectorSettings. See src/hsbriskevaluator/collector/settings.py for detailed configuration parameters.

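from hsbriskevaluator.collector import collect_all
from hsbriskevaluator.collector.settings import CollectorSettings

# Pass one or more GitHub tokens
settings = CollectorSettings(github_tokens=["token1", "token2"])
# Call from within an async function; "..." stands for the remaining arguments
repo_info = await collect_all(settings=settings, ...)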

Evaluator Settings

Evaluator configuration parameters are defined in src/hsbriskevaluator/evaluator/settings.py. Configure risk analysis thresholds and weights as needed.

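from hsbriskevaluator.evaluator import HSBRiskEvaluator
from hsbriskevaluator.evaluator.settings import EvaluatorSettings

# Default settings; override fields as needed
evaluator_settings = EvaluatorSettings()
# repo_info is the object returned by collect_all
evaluator = HSBRiskEvaluator(repo_info, settings=evaluator_settings)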

Quick Start

import asyncio

from hsbriskevaluator.collector import collect_all
from hsbriskevaluator.evaluator import HSBRiskEvaluator


async def main():
    # Collect repository information for the package and its upstream repo
    repo_info = await collect_all(
        pkt_type='debian',
        pkt_name='xz-utils',
        repo_name='tukaani-project/xz',
    )
    # Create the evaluator
    evaluator = HSBRiskEvaluator(repo_info)

    # Run the evaluation
    result = await evaluator.evaluate()

    print(result)


asyncio.run(main())

Individual Evaluators

You can also use individual evaluators for specific assessments:

import asyncio

from hsbriskevaluator.evaluator import (
    CommunityEvaluator,
    PayloadEvaluator,
    DependencyEvaluator,
    CIEvaluator,
)

# Community evaluation only
community_eval = CommunityEvaluator(repo_info)
community_result = asyncio.run(community_eval.evaluate())

# Payload evaluation only
payload_eval = PayloadEvaluator(repo_info)
payload_result = asyncio.run(payload_eval.evaluate())

# Dependency evaluation only
dependency_eval = DependencyEvaluator(repo_info)
dependency_result = asyncio.run(dependency_eval.evaluate())

# CI evaluation only
ci_eval = CIEvaluator(repo_info)
ci_result = asyncio.run(ci_eval.evaluate())

Scripts for Batch Repository Information Collection

The scripts/ directory contains utilities for collecting repository information in batch processing workflows. These scripts work together in a specific order to gather comprehensive data about Debian packages and their upstream repositories:

1. Package List Generation

scripts/get_priority_packages.py
Extracts Debian packages with "required", "important", and "standard" priorities into a text file for further processing.

scripts/get_cloud_packages.py
Generates lists of cloud-related packages from various sources for specialized analysis workflows.

2. Package Information Collection

scripts/generate_packages_from_file.py
Reads a package list file and generates detailed package and dependency information using APT utilities. Creates packages.yaml and dependencies.yaml files with comprehensive package metadata.

scripts/convert_packages.py
Converts packages.yaml into a format that is easier for the evaluator to read.

3. Upstream Repository Discovery

scripts/fetch_upstream_infos.py
Uses LLM-based analysis to discover and validate upstream Git repository URLs for packages. Updates package files with upstream repository information, which is essential for the next steps.

4. Metadata Generation

scripts/generate_package_metadata.py
Processes packages and dependencies to generate metadata about package relationships and sibling packages sharing the same upstream repository. Creates meta_data.yaml files for efficient package grouping.
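
The idea of sibling packages can be sketched as grouping package names by their upstream URL. This is only an illustration: the function name group_siblings and the input shape (a plain dict of package name to URL) are assumptions, not the layout of meta_data.yaml.

```python
from collections import defaultdict


def group_siblings(upstream_by_package):
    """Map each upstream repository URL to the sorted packages built from it."""
    groups = defaultdict(list)
    for package, upstream_url in upstream_by_package.items():
        groups[upstream_url].append(package)
    return {url: sorted(packages) for url, packages in groups.items()}


# Illustrative input: package name -> upstream URL.
upstreams = {
    "xz-utils": "https://github.com/tukaani-project/xz",
    "liblzma5": "https://github.com/tukaani-project/xz",
}
siblings = group_siblings(upstreams)
print(siblings)
```

Here liblzma5 and xz-utils end up in one group because they share the same upstream repository.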

5. Repository Data Collection

scripts/fetch_repo_infos.py
Fetches comprehensive repository information from GitHub for packages with upstream URLs. Collects commit history, contributor data, security information, and other repository metrics for risk analysis.

scripts/delete_symlinks.py
Deletes the symlinks and merges sibling packages into a single YAML file.

6. Repository Evaluation and Score Calculation

scripts/evaluate.py

scripts/calculate_score.py

GitHub Token Management

The scripts/fetch_repo_infos.py script requires GitHub API access and supports multiple tokens to work around per-token rate limits:

Create .github_tokens file in the project root with one token per line
Generate GitHub tokens with repository read permissions
The script automatically distributes API requests across available tokens
Rate limiting is handled automatically to maximize throughput

Example .github_tokens file:

ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ghp_yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
ghp_zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

Benefits of multiple tokens:

Higher API rate limits (5,000 requests/hour per token)
Reduced processing time for large package sets
Automatic failover if a token becomes rate limited
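
One simple way to spread requests across tokens is round-robin rotation. The sketch below shows that idea under stated assumptions: parse_tokens and token_rotation are hypothetical helpers, and the script's actual distribution and failover logic may differ.

```python
from itertools import cycle


def parse_tokens(text):
    """Return the non-empty, stripped lines of a one-token-per-line file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def token_rotation(tokens):
    """Return an endless iterator that hands out tokens in round-robin order."""
    if not tokens:
        raise ValueError("no GitHub tokens provided")
    return cycle(tokens)


# Placeholder tokens, not real credentials.
tokens = parse_tokens("ghp_aaa\nghp_bbb\n\nghp_ccc\n")
rotation = token_rotation(tokens)
first_four = [next(rotation) for _ in range(4)]
print(first_four)
```

In practice the text passed to parse_tokens would come from reading the .github_tokens file.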

Typical Workflow

# 1. Generate package list
python scripts/get_priority_packages.py

# 2. Generate package information
python scripts/generate_packages_from_file.py debian_priority_packages.txt -d data/debian

python scripts/convert_packages.py

# 3. Fetch upstream repository URLs
python scripts/fetch_upstream_infos.py -d data/debian

# 4. Generate package metadata
python scripts/generate_package_metadata.py -d data/debian

# 5. Collect repository information
python scripts/fetch_repo_infos.py -d data/debian

python scripts/delete_symlinks.py

# 6. Evaluate repositories and calculate scores
python scripts/evaluate.py

python scripts/calculate_score.py

All scripts support the --directory parameter to specify custom input/output directories and include comprehensive help documentation accessible via --help.

Evaluation Analysis

Each repository evaluation produces detailed analysis across these core aspects:

1. Community Quality Analysis

class CommunityEvalResult(BaseModel):
    stargazers_count: int                   # Number of stargazers of the repository
    watchers_count: int                     # Number of watchers of the repository
    forks_count: int                        # Number of forks of the repository
    community_users_count: int              # Number of users actively participating in the community
    direct_commits_ratio: float             # Ratio of direct commits in the main branch
    direct_commit_users_count: int          # Number of users with direct commit access
    maintainers_count: int                  # Number of maintainers with authority to merge pull requests or directly commit code
    pr_reviewers_count: int                # Number of active PR reviewers
    required_reviewers_distribution: Dict[int, float] # Distribution of reviewers required to approve a PR before merge
    estimated_prs_to_become_maintainer: float # Estimated number of PRs needed to become a maintainer
    estimated_prs_to_become_reviewer: float # Estimated number of PRs needed to become a reviewer
    prs_merged_without_discussion_ratio: float  # Ratio of PRs merged without discussion
    prs_with_inconsistent_description_ratio: float # Ratio of PRs with mismatched descriptions
    avg_participants_per_issue: float      # Average participants in issues
    avg_participants_per_pr: float         # Average participants in PRs
    community_activity_score: float         # Overall community engagement score (0.0-1.0); experimental, ignore for now (no reliable formula yet)

Key Indicators:

A small number of direct committers increases centralization risk
Low PR reviewer count suggests limited oversight
PRs merged without discussion may bypass scrutiny
Inconsistent PR descriptions could hide malicious changes
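
These indicators can be turned into simple checks over the result fields. The sketch below is illustrative only: flag_indicators is a hypothetical helper, the thresholds are arbitrary assumptions rather than values used by the evaluator, and the input is a plain dict rather than a CommunityEvalResult.

```python
def flag_indicators(metrics, min_committers=3, min_reviewers=3, max_ratio=0.5):
    """Return short warnings for metrics that cross the (assumed) thresholds."""
    warnings = []
    if metrics["direct_commit_users_count"] < min_committers:
        warnings.append("few direct committers")
    if metrics["pr_reviewers_count"] < min_reviewers:
        warnings.append("few PR reviewers")
    if metrics["prs_merged_without_discussion_ratio"] > max_ratio:
        warnings.append("many PRs merged without discussion")
    if metrics["prs_with_inconsistent_description_ratio"] > 0:
        warnings.append("PRs with inconsistent descriptions")
    return warnings


# Illustrative values, not real measurements.
sample = {
    "direct_commit_users_count": 10,
    "pr_reviewers_count": 18,
    "prs_merged_without_discussion_ratio": 0.53,
    "prs_with_inconsistent_description_ratio": 0.0,
}
flags = flag_indicators(sample)
print(flags)
```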

2. Payload Concealment Analysis

class PayloadHiddenEvalResult(BaseModel):
    allows_binary_test_files: bool         # Whether binary files are allowed as tests
    allows_binary_document_files: bool     # Whether binary files are allowed as document files
    allows_binary_code_files: bool         # Whether binary files are allowed as code files
    allows_binary_asset_files: bool        # Whether binary files are allowed as assets
    allows_other_binary_files: bool        # Whether binary files are allowed as other files
    binary_files_count: int                # Total binary files detected
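
To make the notion of a binary file concrete, one common heuristic is to look for a NUL byte near the start of the content. This is a sketch of that generic heuristic; it is an assumption, not the detection method used by PayloadEvaluator, and looks_binary is a hypothetical name.

```python
def looks_binary(data, sample_size=8192):
    """Heuristic: treat content as binary if its first bytes contain a NUL byte."""
    return b"\x00" in data[:sample_size]


# Illustrative file contents keyed by path.
files = {
    "tests/files/sample.xz": b"\xfd7zXZ\x00\x00\x04",
    "src/main.c": b"int main(void) { return 0; }\n",
}
binary_files_count = sum(1 for content in files.values() if looks_binary(content))
print(binary_files_count)
```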

3. Dependency Impact Analysis

class DependencyEvalResult(BaseModel):
    self_priority_required_count: int # Number of required packages corresponding to the repo
    self_priority_important_count: int # Number of important packages corresponding to the repo
    self_priority_standard_count: int # Number of standard packages corresponding to the repo
    self_essential_count: int # Number of essential packages corresponding to the repo
    dependency_priority_required_count: int # Number of required packages that contain the repo as a dependency
    dependency_priority_important_count: int # Number of important packages that contain the repo as a dependency
    dependency_priority_standard_count: int # Number of standard packages that contain the repo as a dependency
    dependency_essential_count: int # Number of essential packages that contain the repo as a dependency

4. CI Analysis

class CIEvalResult(BaseModel):
    has_dependabot: bool # Whether repository has Dependabot enabled
    dangerous_token_permission_ratio: float # Ratio of workflows with dangerous token permissions
    dangerous_action_provider_ratio: float # Ratio of workflows with dangerous action providers
    dangerous_action_pin_ratio: float # Ratio of workflows with dangerous action pins
    dangerous_trigger_ratio: float # Ratio of workflows with dangerous triggers
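
As an illustration of how a per-workflow ratio such as dangerous_action_pin_ratio could be computed, the sketch below treats an action reference as safely pinned only when it ends in a full 40-character commit SHA. That definition is an assumption, as are the helper names; the evaluator's own criteria may differ, and local or Docker action references are not handled specially here.

```python
import re

# A reference such as "owner/action@<40 hex chars>" is pinned to a commit.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def is_pinned(uses):
    """Return True if a `uses:` reference ends in a full 40-character commit SHA."""
    _, _, ref = uses.partition("@")
    return bool(FULL_SHA.match(ref))


def unpinned_workflow_ratio(workflows):
    """Fraction of workflows containing at least one unpinned action reference."""
    if not workflows:
        return 0.0
    risky = sum(
        1 for uses_list in workflows if any(not is_pinned(u) for u in uses_list)
    )
    return risky / len(workflows)


# Each inner list holds the `uses:` references found in one workflow file.
workflows = [
    ["actions/checkout@v4"],
    ["actions/checkout@" + "a" * 40],
]
ratio = unpinned_workflow_ratio(workflows)
print(ratio)
```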

Diff GitHub Repo with Debian Repo

from hsbriskevaluator.collector import GitHubRepoCollector
from hsbriskevaluator.utils.diff import Comparator
from hsbriskevaluator.utils.apt_utils import AptUtils

# Set up the GitHub collector and APT helper used by the comparator
github_collector = GitHubRepoCollector()
apt_utils = AptUtils()
comparator = Comparator(github_collector, apt_utils)

# Compare the GitHub repo with the Debian repo for xz-utils
diff_result = comparator.clone_and_compare("xz-utils")

Example Analysis Output

ci:
  dangerous_action_pin_ratio: 0.1111111111111111
  dangerous_action_provider_ratio: 0.7777777777777778
  dangerous_token_permission_ratio: 0.0
  dangerous_trigger_ratio: 0.0
  has_dependabot: false
community_quality:
  approvers_count: 18
  avg_participants_per_issue: 3.4875
  avg_participants_per_pr: 1.8316831683168318
  community_activity_score: 0.637
  community_users_count: 238
  direct_commit_users_count: 10
  direct_commits_ratio: 0.95
  forks_count: 164
  maintainers_count: 10
  prs_merged_without_discussion_ratio: 0.5294117647058824
  prs_needed_to_become_approver:
    0: 11
    1: 2
    2: 4
    6: 1
  prs_needed_to_become_maintainer:
    0: 2
    1: 2
    2: 4
    5: 1
    6: 1
  prs_with_inconsistent_description_ratio: 0.0
  required_approvals_distribution:
    0: 0.868421052631579
    1: 0.10526315789473684
    4: 0.02631578947368421
  stargazers_count: 799
  watchers_count: 24
dependency:
  dependency_essential_count: 8
  dependency_priority_important_count: 13
  dependency_priority_required_count: 10
  dependency_priority_standard_count: 15
  self_essential_count: 0
  self_priority_important_count: 0
  self_priority_required_count: 0
  self_priority_standard_count: 1
payload_hidden_difficulty:
  allows_binary_assets_files: false
  allows_binary_code_files: false
  allows_binary_document_files: false
  allows_binary_test_files: true
  allows_other_binary_files: false
  binary_files_count: 95
pkt_name:
  - liblzma5
  - xz-utils
url: https://github.com/tukaani-project/xz
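
Distribution fields such as required_approvals_distribution can be summarized with a weighted mean. The sketch below assumes that each key is a number of approvals and each value is the share of PRs merged with that many approvals; the helper name is hypothetical.

```python
def mean_required_approvals(distribution):
    """Expected number of approvals, weighting each count by its share of PRs."""
    return sum(count * share for count, share in distribution.items())


# Values copied from required_approvals_distribution in the example output.
distribution = {
    0: 0.868421052631579,
    1: 0.10526315789473684,
    4: 0.02631578947368421,
}
mean_approvals = mean_required_approvals(distribution)
print(round(mean_approvals, 4))
```

For this example the mean is about 0.21 approvals per PR, reflecting that roughly 87% of PRs were merged with no approval at all.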

Troubleshooting

Common Issues

API Key Errors: Ensure your API key (OPENAI_API_KEY, see Environment Variables) is correctly set in the .env file
Rate Limiting: Reduce max_concurrency if hitting API rate limits
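
To show what a concurrency cap does, the sketch below bounds the number of in-flight coroutines with asyncio.Semaphore. It is a generic illustration, not the package's implementation: run_limited and fake_request are hypothetical names, and the sleep stands in for a real API call.

```python
import asyncio


async def run_limited(jobs, max_concurrency):
    """Run coroutine factories with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(job):
        async with semaphore:
            return await job()

    # gather returns results in the same order as the submitted jobs
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_request(i):
    # Stand-in for an API call; sleeps briefly instead of using the network.
    await asyncio.sleep(0.01)
    return i


jobs = [lambda i=i: fake_request(i) for i in range(5)]
results = asyncio.run(run_limited(jobs, max_concurrency=2))
print(results)
```

Lowering max_concurrency trades throughput for fewer simultaneous requests, which is why it helps when rate limits are hit.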

Debug Mode

Enable debug logging for detailed evaluation information:

import logging

# DEBUG emits the most detailed log output
logging.basicConfig(
    level=logging.DEBUG,
)

Contributing

To extend the evaluator:

New Metrics: Add new evaluation metrics by extending the result models
Custom Evaluators: Create custom evaluators by inheriting from BaseEvaluator
Risk Algorithms: Modify risk calculation algorithms in the main evaluator
LLM Prompts: Improve LLM prompts for better semantic analysis

See the source code for implementation details and extension points.
