Feature: multilingual batch scanner with parallel execution and LLM gap-fill #98

@WhereIs38

Problem

  • skillspector scan handles one skill per invocation. Users auditing repositories with hundreds of skills must script their own serial loop — no native parallel execution exists.
  • Static detection rules rely on English keywords, so non-English skills (zh/ja/ko) lose coverage for semantic security vulnerabilities whose rules have no LLM-based analyzer equivalent.
  • No multi-API-key management for concurrent LLM calls.

Proposed Solution

A new contrib/multilingual/ module — zero changes to src/skillspector/:

  1. Batch CLI: python -m contrib.multilingual.batch_scan ./skills/ --workers 7
  2. ThreadPoolExecutor parallel execution with configurable workers
  3. Unicode script-ratio language detection, extending support to Chinese, Japanese, and Korean
  4. Targeted LLM gap-fill for 8 rules with no semantic-analyzer equivalent (P5, P6-P8, MP1-MP3, RA1-RA2)
  5. Aggregated terminal / JSON / Markdown reports
  6. Multi-key API pool with rate-limit backoff
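
As an illustration of item 3, here is a minimal sketch of script-ratio language detection. It is not the code in contrib/multilingual/: the function names, the code-point ranges chosen, and both thresholds are assumptions made for this example. The idea is to count characters per Unicode script and pick a language from the resulting shares, checking kana first because Japanese text also contains Han characters.

```python
# Hypothetical sketch: names, ranges and thresholds are assumptions,
# not the contrib/multilingual implementation.

def script_ratios(text: str) -> dict[str, float]:
    """Return the share of Han, kana, Hangul and Latin letters in text."""
    counts = {"han": 0, "kana": 0, "hangul": 0, "latin": 0}
    for ch in text:
        cp = ord(ch)
        if 0x4E00 <= cp <= 0x9FFF:        # CJK Unified Ideographs
            counts["han"] += 1
        elif 0x3040 <= cp <= 0x30FF:      # Hiragana and Katakana blocks
            counts["kana"] += 1
        elif 0xAC00 <= cp <= 0xD7A3:      # Hangul syllables
            counts["hangul"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    total = sum(counts.values())
    # Digits, punctuation and whitespace are ignored; empty input gives all zeros.
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


def detect_language(text: str, cjk_threshold: float = 0.2,
                    kana_threshold: float = 0.05) -> str:
    """Map script shares to 'ja', 'ko', 'zh', or the fallback 'en'."""
    ratios = script_ratios(text)
    # Kana is checked first with a lower threshold: Japanese mixes kana
    # with Han characters, so even a small kana share is a strong signal.
    if ratios["kana"] >= kana_threshold:
        return "ja"
    if ratios["hangul"] >= cjk_threshold:
        return "ko"
    if ratios["han"] >= cjk_threshold:
        return "zh"
    return "en"
```

A limitation of this approach is that very short or kanji-only Japanese snippets would be classified as zh; a real detector would likely need a tie-breaker for that case.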
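
Items 2 and 6 could fit together roughly as follows. This is a sketch under stated assumptions, not the module's actual design: `KeyPool`, `RateLimited`, `call_with_backoff`, and `scan_all` are hypothetical names, keys are rotated round-robin, and a rate-limit error triggers exponential backoff before retrying with the next key. Only `ThreadPoolExecutor`, `itertools.cycle`, and `threading.Lock` are real standard-library APIs.

```python
import itertools
import threading
import time
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor


class RateLimited(Exception):
    """Hypothetical error an LLM client would raise on a rate-limit response."""


class KeyPool:
    """Thread-safe round-robin rotation over several API keys."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(keys)
        self._lock = threading.Lock()

    def next_key(self) -> str:
        # The lock keeps concurrent workers from advancing the iterator at once.
        with self._lock:
            return next(self._cycle)


def call_with_backoff(pool: KeyPool, call: Callable[[str], str],
                      max_retries: int = 4, base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Try call(key); on RateLimited, wait base_delay * 2**attempt and rotate keys."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return call(pool.next_key())
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


def scan_all(skills: Iterable[str], scan_one: Callable[[str], str],
             workers: int = 7) -> list[str]:
    """Run scan_one over every skill in a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(scan_one, skills))
```

Threads suit this workload on the assumption that each scan spends most of its time waiting on LLM network calls; the `sleep` parameter is injectable only so the backoff schedule can be checked without real delays.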

Evidence (23 built-in fixtures, 8 workers)

| Skill | --no-llm | LLM mode |
| --- | --- | --- |
| ssd1_semantic_injection | 0/100 | 100/100 |
| ssd3_nl_exfiltration | 0/100 | 60/100 |
| ssd4_narrative_deception | 10/100 | 100/100 |
| sdi4_divergence | 13/100 | 100/100 |
| safe_skill | 0/100 | 0/100 ✓ |
| ssd_clean | 0/100 | 0/100 ✓ |

Code ready at https://github.com/WhereIs38/SkillSpector/tree/main/contrib/multilingual
Happy to open a PR from a feature branch once this issue is acked.

README.md

DESIGN.md

CONTRIBUTING.md
