feat: add multilingual batch scanner with parallel execution and LLM gap-fill by WhereIs38 · Pull Request #100 · NVIDIA/SkillSpector

WhereIs38 · 2026-06-18T21:07:04Z

Closes #98

Summary

Adds contrib/multilingual/ — a multilingual batch scanner that scans directories of AI agent skills in parallel, with automatic language detection and targeted LLM gap-fill for non-English skills.

Zero changes to src/skillspector/. All integration is via import-time patches that wrap upstream constructors without modifying any source file.
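As a rough illustration of what "import-time patches that wrap upstream constructors" can look like, here is a minimal sketch. `UpstreamClient` and `wrap_constructor` are hypothetical names invented for this example; they are not taken from SkillSpector or from this PR, and the actual patches may work differently.

```python
import functools


class UpstreamClient:
    """Hypothetical stand-in for an upstream class (not a real SkillSpector name)."""

    def __init__(self, model, response_schema="default"):
        self.model = model
        self.response_schema = response_schema


def wrap_constructor(cls, **forced_kwargs):
    """Replace cls.__init__ with a wrapper that forces selected keyword arguments.

    The upstream source file is never edited; only the class attribute is
    rebound at import time. Calling this twice on the same class is a no-op.
    Assumption: callers do not pass the forced arguments positionally.
    """
    original_init = cls.__init__
    if getattr(original_init, "_patched", False):
        return  # already wrapped, keep the patch idempotent

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        kwargs.update(forced_kwargs)  # override whatever the caller passed
        original_init(self, *args, **kwargs)

    patched_init._patched = True
    cls.__init__ = patched_init


# Applied once, e.g. when the contrib package is imported.
wrap_constructor(UpstreamClient, response_schema=None)
client = UpstreamClient("some-model")
```

Because the wrapper only overrides keyword arguments, it would stop having any effect if the upstream default already matched the forced value, which is in the spirit of the Compatibility Note.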

What It Does

Discovery — recursively finds all SKILL.md directories under input root
Language detection — Unicode script-ratio heuristic, extending support to Chinese, Japanese, and Korean
Parallel scan — ThreadPoolExecutor runs graph.invoke() per skill, configurable --workers
Gap-fill — targeted LLM pass for 8 rules with no semantic-analyzer equivalent (P5, P6-P8, MP1-MP3, RA1-RA2)
Aggregated report — terminal / JSON / Markdown, sorted by risk score
Multi-key API pool — rate-limit-aware scheduler with exponential backoff
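The discovery, language-detection, and parallel-scan steps can be sketched as follows. This is an illustration under stated assumptions, not the code in contrib/multilingual/: the thresholds, the function names, and the `scan_one` callable (a stand-in for the per-skill `graph.invoke()` call) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def detect_language(text, threshold=0.2):
    """Guess a language from the ratio of letters in each Unicode script block.

    Kana is checked first because Japanese text also contains Han characters.
    The threshold is an assumed value chosen for illustration.
    """
    kana = hangul = han = letters = 0
    for ch in text:
        if not ch.isalpha():
            continue
        letters += 1
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:      # Hiragana and Katakana
            kana += 1
        elif 0xAC00 <= cp <= 0xD7AF:    # Hangul syllables
            hangul += 1
        elif 0x4E00 <= cp <= 0x9FFF:    # CJK Unified Ideographs
            han += 1
    if letters == 0:
        return "en"
    if kana / letters >= threshold:
        return "ja"
    if hangul / letters >= threshold:
        return "ko"
    if han / letters >= threshold:
        return "zh"
    return "en"


def find_skill_dirs(root):
    """Return every directory under root that contains a SKILL.md file."""
    return sorted({p.parent for p in Path(root).rglob("SKILL.md")})


def scan_all(skill_dirs, scan_one, workers=8):
    """Run scan_one(skill_dir) for each directory on a thread pool.

    Results come back in the same order as skill_dirs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scan_one, skill_dirs))
```

Threads suit this workload if each scan spends most of its time waiting on network calls to an LLM; a CPU-bound scan would be a different trade-off.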
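A rate-limit-aware key pool with exponential backoff, as named in the last item above, might look like the sketch below. `KeyPool`, `RateLimited`, and `call_with_pool` are hypothetical names, and the delay parameters are assumptions; the scheduler in this PR may differ (for example, by adding jitter).

```python
import threading
import time


class RateLimited(Exception):
    """Hypothetical signal raised by the caller's function on a rate-limit response."""


class KeyPool:
    """Hands out API keys, preferring keys that are not cooling down."""

    def __init__(self, keys, base_delay=1.0, max_delay=60.0):
        self._lock = threading.Lock()
        self._base = base_delay
        self._max = max_delay
        self._tick = 0
        self._state = {k: {"ready_at": 0.0, "failures": 0, "last_used": 0} for k in keys}

    def acquire(self, now=None):
        """Return (key, seconds_to_wait) for the soonest-available, least recently used key."""
        now = time.monotonic() if now is None else now
        with self._lock:
            key = min(
                self._state,
                key=lambda k: (max(self._state[k]["ready_at"], now), self._state[k]["last_used"]),
            )
            self._tick += 1
            self._state[key]["last_used"] = self._tick
            return key, max(0.0, self._state[key]["ready_at"] - now)

    def report_rate_limit(self, key, now=None):
        """Double the key's cooldown on each consecutive failure, capped at max_delay."""
        now = time.monotonic() if now is None else now
        with self._lock:
            st = self._state[key]
            st["failures"] += 1
            delay = min(self._max, self._base * 2 ** (st["failures"] - 1))
            st["ready_at"] = now + delay
            return delay

    def report_success(self, key):
        """Reset the failure count so the next backoff starts from base_delay."""
        with self._lock:
            self._state[key]["failures"] = 0


def call_with_pool(pool, fn, max_attempts=5, sleep=time.sleep):
    """Call fn(key), rotating keys and backing off whenever fn raises RateLimited."""
    for _ in range(max_attempts):
        key, wait = pool.acquire()
        if wait > 0:
            sleep(wait)
        try:
            result = fn(key)
        except RateLimited:
            pool.report_rate_limit(key)
            continue
        pool.report_success(key)
        return result
    raise RuntimeError("all attempts were rate limited")
```

Passing `sleep` as a parameter keeps the retry loop testable without real delays.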

Evidence (23 built-in fixtures, 8 workers)

| Skill | `--no-llm` | LLM mode |
|---|---|---|
| `ssd1_semantic_injection` | 0/100 | 100/100 |
| `ssd3_nl_exfiltration` | 0/100 | 60/100 |
| `ssd4_narrative_deception` | 10/100 | 100/100 |
| `sdi4_divergence` | 13/100 | 100/100 |
| `safe_skill` | 0/100 | 0/100 ✓ |
| `ssd_clean` | 0/100 | 0/100 ✓ |

LLM semantic analyzers catch entire vulnerability categories invisible to static patterns. Clean skills remain clean — zero false-positive inflation.

Testing

Manual verification against tests/fixtures/ confirms that all 23 skills are scanned, that clean skills remain clean, and that the semantic analyzers catch what static patterns miss. Cross-platform behavior was validated on macOS and Windows. make lint passes on the upstream codebase.

Automated tests are impractical for LLM-dependent output — it is inherently non-deterministic and requires live API keys. The static-vs-LLM comparison in README provides more meaningful evidence than any mock-based test could.

Compatibility Note

If upstream adds a native response_schema=None mode in the future, all patches become no-ops and can be safely removed.

🤖 Generated with Claude Code

Signed-off-by: WhereIs38 <CinderellaDoyle@icloud.com>
README.md
DESIGN.md
CONTRIBUTING.md

WhereIs38 and others added 10 commits June 18, 2026 23:55

- 5fd7eb0 add contrib multilingual batch scanner
- 266bba0 fix: resolve LLM race condition, JSON parsing, and connection timeout
- 1427795 fix: suppress asyncio noise, sanitize meta-analyzer output quirks
- 809a8d8 docs: organize documentation, translate to English, add NVIDIA convention audit
- 7780d28 fix: add SPDX headers, cross-platform cleanup, and comprehensive documentation
- e47d105 fix: add Windows Unicode stdout support for CJK output
  batch_scan.py main(): reconfigure stdout to UTF-8 on win32 so Rich terminal output with CJK characters renders correctly. Co-Authored-By: Claude <noreply@anthropic.com>
- e0f4ab9 Merge pull request #1 from nanzhijin/main
  fix: add Windows Unicode stdout support for CJK output
- eb1f37e docs: add CONTRIBUTING guide, rejected alternatives, gap-fill selection criteria
- 51c3ba6 docs: reorganize into core guides and process archive
- 3d52b45 feat: add multilingual batch scanner with parallel execution and LLM gap-fill
  Signed-off-by: WhereIs38 <CinderellaDoyle@icloud.com>

WhereIs38 wants to merge 10 commits into NVIDIA:main from WhereIs38:feature/multilingual-batch-scanner
