Summary
In LLM mode against an OpenAI-compatible endpoint (e.g. Ollama via OPENAI_BASE_URL), the semantic pass crashes with a pydantic ValidationError: local instruct models return confidence on a 0–100 scale, but the LLM-output schemas validate it as a 0.0–1.0 float (Field(ge=0.0, le=1.0)). Combined with the abort-on-first-error behavior (#10), one out-of-range value takes down the entire LLM stage; only --no-llm static analysis survives.
Environment
- SkillSpector 2.2.3 (installed from main, git+https://github.com/NVIDIA/skillspector.git)
- Python 3.12 (isolated venv), macOS / Apple Silicon
- SKILLSPECTOR_PROVIDER=openai, OPENAI_BASE_URL=http://localhost:11434/v1, Ollama 0.30.8
- Reproduced with two different models: qwen2.5:14b and gemma4:12b
What happens
With LLM mode on (no --no-llm):
pydantic_core.ValidationError: N validation errors for MetaAnalyzerResult
findings.0.confidence Input should be less than or equal to 1 [input=100]
...
Both models emit confidence: 100. The constraint isn't on a single schema: after relaxing the bound on MetaAnalyzerFinding, the identical crash reappears on LLMAnalysisResult (the per-analyzer schema) — i.e. it's systemic across the LLMAnalyzerBase output models, not a one-off. (Raw model speed is fine here; this is purely the value-range mismatch, separate from any timeout.)
Root cause
The LLM-output models constrain confidence to 0–1:
src/skillspector/llm_analyzer_base.py:67 — confidence: float = Field(ge=0.0, le=1.0, ...)
src/skillspector/nodes/meta_analyzer.py:66 — confidence: float = Field(ge=0.0, le=1.0, ...)
Instruct models commonly express confidence as a percentage (0–100). Frontier models on strict function-calling providers tend to stay in range, but models served over Ollama's OpenAI-compatible endpoint don't honor the numeric bound (constrained decoding enforces type/structure, not magnitude), so the value comes back as 100 and client-side pydantic validation rejects it.
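For illustration, here is a minimal standalone sketch of the failure mode, assuming pydantic v2. Finding and Result are hypothetical stand-ins for the real schemas; only the Field(ge=0.0, le=1.0) bound is taken from the code referenced above. It shows that a single percent-scale value is enough to reject the whole result object, including the findings that were in range.

```python
from pydantic import BaseModel, Field, ValidationError


# Hypothetical stand-ins for the real LLM-output schemas. Only the
# confidence bound (ge=0.0, le=1.0) mirrors what the report describes.
class Finding(BaseModel):
    title: str
    confidence: float = Field(ge=0.0, le=1.0)


class Result(BaseModel):
    findings: list[Finding]


# A payload shaped like what a local model might return: one finding is
# in range, the other reports confidence as a percentage.
payload = {
    "findings": [
        {"title": "in-range", "confidence": 0.9},
        {"title": "percent-scale", "confidence": 100},
    ]
}


def try_validate(data: dict) -> list[tuple]:
    """Return the locations of validation errors, or [] if validation passes."""
    try:
        Result.model_validate(data)
    except ValidationError as exc:
        return [err["loc"] for err in exc.errors()]
    return []


error_locations = try_validate(payload)
print(error_locations)
```

Because validation happens on the container model, the in-range finding is discarded along with the bad one, which matches the all-or-nothing behavior described above.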
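How this differs from existing issues
- [...] minimum/maximum JSON-Schema keywords, fixed by stripping them from the schema sent to the provider. This is the client-side counterpart: the schema is accepted, the model returns an out-of-range value, and SkillSpector's own pydantic validation crashes. Schema-keyword stripping does not address it.
- [...] PydanticOutputParser would still enforce le=1 and crash.
- [...] ValidationError aborts the whole semantic pass.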
Possible direction (untested)
Normalizing/clamping confidence before validation would resolve it — the existing @field_validator("overall_assessment", mode="before") in meta_analyzer.py is a natural precedent. It would need to cover every confidence-bearing LLM-output model (relaxing the bound on one just surfaced the same crash on the next). Rough shape, but you'll know the right form: float(v); if v > 1: v /= 100; then clamp to [0, 1].
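The rough shape above could look something like the sketch below. normalize_confidence is a hypothetical helper name, not something that exists in the codebase, and the NaN check is my own addition. It is written as a plain function so it can be tested on its own; the assumption is that each confidence-bearing model would call it from a mode="before" validator on the confidence field.

```python
import math


def normalize_confidence(value: object) -> float:
    """Map a model-reported confidence onto the 0.0-1.0 scale.

    Hypothetical helper: values above 1 are assumed to be percentages
    (0-100) and are divided by 100; the result is clamped to [0, 1].
    """
    number = float(value)  # raises TypeError/ValueError on non-numeric input
    if math.isnan(number):
        raise ValueError("confidence must not be NaN")
    if number > 1.0:
        number /= 100.0  # treat as a percentage
    # Clamp whatever is left (negative values, or percentages above 100).
    return min(1.0, max(0.0, number))


for raw in (100, 85, 0.4, -3, 250, "90"):
    print(raw, "->", normalize_confidence(raw))
```

One ambiguity to be aware of: a raw value of exactly 1 is kept as 1.0 (full confidence) rather than read as 1%, since the two scales can't be told apart at that point.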
Repro
ollama pull qwen2.5:14b
export SKILLSPECTOR_PROVIDER=openai OPENAI_BASE_URL=http://localhost:11434/v1 \
OPENAI_API_KEY=ollama SKILLSPECTOR_MODEL=qwen2.5:14b
skillspector scan ./tests/fixtures/malicious_skill # no --no-llm
# -> ValidationError: findings.0.confidence Input should be less than or equal to 1 [input=100]
Happy to open a PR if that's useful.