feat: analysis improvements, gemma4 default model & schema promotion by jmlweb · Pull Request #68 · jmlweb/hyntx

jmlweb · 2026-01-30T13:04:12Z

Summary

This PR adds several improvements to make analysis more reliable, especially for small models, and migrates the default model from gemma3:4b to gemma4:e4b.

🚀 Gemma 4 as Default Model

Default model: gemma3:4b → gemma4:e4b (~5GB Q4, 128K context, native function calling)
Added gemma4:e2b (micro), gemma4:e4b (small), gemma4:31b (standard) to MODEL_STRATEGY_MAP
Promoted small strategy to full schema — models like gemma4:e4b, mistral:7b, llama3:8b now use the rich analysis schema (patterns, severity, before/after examples) instead of the simplified individual schema
Small strategy now processes up to 10 prompts per batch (was 1 with individual schema)
Fixed installCommand bug in model-suggester (was pointing to llama3.2 instead of default model)
Updated BATCH_STRATEGIES descriptions from size-based to capability-based
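
For reviewers, a minimal sketch of how the model-to-strategy lookup and schema selection described above might fit together. The map entries come from this description; the function shapes, the `micro` fallback for unknown models, and the assumption that `micro` stays on the individual schema are illustrative guesses, not the code in this PR.

```typescript
// Hypothetical sketch; names mirror the PR description but the shapes are assumed.
type BatchStrategy = "micro" | "small" | "standard";
type SchemaKind = "full" | "individual";

// Only the models named in the PR description are listed here.
const MODEL_STRATEGY_MAP: ReadonlyMap<string, BatchStrategy> = new Map<string, BatchStrategy>([
  ["gemma4:e2b", "micro"],
  ["gemma4:e4b", "small"],
  ["gemma4:31b", "standard"],
  ["mistral:7b", "small"],
  ["llama3:8b", "small"],
]);

// Assumption: unknown models fall back to the most conservative strategy.
const FALLBACK_STRATEGY: BatchStrategy = "micro";

function detectBatchStrategy(model: string): BatchStrategy {
  return MODEL_STRATEGY_MAP.get(model.trim().toLowerCase()) ?? FALLBACK_STRATEGY;
}

// After this PR, "small" is promoted to the full schema.
// Assumption: "micro" keeps the simplified individual schema.
function selectSchema(strategy: BatchStrategy): SchemaKind {
  return strategy === "micro" ? "individual" : "full";
}
```

With this sketch, `selectSchema(detectBatchStrategy('gemma4:e4b'))` yields `'full'`, which matches the behaviour listed in the test plan.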

🎯 Gold Standard Benchmark

50 curated prompts with human-rated quality scores
Covers all tiers: excellent (10), good (15), fair (15), poor (10)
Includes correlation calculation for model accuracy measurement
Used for calibrating scores across different providers
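
As an illustration of the correlation calculation, here is a sketch that assumes a Pearson coefficient between human-rated scores and model scores. The description does not say which coefficient the benchmark uses, so `pearsonCorrelation` is a hypothetical helper.

```typescript
// Hypothetical helper: Pearson correlation between human and model scores.
// Assumption: the benchmark compares two equal-length score arrays.
function pearsonCorrelation(human: readonly number[], model: readonly number[]): number {
  if (human.length !== model.length || human.length < 2) {
    throw new Error("Need two equal-length arrays with at least 2 scores");
  }
  const n = human.length;
  const meanHuman = human.reduce((sum, v) => sum + v, 0) / n;
  const meanModel = model.reduce((sum, v) => sum + v, 0) / n;

  let covariance = 0;
  let varianceHuman = 0;
  let varianceModel = 0;
  human.forEach((h, i) => {
    const dh = h - meanHuman;
    const dm = model[i]! - meanModel;
    covariance += dh * dm;
    varianceHuman += dh * dh;
    varianceModel += dm * dm;
  });

  const denominator = Math.sqrt(varianceHuman * varianceModel);
  // A constant series has no defined correlation; report 0 rather than NaN.
  return denominator === 0 ? 0 : covariance / denominator;
}
```

A value close to 1 would indicate that the model ranks the curated prompts much as the human raters did.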

✅ Semantic Validation

Validates that scores correlate with issue counts
Detects when examples are not found in original prompts
Auto-corrects results when validation fails
Prevents logically inconsistent outputs
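
A rough sketch of the kind of checks listed above. The 0-10 score scale, the "one point per issue" ceiling, and the `AnalysisResult` shape are assumptions made for illustration; the actual validator in this PR may apply different rules.

```typescript
// Hypothetical shapes; the PR's real result type is not shown in the description.
interface AnalysisResult {
  score: number; // assumed 0-10 scale
  issues: string[];
  examples: string[];
}

interface ValidationOutcome {
  problems: string[];
  corrected: AnalysisResult;
}

function validateSemantics(result: AnalysisResult, prompts: readonly string[]): ValidationOutcome {
  const problems: string[] = [];
  const haystack = prompts.map((p) => p.toLowerCase());

  // Keep only examples that actually occur in one of the original prompts.
  const examples = result.examples.filter((example) => {
    const needle = example.trim().toLowerCase();
    return needle.length > 0 && haystack.some((p) => p.includes(needle));
  });
  if (examples.length < result.examples.length) {
    problems.push("example not found in original prompts");
  }

  // Assumed rule: each reported issue lowers the maximum plausible score by one.
  const ceiling = Math.max(0, 10 - result.issues.length);
  const score = Math.min(result.score, ceiling);
  if (score < result.score) {
    problems.push("score too high for issue count");
  }

  // Auto-correct: return a copy with the clamped score and filtered examples.
  return { problems, corrected: { ...result, score, examples } };
}
```

An empty `problems` array means the result passed unchanged. Otherwise `corrected` carries the adjusted values.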

🔄 Temperature Fallback Retry

When JSON parsing fails, retries with lower temperatures
Sequence: 0.3 → 0.1 → 0.0
More deterministic outputs reduce parse failures
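
The retry loop can be sketched roughly as follows. The `generate` callback stands in for the Ollama request and is a hypothetical parameter, as is the function name. Only the temperature sequence is taken from this PR.

```typescript
// Temperature sequence from the PR description.
const TEMPERATURE_SEQUENCE: readonly number[] = [0.3, 0.1, 0.0];

// Hypothetical callback: performs one model request at the given temperature
// and resolves with the raw response text.
type Generate = (temperature: number) => Promise<string>;

async function generateJsonWithFallback<T>(
  generate: Generate,
  temperatures: readonly number[] = TEMPERATURE_SEQUENCE,
): Promise<T> {
  let lastError: unknown = new Error("No temperatures configured");
  for (const temperature of temperatures) {
    const raw = await generate(temperature);
    try {
      // First response that parses as JSON wins.
      return JSON.parse(raw) as T;
    } catch (error) {
      // Parse failure: remember the error and retry at the next, lower temperature.
      lastError = error;
    }
  }
  throw new Error(`JSON parsing failed at every temperature: ${String(lastError)}`);
}
```

Only parse failures trigger a retry in this sketch. Network or provider errors thrown by `generate` propagate immediately.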

📝 Enhanced SYSTEM_PROMPT_MINIMAL

More contrastive examples showing score progression
Clear examples for each tier (POOR, FAIR, GOOD, EXCELLENT)
Better calibrated scoring guidelines

Test plan

All 1678 tests passing (54 test files)
Build succeeds
detectBatchStrategy('gemma4:e4b') → 'small'
Small strategy selects full schema (not individual)
Manual validation with gemma4:e4b on real prompts (when model is available in Ollama)

🤖 Generated with Claude Code

…ompts

- Add gold-standard benchmark with 50 curated prompts for calibration
- Add semantic validator to detect score/issue inconsistencies
- Implement temperature fallback retry (0.3 -> 0.1 -> 0.0) for Ollama
- Enhance SYSTEM_PROMPT_MINIMAL with more contrastive examples
- Auto-correct results when semantic validation fails

This improves analysis reliability, especially for small models.

- Fix @typescript-eslint/restrict-template-expressions by converting numbers to strings
- Fix @typescript-eslint/no-unnecessary-condition by removing redundant checks
- Fix test expectation to match actual error message
- Sort imports in benchmark/index.ts

- Document extractRealExamples() heuristic matching logic
- Identify category ID inconsistency between base.ts and schemas.ts
- Recommend unifying category mappings
- Note that individual mode already extracts real examples from AI
- All 50 aggregator tests passing

- undici: >=7.24.0 (CRLF injection, unbounded memory)
- hono: >=4.12.4
- @hono/node-server: >=1.19.10
- @modelcontextprotocol/sdk: >=1.26.0
- @isaacs/brace-expansion: >=5.0.1
- minimatch: >=10.2.3
- rollup: >=4.59.0
- flatted: >=3.4.2
- ajv: >=8.18.0
- qs: >=6.14.2

Resolves 29 vulnerabilities (2 low, 8 moderate, 19 high) → 0

…patibility)

The ajv override forced >=8.18.0, but @eslint/eslintrc requires ajv v6. ESLint's ajv@6.x is already patched (>=6.14.0), so no override is needed. All other overrides are retained.

Result: 0 vulnerabilities + lint passing.

## [3.0.2](v3.0.1...v3.0.2) (2026-03-22)

### Bug Fixes

* **security:** remove ajv override that broke ESLint (ajv v6/v8 incompatibility) ([88aac5c](88aac5c))
* update dependency overrides to resolve security vulnerabilities ([685befa](685befa))

### Documentation

* add quality assessment report ([722de83](722de83))

…egy to full schema

Gemma 4 E4B offers 128K context, native function calling, and configurable thinking modes, enabling richer analysis with the full schema that was previously reserved for large (>7GB) models.

Key changes:

- Default model: gemma3:4b → gemma4:e4b (small strategy, ~5GB Q4)
- Add gemma4:e2b (micro) and gemma4:31b (standard) to MODEL_STRATEGY_MAP
- Promote small strategy from individual to full schema (10 prompts/batch)
- Fix installCommand bug in model-suggester (was pointing to llama3.2)
- Update BATCH_STRATEGIES descriptions to capability-based

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…dator

- Sort imports per simple-import-sort rule
- Convert numeric template expressions to String()
- Remove unnecessary conditionals on always-truthy result.patterns
- Fix metadata possibly-undefined with type assertion (ID is verified via matchedId)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Update semantic-release 25.0.2 → 25.0.3
- Add overrides for handlebars (>=4.7.9), picomatch (>=4.0.4), path-to-regexp (>=8.4.0), vite (>=7.3.2), brace-expansion (>=5.0.5), yaml (>=2.8.3)
- Bump existing lodash/lodash-es overrides to >=4.18.0
- Resolves all pnpm audit vulnerabilities

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…aging

Each batch reports how many prompts exhibit a pattern locally. When merging batches, these counts must be summed to reflect the true global frequency. The previous average produced artificially low frequencies (e.g. frequency=1 for patterns appearing across all batches).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
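
To illustrate the fix described in the commit message above, a minimal sketch of summing per-batch counts. `PatternCount` and `mergePatternFrequencies` are hypothetical names, not the aggregator's actual API.

```typescript
// Hypothetical shape: one pattern's local count within a single batch.
interface PatternCount {
  id: string;
  frequency: number; // number of prompts in the batch that exhibit the pattern
}

function mergePatternFrequencies(batches: readonly (readonly PatternCount[])[]): PatternCount[] {
  const totals = new Map<string, number>();
  for (const batch of batches) {
    for (const pattern of batch) {
      // Sum local counts; averaging here would understate the global frequency.
      totals.set(pattern.id, (totals.get(pattern.id) ?? 0) + pattern.frequency);
    }
  }
  const merged: PatternCount[] = [];
  totals.forEach((frequency, id) => merged.push({ id, frequency }));
  return merged;
}
```

For a pattern that appears once in each of three batches, this gives a frequency of 3, where an average would have reported 1.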

Short user messages like "si", "ok", "ya" that are responses to assistant questions are now detected by walking the parentUuid chain in Claude Code logs. If the nearest ancestor assistant message ends with "?", the prompt is marked as a confirmation and excluded from analysis.

- Add parseLogEntry() for lightweight extraction of uuid/parentUuid/content
- Add isConfirmationMessage() with max 5-hop chain traversal
- Two-pass readJsonlFile: build index first, then extract with detection
- Filter confirmations in CLI before sending to AI provider
- 13 new tests covering detection, edge cases, and integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
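
The parentUuid walk described in the commit message above might look roughly like this. The `LogEntry` shape, the `role` field, the function signature, and the 10-character cutoff for "short" messages are assumptions. Only the 5-hop limit and the trailing "?" check come from the commit message.

```typescript
// Hypothetical entry shape; the real log format carries more fields.
interface LogEntry {
  uuid: string;
  parentUuid: string | null;
  role: "user" | "assistant";
  content: string;
}

const MAX_HOPS = 5; // from the commit message
const MAX_CONFIRMATION_LENGTH = 10; // assumed cutoff for a "short" message

function isConfirmationMessage(entry: LogEntry, index: ReadonlyMap<string, LogEntry>): boolean {
  if (entry.role !== "user" || entry.content.trim().length > MAX_CONFIRMATION_LENGTH) {
    return false;
  }
  let parentId = entry.parentUuid;
  for (let hop = 0; hop < MAX_HOPS && parentId !== null; hop++) {
    const parent = index.get(parentId);
    if (parent === undefined) return false; // broken chain
    if (parent.role === "assistant") {
      // The nearest assistant ancestor decides: a trailing "?" means a question was asked.
      return parent.content.trim().endsWith("?");
    }
    parentId = parent.parentUuid;
  }
  return false; // no assistant ancestor within the hop limit
}
```

The `index` argument corresponds to the first pass of the two-pass read: a uuid-to-entry map built before any prompt is classified.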

jmlweb and others added 10 commits January 30, 2026 14:03

fix: update ollama tests for temperature retry behavior

13ac8f2

feat: analysis-improvements

b77a600

fix: test

59a70d3

jmlweb changed the title ~~feat: benchmark calibration, semantic validation & improved prompts~~ feat: analysis improvements, gemma4 default model & schema promotion Apr 6, 2026

jmlweb and others added 5 commits April 7, 2026 01:00

fix(validator): use guard clause for metadata instead of type assertion

b10f1ab

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jmlweb wants to merge 15 commits into main from feat/analysis-improvements

2 participants
