feat: analysis improvements, gemma4 default model & schema promotion#68
Open
…ompts

- Add gold-standard benchmark with 50 curated prompts for calibration
- Add semantic validator to detect score/issue inconsistencies
- Implement temperature fallback retry (0.3 -> 0.1 -> 0.0) for Ollama
- Enhance SYSTEM_PROMPT_MINIMAL with more contrastive examples
- Auto-correct results when semantic validation fails

This improves analysis reliability, especially for small models.
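The temperature fallback described in this commit could look roughly like the sketch below. Only the 0.3 -> 0.1 -> 0.0 schedule comes from the commit message. The function name `generateWithTemperatureFallback`, the `Generate` and `Validate` types, and the assumption that a retry is triggered by either a thrown error or a failed validation are all hypothetical, since the PR's actual code is not shown here.

```typescript
// Hypothetical types: the real provider signatures are not shown in this PR excerpt.
type Generate<T> = (temperature: number) => Promise<T>;
type Validate<T> = (result: T) => boolean;

// Fallback schedule quoted in the commit message: 0.3 -> 0.1 -> 0.0.
const TEMPERATURE_FALLBACKS: readonly number[] = [0.3, 0.1, 0.0];

// Try each temperature in order and return the first result that validates.
// Assumption: both thrown errors and invalid results trigger the next attempt.
async function generateWithTemperatureFallback<T>(
  generate: Generate<T>,
  isValid: Validate<T>,
  temperatures: readonly number[] = TEMPERATURE_FALLBACKS,
): Promise<{ result: T; temperature: number }> {
  let lastError: Error = new Error("no temperatures configured");
  for (const temperature of temperatures) {
    try {
      const result = await generate(temperature);
      if (isValid(result)) {
        return { result, temperature };
      }
      lastError = new Error(
        `invalid result at temperature ${String(temperature)}`,
      );
    } catch (error) {
      lastError = error instanceof Error ? error : new Error(String(error));
    }
  }
  // Every temperature failed: surface the most recent failure.
  throw lastError;
}
```

Passing the generator and validator in as functions keeps the retry loop independent of the Ollama client, so it can be exercised in tests without a running model.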
- Fix @typescript-eslint/restrict-template-expressions by converting numbers to strings
- Fix @typescript-eslint/no-unnecessary-condition by removing redundant checks
- Fix test expectation to match actual error message
- Sort imports in benchmark/index.ts
- Document extractRealExamples() heuristic matching logic
- Identify category ID inconsistency between base.ts and schemas.ts
- Recommend unifying category mappings
- Note that individual mode already extracts real examples from AI
- All 50 aggregator tests passing
- undici: >=7.24.0 (CRLF injection, unbounded memory)
- hono: >=4.12.4
- @hono/node-server: >=1.19.10
- @modelcontextprotocol/sdk: >=1.26.0
- @isaacs/brace-expansion: >=5.0.1
- minimatch: >=10.2.3
- rollup: >=4.59.0
- flatted: >=3.4.2
- ajv: >=8.18.0
- qs: >=6.14.2

Resolves 29 vulnerabilities (2 low, 8 moderate, 19 high) → 0
…patibility)

The ajv override forced >=8.18.0, but @eslint/eslintrc requires ajv v6. ESLint's ajv@6.x is already patched (>=6.14.0), so no override is needed. All other overrides are retained.

Result: 0 vulnerabilities and lint passing.
## [3.0.2](v3.0.1...v3.0.2) (2026-03-22)

### Bug Fixes

* **security:** remove ajv override that broke ESLint (ajv v6/v8 incompatibility) ([88aac5c](88aac5c))
* update dependency overrides to resolve security vulnerabilities ([685befa](685befa))

### Documentation

* add quality assessment report ([722de83](722de83))
…egy to full schema

Gemma 4 E4B offers 128K context, native function calling, and configurable thinking modes, enabling richer analysis with the full schema that was previously reserved for large (>7GB) models.

Key changes:
- Default model: gemma3:4b → gemma4:e4b (small strategy, ~5GB Q4)
- Add gemma4:e2b (micro) and gemma4:31b (standard) to MODEL_STRATEGY_MAP
- Promote small strategy from individual to full schema (10 prompts/batch)
- Fix installCommand bug in model-suggester (was pointing to llama3.2)
- Update BATCH_STRATEGIES descriptions to capability-based

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
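As a rough illustration of the model-to-strategy mapping and the schema promotion described above, here is a minimal sketch. The three model entries and the `small` to `full` promotion come from this commit. The `Map`-based layout, the `schemaForStrategy` helper, and the schemas assigned to `micro` and `standard` are assumptions, and this `detectBatchStrategy` is a simplified stand-in for the project's real function.

```typescript
type BatchStrategy = "micro" | "small" | "standard";
type SchemaType = "individual" | "full";

// Only the three Gemma 4 entries named in the commit; the real
// MODEL_STRATEGY_MAP presumably lists other models as well.
const MODEL_STRATEGY_MAP: ReadonlyMap<string, BatchStrategy> = new Map<
  string,
  BatchStrategy
>([
  ["gemma4:e2b", "micro"],
  ["gemma4:e4b", "small"],
  ["gemma4:31b", "standard"],
]);

// Simplified stand-in: exact-match lookup only. The real function may
// use prefix matching or a default strategy for unknown models.
function detectBatchStrategy(model: string): BatchStrategy | undefined {
  return MODEL_STRATEGY_MAP.get(model);
}

// Hypothetical helper showing the promotion of `small` to the full schema.
function schemaForStrategy(strategy: BatchStrategy): SchemaType {
  switch (strategy) {
    case "micro":
      return "individual"; // assumption: micro keeps the simplified schema
    case "small":
      return "full"; // promoted from "individual" by this commit
    case "standard":
      return "full"; // assumption: larger models already used the full schema
  }
}
```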
…dator

- Sort imports per simple-import-sort rule
- Convert numeric template expressions to String()
- Remove unnecessary conditionals on always-truthy result.patterns
- Fix metadata possibly-undefined with type assertion (ID is verified via matchedId)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update semantic-release 25.0.2 → 25.0.3
- Add overrides for handlebars (>=4.7.9), picomatch (>=4.0.4), path-to-regexp (>=8.4.0), vite (>=7.3.2), brace-expansion (>=5.0.5), yaml (>=2.8.3)
- Bump existing lodash/lodash-es overrides to >=4.18.0
- Resolves all pnpm audit vulnerabilities

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…aging

Each batch reports how many of its own prompts exhibit a pattern. When merging batches, these counts must be summed to reflect the true global frequency. The previous averaging produced artificially low frequencies (e.g. frequency=1 for a pattern that appears once in every batch).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
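To make the difference concrete: if a pattern appears once in each of three batches, averaging reports a frequency of 1, while summing reports 3. A minimal sketch of the summing merge follows. The `PatternCount` shape and the `mergePatternFrequencies` name are hypothetical, and the aggregator's real types are likely richer.

```typescript
// Hypothetical shape: one pattern's per-batch count.
interface PatternCount {
  id: string;
  frequency: number; // number of prompts in this batch showing the pattern
}

// Merge per-batch counts by summing, so the result reflects how many
// prompts across all batches exhibit each pattern.
function mergePatternFrequencies(
  batches: readonly (readonly PatternCount[])[],
): PatternCount[] {
  const totals = new Map<string, number>();
  for (const batch of batches) {
    for (const { id, frequency } of batch) {
      totals.set(id, (totals.get(id) ?? 0) + frequency);
    }
  }
  return Array.from(totals, ([id, frequency]) => ({ id, frequency }));
}
```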
Short user messages like "si", "ok", "ya" that are responses to assistant questions are now detected by walking the parentUuid chain in Claude Code logs. If the nearest ancestor assistant message ends with "?", the prompt is marked as a confirmation and excluded from analysis.

- Add parseLogEntry() for lightweight extraction of uuid/parentUuid/content
- Add isConfirmationMessage() with max 5-hop chain traversal
- Two-pass readJsonlFile: build index first, then extract with detection
- Filter confirmations in CLI before sending to AI provider
- 13 new tests covering detection, edge cases, and integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
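The chain walk described above might be sketched as follows. The 5-hop limit and the "nearest ancestor assistant message ends with ?" rule come from the commit message. The `LogEntry` shape (in particular the `role` field), the `MAX_CONFIRMATION_LENGTH` cutoff for what counts as "short", and the exact signature of `isConfirmationMessage` are assumptions made for illustration.

```typescript
// Hypothetical, simplified view of one parsed log entry.
interface LogEntry {
  uuid: string;
  parentUuid: string | null;
  role: "user" | "assistant";
  content: string;
}

const MAX_HOPS = 5; // hop limit stated in the commit message
const MAX_CONFIRMATION_LENGTH = 10; // assumption: cutoff for a "short" message

// Walk up the parentUuid chain (at most MAX_HOPS entries) and report whether
// the nearest assistant ancestor ends with a question mark.
function isConfirmationMessage(
  entry: LogEntry,
  index: ReadonlyMap<string, LogEntry>,
): boolean {
  if (entry.role !== "user") return false;
  if (entry.content.trim().length > MAX_CONFIRMATION_LENGTH) return false;

  let parentId = entry.parentUuid;
  for (let hop = 0; hop < MAX_HOPS && parentId !== null; hop++) {
    const parent = index.get(parentId);
    if (parent === undefined) return false; // broken chain
    if (parent.role === "assistant") {
      return parent.content.trim().endsWith("?");
    }
    parentId = parent.parentUuid;
  }
  return false; // no assistant ancestor within the hop limit
}
```

The index argument corresponds to the first pass of the two-pass read: every entry is stored by `uuid` before any detection runs, so a parent can be resolved regardless of its position in the file.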
Summary
This PR adds several improvements to make analysis more reliable, especially for small models, and migrates the default model from gemma3:4b to gemma4:e4b.
🚀 Gemma 4 as Default Model
- Default model: `gemma3:4b` → `gemma4:e4b` (~5GB Q4, 128K context, native function calling)
- Add `gemma4:e2b` (micro), `gemma4:e4b` (small), `gemma4:31b` (standard) to MODEL_STRATEGY_MAP
- Promote `small` strategy to `full` schema: models like gemma4:e4b, mistral:7b, llama3:8b now use the rich analysis schema (patterns, severity, before/after examples) instead of the simplified individual schema
- Fix `installCommand` bug in model-suggester (was pointing to `llama3.2` instead of default model)

🎯 Gold Standard Benchmark
✅ Semantic Validation
🔄 Temperature Fallback Retry
📝 Enhanced SYSTEM_PROMPT_MINIMAL
Test plan
- `detectBatchStrategy('gemma4:e4b')` → `'small'`
- `small` strategy uses the `full` schema (not `individual`)

🤖 Generated with Claude Code