Credit
Idea from @richardchen874-sys in Discussion #77 — Kompl v0.2.0. Maintainer will look at prioritizing in an upcoming sprint; open for anyone to pick up — comment on the issue if you're starting work so we don't duplicate effort.
Problem
Kompl v0.2.0 added a second compile/chat backend (DeepSeek V4 Pro alongside Gemini 2.5) and a per-session model lock (compile_progress.compile_model is stamped at finalize; getEffectiveCompileModel() is used for all session LLM calls). Settings still expose one global default: pick a model, run the whole pipeline on it.
That works as a workaround (e.g. switch to DeepSeek when Gemini truncates on dense PDFs, #7), but it's the wrong abstraction. The useful question isn't "which model is smarter." It's which backend reliably gets through extract → draft → match → crossref → commit without truncating, burning retries, or silently dropping sections.
Different providers may be better at different stages (dense extraction vs drafting vs triage). Users have no data to decide.
Proposed direction (not a spec)
Track provider behavior per pipeline stage and surface it in a dashboard, so users can make a data-informed provider choice instead of guessing from Settings.
Metrics to track (from the discussion):
| Metric | Why it matters |
| --- | --- |
| Truncation rate | Gemini structured-output pathology; json-repair salvage in nlp-service logs |
| Retry rate | Hidden cost + latency |
| Cost per successfully compiled source | Normalizes spend against actual output, not just raw token burn |
| Latency per pipeline step | compile_progress already tracks per-step status; duration is partial today |
| Structured-output validity | Parse/salvage failures vs clean completes |
| Failure recovery after partial extract | Retry paths vs silent degradation |
| Stage-specific provider fit | e.g. one provider for extract, another for draft |
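To make the first three metrics concrete, here is a minimal sketch of how they could be computed from per-call records. Nothing here exists in Kompl today: `CallRecord`, its fields, and `summarize` are hypothetical names, and the set of successfully compiled sources is passed in explicitly because how "compiled" gets decided is still open.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One LLM call. Every field name here is hypothetical."""
    provider: str
    stage: str        # extract, draft, match, crossref, commit
    source_id: str
    attempt: int      # 1 = first try, 2+ = retry
    truncated: bool   # response was cut off, salvaged or not
    cost_usd: float


def summarize(records, compiled_sources):
    """Roll a list of CallRecord up into three of the proposed metrics.

    compiled_sources is the set of source ids that finished the pipeline.
    """
    calls = len(records)
    total_cost = sum(r.cost_usd for r in records)
    return {
        # share of calls whose response was truncated
        "truncation_rate": sum(r.truncated for r in records) / calls if calls else 0.0,
        # share of calls that were a second or later attempt
        "retry_rate": sum(r.attempt > 1 for r in records) / calls if calls else 0.0,
        # spend normalized by sources that actually compiled; None if none did
        "cost_per_compiled_source": (
            total_cost / len(compiled_sources) if compiled_sources else None
        ),
    }
```

Grouping the records by (provider, stage) before calling `summarize` would give the per-stage view the dashboard needs.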
Outcome: a dashboard that complements the existing daily spend cap in Settings. Today the default cap is $5/day, calibrated from heavy test usage while debugging the Gemini repetition/truncation bug (#7). With DeepSeek V4 Pro's discounted pricing becoming permanent, that cap is arguably overkill for most normal usage. Per-stage cost and reliability data would let users tune both provider and budget from evidence instead of a conservative default.
Automatic per-stage routing is out of scope for v1 of this issue. Start with observability and recommendations.
What exists today (starting points)
- Session model lock: app/src/lib/db.ts (getSessionCompileModel, getEffectiveCompileModel)
- Per-step progress: compile_progress steps + UI X/Y (v0.2.0)
- Provider routing: getProviderForModel, nlp-service/services/llm_client.py, provider modules under nlp-service/services/providers/
- Daily spend cap: nlp-service/routers/llm.py (GET /llm/usage), llm-config.json, Settings UI
- Truncation/salvage signals in nlp-service logs (salvaged truncated response via json-repair)
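As a rough illustration of the "derive from logs" option, the salvage message above can be counted with plain substring matching. This sketch assumes nothing about the rest of the log line format; `count_salvages` is a hypothetical helper, and the sample lines in the usage note are made up.

```python
# The only part taken from the real logs is this message text.
SALVAGE_MARKER = "salvaged truncated response via json-repair"


def count_salvages(lines):
    """Count log lines that contain the json-repair salvage message."""
    return sum(1 for line in lines if SALVAGE_MARKER in line)
```

For example, `count_salvages(["extract ok", "warn: salvaged truncated response via json-repair"])` counts one salvage. A count like this cannot attribute the salvage to a provider or stage unless the log line also carries that context, which is one argument for persisting a salvage flag directly.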
Open questions (for whoever picks this up)
- Storage: extend SQLite (activity / new table) vs derive from logs vs hybrid?
- Granularity: per-session rollup vs per-source vs per-LLM-call-site (12+ sites in llm_client.py)?
- Dashboard placement: Settings panel, dedicated page, or compile-progress adjunct?
- Selection UX: recommendations only in v1, or per-stage defaults in Settings (larger scope)?
- Mixed sessions: stages within one session may favor different providers. Keep the per-session lock as is, or revisit lock semantics?
- Cap interaction: should the dashboard suggest a daily cap based on observed spend, or stay read-only?
Suggested first slice (optional)
Smallest useful increment: persist per-stage success/failure + latency + provider + salvage flag for one compile session, expose via API, render a minimal table in Settings (provider × stage). Defer automatic routing and cap auto-tuning.
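One possible shape for that slice is sketched below with Python's built-in sqlite3 module, purely for illustration; the real persistence would presumably live in app/src/lib/db.ts. The `stage_metrics` table, its columns, and both helper functions are hypothetical, not an existing schema.

```python
import sqlite3

# Hypothetical table: one row per pipeline stage run in a compile session.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stage_metrics (
    id          INTEGER PRIMARY KEY,
    session_id  TEXT    NOT NULL,
    stage       TEXT    NOT NULL,   -- extract, draft, match, crossref, commit
    provider    TEXT    NOT NULL,
    success     INTEGER NOT NULL,   -- 0 or 1
    salvaged    INTEGER NOT NULL,   -- 1 if json-repair salvage was needed
    latency_ms  INTEGER NOT NULL
);
"""


def record_stage(conn, session_id, stage, provider, success, salvaged, latency_ms):
    """Persist the outcome of one stage run."""
    conn.execute(
        "INSERT INTO stage_metrics "
        "(session_id, stage, provider, success, salvaged, latency_ms) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (session_id, stage, provider, int(success), int(salvaged), latency_ms),
    )


def provider_stage_table(conn, session_id):
    """Return provider x stage rows for one session, ready to render as a table."""
    rows = conn.execute(
        "SELECT provider, stage, COUNT(*), SUM(success), SUM(salvaged), AVG(latency_ms) "
        "FROM stage_metrics WHERE session_id = ? "
        "GROUP BY provider, stage ORDER BY provider, stage",
        (session_id,),
    ).fetchall()
    return [
        {"provider": p, "stage": s, "runs": n, "successes": ok,
         "salvaged": sv, "avg_latency_ms": avg}
        for p, s, n, ok, sv, avg in rows
    ]
```

An API route could return the output of `provider_stage_table` as JSON, and the Settings table would render one row per provider and stage pair.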
Files likely in scope
- app/src/lib/db.ts: schema / helpers for metrics persistence
- app/src/lib/compile/: step runners (extract, draft, match, crossref, commit)
- app/src/app/api/compile/*: HTTP shims that POST to nlp-service
- nlp-service/services/llm_client.py: central LLM call sites, usage metadata, salvage paths
- app/src/app/settings/page.tsx: dashboard UI + daily cap section (complementary placement)
Related