Release v1.21.0 by MBombeck · Pull Request #363 · MBombeck/HealthLog

MBombeck · 2026-06-26T12:04:13Z

Feature release. The Coach reaches every data domain on demand, surfaces the cross-metric patterns the analytics tier discovers, opens in context from any screen, and speaks on one shared set of safety thresholds. Dates render in a chosen format everywhere. Two additive migrations (0192 date-format, 0193 rollup x-rescale); no breaking changes.

Highlights

Date-format preference (auto / DMY / MDY / ISO) honoured across the app, including every date and date-time field.
Coach reach + intelligence: full-domain on-demand retrieval incl. cycle & workouts; a get_correlations tool surfacing the FDR lagged drivers (now incl. medication-adherence ↔ symptoms); illness recovery scores in the Coach; context-scoped launch from metric pages and insight cards; one-snapshot-per-turn + a third reasoning round.
Voice: warmer, forward-looking outlooks, confidence-checked single-step actions, an acute red-flag clause; deterministic fallbacks now grounded.
Single safety source: critical BP / fever / glucose thresholds from one module across hero, Coach, cards, notifications.
Learn: "Learn more" pointers on vitals/glucose/resilience/lab surfaces; the Coach can no longer surface a made-up link.
Correctness/perf: rollup x-rescale un-quarantines the regression parity test; provider-aware AI budget (ChatGPT/OpenAI users no longer hit a spurious daily limit); Withings/med-intake N+1 fixes; calendar-adjacency + tz fixes in the illness/coincident logic.

Full detail in CHANGELOG.md. 11452 unit tests green; holistic wiring/cohesion/correctness verification passed.

Mirror the time-format preference with a per-user date display choice. AUTO follows the locale convention, DMY/MDY/YMD pin the field order. Additive, rerun-safe migration; existing rows read AUTO.

Add a date-order resolver + formatter mirroring the hour-cycle path: AUTO defers to the active locale, DMY/MDY/YMD pin the field order through a canonical Intl locale. Thread it through makeFormatters and expose useDateFormatPreference from the i18n context, backed by the same localStorage mirror as the time-format preference.

Accept dateFormat on the profile PATCH field-by-field, echo it from the profile + /me reads, mirror it into the client auth user, and add the field to the OpenAPI request/response schemas.

Surface a Datumsformat select below Stundenformat with the four preferences, in all six locales.

Dependency-free date input that displays the value in the user's date-format preference while editing rides the native picker behind a calendar button. The committed value stays ISO yyyy-MM-dd, so it is a drop-in for DateInput. Height/target-size parity, disabled/min/max/ placeholder, and progressive typed entry parsed against the order.

The shared contracts cover chronic deferral (dose, diagnosis, drug-level to a clinician) but had no acute branch. Add a closed-list crisis clause — chest pain, syncope, sudden severe symptoms, hypertensive-crisis readings, suicidal ideation — that points to prompt or emergency care without diagnosing, and keep every other turn calm and non-alarmist. It rides the single shared-contract source so it lands on the Coach, the per-metric status cards, the comprehensive briefing, and the period narrative; the coverage test asserts it on all four surfaces.

Sharpen the Coach's behaviour-change reflexes from the coaching literature, prompt-only: - Connect the signals on a why/pattern question — consult correlations and link them descriptively ("after short-sleep nights your next-morning HRV tends to read lower") instead of reading metrics in isolation, always an association worth a small experiment, never a cause. - Confidence ruler on action turns — after naming one small step, ask how doable it feels 0-10 and shrink it when that lands low, keeping the choice the user's. - Three-beat shape on data-review turns — finding, likely driver, one small step; name the dominant contributor when citing a derived band. - Extend the MI micro-moves with developing discrepancy and rolling with resistance, and add an anti-persuasion check: any suggested change must serve the user's own stated goal. One bilingual tone-calibration example added per locale.

Swap the medication scheduling, inventory, illness, vorsorge, lab-OCR and profile-birthdate date inputs from the native field to the DateField primitive, which paints the chosen DMY/MDY/YMD/AUTO order over an ISO yyyy-MM-dd value. The committed value contract, min/max bounds and aria wiring are unchanged.

…ion wizard The course-window date field now keeps its ISO value on a hidden native input (data-slot) and edits through a text overlay (data-testid). Fill the overlay, then assert the committed ISO value on the hidden input.

…n numbers The no-key and timeout fallback text carried zero reference to the user's own readings and, on a provider timeout, rendered as hasProvider:true — indistinguishable from a fresh assessment. Compose each fallback from the per-metric signal the card already builds: name the current value, place it against the user's own baseline, and close with one plain-language pointer, in the same warm voice as the model surfaces. Degrade to the prior generic tip only when a metric has no usable history. The timeout envelope now reports hasProvider:false so the UI surfaces it as the computed summary it is rather than mislabelling it as provider prose.

…nded step The per-score deterministic text named the score, its standing, and the weakest contributor, then stopped at the diagnosis. When the band is not green and the weakest driver is behaviourally addressable (sleep, mood, consistency, timing), append a single grounded pointer drawn from that same contributor. Physiology-only drivers add nothing, so the text affirms and watches rather than manufacturing a step.

The retrospective narrative was the one model surface that omitted the shared tone contract, so it read colder than the daily briefing beside it. Compose the tone fragment into both prompts and warm the hand-written tone line to name a genuine win when the period earns it, while keeping the descriptive-never-causal and no-alarm guards intact. The cross-surface coverage test now asserts the contract reaches this surface too.

The retrieval-tool DATA INVENTORY was built against the user's default narration cluster (cardio/body/mood/medication), so sleep, glucose, activity, body composition, vascular, SpO2, gait and environment series reported absent even when the data existed — and the grounding rule then told the model not to fetch them. Probe presence against the full source set instead, and generate the metric-series inventory rows from the complete source-to-section map so every series the user has rows for is advertised. Per-tool reads still re-scope to their own domain, so the wider probe never widens a figure read. Add three retrieval tools whose snapshot blocks already existed but were unreachable in tool mode: - get_workouts: recent sessions + per-sport rollup - get_cycle: phase / prediction / correlation (gated by cycle access) - get_correlations: the FDR-controlled lagged cross-metric drivers plus the coincident-deviation flag, surfaced descriptively Also honour the caller's window in get_labs and get_illness_recovery so a cross-metric answer no longer silently mixes horizons.

The Daily Briefing strips any number absent from its server-computed figures; the Coach's tool path had no equivalent, so a transcription or paraphrase drift (a tool returns systolic 128, the reply says ~138) could ship. Collect the numeric leaves from every present tool result this turn, extract the numbers the reply asserts, and soft-correct any that match no fetched figure to a neutral placeholder — annotated, non-blocking, and a no-op on a qualitative turn or the no-tools path where there is nothing to grade against. The bounded retrieval loop now returns the present results' payloads so the check has an authoritative figure set.

The multi-vital coincident-deviation flag (two or more vitals outside their usual band on the same day, with the illness-explained reframe) already reached the period narrative but never the Coach. Attach it, fired-only, to the derived block so both Coach paths can narrate it: the tool path via the recovery/correlations tools and the no-tools snapshot floor. A quiet day adds no entry, keeping the snapshot noise-free.

Make the launch context's metric scope live instead of discarding it. A conversation opened from a metric surface now narrows its snapshot to that metric with a data-aware seed question, and the global FAB inherits the page's ambient scope so drilling into a metric and tapping it no longer opens a blank chat. The scope threads through the drawer into the first turn of a fresh conversation.

Add a discreet, on-brand action to the high-value cards (briefing, recommendation, correlation, status/assessment, health score, period narrative) that opens the Coach pre-scoped to the card topic with a seeded question. The action self-gates on the operator flag and the per-user opt-out, never tints the card, and surfaces once per card.

# Conflicts: # src/lib/ai/prompts/__tests__/shared-contracts-coverage.test.ts # src/lib/insights/narrative/period-narrative-generate.ts

The budget-exceeded copy named "00:00 UTC", which reads as a wrong local clock for any non-UTC user (Berlin midnight UTC is 01:00/02:00 local) and drove confused "the limit resets at the wrong time" reports. Reword dailyLimitBody across all six locales to "midnight UTC", matching the existing errorBudget wording, so the figure is honest regardless of the reader's timezone.

The daily AI-token gate was a flat 25,000/day for every provider. That cap exists to bound the OPERATOR's API bill, but a ChatGPT-OAuth (Codex) or BYOK chain egresses on the USER's own plan/key and costs the operator nothing, so gating it on the operator-cost ceiling is wrong. A single gpt-5.x reasoning turn legitimately reports 20k-40k gross total_tokens (re-sent system prompt + inventory + tool defs per round + hidden reasoning, summed across rounds), so one or two turns exhausted the flat cap and locked the user out of a plan they pay for. Classify the chain's cost owner by its primary provider via resolveDailyCap: an admin-openai primary (operator pays) keeps the operator-cost cap; every user-egress primary (codex/openai/anthropic/local) gets a generous abuse-only ceiling. Threaded into the Coach chat gate and both OCR-extract modes. Raise the operator-cost cap 25,000 -> 200,000 so the operator-key path also survives a normal day of reasoning turns, keeping gross-token accounting rather than reworking the reconcile math. Subtract cached input tokens at reconcile: the Responses-API gross total still includes prompt-cached input the user did not re-pay for. The codex client already parsed cached_tokens; sum it through the tool loop and bill total_tokens minus cached.

The hypertensive-crisis floors were defined in three modules with a divergent diastolic value: the notification engine and the Coach acute clause used 180/120, but the dashboard hero used 180/110. A reading like 170/112 lit the red critical-BP banner on the hero yet never tripped the notification alarm or matched the Coach's stated acute number — one surface said crisis, the other two stayed calm on the same row. Promote the absolute floors into a dependency-free clinical-floors leaf and have the hero, the safety-floor engine, the status registry's fever band, and the illness escalation all import from it. The crisis diastolic floor is the guideline-correct 120 (ACC/AHA hypertensive urgency); the wider 110 hero net is dropped so the surfaces tell one story. A coverage test pins that every consumer resolves the same constants.

The coincident-deviation read grouped the latest day with a UTC date slice while the sibling readiness read already used the tz-aware day key. For a user east or west of UTC, a late-evening or early-morning reading landed on the wrong calendar day, so a fired flag could compare a vital from the wrong day against its band and narrate '2 or more vitals out of band today' on the wrong day. Thread the user's timezone into the latest-day read and mint the day key with userDayKey, matching readiness. The Coach derived-snapshot and the correlations reader pass the real account timezone through. Adds a regression covering a UTC+2 user whose late-night reading rolls to the next local day.

…dence Significance alone surfaced trivial drivers: a large-n pair with r=0.16 narrated as a confident "tends to go with" link while explaining ~2.5% of variance. Add three gates before a pair is ranked as a driver: - exclude same-metric-family lagged pairs (mood->mood, BP component self-lag) as serial-autocorrelation tautologies, widening the existing exact self-pair skip to the whole family. - shrink each Pearson estimate toward null by n/(n+10) so a thin-data correlation cannot out-rank a deep one on an inflated point estimate. - floor the shrunk effect at 0.2 (drop below) and bind 0.2-0.3 to a hedged "faint hint" phrasing tier, 0.3+ to confident phrasing scaled by sample depth. The reported r and p stay the honest raw statistics. Rank by shrunk effect magnitude, q as tie-break.

The slot-dedup pass ran a findFirst (latest takenAt) plus a findMany (all live rows) per medication — 2N round-trips, 40 for a 20-med user. Pull every live intake row for the user's medication set in one ordered findMany, group in memory by medicationId, and derive each medication's lastIntakeAt (max non-null takenAt) from the same grouped slice the findFirst used to scan. The read cost is now one query regardless of the medication count. Behaviour is unchanged: the grouped rows arrive in the same scheduledFor-asc order the heal and snap passes depend on, and the derived lastIntakeAt matches the old desc-ordered findFirst.

Both lastStableReturn and computeSymptomReturn counted the in-band settle run by array-index adjacency instead of calendar days, so three sparse logs spread over weeks could register a stable return and move the headline recovery gap. Extend the run only while the prior point is in-band and the calendar day immediately before, matching the runFlag rule, bounding the run's span to its logged-day count.

The SpO2 sustained-low scan passed episodeDays straight to runFlag, relying on the reader to deliver them in day order, while the fever path already sorts its unioned series. The run scan is calendar-consecutive, so feed it chronological input by construction: sort the filtered SpO2 points before the scan, matching the fever path and removing the latent ordering coupling.

Promote the static /learn guide catalog into a typed concept-to-slug lookup the deterministic UI surfaces can consume. learnUrl is the only sanctioned /learn URL builder and learnLinkForMetric fails closed for an unmapped id. A test asserts every mapped slug resolves in the catalog, so the mapping can never point at a guide that does not exist.

A small, calm anchor that links a concept out to its public /learn guide. Fail-closed: the href is registry-backed (a closed-set value, never user input) and an unmapped concept renders nothing. Plain anchor, no markdown. Adds a common.learnMore label across all locales.

Add a single discreet LearnMoreLink to the per-vital baseline tiles (via a new footer slot on the shared tile), the glucose clinical panel, the resilience tile, and the lab biomarker detail. Each pointer is registry-resolved and renders nothing when the concept has no mapped guide, so only surfaces with a real article show one.

…ntract The acute red-flag clause hardcoded its blood-pressure numbers as prose literals and named no glucose or fever floor at all, so the Coach could silently drift from the dashboard hero and the notification engine. Compose the systolic/diastolic, glucose, and sustained-fever thresholds straight from clinical-floors.ts and echo the previously-missing glucose and fever lines, so every acute number the Coach states is bound to the one source of truth. Add an outlook contract (gentle forecast, what-to-expect, anticipatory if-then — all conditional, ranged, association-framed) composed beside the tone contract on the Coach and the comprehensive briefing, so a reply can sharpen expectations without a false promise. Tighten the tone contract against over-validation (affirm the effort, not an unsafe choice; stay neutral when there is nothing to praise) and route why/conflict questions through get_correlations and its coincident-deviation flag before answering. Coverage asserts the clause numbers equal the constants and the new contract reaches both surfaces.

…cumulators The per-bucket OLS accumulators stored x on the raw epoch-day axis (x ~ 20540), so sum_xx accumulated past ~1e10 and the square shed ~10 of a double's significant digits before the value was ever stored. The sub-day x detail was lost at accumulation time, which no read-side identity can recover; an ill-conditioned window's composed slope/r2 then drifted from the live REGR_* probe past the parity gauge. Rebase x to a fixed origin (epoch-days of 2020-01-01) at write time so the squared terms stay O(1e7) and exact. Slope, r2, and population sd are invariant under the affine x-shift, so the live raw-epoch probe still parity-matches the cross-bucket compose; composeRegression needs no formula change. The per-bucket slope/r2 columns stay on the raw axis (shift-invariant). Migration 0193 recomputes sum_x/sum_xy/sum_xx from measurements on the rebased basis, one set-based update per granularity. Recompute-from-source is idempotent and rerun-safe; it touches only rows that already carry the accumulators and leaves pre-migration NULLs for the boot re-fold. Un-quarantine the DST-boundary parity test: it now matches live REGR_* within the existing 1e-9 relative bound, with the WEIGHT case still bit-identical.

…test

The tool-mode path dropped the launch scope's sources: the inventory probes the full source set, so a Coach opened from a metric page saw the whole inventory and could roam, while only the no-tools path honoured the narrowing. Inject a one-line FOCUS hint naming the launched domain(s) into the tool-mode user turn so the model prioritises that metric and fetches its figures first, with every other domain still reachable if the question leads there. Empty on a generic open, so the prompt prefix is unchanged on that path.

…three Each retrieval tool re-scoped the snapshot to a single source, so its cache key never matched the inventory's full-source build — a four-tool turn paid four extra full snapshot builds on top of the inventory's. Thread the inventory's probe scope to every tool so a per-tool read lands the one cache entry the inventory already primed, collapsing the turn to a single build (a window override still gets its own correct build). Raise the loop's round cap from two to three so a sequential cross-metric why-chain isn't starved; the per-round token budget is unchanged and the absolute ceiling tracks one above so the final round still forces prose.

The illness-recovery tool and the illness context block carried only labels, lifecycle, and dates plus the recovery composite — never the recovery-gap, the metric that drove it, the nadir, the pre-onset deviations, or the red flags the illness card renders. So a recovery question got the composite, not the gap the user sees, and a sustained-fever or low-SpO2 escalation was invisible in-conversation. Run the existing episode-correlation read-layer (the same engine the card and the red-flag notifier use) for the most relevant episode — active first, else the most-recently-resolved — and attach the computed scores to get_illness_recovery, coverage-gated so a thin signal yields nothing rather than a fabricated number. Document the illnessScores shape in the Coach prompt so it restates those numbers verbatim and escalates a red flag rather than reassuring.

The Coach may point at a published /learn guide, but only a slug in the catalog. That was a prompt instruction with no enforcement — a model could hallucinate /learn/lower-your-cortisol and ship a dead link. Add a deterministic post-filter on the assembled reply that keeps a published slug verbatim and strips any reference whose slug is not in the catalog, tidying the prose around the removed link. Makes the catalog's impossible-by-construction claim an enforced guarantee.

The labs block already surfaces value + unit + range + date, and the Coach prompt instructs it to quote them verbatim, so labs are cited end-to-end — no gap. Pin that with a tool-level guard asserting value and unit survive get_labs, so a future change to the labs pass-through can't quietly strip the number the Coach needs to cite.

…ness/learn/perf

…n matrix Render daily medication-compliance rate and illness symptom-severity as first-class series in the FDR discovery engine, so the adherence-dip → symptom-flare link (and compliance → vital drift) can finally surface. Compliance pools every active medication's dose-history ledger into a per-day taken/(taken+missed) rate, re-keyed to the user's display timezone. Symptom severity reads the illness day-log functional impact, zero-filling healthy days only across real episode spans so a user with no episodes yields an empty series rather than a constant. Both flow through the existing n >= 20, BH-FDR, effect-size, and shrinkage gates unchanged; a sparse series degrades to absent. Each channel forms its own metric family (no self-lag tautology) and keeps the association-not-causation framing. Add discoveryMeasurementTypes() so every caller drops the non-measurement channels before a Prisma type filter.

The correlation discovery matrix gained two non-MeasurementType channels (medication compliance, symptom severity), but the Coach get_correlations reader still filtered its measurement query by hand and never built the new channels. The channel keys leaked into the Postgres enum cast and threw for any user with medication or illness data. Extract the route's two channel-series fetchers into a shared module (correlation-channel-series.ts) so both consumers build the channels identically, then wire the reader to derive its type filter from discoveryMeasurementTypes and fold the fetched compliance + symptom series into the discovery input. The reader now humanises the channel keys to "medication adherence" / "symptom severity".

…correlations reader

…are picker The three intake dialogs (log, edit, dose-history add) still used a raw datetime-local input, so they ignored the user's date-format preference. Route them through DateTimeField like every other date surface. Also correct a stale diastolic-floor comment in the dashboard verdict.

…COST_CAP The operator-cost cap carried two export names for one value; keep the accurate one (it gates only the operator-key path) and drop the alias.

…n cap The daily cap is now provider-aware; the seeded chain resolves to a user-egress provider, so the refusal must be seeded at USER_PLAN_CAP rather than the smaller operator cap to trip the gate.

DateField and DateTimeField front an sr-only native date input with a formatted overlay carrying the 44px target, matching the file-upload pattern the sweep already exempts. Generalise the exemption to any visually-hidden or zero-size input.

The DateField/DateTimeField overlay input sits inside the wrapper's border (~42px); the 44px tap target is the wrapper. Measure the closest date-field wrapper so the sweep reflects the real affordance.

MBombeck added 30 commits June 26, 2026 12:05

feat(profile): add date-format preference column

d8ff189

Mirror the time-format preference with a per-user date display choice. AUTO follows the locale convention, DMY/MDY/YMD pin the field order. Additive, rerun-safe migration; existing rows read AUTO.

feat(profile): persist and expose the date-format preference

86a9ca2

Accept dateFormat on the profile PATCH field-by-field, echo it from the profile + /me reads, mirror it into the client auth user, and add the field to the OpenAPI request/response schemas.

feat(settings): add the date-format profile control

9b7980d

Surface a Datumsformat select below Stundenformat with the four preferences, in all six locales.

merge(DATEFMT-1): date-format preference foundation

8c09970

merge(COACHDATA): v1.21.0

a96ae9b

merge(COACHPROMPT): v1.21.0

f12f349

merge(DF2): v1.21.0

a503160

merge(COACHUI): v1.21.0

37dfce2

merge(FALLBACK): v1.21.0

490c9d2

# Conflicts: # src/lib/ai/prompts/__tests__/shared-contracts-coverage.test.ts # src/lib/insights/narrative/period-narrative-generate.ts

chore(openapi): regenerate for the date-format profile field

8ed1d27

merge(OAUTHFIX): provider-aware AI budget

8471384

MBombeck added 28 commits June 26, 2026 13:21

merge(WS3PERF): A2-M2 boundary align + Withings/med-intake N+1 fixes

7093e46

merge(ILLNESSFIX): calendar-adjacency symptom return + SpO2 local sort

5b22d69

merge(LEARNUI): Learn-links registry, component, and surface wiring

03e1935

merge(DSTMIG): rollup x-rescale, un-quarantine the regression parity …

1344b56

…test

merge(COACHFINAL): coach-core finalization — safety/scope/outlook/ill…

21a5ead

…ness/learn/perf

merge(FDREXTEND): compliance + symptom as FDR correlation channels

1fa02ce

merge(INTEGFIX): wire compliance/symptom FDR channels into the Coach …

1cee07b

…correlations reader

chore(release): v1.21.0

e6ac604

refactor(coach): collapse the duplicate daily-cap export to OPERATOR_…

5f640e1

…COST_CAP The operator-cost cap carried two export names for one value; keep the accurate one (it gates only the operator-key path) and drop the alias.

test(coach): seed the budget-refusal integration test at the user-pla…

e818178

…n cap The daily cap is now provider-aware; the seeded chain resolves to a user-egress provider, so the refusal must be seeded at USER_PLAN_CAP rather than the smaller operator cap to trip the gate.

test(e2e): measure the date-field wrapper for the 44px target sweep

5411bb3

The DateField/DateTimeField overlay input sits inside the wrapper's border (~42px); the 44px tap target is the wrapper. Measure the closest date-field wrapper so the sweep reflects the real affordance.

MBombeck merged commit 85a0819 into main Jun 26, 2026
13 checks passed

MBombeck deleted the release/v1.21.0 branch June 26, 2026 12:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Release v1.21.0#363

Release v1.21.0#363
MBombeck merged 68 commits into
mainfrom
release/v1.21.0

MBombeck commented Jun 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MBombeck commented Jun 26, 2026

Highlights

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant