Skip to content

Release v1.21.0#363

Merged
MBombeck merged 68 commits into
mainfrom
release/v1.21.0
Jun 26, 2026
Merged

Release v1.21.0#363
MBombeck merged 68 commits into
mainfrom
release/v1.21.0

Conversation

@MBombeck

Copy link
Copy Markdown
Owner

Feature release. The Coach reaches every data domain on demand, surfaces the cross-metric patterns the analytics tier discovers, opens in context from any screen, and speaks on one shared set of safety thresholds. Dates render in a chosen format everywhere. Two additive migrations (0192 date-format, 0193 rollup x-rescale); no breaking changes.

Highlights

  • Date-format preference (auto / DMY / MDY / ISO) honoured across the app, including every date and date-time field.
  • Coach reach + intelligence: full-domain on-demand retrieval incl. cycle & workouts; a get_correlations tool surfacing the FDR lagged drivers (now incl. medication-adherence ↔ symptoms); illness recovery scores in the Coach; context-scoped launch from metric pages and insight cards; one-snapshot-per-turn + a third reasoning round.
  • Voice: warmer, forward-looking outlooks, confidence-checked single-step actions, an acute red-flag clause; deterministic fallbacks now grounded.
  • Single safety source: critical BP / fever / glucose thresholds from one module across hero, Coach, cards, notifications.
  • Learn: "Learn more" pointers on vitals/glucose/resilience/lab surfaces; the Coach can no longer surface a made-up link.
  • Correctness/perf: rollup x-rescale un-quarantines the regression parity test; provider-aware AI budget (ChatGPT/OpenAI users no longer hit a spurious daily limit); Withings/med-intake N+1 fixes; calendar-adjacency + tz fixes in the illness/coincident logic.

Full detail in CHANGELOG.md. 11452 unit tests green; holistic wiring/cohesion/correctness verification passed.

MBombeck added 30 commits June 26, 2026 12:05
Mirror the time-format preference with a per-user date display
choice. AUTO follows the locale convention, DMY/MDY/YMD pin the
field order. Additive, rerun-safe migration; existing rows read AUTO.
Add a date-order resolver + formatter mirroring the hour-cycle path:
AUTO defers to the active locale, DMY/MDY/YMD pin the field order
through a canonical Intl locale. Thread it through makeFormatters and
expose useDateFormatPreference from the i18n context, backed by the
same localStorage mirror as the time-format preference.
Accept dateFormat on the profile PATCH field-by-field, echo it from
the profile + /me reads, mirror it into the client auth user, and add
the field to the OpenAPI request/response schemas.
Surface a Datumsformat select below Stundenformat with the four
preferences, in all six locales.
Dependency-free date input that displays the value in the user's
date-format preference while editing rides the native picker behind a
calendar button. The committed value stays ISO yyyy-MM-dd, so it is a
drop-in for DateInput. Height/target-size parity, disabled/min/max/
placeholder, and progressive typed entry parsed against the order.
The shared contracts cover chronic deferral (dose, diagnosis, drug-level
to a clinician) but had no acute branch. Add a closed-list crisis clause —
chest pain, syncope, sudden severe symptoms, hypertensive-crisis readings,
suicidal ideation — that points to prompt or emergency care without
diagnosing, and keep every other turn calm and non-alarmist. It rides the
single shared-contract source so it lands on the Coach, the per-metric
status cards, the comprehensive briefing, and the period narrative; the
coverage test asserts it on all four surfaces.
Sharpen the Coach's behaviour-change reflexes from the coaching literature,
prompt-only:

- Connect the signals on a why/pattern question — consult correlations and
  link them descriptively ("after short-sleep nights your next-morning HRV
  tends to read lower") instead of reading metrics in isolation, always an
  association worth a small experiment, never a cause.
- Confidence ruler on action turns — after naming one small step, ask how
  doable it feels 0-10 and shrink it when that lands low, keeping the choice
  the user's.
- Three-beat shape on data-review turns — finding, likely driver, one small
  step; name the dominant contributor when citing a derived band.
- Extend the MI micro-moves with developing discrepancy and rolling with
  resistance, and add an anti-persuasion check: any suggested change must
  serve the user's own stated goal.

One bilingual tone-calibration example added per locale.
Swap the medication scheduling, inventory, illness, vorsorge, lab-OCR
and profile-birthdate date inputs from the native field to the
DateField primitive, which paints the chosen DMY/MDY/YMD/AUTO order
over an ISO yyyy-MM-dd value. The committed value contract, min/max
bounds and aria wiring are unchanged.
…ion wizard

The course-window date field now keeps its ISO value on a hidden native
input (data-slot) and edits through a text overlay (data-testid). Fill
the overlay, then assert the committed ISO value on the hidden input.
…n numbers

The no-key and timeout fallback text carried zero reference to the user's
own readings and, on a provider timeout, rendered as hasProvider:true —
indistinguishable from a fresh assessment. Compose each fallback from the
per-metric signal the card already builds: name the current value, place it
against the user's own baseline, and close with one plain-language pointer,
in the same warm voice as the model surfaces. Degrade to the prior generic
tip only when a metric has no usable history. The timeout envelope now
reports hasProvider:false so the UI surfaces it as the computed summary it
is rather than mislabelling it as provider prose.
…nded step

The per-score deterministic text named the score, its standing, and the
weakest contributor, then stopped at the diagnosis. When the band is not
green and the weakest driver is behaviourally addressable (sleep, mood,
consistency, timing), append a single grounded pointer drawn from that same
contributor. Physiology-only drivers add nothing, so the text affirms and
watches rather than manufacturing a step.
The retrospective narrative was the one model surface that omitted the
shared tone contract, so it read colder than the daily briefing beside it.
Compose the tone fragment into both prompts and warm the hand-written tone
line to name a genuine win when the period earns it, while keeping the
descriptive-never-causal and no-alarm guards intact. The cross-surface
coverage test now asserts the contract reaches this surface too.
The retrieval-tool DATA INVENTORY was built against the user's default
narration cluster (cardio/body/mood/medication), so sleep, glucose,
activity, body composition, vascular, SpO2, gait and environment series
reported absent even when the data existed — and the grounding rule then
told the model not to fetch them. Probe presence against the full source
set instead, and generate the metric-series inventory rows from the
complete source-to-section map so every series the user has rows for is
advertised. Per-tool reads still re-scope to their own domain, so the
wider probe never widens a figure read.

Add three retrieval tools whose snapshot blocks already existed but were
unreachable in tool mode:
  - get_workouts: recent sessions + per-sport rollup
  - get_cycle: phase / prediction / correlation (gated by cycle access)
  - get_correlations: the FDR-controlled lagged cross-metric drivers plus
    the coincident-deviation flag, surfaced descriptively

Also honour the caller's window in get_labs and get_illness_recovery so a
cross-metric answer no longer silently mixes horizons.
The Daily Briefing strips any number absent from its server-computed
figures; the Coach's tool path had no equivalent, so a transcription or
paraphrase drift (a tool returns systolic 128, the reply says ~138) could
ship. Collect the numeric leaves from every present tool result this turn,
extract the numbers the reply asserts, and soft-correct any that match no
fetched figure to a neutral placeholder — annotated, non-blocking, and a
no-op on a qualitative turn or the no-tools path where there is nothing to
grade against. The bounded retrieval loop now returns the present results'
payloads so the check has an authoritative figure set.
The multi-vital coincident-deviation flag (two or more vitals outside
their usual band on the same day, with the illness-explained reframe)
already reached the period narrative but never the Coach. Attach it,
fired-only, to the derived block so both Coach paths can narrate it: the
tool path via the recovery/correlations tools and the no-tools snapshot
floor. A quiet day adds no entry, keeping the snapshot noise-free.
Make the launch context's metric scope live instead of discarding it.
A conversation opened from a metric surface now narrows its snapshot to
that metric with a data-aware seed question, and the global FAB inherits
the page's ambient scope so drilling into a metric and tapping it no
longer opens a blank chat. The scope threads through the drawer into the
first turn of a fresh conversation.
Add a discreet, on-brand action to the high-value cards (briefing,
recommendation, correlation, status/assessment, health score, period
narrative) that opens the Coach pre-scoped to the card topic with a
seeded question. The action self-gates on the operator flag and the
per-user opt-out, never tints the card, and surfaces once per card.
# Conflicts:
#	src/lib/ai/prompts/__tests__/shared-contracts-coverage.test.ts
#	src/lib/insights/narrative/period-narrative-generate.ts
The budget-exceeded copy named "00:00 UTC", which reads as a wrong local
clock for any non-UTC user (Berlin midnight UTC is 01:00/02:00 local) and
drove confused "the limit resets at the wrong time" reports. Reword
dailyLimitBody across all six locales to "midnight UTC", matching the
existing errorBudget wording, so the figure is honest regardless of the
reader's timezone.
The daily AI-token gate was a flat 25,000/day for every provider. That cap
exists to bound the OPERATOR's API bill, but a ChatGPT-OAuth (Codex) or BYOK
chain egresses on the USER's own plan/key and costs the operator nothing, so
gating it on the operator-cost ceiling is wrong. A single gpt-5.x reasoning
turn legitimately reports 20k-40k gross total_tokens (re-sent system prompt +
inventory + tool defs per round + hidden reasoning, summed across rounds), so
one or two turns exhausted the flat cap and locked the user out of a plan they
pay for.

Classify the chain's cost owner by its primary provider via resolveDailyCap:
an admin-openai primary (operator pays) keeps the operator-cost cap; every
user-egress primary (codex/openai/anthropic/local) gets a generous
abuse-only ceiling. Threaded into the Coach chat gate and both OCR-extract
modes.

Raise the operator-cost cap 25,000 -> 200,000 so the operator-key path also
survives a normal day of reasoning turns, keeping gross-token accounting
rather than reworking the reconcile math.

Subtract cached input tokens at reconcile: the Responses-API gross total still
includes prompt-cached input the user did not re-pay for. The codex client
already parsed cached_tokens; sum it through the tool loop and bill
total_tokens minus cached.
The hypertensive-crisis floors were defined in three modules with a
divergent diastolic value: the notification engine and the Coach acute
clause used 180/120, but the dashboard hero used 180/110. A reading like
170/112 lit the red critical-BP banner on the hero yet never tripped the
notification alarm or matched the Coach's stated acute number — one
surface said crisis, the other two stayed calm on the same row.

Promote the absolute floors into a dependency-free clinical-floors leaf
and have the hero, the safety-floor engine, the status registry's fever
band, and the illness escalation all import from it. The crisis diastolic
floor is the guideline-correct 120 (ACC/AHA hypertensive urgency); the
wider 110 hero net is dropped so the surfaces tell one story. A coverage
test pins that every consumer resolves the same constants.
The coincident-deviation read grouped the latest day with a UTC date
slice while the sibling readiness read already used the tz-aware day key.
For a user east or west of UTC, a late-evening or early-morning reading
landed on the wrong calendar day, so a fired flag could compare a vital
from the wrong day against its band and narrate '2 or more vitals out of
band today' on the wrong day.

Thread the user's timezone into the latest-day read and mint the day key
with userDayKey, matching readiness. The Coach derived-snapshot and the
correlations reader pass the real account timezone through. Adds a
regression covering a UTC+2 user whose late-night reading rolls to the
next local day.
…dence

Significance alone surfaced trivial drivers: a large-n pair with r=0.16
narrated as a confident "tends to go with" link while explaining ~2.5%
of variance. Add three gates before a pair is ranked as a driver:

- exclude same-metric-family lagged pairs (mood->mood, BP component
  self-lag) as serial-autocorrelation tautologies, widening the existing
  exact self-pair skip to the whole family.
- shrink each Pearson estimate toward null by n/(n+10) so a thin-data
  correlation cannot out-rank a deep one on an inflated point estimate.
- floor the shrunk effect at 0.2 (drop below) and bind 0.2-0.3 to a
  hedged "faint hint" phrasing tier, 0.3+ to confident phrasing scaled
  by sample depth. The reported r and p stay the honest raw statistics.

Rank by shrunk effect magnitude, q as tie-break.
MBombeck added 28 commits June 26, 2026 13:21
The slot-dedup pass ran a findFirst (latest takenAt) plus a findMany
(all live rows) per medication — 2N round-trips, 40 for a 20-med user.
Pull every live intake row for the user's medication set in one ordered
findMany, group in memory by medicationId, and derive each medication's
lastIntakeAt (max non-null takenAt) from the same grouped slice the
findFirst used to scan. The read cost is now one query regardless of the
medication count.

Behaviour is unchanged: the grouped rows arrive in the same
scheduledFor-asc order the heal and snap passes depend on, and the
derived lastIntakeAt matches the old desc-ordered findFirst.
Both lastStableReturn and computeSymptomReturn counted the in-band settle
run by array-index adjacency instead of calendar days, so three sparse logs
spread over weeks could register a stable return and move the headline
recovery gap. Extend the run only while the prior point is in-band and the
calendar day immediately before, matching the runFlag rule, bounding the
run's span to its logged-day count.
The SpO2 sustained-low scan passed episodeDays straight to runFlag, relying
on the reader to deliver them in day order, while the fever path already
sorts its unioned series. The run scan is calendar-consecutive, so feed it
chronological input by construction: sort the filtered SpO2 points before
the scan, matching the fever path and removing the latent ordering coupling.
Promote the static /learn guide catalog into a typed concept-to-slug
lookup the deterministic UI surfaces can consume. learnUrl is the only
sanctioned /learn URL builder and learnLinkForMetric fails closed for
an unmapped id. A test asserts every mapped slug resolves in the
catalog, so the mapping can never point at a guide that does not exist.
A small, calm anchor that links a concept out to its public /learn
guide. Fail-closed: the href is registry-backed (a closed-set value,
never user input) and an unmapped concept renders nothing. Plain
anchor, no markdown. Adds a common.learnMore label across all locales.
Add a single discreet LearnMoreLink to the per-vital baseline tiles
(via a new footer slot on the shared tile), the glucose clinical panel,
the resilience tile, and the lab biomarker detail. Each pointer is
registry-resolved and renders nothing when the concept has no mapped
guide, so only surfaces with a real article show one.
…ntract

The acute red-flag clause hardcoded its blood-pressure numbers as prose
literals and named no glucose or fever floor at all, so the Coach could
silently drift from the dashboard hero and the notification engine. Compose
the systolic/diastolic, glucose, and sustained-fever thresholds straight from
clinical-floors.ts and echo the previously-missing glucose and fever lines, so
every acute number the Coach states is bound to the one source of truth.

Add an outlook contract (gentle forecast, what-to-expect, anticipatory if-then
— all conditional, ranged, association-framed) composed beside the tone
contract on the Coach and the comprehensive briefing, so a reply can sharpen
expectations without a false promise. Tighten the tone contract against
over-validation (affirm the effort, not an unsafe choice; stay neutral when
there is nothing to praise) and route why/conflict questions through
get_correlations and its coincident-deviation flag before answering.

Coverage asserts the clause numbers equal the constants and the new contract
reaches both surfaces.
…cumulators

The per-bucket OLS accumulators stored x on the raw epoch-day axis (x ~ 20540),
so sum_xx accumulated past ~1e10 and the square shed ~10 of a double's
significant digits before the value was ever stored. The sub-day x detail was
lost at accumulation time, which no read-side identity can recover; an
ill-conditioned window's composed slope/r2 then drifted from the live REGR_*
probe past the parity gauge.

Rebase x to a fixed origin (epoch-days of 2020-01-01) at write time so the
squared terms stay O(1e7) and exact. Slope, r2, and population sd are invariant
under the affine x-shift, so the live raw-epoch probe still parity-matches the
cross-bucket compose; composeRegression needs no formula change. The per-bucket
slope/r2 columns stay on the raw axis (shift-invariant).

Migration 0193 recomputes sum_x/sum_xy/sum_xx from measurements on the rebased
basis, one set-based update per granularity. Recompute-from-source is idempotent
and rerun-safe; it touches only rows that already carry the accumulators and
leaves pre-migration NULLs for the boot re-fold.

Un-quarantine the DST-boundary parity test: it now matches live REGR_* within
the existing 1e-9 relative bound, with the WEIGHT case still bit-identical.
The tool-mode path dropped the launch scope's sources: the inventory probes the
full source set, so a Coach opened from a metric page saw the whole inventory
and could roam, while only the no-tools path honoured the narrowing. Inject a
one-line FOCUS hint naming the launched domain(s) into the tool-mode user turn
so the model prioritises that metric and fetches its figures first, with every
other domain still reachable if the question leads there. Empty on a generic
open, so the prompt prefix is unchanged on that path.
…three

Each retrieval tool re-scoped the snapshot to a single source, so its cache key
never matched the inventory's full-source build — a four-tool turn paid four
extra full snapshot builds on top of the inventory's. Thread the inventory's
probe scope to every tool so a per-tool read lands the one cache entry the
inventory already primed, collapsing the turn to a single build (a window
override still gets its own correct build). Raise the loop's round cap from two
to three so a sequential cross-metric why-chain isn't starved; the per-round
token budget is unchanged and the absolute ceiling tracks one above so the
final round still forces prose.
The illness-recovery tool and the illness context block carried only labels,
lifecycle, and dates plus the recovery composite — never the recovery-gap, the
metric that drove it, the nadir, the pre-onset deviations, or the red flags the
illness card renders. So a recovery question got the composite, not the gap the
user sees, and a sustained-fever or low-SpO2 escalation was invisible
in-conversation.

Run the existing episode-correlation read-layer (the same engine the card and
the red-flag notifier use) for the most relevant episode — active first, else
the most-recently-resolved — and attach the computed scores to get_illness_recovery,
coverage-gated so a thin signal yields nothing rather than a fabricated number.
Document the illnessScores shape in the Coach prompt so it restates those
numbers verbatim and escalates a red flag rather than reassuring.
The Coach may point at a published /learn guide, but only a slug in the
catalog. That was a prompt instruction with no enforcement — a model could
hallucinate /learn/lower-your-cortisol and ship a dead link. Add a deterministic
post-filter on the assembled reply that keeps a published slug verbatim and
strips any reference whose slug is not in the catalog, tidying the prose around
the removed link. Makes the catalog's impossible-by-construction claim an
enforced guarantee.
The labs block already surfaces value + unit + range + date, and the Coach
prompt instructs it to quote them verbatim, so labs are cited end-to-end — no
gap. Pin that with a tool-level guard asserting value and unit survive
get_labs, so a future change to the labs pass-through can't quietly strip the
number the Coach needs to cite.
…n matrix

Render daily medication-compliance rate and illness symptom-severity as
first-class series in the FDR discovery engine, so the adherence-dip →
symptom-flare link (and compliance → vital drift) can finally surface.

Compliance pools every active medication's dose-history ledger into a per-day
taken/(taken+missed) rate, re-keyed to the user's display timezone. Symptom
severity reads the illness day-log functional impact, zero-filling healthy days
only across real episode spans so a user with no episodes yields an empty
series rather than a constant. Both flow through the existing n >= 20, BH-FDR,
effect-size, and shrinkage gates unchanged; a sparse series degrades to absent.

Each channel forms its own metric family (no self-lag tautology) and keeps the
association-not-causation framing. Add discoveryMeasurementTypes() so every
caller drops the non-measurement channels before a Prisma type filter.
The correlation discovery matrix gained two non-MeasurementType channels
(medication compliance, symptom severity), but the Coach get_correlations
reader still filtered its measurement query by hand and never built the new
channels. The channel keys leaked into the Postgres enum cast and threw for any
user with medication or illness data.

Extract the route's two channel-series fetchers into a shared module
(correlation-channel-series.ts) so both consumers build the channels identically,
then wire the reader to derive its type filter from discoveryMeasurementTypes and
fold the fetched compliance + symptom series into the discovery input. The reader
now humanises the channel keys to "medication adherence" / "symptom severity".
…are picker

The three intake dialogs (log, edit, dose-history add) still used a raw
datetime-local input, so they ignored the user's date-format preference.
Route them through DateTimeField like every other date surface. Also
correct a stale diastolic-floor comment in the dashboard verdict.
…COST_CAP

The operator-cost cap carried two export names for one value; keep the
accurate one (it gates only the operator-key path) and drop the alias.
…n cap

The daily cap is now provider-aware; the seeded chain resolves to a
user-egress provider, so the refusal must be seeded at USER_PLAN_CAP
rather than the smaller operator cap to trip the gate.
DateField and DateTimeField front an sr-only native date input with a
formatted overlay carrying the 44px target, matching the file-upload
pattern the sweep already exempts. Generalise the exemption to any
visually-hidden or zero-size input.
The DateField/DateTimeField overlay input sits inside the wrapper's
border (~42px); the 44px tap target is the wrapper. Measure the closest
date-field wrapper so the sweep reflects the real affordance.
@MBombeck MBombeck merged commit 85a0819 into main Jun 26, 2026
13 checks passed
@MBombeck MBombeck deleted the release/v1.21.0 branch June 26, 2026 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant