Retry Pocket API failures and persist pending fetches #2
Sync silently dropped recordings whose detail fetch returned 429: the catch in main() continued past the error, then set_last_sync() advanced the watermark, so subsequent runs no longer listed those recordings.

- api_get/api_post retry on 429 and 5xx with exponential backoff, honoring the Retry-After header when present (5 attempts, base 2s, cap 60s).
- Failed recording IDs are persisted to .seam/.pending-fetch and re-attempted on the next run regardless of the sync watermark.
- Adds seed-people.py to seed .seam/people.json from existing analyses' speaker_maps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
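For readers who want to see the retry policy in code, here is a minimal sketch of the behaviour described above. It is not the PR's implementation: `ApiError`, `backoff_delay`, `is_retryable` and `request_with_retry` are hypothetical names, the jitter range is an assumption, and capping `Retry-After` at 60s is also an assumption (the commit message does not say whether the cap applies to server-supplied delays).

```python
import random
import time

MAX_ATTEMPTS = 5    # from the commit message
BASE_DELAY = 2.0    # seconds, from the commit message
MAX_DELAY = 60.0    # seconds, from the commit message


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status and Retry-After value."""

    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def is_retryable(status):
    # Retry rate limiting (429) and server errors (5xx); other 4xx fail fast.
    return status == 429 or 500 <= status < 600


def backoff_delay(attempt, retry_after=None):
    """Seconds to wait after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Assumption: the server-supplied delay is capped like our own.
            return max(0.0, min(float(retry_after), MAX_DELAY))
        except ValueError:
            pass  # e.g. HTTP-date form; fall back to exponential backoff
    delay = min(BASE_DELAY * (2 ** attempt), MAX_DELAY)
    # Assumption: jitter picks uniformly from the upper half of the window.
    return random.uniform(delay / 2, delay)


def request_with_retry(send, sleep=time.sleep):
    """Call send() until it succeeds, a non-retryable error occurs,
    or MAX_ATTEMPTS attempts have been made."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return send()
        except ApiError as err:
            is_last = attempt == MAX_ATTEMPTS - 1
            if is_last or not is_retryable(err.status):
                raise
            sleep(backoff_delay(attempt, err.retry_after))
```

Passing `sleep` as a parameter keeps the function testable without real waiting.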
yoaquim
left a comment
Retry + pending-fetch logic is solid — the watermark-advancing-past-failures bug is real data loss and the two-layer fix (retry with backoff first, persist failures to .pending-fetch second) is the right approach. Tests are thorough.
seed-people.py is useful as a bootstrap tool. One change requested on GENERIC_TOKENS — see inline comment.
Note: a follow-up PR is coming that introduces a people staging flow — instead of auto-adding inferred speakers directly to people.json, they'll land in people-pending.json where the user can confirm, merge into existing people (as aliases), or dismiss. The dismiss action will build the user's personal exclusion list over time, which reduces the need for a large hardcoded blocklist. seed-people.py can stay as-is for now since it's a manual script — the staging PR will adapt it.
Thanks for committing the suggestion! Two things still need fixing:
def load_generic_labels() -> set[str]:
    # Start from the built-in labels, then merge any user-defined ones.
    labels = GENERIC_LABELS.copy()
    custom_file = DATA_DIR / "generic-speakers.txt"
    if custom_file.exists():
        for line in custom_file.read_text().splitlines():
            word = line.strip().lower()
            # Skip blank lines and "#" comment lines.
            if word and not word.startswith("#"):
                labels.add(word)
    return labels

Then call … Note: our …
yoaquim
left a comment
Approved. Will fix the remaining GENERIC_TOKENS → GENERIC_LABELS cleanup in a follow-up commit after merge.
…ELS (#8)

## Summary

Cleanup from PR #2 merge. The suggestion commit added `GENERIC_LABELS` but didn't remove the old `GENERIC_TOKENS` block or update references.

- Removed broken `GENERIC_TOKENS` block (missing closing brace from suggestion commit)
- `is_generic()` now uses `GENERIC_LABELS` (15 universal terms) via parameter
- Added `load_generic_labels()` that merges with user-defined `.seam/generic-speakers.txt`
- `main()` calls `load_generic_labels()` for the full exclusion set

## Test plan

- [x] `npm run lint`: 0 errors
- [x] `npm test`: 89 tests pass
- [x] `python3 scripts/seed-people.py`: runs without errors
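To illustrate what "uses `GENERIC_LABELS` via parameter" could look like, here is a rough sketch. The real `is_generic()` body and the 15 labels are not shown in this thread, so `EXAMPLE_LABELS` is an illustrative subset, `collect_names` is a hypothetical helper, and the shape of `speaker_map` (a dict from diarization label to inferred name) is an assumption.

```python
import re

# Illustrative subset only; the actual GENERIC_LABELS contents are not shown here.
EXAMPLE_LABELS = {"speaker", "manager", "patient"}

# Matches a trailing number such as the " 01" in "Speaker 01".
TRAILING_NUMBER = re.compile(r"\s*\d+$")


def is_generic(name: str, labels: set[str]) -> bool:
    """True if `name` is a role label, optionally followed by a number."""
    base = TRAILING_NUMBER.sub("", name.strip().lower())
    return base in labels


def collect_names(speaker_maps, labels):
    """Gather non-generic speaker names from a list of speaker_map dicts.

    Assumption: each speaker_map maps a diarization label to an inferred name.
    """
    names = set()
    for speaker_map in speaker_maps:
        for name in speaker_map.values():
            cleaned = name.strip() if name else ""
            if cleaned and not is_generic(cleaned, labels):
                names.add(cleaned)
    return sorted(names)
```

Taking the label set as a parameter means the caller decides whether to pass only the built-in labels or the merged set that includes `.seam/generic-speakers.txt`.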
## Summary

- Sync silently dropped recordings whose detail fetch returned 429: `main()` continued past the error, and `set_last_sync()` then advanced the watermark, so those recordings disappeared from subsequent `list_recordings(start_date=…)` results forever. In a recent local run, 45 fetches 429'd and were not recoverable without manual watermark surgery.
- `api_get`/`api_post` now retry on 429 and 5xx with exponential backoff, honoring `Retry-After` when present (5 attempts, base 2s, cap 60s, jittered).
- Failed recording IDs are persisted to `.seam/.pending-fetch`. The next run re-attempts them regardless of the sync watermark, then clears the entry on success. The watermark still advances on partial failure; pending-fetch is the safety net.
- Adds `scripts/seed-people.py` to seed `.seam/people.json` from existing analyses' `speaker_map` entries (filters generic role labels like "Speaker 01", "Manager", "Patient").

## Test plan

- `npm test`: 35 pytest, 27 vitest, all green
- `TestRequestWithRetry` covers 429-then-success, 5xx, no-retry on 4xx, exhaustion, `Retry-After` honored
- `TestPendingFetch` covers round-trip, missing file, empty-write deletes, blank-line tolerance

🤖 Generated with Claude Code
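The pending-fetch safety net can be sketched roughly as follows. This is not the PR's code: `load_pending`, `save_pending` and `sync` are hypothetical names, and the one-ID-per-line file format is an assumption. It is meant to mirror the behaviours the tests list (round-trip, missing file, empty write deletes the file, blank lines tolerated).

```python
from pathlib import Path


def load_pending(path: Path) -> list[str]:
    """Read pending recording IDs, one per line; a missing file means none."""
    if not path.exists():
        return []
    ids: list[str] = []
    for line in path.read_text().splitlines():
        rec_id = line.strip()
        if rec_id and rec_id not in ids:  # tolerate blank lines and duplicates
            ids.append(rec_id)
    return ids


def save_pending(path: Path, ids) -> None:
    """Persist pending IDs; writing an empty list removes the file."""
    unique = list(dict.fromkeys(ids))
    if not unique:
        path.unlink(missing_ok=True)
        return
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(unique) + "\n")


def sync(path: Path, new_ids, fetch) -> list[str]:
    """Fetch previously failed IDs first, then new ones.

    Whatever fails is written back to the pending file so the next run
    retries it regardless of where the sync watermark sits.
    """
    failed = []
    for rec_id in dict.fromkeys(load_pending(path) + list(new_ids)):
        try:
            fetch(rec_id)
        except Exception:
            failed.append(rec_id)
    save_pending(path, failed)
    return failed
```

Because failed IDs live outside the watermark, the watermark can keep advancing on partial failure without losing recordings.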