
Retry Pocket API failures and persist pending fetches#2

Merged
yoaquim merged 2 commits into yoaquim:main from jedibrillo:fix/pocket-sync-reliability on Apr 26, 2026

@jedibrillo
Collaborator

Summary

  • Sync silently dropped recordings whose detail fetch returned 429: the catch in main() continued past the error, and set_last_sync() then advanced the watermark, so those recordings disappeared from subsequent list_recordings(start_date=…) results forever. In a recent local run, 45 fetches 429'd and were not recoverable without manual watermark surgery.
  • api_get/api_post now retry on 429 and 5xx with exponential backoff, honoring Retry-After when present (5 attempts, base 2s, cap 60s, jittered).
  • Failed recording IDs are persisted to .seam/.pending-fetch. The next run re-attempts them regardless of the sync watermark, then clears the entry on success. The watermark still advances on partial failure — pending-fetch is the safety net.
  • Adds scripts/seed-people.py to seed .seam/people.json from existing analyses' speaker_map entries (filters generic role labels like "Speaker 01", "Manager", "Patient").
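
A minimal sketch of the retry policy described above follows. It assumes a hypothetical `send()` callable that returns `(status, headers, body)`. The names `request_with_retry` and `RetryExhausted` and the full-jitter formula are illustrative choices, not the PR's actual `api_get`/`api_post` code. It uses the stated numbers (5 attempts, 2s base, 60s cap) and honors a numeric `Retry-After`. Capping that header value at 60s as well is an assumption.

```python
# Hypothetical sketch; not the PR's actual api_get/api_post implementation.
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status, headers, body) as returned by a hypothetical transport callable.
Response = Tuple[int, Mapping[str, str], bytes]

MAX_ATTEMPTS = 5   # total attempts, per the PR summary
BASE_DELAY = 2.0   # seconds
MAX_DELAY = 60.0   # seconds


class RetryExhausted(Exception):
    """Raised when every attempt returned a retryable status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"gave up after {MAX_ATTEMPTS} attempts (last status {status})")
        self.status = status


def is_retryable(status: int) -> bool:
    # 429 and any 5xx are retried; other 4xx fail fast.
    return status == 429 or 500 <= status < 600


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    # Case-insensitive lookup. Only the delta-seconds form is parsed here,
    # so an HTTP-date value falls back to exponential backoff.
    for key, value in headers.items():
        if key.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return None
    return None


def request_with_retry(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float, float], float] = random.uniform,
) -> Response:
    status = 0
    for attempt in range(MAX_ATTEMPTS):
        status, headers, body = send()
        if not is_retryable(status):
            return status, headers, body  # success, or a 4xx that retrying won't fix
        if attempt == MAX_ATTEMPTS - 1:
            break
        delay = retry_after_seconds(headers)
        if delay is None:
            # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
            delay = jitter(0.0, min(MAX_DELAY, BASE_DELAY * 2 ** attempt))
        # Assumption: Retry-After is also capped at MAX_DELAY.
        sleep(min(delay, MAX_DELAY))
    raise RetryExhausted(status)
```

Injecting `sleep` and `jitter` lets tests exercise the 429-then-success, no-retry-on-4xx, and exhaustion cases without real waits.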

Test plan

  • npm test — 35 pytest, 27 vitest, all green
  • New tests: TestRequestWithRetry covers 429-then-success, 5xx, no-retry on 4xx, exhaustion, Retry-After honored
  • New tests: TestPendingFetch covers round-trip, missing file, empty-write deletes, blank-line tolerance
  • Run a real sync against an account whose recordings fell outside the watermark window after a prior 429 to confirm pending-fetch reattempts them
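
Two small helpers can illustrate the pending-fetch behaviors listed in the test plan: round-trip, missing file, empty write deletes, and blank-line tolerance. The names `read_pending` and `write_pending` are hypothetical. The order-preserving de-duplication and the temp-file-then-rename write are assumptions added for robustness, not confirmed details of the PR.

```python
# Hypothetical helpers for a one-ID-per-line pending file such as .seam/.pending-fetch.
from pathlib import Path
from typing import Iterable


def read_pending(path: Path) -> list[str]:
    """Return pending recording IDs; a missing file means nothing is pending."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    # Blank or whitespace-only lines are ignored.
    return [line.strip() for line in lines if line.strip()]


def write_pending(path: Path, ids: Iterable[str]) -> None:
    """Persist IDs one per line; writing an empty collection deletes the file."""
    # De-duplicate while keeping first-seen order (an assumption of this sketch).
    unique = list(dict.fromkeys(i.strip() for i in ids if i.strip()))
    if not unique:
        path.unlink(missing_ok=True)
        return
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a sibling temp file, then rename over the target so a crash
    # mid-write cannot leave a truncated list behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text("\n".join(unique) + "\n", encoding="utf-8")
    tmp.replace(path)
```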

🤖 Generated with Claude Code

Sync silently dropped recordings whose detail fetch returned 429: the
catch in main() continued past the error, then set_last_sync() advanced
the watermark, so subsequent runs no longer listed those recordings.

- api_get/api_post retry on 429 and 5xx with exponential backoff,
  honoring the Retry-After header when present (5 attempts, base 2s,
  cap 60s).
- Failed recording IDs are persisted to .seam/.pending-fetch and
  re-attempted on the next run regardless of the sync watermark.
- Adds seed-people.py to seed .seam/people.json from existing
  analyses' speaker_maps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner

yoaquim left a comment

Retry + pending-fetch logic is solid — the watermark-advancing-past-failures bug is real data loss and the two-layer fix (retry with backoff first, persist failures to .pending-fetch second) is the right approach. Tests are thorough.
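
A minimal, hypothetical sync loop can illustrate the two-layer ordering. Pending IDs are fetched first, regardless of the watermark, and any ID whose fetch still fails is returned rather than skipped. `sync`, `FetchError`, and the injected `fetch`/`store` callables are assumptions. In this sketch the caller would persist the returned IDs to `.seam/.pending-fetch` and then advance the watermark as before.

```python
# Hypothetical sketch of the fetch loop; not the project's actual main().
from typing import Callable, Iterable


class FetchError(Exception):
    """Hypothetical: a detail fetch that failed even after retries."""


def sync(
    listed_ids: Iterable[str],
    pending_ids: Iterable[str],
    fetch: Callable[[str], dict],
    store: Callable[[str, dict], None],
) -> list[str]:
    """Fetch pending and newly listed recordings; return the IDs that still failed."""
    # Pending IDs come first and are attempted even though they fall outside
    # the watermark window; duplicates are fetched once.
    to_fetch = list(dict.fromkeys([*pending_ids, *listed_ids]))
    failed: list[str] = []
    for rec_id in to_fetch:
        try:
            store(rec_id, fetch(rec_id))
        except FetchError:
            failed.append(rec_id)  # kept for the next run instead of being dropped
    return failed
```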

seed-people.py is useful as a bootstrap tool. One change requested on GENERIC_TOKENS — see inline comment.
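
The seeding step can be approximated as follows. This assumes each analysis is a dict whose `speaker_map` maps diarization labels to names. The label set, the numbered-speaker regex, and `seed_people` are hypothetical, and the real `GENERIC_LABELS` in seed-people.py may differ.

```python
# Hypothetical approximation of seed-people.py's filtering; not the script itself.
import re
from typing import AbstractSet, Iterable, Mapping

# Hypothetical label set for illustration only.
GENERIC_LABELS = frozenset({"speaker", "unknown", "manager", "patient", "interviewer"})
# Matches "speaker 01", "speaker_2", "speaker-10" after lowercasing.
NUMBERED_SPEAKER = re.compile(r"speaker[\s_-]*\d+")


def is_generic(name: str, labels: AbstractSet[str] = GENERIC_LABELS) -> bool:
    key = name.strip().lower()
    return not key or key in labels or NUMBERED_SPEAKER.fullmatch(key) is not None


def seed_people(analyses: Iterable[Mapping]) -> list[str]:
    """Collect distinct, non-generic names from each analysis's speaker_map."""
    seen: dict[str, str] = {}  # lowercased name -> first spelling seen
    for analysis in analyses:
        for name in analysis.get("speaker_map", {}).values():
            if isinstance(name, str) and not is_generic(name):
                seen.setdefault(name.strip().lower(), name.strip())
    return sorted(seen.values(), key=str.lower)
```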

Note: a follow-up PR is coming that introduces a people staging flow — instead of auto-adding inferred speakers directly to people.json, they'll land in people-pending.json where the user can confirm, merge into existing people (as aliases), or dismiss. The dismiss action will build the user's personal exclusion list over time, which reduces the need for a large hardcoded blocklist. seed-people.py can stay as-is for now since it's a manual script — the staging PR will adapt it.
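
The proposed staging flow can be pictured as a small state machine. This is a hypothetical sketch of a design that had not been merged at this point. `PeopleState`, `stage`, and `resolve` are invented names, and in-memory fields stand in for people.json, people-pending.json, and the personal exclusion list.

```python
# Hypothetical sketch of the proposed confirm / merge / dismiss staging flow.
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class PeopleState:
    # name -> aliases; stands in for people.json (hypothetical shape)
    people: dict[str, list[str]] = field(default_factory=dict)
    # stands in for people-pending.json
    pending: list[str] = field(default_factory=list)
    # the user's personal exclusion list, grown by "dismiss"
    excluded: set[str] = field(default_factory=set)

    def known(self, name: str) -> bool:
        key = name.lower()
        return (
            key in self.excluded
            or any(key == p.lower() or key in (a.lower() for a in aliases)
                   for p, aliases in self.people.items())
            or key in (n.lower() for n in self.pending)
        )


def stage(state: PeopleState, inferred: Iterable[str]) -> None:
    """Queue newly inferred speakers for review instead of adding them directly."""
    for name in inferred:
        if not state.known(name):
            state.pending.append(name)


def resolve(state: PeopleState, name: str, action: str, target: Optional[str] = None) -> None:
    """Apply the user's decision to one pending speaker."""
    if name not in state.pending:
        raise KeyError(name)
    if action == "confirm":
        state.people.setdefault(name, [])
    elif action == "merge":
        if target not in state.people:
            raise KeyError(target)
        state.people[target].append(name)  # becomes an alias of an existing person
    elif action == "dismiss":
        state.excluded.add(name.lower())  # never staged again
    else:
        raise ValueError(f"unknown action: {action}")
    state.pending.remove(name)
```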

Comment thread scripts/seed-people.py
@yoaquim
Owner

yoaquim commented Apr 26, 2026

Thanks for committing the suggestion! Two things still need fixing:

  1. Remove the old GENERIC_TOKENS block (lines 22-30) — it's still there alongside the new GENERIC_LABELS. The is_generic() function on line 48 still references GENERIC_TOKENS, not GENERIC_LABELS. Need to delete the old block and update the reference.

  2. Add user-configurable file loading — the trimmed GENERIC_LABELS covers universals, but users need a way to add domain-specific exclusions (doctor, nurse, etc.) without editing code. Add a simple loader:

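```python
# GENERIC_LABELS (a set) and DATA_DIR (a pathlib.Path to .seam/) are assumed
# to be defined at module level in seed-people.py.
def load_generic_labels() -> set[str]:
    """Return the built-in generic labels plus any user-defined ones."""
    labels = GENERIC_LABELS.copy()
    custom_file = DATA_DIR / "generic-speakers.txt"
    if custom_file.exists():
        # One label per line; blank lines and "#" comment lines are skipped.
        for line in custom_file.read_text(encoding="utf-8").splitlines():
            word = line.strip().lower()
            if word and not word.startswith("#"):
                labels.add(word)
    return labels
```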

Then call load_generic_labels() instead of using GENERIC_TOKENS directly in is_generic().

Note: our stage-people.py (merged in PR #3) already implements this pattern — you can reference it.

Owner

yoaquim left a comment

Approved. Will fix the remaining GENERIC_TOKENS → GENERIC_LABELS cleanup in a follow-up commit after merge.

yoaquim merged commit 7f86fd4 into yoaquim:main on Apr 26, 2026
1 check passed
yoaquim added a commit that referenced this pull request Apr 26, 2026
…ELS (#8)

## Summary

Cleanup from PR #2 merge. The suggestion commit added `GENERIC_LABELS`
but didn't remove the old `GENERIC_TOKENS` block or update references.

- Removed broken `GENERIC_TOKENS` block (missing closing brace from
suggestion commit)
- `is_generic()` now receives the label set as a parameter instead of
referencing a global; the base set is `GENERIC_LABELS` (15 universal terms)
- Added `load_generic_labels()` that merges with user-defined
`.seam/generic-speakers.txt`
- `main()` calls `load_generic_labels()` for the full exclusion set

## Test plan

- [x] `npm run lint` — 0 errors
- [x] `npm test` — 89 tests pass
- [x] `python3 scripts/seed-people.py` — runs without errors