Summary
Saved Links currently behaves like an attempt log, but users experience it as a list of unresolved URLs.
That mismatch creates confusing duplicates. If the same URL fails multiple times, Kompl can show multiple Saved Links rows for the same source_url. If the URL later succeeds through a new onboarding session, the old unresolved failure row may still remain because stale failure cleanup is session-scoped.
Current observed state
In one local wiki:
- 394 unresolved URL failure rows
- 267 unique unresolved URLs
- 80 duplicate URL groups
- 127 extra duplicate rows removable by keeping one row per exact source_url
- 0 active compile sessions at inspection time
So Saved Links overstates unresolved work by about 48 percent in this case (127 extra rows against 267 unique URLs).
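For anyone who wants to reproduce these numbers on their own wiki, here is a minimal sketch of the counting logic. It assumes the unresolved failure rows have already been loaded into memory. The `FailureRow` shape and the `duplicateStats` helper are hypothetical names for illustration, not existing Kompl code. Only `source_url` comes from the actual schema described in this issue.

```typescript
// Hypothetical row shape: only source_url is taken from the issue text.
interface FailureRow {
  source_url: string;
}

interface DuplicateStats {
  totalRows: number;            // every unresolved failure row
  uniqueUrls: number;           // distinct exact source_url values
  duplicateGroups: number;      // URLs that appear more than once
  extraRows: number;            // rows removable by keeping one per URL
  overstatementPercent: number; // extraRows relative to uniqueUrls
}

function duplicateStats(rows: FailureRow[]): DuplicateStats {
  // Count rows per exact source_url (no canonicalization is applied here).
  const counts = new Map<string, number>();
  for (const row of rows) {
    counts.set(row.source_url, (counts.get(row.source_url) || 0) + 1);
  }

  let duplicateGroups = 0;
  counts.forEach((n) => {
    if (n > 1) duplicateGroups += 1;
  });

  const totalRows = rows.length;
  const uniqueUrls = counts.size;
  const extraRows = totalRows - uniqueUrls;
  const overstatementPercent =
    uniqueUrls === 0 ? 0 : (extraRows / uniqueUrls) * 100;

  return { totalRows, uniqueUrls, duplicateGroups, extraRows, overstatementPercent };
}
```

With the counts reported above, 394 rows and 267 unique URLs give 127 extra rows, and 127 / 267 is roughly 47.6 percent.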
Why this feels wrong as a user
When I open Saved Links, I expect to see links that still need attention.
Instead, I may see repeated entries for the same URL because Kompl stores each failed ingest attempt as a separate unresolved row. That makes it hard to tell:
- how many unique links still need work
- which failures are real versus repeated attempts
- whether retrying actually improved anything
- whether a successfully imported URL is still incorrectly listed as unresolved
Likely cause
Failed URL ingest inserts a new ingest_failures row.
/api/compile/retry-failed cleans stale failures only for URLs in the same failed staging session. That works for same-session retry recovery, but not for a fresh reingest of an old Saved Link.
So this flow can leave stale rows:
- URL fails in session A.
- Saved Links shows the unresolved URL.
- User later reingests the same URL in session B.
- URL succeeds and becomes a source.
- Old session A failure row remains unresolved.
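The flow above can be sketched with a small in-memory model. This is an illustration of the suspected behavior, not the actual cleanup code: the `IngestFailure` shape, the `session_id` field name, and both helper functions are assumptions made for the example.

```typescript
// Hypothetical in-memory model of ingest_failures rows.
// session_id is an assumed column name; source_url is from the issue.
interface IngestFailure {
  id: number;
  source_url: string;
  session_id: string;
}

// Session-scoped cleanup, as described for retry-failed: a failure row is
// removed only if it belongs to the given session AND its URL succeeded.
function cleanupSessionScoped(
  failures: IngestFailure[],
  sessionId: string,
  succeededUrls: string[]
): IngestFailure[] {
  const ok = new Set<string>(succeededUrls);
  return failures.filter(
    (f) => !(f.session_id === sessionId && ok.has(f.source_url))
  );
}

// URL-scoped cleanup: a failure row is removed whenever its URL succeeded,
// regardless of which session recorded the failure.
function cleanupUrlScoped(
  failures: IngestFailure[],
  succeededUrls: string[]
): IngestFailure[] {
  const ok = new Set<string>(succeededUrls);
  return failures.filter((f) => !ok.has(f.source_url));
}
```

If a URL fails in session A and later succeeds in session B, `cleanupSessionScoped(rows, "B", [url])` leaves the session A row in place, while `cleanupUrlScoped(rows, [url])` removes it.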
Expected behavior
Saved Links should represent unresolved URLs, not every failed attempt.
Desired behavior:
- At most one unresolved Saved Links row per canonical/exact URL.
- New failed attempts update or replace the existing unresolved row.
- If a URL imports successfully, any unresolved failure row for that URL is resolved or removed.
- Attempt history can still be preserved in activity logs or metadata.
- The Saved Links page count should match unique unresolved URLs.
Proposed implementation direction
A conservative fix could be:
- On insertIngestFailure, check for an existing unresolved row with the same source_url.
- If one exists, update it with the newest attempt metadata instead of inserting a second unresolved row.
- On successful URL ingest, delete or mark resolved any unresolved ingest_failures rows for that source_url, regardless of session.
- Regenerate the Saved Links page after either update.
- Add a one-time cleanup/migration or admin repair command for existing duplicate rows.
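A rough sketch of how these steps could fit together, using an in-memory map in place of the real table. Everything here is hypothetical: the `UnresolvedFailures` class, its method names, and the `session_id`, `error`, `failed_at`, and `attempts` fields are assumptions for illustration, and regenerating the Saved Links page is left out. Matching is on the exact `source_url` string with no canonicalization.

```typescript
// Hypothetical row shape; only source_url is taken from the issue.
interface FailureRow {
  source_url: string;
  session_id: string;
  error: string;
  failed_at: number; // e.g. epoch milliseconds
  attempts: number;  // preserved attempt count instead of extra rows
}

class UnresolvedFailures {
  // Keyed by exact source_url, so at most one unresolved row per URL.
  private byUrl = new Map<string, FailureRow>();

  // Upsert: a repeated failure replaces the existing row's metadata
  // and bumps the attempt counter instead of adding a second row.
  recordFailure(
    source_url: string,
    session_id: string,
    error: string,
    failed_at: number
  ): FailureRow {
    const existing = this.byUrl.get(source_url);
    const row: FailureRow = {
      source_url,
      session_id,
      error,
      failed_at,
      attempts: existing ? existing.attempts + 1 : 1,
    };
    this.byUrl.set(source_url, row);
    return row;
  }

  // URL-scoped reconciliation on successful ingest, regardless of session.
  // Returns true if an unresolved row was removed.
  resolveUrl(source_url: string): boolean {
    return this.byUrl.delete(source_url);
  }

  unresolved(): FailureRow[] {
    const out: FailureRow[] = [];
    this.byUrl.forEach((row) => out.push(row));
    return out;
  }
}

// One-time cleanup for existing data: keep the newest row per source_url.
function dedupeKeepNewest(rows: FailureRow[]): FailureRow[] {
  const newest = new Map<string, FailureRow>();
  for (const row of rows) {
    const kept = newest.get(row.source_url);
    if (!kept || row.failed_at > kept.failed_at) {
      newest.set(row.source_url, row);
    }
  }
  const out: FailureRow[] = [];
  newest.forEach((row) => out.push(row));
  return out;
}
```

Keeping the newest row is one possible policy for the cleanup. Keeping the oldest row, or merging attempt counts across the duplicates, would also satisfy the one-row-per-URL goal.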
Acceptance criteria
- Repeated failures for the same URL produce one visible Saved Links entry.
- A later successful import removes that URL from Saved Links.
- Retry-failed behavior remains session-scoped where it needs to be, but successful ingest performs URL-scoped reconciliation.
- Tests cover duplicate failure insert, newer failure update, and successful reingest cleanup.