Saved Links should dedupe unresolved failures by URL and clear stale rows after successful reingest #139

@ahjinsolo

Summary

Saved Links currently behaves like an attempt log, but users experience it as a list of unresolved URLs.

That mismatch creates confusing duplicates. If the same URL fails multiple times, Kompl can show multiple Saved Links rows for the same source_url. If the URL later succeeds through a new onboarding session, the old unresolved failure row may still remain because stale failure cleanup is session-scoped.

Current observed state

In one local wiki:

  • 394 unresolved URL failure rows
  • 267 unique unresolved URLs
  • 80 duplicate URL groups
  • 127 extra duplicate rows removable by keeping one row per exact source_url
  • 0 active compile sessions at inspection time

So Saved Links overstates unresolved work by roughly 48 percent in this case: 394 rows for 267 unique URLs, or 127 extra rows.
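For anyone who wants to reproduce these counts on their own wiki, here is a minimal sketch of the arithmetic. It assumes the unresolved failure rows have already been loaded into memory; the `FailureRow` shape and the `summarizeDuplicates` helper are hypothetical, and only `source_url` comes from the actual schema as described in this issue.

```typescript
// Hypothetical row shape: only `source_url` is named in this issue.
interface FailureRow {
  id: number;
  source_url: string;
}

interface DuplicateStats {
  totalRows: number;       // every unresolved failure row
  uniqueUrls: number;      // distinct exact source_url values
  duplicateGroups: number; // URLs that appear more than once
  extraRows: number;       // rows removable by keeping one row per URL
}

// Group rows by exact source_url and count how much is duplicated.
function summarizeDuplicates(rows: FailureRow[]): DuplicateStats {
  const countsByUrl = new Map<string, number>();
  for (const row of rows) {
    countsByUrl.set(row.source_url, (countsByUrl.get(row.source_url) ?? 0) + 1);
  }

  let duplicateGroups = 0;
  countsByUrl.forEach((count) => {
    if (count > 1) duplicateGroups += 1;
  });

  return {
    totalRows: rows.length,
    uniqueUrls: countsByUrl.size,
    duplicateGroups,
    extraRows: rows.length - countsByUrl.size,
  };
}
```

With the numbers above, `extraRows` is 394 - 267 = 127, and the overstatement relative to unique URLs is 127 / 267, roughly 0.476.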

Why this feels wrong as a user

When I open Saved Links, I expect to see links that still need attention.

Instead, I may see repeated entries for the same URL because Kompl stores each failed ingest attempt as a separate unresolved row. That makes it hard to tell:

  • how many unique links still need work
  • which failures are real versus repeated attempts
  • whether retrying actually improved anything
  • whether a successfully imported URL is still incorrectly listed as unresolved

Likely cause

Failed URL ingest inserts a new ingest_failures row.

/api/compile/retry-failed cleans stale failures only for URLs in the same failed staging session. That works for same-session retry recovery, but not for a fresh reingest of an old Saved Link.

So this flow can leave stale rows:

  1. URL fails in session A.
  2. Saved Links shows the unresolved URL.
  3. User later reingests the same URL in session B.
  4. URL succeeds and becomes a source.
  5. Old session A failure row remains unresolved.
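The flow above can be illustrated with a small in-memory model. This is only a sketch of the suspected behavior, not Kompl's actual code: the `session_id` field and both cleanup helpers are hypothetical names used to contrast session-scoped cleanup with URL-scoped cleanup.

```typescript
// Hypothetical row shape; `session_id` is an assumed column name.
interface FailureRow {
  source_url: string;
  session_id: string;
}

// Session-scoped cleanup: drops a failure only if it belongs to the
// same session in which the URL just succeeded.
function cleanupSessionScoped(
  rows: FailureRow[],
  succeededUrl: string,
  sessionId: string,
): FailureRow[] {
  return rows.filter(
    (r) => !(r.source_url === succeededUrl && r.session_id === sessionId),
  );
}

// URL-scoped cleanup: drops every failure for the URL, whatever its session.
function cleanupUrlScoped(rows: FailureRow[], succeededUrl: string): FailureRow[] {
  return rows.filter((r) => r.source_url !== succeededUrl);
}

// Step 1: the URL fails in session A.
const url = "https://example.com/article";
const afterFailure: FailureRow[] = [{ source_url: url, session_id: "A" }];

// Steps 3-5: the URL succeeds in session B, but session-scoped cleanup
// never looks at session A's row, so it stays listed as unresolved.
const staleRows = cleanupSessionScoped(afterFailure, url, "B");

// URL-scoped cleanup removes the session A row as well.
const reconciledRows = cleanupUrlScoped(afterFailure, url);
```

In this model `staleRows` still contains the session A row, while `reconciledRows` is empty, which is the difference the proposed fix is meant to close.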

Expected behavior

Saved Links should represent unresolved URLs, not every failed attempt.

Desired behavior:

  • At most one unresolved Saved Links row per canonical/exact URL.
  • New failed attempts update or replace the existing unresolved row.
  • If a URL imports successfully, any unresolved failure row for that URL is resolved or removed.
  • Attempt history can still be preserved in activity logs or metadata.
  • The Saved Links page count should match unique unresolved URLs.

Proposed implementation direction

A conservative fix could be:

  1. On insertIngestFailure, check for an existing unresolved row with the same source_url.
  2. If one exists, update it with the newest attempt metadata instead of inserting a second unresolved row.
  3. On successful URL ingest, delete or mark resolved any unresolved ingest_failures rows for that source_url, regardless of session.
  4. Regenerate the Saved Links page after either update.
  5. Add a one-time cleanup/migration or admin repair command for existing duplicate rows.
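As a rough illustration of steps 1 to 3, here is an in-memory sketch. It assumes nothing about Kompl's real storage layer: the `FailureStore` class, the `IngestFailure` fields other than `source_url`, and the `resolveFailuresForUrl` method are all hypothetical, and the parameters given to `insertIngestFailure` here are guesses rather than its actual signature. Regenerating the Saved Links page (step 4) and the one-time repair (step 5) are left out.

```typescript
// Hypothetical row shape; only `source_url` is named in this issue.
interface IngestFailure {
  source_url: string;
  session_id: string;     // assumed: session of the latest failed attempt
  error: string;          // assumed: latest error message
  attempts: number;       // assumed: number of failed attempts so far
  last_failed_at: string; // assumed: ISO timestamp of the latest attempt
  resolved: boolean;
}

class FailureStore {
  private rows: IngestFailure[] = [];

  // Steps 1-2: reuse the existing unresolved row for this URL if there is one,
  // updating it with the newest attempt metadata instead of inserting again.
  insertIngestFailure(
    sourceUrl: string,
    sessionId: string,
    error: string,
    failedAt: string,
  ): IngestFailure {
    const existing = this.rows.find(
      (r) => !r.resolved && r.source_url === sourceUrl,
    );
    if (existing) {
      existing.session_id = sessionId;
      existing.error = error;
      existing.last_failed_at = failedAt;
      existing.attempts += 1;
      return existing;
    }
    const row: IngestFailure = {
      source_url: sourceUrl,
      session_id: sessionId,
      error,
      attempts: 1,
      last_failed_at: failedAt,
      resolved: false,
    };
    this.rows.push(row);
    return row;
  }

  // Step 3: on successful ingest, resolve every unresolved row for the URL,
  // regardless of which session recorded it. Returns how many were resolved.
  resolveFailuresForUrl(sourceUrl: string): number {
    let resolvedCount = 0;
    for (const row of this.rows) {
      if (!row.resolved && row.source_url === sourceUrl) {
        row.resolved = true;
        resolvedCount += 1;
      }
    }
    return resolvedCount;
  }

  // What the Saved Links page would list under this model.
  unresolved(): IngestFailure[] {
    return this.rows.filter((r) => !r.resolved);
  }
}
```

In a real database the same idea might be expressed as an upsert keyed on `source_url` for unresolved rows, possibly backed by a uniqueness constraint. Whether to mark rows resolved or delete them is the open choice noted in step 3; this sketch marks them resolved so attempt metadata is kept.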

Acceptance criteria

  • Repeated failures for the same URL produce one visible Saved Links entry.
  • A later successful import removes that URL from Saved Links.
  • Retry-failed behavior remains session-scoped where it needs to be, but successful ingest performs URL-scoped reconciliation.
  • Tests cover duplicate failure insert, newer failure update, and successful reingest cleanup.

Labels

bug (Something isn't working)
