Add retry-from-step support for compile sessions #140

Description

@ahjinsolo

Summary

Kompl needs a controlled way to rerun a compile from a selected pipeline step.

Right now retry behavior is mostly automatic:

  • session retry resumes from the first non-done step
  • retry-failed targets failed staging rows, failed drafts, or unextracted sources

That is useful for crashes and failed items, but it does not cover intentional reruns after the user changes settings or wants to redo only part of the pipeline.

User problem

Sometimes the pipeline did not fail, but the output is still wrong enough that the user needs a partial redo.

Examples:

  • extraction was fine, but planning produced too many noisy pages
  • user changes entity_promotion_threshold and wants to rerun planning
  • user changes min_draft_chars and wants to rerun drafting/commit
  • drafts failed or were low quality, but fetching and extraction should not run again
  • user cancelled during draft and wants to restart from draft, not from ingest

Today the practical options are either too broad or too manual.

Why current retry is not enough

/api/compile/retry follows step status. It does not let the user say, "I know extract/resolve/match are fine, rerun from plan."

/api/compile/retry-failed is narrower, but only for failed items. It does not handle quality reruns where the prior step is technically done.

Manual DB manipulation can work, but it is risky because compile_progress, page_plans, sources, extractions, and Saved Links all hold related state that has to stay consistent.

Proposed behavior

Add an advanced retry action:

POST /api/compile/retry-from
{
  "session_id": "...",
  "step": "plan"
}

Supported steps could include:

  • ingest_urls
  • extract
  • resolve
  • match
  • plan
  • draft
  • crossref
  • commit
  • schema

The UI could expose this as an advanced action on the progress page, labeled "Retry from...".

Important semantics

The route should reset compile_progress.steps from the selected step onward, but each step needs its own data rules.
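
The status reset itself could look something like the sketch below. It assumes compile_progress.steps can be treated as a mapping from step name to a status string, and that "pending" is the not-yet-run status; both are assumptions, since the issue only confirms that a "done" status exists. `reset_steps_from` is a hypothetical helper.

```python
# Steps as listed in this issue; the ordering is assumed to be execution order.
PIPELINE_STEPS = (
    "ingest_urls", "extract", "resolve", "match", "plan",
    "draft", "crossref", "commit", "schema",
)


def reset_steps_from(steps: dict[str, str], start: str) -> dict[str, str]:
    """Return a copy of `steps` with `start` and every later step set to pending.

    Earlier steps keep whatever status they already had.
    """
    if start not in PIPELINE_STEPS:
        raise ValueError(f"unknown step: {start!r}")
    updated = dict(steps)  # never mutate the caller's mapping
    for name in PIPELINE_STEPS[PIPELINE_STEPS.index(start):]:
        updated[name] = "pending"  # assumed status name
    return updated


# Usage: a fully completed session, retried from "plan".
all_done = {name: "done" for name in PIPELINE_STEPS}
after = reset_steps_from(all_done, "plan")
```

If statuses work this way, the existing session retry (which resumes from the first non-done step) might pick up at the selected step without further changes, though that depends on how the current runner reads compile_progress.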

Suggested rules:

  • extract: keep existing extractions by default, retry only missing extractions unless force mode is explicit.
  • resolve: rerun resolver from existing extractions.
  • match: rerun match from existing sources/extractions.
  • plan: clear non-committed page_plans, rebuild plans from current settings and existing extracted data.
  • draft: rerun only planned/failed drafts by default, with optional force reset for drafted non-committed plans.
  • crossref: rerun on drafted pages.
  • commit: rerun on crossreffed pages.
  • schema: rerun schema only.
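
The plan and draft rules above can be sketched as two selection functions over page_plans rows. The `PagePlan` shape and the status names ("planned", "failed", "drafted", "crossreffed", "committed") are assumptions inferred from the wording of the rules, not the actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PagePlan:
    """Hypothetical minimal view of a page_plans row."""
    plan_id: str
    status: str  # assumed: planned, failed, drafted, crossreffed, committed


def plans_to_clear_for_replan(plans: list[PagePlan]) -> list[PagePlan]:
    """Retry from plan: every non-committed plan is cleared and rebuilt."""
    return [p for p in plans if p.status != "committed"]


def plans_to_redraft(plans: list[PagePlan], force: bool = False) -> list[PagePlan]:
    """Retry from draft: planned/failed plans by default.

    With force=True, plans that are already drafted are reset as well.
    Whether force should also cover crossreffed plans is left open here.
    """
    statuses = {"planned", "failed"}
    if force:
        statuses.add("drafted")
    return [p for p in plans if p.status in statuses]


plans = [
    PagePlan("a", "planned"),
    PagePlan("b", "failed"),
    PagePlan("c", "drafted"),
    PagePlan("d", "committed"),
]
```

Keeping the selection logic as pure functions makes the cleanup rules easy to unit test without touching the database.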

Safety requirements

  • Refuse to start if another compile session is queued/running.
  • Show what will be rerun before starting.
  • Do not delete committed pages unless an explicit destructive mode exists.
  • Keep the existing retry and retry-failed paths working.
  • Log the selected retry step in activity/history.

Acceptance criteria

  • User can change entity_promotion_threshold, retry from plan, and get new plans without refetching URLs.
  • User can retry from draft without re-extracting sources.
  • User can retry from crossref or commit for downstream repair.
  • The progress UI shows the selected retry range accurately.
  • Tests cover step reset behavior and page_plans cleanup rules.

Metadata

Labels

enhancement (New feature or request)
