feat: add candidate-ID alignment filtering for query generation #107

Merged
lramir14 merged 4 commits into main from feat/querygen-candidate-id-filtering on Apr 13, 2026

Conversation

@saschagobel (Collaborator) commented Apr 1, 2026

Summary

Introduce a deterministic filtering step for candidate-ID alignment in the synthetic query generation workflow. This enforces positional agreement between expected candidate IDs and LLM outputs.

Key changes

  • Add core/querygen/filtering.py:
    • filter_aligned_candidate_ids() for positional candidate-ID reconciliation
    • keeps only items whose candidate_id matches the expected ID at the same position
    • drops misaligned items without attempting reordering or heuristic repair
    • works for both planning (QueryBlueprint) and realization (RealizedQuery) outputs via a shared candidate_id field
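For reviewers who want the behavior at a glance, here is a minimal sketch of the positional rule described above. It is not the code in core/querygen/filtering.py: the signature, the `HasCandidateId` protocol, and the `topic` field are assumptions made for illustration. The sketch also assumes that items beyond the end of the expected list are dropped, which the description does not specify.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, TypeVar


class HasCandidateId(Protocol):
    """Anything exposing a candidate_id field (assumed shared shape)."""

    candidate_id: str


T = TypeVar("T", bound=HasCandidateId)


def filter_aligned_candidate_ids(items: Sequence[T], expected_ids: Sequence[str]) -> list[T]:
    """Keep items whose candidate_id equals the expected ID at the same index.

    Misaligned items are dropped; nothing is reordered or repaired.
    Assumption: zip() truncation means surplus items are dropped too.
    """
    return [item for item, expected in zip(items, expected_ids) if item.candidate_id == expected]


@dataclass
class QueryBlueprint:
    # Hypothetical stand-in; only candidate_id is taken from the PR description.
    candidate_id: str
    topic: str


expected = ["c1", "c2", "c3"]
outputs = [
    QueryBlueprint("c1", "pricing"),   # position 0 matches "c1": kept
    QueryBlueprint("c3", "shipping"),  # position 1 expects "c2": dropped, not moved
    QueryBlueprint("c3", "returns"),   # position 2 matches "c3": kept
]
kept = filter_aligned_candidate_ids(outputs, expected)
```

The same function would accept RealizedQuery items unchanged, since it only reads `candidate_id`.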

Design notes

  • Alignment is treated as a hard positional contract:

    • shifted or mismatched IDs are considered invalid and are dropped
    • no attempt is made to repair drift, as this can silently corrupt downstream joins
  • This helper operates as a deterministic aggregate step:

    • not applied per batch
    • intended for use after concatenating stage outputs

Orchestration note

The intended call pattern in api/querygen.py is:

  • stage 1 (planning)
  • filtering (against pre-defined candidate IDs)
  • deduplication
  • stage 2 (realization)
  • filtering (against post–stage-1 selected candidate IDs)
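The call order above could look roughly like the following. This is a sketch under stated assumptions, not the contents of api/querygen.py: `run_querygen`, the `plan` and `realize` callables, the `intent` and `text` fields, and deduplication by `intent` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class QueryBlueprint:
    candidate_id: str
    intent: str  # hypothetical field used here as the deduplication key


@dataclass(frozen=True)
class RealizedQuery:
    candidate_id: str
    text: str  # hypothetical field


def filter_aligned_candidate_ids(items, expected_ids):
    # Stand-in for the helper: keep items matching the expected ID at the same index.
    return [it for it, exp in zip(items, expected_ids) if it.candidate_id == exp]


def run_querygen(
    candidate_ids: Sequence[str],
    plan: Callable[[Sequence[str]], list],
    realize: Callable[[Sequence[QueryBlueprint]], list],
) -> list:
    # Stage 1 (planning), filtered against the pre-defined candidate IDs.
    blueprints = filter_aligned_candidate_ids(plan(candidate_ids), candidate_ids)

    # Deduplication: keep the first blueprint per intent (assumed criterion).
    seen, selected = set(), []
    for bp in blueprints:
        if bp.intent not in seen:
            seen.add(bp.intent)
            selected.append(bp)

    # Stage 2 (realization), filtered against the IDs that survived stage 1.
    selected_ids = [bp.candidate_id for bp in selected]
    return filter_aligned_candidate_ids(realize(selected), selected_ids)
```

The second filter uses the post-deduplication IDs rather than the original list, because stage 2 only ever sees the selected blueprints.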

Status

Ready for review.

@lramir14 (Collaborator) left a comment

Looks good to me. The helper is simple, deterministic, and the tests cover the intended positional alignment behavior well. Only a minor non-blocking docstring nit from my side.

Comment thread src/pragmata/core/querygen/filtering.py Outdated
@saschagobel force-pushed the feat/querygen-candidate-id-filtering branch from 535f120 to da73553 on April 10, 2026 19:07
@saschagobel force-pushed the feat/querygen-candidate-id-filtering branch from da73553 to f6027d1 on April 12, 2026 17:52
@lramir14 (Collaborator) commented

Re-reviewed after the latest commit. The docstring concern is addressed, and the implementation matches the intended deterministic positional filtering behavior described in the issue. Approving, and will proceed with the merge.

@lramir14 merged commit 4ba9616 into main on Apr 13, 2026
3 checks passed
2 participants