Replace non-deterministic fallback IDs with explicit skip logic in EAC-CPF indexing #13

Merged
alexdryden merged 3 commits into index_creators from copilot/sub-pr-8
Feb 26, 2026

Copilot AI commented Feb 13, 2026

The traject configuration used Time.now.to_i and rand() to generate fallback IDs for creator records lacking standard identifiers. This broke idempotent indexing: the same record received a different ID on each run, so re-indexing created duplicates instead of updating the existing document.
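
The duplicate-versus-update behavior can be illustrated with a minimal sketch. It assumes an index that behaves like a map keyed by document ID; the helpers `index_record` and `timestamp_fallback_id` and the sample ID `creator_people_123` are hypothetical and not taken from the repository. The clock is injectable only so that two separate runs can be simulated.

```ruby
# Hypothetical in-memory stand-in for a search index keyed by document ID.
def index_record(index, id, doc)
  index[id] = doc # same ID overwrites (an update); a new ID adds a document
end

# The removed fallback, with the clock and RNG injectable so that
# two indexing runs at different times can be simulated.
def timestamp_fallback_id(now: Time.now, rng: Random.new)
  "creator_unknown_#{now.to_i}_#{rng.rand(10000)}"
end

record = { name: "Example Creator" }

# Two runs with the timestamp-based fallback: the IDs differ, so the
# same record ends up in the index twice.
nondeterministic = {}
index_record(nondeterministic, timestamp_fallback_id(now: Time.at(1_000)), record)
index_record(nondeterministic, timestamp_fallback_id(now: Time.at(2_000)), record)

# Two runs with a stable ID: the second run updates the first document.
deterministic = {}
2.times { index_record(deterministic, "creator_people_123", record) }

puts nondeterministic.size # 2
puts deterministic.size    # 1
```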

Changes

  • Remove non-deterministic fallback: Replace timestamp/random ID generation with context.skip!()
  • Add explicit error logging: Surface missing ID issues with clear context about what failed (no recordId, invalid filename pattern, missing entity data)

Rationale

Records reach the fallback only if:

  1. No <control>/<recordId> in XML
  2. Filename doesn't match creator_{type}_{id}.xml pattern
  3. No entity type/name to derive ID from

This indicates a data pipeline failure, not missing data. The normal flow (task_agent() → deterministic filenames) provides reliable IDs. Skipping malformed records preserves index integrity and surfaces issues for investigation rather than masking them with synthetic IDs.
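
The three conditions above can be read as a chain of deterministic sources that returns nothing when all of them fail. The following is a sketch under stated assumptions, not the repository's code: the function name `resolve_creator_id`, the exact filename regex, the choice to reuse the `creator_{type}_{id}` part of the filename as the ID, and the slug rule for entity names are all hypothetical.

```ruby
# Filename convention from the description: creator_{type}_{id}.xml
# (the character classes are an assumption).
FILENAME_PATTERN = /\Acreator_([a-z]+)_([A-Za-z0-9-]+)\.xml\z/

def resolve_creator_id(record_id: nil, filename: nil, entity_type: nil, entity_name: nil)
  # 1. Prefer the <control>/<recordId> value when it is present.
  return record_id.strip unless record_id.nil? || record_id.strip.empty?

  # 2. Otherwise use the filename when it matches the convention.
  if filename && (m = FILENAME_PATTERN.match(File.basename(filename)))
    return "creator_#{m[1]}_#{m[2]}"
  end

  # 3. Otherwise derive an ID from entity type and name (assumed slug rule).
  if entity_type && entity_name
    slug = entity_name.downcase.gsub(/[^a-z0-9]+/, "_").gsub(/\A_+|_+\z/, "")
    return "creator_#{entity_type.downcase}_#{slug}" unless slug.empty?
  end

  nil # no deterministic source is available; the caller should skip the record
end
```

Every branch depends only on the record's own data, so repeated runs over the same input give the same result. A `nil` return marks exactly the case in which the record should be skipped.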

```ruby
# Before: Non-deterministic
fallback_id = "creator_unknown_#{Time.now.to_i}_#{rand(10000)}"
accumulator << fallback_id

# After: Explicit skip
context.logger.error("Cannot generate valid ID for record - skipping indexing. Source: #{source_file}")
context.skip!("Missing required ID data")
```
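
To show the effect of the skip branch without a running indexer, here is a self-contained sketch. `SketchContext` is a hypothetical stand-in that mimics only the small surface used above (`logger`, `skip!`, `skip?`); it is not traject's context class, and `add_id` is an illustrative helper rather than the configuration's actual step.

```ruby
require "logger"
require "stringio"

# Stand-in for the per-record context; just enough surface for the sketch.
class SketchContext
  attr_reader :logger, :skipmessage

  def initialize(logger)
    @logger = logger
    @skip = false
  end

  def skip!(msg = "(no reason given)")
    @skipmessage = msg
    @skip = true
  end

  def skip?
    @skip
  end
end

# Hypothetical ID step: emit the ID when one exists, otherwise log and skip.
def add_id(candidate_id, source_file, accumulator, context)
  if candidate_id.nil? || candidate_id.strip.empty?
    context.logger.error("Cannot generate valid ID for record - skipping indexing. Source: #{source_file}")
    context.skip!("Missing required ID data")
  else
    accumulator << candidate_id
  end
end

log = StringIO.new
good = SketchContext.new(Logger.new(log))
bad = SketchContext.new(Logger.new(log))
good_ids = []
bad_ids = []

add_id("creator_people_123", "creator_people_123.xml", good_ids, good)
add_id(nil, "malformed.xml", bad_ids, bad)

puts good.skip? # false
puts bad.skip?  # true
```

The malformed record contributes no ID and is flagged as skipped, and the log names the source file so the upstream failure can be investigated.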

…erministic fallbacks

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update creator record generation and indexing alternatives" to "Replace non-deterministic fallback IDs with explicit skip logic in EAC-CPF indexing" Feb 13, 2026
Copilot AI requested a review from alexdryden February 13, 2026 18:58
alexdryden marked this pull request as ready for review February 26, 2026 16:37
alexdryden merged commit e1645c2 into index_creators Feb 26, 2026
alexdryden deleted the copilot/sub-pr-8 branch February 26, 2026 16:39
alexdryden added a commit that referenced this pull request Feb 26, 2026
…C-CPF indexing (#13)

* Skip indexing records without valid IDs instead of generating non-deterministic fallbacks

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Co-authored-by: Alex Dryden <adryden3@illinois.edu>