
Commit e1645c2

Copilot and alexdryden authored
Replace non-deterministic fallback IDs with explicit skip logic in EAC-CPF indexing (#13)
* Skip indexing records without valid IDs instead of generating non-deterministic fallbacks

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Co-authored-by: Alex Dryden <adryden3@illinois.edu>
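Why a non-deterministic fallback breaks idempotent indexing can be shown with a small simulation. The removed code built IDs as `"creator_unknown_#{Time.now.to_i}_#{rand(10000)}"`, so re-indexing the same record at a different time produces a different ID. The sketch below assumes the target index replaces a document when one with the same id is added (Solr's default behavior for its uniqueKey), and stands a plain Hash in for that index. The record text and `hypothetical_record_id` are made-up placeholders, and the clock and RNG are injectable only so the runs can be simulated.

```ruby
# Sketch: why the removed timestamp/random fallback breaks idempotent indexing.
# A plain Hash stands in for the search index (assumption: adding a document
# whose id already exists replaces it, as Solr does for its uniqueKey).
# The record text and the stable ID are made-up placeholders.

# Mirrors the expression this commit removes; clock and RNG are injectable
# so that separate indexing runs can be simulated deterministically.
def old_fallback_id(now: Time.now, rng: Random.new)
  "creator_unknown_#{now.to_i}_#{rng.rand(10000)}"
end

# Index one record once per run, keyed by whatever ID the block returns.
def index_runs(record, run_times)
  index = {}
  run_times.each { |t| index[yield(t)] = record }
  index
end

record = "<eac-cpf>...</eac-cpf>"
runs = [Time.at(1_700_000_000), Time.at(1_700_000_060)] # two runs a minute apart

stable = index_runs(record, runs) { |_t| "hypothetical_record_id" }
fallback = index_runs(record, runs) { |t| old_fallback_id(now: t) }

puts stable.size   # 1: the second run overwrote the first
puts fallback.size # 2: the same record is now stored under two IDs
```

With a stable ID, running the indexer twice leaves one document; with the old fallback, every run adds another copy of the same record. Skipping the record, as this commit does, keeps reruns idempotent at the cost of not indexing it.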
1 parent 0fac5e0 commit e1645c2

1 file changed

Lines changed: 16 additions & 8 deletions


example_traject_config_eac_cpf.rb

@@ -79,10 +79,14 @@
 accumulator << generated_id
 context.logger.warn("Generated ID from name: #{generated_id}")
 else
-# Last resort: timestamp-based unique ID
-fallback_id = "creator_unknown_#{Time.now.to_i}_#{rand(10000)}"
-accumulator << fallback_id
-context.logger.error("Using fallback ID: #{fallback_id}")
+# No valid ID available - skip indexing this record
+# If we reach here, something has gone wrong with the data pipeline:
+# - No recordId in XML
+# - Filename doesn't match expected pattern
+# - No entity type or name in XML to generate from
+# Skipping ensures we don't create non-deterministic IDs that break idempotent indexing
+context.logger.error("Cannot generate valid ID for record - skipping indexing. Source: #{source_file}")
+context.skip!("Missing required ID data")
 end
 end
 else
@@ -102,10 +106,14 @@
 accumulator << generated_id
 context.logger.warn("Generated ID from name: #{generated_id}")
 else
-# Absolute last resort
-fallback_id = "creator_unknown_#{Time.now.to_i}_#{rand(10000)}"
-accumulator << fallback_id
-context.logger.error("Using fallback ID: #{fallback_id}")
+# No valid ID available - skip indexing this record
+# If we reach here, something has gone wrong with the data pipeline:
+# - No recordId in XML
+# - No filename available
+# - No entity type or name in XML to generate from
+# Skipping ensures we don't create non-deterministic IDs that break idempotent indexing
+context.logger.error("Cannot generate valid ID for record - skipping indexing. No filename or entity data available.")
+context.skip!("Missing required ID data")
 end
 end
 end
