Describe the task
Investigate and fix the deduplication functionality in the spatial_join utility that is not properly removing duplicate opa_id records after spatial joins. During pipeline validation development, it was discovered that the expected deduplication by opa_id in the spatial join utility is failing, leading to duplicate records in the output. As a temporary workaround, individual deduplication logic was added to several pipeline services, but this addresses the symptom rather than the root cause. This ticket involves identifying why the spatial join deduplication is not working as designed, fixing the underlying issue, and then refactoring the affected services to remove the temporary workaround code.
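As a starting point for the investigation, the sketch below shows one way to detect and remove duplicate opa_id values in join output. It assumes the duplicates come from a parcel matching more than one feature on the other side of the join, which the ticket does not confirm. The helper names (find_duplicate_ids, dedupe_by_key), the record layout, and the sample values are hypothetical and are not taken from the spatial_join utility itself.

```python
from collections import Counter


def find_duplicate_ids(records, key="opa_id"):
    """Return the key values that occur more than once, sorted."""
    counts = Counter(rec[key] for rec in records)
    return sorted(k for k, n in counts.items() if n > 1)


def dedupe_by_key(records, key="opa_id"):
    """Keep the first record seen for each key value, preserving input order."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique


# Hypothetical join output: parcel 1001 matched two zones, so it appears twice.
joined = [
    {"opa_id": "1001", "zone": "A"},
    {"opa_id": "1001", "zone": "B"},
    {"opa_id": "1002", "zone": "A"},
]

deduped = dedupe_by_key(joined)
```

A check like find_duplicate_ids could also serve as a regression test once the utility is fixed: it should return an empty list for the output of every service whose workaround is removed.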
Acceptance Criteria
Investigate the spatial_join utility to identify why opa_id deduplication is not functioning correctly
... opa_id deduplication