Replace COPY with INSERT ON CONFLICT for dedup support #620

Open
alxtkr77 wants to merge 1 commit into mlrun:development from alxtkr77:fix/ML-12076-tsdb-dedup-on-conflict

Summary

  • Replace PostgreSQL COPY protocol with executemany + INSERT ... ON CONFLICT DO NOTHING in TimescaleDBTarget
  • When the target table has a UNIQUE constraint, duplicate rows from Kafka rebalance re-delivery are silently dropped at write time
  • Tables without a UNIQUE constraint (or primary key) are unaffected: with no unique index to violate, ON CONFLICT DO NOTHING never triggers and the statement behaves as a regular INSERT
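
The behavior described in the summary can be sketched without a running TimescaleDB instance. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.24 or later accepts the same `ON CONFLICT DO NOTHING` clause), and the table names, the `prediction` column, and the `write_twice` helper are hypothetical, not taken from the PR. It writes the same batch twice, as a Kafka re-delivery would, into one table with a UNIQUE constraint and one without.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL here; it uses "?" placeholders,
# whereas a PostgreSQL driver such as psycopg would use "%s".
INSERT_SQL = (
    "INSERT INTO {table} (endpoint_id, end_infer_time, prediction) "
    "VALUES (?, ?, ?) ON CONFLICT DO NOTHING"
)


def write_twice(conn, table, rows):
    """Write the same batch twice (simulating re-delivery) and return the row count."""
    sql = INSERT_SQL.format(table=table)
    with conn:  # commit both batches in one transaction
        conn.executemany(sql, rows)
        conn.executemany(sql, rows)
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
# Hypothetical tables: one with the UNIQUE constraint used in the PR's test, one without.
conn.execute(
    "CREATE TABLE with_unique (endpoint_id TEXT, end_infer_time TEXT, "
    "prediction REAL, UNIQUE (endpoint_id, end_infer_time))"
)
conn.execute(
    "CREATE TABLE without_unique (endpoint_id TEXT, end_infer_time TEXT, prediction REAL)"
)

rows = [
    ("ep-1", "2024-01-01T00:00:00", 0.1),
    ("ep-1", "2024-01-01T00:01:00", 0.2),
    ("ep-2", "2024-01-01T00:00:00", 0.3),
]

dedup_count = write_twice(conn, "with_unique", rows)
plain_count = write_twice(conn, "without_unique", rows)
print(dedup_count, plain_count)  # 3 6
```

With the UNIQUE constraint the second batch is dropped and three rows remain; without it all six rows are stored, which matches the "regular INSERT" behavior claimed above.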

Changes Made

  • storey/timescaledb_target.py: Replace cur.copy(COPY ... FROM STDIN) with cur.executemany(INSERT ... ON CONFLICT DO NOTHING, records)
  • integration/test_timescaledb.py: Add test_timescaledb_dedup_with_unique_constraint — creates table with UNIQUE(endpoint_id, end_infer_time), emits 3 events + 3 duplicates, verifies only 3 rows stored
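
As a rough illustration of the statement that replaces COPY, the sketch below builds a parametrized `INSERT ... ON CONFLICT DO NOTHING` string from a table name and column list. This is not the code in storey/timescaledb_target.py: `build_insert_sql`, the `predictions` table, and the column names are hypothetical, and the helper assumes an unqualified table name and psycopg-style `%s` placeholders.

```python
def build_insert_sql(table, columns):
    """Build a parametrized INSERT ... ON CONFLICT DO NOTHING statement.

    Hypothetical helper for illustration; assumes an unqualified table name.
    """

    def quote(identifier):
        # Double-quote the identifier and escape embedded double quotes.
        return '"' + identifier.replace('"', '""') + '"'

    column_list = ", ".join(quote(column) for column in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return (
        f"INSERT INTO {quote(table)} ({column_list}) "
        f"VALUES ({placeholders}) ON CONFLICT DO NOTHING"
    )


sql = build_insert_sql("predictions", ["endpoint_id", "end_infer_time", "prediction"])
print(sql)
```

With psycopg 3, a string like this could be passed as `cur.executemany(sql, records)`, where `records` is a sequence of tuples in column order. For production code, composing identifiers with the `psycopg.sql` module is likely a safer choice than manual quoting.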

Testing

  • All 24 integration tests pass (including new dedup test)
  • Tested against TimescaleDB on vmdev61

Reference

  • Jira: ML-11979

Replace PostgreSQL COPY protocol with executemany INSERT ... ON CONFLICT
DO NOTHING in TimescaleDBTarget. This silently drops duplicate rows when
the table has a UNIQUE constraint, preventing duplicate predictions from
Kafka rebalance re-delivery.

Tables without UNIQUE constraints are unaffected — ON CONFLICT DO NOTHING
with no matching constraint behaves as a regular INSERT.

- Replace COPY FROM STDIN with executemany INSERT ... ON CONFLICT DO NOTHING
- Add integration test proving dedup with UNIQUE constraint
- Update docstrings to reflect new write mechanism

Fixes: ML-11979