CDC techniques are generally recommended in the following order for production systems:
Log-based CDC → Native Change Feeds → Application-level Outbox → Trigger-based CDC → Query/Polling CDC
- Reads write-ahead logs / redo logs / binlogs generated by the DB engine.
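- The CDC connector checkpoints log offsets / LSNs so it can resume where it left off after a restart. This gives at-least-once delivery by default; exactly-once is achievable only where the connector and sink support transactional or idempotent writes.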
- Events are reconstructed with before/after images depending on DB and configuration.
- Typically streams into Kafka, Kinesis, Pulsar, or directly to sinks.
- PostgreSQL – WAL via logical decoding (`pgoutput`)
- MySQL / MariaDB – Row-based binary logs
- Oracle – Redo logs (GoldenGate, LogMiner)
- SQL Server – Transaction log
- MongoDB – Oplog
- DB2 – Log streams
- Full transaction context (commit boundaries, ordering)
- Supports hard deletes
- Handles multi-row and bulk operations efficiently
- Low write amplification on the source DB
- Decouples producers and consumers
- Enables replayability from log offsets
- Requires log retention tuning to avoid data loss
- Needs DDL handling strategy (schema registry, evolution rules)
- Can produce high event volumes that require downstream scaling
- Security concerns if logs contain sensitive data (PII masking needed)
- Offset-based recovery enables safe restarts
- Requires monitoring of lag, log growth, and connector health
✅ Primary choice for enterprise-grade CDC
Use for high-volume OLTP systems, real-time analytics, replication, and event streaming.
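
The offset-based recovery described above can be sketched in a few lines. This is a minimal illustration, not any particular connector's implementation: `ChangeEvent`, `ReplicaSink`, and `replay` are hypothetical names, and the LSN is modeled as a plain integer. The sink records the highest LSN it has applied, so events re-read after a restart are skipped rather than applied twice.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                 # position in the source log (hypothetical integer LSN)
    op: str                  # "insert", "update", or "delete"
    key: str                 # primary key of the changed row
    before: Optional[dict]   # row image before the change (None for inserts)
    after: Optional[dict]    # row image after the change (None for deletes)


@dataclass
class ReplicaSink:
    rows: dict = field(default_factory=dict)
    committed_lsn: int = 0   # checkpoint: highest LSN already applied

    def apply(self, event: ChangeEvent) -> bool:
        """Apply one event; return False if it is a replayed duplicate."""
        if event.lsn <= self.committed_lsn:
            return False     # already applied before the restart
        if event.op == "delete":
            self.rows.pop(event.key, None)   # hard deletes are visible in the log
        else:
            self.rows[event.key] = event.after
        self.committed_lsn = event.lsn
        return True


def replay(log, sink, from_lsn):
    """Re-read the log after from_lsn and return how many events were applied."""
    return sum(sink.apply(e) for e in log if e.lsn > from_lsn)


log = [
    ChangeEvent(1, "insert", "o1", None, {"status": "new"}),
    ChangeEvent(2, "update", "o1", {"status": "new"}, {"status": "paid"}),
    ChangeEvent(3, "delete", "o1", {"status": "paid"}, None),
    ChangeEvent(4, "insert", "o2", None, {"status": "new"}),
]
sink = ReplicaSink()
replay(log[:2], sink, from_lsn=0)   # first run stops after LSN 2
replay(log, sink, from_lsn=1)       # restart from an older checkpoint: LSN 2 is re-read
```

In this run the second call re-reads LSN 2 but applies only LSNs 3 and 4, which is the at-least-once delivery plus idempotent apply combination that makes restarts safe.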
- Database exposes a managed stream or subscription
- Built-in checkpointing and retry
- Often integrated with cloud messaging services
- MongoDB – Change Streams
- DynamoDB – Streams → Kinesis
- Cosmos DB – Change Feed
- Cassandra – Native CDC (CommitLog)
- Managed cloud RDBMS CDC services
- No need to manage log offsets manually
- Highly available by design
- Tightly integrated with cloud ecosystem
- Simplified security and IAM integration
- Retention limits can cause irreversible data loss if consumers lag
- Limited historical replay
- Often less granular control over events
- Pricing can increase with throughput
- Recovery limited to retention window
- Requires consumer lag monitoring to avoid expiration
✅ Best option for NoSQL and fully managed cloud databases
Choose when operational simplicity outweighs flexibility.
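
Because events that age out of a native feed cannot be recovered, consumer lag is usually tracked against the retention window. The sketch below shows one way to classify that lag; the function names and the `warn_fraction` threshold are illustrative assumptions, not vendor defaults. The 24-hour window in the usage lines matches DynamoDB Streams; other feeds have different limits.

```python
from datetime import datetime, timedelta, timezone


def retention_headroom(last_checkpoint_at, now, retention):
    """Time left before the oldest unread event ages out of the feed."""
    return retention - (now - last_checkpoint_at)


def lag_status(last_checkpoint_at, now, retention, warn_fraction=0.5):
    """Classify consumer lag relative to the feed's retention window.

    warn_fraction is an illustrative alerting threshold (assumption).
    """
    lag = now - last_checkpoint_at
    if lag >= retention:
        return "expired"     # unread events have been dropped by the feed
    if lag >= retention * warn_fraction:
        return "warning"     # still recoverable, but headroom is shrinking
    return "ok"


retention = timedelta(hours=24)
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
status = lag_status(now - timedelta(hours=13), now, retention)
headroom = retention_headroom(now - timedelta(hours=13), now, retention)
```

With a checkpoint 13 hours old, the status is `"warning"` and 11 hours of headroom remain. An `"expired"` result means the consumer needs a full re-snapshot, since the feed itself can no longer fill the gap.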
- Row-level triggers fire synchronously during DML
- Writes deltas into audit / shadow tables
- Downstream process polls or streams these tables
- Oracle, SQL Server, PostgreSQL, MySQL
- Fine-grained filtering logic possible inside triggers
- Can enforce custom business rules
- Works even on very old DB versions
- Transaction coupling (trigger failure blocks writes)
- Increased deadlock risk
- Hard to test and version-control
- Complicates bulk loads and migrations
- Trigger failures can halt application writes
- Recovery often requires manual intervention
Use only when log-based and native CDC are unavailable and write volume is modest.
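
To make the trigger mechanism concrete, here is a small sketch using SQLite through Python's standard `sqlite3` module. The `orders` and `orders_audit` tables, their columns, and the trigger names are hypothetical; production databases such as Oracle or SQL Server use their own trigger syntax. Each row-level trigger writes one row into the shadow table in the same transaction as the DML that fired it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);

-- Shadow table that receives one row per captured change.
CREATE TABLE orders_audit (
    audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    op         TEXT NOT NULL,
    order_id   INTEGER NOT NULL,
    old_status TEXT,
    new_status TEXT
);

CREATE TRIGGER orders_ai AFTER INSERT ON orders BEGIN
    INSERT INTO orders_audit (op, order_id, old_status, new_status)
    VALUES ('I', NEW.id, NULL, NEW.status);
END;

CREATE TRIGGER orders_au AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_audit (op, order_id, old_status, new_status)
    VALUES ('U', NEW.id, OLD.status, NEW.status);
END;

CREATE TRIGGER orders_ad AFTER DELETE ON orders BEGIN
    INSERT INTO orders_audit (op, order_id, old_status, new_status)
    VALUES ('D', OLD.id, OLD.status, NULL);
END;
"""


def make_db():
    """Create an in-memory database with the source table, shadow table, and triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def read_changes(conn, after_audit_id=0):
    """Downstream reader: fetch audit rows newer than the last one processed."""
    return conn.execute(
        "SELECT audit_id, op, order_id, old_status, new_status "
        "FROM orders_audit WHERE audit_id > ? ORDER BY audit_id",
        (after_audit_id,),
    ).fetchall()


conn = make_db()
conn.execute("INSERT INTO orders (id, status) VALUES (1, 'new')")
conn.execute("UPDATE orders SET status = 'paid' WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 1")
conn.commit()
changes = read_changes(conn)   # one 'I', one 'U', and one 'D' row, in order
```

Unlike polling, the delete is captured, but every write to `orders` now also pays for a write to `orders_audit`, and an error inside a trigger fails the originating statement.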
- Periodic SQL queries select rows where a column (e.g., `updated_at`) is greater than the last processed value (the high-water mark).

```sql
SELECT *
FROM orders
WHERE updated_at > :last_processed_timestamp;
```

- Any SQL database
- Some NoSQL stores with timestamps or sequences
- Lowest implementation cost
- Simple to debug and reason about
- Easy to integrate with legacy ETL tools
- Misses intermediate updates made between polls; only the latest row state is visible
- Hard deletes are invisible unless the schema adds soft-delete columns or tombstone tables
- Scaling requires partitioned polling
- Can cause hot indexes
- Simple checkpoint recovery
- Risk of duplicate or missing rows if timestamps are inconsistent
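
The high-water-mark loop, and the two blind spots listed above, can be demonstrated with an in-memory SQLite database. This is a sketch under simplifying assumptions: `updated_at` is an integer logical timestamp so the run is deterministic, and `make_db` and `poll_changes` are hypothetical helper names.

```python
import sqlite3


def make_db():
    """Create an in-memory orders table with an updated_at column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
    )
    return conn


def poll_changes(conn, last_processed):
    """Return rows changed after the high-water mark, plus the new mark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_processed,),
    ).fetchall()
    # Advance the mark only as far as the data actually seen.
    new_mark = max((r[2] for r in rows), default=last_processed)
    return rows, new_mark


conn = make_db()
conn.execute("INSERT INTO orders VALUES (1, 'new', 10)")
first_rows, mark = poll_changes(conn, 0)

# Between polls: order 1 changes twice, order 2 is created and then deleted.
conn.execute("UPDATE orders SET status = 'paid', updated_at = 20 WHERE id = 1")
conn.execute("UPDATE orders SET status = 'shipped', updated_at = 30 WHERE id = 1")
conn.execute("INSERT INTO orders VALUES (2, 'new', 25)")
conn.execute("DELETE FROM orders WHERE id = 2")
second_rows, mark = poll_changes(conn, mark)
```

The second poll returns only order 1 in its `shipped` state: the intermediate `paid` update and the entire lifetime of order 2 never reach the consumer.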
Use for batch or reporting ETL where higher latency and missed intermediate states are acceptable.
- Application writes domain change + event in same transaction
- Outbox table read by:
- Polling job, or
- Log-based CDC connector
- Events published to message broker
Application → DB (Data + Outbox) → CDC → Kafka / Event Bus
- Any RDBMS backing microservices
- Eliminates dual-write problem
- Produces business-semantic events
- Supports effectively-once processing: delivery is at-least-once, and idempotent consumers discard the duplicates
- Decouples internal schema from external consumers
- Requires application refactoring
- Event schema versioning required
- Ordering across aggregates can be complex
- Backfills are harder than log-based CDC
- Transactional consistency guarantees no lost events
- Requires idempotent publishing and consumer design
✅ Highly recommended for microservices & DDD architectures
Often combined with log-based CDC for analytics and auditing.
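
The outbox flow can be sketched end to end with SQLite standing in for the service database. All names here (`orders`, `outbox`, `place_order`, `relay_once`, `IdempotentConsumer`, the `OrderPlaced` event) are hypothetical, and the relay is a simple polling job rather than a log-based connector. The domain row and the event row are written in one transaction, and the consumer deduplicates on `event_id` because the relay may publish an event more than once.

```python
import json
import sqlite3


def make_db():
    """Create the domain table and the outbox table in one database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        event_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload    TEXT NOT NULL,
        published  INTEGER NOT NULL DEFAULT 0
    );
    """)
    return conn


def place_order(conn, order_id):
    """Write the domain change and its event atomically."""
    with conn:  # commits both inserts on success, rolls both back on error
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'new')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"order_id": order_id})),
        )


def relay_once(conn, publish):
    """Publish unpublished events in order; return how many were sent."""
    rows = conn.execute(
        "SELECT event_id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY event_id"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_id, event_type, json.loads(payload))
        # A crash before this update causes a re-publish on the next run.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,))
    return len(rows)


class IdempotentConsumer:
    """Handles each event_id at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen = set()
        self.handled = []

    def handle(self, event_id, event_type, data):
        if event_id in self.seen:
            return
        self.seen.add(event_id)
        self.handled.append((event_type, data["order_id"]))


conn = make_db()
consumer = IdempotentConsumer()
place_order(conn, 1)
place_order(conn, 2)
relay_once(conn, consumer.handle)
consumer.handle(1, "OrderPlaced", {"order_id": 1})   # simulated redelivery, ignored
```

Marking the row as published after calling `publish` is a deliberate choice: it favors duplicates over lost events, which is why the consumer side has to be idempotent.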
| Technique | Latency | Scalability | Operational Complexity | Best For |
|---|---|---|---|---|
| Log-Based | Very Low | Very High | High | Enterprise OLTP, analytics |
| Native Feeds | Low | High | Medium | Cloud & NoSQL systems |
| Outbox | Low | High | Medium | Microservices events |
| Triggers | Low | Low | Medium | Legacy DBs |
| Polling | High | Low | Low | Batch ETL |
- Enterprise Data Platforms: Log-based CDC + Kafka
- Cloud-Native NoSQL: Native change feeds
- Microservices: Outbox + messaging
- Legacy Systems: Triggers (carefully)
- Reporting ETL: Polling / high-water-mark