
Conversation

@tenfyzhong
Collaborator

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

This PR enhances the observability of the Kafka consumer's writer component by adding detailed logging for DDL and DML events before they are written to the downstream sink. The changes improve debugging and monitoring capabilities by providing a clear audit trail of data being processed.

Key changes include:

  • Added comprehensive Info level logs for DDL events being written to the downstream, capturing schema, table, commit timestamp, query, and DDL type.
  • Added detailed Info level logs for DML events being written to the downstream, capturing schema, table, table ID, commit timestamp, start timestamp, row count, and a string representation of the rows.
  • The logging is placed immediately before the sink operations (WriteBlockEvent for DDL and AddDMLEvent for DML) to ensure the logged information reflects the exact data being sent.
  • For DDL events, logging is added in both the case where no preceding DML events need to be flushed and the case where DML events are flushed first.
  • For DML events, logging is added in both the DDL-triggered flush path and the watermark-triggered flush path.
  • Removed a less informative log line after adding a DML event in the watermark flush path, as it's superseded by the new, more detailed log.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Manual testing steps:

  1. Start a TiCDC cluster with Kafka sink configured.
  2. Perform DDL operations (e.g., CREATE TABLE, ALTER TABLE) on the upstream TiDB.
  3. Perform DML operations (e.g., INSERT, UPDATE, DELETE) on the upstream TiDB.
  4. Observe the TiCDC logs to verify that the new detailed logs for DDL and DML events appear as expected when the Kafka consumer writer processes them.

Questions

Will it cause performance regression or break compatibility?

No. This change only adds logging at the Info level. It does not alter the data flow, processing logic, or any external interfaces. The performance impact is negligible as logging operations are inexpensive compared to the sink operations.

Do you need to update user documentation, design documentation or monitoring documentation?

No. This is an internal observability improvement and does not require updates to user-facing documentation.

Release note

None

- Add logging for DDL events before writing to downstream when no DML events are pending
- Add logging for each DML event when writing to downstream in both DDL flush and watermark flush paths
- Move existing DML flush log to more appropriate location with additional details
- Include schema, table, commitTs, row count, and row data in logs for better observability

Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. labels Dec 17, 2025
@ti-chi-bot

ti-chi-bot bot commented Dec 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hongyunyan for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 17, 2025
@gemini-code-assist

Summary of Changes

Hello @tenfyzhong, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the observability of the Kafka consumer's writer component by integrating detailed Info level logging for both DDL and DML events. These new logs are strategically placed immediately before events are written to the downstream sink, providing a precise audit trail of data processing. The change aims to improve debugging and monitoring capabilities without impacting performance or compatibility.

Highlights

  • Enhanced DDL Event Logging: Introduced detailed Info level logging for DDL events, capturing schema, table, commit timestamp, query, and DDL type, just before they are written to the downstream sink. This logging is applied in both scenarios: when no preceding DML events need flushing and when DML events are flushed first.
  • Enhanced DML Event Logging: Implemented comprehensive Info level logging for DML events, including schema, table, table ID, commit timestamp, start timestamp, row count, and a string representation of rows, prior to being sent to the downstream. This logging covers both DDL-triggered and watermark-triggered flush paths.
  • Improved Log Detail: Replaced a less informative DML log line in the watermark flush path with the new, more detailed logging, providing richer context for debugging and monitoring.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@tenfyzhong tenfyzhong changed the title Add logging for DDL and DML events to improve downstream write visibility [DNM] kafka-consumer: Add logging for DDL and DML events to improve downstream write visibility Dec 17, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enhances observability by adding detailed logging for DDL and DML events. The changes are well-targeted and improve visibility into the data being written to the downstream sink.

My review includes a few suggestions:

  • A high-severity concern regarding the logging of full DML row data at the Info level, which could impact performance and security. I've recommended removing this verbose field.
  • A medium-severity suggestion to improve the readability of DDL event logs by logging the event type as a string.
  • A medium-severity suggestion to refactor the new logging logic into helper functions to reduce code duplication and improve maintainability.

Overall, these are valuable additions for debugging and monitoring. Addressing the feedback will make the implementation more robust and production-ready.

Comment on lines +192 to +199
log.Info("write DML event to downstream",
	zap.String("schema", e.TableInfo.GetSchemaName()),
	zap.String("table", e.TableInfo.GetTableName()),
	zap.Int64("tableID", e.GetTableID()),
	zap.Uint64("commitTs", e.GetCommitTs()),
	zap.Uint64("startTs", e.GetStartTs()),
	zap.Int32("rowCount", e.Len()),
	zap.String("rows", e.String()))


high

A couple of points on this DML logging:

  1. (High Severity) Logging the full row content with zap.String("rows", e.String()) at the Info level can cause significant performance issues and log bloat, and may leak sensitive data in production environments. It's strongly recommended to remove this field from the log statement. The existing metadata provides good visibility. If full row data is needed for debugging, it should be at the Debug level.
  2. (Maintainability) This logging logic is duplicated for the watermark flush path (lines 315-323). Extracting this into a shared helper function would improve code clarity and reduce redundancy.
log.Info("write DML event to downstream",
			zap.String("schema", e.TableInfo.GetSchemaName()),
			zap.String("table", e.TableInfo.GetTableName()),
			zap.Int64("tableID", e.GetTableID()),
			zap.Uint64("commitTs", e.GetCommitTs()),
			zap.Uint64("startTs", e.GetStartTs()),
			zap.Int32("rowCount", e.Len()))

Comment on lines +315 to +322
log.Info("write DML event to downstream",
	zap.String("schema", e.TableInfo.GetSchemaName()),
	zap.String("table", e.TableInfo.GetTableName()),
	zap.Int64("tableID", e.GetTableID()),
	zap.Uint64("commitTs", e.GetCommitTs()),
	zap.Uint64("startTs", e.GetStartTs()),
	zap.Int32("rowCount", e.Len()),
	zap.String("rows", e.String()))


high

Logging the full DML event content via e.String() at the Info level can introduce significant performance overhead and log bloat, especially with large transactions or frequent updates. It may also expose sensitive data in production logs. It's recommended to remove zap.String("rows", e.String()) to only log metadata about the DML event. If detailed row data is necessary for debugging, it should be logged at the Debug level.

log.Info("write DML event to downstream",
			zap.String("schema", e.TableInfo.GetSchemaName()),
			zap.String("table", e.TableInfo.GetTableName()),
			zap.Int64("tableID", e.GetTableID()),
			zap.Uint64("commitTs", e.GetCommitTs()),
			zap.Uint64("startTs", e.GetStartTs()),
			zap.Int32("rowCount", e.Len()))

Comment on lines +178 to +183
log.Info("write DDL event to downstream",
	zap.String("schema", ddl.GetSchemaName()),
	zap.String("table", ddl.GetTableName()),
	zap.Uint64("commitTs", ddl.GetCommitTs()),
	zap.String("query", ddl.Query),
	zap.Any("type", ddl.Type))


medium

A few suggestions for this logging block:

  1. The DDL event type is being logged as a raw byte value with zap.Any("type", ddl.Type). It would be more readable to log its string representation using ddl.GetDDLType(), for example with zap.Stringer("type", ddl.GetDDLType()).
  2. This logging logic is duplicated later in the file (lines 215-221). To improve maintainability, consider extracting this into a helper function, for example logDDLEvent(ddl *commonEvent.DDLEvent). A similar refactoring could be applied to the DML event logging.
Suggested change

Before:

log.Info("write DDL event to downstream",
	zap.String("schema", ddl.GetSchemaName()),
	zap.String("table", ddl.GetTableName()),
	zap.Uint64("commitTs", ddl.GetCommitTs()),
	zap.String("query", ddl.Query),
	zap.Any("type", ddl.Type))

After:

log.Info("write DDL event to downstream",
	zap.String("schema", ddl.GetSchemaName()),
	zap.String("table", ddl.GetTableName()),
	zap.Uint64("commitTs", ddl.GetCommitTs()),
	zap.String("query", ddl.Query),
	zap.Stringer("type", ddl.GetDDLType()))

Comment on lines +215 to +220
log.Info("write DDL event to downstream",
	zap.String("schema", ddl.GetSchemaName()),
	zap.String("table", ddl.GetTableName()),
	zap.Uint64("commitTs", ddl.GetCommitTs()),
	zap.String("query", ddl.Query),
	zap.Any("type", ddl.Type))


medium

The DDL event type is being logged as a raw byte value using zap.Any("type", ddl.Type). This makes the log less readable as it will show a number instead of the action type. It would be more informative to log the string representation of the DDL type. You can achieve this by using ddl.GetDDLType(), which returns a model.ActionType that has a String() method. Using zap.Stringer is a good practice for this.

log.Info("write DDL event to downstream",
				zap.String("schema", ddl.GetSchemaName()),
				zap.String("table", ddl.GetTableName()),
				zap.Uint64("commitTs", ddl.GetCommitTs()),
				zap.String("query", ddl.Query),
				zap.Stringer("type", ddl.GetDDLType()))
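
The `zap.Stringer` suggestion works because it accepts any value implementing `fmt.Stringer` and renders it via its `String()` method instead of its raw numeric value. A self-contained sketch of the pattern; `actionType` and its action codes are made up for illustration (the real type is TiDB's `model.ActionType`):

```go
package main

import "fmt"

// actionType is a hypothetical stand-in for model.ActionType: a numeric
// DDL action code that implements fmt.Stringer.
type actionType byte

func (a actionType) String() string {
	switch a {
	case 3:
		return "create table"
	case 5:
		return "add column"
	default:
		return fmt.Sprintf("action(%d)", byte(a))
	}
}

func main() {
	t := actionType(3)
	// Logging the raw value prints a bare number; the Stringer form is
	// readable, which is what zap.Stringer gives you in the log output.
	fmt.Printf("raw=%d type=%s\n", byte(t), t)
}
```

An added benefit of `zap.Stringer` over `zap.String(k, v.String())` is that `String()` is only invoked if the entry is actually emitted.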

- Add `build_kafka_consumer` target to build Docker image for Kafka consumer
- Use DOCKER_BUILDKIT=1 for improved build performance
- Tag image with KAFKA_CONSUMER_TAG or fallback to GITBRANCH

Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 19, 2025
@ti-chi-bot ti-chi-bot bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 23, 2025
@ti-chi-bot

ti-chi-bot bot commented Dec 29, 2025

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 29, 2025

Labels

do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
