Skip to content

[FLINK-39699][tests] Stabilize flaky tests in KafkaSinkITCase and KafkaWriterFaultToleranceITCase#254

Merged
1996fanrui merged 3 commits into
apache:mainfrom
Savonitar:FLINK-39699
May 19, 2026
Merged

[FLINK-39699][tests] Stabilize flaky tests in KafkaSinkITCase and KafkaWriterFaultToleranceITCase#254
1996fanrui merged 3 commits into
apache:mainfrom
Savonitar:FLINK-39699

Conversation

@Savonitar
Copy link
Copy Markdown
Contributor

@Savonitar Savonitar commented May 18, 2026

Stabilizes three related CI flakes across two test classes. All three failures shared the same family of symptom, a one-shot read or unsynchronized exception assertion that worked on local machines but raced under CI load.

E.g. In PR #252 we are facing flaky test issue


2026-05-18T06:42:15.7157135Z Test org.apache.flink.connector.kafka.sink.KafkaSinkITCase.rescaleListing[5->2] failed with:
2026-05-18T06:42:15.7157690Z java.lang.AssertionError:
2026-05-18T06:42:15.7158029Z Expecting Optional to contain a value but it was empty.
2026-05-18T06:42:15.7158745Z    at org.apache.flink.connector.kafka.sink.KafkaSinkITCase.getCheckpointPath(KafkaSinkITCase.java:463)
2026-05-18T06:42:15.7159765Z    at org.apache.flink.connector.kafka.sink.KafkaSinkITCase.rescaleListing(KafkaSinkITCase.java:425)

Copy link
Copy Markdown
Contributor

@spuru9 spuru9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Added a few comments.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too is a similar function, worth replacing as well. WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, replaced 👍

void setUp() {
topic = UUID.randomUUID().toString();
createTestTopic(topic, 1, TOPIC_REPLICATION_FACTOR);
Properties adminProperties = new Properties();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getKafkaClientConfiguration can be reused in the setup, the properties seems generic.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getKafkaClientConfiguration() returns consumer properties (+zookeeper properties, which will be tackled in https://issues.apache.org/jira/browse/FLINK-39705).
Meanwhile here, we need only admin/generic properties.
WDYT?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes then looks fine

@Savonitar Savonitar marked this pull request as draft May 18, 2026 19:58
…ilable tests

testFlush/testWrite/testCloseExceptionWhenKafkaUnavailable all rely on
the producer still having undelivered work when KAFKA_CONTAINER.stop()
takes effect. Under CI load the sender thread can ship and ack the
buffered record before stop() returns, so the operation under test
(flush/write/close) has nothing to fail on and the .rootCause()
assertion fires with "Expecting actual not to be null" instead of
seeing the expected NetworkException / TimeoutException.

Drain a warm-up record before stopping the broker, then issue the real
write while the broker is down. The producer's metadata is already
cached so write() returns immediately; the sender fails to deliver
(retries=0); the operation under test reliably surfaces the underlying
exception.
@Savonitar Savonitar changed the title [FLINK-39699][tests] Wait for completed checkpoint stats in KafkaSinkITCase [FLINK-39699][tests] Stabilize flaky tests in KafkaSinkITCase and KafkaWriterFaultToleranceITCase May 19, 2026
@Savonitar Savonitar requested a review from spuru9 May 19, 2026 11:00
@spuru9
Copy link
Copy Markdown
Contributor

spuru9 commented May 19, 2026

@Savonitar Can you move the PR to ready for review? Also is the other PR too now part of this as mentioned in the PR title.

@Savonitar
Copy link
Copy Markdown
Contributor Author

@Savonitar Can you move the PR to ready for review? Also is the other PR too now part of this as mentioned in the PR title.

I will when it is ready.

Copy link
Copy Markdown
Contributor

@spuru9 spuru9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Savonitar Savonitar marked this pull request as ready for review May 19, 2026 13:15
Copy link
Copy Markdown
Member

@1996fanrui 1996fanrui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, LGTM

@1996fanrui 1996fanrui merged commit 887d594 into apache:main May 19, 2026
13 checks passed
@Savonitar
Copy link
Copy Markdown
Contributor Author

Thanks for the reviews/comments/merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants