Skip to content

Conversation

@leixm
Copy link
Contributor

@leixm leixm commented Oct 31, 2025

What changes were proposed in this pull request?

After RATIS-2329, NettyRpcProxy will close connection in exceptionCaught, NettyClient#writeAndFlush should support throwing AlreadyClosedException, otherwise, the following problems will occur.

25/10/29 11:13:52,786 WARN [1@group-47BEDE733167->2-LogAppenderDefault-LogAppenderDaemon] LogAppender: 1@group-47BEDE733167->2-LogAppenderDefault: Failed to installSnapshot SingleFileSnapshotInfo(t:1, i:197600537):[/mnt/ssd/0/celeborn_ratis/c5196f6d-2c34-3ed3-8b8a-47bede733167/sm/snapshot.1_197600537]: java.lang.IllegalStateException: STATE MISMATCHED: In NettyClient, current state CLOSED is not one of the expected states [RUNNING]

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-2349

How was this patch tested?

Existing UTs.

@szetszwo
Copy link
Contributor

@leixm , thanks for working no this! Please file another JIRA for the follow up. It would be easier to track the issues.

@OneSizeFitsQuorum
Copy link
Contributor

OneSizeFitsQuorum commented Nov 2, 2025

Hi, Please take a look for the failed UT, could you reproduce it in your local environment?

Error:  org.apache.ratis.grpc.TestReadOnlyRequestWithGrpc.testReadAfterWrite -- Time elapsed: 1.677 s <<< FAILURE!
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
	at org.apache.ratis.ReadOnlyRequestTests.testReadAfterWriteImpl(ReadOnlyRequestTests.java:314)
	at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:143)
	at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:121)
	at org.apache.ratis.ReadOnlyRequestTests.testReadAfterWrite(ReadOnlyRequestTests.java:289)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.util.ArrayList.forEach(ArrayList.java:1259)

@leixm
Copy link
Contributor Author

leixm commented Nov 3, 2025

Hi, Please take a look for the failed UT, could you reproduce it in your local environment?

Error:  org.apache.ratis.grpc.TestReadOnlyRequestWithGrpc.testReadAfterWrite -- Time elapsed: 1.677 s <<< FAILURE!
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
	at org.apache.ratis.ReadOnlyRequestTests.testReadAfterWriteImpl(ReadOnlyRequestTests.java:314)
	at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:143)
	at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:121)
	at org.apache.ratis.ReadOnlyRequestTests.testReadAfterWrite(ReadOnlyRequestTests.java:289)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.util.ArrayList.forEach(ArrayList.java:1259)

I tried to reproduce this unit test failure locally, this looks wired.

  • Running testReadAfterWrite alone resulted in a stable pass.
  • Running TestReadOnlyRequestWithGrpc mostly succeeded, with a small chance of failing, but only testFollowerLinearizableReadParallel and testFollowerLeaseReadParallel failed; testReadAfterWrite did not fail.

@leixm leixm changed the title Follow RATIS-2329. NettyClient#writeAndFlush should support throwing AlreadyClosedException RATIS-2349. NettyClient#writeAndFlush should support throwing AlreadyClosedException Nov 3, 2025
@szetszwo
Copy link
Contributor

szetszwo commented Nov 3, 2025

@leixm , @OneSizeFitsQuorum , since this PR only change ratis -netty but not ratis -grpc, the test failure of TestReadOnlyRequestWithGrpc must be unrelated.

Just have submitted a 10x10 run on master: https://github.com/apache/ratis/actions/runs/19023405871

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

@szetszwo
Copy link
Contributor

szetszwo commented Nov 3, 2025

Just have submitted a 10x10 run on master: https://github.com/apache/ratis/actions/runs/19023405871

It does fail intermittently; filed RATIS-2350

@szetszwo szetszwo merged commit c69361c into apache:master Nov 3, 2025
29 of 30 checks passed
@szetszwo
Copy link
Contributor

szetszwo commented Nov 3, 2025

@leixm , thanks for working on this!

@OneSizeFitsQuorum , thanks for reviewing this!

szetszwo pushed a commit to szetszwo/ratis that referenced this pull request Nov 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants