-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-36746][core] Fix the deadlock bug in SerializedThrowable #27186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| serializedThrowable = (SerializedThrowable) s; | ||
| } else { | ||
| serializedThrowable = new SerializedThrowable(s); | ||
| if (alreadySeen.add(s)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is draught - it would be worth checking if the throwable is already seen before doing any logic to consider adding it.
e1f838c to
41a9b67
Compare
|
Hi, @Myasuka Could u help take a look if you had the free time ? |
Co-authored-by: raoraoxiong <[email protected]>
41a9b67 to
f947dd1
Compare
Myasuka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please improve the test code to avoid possible endless running.
| } | ||
|
|
||
| @Test | ||
| void testCyclicSuppressedThrowableConcurrentSerialized() throws InterruptedException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't this test is stable without timeout limit, if we do not fix the original code in flink-core/src/main/java/org/apache/flink/util/SerializedThrowable.java, this test would run endlessly without quit properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @Myasuka Thanks for your reminding.
I increased the number of test runs for the current test case using RepeatedTest and found that deadlocks can indeed still occur.
We considered a solution: making all methods of SerializedThrowable private, thereby controlling the creation of SerializedThrowable objects through a single unified entry point. The goal was to try to elevate the locking hierarchy to the serialized exception object itself.
Unfortunately, the following situation can still occur:
When one thread calls the unified creation method, another thread may already be holding the object lock of another exception object referenced within the exception. In such a case, a deadlock can still potentially happen.
Looking forward to your thoughts on this~
Any input is aprreciated !
What is the purpose of the change
Brief change log
Verifying this change
Add the related test cases in org.apache.flink.runtime.util.SerializedThrowableTest
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)Documentation