-
Notifications
You must be signed in to change notification settings - Fork 87
Fix deadlock in RestrictedSecurity caused by JAR verification and hash check #1122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: openj9
Are you sure you want to change the base?
Conversation
|
I explored alternative approaches other than scanning all threads but couldn’t find any feasible solution. I still believe that scanning all thread stacks is the most reliable way to resolve this deadlock. The call to parser.checkHashValues() occurs only once, during the static initialization of the Providers class at JVM startup. At that early stage, the JVM typically has only a small number of active threads. I also added an improvement to skip scanning VM service threads using the following checks: In my test, the total number of scanned threads decreased from 18 (all threads) to 3 (after skipping VM service threads). Additionally, the scan terminates immediately after finding the first matching, so it doesn’t traverse entire stacks unnecessarily. So, I don’t believe scanning all thread stacks (skip scanning VM service threads) will have a noticeable performance impact compared to checking only the current thread’s stack. |
This isn't generally true. Yes the call to |
We have the following codes in the RestrictedSecurity.java. The |
|
Yes, I agreed that a call to However, you haven't addressed my concern that the number of threads present at the time that call occurs (if it occurs) is unbounded. I wrote a simple test that creates an arbitrary number of threads (e.g. 1000) before triggering loading the security infrastructure: tracing showed that 1019 threads stacks needed to be considered. Perhaps we need to revisit how we manage the interaction between |
To check the potential impact of scanning all thread stacks, I created a simple test program that spawns 1,000 active threads, each with 40 stack frames before invoking the same scanning logic used in And also, since |
|
@keithc-ca Any review suggestions or better ideas regarding this issue? |
|
I was thinking something like this: private static void checkHashValues(boolean fromProviders) {
ProfileParser parser = profileParser;
if (parser != null) {
if (fromProviders) {
enableCheckHashes = true;
}
if (enableCheckHashes && (hashPauseCount.get() == 0)) {
profileParser = null;
parser.checkHashValues();
}
}
}
private static final AtomicInteger hashPauseCount = new AtomicInteger(0);
public static void pauseHashCheck() {
if (profileParser != null) {
hashPauseCount.incrementAndGet();
}
}
public static void resumeHashCheck() {
if (profileParser != null) {
hashPauseCount.decrementAndGet();
}
}In each method of |
|
Yes, in our team meeting we discussed a similar approach using a pause counter in JarVerifier to pause the hash check while signature verification is in progress. I’ll work on identifying where this pause check should be added. |
8164e37 to
8759cf4
Compare
|
From the below deadlock stacks:
Also, I removed the
I re-ran the deadlock test after these code changes with my debug output — the output below shows it works as expected. @keithc-ca Please review and advise. |
| hashPauseCount.incrementAndGet(); | ||
| } | ||
|
|
||
| public static void resumeHashCheck() { | ||
| hashPauseCount.decrementAndGet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the hash check is complete, there's no need to update hashPauseCount:
if (profileParser != null) {
hashPauseCount.incrementAndGet();
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the profileParser != null check as suggested, but the Deadlock test still fails for the same reason.
I believe the cause is that when the MainThread enters pauseHashCheck(), the profileParser has not yet finished initializing, so it’s still null and the pause counter isn’t incremented. Meanwhile, Thread-1 enters checkHashValues() just after the profileParser finishes initializing. Since the pause count is still 0, it proceeds to call parser.checkHashValues() without skipping, which eventually triggers the deadlock.
I suggest two possible solutions:
-
Keep the current way without the
profileParser != nullcheck. When not in RestrictedSecurity mode, the pause counter will still be incremented and decremented in JarFile, but hash checking won’t actually occur sinceprofileParseris always null in non-RestrictedSecurity mode. -
Add a
RestrictedSecurity.isEnabled()check in JarFile beforeRestrictedSecurity.pauseHashCheck();andRestrictedSecurity.resumeHashCheck();to make sure the pause counter won't changed in non-RestrictedSecurity mode.
| */ | ||
| public static boolean canServiceBeRegistered(Service service) { | ||
| if (securityEnabled) { | ||
| checkHashValues(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should this be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the checkHashValues(false); call from RestrictedSecurity.canServiceBeRegistered() based on the following:
- The purpose of calling
checkHashValues(false);was to ensure that ifenableCheckHashesis set to true but the hash check is skipped due toisJarVerifierInStackTrace()returning false, then the hash check would eventually happen later. - However, both
isServiceAllowed()andisProviderAllowed()already invokecheckHashValues(false);before checking the services and providers. This guarantees the hash check will occur, so the call incanServiceBeRegistered()is redundant. - Most importantly, since the hash check is now paused during startup when another thread is performing jar-file verification, the hash check will later be triggered by the
canServiceBeRegistered()call — and this occurs during provider service registration, when the provider is not yet fully initialized. At that moment, the hash check will throw a“SHA256 No such algorithm”error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I agree with your assessment, and it's unrelated to the purpose of this change. Please leave this call alone; it can be proposed separately later, if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it back and tested again — it no longer causes the “SHA256: No such algorithm” error. It’s possible that my previous development or build environment had other code interfering with the result.
| if (jv != null && !jvInitialized) { | ||
| Object key = beginInit(); | ||
| try { | ||
| RestrictedSecurity.pauseHashCheck(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call should be made before entering the try statement.
I don't expect this is sufficient: I also see uses of JarVerifier in JarInputStream and Manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the deadlock test exception stacks:
"MainThread" prio=5 Id=19 WAITING
at [email protected]/sun.security.util.SignatureFileVerifier.<init>(SignatureFileVerifier.java:124)
at [email protected]/java.util.jar.JarVerifier.processEntry(JarVerifier.java:311)
at [email protected]/java.util.jar.JarVerifier.update(JarVerifier.java:242)
at [email protected]/java.util.jar.JarFile.initializeVerifier(JarFile.java:762)
at [email protected]/java.util.jar.JarFile.ensureInitialization(JarFile.java:1038)
- locked java.util.jar.JarFile@f4df9a0f
at [email protected]/java.util.jar.JavaUtilJarAccessImpl.ensureInitialization(JavaUtilJarAccessImpl.java:72)
The lock is held on java.util.jar.JarFile@f4df9a0f when entering the synchronized JarFile.ensureInitialization() method, so I added the pause counter there.
As for JarInputStream and Manifest, they don’t contain any synchronized methods or blocks, so they don’t directly hold any locks. In the case of JarVerifier, it is initialized from JarFile. Since I’ve already added the pause counter in synchronized method ensureInitialization() before entering the JarVerifier, that should be sufficient. Please advise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't disagree with the change to JarFile, but you haven't convinced me that there are no scenarios involving JarInputStream or Manifest that could have one or more methods of JarVerifier on the call stack that were detected via isJarVerifierInStackTrace() prior to this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the PR that originally introduced isJarVerifierInStackTrace(). It was designed to address a different issue — when a JAR verification process starts before the RestrictedSecurity profile is loaded, the hash calculation is triggered as part of that verification, causing a nested JAR verification and subsequent errors. To prevent this, the hash calculation for a profile was skipped if it was initiated by a JAR verification process and deferred until later in the loading sequence.
Since that issue is unrelated to the current PR, I’ve reintroduced isJarVerifierInStackTrace() to maintain its original behavior.
For this PR, it addresses the deadlock issue. The lock isn’t on JarVerifier — it’s on the JarFile, which is what causes the deadlock. Therefore, I added the pause counter inside the synchronized JarFile.ensureInitialization() method to pause the hash check whenever the JarFile is locked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe isJarVerifierInStackTrace() is necessary. Both problems can be solved by appropriate use of pauseHashCheck() and resumeHashCheck().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since isJarVerifierInStackTrace() addresses a different issue than the deadlock in this PR, I suggest handling its removal in a separate PR, as debugging and verifying isJarVerifierInStackTrace() involve an entirely different problem.
| enableCheckHashes = true; | ||
| } | ||
| if (enableCheckHashes && !isJarVerifierInStackTrace()) { | ||
| if (enableCheckHashes && !isJarVerifierInStackTrace() && (hashPauseCount.get() == 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method isJarVerifierInStackTrace() should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it back since the isJarVerifierInStackTrace() method has been restored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isJarVerifierInStackTrace() should not be restored; please use pause/resume instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. The isJarVerifierInStackTrace() fix relates to a different issue. If its removal is needed, I still recommend handling that in a separate PR. This current PR specifically addresses the issue caused by java.util.jar.JarFile and checkHashValues. If you'd like, I can open an issue for tracking and addressing whether isJarVerifierInStackTrace() can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You haven't convinced me that this addresses all the possible interactions between JarVerifier (possibly on other threads) and RestrictedSecurity. As I pointed out, there are uses of JarVerifier by classes other than JarFile.
Please have another look at #1122 (comment). As I said, there are only ten methods in total defined by that class. I don't think all of them need to use the pause/resume API contemplated here, but even if we start there (modify all ten methods), we don't need to change any users of that class and we don't need isJarVerifierInStackTrace() anymore.
002bf4a to
24a05d4
Compare
|
@keithc-ca Replied to some review comments and also updated the code — please review. |
closed/src/java.base/share/classes/openj9/internal/security/RestrictedSecurity.java
Outdated
Show resolved
Hide resolved
|
Please also update the commit summary line: it should be no more than 70 characters long. |
…h check This commit updates isJarVerifierInStackTrace() to inspect all thread stacks, resolving the deadlock that occurred in ProfileParser.checkHashValues(). In the failing scenario, one thread (Thread-1) calls SecureRandom.getInstance(), which triggers checkHashValues() in RestrictedSecurity mode. This method attempts to obtain a MessageDigest from provider OpenJCEPlusFIPS, requiring the corresponding module to be loaded. However, another thread (Thread-2) is already holding the loader lock while performing signature verification for a signed class. Thread-1 is then blocked waiting for that module to load, while Thread-2 waits for the MessageDigest initialization to complete, resulting in a circular deadlock. Previously, isJarVerifierInStackTrace() only checked the current thread’s stack to decide whether to skip check hash. It has now been updated to scan the stack traces of all threads, but except for the VM service threads, allowing it to detect active JAR verification in other non-VM threads and prevent potential cross-thread deadlocks. Signed-off-by: Tao Liu <[email protected]>
Updated the commit summary, no more than 70 characters long. |
keithc-ca
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please resolve the merge conflict.
| enableCheckHashes = true; | ||
| } | ||
| if (enableCheckHashes && !isJarVerifierInStackTrace()) { | ||
| if (enableCheckHashes && !isJarVerifierInStackTrace() && (hashPauseCount.get() == 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isJarVerifierInStackTrace() should not be restored; please use pause/resume instead.
This PR updates isJarVerifierInStackTrace() to inspect all thread stacks, resolving the deadlock that occurred in ProfileParser.checkHashValues().
In the failing scenario, one thread (Thread-1) calls SecureRandom.getInstance(), which triggers checkHashValues() in RestrictedSecurity mode. This method attempts to obtain a MessageDigest from provider OpenJCEPlusFIPS, requiring the corresponding module to be loaded. However, another thread (Thread-2) is already holding the loader lock while performing signature verification for a signed class. Thread-1 is then blocked waiting for that module to load, while Thread-2 waits for the MessageDigest initialization to complete, resulting in a circular deadlock.
Previously, isJarVerifierInStackTrace() only checked the current thread’s stack to decide whether to skip check hash. It has now been updated to scan the stack traces of all threads, but except for the VM service threads, allowing it to detect active JAR verification in other non-VM threads and prevent potential cross-thread deadlocks.