Skip to content

Conversation

@taoliult
Copy link
Contributor

@taoliult taoliult commented Oct 28, 2025

This PR updates isJarVerifierInStackTrace() to inspect all thread stacks, resolving the deadlock that occurred in ProfileParser.checkHashValues().

In the failing scenario, one thread (Thread-1) calls SecureRandom.getInstance(), which triggers checkHashValues() in RestrictedSecurity mode. This method attempts to obtain a MessageDigest from provider OpenJCEPlusFIPS, requiring the corresponding module to be loaded. However, another thread (Thread-2) is already holding the loader lock while performing signature verification for a signed class. Thread-1 is then blocked waiting for that module to load, while Thread-2 waits for the MessageDigest initialization to complete, resulting in a circular deadlock.

Previously, isJarVerifierInStackTrace() only checked the current thread’s stack to decide whether to skip check hash. It has now been updated to scan the stack traces of all threads, but except for the VM service threads, allowing it to detect active JAR verification in other non-VM threads and prevent potential cross-thread deadlocks.

@taoliult
Copy link
Contributor Author

I explored alternative approaches other than scanning all threads but couldn’t find any feasible solution. I still believe that scanning all thread stacks is the most reliable way to resolve this deadlock.

The call to parser.checkHashValues() occurs only once, during the static initialization of the Providers class at JVM startup. At that early stage, the JVM typically has only a small number of active threads.

I also added an improvement to skip scanning VM service threads using the following checks:

Thread th = e.getKey();
boolean isVmThread = th.isDaemon() || th.getPriority() > 7;
if (isVmThread) continue; // Skip VM service threads.

In my test, the total number of scanned threads decreased from 18 (all threads) to 3 (after skipping VM service threads).

Additionally, the scan terminates immediately after finding the first matching, so it doesn’t traverse entire stacks unnecessarily.

So, I don’t believe scanning all thread stacks (skip scanning VM service threads) will have a noticeable performance impact compared to checking only the current thread’s stack.

@keithc-ca
Copy link
Member

The call to parser.checkHashValues() occurs only once, during the static initialization of the Providers class at JVM startup. At that early stage, the JVM typically has only a small number of active threads.

This isn't generally true. Yes the call to parser.checkHashValues() will occur at most once, but there's no guarantee that it will be called early so there's no bound on the number of active threads when any of the (possibly many) calls to RestrictedSecurity.checkHashValues(boolean) each of which may call isJarVerifierInStackTrace().

@taoliult
Copy link
Contributor Author

The call to parser.checkHashValues() occurs only once, during the static initialization of the Providers class at JVM startup. At that early stage, the JVM typically has only a small number of active threads.

This isn't generally true. Yes the call to parser.checkHashValues() will occur at most once, but there's no guarantee that it will be called early so there's no bound on the number of active threads when any of the (possibly many) calls to RestrictedSecurity.checkHashValues(boolean) each of which may call isJarVerifierInStackTrace().

We have the following codes in the RestrictedSecurity.java.

    public static void checkHashValues() {
        checkHashValues(true);
    }

    private static void checkHashValues(boolean fromProviders) {
        ProfileParser parser = profileParser;
        if (parser != null) {
            if (fromProviders) {
                enableCheckHashes = true;
            }
            if (enableCheckHashes && !isJarVerifierInStackTrace()) {
                profileParser = null;
                parser.checkHashValues();
            }
        }
    }

The parser.checkHashValues() can only be invoked once per JVM because the static profileParser reference is set to null immediately before the parser.checkHashValues(), ensuring that all subsequent invocations detect profileParser == null and therefore not invoke parser.checkHashValues() again.

@keithc-ca
Copy link
Member

Yes, I agreed that a call to parser.checkHashValues() will occur at most once.

However, you haven't addressed my concern that the number of threads present at the time that call occurs (if it occurs) is unbounded. I wrote a simple test that creates an arbitrary number of threads (e.g. 1000) before triggering loading the security infrastructure: tracing showed that 1019 threads stacks needed to be considered.

Perhaps we need to revisit how we manage the interaction between JarVerifier and RestrictedSecurity.

@taoliult
Copy link
Contributor Author

Yes, I agreed that a call to parser.checkHashValues() will occur at most once.

However, you haven't addressed my concern that the number of threads present at the time that call occurs (if it occurs) is unbounded. I wrote a simple test that creates an arbitrary number of threads (e.g. 1000) before triggering loading the security infrastructure: tracing showed that 1019 threads stacks needed to be considered.

Perhaps we need to revisit how we manage the interaction between JarVerifier and RestrictedSecurity.

To check the potential impact of scanning all thread stacks, I created a simple test program that spawns 1,000 active threads, each with 40 stack frames before invoking the same scanning logic used in isJarVerifierInStackTrace(). Under this scenario, the total scan completed in roughly 11–23 ms using fyre redhat VM (average is 16 ms) with all 1,000 threads traversed.

And also, since Thread.getAllStackTraces() performs an in-memory snapshot without blocking or I/O, and this check runs only once during JVM initialization. So, the scan adds only a few milliseconds, which I don’t think it will have a real impact on performance. Please advise.

@taoliult
Copy link
Contributor Author

taoliult commented Nov 5, 2025

@keithc-ca Any review suggestions or better ideas regarding this issue?

@keithc-ca
Copy link
Member

I was thinking something like this:

    private static void checkHashValues(boolean fromProviders) {
        ProfileParser parser = profileParser;
        if (parser != null) {
            if (fromProviders) {
                enableCheckHashes = true;
            }
            if (enableCheckHashes && (hashPauseCount.get() == 0)) {
                profileParser = null;
                parser.checkHashValues();
            }
        }
    }

    private static final AtomicInteger hashPauseCount = new AtomicInteger(0);

    public static void pauseHashCheck() {
        if (profileParser != null) {
            hashPauseCount.incrementAndGet();
        }
    }

    public static void resumeHashCheck() {
        if (profileParser != null) {
            hashPauseCount.decrementAndGet();
        }
    }

In each method of JarVerifier, that are entry points to starting jar verification, we would add a pair of calls RestrictedSecurity.pauseHashCheck(); and RestrictedSecurity.resumeHashCheck(); (using try / finally as necessary). I count only ten methods in JarVerifier - not all of them will need to change (for example, the constructor doesn't need to change).

@taoliult
Copy link
Contributor Author

taoliult commented Nov 6, 2025

Yes, in our team meeting we discussed a similar approach using a pause counter in JarVerifier to pause the hash check while signature verification is in progress. I’ll work on identifying where this pause check should be added.

@taoliult taoliult force-pushed the Deadlock2 branch 4 times, most recently from 8164e37 to 8759cf4 Compare November 10, 2025 16:08
@taoliult
Copy link
Contributor Author

From the below deadlock stacks:

"MainThread" prio=5 Id=19 WAITING
        at [email protected]/sun.security.util.SignatureFileVerifier.<init>(SignatureFileVerifier.java:124)
        at [email protected]/java.util.jar.JarVerifier.processEntry(JarVerifier.java:311)
        at [email protected]/java.util.jar.JarVerifier.update(JarVerifier.java:242)
        at [email protected]/java.util.jar.JarFile.initializeVerifier(JarFile.java:762)
        at [email protected]/java.util.jar.JarFile.ensureInitialization(JarFile.java:1038)
        - locked java.util.jar.JarFile@f4df9a0f
        at [email protected]/java.util.jar.JavaUtilJarAccessImpl.ensureInitialization(JavaUtilJarAccessImpl.java:72)
……
        at [email protected]/java.lang.ClassLoader.loadClass(ClassLoader.java:1094)
        at app//Deadlock.main(Deadlock.java:62)

"Thread-1" prio=5 Id=20 BLOCKED on java.util.jar.JarFile@f4df9a0f owned by "MainThread" Id=19
        at [email protected]/java.util.zip.ZipFile.getEntry(ZipFile.java:346)
        at [email protected]/java.util.jar.JarFile.getEntry(JarFile.java:518)
        at [email protected]/java.util.jar.JarFile.getJarEntry(JarFile.java:473)
……
        at [email protected]/java.security.MessageDigest.getInstance(MessageDigest.java:248)
        at [email protected]/openj9.internal.security.RestrictedSecurity$ProfileParser.checkHashValues(RestrictedSecurity.java:1700)
        at [email protected]/openj9.internal.security.RestrictedSecurity.checkHashValues(RestrictedSecurity.java:192)
        at [email protected]/openj9.internal.security.RestrictedSecurity.checkHashValues(RestrictedSecurity.java:173)
        at [email protected]/sun.security.jca.Providers.<clinit>(Providers.java:114)
        at [email protected]/sun.security.jca.GetInstance.getInstance(GetInstance.java:156)
        at [email protected]/java.security.SecureRandom.getInstance(SecureRandom.java:406)
        at app//Deadlock.run(Deadlock.java:46)
        at [email protected]/java.lang.Thread.run(Thread.java:853)

MainThread holds the lock on java.util.jar.JarFile@f4df9a0f when entering the synchronized JarFile.ensureInitialization() and then waits for the provider to initialize. Meanwhile, Thread-1 is in checkHashValues() and is blocked waiting for the same java.util.jar.JarFile lock during provider initialization. So, for adding the pause counter, JarFile.ensureInitialization() method should be the correct place for adding it.

Also, I removed the checkHashValues(false); call from RestrictedSecurity.canServiceBeRegistered() based on the following:

  1. The purpose of calling checkHashValues(false); was to ensure that if enableCheckHashes is set to true but the hash check is skipped due to isJarVerifierInStackTrace() returning false, then the hash check would eventually happen later.
  2. However, both isServiceAllowed() and isProviderAllowed() already invoke checkHashValues(false); before checking the services and providers. This guarantees the hash check will occur, so the call in canServiceBeRegistered() is redundant.
  3. Most importantly, since the hash check is now paused during startup when another thread is performing jar-file verification, the hash check will later be triggered by the canServiceBeRegistered() call — and this occurs during provider service registration, when the provider is not yet fully initialized. At that moment, the hash check will throw a “SHA256 No such algorithm” error.

I re-ran the deadlock test after these code changes with my debug output — the output below shows it works as expected.

PAUSE hashPauseCount.get(): 1
PAUSE HashCheck in JarFile
RESUME hashPauseCount.get(): 0
RESUME HashCheck in JarFile
Loading class: com.abc.Tst1
SystemClassLoader: jdk.internal.loader.ClassLoaders$AppClassLoader@3a65be7c
PAUSE hashPauseCount.get(): 1
PAUSE HashCheck in JarFile
checkHashValues hashPauseCount.get(): 1
RESUME hashPauseCount.get(): 0
RESUME HashCheck in JarFile
OK: class com.abc.Tst1
checkHashValues hashPauseCount.get(): 0
getInstance() ok: SecureRandom
    
STATUS:Passed.

@keithc-ca Please review and advise.

Comment on lines 177 to 169
hashPauseCount.incrementAndGet();
}

public static void resumeHashCheck() {
hashPauseCount.decrementAndGet();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the hash check is complete, there's no need to update hashPauseCount:

        if (profileParser != null) {
            hashPauseCount.incrementAndGet();
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the profileParser != null check as suggested, but the Deadlock test still fails for the same reason.

I believe the cause is that when the MainThread enters pauseHashCheck(), the profileParser has not yet finished initializing, so it’s still null and the pause counter isn’t incremented. Meanwhile, Thread-1 enters checkHashValues() just after the profileParser finishes initializing. Since the pause count is still 0, it proceeds to call parser.checkHashValues() without skipping, which eventually triggers the deadlock.

I suggest two possible solutions:

  1. Keep the current way without the profileParser != null check. When not in RestrictedSecurity mode, the pause counter will still be incremented and decremented in JarFile, but hash checking won’t actually occur since profileParser is always null in non-RestrictedSecurity mode.

  2. Add a RestrictedSecurity.isEnabled() check in JarFile before RestrictedSecurity.pauseHashCheck(); and RestrictedSecurity.resumeHashCheck(); to make sure the pause counter won't changed in non-RestrictedSecurity mode.

*/
public static boolean canServiceBeRegistered(Service service) {
if (securityEnabled) {
checkHashValues(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the checkHashValues(false); call from RestrictedSecurity.canServiceBeRegistered() based on the following:

  1. The purpose of calling checkHashValues(false); was to ensure that if enableCheckHashes is set to true but the hash check is skipped due to isJarVerifierInStackTrace() returning false, then the hash check would eventually happen later.
  2. However, both isServiceAllowed() and isProviderAllowed() already invoke checkHashValues(false); before checking the services and providers. This guarantees the hash check will occur, so the call in canServiceBeRegistered() is redundant.
  3. Most importantly, since the hash check is now paused during startup when another thread is performing jar-file verification, the hash check will later be triggered by the canServiceBeRegistered() call — and this occurs during provider service registration, when the provider is not yet fully initialized. At that moment, the hash check will throw a “SHA256 No such algorithm” error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree with your assessment, and it's unrelated to the purpose of this change. Please leave this call alone; it can be proposed separately later, if desired.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it back and tested again — it no longer causes the “SHA256: No such algorithm” error. It’s possible that my previous development or build environment had other code interfering with the result.

if (jv != null && !jvInitialized) {
Object key = beginInit();
try {
RestrictedSecurity.pauseHashCheck();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call should be made before entering the try statement.

I don't expect this is sufficient: I also see uses of JarVerifier in JarInputStream and Manifest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the deadlock test exception stacks:

"MainThread" prio=5 Id=19 WAITING
        at [email protected]/sun.security.util.SignatureFileVerifier.<init>(SignatureFileVerifier.java:124)
        at [email protected]/java.util.jar.JarVerifier.processEntry(JarVerifier.java:311)
        at [email protected]/java.util.jar.JarVerifier.update(JarVerifier.java:242)
        at [email protected]/java.util.jar.JarFile.initializeVerifier(JarFile.java:762)
        at [email protected]/java.util.jar.JarFile.ensureInitialization(JarFile.java:1038)
        - locked java.util.jar.JarFile@f4df9a0f
        at [email protected]/java.util.jar.JavaUtilJarAccessImpl.ensureInitialization(JavaUtilJarAccessImpl.java:72)

The lock is held on java.util.jar.JarFile@f4df9a0f when entering the synchronized JarFile.ensureInitialization() method, so I added the pause counter there.

As for JarInputStream and Manifest, they don’t contain any synchronized methods or blocks, so they don’t directly hold any locks. In the case of JarVerifier, it is initialized from JarFile. Since I’ve already added the pause counter in synchronized method ensureInitialization() before entering the JarVerifier, that should be sufficient. Please advise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't disagree with the change to JarFile, but you haven't convinced me that there are no scenarios involving JarInputStream or Manifest that could have one or more methods of JarVerifier on the call stack that were detected via isJarVerifierInStackTrace() prior to this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the PR that originally introduced isJarVerifierInStackTrace(). It was designed to address a different issue — when a JAR verification process starts before the RestrictedSecurity profile is loaded, the hash calculation is triggered as part of that verification, causing a nested JAR verification and subsequent errors. To prevent this, the hash calculation for a profile was skipped if it was initiated by a JAR verification process and deferred until later in the loading sequence.

Since that issue is unrelated to the current PR, I’ve reintroduced isJarVerifierInStackTrace() to maintain its original behavior.

For this PR, it addresses the deadlock issue. The lock isn’t on JarVerifier — it’s on the JarFile, which is what causes the deadlock. Therefore, I added the pause counter inside the synchronized JarFile.ensureInitialization() method to pause the hash check whenever the JarFile is locked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe isJarVerifierInStackTrace() is necessary. Both problems can be solved by appropriate use of pauseHashCheck() and resumeHashCheck().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since isJarVerifierInStackTrace() addresses a different issue than the deadlock in this PR, I suggest handling its removal in a separate PR, as debugging and verifying isJarVerifierInStackTrace() involve an entirely different problem.

enableCheckHashes = true;
}
if (enableCheckHashes && !isJarVerifierInStackTrace()) {
if (enableCheckHashes && !isJarVerifierInStackTrace() && (hashPauseCount.get() == 0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method isJarVerifierInStackTrace() should be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it back since the isJarVerifierInStackTrace() method has been restored.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isJarVerifierInStackTrace() should not be restored; please use pause/resume instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. The isJarVerifierInStackTrace() fix relates to a different issue. If its removal is needed, I still recommend handling that in a separate PR. This current PR specifically addresses the issue caused by java.util.jar.JarFile and checkHashValues. If you'd like, I can open an issue for tracking and addressing whether isJarVerifierInStackTrace() can be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You haven't convinced me that this addresses all the possible interactions between JarVerifier (possibly on other threads) and RestrictedSecurity. As I pointed out, there are uses of JarVerifier by classes other than JarFile.

Please have another look at #1122 (comment). As I said, there are only ten methods in total defined by that class. I don't think all of them need to use the pause/resume API contemplated here, but even if we start there (modify all ten methods), we don't need to change any users of that class and we don't need isJarVerifierInStackTrace() anymore.

@taoliult taoliult force-pushed the Deadlock2 branch 2 times, most recently from 002bf4a to 24a05d4 Compare November 11, 2025 15:13
@taoliult
Copy link
Contributor Author

@keithc-ca Replied to some review comments and also updated the code — please review.

@keithc-ca
Copy link
Member

Please also update the commit summary line: it should be no more than 70 characters long.

…h check

This commit updates isJarVerifierInStackTrace() to inspect all
thread stacks, resolving the deadlock that occurred in
ProfileParser.checkHashValues().

In the failing scenario, one thread (Thread-1) calls
SecureRandom.getInstance(), which triggers checkHashValues()
in RestrictedSecurity mode. This method attempts to obtain
a MessageDigest from provider OpenJCEPlusFIPS, requiring the
corresponding module to be loaded. However, another thread
(Thread-2) is already holding the loader lock while performing
signature verification for a signed class. Thread-1 is then
blocked waiting for that module to load, while Thread-2 waits
for the MessageDigest initialization to complete, resulting
in a circular deadlock.

Previously, isJarVerifierInStackTrace() only checked the
current thread’s stack to decide whether to skip check hash.
It has now been updated to scan the stack traces of all threads,
but except for the VM service threads, allowing it to detect
active JAR verification in other non-VM threads and prevent
potential cross-thread deadlocks.

Signed-off-by: Tao Liu <[email protected]>
@taoliult
Copy link
Contributor Author

Please also update the commit summary line: it should be no more than 70 characters long.

Updated the commit summary, no more than 70 characters long.

@taoliult taoliult changed the title Fix deadlock in RestrictedSecurity mode caused by JAR verification and checkHashValues Fix deadlock in RestrictedSecurity caused by JAR verification and hash check Nov 14, 2025
Copy link
Member

@keithc-ca keithc-ca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please resolve the merge conflict.

enableCheckHashes = true;
}
if (enableCheckHashes && !isJarVerifierInStackTrace()) {
if (enableCheckHashes && !isJarVerifierInStackTrace() && (hashPauseCount.get() == 0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isJarVerifierInStackTrace() should not be restored; please use pause/resume instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants