
cleanUpGlobalClassValue() causes stuck RUNNABLE threads and 100% CPU under high concurrency #917

@vinstenld

Description

Jenkins and plugins versions report

Jenkins: 2.346.3
OS: Linux - 5.10.x (EKS, containerized)
Java: 11.0.16.1 - Eclipse Adoptium (OpenJDK 64-Bit Server VM)
script-security: 1190.v65867a_a_47126
workflow-cps: 2729.2732.vda_e3f07b_5a_f8
Groovy: 2.4.21
ClassValue implementation: org.codehaus.groovy.reflection.GroovyClassValuePreJava7
Node: m6i.4xlarge (16 vCPU, 64GB RAM)

What Operating System are you using (both controller, and any agents involved in the problem)?

Linux (Amazon EKS, containerized). Controller runs as a single pod on a dedicated m6i.4xlarge node (16 vCPU). Agents run as Kubernetes pods on separate nodes.

Reproduction steps

Run a Jenkins instance with 40+ concurrent pipelines, each with multiple script {} blocks.
In our case, a Matrix-style job (qa-unittestmulticonfig) spawns 20-270 parallel stages, each in its own Kubernetes pod with script {} blocks.
When many pipeline stages complete around the same time, multiple threads call SecureGroovyScript.evaluate() → cleanUpLoader() → cleanUpGlobalClassValue() concurrently.
Threads get stuck in cleanUpGlobalClassValue() (SecureGroovyScript.java line 264 in our installed version) in the RUNNABLE state, consuming CPU indefinitely.
Load average climbs from its normal 5-15 to 100-300+ and never comes down without a JVM restart.
We had to restart the controller 5 times in 4 days because of this issue.

The bug appears to have two components:

a) O(n²) ArrayList removal (related to #898 / PR #910):
The toRemove list is pruned with Iterator.remove(). On an ArrayList, each call shifts every later element left by one, so removing most of n entries one at a time costs O(n²) element moves.
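
A generic sketch of component (a), not the code from SecureGroovyScript or PR #910: the class and method names are hypothetical, and plain integers stand in for the Class<?> entries. It filters the same list twice, once with Iterator.remove() and once with ArrayList.removeIf, which in OpenJDK compacts the backing array in a single pass.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

final class ListFilterSketch {

    // Quadratic pattern: each Iterator.remove() on an ArrayList shifts all
    // later elements left by one, so removing k of n elements costs
    // O(n * k) moves, i.e. O(n^2) when most elements are removed.
    static <T> List<T> keepQuadratic(List<T> input, Predicate<? super T> keep) {
        List<T> list = new ArrayList<>(input);
        Iterator<T> it = list.iterator();
        while (it.hasNext()) {
            if (!keep.test(it.next())) {
                it.remove();
            }
        }
        return list;
    }

    // Linear pattern: ArrayList.removeIf first finds the elements to drop and
    // then compacts the backing array once, so the whole pass is O(n).
    static <T> List<T> keepLinear(List<T> input, Predicate<? super T> keep) {
        List<T> list = new ArrayList<>(input);
        list.removeIf(keep.negate());
        return list;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 30_000; i++) {
            data.add(i);
        }
        Predicate<Integer> keep = n -> n % 100 == 0; // keep 1%, drop 99%

        long t0 = System.nanoTime();
        List<Integer> slow = keepQuadratic(data, keep);
        long t1 = System.nanoTime();
        List<Integer> fast = keepLinear(data, keep);
        long t2 = System.nanoTime();

        System.out.println("same result: " + slow.equals(fast));
        System.out.printf("iterator remove: %d ms, removeIf: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

With 30,000 elements and 99% of them dropped, the iterator version performs on the order of n²/2 element moves, so it should be visibly slower; exact timings depend on the machine.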

b) Unsynchronized concurrent map iteration:

Collection entries = (Collection) groovyClassValuePreJava7Map.getMethod("values").invoke(map);
for (Object entry : entries) { // ← NOT synchronized
    toRemove.add(...);
}
Multiple threads iterate and modify GroovyClassValuePreJava7Map simultaneously. Under high concurrency, the map's internal linked list can form a cycle, causing the for loop to iterate forever.

This only affects the GroovyClassValuePreJava7 code path. We confirmed our Jenkins uses this implementation:

ClassValue implementation: org.codehaus.groovy.reflection.GroovyClassValuePreJava7
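
For component (b), one possible mitigation, assuming (as described above) that concurrent traversal of the shared map is the hazard, is to serialize the cleanup so that only one thread walks the map at a time. The sketch below uses a hypothetical CLEANUP_LOCK and a stand-in cleanUp(Runnable) method; it only demonstrates the locking, not the map pruning itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SerializedCleanupSketch {

    // Hypothetical lock shared by every caller of the cleanup routine, so at
    // most one thread traverses and prunes the shared map at any moment.
    private static final Object CLEANUP_LOCK = new Object();

    // Demo instrumentation: threads currently inside the critical section,
    // and the highest number ever observed there at once.
    private static final AtomicInteger active = new AtomicInteger();
    private static final AtomicInteger maxActive = new AtomicInteger();

    // Hypothetical stand-in for cleanUpGlobalClassValue(loader).
    static void cleanUp(Runnable body) {
        synchronized (CLEANUP_LOCK) {
            body.run();
        }
    }

    // Simulated cleanup work that records concurrency while it runs.
    private static void trackedBody() {
        maxActive.accumulateAndGet(active.incrementAndGet(), Math::max);
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            active.decrementAndGet();
        }
    }

    // Releases `threads` callers at the same moment and returns the peak
    // number of threads that were inside cleanUp() simultaneously.
    static int runConcurrently(int threads) {
        maxActive.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await();
                    cleanUp(SerializedCleanupSketch::trackedBody);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxActive.get();
    }

    public static void main(String[] args) {
        System.out.println("peak threads inside cleanUp: " + runConcurrently(16));
    }
}
```

Running main prints `peak threads inside cleanUp: 1`. Caveats on this design: a lock private to one plugin would not serialize CpsFlowExecution.cleanUpGlobalClassValue in workflow-cps, so both copies of the code would need a common lock; it does not stop Groovy itself from adding entries while a cleanup runs; and while the O(n²) removal from (a) remains, queued callers would wait behind each slow cleanup.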

Expected Results

cleanUpGlobalClassValue() should complete in milliseconds regardless of concurrency. Threads should not get permanently stuck in RUNNABLE state.

Actual Results

A thread dump captured during a load spike (load average 65+ on 16 cores) showed 85 threads stuck in SecureGroovyScript.cleanUpGlobalClassValue, plus 184 threads stuck in the equivalent method of the workflow-cps plugin, CpsFlowExecution.cleanUpGlobalClassValue. One example:

"Handling GET /blue/rest/search/ from ip : Jetty (winstone)-376747"
cpu=988260ms ← burned 16 MINUTES of pure CPU time
elapsed=9112s ← running for 2.5 HOURS
java.lang.Thread.State: RUNNABLE

at SecureGroovyScript.cleanUpGlobalClassValue(SecureGroovyScript.java:264)
at SecureGroovyScript.cleanUpLoader(SecureGroovyScript.java:202)
at SecureGroovyScript.evaluate(SecureGroovyScript.java:446)
at org.biouno.unochoice.model.GroovyScript.eval(GroovyScript.java:179)
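
To check a live controller for this pattern without taking a full thread dump, the standard java.lang.management API can list threads whose stack contains a given method, together with their CPU time. A minimal sketch, assuming it runs inside the affected JVM (the class name StuckThreadFinder is hypothetical); it matches on a class-name suffix so the package prefix can be omitted.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class StuckThreadFinder {

    // Counts live threads whose current stack contains a frame for the given
    // method, printing each match with its state and CPU time in ms
    // (-1 if CPU time measurement is unsupported or disabled).
    static int countThreadsIn(String classSuffix, String methodName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int count = 0;
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue;
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().endsWith(classSuffix)
                        && frame.getMethodName().equals(methodName)) {
                    long cpuNanos = mx.isThreadCpuTimeSupported()
                            ? mx.getThreadCpuTime(info.getThreadId())
                            : -1L;
                    System.out.printf("%s state=%s cpu=%dms%n",
                            info.getThreadName(), info.getThreadState(),
                            cpuNanos < 0 ? -1 : cpuNanos / 1_000_000);
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int stuck = countThreadsIn("SecureGroovyScript", "cleanUpGlobalClassValue");
        System.out.println("threads in cleanUpGlobalClassValue: " + stuck);
    }
}
```

Sampled a few minutes apart, threads that stay inside cleanUpGlobalClassValue while their cpu= figure keeps growing match the symptom in the dump above.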

Anything else?

This appears to be the same root issue as #898 / JENKINS-75200. PR #910 addresses the O(n²) removal, and its author reports a dramatic CPU drop on a Jenkins instance with 10k+ jobs after applying the fix.

However, PR #910 may not fully address the concurrent map corruption issue (the infinite loop from unsynchronized iteration of GroovyClassValuePreJava7Map.values()). A defensive copy before iteration would prevent this:

// Instead of:
Collection entries = (Collection) groovyClassValuePreJava7Map.getMethod("values").invoke(map);

// Use:
Collection entries = new ArrayList<>((Collection) groovyClassValuePreJava7Map.getMethod("values").invoke(map));
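
The pattern behind the proposed defensive copy can be shown with a single-threaded analog. In this sketch a HashMap keyed by Class<?> stands in for GroovyClassValuePreJava7Map (an assumption for illustration; the class and method names are hypothetical): the first method iterates over a copy of the keys and removes entries whose class was defined by a given loader, while the second mutates the map it is iterating and trips HashMap's fail-fast check.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SnapshotRemovalSketch {

    // Removes every key whose class was defined by `loader` (null means the
    // bootstrap loader). The loop walks a private copy of the keys, so
    // map.remove() never modifies the collection being traversed.
    static int removeClassesOf(Map<Class<?>, Object> map, ClassLoader loader) {
        List<Class<?>> snapshot = new ArrayList<>(map.keySet()); // defensive copy
        int removed = 0;
        for (Class<?> klazz : snapshot) {
            if (klazz.getClassLoader() == loader) {
                map.remove(klazz);
                removed++;
            }
        }
        return removed;
    }

    // Same loop over the live key set. With at least two matching keys, the
    // fail-fast iterator detects the first removal on its next step and
    // throws ConcurrentModificationException.
    static boolean removeWithoutCopyThrows(Map<Class<?>, Object> map, ClassLoader loader) {
        try {
            for (Class<?> klazz : map.keySet()) {
                if (klazz.getClassLoader() == loader) {
                    map.remove(klazz);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    static Map<Class<?>, Object> sampleMap() {
        Map<Class<?>, Object> map = new HashMap<>();
        map.put(String.class, "bootstrap");  // defined by the bootstrap loader
        map.put(Integer.class, "bootstrap"); // defined by the bootstrap loader
        map.put(SnapshotRemovalSketch.class, "application");
        return map;
    }

    public static void main(String[] args) {
        Map<Class<?>, Object> map = sampleMap();
        System.out.println("removed with snapshot: " + removeClassesOf(map, null));
        System.out.println("live iteration threw CME: "
                + removeWithoutCopyThrows(sampleMap(), null));
    }
}
```

main prints `removed with snapshot: 2` and `live iteration threw CME: true`. One caveat: the ArrayList copy constructor still traverses the source once (it calls toArray() on it), so the copy keeps the loop's own removals from disturbing the traversal but does not by itself coordinate with other threads modifying the map.
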

The workflow-cps plugin has identical code in CpsFlowExecution.cleanUpGlobalClassValue() and similar open issues (JENKINS-54757, JENKINS-73802).

Issue #898
PR #910 (fix)
JENKINS-75200: https://issues.jenkins.io/browse/JENKINS-75200
Buggy code (line 243 on master): https://github.com/jenkinsci/script-security-plugin/blob/master/src/main/java/org/jenkinsci/plugins/scriptsecurity/sandbox/groovy/SecureGroovyScript.java#L243
workflow-cps related issues (for reference):

JENKINS-54757: jenkinsci/workflow-cps-plugin#1496
JENKINS-73802: jenkinsci/workflow-cps-plugin#1690

Are you interested in contributing a fix?

No response
