Skip to content

get secure boot MCO GCP cluster#80817

Open
ptalgulk01 wants to merge 3 commits into
openshift:mainfrom
ptalgulk01:mco-secure-boot
Open

get secure boot MCO GCP cluster#80817
ptalgulk01 wants to merge 3 commits into
openshift:mainfrom
ptalgulk01:mco-secure-boot

Conversation

@ptalgulk01

@ptalgulk01 ptalgulk01 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

This PR updates OpenShift CI infrastructure to add secure-boot (confidential computing) testing coverage for Machine Config Operator (MCO) on GCP IPI clusters as part of the OpenShift 4.22 private-release nightly jobs.

Key Changes

1. Split GCP secure-boot confidential-computing nightly job into 3 MCO variants

  • Updated ci-operator/config/openshift/openshift-tests-private/openshift-openshift-tests-private-release-4.22__amd64-nightly.yaml:
    • Replaced the single secureboot confidential-compute “mini-perm-f14” job with three new jobs:
      • gcp-ipi-confidential-secureboot-mini-perm-f14-mco
      • gcp-ipi-confidential-secureboot-mini-perm-f14-mco-a
      • gcp-ipi-confidential-secureboot-mini-perm-f14-mco-b
  • All three use the cucushift-installer-rehearse-gcp-ipi-secureboot-confidential-computing workflow and run the openshift-e2e-test-qe chain with:
    • CONFIDENTIAL_COMPUTE: AMDEncryptedVirtualization
    • COMPUTE_CONFIDENTIAL_COMPUTE: AMDEncryptedVirtualizationNestedPaging
    • COMPUTE_NODE_TYPE: n2d-standard-2
    • CONTROL_PLANE_NODE_TYPE: c3d-standard-4
    • COMPUTE_NODE_REPLICAS: "2" (reduced node replicas for the MCO secureboot test)
    • GCP_INSTALL_USE_MINIMAL_PERMISSIONS: "yes"
  • Cron scheduling is staggered to spread load:
    • gcp-ipi-confidential-secureboot-mini-perm-f14-mco: 44 2 2,16 * *
    • ...-mco-a: 47 3 3,17 * *
    • ...-mco-b: 50 4 4,18 * *

2. Added an interactive debug “pause” gate after cluster readiness

  • Updated these test command scripts to wait after the cluster is ready:
    • ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh
    • ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh
  • New behavior:
    • After readiness, the script polls for /tmp/continue.
    • The wait is controlled by DEBUG_WAIT_TIMEOUT (default 10800 seconds / 3 hours).
    • While waiting, it prints elapsed-time status periodically.
    • It proceeds when /tmp/continue is present or when the timeout is reached (allowing manual inspection during rehearsals).

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

A Prow job identifier in the 4.22 amd64-nightly CI config is renamed and expanded by appending -mco, creating three job variants. Both test runner shell scripts gain a blocking debug wait gate after cluster readiness confirmation that polls for /tmp/continue with periodic status messages until the file appears or a timeout is reached.

Changes

Job Configuration and Debug Wait Gates

Layer / File(s) Summary
GCP confidential secureboot job configuration
ci-operator/config/openshift/openshift-tests-private/openshift-openshift-tests-private-release-4.22__amd64-nightly.yaml
The gcp-ipi-confidential-secureboot-mini-perm-f14 job is replaced by three variants: gcp-ipi-confidential-secureboot-mini-perm-f14-mco, -mco-a, and -mco-b, each with updated GCP env configuration including instance sizing and replica-related variables.
Debug wait gates in test runner scripts
ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh, ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh
Both scripts add an identical blocking gate after cluster readiness: sets DEBUG_WAIT_TIMEOUT (default 10800 seconds), prints instructions for creating /tmp/continue, polls every 10 seconds, logs elapsed time every 300 seconds, and proceeds once the file is detected or timeout is reached.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

lgtm, approved, rehearsals-ack

Suggested reviewers

  • memodi
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (14 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding secure boot MCO (Machine Config Operator) support for GCP cluster testing, which is the primary purpose of the three new periodic test jobs.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR does not modify any Ginkgo test definitions. Changes are limited to YAML CI config and shell scripts, which don't contain Ginkgo test names.
Test Structure And Quality ✅ Passed This PR contains no Ginkgo test code—only YAML configuration and bash scripts. The custom check applies only to Go/Ginkgo tests, making it not applicable to this PR.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests are added in this PR. Changes consist of CI job configuration updates (YAML) and shell script modifications for debug wait functionality, which are not subject to the MicroS...
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. Changes are limited to CI configuration (YAML) and shell scripts, not Go test code.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only CI/test infrastructure (job configs and test scripts), not deployment manifests, operator code, or controllers. Topology-aware scheduling check does not apply.
Ote Binary Stdout Contract ✅ Passed This PR modifies only YAML configuration and bash shell scripts. The OTE Binary Stdout Contract check applies exclusively to Go binaries with process-level code (main, init, etc.), not shell script...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests added. PR modifies CI config files and shell scripts only, not test code with IPv4/connectivity assumptions.
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons found. PR adds CI/CD job configs and debug wait l...
Container-Privileges ✅ Passed No container privilege escalation configurations (privileged: true, hostPID, hostNetwork, hostIPC, SYS_ADMIN, allowPrivilegeEscalation) were found in the PR changes.
No-Sensitive-Data-In-Logs ✅ Passed The PR adds debug wait functionality that logs only non-sensitive data: numeric timeout values (DEBUG_WAIT_TIMEOUT in seconds), elapsed time counters, and instructional text. The KUBECONFIG path is...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ptalgulk01
Once this PR has been reviewed and has the lgtm label, please assign memodi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested review from jechen0648 and memodi June 22, 2026 06:00

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`:
- Around line 7-8: The export statement at line 7 prepends the writable /tmp
directory to PATH before trusted system paths, creating a security vulnerability
for command shadowing. Change the PATH export statement to append /tmp to the
end of the PATH variable instead (use $PATH:/tmp/) so that trusted directories
are searched first before the writable /tmp directory.
- Around line 10-14: The unconditional debug gate using the while loop that
waits for the `/tmp/continue` file can cause CI runs to deadlock indefinitely.
To fix this, wrap the entire while loop block (the echo statement and the
do-done loop checking for `/tmp/continue`) with a conditional check for an
environment variable to make the debug gate opt-in (e.g., check if DEBUG_MODE or
DEBUG_GATE is set). Additionally, add a bounded timeout mechanism to prevent
infinite waiting—either count iterations of the sleep command or use a timeout
wrapper so that if the debug flag is enabled but the continue file is never
created, the script will proceed after a reasonable duration rather than hanging
until the job times out. This ensures regular CI runs proceed normally while
allowing debugging only when explicitly enabled with a safety timeout.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`:
- Around line 164-165: The export statement on line 164 prepends /tmp to PATH,
creating a command-hijacking vulnerability. Remove the export PATH=/tmp/:$PATH
line entirely, or if /tmp is genuinely needed, append it to the end of PATH
instead of prepending it (export PATH=$PATH:/tmp). This ensures system paths are
searched first before the world-writable /tmp directory when resolving the
extended-platform-tests command in the which call on line 165.
- Around line 167-171: The debug wait loop checking for /tmp/continue is
unconditional and will block indefinitely in standard test runs until timeout.
Gate the entire block containing the echo "WAITING FOR DEBUG..." message and the
while loop with sleep 10 behind a debug environment variable check, and add a
maximum wait duration (such as a counter that limits iterations or a timeout
mechanism) to prevent indefinite blocking. This ensures the wait loop only
activates when debugging is explicitly enabled and will fail fast after a
reasonable timeout instead of stalling the job.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 35e03e43-12f3-4702-9e25-33a9e6528714

📥 Commits

Reviewing files that changed from the base of the PR and between a9c97b9 and 095775d.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/openshift-tests-private/openshift-openshift-tests-private-release-4.22-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (3)
  • ci-operator/config/openshift/openshift-tests-private/openshift-openshift-tests-private-release-4.22__amd64-nightly.yaml
  • ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh
  • ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh

Comment on lines +10 to +14
echo "WAITING FOR DEBUG..."
while [ ! -f "/tmp/continue" ]
do
sleep 10
done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Unconditional debug gate can deadlock longduration jobs.

Lines [10]-[14] block forever unless /tmp/continue is manually created, so regular CI runs can hang until timeout. Make this gate opt-in (env flag) and add a bounded timeout.

Suggested fix
-echo "WAITING FOR DEBUG..."
-while [ ! -f "/tmp/continue" ]
-do
-    sleep 10
-done
+if [[ "${DEBUG_HOLD:-false}" == "true" ]]; then
+  echo "WAITING FOR DEBUG (max ${DEBUG_HOLD_TIMEOUT_SECONDS:-900}s)..."
+  timeout "${DEBUG_HOLD_TIMEOUT_SECONDS:-900}" bash -c 'until [[ -f /tmp/continue ]]; do sleep 10; done'
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "WAITING FOR DEBUG..."
while [ ! -f "/tmp/continue" ]
do
sleep 10
done
if [[ "${DEBUG_HOLD:-false}" == "true" ]]; then
echo "WAITING FOR DEBUG (max ${DEBUG_HOLD_TIMEOUT_SECONDS:-900}s)..."
timeout "${DEBUG_HOLD_TIMEOUT_SECONDS:-900}" bash -c 'until [[ -f /tmp/continue ]]; do sleep 10; done'
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`
around lines 10 - 14, The unconditional debug gate using the while loop that
waits for the `/tmp/continue` file can cause CI runs to deadlock indefinitely.
To fix this, wrap the entire while loop block (the echo statement and the
do-done loop checking for `/tmp/continue`) with a conditional check for an
environment variable to make the debug gate opt-in (e.g., check if DEBUG_MODE or
DEBUG_GATE is set). Additionally, add a bounded timeout mechanism to prevent
infinite waiting—either count iterations of the sleep command or use a timeout
wrapper so that if the debug flag is enabled but the continue file is never
created, the script will proceed after a reasonable duration rather than hanging
until the job times out. This ensures regular CI runs proceed normally while
allowing debugging only when explicitly enabled with a safety timeout.

Comment on lines +167 to +171
echo "WAITING FOR DEBUG..."
while [ ! -f "/tmp/continue" ]
do
sleep 10
done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Debug wait loop is a blocking failure path in standard test runs.

Lines [167]-[171] create an unconditional wait on /tmp/continue, which can stall jobs until timeout. Gate this behind a debug env var and enforce a max wait duration.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`
around lines 167 - 171, The debug wait loop checking for /tmp/continue is
unconditional and will block indefinitely in standard test runs until timeout.
Gate the entire block containing the echo "WAITING FOR DEBUG..." message and the
while loop with sleep 10 behind a debug environment variable check, and add a
maximum wait duration (such as a counter that limits iterations or a timeout
mechanism) to prevent indefinite blocking. This ensures the wait loop only
activates when debugging is explicitly enabled and will fail fast after a
reasonable timeout instead of stalling the job.

@ptalgulk01

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ptalgulk01: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@ptalgulk01

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ptalgulk01: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@ptalgulk01

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ptalgulk01: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (4)
ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh (2)

7-7: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid prioritizing writable /tmp in PATH.

Line 7 puts /tmp before trusted system paths, which enables command shadowing risk in CI pods.

Suggested fix
-export PATH=/tmp/:$PATH
+export PATH="${PATH}:/tmp"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`
at line 7, The export PATH statement on line 7 prepends `/tmp/` to the PATH,
creating a command shadowing security risk by allowing malicious scripts in /tmp
to override legitimate system commands. Remove the `/tmp/:` prefix from the PATH
export or, if /tmp binaries are needed, append `/tmp` to the end of PATH using
`export PATH=$PATH:/tmp` instead to preserve the priority of trusted system
paths.

10-35: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Gate the debug wait so normal runs don’t stall by default.

This block is always active and can delay every run up to 3 hours before tests start. Make it opt-in and keep the timeout only for debug sessions.

Suggested fix
-# Debug wait with timeout (default 3 hours)
-DEBUG_WAIT_TIMEOUT="${DEBUG_WAIT_TIMEOUT:-10800}"
-echo "=========================================="
-echo "DEBUG WAIT ACTIVE - Cluster ready for debugging"
-echo "=========================================="
-echo "Timeout: ${DEBUG_WAIT_TIMEOUT} seconds"
-echo "To continue tests, create the signal file:"
-echo "  oc rsh <test-pod>"
-echo "  touch /tmp/continue"
-echo "=========================================="
-
-elapsed=0
-while [ ! -f "/tmp/continue" ] && [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]
-do
-    sleep 10
-    elapsed=$((elapsed + 10))
-    if [ $((elapsed % 300)) -eq 0 ]; then
-        echo "Debug wait: $elapsed / $DEBUG_WAIT_TIMEOUT seconds elapsed..."
-    fi
-done
-
-if [ -f "/tmp/continue" ]; then
-    echo "Continue signal received. Proceeding with tests..."
-else
-    echo "Debug wait timeout reached. Proceeding with tests..."
-fi
+if [[ "${DEBUG_WAIT_ENABLED:-false}" == "true" ]]; then
+    DEBUG_WAIT_TIMEOUT="${DEBUG_WAIT_TIMEOUT:-10800}"
+    echo "=========================================="
+    echo "DEBUG WAIT ACTIVE - Cluster ready for debugging"
+    echo "=========================================="
+    echo "Timeout: ${DEBUG_WAIT_TIMEOUT} seconds"
+    echo "To continue tests, create the signal file:"
+    echo "  oc rsh <test-pod>"
+    echo "  touch /tmp/continue"
+    echo "=========================================="
+
+    elapsed=0
+    while [[ ! -f "/tmp/continue" ]] && (( elapsed < DEBUG_WAIT_TIMEOUT )); do
+        sleep 10
+        elapsed=$((elapsed + 10))
+        if (( elapsed % 300 == 0 )); then
+            echo "Debug wait: $elapsed / $DEBUG_WAIT_TIMEOUT seconds elapsed..."
+        fi
+    done
+
+    if [[ -f "/tmp/continue" ]]; then
+        echo "Continue signal received. Proceeding with tests..."
+    else
+        echo "Debug wait timeout reached. Proceeding with tests..."
+    fi
+fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`
around lines 10 - 35, The debug wait block starting with the echo statement
about DEBUG WAIT ACTIVE is currently always executed, which can stall normal
test runs for up to 3 hours. Wrap the entire debug wait block (including all
echo statements, the while loop that checks for /tmp/continue, and the final
if-else condition) in a conditional check that only executes this code when an
opt-in environment variable is explicitly set (such as DEBUG_WAIT=true). This
ensures the debug wait mechanism only activates during intentional debug
sessions while allowing normal runs to proceed without delays.
ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh (2)

167-192: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make debug wait opt-in instead of always-on.

The wait gate currently runs for every invocation and can delay regular test jobs by up to 10800 seconds.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`
around lines 167 - 192, The debug wait block unconditionally executes and waits
for the continue signal or timeout, which delays regular test jobs. To make this
opt-in, wrap the entire debug wait logic (from the DEBUG_WAIT_TIMEOUT variable
assignment through the final if-else block checking for the continue signal) in
a conditional that only executes when an opt-in environment variable is
explicitly set to enable debugging. Add a check at the start of this block such
that the debug wait only activates when a variable like DEBUG_MODE or similar is
explicitly enabled, allowing regular test runs to skip this waiting period
entirely by default.

164-164: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not prepend /tmp ahead of trusted binary paths.

Line 164 increases command-hijack risk by searching writable /tmp first.

Suggested fix
-export PATH=/tmp/:$PATH
+export PATH="${PATH}:/tmp"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`
at line 164, Remove `/tmp` from being prepended to the PATH in the export
statement at line 164. The current `export PATH=/tmp/:$PATH` creates a command
hijacking vulnerability by allowing any executable in the world-writable `/tmp`
directory to be executed before trusted system binaries. Either remove this line
entirely if `/tmp` is not needed in the PATH, or if absolutely necessary, append
it to the end of PATH instead using `export PATH=$PATH:/tmp/` to keep trusted
system binary paths first in the search order.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`:
- Line 22: The while loop condition on line 22 uses the test bracket [ ] for
numeric comparison with the environment-derived variable $DEBUG_WAIT_TIMEOUT,
which can fail if the variable contains malformed or non-numeric values. Replace
the numeric comparison portion using arithmetic evaluation (( ... )) instead of
[ ] brackets, as arithmetic context is safer and more robust for numeric
comparisons with potentially invalid input. Keep the file existence check as-is
and combine both conditions appropriately in the while loop.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`:
- Line 179: The while loop condition uses the test command syntax [ $elapsed -lt
$DEBUG_WAIT_TIMEOUT ] which triggers shellcheck SC2086 concerns. Replace this
conditional test with bash arithmetic comparison syntax (( elapsed <
DEBUG_WAIT_TIMEOUT )) instead. Update line 179 to use arithmetic comparison
operators within double parentheses, removing the square brackets and changing
the comparison operator from -lt to <. This is safer and avoids variable
expansion issues with env-derived input.

---

Duplicate comments:
In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`:
- Line 7: The export PATH statement on line 7 prepends `/tmp/` to the PATH,
creating a command shadowing security risk by allowing malicious scripts in /tmp
to override legitimate system commands. Remove the `/tmp/:` prefix from the PATH
export or, if /tmp binaries are needed, append `/tmp` to the end of PATH using
`export PATH=$PATH:/tmp` instead to preserve the priority of trusted system
paths.
- Around line 10-35: The debug wait block starting with the echo statement about
DEBUG WAIT ACTIVE is currently always executed, which can stall normal test runs
for up to 3 hours. Wrap the entire debug wait block (including all echo
statements, the while loop that checks for /tmp/continue, and the final if-else
condition) in a conditional check that only executes this code when an opt-in
environment variable is explicitly set (such as DEBUG_WAIT=true). This ensures
the debug wait mechanism only activates during intentional debug sessions while
allowing normal runs to proceed without delays.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`:
- Around line 167-192: The debug wait block unconditionally executes and waits
for the continue signal or timeout, which delays regular test jobs. To make this
opt-in, wrap the entire debug wait logic (from the DEBUG_WAIT_TIMEOUT variable
assignment through the final if-else block checking for the continue signal) in
a conditional that only executes when an opt-in environment variable is
explicitly set to enable debugging. Add a check at the start of this block such
that the debug wait only activates when a variable like DEBUG_MODE or similar is
explicitly enabled, allowing regular test runs to skip this waiting period
entirely by default.
- Line 164: Remove `/tmp` from being prepended to the PATH in the export
statement at line 164. The current `export PATH=/tmp/:$PATH` creates a command
hijacking vulnerability by allowing any executable in the world-writable `/tmp`
directory to be executed before trusted system binaries. Either remove this line
entirely if `/tmp` is not needed in the PATH, or if absolutely necessary, append
it to the end of PATH instead using `export PATH=$PATH:/tmp/` to keep trusted
system binary paths first in the search order.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 6b79a8d2-526c-4aca-bab5-0f5b2277a875

📥 Commits

Reviewing files that changed from the base of the PR and between 4b933b6 and 903d2fc.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/openshift-tests-private/openshift-openshift-tests-private-release-4.22-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (3)
  • ci-operator/config/openshift/openshift-tests-private/openshift-openshift-tests-private-release-4.22__amd64-nightly.yaml
  • ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh
  • ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • ci-operator/config/openshift/openshift-tests-private/openshift-openshift-tests-private-release-4.22__amd64-nightly.yaml

echo "=========================================="

elapsed=0
while [ ! -f "/tmp/continue" ] && [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Harden timeout comparison against malformed env input.

Line 22 uses an unquoted env-derived numeric value in [; invalid values can break the condition. Prefer arithmetic (( ... )) (or validate integer first).

🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 22-22: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`
at line 22, The while loop condition on line 22 uses the test bracket [ ] for
numeric comparison with the environment-derived variable $DEBUG_WAIT_TIMEOUT,
which can fail if the variable contains malformed or non-numeric values. Replace
the numeric comparison portion using arithmetic evaluation (( ... )) instead of
[ ] brackets, as arithmetic context is safer and more robust for numeric
comparisons with potentially invalid input. Keep the file existence check as-is
and combine both conditions appropriately in the while loop.

Source: Linters/SAST tools

echo "=========================================="

elapsed=0
while [ ! -f "/tmp/continue" ] && [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use arithmetic comparison for DEBUG_WAIT_TIMEOUT.

Line 179 should avoid [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ] with env-derived input; (( elapsed < DEBUG_WAIT_TIMEOUT )) is safer and avoids shellcheck SC2086 concerns.

🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 179-179: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`
at line 179, The while loop condition uses the test command syntax [ $elapsed
-lt $DEBUG_WAIT_TIMEOUT ] which triggers shellcheck SC2086 concerns. Replace
this conditional test with bash arithmetic comparison syntax (( elapsed <
DEBUG_WAIT_TIMEOUT )) instead. Update line 179 to use arithmetic comparison
operators within double parentheses, removing the square brackets and changing
the comparison operator from -lt to <. This is safer and avoids variable
expansion issues with env-derived input.

Source: Linters/SAST tools

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ptalgulk01, pj-rehearse: unable to set up jobs ERROR:

failed resolve ReleaseBuildConfiguration: Failed resolve MultiStageTestConfiguration: test/gcp-ipi-confidential-secureboot-mini-perm-f14-mco: workflow/cucushift-installer-rehearse-gcp-ipi-secureboot-confidential-computing: parameter "DEBUG_WAIT_TIMEOUT" is overridden in [test/gcp-ipi-confidential-secureboot-mini-perm-f14-mco] but not declared in any step

If the problem persists, please contact Test Platform.

The script already has DEBUG_WAIT_TIMEOUT with a default value of 10800 (3 hours).
The timeout parameter needs to be declared in the workflow, not the job config.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@ptalgulk01

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ptalgulk01: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

- Created 3 identical jobs: mco, mco-a, mco-b with staggered schedules
- Reduced COMPUTE_NODE_REPLICAS to 2 (avoid zone provisioning issues)
- Added debug wait in openshift-extended-test step (3hr timeout)
- Debug wait activates AFTER cluster ready, BEFORE tests start
- Signal file /tmp/continue to skip wait early

Jobs schedule:
- mco: 2nd & 16th at 2:44 AM
- mco-a: 3rd & 17th at 3:47 AM
- mco-b: 4th & 18th at 4:50 AM

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

[REHEARSALNOTIFIER]
@ptalgulk01: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name Repo Type Reason
pull-ci-openshift-aws-ebs-csi-driver-operator-release-4.14-e2e-aws-csi-extended openshift/aws-ebs-csi-driver-operator presubmit Registry content changed
pull-ci-openshift-cloud-provider-aws-main-regression-clusterinfra-aws-ipi-ccm openshift/cloud-provider-aws presubmit Registry content changed
pull-ci-openshift-cloud-provider-aws-release-5.1-regression-clusterinfra-aws-ipi-ccm openshift/cloud-provider-aws presubmit Registry content changed
pull-ci-openshift-cloud-provider-aws-release-5.0-regression-clusterinfra-aws-ipi-ccm openshift/cloud-provider-aws presubmit Registry content changed
pull-ci-openshift-cloud-provider-aws-release-4.23-regression-clusterinfra-aws-ipi-ccm openshift/cloud-provider-aws presubmit Registry content changed
pull-ci-openshift-cloud-provider-aws-release-4.22-regression-clusterinfra-aws-ipi-ccm openshift/cloud-provider-aws presubmit Registry content changed
pull-ci-openshift-cloud-provider-aws-release-4.21-regression-clusterinfra-aws-ipi-ccm openshift/cloud-provider-aws presubmit Registry content changed
pull-ci-openshift-cloud-provider-aws-release-4.20-regression-clusterinfra-aws-ipi-ccm openshift/cloud-provider-aws presubmit Registry content changed
pull-ci-openshift-cloud-provider-vsphere-main-regression-clusterinfra-vsphere-ipi-ccm openshift/cloud-provider-vsphere presubmit Registry content changed
pull-ci-openshift-cloud-provider-vsphere-release-5.1-regression-clusterinfra-vsphere-ipi-ccm openshift/cloud-provider-vsphere presubmit Registry content changed
pull-ci-openshift-cloud-provider-vsphere-release-5.0-regression-clusterinfra-vsphere-ipi-ccm openshift/cloud-provider-vsphere presubmit Registry content changed
pull-ci-openshift-cloud-provider-vsphere-release-4.23-regression-clusterinfra-vsphere-ipi-ccm openshift/cloud-provider-vsphere presubmit Registry content changed
pull-ci-openshift-cloud-provider-vsphere-release-4.22-regression-clusterinfra-vsphere-ipi-ccm openshift/cloud-provider-vsphere presubmit Registry content changed
pull-ci-openshift-cloud-provider-vsphere-release-4.21-regression-clusterinfra-vsphere-ipi-ccm openshift/cloud-provider-vsphere presubmit Registry content changed
pull-ci-openshift-cloud-provider-vsphere-release-4.20-regression-clusterinfra-vsphere-ipi-ccm openshift/cloud-provider-vsphere presubmit Registry content changed
pull-ci-openshift-cloud-provider-vsphere-release-4.19-regression-clusterinfra-vsphere-ipi-ccm openshift/cloud-provider-vsphere presubmit Registry content changed
pull-ci-openshift-cloud-provider-vsphere-release-4.18-regression-clusterinfra-vsphere-ipi-ccm openshift/cloud-provider-vsphere presubmit Registry content changed
pull-ci-openshift-cloud-provider-vsphere-release-4.17-regression-clusterinfra-vsphere-ipi-ccm openshift/cloud-provider-vsphere presubmit Registry content changed
pull-ci-openshift-cloud-provider-azure-main-regression-clusterinfra-azure-ipi-ccm openshift/cloud-provider-azure presubmit Registry content changed
pull-ci-openshift-cloud-provider-azure-release-5.1-regression-clusterinfra-azure-ipi-ccm openshift/cloud-provider-azure presubmit Registry content changed
pull-ci-openshift-cloud-provider-azure-release-5.0-regression-clusterinfra-azure-ipi-ccm openshift/cloud-provider-azure presubmit Registry content changed
pull-ci-openshift-cloud-provider-azure-release-4.23-regression-clusterinfra-azure-ipi-ccm openshift/cloud-provider-azure presubmit Registry content changed
pull-ci-openshift-cloud-provider-azure-release-4.22-regression-clusterinfra-azure-ipi-ccm openshift/cloud-provider-azure presubmit Registry content changed
pull-ci-openshift-cloud-provider-azure-release-4.21-regression-clusterinfra-azure-ipi-ccm openshift/cloud-provider-azure presubmit Registry content changed
pull-ci-openshift-cloud-provider-azure-release-4.20-regression-clusterinfra-azure-ipi-ccm openshift/cloud-provider-azure presubmit Registry content changed

A total of 4481 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@ptalgulk01

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco-a periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco-b

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ptalgulk01: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ptalgulk01: job(s): either don't exist or were not found to be affected, and cannot be rehearsed

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh (1)

274-274: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Style: Unnecessary $/{} in arithmetic context.

Same shellcheck SC2004 style warnings as the primary script.

✨ Optional style fix
-echo "Timeout: ${DEBUG_WAIT_TIMEOUT} seconds ($(($DEBUG_WAIT_TIMEOUT / 3600)) hours)"
+echo "Timeout: ${DEBUG_WAIT_TIMEOUT} seconds ($((DEBUG_WAIT_TIMEOUT / 3600)) hours)"
-    echo "Debug wait timeout reached after $(($DEBUG_WAIT_TIMEOUT / 3600)) hours. Proceeding with tests..."
+    echo "Debug wait timeout reached after $((DEBUG_WAIT_TIMEOUT / 3600)) hours. Proceeding with tests..."

Also applies to: 301-301

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`
at line 274, The arithmetic expansion contains unnecessary dollar sign syntax
triggering SC2004 shellcheck warning. In the echo statement at line 274 and its
equivalent at line 301, remove the dollar sign prefix before DEBUG_WAIT_TIMEOUT
within the arithmetic expansion context. Inside $((...)), variable references do
not require the $ prefix, so change $(($DEBUG_WAIT_TIMEOUT / 3600)) to use just
the variable name directly in the arithmetic expression.
ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh (1)

280-280: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Style: Unnecessary $/{} in arithmetic context.

Lines 280 and 307 use $(($DEBUG_WAIT_TIMEOUT / 3600)) which works but is stylistically inconsistent. ShellCheck recommends removing the $ prefix inside $((...)).

✨ Optional style fix
-echo "Timeout: ${DEBUG_WAIT_TIMEOUT} seconds ($(($DEBUG_WAIT_TIMEOUT / 3600)) hours)"
+echo "Timeout: ${DEBUG_WAIT_TIMEOUT} seconds ($((DEBUG_WAIT_TIMEOUT / 3600)) hours)"
-    echo "Debug wait timeout reached after $(($DEBUG_WAIT_TIMEOUT / 3600)) hours. Proceeding with tests..."
+    echo "Debug wait timeout reached after $((DEBUG_WAIT_TIMEOUT / 3600)) hours. Proceeding with tests..."

Also applies to: 307-307

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`
at line 280, The arithmetic expansion expressions on line 280 and line 307 use
an unnecessary `$` prefix before the variable name inside the `$((...))` syntax.
Remove the `$` prefix from `$DEBUG_WAIT_TIMEOUT` inside the arithmetic
expansion, changing `$(($DEBUG_WAIT_TIMEOUT / 3600))` to `$((DEBUG_WAIT_TIMEOUT
/ 3600))` on both occurrences to follow shell style conventions where variable
prefixes are not needed within arithmetic expansion contexts.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`:
- Line 287: In the while loop condition on the line with the timeout check,
replace the test bracket expression with unquoted variables for numeric
comparison with arithmetic evaluation syntax. Specifically, change the numeric
comparison of the elapsed variable against DEBUG_WAIT_TIMEOUT from using test
brackets `[ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]` to using arithmetic evaluation
`(( elapsed < DEBUG_WAIT_TIMEOUT ))` to ensure the comparison is hardened
against malformed input.
- Around line 268-303: The DEBUG_WAIT_TIMEOUT variable in the debug wait logic
is defaulting to 10800 seconds (3 hours), which breaks backward compatibility
for all jobs using this shared step. Change the default value in the parameter
expansion from 10800 to 0 in the line
`DEBUG_WAIT_TIMEOUT="${DEBUG_WAIT_TIMEOUT:-10800}"` so that no debug wait occurs
by default. MCO job-specific configurations can then explicitly set
DEBUG_WAIT_TIMEOUT to 10800 in their environment configuration if they need the
3-hour debug wait behavior.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`:
- Line 293: The numeric comparison using the test operator [ $elapsed -lt
$DEBUG_WAIT_TIMEOUT ] is vulnerable to failures with non-numeric input in the
variables. Replace the test operator expression with arithmetic evaluation using
double parentheses (( $elapsed < $DEBUG_WAIT_TIMEOUT )) in the while loop
condition. This provides safer and more robust numeric comparison in bash
without requiring variable quoting and handles edge cases with malformed input
more gracefully.
- Around line 274-309: The DEBUG_WAIT_TIMEOUT variable defaults to 10800
seconds, causing all jobs using this shared step-registry script to block for up
to 3 hours and breaking backward compatibility across the CI infrastructure.
Change the default value of DEBUG_WAIT_TIMEOUT from 10800 to 0 to maintain
backward compatibility, ensuring the debug wait only activates when explicitly
enabled. This allows existing jobs to proceed normally without modification
while new jobs or specific test configurations can explicitly set
DEBUG_WAIT_TIMEOUT to 10800 in their environment configuration to enable the
debug wait behavior.

---

Nitpick comments:
In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`:
- Line 274: The arithmetic expansion contains unnecessary dollar sign syntax
triggering SC2004 shellcheck warning. In the echo statement at line 274 and its
equivalent at line 301, remove the dollar sign prefix before DEBUG_WAIT_TIMEOUT
within the arithmetic expansion context. Inside $((...)), variable references do
not require the $ prefix, so change $(($DEBUG_WAIT_TIMEOUT / 3600)) to use just
the variable name directly in the arithmetic expression.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`:
- Line 280: The arithmetic expansion expressions on line 280 and line 307 use an
unnecessary `$` prefix before the variable name inside the `$((...))` syntax.
Remove the `$` prefix from `$DEBUG_WAIT_TIMEOUT` inside the arithmetic
expansion, changing `$(($DEBUG_WAIT_TIMEOUT / 3600))` to `$((DEBUG_WAIT_TIMEOUT
/ 3600))` on both occurrences to follow shell style conventions where variable
prefixes are not needed within arithmetic expansion contexts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 4c513e07-2203-4ff6-9d96-4fb50b0cea62

📥 Commits

Reviewing files that changed from the base of the PR and between 2f049e9 and 9a94d8b.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/openshift-tests-private/openshift-openshift-tests-private-release-4.22-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (3)
  • ci-operator/config/openshift/openshift-tests-private/openshift-openshift-tests-private-release-4.22__amd64-nightly.yaml
  • ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh
  • ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh

Comment on lines +268 to +303
# Debug wait with timeout (default 3 hours) - placed AFTER cluster is ready
DEBUG_WAIT_TIMEOUT="${DEBUG_WAIT_TIMEOUT:-10800}"
echo "=========================================="
echo "DEBUG WAIT ACTIVE - Cluster ready for debugging"
echo "=========================================="
echo "Cluster is fully provisioned and ready!"
echo "Timeout: ${DEBUG_WAIT_TIMEOUT} seconds ($(($DEBUG_WAIT_TIMEOUT / 3600)) hours)"
echo ""
echo "To access the cluster:"
echo " 1. Find this pod in Prow job logs"
echo " 2. oc rsh -n test-credentials <pod-name>"
echo " 3. export KUBECONFIG=\${SHARED_DIR}/kubeconfig"
echo " 4. oc get nodes"
echo ""
echo "To continue tests, create the signal file:"
echo " touch /tmp/continue"
echo "=========================================="

elapsed=0
while [ ! -f "/tmp/continue" ] && [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]
do
sleep 10
elapsed=$((elapsed + 10))
if [ $((elapsed % 300)) -eq 0 ]; then
hours=$((elapsed / 3600))
minutes=$(((elapsed % 3600) / 60))
echo "Debug wait: ${hours}h ${minutes}m elapsed (${elapsed}s / ${DEBUG_WAIT_TIMEOUT}s)..."
fi
done

if [ -f "/tmp/continue" ]; then
echo "Continue signal received. Proceeding with tests..."
else
echo "Debug wait timeout reached after $(($DEBUG_WAIT_TIMEOUT / 3600)) hours. Proceeding with tests..."
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Unconditional debug wait breaks all longduration jobs using this shared step.

Identical issue to the primary test script: this shared step-registry script now has a 3-hour default wait that affects all jobs using the longduration test chain, not just the new MCO jobs. This breaks backward compatibility.

🔧 Recommended fix: Default to zero
-DEBUG_WAIT_TIMEOUT="${DEBUG_WAIT_TIMEOUT:-10800}"
+DEBUG_WAIT_TIMEOUT="${DEBUG_WAIT_TIMEOUT:-0}"
+
+if [ "$DEBUG_WAIT_TIMEOUT" -eq 0 ]; then
+    echo "Debug wait disabled (DEBUG_WAIT_TIMEOUT=0)"
+else
 echo "=========================================="

Then set DEBUG_WAIT_TIMEOUT: "10800" explicitly in your MCO job env config.

🧰 Tools
🪛 Shellcheck (0.11.0)

[style] 274-274: $/${} is unnecessary on arithmetic variables.

(SC2004)


[info] 287-287: Double quote to prevent globbing and word splitting.

(SC2086)


[style] 301-301: $/${} is unnecessary on arithmetic variables.

(SC2004)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`
around lines 268 - 303, The DEBUG_WAIT_TIMEOUT variable in the debug wait logic
is defaulting to 10800 seconds (3 hours), which breaks backward compatibility
for all jobs using this shared step. Change the default value in the parameter
expansion from 10800 to 0 in the line
`DEBUG_WAIT_TIMEOUT="${DEBUG_WAIT_TIMEOUT:-10800}"` so that no debug wait occurs
by default. MCO job-specific configurations can then explicitly set
DEBUG_WAIT_TIMEOUT to 10800 in their environment configuration if they need the
3-hour debug wait behavior.

echo "=========================================="

elapsed=0
while [ ! -f "/tmp/continue" ] && [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Harden numeric comparison against malformed input.

Same issue as the primary script: use arithmetic evaluation (( ... )) instead of test brackets with unquoted variable.

🔧 Recommended fix
-while [ ! -f "/tmp/continue" ] && [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]
+while [ ! -f "/tmp/continue" ] && (( elapsed < DEBUG_WAIT_TIMEOUT ))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while [ ! -f "/tmp/continue" ] && [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]
while [ ! -f "/tmp/continue" ] && (( elapsed < DEBUG_WAIT_TIMEOUT ))
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 287-287: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/longduration/openshift-extended-test-longduration-commands.sh`
at line 287, In the while loop condition on the line with the timeout check,
replace the test bracket expression with unquoted variables for numeric
comparison with arithmetic evaluation syntax. Specifically, change the numeric
comparison of the elapsed variable against DEBUG_WAIT_TIMEOUT from using test
brackets `[ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]` to using arithmetic evaluation
`(( elapsed < DEBUG_WAIT_TIMEOUT ))` to ensure the comparison is hardened
against malformed input.

Comment on lines +274 to +309
# Debug wait with timeout (default 3 hours) - placed AFTER cluster is ready
DEBUG_WAIT_TIMEOUT="${DEBUG_WAIT_TIMEOUT:-10800}"
echo "=========================================="
echo "DEBUG WAIT ACTIVE - Cluster ready for debugging"
echo "=========================================="
echo "Cluster is fully provisioned and ready!"
echo "Timeout: ${DEBUG_WAIT_TIMEOUT} seconds ($(($DEBUG_WAIT_TIMEOUT / 3600)) hours)"
echo ""
echo "To access the cluster:"
echo " 1. Find this pod in Prow job logs"
echo " 2. oc rsh -n test-credentials <pod-name>"
echo " 3. export KUBECONFIG=\${SHARED_DIR}/kubeconfig"
echo " 4. oc get nodes"
echo ""
echo "To continue tests, create the signal file:"
echo " touch /tmp/continue"
echo "=========================================="

elapsed=0
while [ ! -f "/tmp/continue" ] && [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]
do
sleep 10
elapsed=$((elapsed + 10))
if [ $((elapsed % 300)) -eq 0 ]; then
hours=$((elapsed / 3600))
minutes=$(((elapsed % 3600) / 60))
echo "Debug wait: ${hours}h ${minutes}m elapsed (${elapsed}s / ${DEBUG_WAIT_TIMEOUT}s)..."
fi
done

if [ -f "/tmp/continue" ]; then
echo "Continue signal received. Proceeding with tests..."
else
echo "Debug wait timeout reached after $(($DEBUG_WAIT_TIMEOUT / 3600)) hours. Proceeding with tests..."
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Unconditional debug wait breaks all jobs using this shared step.

This step-registry script is shared across many CI jobs, not just the new MCO variants. Setting DEBUG_WAIT_TIMEOUT to default 10800 seconds (3 hours) means every job that uses the openshift-e2e-test-qe chain will now block for up to 3 hours unless someone manually creates /tmp/continue in the pod.

This breaks backward compatibility and will cause unintended delays or timeouts across the CI infrastructure.

🔧 Recommended fix: Default to zero or gate behind env flag

Option 1 (Recommended): Default to 0 for backward compatibility

-DEBUG_WAIT_TIMEOUT="${DEBUG_WAIT_TIMEOUT:-10800}"
+DEBUG_WAIT_TIMEOUT="${DEBUG_WAIT_TIMEOUT:-0}"
+
+if [ "$DEBUG_WAIT_TIMEOUT" -eq 0 ]; then
+    echo "Debug wait disabled (DEBUG_WAIT_TIMEOUT=0)"
+else
 echo "=========================================="
 echo "DEBUG WAIT ACTIVE - Cluster ready for debugging"

Then in your MCO job configs (lines 3750-3780 of the YAML file), explicitly set:

env:
  DEBUG_WAIT_TIMEOUT: "10800"

Option 2: Gate behind explicit enable flag

+if [[ "${ENABLE_DEBUG_WAIT}" != "true" ]]; then
+    echo "Debug wait disabled. Set ENABLE_DEBUG_WAIT=true to enable."
+else
 DEBUG_WAIT_TIMEOUT="${DEBUG_WAIT_TIMEOUT:-10800}"
 echo "=========================================="
🧰 Tools
🪛 Shellcheck (0.11.0)

[style] 280-280: $/${} is unnecessary on arithmetic variables.

(SC2004)


[info] 293-293: Double quote to prevent globbing and word splitting.

(SC2086)


[style] 307-307: $/${} is unnecessary on arithmetic variables.

(SC2004)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`
around lines 274 - 309, The DEBUG_WAIT_TIMEOUT variable defaults to 10800
seconds, causing all jobs using this shared step-registry script to block for up
to 3 hours and breaking backward compatibility across the CI infrastructure.
Change the default value of DEBUG_WAIT_TIMEOUT from 10800 to 0 to maintain
backward compatibility, ensuring the debug wait only activates when explicitly
enabled. This allows existing jobs to proceed normally without modification
while new jobs or specific test configurations can explicitly set
DEBUG_WAIT_TIMEOUT to 10800 in their environment configuration to enable the
debug wait behavior.

echo "=========================================="

elapsed=0
while [ ! -f "/tmp/continue" ] && [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Harden numeric comparison against malformed input.

The unquoted $elapsed in the test condition can fail if DEBUG_WAIT_TIMEOUT contains non-numeric input. Use arithmetic evaluation for safer numeric comparison.

🔧 Recommended fix
-while [ ! -f "/tmp/continue" ] && [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]
+while [ ! -f "/tmp/continue" ] && (( elapsed < DEBUG_WAIT_TIMEOUT ))
 do
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while [ ! -f "/tmp/continue" ] && [ $elapsed -lt $DEBUG_WAIT_TIMEOUT ]
while [ ! -f "/tmp/continue" ] && (( elapsed < DEBUG_WAIT_TIMEOUT ))
do
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 293-293: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh`
at line 293, The numeric comparison using the test operator [ $elapsed -lt
$DEBUG_WAIT_TIMEOUT ] is vulnerable to failures with non-numeric input in the
variables. Replace the test operator expression with arithmetic evaluation using
double parentheses (( $elapsed < $DEBUG_WAIT_TIMEOUT )) in the while loop
condition. This provides safer and more robust numeric comparison in bash
without requiring variable quoting and handles edge cases with malformed input
more gracefully.

@ptalgulk01

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@ptalgulk01: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci

openshift-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@ptalgulk01: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/rehearse/periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco 9a94d8b link unknown /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco
ci/rehearse/periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco-a 9a94d8b link unknown /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco-a
ci/rehearse/periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco-b 9a94d8b link unknown /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-confidential-secureboot-mini-perm-f14-mco-b

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant