Skip to content

Add Orion regression detection to PerfCI pipeline#1251

Open
arpsharm wants to merge 2 commits into
mainfrom
orion-regression-detection
Open

Add Orion regression detection to PerfCI pipeline#1251
arpsharm wants to merge 2 commits into
mainfrom
orion-regression-detection

Conversation

@arpsharm

@arpsharm arpsharm commented Jul 1, 2026

Copy link
Copy Markdown
Member

Summary

  • New pipeline step 06: runs Orion hunter change-point detection against ES after workloads complete
  • 7 config YAMLs for all PerfCI workloads (bootstorm, windows-bootstorm, hammerdb, uperf, fio, vdbench, sysbench)
  • HTML reports served at port 9091 in workload/date format
  • Slack notifications enabled (pass/fail per workload with report links)
  • Step 05 triggers 06 on success

Summary by CodeRabbit

  • New Features
    • Added a PerfCI Orion regression analysis workflow that cleans the workspace, checks out the benchmark runner, runs workload-specific evaluations, and generates time-stamped report outputs.
    • Publishes run summaries to Slack (including per-workload report links) and emails failure details with build context.
    • Introduced new Orion performance test configurations for bootstorm, fio, HammerDB, sysbench, uperf, and vdbench (VM/pod variants), plus Windows bootstorm tracking.
  • Bug Fixes
    • Replaced the deployment “TBD” step with an automatic, non-blocking trigger to the next deployment stage.

@openshift-ci openshift-ci Bot requested review from RobertKrawitz and ebattat July 1, 2026 11:22
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ad8df9aa-17b5-43d6-b528-60bb36acd563

📥 Commits

Reviewing files that changed from the base of the PR and between f35145c and 65248e2.

📒 Files selected for processing (9)
  • jenkins/PerfCI/05_PerfCI_Backup_Report_Deployment/Jenkinsfile
  • jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile
  • orion-configs/perfci-bootstorm.yaml
  • orion-configs/perfci-fio.yaml
  • orion-configs/perfci-hammerdb.yaml
  • orion-configs/perfci-sysbench.yaml
  • orion-configs/perfci-uperf.yaml
  • orion-configs/perfci-vdbench.yaml
  • orion-configs/perfci-windows-bootstorm.yaml
👮 Files not reviewed due to content moderation or server errors (9)
  • jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile
  • jenkins/PerfCI/05_PerfCI_Backup_Report_Deployment/Jenkinsfile
  • orion-configs/perfci-sysbench.yaml
  • orion-configs/perfci-vdbench.yaml
  • orion-configs/perfci-uperf.yaml
  • orion-configs/perfci-fio.yaml
  • orion-configs/perfci-bootstorm.yaml
  • orion-configs/perfci-hammerdb.yaml
  • orion-configs/perfci-windows-bootstorm.yaml

📝 Walkthrough
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding Orion-based regression detection to the PerfCI pipeline.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch orion-regression-detection

Comment @coderabbitai help to get the list of available commands.

@arpsharm

arpsharm commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile (1)

44-44: 🚀 Performance & Scalability | 🔵 Trivial

Reports directory not cleaned up; unbounded growth across runs.

/opt/orion-reports/<workload>/<date> accumulates every run's output with no retention/cleanup logic, unlike the Cleanup stage which wipes the Jenkins workspace. Over time this could exhaust disk space on the agent serving the reports.

Also applies to: 49-50

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile` at line 44, The report
output under reportsDir in the Jenkinsfile is never pruned, so per-run
workload/date folders keep accumulating and can fill the agent disk. Update the
deployment flow around the report generation logic and the Cleanup stage to add
retention for /opt/orion-reports, using the existing reportsDir symbol to locate
it; implement deletion or age/count-based pruning of old workload/date
subdirectories before or after each run so only recent reports are kept.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile`:
- Around line 51-60: The Orion invocation in the Jenkinsfile is masking failures
with unconditional suppression, so regression detection never fails the build.
Update the workload loop around the `orion --hunter-analyze` `sh` step to
capture each command’s exit status instead of appending `|| true`, aggregate any
failures across workloads, and surface them after the loop using an appropriate
build status such as `unstable()` or a failure summary. Keep the `post { success
{ ... } }` reporting aligned with the new failure handling so reports are not
shown as fully successful when `orion` exits non-zero.
- Around line 76-78: The success notification in the Jenkins pipeline is
pointing to the wrong Orion report location. Update the `success` block so it
uses the actual report host/path that serves Orion reports instead of
`${ELASTICSEARCH}:9091`, and use the existing deployment/reporting symbols in
this Jenkinsfile to find the correct value. If no stable host is available,
change the message to emit the report path or another accurate link target
rather than a misleading URL.

---

Nitpick comments:
In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile`:
- Line 44: The report output under reportsDir in the Jenkinsfile is never
pruned, so per-run workload/date folders keep accumulating and can fill the
agent disk. Update the deployment flow around the report generation logic and
the Cleanup stage to add retention for /opt/orion-reports, using the existing
reportsDir symbol to locate it; implement deletion or age/count-based pruning of
old workload/date subdirectories before or after each run so only recent reports
are kept.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2ba47c1b-3e0c-41cd-aa37-6ecf40d2b0dd

📥 Commits

Reviewing files that changed from the base of the PR and between 3046322 and ec6b950.

📒 Files selected for processing (9)
  • jenkins/PerfCI/05_PerfCI_Backup_Report_Deployment/Jenkinsfile
  • jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile
  • orion-configs/perfci-bootstorm.yaml
  • orion-configs/perfci-fio.yaml
  • orion-configs/perfci-hammerdb.yaml
  • orion-configs/perfci-sysbench.yaml
  • orion-configs/perfci-uperf.yaml
  • orion-configs/perfci-vdbench.yaml
  • orion-configs/perfci-windows-bootstorm.yaml

Comment thread jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile Outdated
Comment thread jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile (1)

5-10: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Unused QUAY_BENCHMARK_RUNNER_REPOSITORY credential binding.

This credential is used for podman image cleanup in the sibling 05 pipeline but isn't referenced anywhere in this file.

♻️ Proposed fix
     environment {
         CONTACT_EMAIL = credentials('perfci_contact_email')
         ELASTICSEARCH = credentials('perfci_elasticsearch')
         ELASTICSEARCH_PORT = credentials('perfci_elasticsearch_port')
-        QUAY_BENCHMARK_RUNNER_REPOSITORY = credentials('perfci_quay_repository')
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile` around lines 5 - 10,
The Jenkinsfile environment block binds QUAY_BENCHMARK_RUNNER_REPOSITORY but it
is never used in this pipeline. Remove the unused credential binding from the
environment section, or if it is intended for future use, reference it from the
relevant stage logic in this Jenkinsfile so the variable is actually consumed.
orion-configs/perfci-hammerdb.yaml (1)

1-73: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider YAML anchors to reduce duplication across vm/pod/db_type variants.

Each of the six test blocks repeats an identical metrics structure, differing only in metadata. A YAML anchor/alias for the shared metrics list would reduce copy-paste risk if thresholds change later.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@orion-configs/perfci-hammerdb.yaml` around lines 1 - 73, The six HammerDB
test entries repeat the same metrics definition, so update the perfci-hammerdb
YAML to use a shared YAML anchor/alias for the common metrics list and reference
it from each test block. Keep the existing test names and metadata for each
vm/pod/db_type variant, but centralize the identical tpm metric structure so
changes to threshold or aggregation only need to be made once.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile`:
- Around line 5-10: The Jenkinsfile environment block binds
QUAY_BENCHMARK_RUNNER_REPOSITORY but it is never used in this pipeline. Remove
the unused credential binding from the environment section, or if it is intended
for future use, reference it from the relevant stage logic in this Jenkinsfile
so the variable is actually consumed.

In `@orion-configs/perfci-hammerdb.yaml`:
- Around line 1-73: The six HammerDB test entries repeat the same metrics
definition, so update the perfci-hammerdb YAML to use a shared YAML anchor/alias
for the common metrics list and reference it from each test block. Keep the
existing test names and metadata for each vm/pod/db_type variant, but centralize
the identical tpm metric structure so changes to threshold or aggregation only
need to be made once.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9dce8624-ab8a-4894-82e2-10de45129c8f

📥 Commits

Reviewing files that changed from the base of the PR and between 3046322 and ec6b950.

📒 Files selected for processing (9)
  • jenkins/PerfCI/05_PerfCI_Backup_Report_Deployment/Jenkinsfile
  • jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile
  • orion-configs/perfci-bootstorm.yaml
  • orion-configs/perfci-fio.yaml
  • orion-configs/perfci-hammerdb.yaml
  • orion-configs/perfci-sysbench.yaml
  • orion-configs/perfci-uperf.yaml
  • orion-configs/perfci-vdbench.yaml
  • orion-configs/perfci-windows-bootstorm.yaml

@arpsharm arpsharm force-pushed the orion-regression-detection branch from ec6b950 to b352e3e Compare July 1, 2026 12:01
@arpsharm

arpsharm commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile (1)

52-61: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Consider a timeout around the orion invocation.

Each workload's orion --hunter-analyze call has no timeout; a hang (e.g., ES connectivity stall) would block the whole pipeline stage indefinitely across all seven workloads with no recovery.

♻️ Wrap the sh step in a timeout
-                        def rc = sh(returnStatus: true, script: """
-                            source /opt/orion-venv/bin/activate
-                            orion --es-server ${esServer} \
-                              --metadata-index ${idx} \
-                              --benchmark-index ${idx} \
-                              --lookback 15d \
-                              --hunter-analyze --viz \
-                              --save-output-path ${outDir}/results.txt \
-                              --config ${configPath}
-                        """)
+                        def rc = 0
+                        timeout(time: 30, unit: 'MINUTES') {
+                            rc = sh(returnStatus: true, script: """
+                                source /opt/orion-venv/bin/activate
+                                orion --es-server ${esServer} \
+                                  --metadata-index ${idx} \
+                                  --benchmark-index ${idx} \
+                                  --lookback 15d \
+                                  --hunter-analyze --viz \
+                                  --save-output-path ${outDir}/results.txt \
+                                  --config ${configPath}
+                            """)
+                        }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile` around lines 52 - 61,
The `orion --hunter-analyze` shell step in the Jenkinsfile currently has no
execution timeout, so a stalled workload can hang the entire stage indefinitely.
Wrap the `sh(returnStatus: true, script: ...)` call that assigns `rc` in a
Jenkins `timeout` block, using a duration appropriate for one workload, and keep
the existing `orion` arguments unchanged so each workload can fail fast and the
pipeline can continue or recover cleanly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile`:
- Around line 48-66: The Orion workload loop in the Jenkinsfile now sets
currentBuild.result to UNSTABLE, but there is no post action to notify on that
status. Add an unstable post block alongside the existing success/failure
handling so detected regressions or Orion errors still send the team an email
and report-link message. Use the existing Jenkins post section in this pipeline
and keep the notification behavior aligned with the current success/failure
logic.

---

Nitpick comments:
In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile`:
- Around line 52-61: The `orion --hunter-analyze` shell step in the Jenkinsfile
currently has no execution timeout, so a stalled workload can hang the entire
stage indefinitely. Wrap the `sh(returnStatus: true, script: ...)` call that
assigns `rc` in a Jenkins `timeout` block, using a duration appropriate for one
workload, and keep the existing `orion` arguments unchanged so each workload can
fail fast and the pipeline can continue or recover cleanly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8de9348a-87e0-4ad7-9ba6-49e474b08896

📥 Commits

Reviewing files that changed from the base of the PR and between ec6b950 and b352e3e.

📒 Files selected for processing (9)
  • jenkins/PerfCI/05_PerfCI_Backup_Report_Deployment/Jenkinsfile
  • jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile
  • orion-configs/perfci-bootstorm.yaml
  • orion-configs/perfci-fio.yaml
  • orion-configs/perfci-hammerdb.yaml
  • orion-configs/perfci-sysbench.yaml
  • orion-configs/perfci-uperf.yaml
  • orion-configs/perfci-vdbench.yaml
  • orion-configs/perfci-windows-bootstorm.yaml
✅ Files skipped from review due to trivial changes (4)
  • orion-configs/perfci-uperf.yaml
  • orion-configs/perfci-fio.yaml
  • orion-configs/perfci-windows-bootstorm.yaml
  • orion-configs/perfci-bootstorm.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
  • orion-configs/perfci-vdbench.yaml
  • jenkins/PerfCI/05_PerfCI_Backup_Report_Deployment/Jenkinsfile
  • orion-configs/perfci-sysbench.yaml
  • orion-configs/perfci-hammerdb.yaml

Comment thread jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile
@arpsharm

arpsharm commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile`:
- Around line 84-89: Slack notification failures are being hidden by the `||
true` in the `sh` step that posts to `chat.postMessage`. Remove the
unconditional suppression and make the Jenkinsfile step in the Slack
notification block fail or at least surface the curl exit status when the post
does not succeed. Use the existing `slackMsg` and `SLACK_TOKEN`/`SLACK_CHANNEL`
flow to keep the same behavior, but ensure failures from bad credentials,
network issues, or invalid payloads are visible in the pipeline logs.
- Around line 84-89: The Slack notification step in the Jenkins pipeline is
interpolating SLACK_TOKEN into the Groovy string passed to sh, which can embed
the secret in the script text. Update the Slack chat.postMessage call in the
pipeline’s sh block to use a single-quoted shell script so the shell expands
$SLACK_TOKEN from the environment at runtime, and keep the surrounding Slack
message logic unchanged.
- Around line 57-66: The Orion/ES invocation inside the `sh` step can hang
indefinitely if Elasticsearch stops responding, so wrap the `sh(returnStatus:
true, script: ...)` call in a `timeout` block and handle timeout failures
cleanly. Update the `Jenkinsfile` logic around the Orion command in the
deployment stage so the stage cannot block forever, and keep the existing `rc`
handling intact for non-timeout failures.
- Around line 53-69: Update the result classification in the Jenkinsfile loop
that runs the orion command so non-zero exits from execution failures are not
reported as true regressions. In the indexMap.each block, distinguish between a
genuine benchmark regression and command/config/connectivity errors by
inspecting the orion invocation outcome around sh(returnStatus: true, script:
...), and only set status to 'regression' when the benchmark actually detects
one; otherwise mark failures with a separate non-regression state that can be
surfaced differently in results.add and the Slack summary.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 46941d93-1bcc-4d5e-a8e0-115815ec69ee

📥 Commits

Reviewing files that changed from the base of the PR and between 3046322 and 8201efe.

📒 Files selected for processing (9)
  • jenkins/PerfCI/05_PerfCI_Backup_Report_Deployment/Jenkinsfile
  • jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile
  • orion-configs/perfci-bootstorm.yaml
  • orion-configs/perfci-fio.yaml
  • orion-configs/perfci-hammerdb.yaml
  • orion-configs/perfci-sysbench.yaml
  • orion-configs/perfci-uperf.yaml
  • orion-configs/perfci-vdbench.yaml
  • orion-configs/perfci-windows-bootstorm.yaml

Comment on lines +53 to +69
indexMap.each { workload, idx ->
def configPath = "${configsDir}/perfci-${workload}.yaml"
def outDir = "${reportsDir}/${workload}/${runDate}"
sh "mkdir -p ${outDir}"
def rc = sh(returnStatus: true, script: """
source /opt/orion-venv/bin/activate
orion --es-server ${esServer} \
--metadata-index ${idx} \
--benchmark-index ${idx} \
--lookback 15d \
--hunter-analyze --viz \
--save-output-path ${outDir}/results.txt \
--config ${configPath}
""")
def status = (rc == 0) ? 'pass' : 'regression'
results.add([name: workload, status: status, link: "${reportsUrl}/${workload}/${runDate}/"])
}

@coderabbitai coderabbitai Bot Jul 2, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Orion command failures are indistinguishable from actual regressions.

status is derived purely from rc, so a non-zero exit from a bad config, missing config file, or ES connectivity error is reported identically to a genuine detected regression ('regression'). This will surface false "regressions" in the Slack summary for infrastructure/config problems, undermining trust in the alert.

♻️ Suggested distinction between error and regression
-                        def status = (rc == 0) ? 'pass' : 'regression'
+                        // Orion returns a distinct non-zero code for detected regressions vs. tool errors;
+                        // verify actual exit-code semantics and branch accordingly.
+                        def status = (rc == 0) ? 'pass' : (rc == 1 ? 'regression' : 'error')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
indexMap.each { workload, idx ->
def configPath = "${configsDir}/perfci-${workload}.yaml"
def outDir = "${reportsDir}/${workload}/${runDate}"
sh "mkdir -p ${outDir}"
def rc = sh(returnStatus: true, script: """
source /opt/orion-venv/bin/activate
orion --es-server ${esServer} \
--metadata-index ${idx} \
--benchmark-index ${idx} \
--lookback 15d \
--hunter-analyze --viz \
--save-output-path ${outDir}/results.txt \
--config ${configPath}
""")
def status = (rc == 0) ? 'pass' : 'regression'
results.add([name: workload, status: status, link: "${reportsUrl}/${workload}/${runDate}/"])
}
indexMap.each { workload, idx ->
def configPath = "${configsDir}/perfci-${workload}.yaml"
def outDir = "${reportsDir}/${workload}/${runDate}"
sh "mkdir -p ${outDir}"
def rc = sh(returnStatus: true, script: """
source /opt/orion-venv/bin/activate
orion --es-server ${esServer} \
--metadata-index ${idx} \
--benchmark-index ${idx} \
--lookback 15d \
--hunter-analyze --viz \
--save-output-path ${outDir}/results.txt \
--config ${configPath}
""")
// Orion returns a distinct non-zero code for detected regressions vs. tool errors;
// verify actual exit-code semantics and branch accordingly.
def status = (rc == 0) ? 'pass' : (rc == 1 ? 'regression' : 'error')
results.add([name: workload, status: status, link: "${reportsUrl}/${workload}/${runDate}/"])
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile` around lines 53 - 69,
Update the result classification in the Jenkinsfile loop that runs the orion
command so non-zero exits from execution failures are not reported as true
regressions. In the indexMap.each block, distinguish between a genuine benchmark
regression and command/config/connectivity errors by inspecting the orion
invocation outcome around sh(returnStatus: true, script: ...), and only set
status to 'regression' when the benchmark actually detects one; otherwise mark
failures with a separate non-regression state that can be surfaced differently
in results.add and the Slack summary.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to have but overengineered for now. If ES is down or config is bad, you'd see it in the Jenkins console. A false "regression" alert is still better than silence.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, something went wrong! Please try again later. 🐰 💔

Comment on lines +57 to +66
def rc = sh(returnStatus: true, script: """
source /opt/orion-venv/bin/activate
orion --es-server ${esServer} \
--metadata-index ${idx} \
--benchmark-index ${idx} \
--lookback 15d \
--hunter-analyze --viz \
--save-output-path ${outDir}/results.txt \
--config ${configPath}
""")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

No timeout around the Orion/ES invocation.

If Elasticsearch is unresponsive, this sh step (and thus the whole stage/agent) can hang indefinitely, since neither the step nor the loop enforces a timeout.

🔧 Suggested fix using `timeout` block
-                        def rc = sh(returnStatus: true, script: """
-                            source /opt/orion-venv/bin/activate
-                            orion --es-server ${esServer} \
-                              --metadata-index ${idx} \
-                              --benchmark-index ${idx} \
-                              --lookback 15d \
-                              --hunter-analyze --viz \
-                              --save-output-path ${outDir}/results.txt \
-                              --config ${configPath}
-                        """)
+                        def rc
+                        timeout(time: 15, unit: 'MINUTES') {
+                            rc = sh(returnStatus: true, script: """
+                                source /opt/orion-venv/bin/activate
+                                orion --es-server ${esServer} \
+                                  --metadata-index ${idx} \
+                                  --benchmark-index ${idx} \
+                                  --lookback 15d \
+                                  --hunter-analyze --viz \
+                                  --save-output-path ${outDir}/results.txt \
+                                  --config ${configPath}
+                            """)
+                        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def rc = sh(returnStatus: true, script: """
source /opt/orion-venv/bin/activate
orion --es-server ${esServer} \
--metadata-index ${idx} \
--benchmark-index ${idx} \
--lookback 15d \
--hunter-analyze --viz \
--save-output-path ${outDir}/results.txt \
--config ${configPath}
""")
def rc
timeout(time: 15, unit: 'MINUTES') {
rc = sh(returnStatus: true, script: """
source /opt/orion-venv/bin/activate
orion --es-server ${esServer} \
--metadata-index ${idx} \
--benchmark-index ${idx} \
--lookback 15d \
--hunter-analyze --viz \
--save-output-path ${outDir}/results.txt \
--config ${configPath}
""")
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile` around lines 57 - 66,
The Orion/ES invocation inside the `sh` step can hang indefinitely if
Elasticsearch stops responding, so wrap the `sh(returnStatus: true, script:
...)` call in a `timeout` block and handle timeout failures cleanly. Update the
`Jenkinsfile` logic around the Orion command in the deployment stage so the
stage cannot block forever, and keep the existing `rc` handling intact for
non-timeout failures.

Comment on lines +84 to +89
sh """
curl -s -X POST https://slack.com/api/chat.postMessage \
-H 'Authorization: Bearer ${SLACK_TOKEN}' \
-H 'Content-Type: application/json' \
-d '{"channel":"${SLACK_CHANNEL}","text":"${slackMsg}"}' || true
"""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Slack notification failures are silently swallowed.

|| true on the curl call means a failed Slack post (bad token, network issue, malformed JSON from unescaped slackMsg) is never surfaced — the only regression signal (per the team's stated plan of relying on Slack instead of email) can silently fail to be delivered.

🔧 Suggested fix to surface curl failures
-                    sh """
-                        curl -s -X POST https://slack.com/api/chat.postMessage \
-                          -H 'Authorization: Bearer ${SLACK_TOKEN}' \
-                          -H 'Content-Type: application/json' \
-                          -d '{"channel":"${SLACK_CHANNEL}","text":"${slackMsg}"}' || true
-                    """
+                    def slackRc = sh(returnStatus: true, script: """
+                        curl -s -X POST https://slack.com/api/chat.postMessage \
+                          -H 'Authorization: Bearer ${SLACK_TOKEN}' \
+                          -H 'Content-Type: application/json' \
+                          -d '{"channel":"${SLACK_CHANNEL}","text":"${slackMsg}"}'
+                    """)
+                    if (slackRc != 0) {
+                        echo "WARNING: Failed to post Slack notification (exit ${slackRc})"
+                    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sh """
curl -s -X POST https://slack.com/api/chat.postMessage \
-H 'Authorization: Bearer ${SLACK_TOKEN}' \
-H 'Content-Type: application/json' \
-d '{"channel":"${SLACK_CHANNEL}","text":"${slackMsg}"}' || true
"""
def slackRc = sh(returnStatus: true, script: """
curl -s -X POST https://slack.com/api/chat.postMessage \
-H 'Authorization: Bearer ${SLACK_TOKEN}' \
-H 'Content-Type: application/json' \
-d '{"channel":"${SLACK_CHANNEL}","text":"${slackMsg}"}'
""")
if (slackRc != 0) {
echo "WARNING: Failed to post Slack notification (exit ${slackRc})"
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile` around lines 84 - 89,
Slack notification failures are being hidden by the `|| true` in the `sh` step
that posts to `chat.postMessage`. Remove the unconditional suppression and make
the Jenkinsfile step in the Slack notification block fail or at least surface
the curl exit status when the post does not succeed. Use the existing `slackMsg`
and `SLACK_TOKEN`/`SLACK_CHANNEL` flow to keep the same behavior, but ensure
failures from bad credentials, network issues, or invalid payloads are visible
in the pipeline logs.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

sed -n '1,160p' jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile

Repository: redhat-performance/benchmark-runner

Length of output: 4712


Avoid Groovy-interpolating SLACK_TOKEN in sh. Use a single-quoted shell block so the shell expands $SLACK_TOKEN from the environment instead of embedding the secret into the script text.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@jenkins/PerfCI/06_PerfCI_Orion_Deployment/Jenkinsfile` around lines 84 - 89,
The Slack notification step in the Jenkins pipeline is interpolating SLACK_TOKEN
into the Groovy string passed to sh, which can embed the secret in the script
text. Update the Slack chat.postMessage call in the pipeline’s sh block to use a
single-quoted shell script so the shell expands $SLACK_TOKEN from the
environment at runtime, and keep the surrounding Slack message logic unchanged.

@arpsharm arpsharm force-pushed the orion-regression-detection branch from 8201efe to 589a686 Compare July 2, 2026 10:55
@arpsharm arpsharm force-pushed the orion-regression-detection branch from 589a686 to 65248e2 Compare July 2, 2026 10:57
@arpsharm

arpsharm commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

@ebattat ebattat left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@openshift-ci

openshift-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arpsharm, ebattat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label Jul 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants