Add winstress_vm, linstress_vm workloads and WORKLOAD_CONFIG override by ebattat · Pull Request #1252 · redhat-performance/benchmark-runner

ebattat · 2026-07-02T12:09:49Z

Summary

Add winstress_vm workload: Windows CPU+memory stress test with Python multiprocessing, supports scale via BootstormVM, configurable stress_cpu/stress_memory/stress_duration
Add linstress_vm workload: Linux Fedora equivalent using cloud-init embedded stress script, supports scale via BootstormVM, same metrics for direct Windows/Linux comparison
Add WORKLOAD_CONFIG env var to override template defaults (requests_cpu, requests_memory, sockets, cores, threads) for any workload at runtime
Add pin_node0 to pin logic for node selector support

Test plan

make golden_files — golden files regenerated
make test_golden_files — golden file tests pass
PYTHONPATH=. python3 -m pytest -v tests/unittest/ — all 26 unit tests pass
Manual test: WORKLOAD=winstress_vm with scale on Windows VMs
Manual test: WORKLOAD=linstress_vm with scale on Fedora VMs
Verify WORKLOAD_CONFIG override works for different workloads

🤖 Assisted-by: Claude Code

Summary by CodeRabbit

New Features
- Added VM-based stress workloads for Linux (linstress) and Windows (winstress), including new workload templates and end-to-end runners.
- Introduced an optional workload configuration override via an environment variable to customize rendered templates.
Bug Fixes
- Improved node pinning behavior so pinning activates when either node pin setting is provided.
- Updated workload-to-namespace mapping for the new Linux/Windows VM runs.
Tests
- Added/updated golden manifests and unit test fixtures for the new templates and scenarios.

- Add winstress_vm: Windows CPU+memory stress test with Python multiprocessing, supports scale via BootstormVM, configurable stress_cpu/stress_memory/stress_duration - Add linstress_vm: Linux Fedora equivalent using cloud-init embedded stress script, supports scale via BootstormVM, same metrics for direct Win/Lin comparison - Add WORKLOAD_CONFIG env var to override template defaults (requests_cpu, requests_memory, sockets, cores, threads) for any workload at runtime - Add pin_node0 to pin logic for node selector support Assisted-by: Claude Code Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

openshift-ci · 2026-07-02T12:10:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ebattat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [ebattat]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-07-02T12:10:20Z

📝 Walkthrough

Walkthrough

The PR adds linstress and winstress VM workload support across environment wiring, template rendering, workload templates, workload runners, and updated golden manifests for the generated Kubernetes and KubeVirt resources.

Changes

VM workload support

Layer / File(s)	Summary
Rendering and environment wiring `benchmark_runner/main/environment_variables.py`, `benchmark_runner/common/template_operations/template_operations.py`	Template rendering accepts `WORKLOAD_CONFIG` overrides, and the new VM workloads, namespaces, and pinning behavior are registered in environment variables.
Linstress templates `benchmark_runner/common/template_operations/templates/linstress/...`	The linstress template data and VM template define the cloud-init stress workload, VM topology, and secret-backed boot wiring.
Winstress templates `benchmark_runner/common/template_operations/templates/winstress/...`	The winstress template data, PowerShell stress script, DataVolume, and VM template define the Windows stress workload and storage wiring.
Linstress runner `benchmark_runner/workloads/linstress_vm.py`	The linstress runner creates VMs, collects stress reports, normalizes results, uploads Elasticsearch data, and handles cleanup.
Winstress runner `benchmark_runner/workloads/winstress_vm.py`	The winstress runner copies the stress script, prepares the VM, collects results, uploads Elasticsearch data, and orchestrates cleanup.
Linstress and winstress golden manifests `tests/unittest/benchmark_runner/common/template_operations/golden_files/...`	The golden fixtures add the namespace, DataVolume, Secret, and VM manifests for the linstress and winstress scenarios across the CI, perf_ci, chaos_ci, func_ci, test_ci, and release variants.

Estimated code review effort: 5 (Critical) | ~120 minutes

Possibly related PRs

redhat-performance/benchmark-runner#1219: Also changes TemplateOperations.__generate_yamls_internal in the same rendering pipeline.
redhat-performance/benchmark-runner#1226: Also updates workload registration in benchmark_runner/main/environment_variables.py.
redhat-performance/benchmark-runner#1229: Also touches workload list and namespace mapping logic for VM workloads.

Suggested labels: enhancement

Suggested reviewers: RobertKrawitz, arpsharm

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: two new workloads plus the WORKLOAD_CONFIG override.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch winstress-linstress

_{Comment @coderabbitai help to get the list of available commands.}

Assisted-by: Claude Code Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…om linstress Assisted-by: Claude Code Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml (1)
9-14: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Remove the hardcoded VM login credential. benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml still sets password: fedora with ssh_pwauth: true; use key-based auth or a per-run secret instead of a fixed, well-known password.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml`
around lines 9 - 14, The VM template still hardcodes a known login password in
the linstress cloud-init data, so update the relevant template generation path
in linstress_vm_template.yaml to remove the fixed password and disable password
auth by default. Use key-based authentication or inject a per-run secret through
the template parameters instead, and ensure the generated golden file reflects
the updated behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml`:
- Around line 9-14: The VM template still hardcodes a known login password in
the linstress cloud-init data, so update the relevant template generation path
in linstress_vm_template.yaml to remove the fixed password and disable password
auth by default. Use key-based authentication or inject a per-run secret through
the template parameters instead, and ensure the generated golden file reflects
the updated behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5db5e8e5-18b9-4126-8efc-8c48195cf61b

📥 Commits

Reviewing files that changed from the base of the PR and between ee29469 and dfbc898.

📒 Files selected for processing (4)

benchmark_runner/common/template_operations/templates/linstress/linstress_data_template.yaml
benchmark_runner/main/environment_variables.py
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml

💤 Files with no reviewable changes (1)

benchmark_runner/common/template_operations/templates/linstress/linstress_data_template.yaml

✅ Files skipped from review due to trivial changes (1)

tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml

🚧 Files skipped from review as they are similar to previous changes (1)

benchmark_runner/main/environment_variables.py

ebattat · 2026-07-02T13:55:39Z

@coderabbitai full review

coderabbitai · 2026-07-02T13:56:23Z

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 13

🧹 Nitpick comments (6)

benchmark_runner/common/template_operations/template_operations.py (1)
156-173: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Duplicate override logic — extract a shared helper.

The new workload_config block (Line 165-173) duplicates the hammerdb_config block (Line 156-163) almost verbatim (filter empties, diff unknown keys, warn, log, merge). Consider factoring this into a single private helper to avoid future divergence between the two override mechanisms.
♻️ Suggested refactor
+    `@staticmethod`
+    def __apply_config_override(config: dict, config_name: str, render_data: dict):
+        config = {k: v for k, v in config.items() if v != ''}
+        if not config:
+            return
+        unknown_keys = set(config.keys()) - set(render_data.keys())
+        if unknown_keys:
+            logger.warning(f'{config_name} unknown keys (will be ignored): {unknown_keys}')
+        logger.info(f'{config_name} override: {config}')
+        render_data.update(config)
+
         hammerdb_config = self.__environment_variables_dict.get('hammerdb_config', {})
         if hammerdb_config and ('hammerdb' in self.__workload_name or 'winmssql' in self.__workload_name):
-            hammerdb_config = {k: v for k, v in hammerdb_config.items() if v != ''}
-            unknown_keys = set(hammerdb_config.keys()) - set(render_data.keys())
-            if unknown_keys:
-                logger.warning(f'HAMMERDB_CONFIG unknown keys (will be ignored): {unknown_keys}')
-            logger.info(f'HAMMERDB_CONFIG override: {hammerdb_config}')
-            render_data.update(hammerdb_config)
+            self.__apply_config_override(hammerdb_config, 'HAMMERDB_CONFIG', render_data)

         workload_config = self.__environment_variables_dict.get('workload_config', {})
-        if workload_config:
-            workload_config = {k: v for k, v in workload_config.items() if v != ''}
-            unknown_keys = set(workload_config.keys()) - set(render_data.keys())
-            if unknown_keys:
-                logger.warning(f'WORKLOAD_CONFIG unknown keys (will be ignored): {unknown_keys}')
-            logger.info(f'WORKLOAD_CONFIG override: {workload_config}')
-            render_data.update(workload_config)
+        if workload_config:
+            self.__apply_config_override(workload_config, 'WORKLOAD_CONFIG', render_data)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark_runner/common/template_operations/template_operations.py` around
lines 156 - 173, The new workload_config handling duplicates the existing
hammerdb_config override flow in TemplateOperations, so extract the shared
“filter empty values, detect unknown keys, log, and merge into render_data”
behavior into a private helper. Reuse that helper from the hammerdb_config and
workload_config branches in template_operations.py so both paths stay consistent
and future changes only need to be made once. Keep the workload_name condition
for hammerdb_config in place, and preserve the existing
logger.warning/logger.info behavior through the shared helper.
benchmark_runner/common/template_operations/templates/winstress/internal_data/winstress_vm_template.yaml (1)
4-12: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Repeated name-computation logic across the template (DRY).

The {% if scale %}...{% else %}...{% endif %} pattern for VM name/labels and root-disk name is duplicated across 4+ locations. Consider computing these once via Jinja {% set %} at the top and reusing the variables, reducing risk of divergence on future edits.
♻️ Example refactor
 metadata:
-  {% if scale -%}
-  name: winstress-{{ kind }}-{{ trunc_uuid }}-{{ scale }}
-  labels:
-    kubevirt.io/vm: winstress-{{ kind }}-{{ trunc_uuid }}-{{ scale }}
-  {%- else -%}
-  name: winstress-{{ kind }}-{{ trunc_uuid }}
-  labels:
-    kubevirt.io/vm: winstress-{{ kind }}-{{ trunc_uuid }}
-  {%- endif %}
+  {%- set vm_name = 'winstress-' ~ kind ~ '-' ~ trunc_uuid ~ (('-' ~ scale) if scale else '') -%}
+  {%- set root_disk_name = 'winstress-' ~ kind ~ '-root-disk-' ~ trunc_uuid ~ (('-' ~ scale) if scale else '') -%}
+  name: {{ vm_name }}
+  labels:
+    kubevirt.io/vm: {{ vm_name }}
Then reference {{ vm_name }} / {{ root_disk_name }} at the other occurrences (labels, disks, volumes, dataVolumeTemplates).
Also applies to: 23-27, 53-57, 113-122, 135-139
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@benchmark_runner/common/template_operations/templates/winstress/internal_data/winstress_vm_template.yaml`
around lines 4 - 12, The VM name/label logic in the template is duplicated in
multiple places and should be centralized for consistency. In
winstress_vm_template.yaml, add Jinja {% set %} variables near the top for the
computed VM name and root disk name, then replace the repeated {% if scale
%}...{% else %}...{% endif %} blocks in the template sections with those shared
variables (for labels, disks, volumes, and dataVolumeTemplates). Use the
existing template symbols like kind, trunc_uuid, and scale to build vm_name and
root_disk_name once, then reference them everywhere else.
benchmark_runner/workloads/linstress_vm.py (1)
53-53: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Remove unnecessary f-string prefix.

Ruff flags F541: the string has no placeholders.
-            check_cmd = f'test -f /tmp/stress_report.json && echo found'
+            check_cmd = 'test -f /tmp/stress_report.json && echo found'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark_runner/workloads/linstress_vm.py` at line 53, The check_cmd
assignment in linstress_vm.py is using an unnecessary f-string prefix even
though there are no placeholders. Update the check_cmd string literal in the
workload setup code to a plain string so it no longer triggers Ruff F541,
keeping the same shell command content.
Source: Linters/SAST tools
benchmark_runner/main/environment_variables.py (1)
107-128: 📐 Maintainability & Code Quality | 🔵 Trivial

Document new workloads in help text/README.

winstress_vm and linstress_vm were added to environment_variables_dict['workloads'], but the workload list documented in the coding guidelines' README content doesn't mention them yet. As per coding guidelines, **/*.py: "Document supported workload types in code comments or help text for WORKLOAD parameter (workloads: stressng_pod, stressng_vm, uperf_pod, uperf_vm, sysbench_pod, sysbench_vm, hammerdb variants, vdbench variants, fio variants, winfio_vm, bootstorm_vm, windows_vm, winmssql_vm, clusterbuster)" — please update the corresponding documentation to include the two new workload names.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark_runner/main/environment_variables.py` around lines 107 - 128, The
workload list in the help text/README is missing the newly supported
`winstress_vm` and `linstress_vm` entries added in `EnvironmentVariables`;
update the documentation that describes the WORKLOAD parameter so it matches the
values in `environment_variables_dict['workloads']`. Locate the workload
documentation section referenced by the coding guidelines and add these two
names alongside the existing workload variants, keeping the list consistent with
the implementation.
Source: Coding guidelines
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/winstress_vm.yaml (1)
96-102: 🔒 Security & Privacy | 🔵 Trivial | ⚖️ Poor tradeoff

Temporary Everyone:F grant on SSH key directory.

Line 100 grants Everyone:F on .ssh and authorized_keys (needed to take ownership/delete a pre-existing file) before line 101 restores restrictive ACLs. This creates a brief window where any local principal can read/write the SSH authorized_keys file during boot. Consider using a SYSTEM-owned/administrator-only takeover approach (e.g., takeown as SYSTEM without the Everyone:F grant) to avoid the permissive window.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/winstress_vm.yaml`
around lines 96 - 102, The `winstress_vm.yaml` cloud-init `runcmd` block briefly
opens `.ssh` and `authorized_keys` to `Everyone:F`, which should be removed.
Update the `userData` setup steps so the existing file can be taken over and
deleted using a SYSTEM/Administrator-only approach in the same `runcmd`
sequence, then keep the final restrictive ACL restoration in place. Reference
the `userData` SSH setup commands to locate the temporary permission grant and
replace it with a non-permissive ownership reset flow.
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml (1)
126-129: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Redundant package install.

dnf install -y python3-psutil || true duplicates the packages: - python3-psutil cloud-config directive above; if intentional as a fallback for module-ordering issues, a brief comment would help future readers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml`
around lines 126 - 129, The runcmd in the linstress_vm.yaml template is
redundantly reinstalling python3-psutil even though it is already declared in
the packages section. Update the runcmd block in the linstress_vm template to
remove the duplicate dnf install step, or keep it only if truly needed as a
fallback and add a brief inline comment explaining why. Use the runcmd and
packages entries in func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml to
locate the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml`:
- Around line 78-96: The monitoring loop in the linstress template is
oversleeping for short runs because `intervals = max(1, duration // 30)` always
forces at least one 30-second wait, so `burn_cpu` and `burn_memory` can finish
before sampling starts. Update the loop in the monitoring section to use
elapsed-time tracking against the requested `duration` instead of a fixed
interval count, and only sleep for the remaining time needed before each sample.
Keep the sampling logic and `cpu_procs`/`mem_proc` joins in place, but make sure
the first sample is taken within the actual stress window even when `duration <
30`.

In
`@benchmark_runner/common/template_operations/templates/winstress/internal_data/01_run_stress_template.ps1`:
- Line 6: The $reportFile variable in 01_run_stress_template.ps1 is assigned but
never reused, while the report path is hardcoded again later in the script.
Update the logic around $reportFile so the same variable is used wherever the
stress report path is needed, including the report-writing and report-reading
spots in the script, and remove the duplicated literal path to keep it
consistent with $stressDir.

In
`@benchmark_runner/common/template_operations/templates/winstress/internal_data/windows_dv_template.yaml`:
- Line 6: The DataVolume name in the Windows clone template is hardcoded and
shared across multiple Windows workloads, which can cause collisions during
concurrent runs. Update the name in windows_dv_template.yaml to be unique per
workload or per run by incorporating a suffix such as the existing template
uuid/identifier, and make sure the Windows clone path stays consistent in the
template that defines the DataVolume object.

In `@benchmark_runner/workloads/linstress_vm.py`:
- Around line 35-43: _remove the per-VM error-log call from the multiprocessing
phase helpers and let the top-level run() failure handling own it. In
_create_vm, _collect_results, and _delete_vm, avoid calling
self.save_error_logs() inside the child-process path; instead, just propagate
the exception and keep failure logging centralized in run()’s except block,
which already runs in the main process. Use the existing _run_parallel_phases
flow and the BootstormVM-style pattern as the reference for locating these step
functions._
- Around line 64-70: The SCP/report path in linstress_vm’s result handling is
swallowing failures by only logging a warning and returning from _scp_file
handling, which lets _upload_results skip failed VMs and still complete
successfully. Update the failure path around the stress report fetch in
linstress_vm to track missing-result/SCP failures per VM, and have the workload
lifecycle logic raise an exception or otherwise mark the run failed when no
usable results are produced. Make sure the fix is wired through the existing
_upload_results flow so repeated VM failures are not silently ignored.
- Around line 54-62: The polling loop in linstress_vm is using a range counter
with OC.DELAY, but the virtctl_ssh call can take much longer than that, so the
real wait can exceed self._timeout. Update the timeout logic in the stress
report check loop to use a wall-clock deadline or elapsed-time tracking around
virtctl_ssh and time.sleep, so the loop exits once the overall timeout is
reached. Keep the change localized to the polling block that calls
self._virtctl.virtctl_ssh and logs the “Waiting for stress_report.json” message.

In `@benchmark_runner/workloads/winstress_vm.py`:
- Around line 65-91: The failure handling in `_prepare_vm` and `_run_stress` is
inconsistent with `_create_vm`, `_collect_results`, and `_delete_vm`, because
these methods only log warnings and swallow exceptions. Update `_prepare_vm` and
`_run_stress` in `winstress_vm.py` to propagate failures by re-raising after
logging, so a VM that never becomes SSH-ready or fails during stress does not
continue into later phases and produce misleading results.
- Around line 98-106: The wait loop in winstress_vm’s stress_report.json polling
uses a fixed iteration count instead of real elapsed time, so the combined
virtctl_ssh timeout and sleep can exceed the intended self._timeout budget.
Update the polling in the method that builds check_cmd and calls
virtctl.virtctl_ssh to track wall-clock elapsed time (or a deadline) and stop
once the total time reaches self._timeout, keeping the per-check SSH timeout
bounded as well. Ensure the logger message and break condition still reflect the
same stress_report.json check logic.
- Around line 57-63: The worker helper error handlers in _create_vm(),
_collect_results(), and _delete_vm() are doing failure logging via
save_error_logs(), which should not run inside multiprocessing.Process workers.
Remove the save_error_logs() call from those except blocks and keep them only
re-raising the exception, then update run() to perform the error logging after
_run_parallel_phases() returns so the Elasticsearch upload happens in the parent
process.

In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml`:
- Around line 8-12: The cloud-init settings in the linstress VM template use a
hardcoded `user`/`password` pair with `ssh_pwauth` enabled, which should be
removed from this benchmark fixture. Update the `linstress_vm.yaml` template to
stop using fixed credentials in this cloud-config block and instead follow the
same safer approach used by the Windows workload (prefer SSH public key
injection and disable password auth), or at minimum generate a per-run random
password rather than a literal default.

In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml`:
- Around line 8-12: The cloud-init rendered by the linstress VM template is
using hardcoded weak credentials and enabling SSH password login. Update the
linstress cloud-init/template that produces linstress_vm.yaml so it no longer
sets a fixed user/password pair and does not enable ssh_pwauth; switch the
template to key-based auth only or generate a unique per-run password. Use the
cloud-init generation path associated with linstress_vm and the template/data
source that fills user, password, and ssh_pwauth to locate the fix.

In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml`:
- Around line 6-12: The VM userdata in the linstress template hardcodes the
fedora account password, keeps it non-expiring, and enables password SSH auth.
Update the cloud-config in the linstress VM template so it relies on the SSH key
flow used by linstress_vm.py, and remove or template the password-related fields
and ssh_pwauth setting instead of shipping fixed credentials.

In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_False/linstress_vm.yaml`:
- Around line 9-12: The cloud-init template for the linstress VM is using
hardcoded weak SSH credentials and password auth, which should be removed for
benchmark_runner artifacts. Update the template behind the linstress_vm.yaml
asset so the user/password values are not static (for example, inject a per-run
secret through the data template or switch the VM to SSH key auth), and keep the
relevant cloud-init fields such as ssh_pwauth and chpasswd aligned with that
change.

---

Nitpick comments:
In `@benchmark_runner/common/template_operations/template_operations.py`:
- Around line 156-173: The new workload_config handling duplicates the existing
hammerdb_config override flow in TemplateOperations, so extract the shared
“filter empty values, detect unknown keys, log, and merge into render_data”
behavior into a private helper. Reuse that helper from the hammerdb_config and
workload_config branches in template_operations.py so both paths stay consistent
and future changes only need to be made once. Keep the workload_name condition
for hammerdb_config in place, and preserve the existing
logger.warning/logger.info behavior through the shared helper.

In
`@benchmark_runner/common/template_operations/templates/winstress/internal_data/winstress_vm_template.yaml`:
- Around line 4-12: The VM name/label logic in the template is duplicated in
multiple places and should be centralized for consistency. In
winstress_vm_template.yaml, add Jinja {% set %} variables near the top for the
computed VM name and root disk name, then replace the repeated {% if scale
%}...{% else %}...{% endif %} blocks in the template sections with those shared
variables (for labels, disks, volumes, and dataVolumeTemplates). Use the
existing template symbols like kind, trunc_uuid, and scale to build vm_name and
root_disk_name once, then reference them everywhere else.

In `@benchmark_runner/main/environment_variables.py`:
- Around line 107-128: The workload list in the help text/README is missing the
newly supported `winstress_vm` and `linstress_vm` entries added in
`EnvironmentVariables`; update the documentation that describes the WORKLOAD
parameter so it matches the values in `environment_variables_dict['workloads']`.
Locate the workload documentation section referenced by the coding guidelines
and add these two names alongside the existing workload variants, keeping the
list consistent with the implementation.

In `@benchmark_runner/workloads/linstress_vm.py`:
- Line 53: The check_cmd assignment in linstress_vm.py is using an unnecessary
f-string prefix even though there are no placeholders. Update the check_cmd
string literal in the workload setup code to a plain string so it no longer
triggers Ruff F541, keeping the same shell command content.

In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/winstress_vm.yaml`:
- Around line 96-102: The `winstress_vm.yaml` cloud-init `runcmd` block briefly
opens `.ssh` and `authorized_keys` to `Everyone:F`, which should be removed.
Update the `userData` setup steps so the existing file can be taken over and
deleted using a SYSTEM/Administrator-only approach in the same `runcmd`
sequence, then keep the final restrictive ACL restoration in place. Reference
the `userData` SSH setup commands to locate the temporary permission grant and
replace it with a non-permissive ownership reset flow.

In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml`:
- Around line 126-129: The runcmd in the linstress_vm.yaml template is
redundantly reinstalling python3-psutil even though it is already declared in
the packages section. Update the runcmd block in the linstress_vm template to
remove the duplicate dnf install step, or keep it only if truly needed as a
fallback and add a brief inline comment explaining why. Use the runcmd and
packages entries in func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml to
locate the change.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5d3c4db4-ec96-4cf5-a93f-f2de947eb4b1

📥 Commits

Reviewing files that changed from the base of the PR and between f35145c and dfbc898.

📒 Files selected for processing (64)

benchmark_runner/common/template_operations/template_operations.py
benchmark_runner/common/template_operations/templates/linstress/__init__.py
benchmark_runner/common/template_operations/templates/linstress/internal_data/__init__.py
benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml
benchmark_runner/common/template_operations/templates/linstress/linstress_data_template.yaml
benchmark_runner/common/template_operations/templates/winstress/__init__.py
benchmark_runner/common/template_operations/templates/winstress/internal_data/01_run_stress_template.ps1
benchmark_runner/common/template_operations/templates/winstress/internal_data/__init__.py
benchmark_runner/common/template_operations/templates/winstress/internal_data/windows_dv_template.yaml
benchmark_runner/common/template_operations/templates/winstress/internal_data/winstress_vm_template.yaml
benchmark_runner/common/template_operations/templates/winstress/winstress_data_template.yaml
benchmark_runner/main/environment_variables.py
benchmark_runner/workloads/linstress_vm.py
benchmark_runner/workloads/winstress_vm.py
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_linstress_vm_ODF_PVC_False/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_linstress_vm_ODF_PVC_True/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/windows_dv.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_False/winstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_True/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_True/windows_dv.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winstress_vm_ODF_PVC_True/winstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_False/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_False/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_False/windows_dv.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_False/winstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_True/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_True/windows_dv.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winstress_vm_ODF_PVC_True/winstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_True/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_False/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_False/windows_dv.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_False/winstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_True/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_True/windows_dv.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winstress_vm_ODF_PVC_True/winstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_False/linstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_False/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_True/linstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_True/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_False/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_False/windows_dv.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_False/winstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_True/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_True/windows_dv.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/release_winstress_vm_ODF_PVC_True/winstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_linstress_vm_ODF_PVC_False/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_linstress_vm_ODF_PVC_True/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_False/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_False/windows_dv.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_False/winstress_vm.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_True/namespace.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_True/windows_dv.yaml
tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winstress_vm_ODF_PVC_True/winstress_vm.yaml

coderabbitai · 2026-07-02T14:14:35Z

+              intervals = max(1, duration // 30)
+              samples = []
+              for i in range(intervals):
+                  time.sleep(30)
+                  mem = psutil.virtual_memory()
+                  cpu_pct = psutil.cpu_percent(interval=1)
+                  sample = {
+                      'time_sec': (i+1)*30,
+                      'cpu_percent': cpu_pct,
+                      'mem_percent': mem.percent,
+                      'mem_used_gb': round(mem.used / (1024**3), 1),
+                      'mem_total_gb': round(mem.total / (1024**3), 1)
+                  }
+                  samples.append(sample)
+                  print(f"At {sample['time_sec']}s: CPU={cpu_pct}% MEM={mem.percent}% ({sample['mem_used_gb']}GB/{sample['mem_total_gb']}GB)")
+
+              [p.join() for p in cpu_procs]
+              if mem_proc:
+                  mem_proc.join()


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Monitoring loop can sleep past the actual stress duration.

intervals = max(1, duration // 30) forces at least one time.sleep(30) iteration even when duration < 30. In that case burn_cpu/burn_memory workers finish long before the monitor wakes up, so the captured CPU/memory samples reflect post-stress idle state rather than the actual stress window — producing misleading benchmark data for short-duration runs (a common case for quick/test_ci iterations).

🐛 Proposed fix using elapsed-time tracking instead of a fixed iteration count

- intervals = max(1, duration // 30) - samples = [] - for i in range(intervals): - time.sleep(30) + samples = [] + elapsed_total = 0 + while elapsed_total < duration: + step = min(30, duration - elapsed_total) + time.sleep(step) + elapsed_total += step mem = psutil.virtual_memory() cpu_pct = psutil.cpu_percent(interval=1) sample = { - 'time_sec': (i+1)*30, + 'time_sec': elapsed_total,

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

intervals = max(1, duration // 30)

samples = []

for i in range(intervals):

time.sleep(30)

mem = psutil.virtual_memory()

cpu_pct = psutil.cpu_percent(interval=1)

sample = {

'time_sec': (i+1)*30,

'cpu_percent': cpu_pct,

'mem_percent': mem.percent,

'mem_used_gb': round(mem.used / (1024**3), 1),

'mem_total_gb': round(mem.total / (1024**3), 1)

}

samples.append(sample)

print(f"At {sample['time_sec']}s: CPU={cpu_pct}% MEM={mem.percent}% ({sample['mem_used_gb']}GB/{sample['mem_total_gb']}GB)")

[p.join() for p in cpu_procs]

if mem_proc:

mem_proc.join()

samples = []

elapsed_total = 0

while elapsed_total < duration:

step = min(30, duration - elapsed_total)

time.sleep(step)

elapsed_total += step

mem = psutil.virtual_memory()

cpu_pct = psutil.cpu_percent(interval=1)

sample = {

'time_sec': elapsed_total,

'cpu_percent': cpu_pct,

'mem_percent': mem.percent,

'mem_used_gb': round(mem.used / (1024**3), 1),

'mem_total_gb': round(mem.total / (1024**3), 1)

}

samples.append(sample)

print(f"At {sample['time_sec']}s: CPU={cpu_pct}% MEM={mem.percent}% ({sample['mem_used_gb']}GB/{sample['mem_total_gb']}GB)")

[p.join() for p in cpu_procs]

if mem_proc:

mem_proc.join()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml` around lines 78 - 96, The monitoring loop in the linstress template is oversleeping for short runs because `intervals = max(1, duration // 30)` always forces at least one 30-second wait, so `burn_cpu` and `burn_memory` can finish before sampling starts. Update the loop in the monitoring section to use elapsed-time tracking against the requested `duration` instead of a fixed interval count, and only sleep for the remaining time needed before each sample. Keep the sampling logic and `cpu_procs`/`mem_proc` joins in place, but make sure the first sample is taken within the actual stress window even when `duration < 30`.

coderabbitai · 2026-07-02T14:14:35Z

+$stressDir = 'C:\tools\stress'
+$pythonExe = 'C:\Program Files\Python312\python.exe'
+$stressScript = "$stressDir\stress.py"
+$reportFile = "$stressDir\stress_report.json"


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Unused $reportFile variable; report path duplicated instead of reused.

Static analysis flags $reportFile as assigned but never used. The literal path C:\tools\stress\stress_report.json is hardcoded again at Lines 114 and 116, risking drift if $stressDir changes.

♻️ Proposed fix

- with open(r'C:\tools\stress\stress_report.json', 'w') as f: + with open(r'$reportFile', 'w') as f: json.dump(report, f, indent=2) - print('Report saved to C:\\tools\\stress\\stress_report.json') + print('Report saved to $reportFile')

Based on learnings, PSScriptAnalyzer flagged: "The variable 'reportFile' is assigned but never used."
Also applies to: 114-116

🧰 Tools

🪛 PSScriptAnalyzer (1.25.0)

[warning] 6-6: The variable 'reportFile' is assigned but never used.

(PSUseDeclaredVarsMoreThanAssignments)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/common/template_operations/templates/winstress/internal_data/01_run_stress_template.ps1` at line 6, The $reportFile variable in 01_run_stress_template.ps1 is assigned but never reused, while the report path is hardcoded again later in the script. Update the logic around $reportFile so the same variable is used wherever the stress report path is needed, including the report-writing and report-reading spots in the script, and remove the duplicated literal path to keep it consistent with $stressDir.

Source: Linters/SAST tools

coderabbitai · 2026-07-02T14:14:35Z

+metadata:
+  annotations:
+    cdi.kubevirt.io/storage.deleteAfterCompletion: "false"
+  name: windows-clone-dv


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check whether another workload template defines a DV with the same name rg -n "windows-clone-dv" benchmark_runner/common/template_operations/templates

Repository: redhat-performance/benchmark-runner

Length of output: 1243

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the relevant Windows templates for uniqueness/uuid usage sed -n '1,220p' benchmark_runner/common/template_operations/templates/windows/internal_data/windows_dv_template.yaml printf '\n---\n' sed -n '1,240p' benchmark_runner/common/template_operations/templates/windows/internal_data/windows_vm_template.yaml printf '\n---\n' sed -n '1,220p' benchmark_runner/common/template_operations/templates/winstress/internal_data/windows_dv_template.yaml printf '\n---\n' sed -n '1,260p' benchmark_runner/common/template_operations/templates/winstress/internal_data/winstress_vm_template.yaml printf '\n---\n' sed -n '1,220p' benchmark_runner/common/template_operations/templates/winmssql/internal_data/windows_dv_template.yaml printf '\n---\n' sed -n '1,260p' benchmark_runner/common/template_operations/templates/winmssql/internal_data/winmssql_vm_template.yaml printf '\n---\n' sed -n '1,220p' benchmark_runner/common/template_operations/templates/winfio/internal_data/windows_dv_template.yaml printf '\n---\n' sed -n '1,260p' benchmark_runner/common/template_operations/templates/winfio/internal_data/winfio_vm_template.yaml # Search for uuid usage in Windows templates printf '\n=== uuid hits ===\n' rg -n "uuid|windows-clone-dv" benchmark_runner/common/template_operations/templates/{windows,winstress,winmssql,winfio} -g '*.yaml'

Repository: redhat-performance/benchmark-runner

Length of output: 37077

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect Windows workload data templates for URL/uuid and whether the clone DV is intended to be shared sed -n '1,220p' benchmark_runner/common/template_operations/templates/windows/windows_data_template.yaml printf '\n---\n' sed -n '1,220p' benchmark_runner/common/template_operations/templates/winstress/winstress_data_template.yaml printf '\n---\n' sed -n '1,220p' benchmark_runner/common/template_operations/templates/winmssql/winmssql_data_template.yaml printf '\n---\n' sed -n '1,220p' benchmark_runner/common/template_operations/templates/winfio/winfio_data_template.yaml printf '\n=== references to windows-clone-dv in Python code ===\n' rg -n "windows-clone-dv|dataVolumeTemplates|source:\s*$|source:\n\s*pvc:" benchmark_runner -g '*.py' -g '*.yaml' printf '\n=== any cleanup/delete logic for windows clone DV ===\n' rg -n "deleteAfterCompletion|delete.*DataVolume|DataVolume.*delete|windows-clone" benchmark_runner -g '*.py' -g '*.yaml'

Repository: redhat-performance/benchmark-runner

Length of output: 3365

🏁 Script executed:

#!/bin/bash set -euo pipefail # Check whether namespace is run-specific or shared, and whether clone DV names are reused intentionally rg -n "namespace:|name: windows-clone-dv|deleteAfterCompletion|uuid: \{\{ uuid \}\}" benchmark_runner/common/template_operations/templates -g '*.yaml' -g '*.py' -g '*.json' printf '\n=== common template data / namespace sources ===\n' sed -n '1,220p' benchmark_runner/common/template_operations/templates/common_template.yaml printf '\n---\n' rg -n "namespace" benchmark_runner/main benchmark_runner/common/template_operations -g '*.py' -g '*.yaml'

Repository: redhat-performance/benchmark-runner

Length of output: 32316

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect workload namespace grouping for the Windows workloads sed -n '120,220p' benchmark_runner/main/environment_variables.py printf '\n---\n' sed -n '430,470p' benchmark_runner/main/environment_variables.py printf '\n=== windows-related workload names ===\n' rg -n "\"(windows|winstress|winmssql|winfio)[^\"]*\"" benchmark_runner/main/environment_variables.py

Repository: redhat-performance/benchmark-runner

Length of output: 8725

Make the Windows clone DataVolume name unique

windows-clone-dv is shared by windows, winstress, winmssql, and winfio in the same benchmark-runner namespace, so concurrent Windows runs will contend for the same DataVolume object and can fail. Add a workload/run suffix such as {{ uuid }} or otherwise make this name unique.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/common/template_operations/templates/winstress/internal_data/windows_dv_template.yaml` at line 6, The DataVolume name in the Windows clone template is hardcoded and shared across multiple Windows workloads, which can cause collisions during concurrent runs. Update the name in windows_dv_template.yaml to be unique per workload or per run by incorporating a suffix such as the existing template uuid/identifier, and make sure the Windows clone path stays consistent in the template that defines the DataVolume object.

coderabbitai · 2026-07-02T14:14:35Z

+    def _create_vm(self, vm_num: str):
+        try:
+            vm_name = self._get_vm_name(vm_num)
+            self._oc.create_async(yaml=self._get_vm_yaml(vm_num))
+            self._oc.wait_for_vm_create(vm_name=vm_name)
+            logger.info(f'VM {vm_name} created, stress script running via cloud-init')
+        except Exception as err:
+            self.save_error_logs()
+            raise err


🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

Don't call save_error_logs() from per-VM step functions run via _run_parallel_phases.

_create_vm, _collect_results, and _delete_vm are dispatched as parallel phases (likely multiprocessing child processes, same pattern as BootstormVM). Calling self.save_error_logs() from within these child-process functions risks the documented OpenSSL/SIGSEGV crash across fork boundaries. run()'s own except block (lines 170-172) already handles failure logging after phases complete, in the main process — the per-step calls are redundant and unsafe.

Based on learnings, "avoid calling _save_error_logs() from within multiprocessing child processes... this can trigger the documented SSL/SIGSEGV multiprocessing crash," and the recommended pattern is to do all HTTPS work in the main process after child processes complete or entirely within child processes consistently — not scattered ad hoc in both.

🔧 Proposed fix: drop per-step error-log calls, let `run()` handle it

def _create_vm(self, vm_num: str): try: vm_name = self._get_vm_name(vm_num) self._oc.create_async(yaml=self._get_vm_yaml(vm_num)) self._oc.wait_for_vm_create(vm_name=vm_name) logger.info(f'VM {vm_name} created, stress script running via cloud-init') except Exception as err: - self.save_error_logs() raise err

except Exception as err: - self.save_error_logs() raise err def _delete_vm(self, vm_num: str): try: vm_name = self._get_vm_name(vm_num) self._oc.delete_vm_sync(yaml=self._get_vm_yaml(vm_num), vm_name=vm_name) except Exception as err: - self.save_error_logs() raise err

Also applies to: 103-105, 107-113

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/workloads/linstress_vm.py` around lines 35 - 43, _remove the per-VM error-log call from the multiprocessing phase helpers and let the top-level run() failure handling own it. In _create_vm, _collect_results, and _delete_vm, avoid calling self.save_error_logs() inside the child-process path; instead, just propagate the exception and keep failure logging centralized in run()’s except block, which already runs in the main process. Use the existing _run_parallel_phases flow and the BootstormVM-style pattern as the reference for locating these step functions._

Source: Learnings

coderabbitai · 2026-07-02T14:14:35Z

+            for elapsed in range(0, self._timeout, OC.DELAY):
+                check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd,
+                                                  namespace=self.__namespace, key_path=self.__ssh_key_path,
+                                                  username=self.__username, timeout=60)
+                if check and 'found' in check:
+                    break
+                if elapsed > 0 and elapsed % 300 == 0:
+                    logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)')
+                time.sleep(OC.DELAY)


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Polling loop can overrun the declared timeout.

Each iteration performs an SSH call (up to 60s) and then sleeps OC.DELAY, but the loop counter advances only by OC.DELAY per iteration — so if SSH calls are slow, actual wall-clock wait can far exceed self._timeout. As per path instructions, timeout handling for workloads should be checked carefully; here the bound is effectively iterations * (ssh_call_time + OC.DELAY) rather than self._timeout.

⏱️ Proposed fix: use wall-clock deadline instead of loop-counter

- for elapsed in range(0, self._timeout, OC.DELAY): - check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd, - namespace=self.__namespace, key_path=self.__ssh_key_path, - username=self.__username, timeout=60) - if check and 'found' in check: - break - if elapsed > 0 and elapsed % 300 == 0: - logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)') - time.sleep(OC.DELAY) + deadline = time.time() + self._timeout + while time.time() < deadline: + check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd, + namespace=self.__namespace, key_path=self.__ssh_key_path, + username=self.__username, timeout=60) + if check and 'found' in check: + break + logger.info(f'Waiting for stress_report.json on VM {vm_name}...') + time.sleep(OC.DELAY)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for elapsed in range(0, self._timeout, OC.DELAY):

check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd,

namespace=self.__namespace, key_path=self.__ssh_key_path,

username=self.__username, timeout=60)

if check and 'found' in check:

break

if elapsed > 0 and elapsed % 300 == 0:

logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)')

time.sleep(OC.DELAY)

deadline = time.time() + self._timeout

while time.time() < deadline:

check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd,

namespace=self.__namespace, key_path=self.__ssh_key_path,

username=self.__username, timeout=60)

if check and 'found' in check:

break

logger.info(f'Waiting for stress_report.json on VM {vm_name}...')

time.sleep(OC.DELAY)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/workloads/linstress_vm.py` around lines 54 - 62, The polling loop in linstress_vm is using a range counter with OC.DELAY, but the virtctl_ssh call can take much longer than that, so the real wait can exceed self._timeout. Update the timeout logic in the stress report check loop to use a wall-clock deadline or elapsed-time tracking around virtctl_ssh and time.sleep, so the loop exits once the overall timeout is reached. Keep the change localized to the polling block that calls self._virtctl.virtctl_ssh and logs the “Waiting for stress_report.json” message.

Source: Path instructions

coderabbitai · 2026-07-02T14:14:35Z

+            check_cmd = f'powershell -Command "if (Test-Path \'{self.__remote_dir}/stress_report.json\') {{ echo found }}"'
+            for elapsed in range(0, self._timeout, OC.DELAY):
+                check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd,
+                                                  namespace=self.__namespace, key_path=self.__ssh_key_path,
+                                                  username=self.__username, timeout=60)
+                if check and 'found' in check:
+                    break
+                logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)')
+                time.sleep(OC.DELAY)


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Wait loop can exceed the intended self._timeout budget.

The loop iterates self._timeout // OC.DELAY times, but each iteration also issues a virtctl_ssh check with its own independent timeout=60 plus time.sleep(OC.DELAY). If the SSH check is slow/hanging, real elapsed time is iterations * (up_to_60 + OC.DELAY), which can far exceed self._timeout — the loop bound only counts iterations, not actual elapsed wall-clock time.

🕒 Proposed fix using actual elapsed time

- check_cmd = f'powershell -Command "if (Test-Path \'{self.__remote_dir}/stress_report.json\') {{ echo found }}"' - for elapsed in range(0, self._timeout, OC.DELAY): + check_cmd = f'powershell -Command "if (Test-Path \'{self.__remote_dir}/stress_report.json\') {{ echo found }}"' + deadline = time.time() + self._timeout + while time.time() < deadline: check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd, namespace=self.__namespace, key_path=self.__ssh_key_path, username=self.__username, timeout=60) if check and 'found' in check: break - logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)') + logger.info(f'Waiting for stress_report.json on VM {vm_name}...') time.sleep(OC.DELAY)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

check_cmd = f'powershell -Command "if (Test-Path \'{self.__remote_dir}/stress_report.json\') {{ echo found }}"'

for elapsed in range(0, self._timeout, OC.DELAY):

check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd,

namespace=self.__namespace, key_path=self.__ssh_key_path,

username=self.__username, timeout=60)

if check and 'found' in check:

break

logger.info(f'Waiting for stress_report.json on VM {vm_name}... ({elapsed}s)')

time.sleep(OC.DELAY)

check_cmd = f'powershell -Command "if (Test-Path \'{self.__remote_dir}/stress_report.json\') {{ echo found }}"'

deadline = time.time() + self._timeout

while time.time() < deadline:

check = self._virtctl.virtctl_ssh(vm_name=vm_name, command=check_cmd,

namespace=self.__namespace, key_path=self.__ssh_key_path,

username=self.__username, timeout=60)

if check and 'found' in check:

break

logger.info(f'Waiting for stress_report.json on VM {vm_name}...')

time.sleep(OC.DELAY)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmark_runner/workloads/winstress_vm.py` around lines 98 - 106, The wait loop in winstress_vm’s stress_report.json polling uses a fixed iteration count instead of real elapsed time, so the combined virtctl_ssh timeout and sleep can exceed the intended self._timeout budget. Update the polling in the method that builds check_cmd and calls virtctl.virtctl_ssh to track wall-clock elapsed time (or a deadline) and stop once the total time reaches self._timeout, keeping the per-check SSH timeout bounded as well. Ensure the logger message and break condition still reflect the same stress_report.json check logic.

Source: Path instructions

coderabbitai · 2026-07-02T14:14:35Z

+    #cloud-config
+    user: fedora
+    password: fedora
+    chpasswd: { expire: False }
+    ssh_pwauth: true


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Hardcoded weak default credentials with password SSH auth enabled.

Cloud-init sets user: fedora / password: fedora with ssh_pwauth: true, enabling password-based SSH login with a well-known default credential pair. Recommend disabling ssh_pwauth in favor of injecting an SSH public key (as done for the Windows workload), or at minimum generating a random per-run password rather than a fixed literal.

As per coding guidelines, benchmark_runner path instructions state to "Flag any hardcoded credentials or secrets."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml` around lines 8 - 12, The cloud-init settings in the linstress VM template use a hardcoded `user`/`password` pair with `ssh_pwauth` enabled, which should be removed from this benchmark fixture. Update the `linstress_vm.yaml` template to stop using fixed credentials in this cloud-config block and instead follow the same safer approach used by the Windows workload (prefer SSH public key injection and disable password auth), or at minimum generate a per-run random password rather than a literal default.

Source: Path instructions

coderabbitai · 2026-07-02T14:14:35Z

+    #cloud-config
+    user: fedora
+    password: fedora
+    chpasswd: { expire: False }
+    ssh_pwauth: true


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Hardcoded weak credentials + password SSH auth enabled.

The generated cloud-init sets user: fedora, password: fedora, and ssh_pwauth: true. This bakes a trivially guessable password into every VM instance with SSH password authentication enabled, exposing the VM to unauthorized access on the pod network. Since this is a rendered golden file, the root cause is the linstress data/cloud-init template — recommend switching to key-based auth only (disable ssh_pwauth) or generating a random per-run password.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_linstress_vm_ODF_PVC_True/linstress_vm.yaml` around lines 8 - 12, The cloud-init rendered by the linstress VM template is using hardcoded weak credentials and enabling SSH password login. Update the linstress cloud-init/template that produces linstress_vm.yaml so it no longer sets a fixed user/password pair and does not enable ssh_pwauth; switch the template to key-based auth only or generate a unique per-run password. Use the cloud-init generation path associated with linstress_vm and the template/data source that fills user, password, and ssh_pwauth to locate the fix.

Source: Path instructions

coderabbitai · 2026-07-02T14:14:35Z

+stringData:
+  userdata: |
+    #cloud-config
+    user: fedora
+    password: fedora
+    chpasswd: { expire: False }
+    ssh_pwauth: true


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Locate linstress-related templates and workload code git ls-files | rg 'linstress|Linstress|lin_stress' echo '--- template files ---' git ls-files | rg 'benchmark_runner/common/template_operations/templates/.+linstress|benchmark_runner/common/template_operations/templates/.+lin_stress' echo '--- workload files ---' git ls-files | rg 'benchmark_runner/workloads/.+linstress|benchmark_runner/workloads/.+lin_stress' echo '--- tests for golden files ---' git ls-files | rg 'tests/.+linstress|tests/.+golden_files/.+linstress'

Repository: redhat-performance/benchmark-runner

Length of output: 3288

🏁 Script executed:

#!/bin/bash set -euo pipefail echo '--- linstress_vm_template.yaml ---' wc -l benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml sed -n '1,220p' benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml | cat -n echo '--- linstress_data_template.yaml ---' wc -l benchmark_runner/common/template_operations/templates/linstress/linstress_data_template.yaml sed -n '1,220p' benchmark_runner/common/template_operations/templates/linstress/linstress_data_template.yaml | cat -n echo '--- linstress_vm.py ---' wc -l benchmark_runner/workloads/linstress_vm.py sed -n '1,260p' benchmark_runner/workloads/linstress_vm.py | cat -n

Repository: redhat-performance/benchmark-runner

Length of output: 19449

Hardcoded VM credentials and password SSH auth
benchmark_runner/common/template_operations/templates/linstress/internal_data/linstress_vm_template.yaml:8-13 hardcodes fedora/fedora, leaves the password non-expiring, and enables ssh_pwauth. If the VM only needs the SSH key path used by linstress_vm.py, drop password login or make these values templated.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_linstress_vm_ODF_PVC_False/linstress_vm.yaml` around lines 6 - 12, The VM userdata in the linstress template hardcodes the fedora account password, keeps it non-expiring, and enables password SSH auth. Update the cloud-config in the linstress VM template so it relies on the SSH key flow used by linstress_vm.py, and remove or template the password-related fields and ssh_pwauth setting instead of shipping fixed credentials.

Source: Path instructions

coderabbitai · 2026-07-02T14:14:35Z

+    user: fedora
+    password: fedora
+    chpasswd: { expire: False }
+    ssh_pwauth: true


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Hardcoded weak default credentials for a network-exposed VM.

The cloud-init template sets user: fedora / password: fedora and enables ssh_pwauth: true, meaning every linstress_vm deployment boots with a publicly-known, static SSH password. Since this VM has SSH exposed via a masquerade interface, this is a real attack surface, especially since these test clusters can be long-lived.

Recommend generating a random password per-run (like the deadbeef suffix used for names) and injecting it via the data template, or switching to SSH-key-based auth instead of password auth.

As per path instructions, "Flag any hardcoded credentials or secrets" for benchmark_runner/**.

🔒 Suggested direction

- user: fedora - password: fedora - chpasswd: { expire: False } - ssh_pwauth: true + user: fedora + ssh_pwauth: false + ssh_authorized_keys: + - "{{ linstress_ssh_public_key }}"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/benchmark_runner/common/template_operations/golden_files/release_linstress_vm_ODF_PVC_False/linstress_vm.yaml` around lines 9 - 12, The cloud-init template for the linstress VM is using hardcoded weak SSH credentials and password auth, which should be removed for benchmark_runner artifacts. Update the template behind the linstress_vm.yaml asset so the user/password values are not static (for example, inject a per-run secret through the data template or switch the VM to SSH key auth), and keep the relevant cloud-init fields such as ssh_pwauth and chpasswd aligned with that change.

Source: Path instructions

openshift-ci Bot requested a review from RobertKrawitz July 2, 2026 12:10

openshift-ci Bot added the approved label Jul 2, 2026

ebattat self-assigned this Jul 2, 2026

ebattat requested a review from rishubhjain July 2, 2026 12:12

ebattat and others added 2 commits July 2, 2026 15:38

Add stress_duration to WORKLOAD_CONFIG example comment

ee29469

Assisted-by: Claude Code Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update WORKLOAD_CONFIG example and remove perf_ci stress overrides fr…

dfbc898

…om linstress Assisted-by: Claude Code Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Jul 2, 2026

View reviewed changes

Uh oh!

Conversation

ebattat commented Jul 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

openshift-ci Bot commented Jul 2, 2026

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

ebattat commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ebattat commented Jul 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

ebattat commented Jul 2, 2026 •

edited

Loading

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading