[integ-tests-framework] Fix SSH banner timeout when using bastion gateway in RemoteCommandExecutor#7338

Merged
hanwen-cluster merged 2 commits into aws:develop from hanwen-cluster:developapr15
Apr 15, 2026

hanwen-cluster commented Apr 15, 2026

Description of changes

The SSH gateway command used for bastion connections was missing -o StrictHostKeyChecking=no and -i {cluster.ssh_key}. When the bastion host key was not in known_hosts, the gateway ssh process prompted interactively for host key verification. Because paramiko runs this process non-interactively, nothing ever answers the prompt: the gateway process blocks waiting for input, the SSH banner never reaches paramiko, and the connection fails with paramiko.ssh_exception.SSHException: Error reading SSH protocol banner once the 60-second banner timeout expires.

This only manifested in environments where ~/.ssh/config did not already set StrictHostKeyChecking no globally. Environments that had this configured (or had the bastion host already in known_hosts) were unaffected.
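
As an illustration only, and not the actual diff in this PR, a gateway command that cannot stall on a host key prompt might be assembled as in the sketch below. The function name build_gateway_command, the bastion address, the target address, and the key path are all hypothetical; -W, -o StrictHostKeyChecking=no, and -i are standard OpenSSH client options.

```python
import shlex


def build_gateway_command(bastion, target_host, target_port, ssh_key):
    """Return an ssh command line that tunnels stdio to the target via the bastion.

    Hypothetical helper for illustration; the real RemoteCommandExecutor may
    build its gateway string differently.
    """
    args = [
        "ssh",
        "-W", f"{target_host}:{target_port}",  # forward stdin/stdout to the target
        "-o", "StrictHostKeyChecking=no",      # do not prompt for unknown host keys
        "-i", ssh_key,                         # authenticate to the bastion with the cluster key
        bastion,
    ]
    # Quote each argument so paths with spaces survive shell-style splitting.
    return " ".join(shlex.quote(arg) for arg in args)


# Placeholder values, not taken from the PR.
cmd = build_gateway_command(
    "ec2-user@bastion.example.com", "10.0.0.12", 22, "/path/to/key.pem"
)
print(cmd)
```

A string like this could then be handed to whatever runs the gateway process (for example, paramiko.ProxyCommand accepts a command line). The point is that both the host key option and the identity file are passed explicitly rather than left to ~/.ssh/config.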

Tests

The following tests have passed:

{%- import 'common.jinja2' as common with context -%}
{{- common.OSS_COMMERCIAL_X86.append("rocky8") or "" -}}
{{- common.OSS_COMMERCIAL_X86.append("rocky9") or "" -}}
---
test-suites:
  schedulers:
    test_slurm_accounting.py::test_slurm_accounting:
      dimensions:
        - regions: ["ap-south-1"]
          instances: {{ common.INSTANCES_DEFAULT_X86 }}
          oss: ["alinux2023"]
          schedulers: ["slurm"]
    test_slurm_accounting.py::test_slurm_accounting_external_dbd:
      dimensions:
        - regions: ["ap-south-1"]
          instances: {{ common.INSTANCES_DEFAULT_X86 }}
          oss: ["ubuntu2404"]
          schedulers: ["slurm"]

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

hanwen-cluster requested review from a team as code owners April 15, 2026 18:39
hanwen-cluster added the skip-changelog-update label (disables the check that enforces changelog updates in PRs) Apr 15, 2026
hanwen-cluster merged commit b745bb8 into aws:develop Apr 15, 2026
26 of 27 checks passed