ci: improve debuggability of Ceph and Rook failures #5773

Draft
nixpanic wants to merge 7 commits into ceph:devel from nixpanic:testing/ceph-tentacle-5772
Conversation

@nixpanic

@nixpanic nixpanic commented Nov 18, 2025

Member

The CephFS test "check data persist after recreating pod" has been disabled when Ceph-CSI is deployed with Helm. This PR is used to troubleshoot the issue and re-enable the test.

Because enabling debugging information in Ceph and Rook isn't trivial while running the e2e suite, `CEPH_DEBUG` has been introduced to make future debugging sessions easier.

See-also: https://tracker.ceph.com/issues/73997
Depends-on: #5672


Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@nixpanic

Member Author

/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs

@mergify mergify Bot added the component/testing (Additional test cases or CI work) label Nov 18, 2025
@nixpanic

Member Author

/test ci/centos/mini-e2e-operator/k8s-1.34/cephfs

@nixpanic

Member Author

/test ci/centos/mini-e2e-operator/k8s-1.34/cephfs

@nixpanic

Member Author

/test ci/centos/mini-e2e-operator/k8s-1.34/cephfs

@nixpanic

Member Author

/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs

@nixpanic nixpanic added the dependency/ceph (depends on core Ceph functionality) and keepalive (This label can be used to disable stale bot activity in the repo) labels Nov 26, 2025
@nixpanic nixpanic force-pushed the testing/ceph-tentacle-5772 branch from 5ad687c to 5491a3b on November 27, 2025 13:47
@nixpanic

Member Author

/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs

@nixpanic nixpanic force-pushed the testing/ceph-tentacle-5772 branch from 5491a3b to d18b3a1 on November 27, 2025 14:13
@nixpanic

Member Author

/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs

@nixpanic nixpanic force-pushed the testing/ceph-tentacle-5772 branch from d18b3a1 to 93f0939 on November 27, 2025 15:18
@nixpanic

Member Author

/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs

@nixpanic

Member Author

/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs

@nixpanic nixpanic added the ci/skip/multi-arch-build (skip building on multiple architectures) label Nov 27, 2025
@nixpanic

Member Author

/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs

@nixpanic

Member Author

Successful run with Squid (added forced test failure to collect the logs):
https://jenkins-ceph-csi.apps.ocp.cloud.ci.centos.org/job/mini-e2e-helm_k8s-1.34-cephfs/9/display/redirect

@nixpanic nixpanic force-pushed the testing/ceph-tentacle-5772 branch from d914640 to 93f0939 on December 11, 2025 11:42
@nixpanic

Member Author

/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs

nixpanic added 7 commits April 7, 2026 10:32
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Hopefully this helps identify the failure when Ceph Tentacle is
deployed.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
When Rook fails to deploy OSDs, the logs of the ceph-osd-prepare Pods
can contain useful pointers to the issue.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
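
For illustration only, collecting those logs could look roughly like the sketch below. It only builds the `kubectl logs` arguments. The `rook-ceph` namespace, the `app=rook-ceph-osd-prepare` label selector and the helper name `osdPrepareLogsArgs` are assumptions and are not taken from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// osdPrepareLogsArgs returns kubectl arguments that fetch the complete logs
// of all containers in the OSD prepare Pods.
// Assumption: the Pods carry the label app=rook-ceph-osd-prepare.
func osdPrepareLogsArgs(namespace string) []string {
	return []string{
		"--namespace", namespace,
		"logs",
		"--selector", "app=rook-ceph-osd-prepare",
		"--all-containers", // include init containers and sidecars
		"--prefix",         // prefix each line with the Pod/container name
		"--tail=-1",        // with a selector kubectl only shows the last 10 lines by default
	}
}

func main() {
	// Assumption: Rook is deployed in the rook-ceph namespace.
	fmt.Println("kubectl " + strings.Join(osdPrepareLogsArgs("rook-ceph"), " "))
}
```

The arguments can be passed to `exec.Command("kubectl", args...)` or printed and run by hand while debugging.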
Some jobs fail with a rather unclear message:

> failed to check data persist in pvc: data not persistent expected data
> checking data persist received data ...unreadable

By logging the written and read bytes instead of the strings, it
should become easier to see where things go wrong.

At least the `echo` command should not append a linebreak (-n option).
Without the linebreak, both strings should be comparable directly.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
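
A minimal sketch of the idea, not the actual e2e code: compare the written and read data as bytes and, on a mismatch, log both in quoted and hexadecimal form. The helper name `describeMismatch` and the sample data are hypothetical. The trailing newline stands in for what `echo` appends when it is run without `-n`.

```go
package main

import (
	"bytes"
	"fmt"
)

// describeMismatch compares the data written to a volume with the data read
// back. On a mismatch it returns a message that shows both values as quoted
// strings and as raw bytes, so invisible characters become obvious.
func describeMismatch(written, read []byte) (string, bool) {
	if bytes.Equal(written, read) {
		return "", false
	}
	return fmt.Sprintf("data mismatch: wrote %q (% x), read %q (% x)",
		written, written, read, read), true
}

func main() {
	// Illustrative data only; the real test data may differ.
	written := []byte("checking data persist")
	// Simulates `echo` without -n, which appends a linebreak.
	read := []byte("checking data persist\n")

	if msg, mismatch := describeMismatch(written, read); mismatch {
		fmt.Println(msg)
	}
}
```

With the quoted and hex output, a stray `\n` (byte `0a`) at the end of the read data is visible immediately, where a plain string comparison only reports that the data differs.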
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Add CEPH_DEBUG to build.env to enable MDS and OSD debugging on demand.

Assisted-by: AskBob <askbob@ibm.com>
Signed-off-by: Niels de Vos <ndevos@ibm.com>
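
As a rough sketch of what such an on-demand switch might do (the PR's actual implementation is not shown here): read `CEPH_DEBUG` from the environment and, when it is set to a true value, emit `ceph config set` commands that raise the MDS and OSD debug levels. The boolean interpretation of `CEPH_DEBUG`, the debug level `20` and the helper name `debugCommands` are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// debugCommands returns the Ceph commands to run when debugging is enabled.
// Assumption: CEPH_DEBUG holds a boolean-like value ("true", "1", ...).
// Assumption: level 20 is the desired verbosity; the PR may use other levels.
func debugCommands(cephDebug string) []string {
	enabled, err := strconv.ParseBool(cephDebug)
	if err != nil || !enabled {
		// Unset, empty or unparsable values leave debugging disabled.
		return nil
	}
	return []string{
		"ceph config set mds debug_mds 20",
		"ceph config set osd debug_osd 20",
	}
}

func main() {
	for _, cmd := range debugCommands(os.Getenv("CEPH_DEBUG")) {
		fmt.Println(cmd)
	}
}
```

Running the program with `CEPH_DEBUG=true` prints the two commands, and with the variable unset it prints nothing, so the default CI behaviour stays unchanged.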
@nixpanic nixpanic force-pushed the testing/ceph-tentacle-5772 branch from 93f0939 to 38a90a3 on April 7, 2026 11:30
@nixpanic nixpanic changed the title from "e2e: re-enable CephFS data validation test on Helm" to "ci: improve debuggability of Ceph and Rook failures" Apr 7, 2026
@mergify mergify Bot added the bug (Something isn't working) label Apr 7, 2026
Labels

bug: Something isn't working
ci/skip/multi-arch-build: skip building on multiple architectures
component/testing: Additional test cases or CI work
dependency/ceph: depends on core Ceph functionality
keepalive: This label can be used to disable stale bot activity in the repo

1 participant