ci: improve debuggability of Ceph and Rook failures#5773
Draft
nixpanic wants to merge 7 commits into
/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs
/test ci/centos/mini-e2e-operator/k8s-1.34/cephfs
/test ci/centos/mini-e2e-operator/k8s-1.34/cephfs
/test ci/centos/mini-e2e-operator/k8s-1.34/cephfs
/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs
Force-pushed from 5ad687c to 5491a3b
/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs
Force-pushed from 5491a3b to d18b3a1
/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs
Force-pushed from d18b3a1 to 93f0939
/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs
/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs
/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs
Successful run with Squid (added forced test failure to collect the logs):
Force-pushed from d914640 to 93f0939
/test ci/centos/mini-e2e-helm/k8s-1.34/cephfs
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Hopefully this helps identify the failure when Ceph Tentacle is deployed. Signed-off-by: Niels de Vos <ndevos@ibm.com>
When Rook fails to deploy OSDs, the logs of the ceph-osd-prepare Pods can contain useful pointers to the issue. Signed-off-by: Niels de Vos <ndevos@ibm.com>
Some jobs fail with a rather unclear message:
> failed to check data persist in pvc: data not persistent expected data
> checking data persist received data ...unreadable
Logging the written and read bytes instead of the strings should make it easier to understand where it goes wrong. At least the `echo` command should not append a linebreak (`-n` option). Without the linebreak, both strings should be directly comparable. Signed-off-by: Niels de Vos <ndevos@ibm.com>
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Add CEPH_DEBUG to build.env to enable MDS and OSD debugging on demand. Assisted-by: AskBob <askbob@ibm.com> Signed-off-by: Niels de Vos <ndevos@ibm.com>
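The commit about the unclear `data not persistent` message proposes logging the written and read bytes instead of the strings. The following is a minimal sketch of that idea, not the code from this PR: the function name `describeMismatch` and the message format are hypothetical, chosen only for illustration. It shows how a trailing linebreak (as appended by `echo` without `-n`) becomes visible once the raw bytes are printed.

```go
package main

import (
	"bytes"
	"fmt"
)

// describeMismatch is a hypothetical helper (not part of Ceph-CSI). It
// compares the data written to a volume with the data read back. When they
// differ, it returns a message that shows both values as quoted strings and
// as hex bytes. When they are identical, it returns an empty string.
func describeMismatch(written, read []byte) string {
	if bytes.Equal(written, read) {
		return ""
	}

	// %q escapes non-printable characters, "% x" prints space-separated hex.
	return fmt.Sprintf("data not persistent: wrote %q (% x), read %q (% x)",
		written, written, read, read)
}

func main() {
	written := []byte("checking data persist")
	// Simulates data stored by `echo` without -n, which appends a linebreak.
	read := []byte("checking data persist\n")

	fmt.Println(describeMismatch(written, read))
}
```

With output in this shape, an extra `0a` byte at the end of the read data points directly at a linebreak difference rather than at lost data.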
Force-pushed from 93f0939 to 38a90a3
The CephFS test `check data persist after recreating pod` has been disabled when Ceph-CSI is deployed with Helm. This PR is used for troubleshooting the issue and re-enabling the test.

Because enabling debugging information in Ceph and Rook isn't trivial while running the e2e suite, `CEPH_DEBUG` has been introduced so that future debugging sessions should become easier.

See-also: https://tracker.ceph.com/issues/73997
Depends-on: #5672
Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>: retest the <job-name> after unrelated failure (please report the failure too!)