Skip to content

fix: support containerd v2 sandbox config and ignore pause image pull… #44

Merged
chinameok merged 2 commits into master from fix/containerd-sandbox-compat
Apr 16, 2026

@jiazhiguang commented Apr 11, 2026

… errors

Containerd v2 renamed `sandbox_image` to `sandbox` in its config. The sed command now replaces both keys, so the pause image is set correctly regardless of the containerd version. The PR also adds `ignorePreflightErrors: [ImagePull]` to the kubeadm init and join `nodeRegistration`, so kubeadm does not fail when its built-in pause tag differs from the tag available in the registry.
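The sed expression itself is not shown on this page. The following is a minimal sketch of an edit that handles both keys. It assumes GNU sed and a config path supplied by the caller, and `set_sandbox_image` is a hypothetical helper. In containerd 1.x the key is typically `sandbox_image` under `[plugins."io.containerd.grpc.v1.cri"]`. In containerd 2.x it is `sandbox` under `[plugins.'io.containerd.cri.v1.images'.pinned_images]`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: point containerd at a specific pause image, whichever
# key name the installed containerd version uses.
#   containerd 1.x: sandbox_image = "..."
#   containerd 2.x: sandbox = "..."
# Assumes GNU sed (-i without a backup suffix).
set_sandbox_image() {
  local config="$1" image="$2"
  # '#' is the sed delimiter because the image reference contains '/' and ':'
  # and the pattern uses '|' for alternation. The key must be followed by
  # optional spaces and '=', so keys such as sandbox_mode are left untouched.
  sed -E -i "s#^([[:space:]]*)(sandbox_image|sandbox)[[:space:]]*=.*#\1\2 = \"${image}\"#" "$config"
}
```

TOML accepts both single-quoted and double-quoted strings, so rewriting the 2.x value with double quotes does no harm.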

Summary by CodeRabbit

  • Documentation
    • Replaced static allocation pool guidance with a machine config pool workflow for control-plane and worker nodes.
    • Introduced primary vs. additional NIC schema for per-slot networking and updated add/expand/revert rules.
    • Updated bootstrap guidance to deploy a local-image loading step that imports pre-exported container images and restarts the container runtime before kubeadm.
    • Added `ignorePreflightErrors: [ImagePull]` to the kubeadm `nodeRegistration` and updated readiness/troubleshooting to reference machine config pool status.
  • Chores
    • Added kubelet abbreviation to spellchecker.
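The summary mentions `ignorePreflightErrors` for `ImagePull`. Per the PR description, that field belongs to kubeadm's `nodeRegistration`. The sketch below shows it in a plain kubeadm config file rather than in the Cluster API manifest the PR edits. `write_kubeadm_init_config` is a hypothetical helper, and the `kubeadm.k8s.io/v1beta3` apiVersion is an assumption that must match the kubeadm release in the VM template. The command-line equivalent is `kubeadm init --ignore-preflight-errors=ImagePull`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a stand-alone kubeadm InitConfiguration that
# skips only the ImagePull preflight check. The apiVersion is an assumption.
# In Cluster API kubeadm bootstrap specs the same list sits under
# initConfiguration.nodeRegistration and joinConfiguration.nodeRegistration.
write_kubeadm_init_config() {
  local out="$1"
  cat > "$out" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  ignorePreflightErrors:
    - ImagePull
EOF
}
```

This setting only relaxes the preflight check. It does not make any image available to kubelet at runtime.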

… errors

Containerd v2 renamed `sandbox_image` to `sandbox` in its config. The
sed command now replaces both keys so the pause image is set correctly
regardless of the containerd version. Also adds `ignorePreflightErrors:
[ImagePull]` to init/join nodeRegistration so kubeadm does not fail when
the built-in pause tag differs from the registry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Apr 11, 2026

Warning

Rate limit exceeded

@jiazhiguang has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 40 minutes and 20 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c2ec15c-4cf6-4e93-955c-c8af83098c88

📥 Commits

Reviewing files that changed from the base of the PR and between 6dcbee4 and e4a4897.

📒 Files selected for processing (7)
  • .cspell/abbreviations.txt
  • docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx
  • docs/en/create-cluster/vmware-vsphere/extension-scenarios.mdx
  • docs/en/create-cluster/vmware-vsphere/index.mdx
  • docs/en/create-cluster/vmware-vsphere/parameter-checklist.mdx
  • docs/en/manage-nodes/vmware-vsphere.mdx
  • docs/en/upgrade-cluster/vmware-vsphere.mdx

Walkthrough

Replaced VSphereResourcePool-based static allocation pool docs/manifests with VSphereMachineConfigPool equivalents; changed the node-slot schema (`resources[]` → `configs[]`, `network` → `network.primary`/`network.additional`); switched `resourcePoolRef` → `machineConfigPoolRef`; introduced `/usr/local/bin/capv-load-local-images.sh` for bootstrap image loading (replacing the inline sed); and added kubeadm `ignorePreflightErrors: [ImagePull]`.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Create-cluster manifest & bootstrap (`docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx`) | Replaced VSphereResourcePool usage with VSphereMachineConfigPool; migrated the node-slot schema (`resources[]` → `configs[]`, `network` → `network.primary`/`network.additional`); changed `resourcePoolRef` → `machineConfigPoolRef` (kind updated); removed the inline sed containerd tweak and added `/usr/local/bin/capv-load-local-images.sh` via template `files`, plus `preKubeadmCommands` to import `/root/images/*.tar` into containerd and restart containerd; added `ignorePreflightErrors: [ImagePull]` to kubeadm `nodeRegistration`. |
| vSphere scenarios, parameters & management docs (`docs/en/create-cluster/vmware-vsphere/extension-scenarios.mdx`, `docs/en/create-cluster/vmware-vsphere/parameter-checklist.mdx`, `docs/en/create-cluster/vmware-vsphere/index.mdx`, `docs/en/manage-nodes/vmware-vsphere.mdx`, `docs/en/upgrade-cluster/vmware-vsphere.mdx`) | Renamed CAPV "static allocation pool" to "machine config pool"; updated examples/workflows to use `VSphereMachineConfigPool.spec.configs[]` and `network.primary`/`network.additional`; adjusted NIC add/remove, data-disk extension, worker scale-out, and troubleshooting wording to reference `MachineConfigPoolReady` and machine config pool slot semantics. |
| Misc (`.cspell/abbreviations.txt`) | Added `kubelet` to abbreviations. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Template as VM Template (with files + preKubeadm)
    participant VM as Provisioned VM
    participant Containerd as containerd
    participant FileSys as /root/images/*.tar
    participant Kubeadm as kubeadm/kubelet

    Note over Template,VM: Bootstrap script injected at /usr/local/bin/capv-load-local-images.sh (via Template.files)
    Template->>VM: boot + run preKubeadmCommands (invoke capv-load-local-images.sh)
    VM->>FileSys: wait for /root/images/*.tar presence
    FileSys-->>VM: .tar files present
    VM->>Containerd: restart containerd (ensure /var/lib/containerd mounted)
    VM->>Containerd: run `ctr` import on /root/images/*.tar
    Containerd-->>VM: images available
    VM->>Kubeadm: proceed with kubeadm init/join (kubelet starts with ignorePreflightErrors: [ImagePull])
    Kubeadm-->>VM: cluster bootstrap completes
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • wgkingk

Poem

🐰 I hop through docs where pools rearranged,
configs split and NICs exchanged.
A script naps in files, then wakes containerd,
tucks images in, so kubeadm runs unflawed.
Hop—clusters bloom, my fluffy tail wagged! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The PR title addresses containerd v2 sandbox config and pause image pull handling, which are core topics in the changeset. However, the title does not reflect the large-scale migration from VSphereResourcePool to VSphereMachineConfigPool that dominates the file changes across five documentation files. | Clarify whether the main change is the containerd v2 compatibility fix or the resource pool migration. If both are equally important, consider a title that captures the full scope of changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

cloudflare-workers-and-pages bot commented Apr 11, 2026

Deploying alauda-immutable-infra with Cloudflare Pages

Latest commit: e4a4897
Status: ✅ Deploy successful!
Preview URL: https://746ee86a.alauda-immutable-infra.pages.dev
Branch Preview URL: https://fix-containerd-sandbox-compa.alauda-immutable-infra.pages.dev

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx`:
- Around line 877-885: The bootstrap wait loops in the capv-load-local-images.sh
blocks (the mountpoint check for /var/lib/containerd and the systemctl is-active
--quiet containerd loop) can block forever; add bounded timeouts by introducing
a max retry/count or timeout variable (e.g., MAX_WAIT_SECONDS or MAX_RETRIES),
incrementing a counter each sleep iteration, and breaking with a clear error log
and non-zero exit when the limit is reached; apply the same change to both the
mountpoint loop and the systemctl loop (and replicate for the other similar
blocks), optionally allowing the timeout to be configurable via an environment
variable for flexibility.
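A bounded wait of the kind this comment asks for can be sketched as follows. `wait_until` is a hypothetical helper, not taken from the PR. `MAX_WAIT_SECONDS` (default 300 here) is one of the example names the comment itself suggests. `mountpoint -q` (util-linux) and `systemctl is-active --quiet` are the two checks the comment refers to.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a check command once per second until it succeeds,
# giving up after MAX_WAIT_SECONDS (assumed environment knob, default 300).
wait_until() {
  local description="$1"
  shift
  local max="${MAX_WAIT_SECONDS:-300}" waited=0
  until "$@"; do
    if (( waited >= max )); then
      echo "timed out after ${max}s waiting for ${description}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Intended usage inside a bootstrap script (not executed here):
#   wait_until "/var/lib/containerd mount" mountpoint -q /var/lib/containerd || exit 1
#   wait_until "containerd active" systemctl is-active --quiet containerd || exit 1
```

The counter tracks seconds slept, not time spent running the check. The real timeout can therefore overshoot slightly, which is usually acceptable for bootstrap waits.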

ℹ️ Review info

Run ID: c761246b-dce4-4eee-b725-7198544dd5fd

📥 Commits

Reviewing files that changed from the base of the PR and between afc8292 and 96b80d2.

📒 Files selected for processing (6)
  • docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx
  • docs/en/create-cluster/vmware-vsphere/extension-scenarios.mdx
  • docs/en/create-cluster/vmware-vsphere/index.mdx
  • docs/en/create-cluster/vmware-vsphere/parameter-checklist.mdx
  • docs/en/manage-nodes/vmware-vsphere.mdx
  • docs/en/upgrade-cluster/vmware-vsphere.mdx
✅ Files skipped from review due to trivial changes (2)
  • docs/en/upgrade-cluster/vmware-vsphere.mdx
  • docs/en/create-cluster/vmware-vsphere/index.mdx

Comment thread docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx
@jiazhiguang force-pushed the fix/containerd-sandbox-compat branch from 96b80d2 to 94bb276 on April 14, 2026 09:25

@coderabbitai bot left a comment

🧹 Nitpick comments (1)
docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx (1)

87-89: Remove duplicated baseline bullets to avoid future drift.

Line 87 and Line 88 repeat the same baseline points already listed at Line 68 and Line 69. Keeping a single list improves maintainability.

✂️ Suggested cleanup

```diff
-In the baseline workflow:
-
-- One `VSphereMachineConfigPool` is used for control plane nodes.
-- One `VSphereMachineConfigPool` is used for worker nodes.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx` around
lines 87 - 89, Remove the duplicated baseline bullet points that repeat "One
`VSphereMachineConfigPool` is used for control plane nodes." and "One
`VSphereMachineConfigPool` is used for worker nodes." — keep only the original
occurrences (the first list around the baseline section) and delete the repeated
bullets later in the file so the `VSphereMachineConfigPool` statements appear
only once to prevent future drift.

ℹ️ Review info

Run ID: 0d223ca5-3472-4fa7-9217-f9e1e665b212

📥 Commits

Reviewing files that changed from the base of the PR and between 96b80d2 and 94bb276.

📒 Files selected for processing (7)
  • .cspell/abbreviations.txt
  • docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx
  • docs/en/create-cluster/vmware-vsphere/extension-scenarios.mdx
  • docs/en/create-cluster/vmware-vsphere/index.mdx
  • docs/en/create-cluster/vmware-vsphere/parameter-checklist.mdx
  • docs/en/manage-nodes/vmware-vsphere.mdx
  • docs/en/upgrade-cluster/vmware-vsphere.mdx
✅ Files skipped from review due to trivial changes (5)
  • .cspell/abbreviations.txt
  • docs/en/upgrade-cluster/vmware-vsphere.mdx
  • docs/en/create-cluster/vmware-vsphere/index.mdx
  • docs/en/manage-nodes/vmware-vsphere.mdx
  • docs/en/create-cluster/vmware-vsphere/parameter-checklist.mdx

@chinameok (Collaborator)

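This PR removes the approach of explicitly telling containerd which pause image to use and replaces it with importing local images from /root/images/*.tar. However, the new docs do not say where /root/images/*.tar comes from. Nothing guarantees that the imported image's name is the name containerd actually uses, so deployment can still fail to pull the pause image.

For example:

Image name imported from the tar:
registry.example.local/tkestack/pause:3.10

Image name containerd looks for by default:
registry.k8s.io/pause:3.10

To containerd, these are two different images. Even if the underlying image content is identical, containerd still treats registry.k8s.io/pause:3.10 as missing if the names do not match. It then tries to pull that image from the internet, which fails in offline or private-registry environments.

ignorePreflightErrors: [ImagePull] is not a root-cause fix either. It only keeps kubeadm's preflight phase from aborting on image pull failures. Later, when kubelet/containerd creates the Pod sandbox, the pause image is still required. If the pause image is missing or its name does not match, Pods still will not start.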
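The name mismatch described above can be detected before kubeadm runs by checking for the exact reference. This is a minimal sketch that assumes `ctr` is available on the node. `image_present` is a hypothetical helper. `CTR` is an assumed override variable that defaults to the real `ctr` binary and lets the function be exercised without containerd. `ctr -n k8s.io images ls -q` prints only the image references in the `k8s.io` namespace, which is the namespace the CRI plugin uses.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if the exact image reference exists in
# containerd's k8s.io namespace. CTR is an assumed override (default: ctr).
image_present() {
  local ref="$1" refs
  # Capture the listing first instead of piping into grep -q: under pipefail,
  # an early grep exit could otherwise surface as a SIGPIPE failure of ctr.
  refs="$("${CTR:-ctr}" -n k8s.io images ls -q)" || return 1
  # -x -F: whole-line literal match, so a different registry prefix with the
  # same repository and tag does not count as present.
  grep -qxF -- "$ref" <<<"$refs"
}
```

With this check, `image_present registry.k8s.io/pause:3.10` fails on a node that only has `registry.example.local/tkestack/pause:3.10`, even if both names point to identical content.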

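I suggest adding an explicit solution:

  • Restore the original containerd sandbox image configuration and explicitly set the pause image to <image_registry>/tkestack/pause:<pause_image_tag>.
  • Or keep the local tar import. In that case, the docs must explain how /root/images/*.tar is prepared, and the script must tag the imported pause image with the name containerd actually looks for.
  • Or show that the VM template already ships the pause image under the correct name, and document this as a prerequisite.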
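The retagging option in the second bullet might look like the following sketch. It assumes GNU sed and a containerd config at `/etc/containerd/config.toml`. `configured_sandbox_image` and `retag_pause_image` are hypothetical helpers. `CTR` is an assumed override variable that defaults to the real `ctr` binary. `ctr images tag <source_ref> <target_ref>` adds another reference to an existing image without copying its content.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the pause image reference containerd is
# configured to use, from the 1.x key (sandbox_image) or the 2.x key (sandbox).
# Prints nothing if neither key is present.
configured_sandbox_image() {
  local config="$1"
  local re="^[[:space:]]*(sandbox_image|sandbox)[[:space:]]*=[[:space:]]*[\"']([^\"']+)[\"'].*"
  # On the first matching line, print capture group 2 (the quoted value) and quit.
  sed -nE "\\#${re}#{s#${re}#\\2#p;q}" "$config"
}

# Hypothetical helper: give an imported pause image (SRC, whatever name the
# tar carried) the exact reference that containerd expects.
retag_pause_image() {
  local src="$1" config="${2:-/etc/containerd/config.toml}" dst
  dst="$(configured_sandbox_image "$config")"
  if [[ -z "$dst" ]]; then
    echo "no sandbox image configured in ${config}" >&2
    return 1
  fi
  if [[ "$src" != "$dst" ]]; then
    "${CTR:-ctr}" -n k8s.io images tag "$src" "$dst"
  fi
}
```

Reading the target name from the config, rather than hard-coding it, keeps the retag in step with whichever key and value the installed containerd version uses.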

@coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx`:
- Around line 872-903: The capv-load-local-images.sh import loop should fail
fast: ensure the script treats "no tar files found" and any failed ctr import as
fatal errors. Enable safe glob handling (e.g., nullglob) or explicitly check for
matching files before the for-loop that iterates /root/images/*.tar, increment
image_count on each successful ctr -n k8s.io images import, and immediately exit
non‑zero if any ctr import fails; after the loop, if image_count is zero, print
an explanatory message and exit non‑zero so bootstrap fails early rather than
proceeding with zero or partially imported images.
- Line 99: Update the docs to explicitly state which sandbox/pause image
reference(s) must exist in containerd after importing /root/images/*.tar (or
require a retag step); reference the import script capv-load-local-images.sh and
the containerd setting [plugins."io.containerd.grpc.v1.cri"].sandbox_image so
readers know to either import the exact image name expected by kubelet (e.g.,
the configured sandbox image reference) or run a mandatory retag command after
import to match containerd’s sandbox_image; also mention that
ignorePreflightErrors: [ImagePull] does not bypass kubelet CRI image resolution
at runtime.
- Around line 953-955: Update the documentation around the kubeadm config block
that uses ignorePreflightErrors: [ImagePull] to explicitly state that this is
safe only because images are preloaded by the capv-load-local-images.sh script;
mention the script name and that it runs prior to kubeadm bootstrap so ImagePull
checks can be ignored, and add a warning advising not to remove or reorder the
script without removing the ignorePreflightErrors entry. Edit the three affected
sections (the kubeadm config blocks near lines referencing ignorePreflightErrors
and the surrounding explanatory text) to add a short sentence linking
capv-load-local-images.sh to the ImagePull exemption and a caution about
potential misconfiguration if the script is changed or omitted.
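A fail-fast import loop along the lines of the first finding could look like this. It is a minimal sketch, not the script from the PR. `import_local_images` is a hypothetical function, `/root/images` is taken from the review text, and `CTR` is an assumed override variable that defaults to the real `ctr` binary. With `nullglob` enabled, an unmatched `*.tar` pattern expands to nothing. The empty case is therefore caught instead of a literal `*.tar` being passed to `ctr`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketch of a fail-fast local image import.
# CTR is an assumed override hook (default: the real ctr binary).
import_local_images() {
  local dir="${1:-/root/images}" count=0 archive
  if [[ ! -d "$dir" ]]; then
    echo "image directory ${dir} does not exist" >&2
    return 1
  fi
  shopt -s nullglob            # unmatched glob expands to an empty list
  local archives=("$dir"/*.tar)
  shopt -u nullglob
  if (( ${#archives[@]} == 0 )); then
    echo "no *.tar files found in ${dir}" >&2
    return 1
  fi
  for archive in "${archives[@]}"; do
    # k8s.io is the containerd namespace used by the CRI plugin (and kubelet).
    if ! "${CTR:-ctr}" -n k8s.io images import "$archive"; then
      echo "failed to import ${archive}" >&2
      return 1
    fi
    count=$((count + 1))
  done
  echo "imported ${count} image archive(s) from ${dir}"
}
```

The function returns non-zero on the first failed import. A caller that checks the exit status can therefore stop the bootstrap before kubeadm starts, instead of continuing with a partial image set.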

ℹ️ Review info

Run ID: 49f92c83-62ec-4353-87c8-e4a9a9818107

📥 Commits

Reviewing files that changed from the base of the PR and between 94bb276 and 6dcbee4.

📒 Files selected for processing (2)
  • docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx
  • docs/en/create-cluster/vmware-vsphere/parameter-checklist.mdx

Comment thread docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx
Comment thread docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx
Comment thread docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx
@jiazhiguang force-pushed the fix/containerd-sandbox-compat branch from 6dcbee4 to e86e5fe on April 16, 2026 01:51
…ineConfigPool rename

* Replace the preKubeadmCommands sed that rewrote sandbox_image /
  sandbox in /etc/containerd/config.toml with capv-load-local-images.sh
  that imports /root/images/*.tar into containerd. Remove the now-unused
  <pause_image_tag> parameter. ignorePreflightErrors: [ImagePull] is
  kept as a safety net.
* Document that /root/images/*.tar must include the sandbox (pause)
  image whose reference exactly matches containerd's configured
  sandbox_image (v1) or sandbox (v2) value.
* Harden capv-load-local-images.sh to abort on missing /root/images
  directory, import failures, or zero tar files.
* Rename all references from VSphereResourcePool to
  VSphereMachineConfigPool to match the CRD rename.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
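The VSphereResourcePool to VSphereMachineConfigPool rename can be double-checked mechanically. This is a minimal sketch that assumes the docs live under `docs/`. `check_stale_pool_refs` is a hypothetical helper, and the two search terms are the pre-rename identifiers named in this PR. The pattern deliberately matches `resourcePoolRef` rather than the shorter `resourcePool`, so a plain vSphere resource pool field of that name (if the manifests use one) is not flagged.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: report any pre-rename identifiers left under a docs tree.
# Returns 0 if none remain, 1 if matches were printed, 2 if grep itself failed.
check_stale_pool_refs() {
  local root="${1:-docs}" status=0
  grep -rnE 'VSphereResourcePool|resourcePoolRef' "$root" || status=$?
  case "$status" in
    0) echo "stale pre-rename references found under ${root}" >&2; return 1 ;;
    1) return 0 ;;
    *) echo "grep could not scan ${root}" >&2; return 2 ;;
  esac
}
```

grep exit status 1 means "no match", so the function maps it to success and treats status 2 (a read error, for example a missing directory) as a separate failure.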
@jiazhiguang force-pushed the fix/containerd-sandbox-compat branch from e86e5fe to e4a4897 on April 16, 2026 01:54
@chinameok merged commit f39ae80 into master on Apr 16, 2026
3 checks passed
@chinameok deleted the fix/containerd-sandbox-compat branch on April 16, 2026 02:36

Labels

None yet

Projects

None yet


2 participants