diff --git a/.cspell/abbreviations.txt b/.cspell/abbreviations.txt index fbd0c3b..57aca96 100644 --- a/.cspell/abbreviations.txt +++ b/.cspell/abbreviations.txt @@ -1 +1,2 @@ CAPV +kubelet diff --git a/docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx b/docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx index a6908ba..8c5c897 100644 --- a/docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx +++ b/docs/en/create-cluster/vmware-vsphere/create-cluster-in-global.mdx @@ -11,7 +11,7 @@ queries: # Creating a VMware vSphere Cluster in the global Cluster -This document explains how to create a VMware vSphere workload cluster from the `global` cluster by using the standard CAPV mode that connects directly to vCenter. The procedure covers a minimum supported topology with one datacenter, one NIC per node, and static IP allocation through `VSphereResourcePool`. +This document explains how to create a VMware vSphere workload cluster from the `global` cluster by using the standard CAPV mode that connects directly to vCenter. The procedure covers a minimum supported topology with one datacenter, one NIC per node, and static IP allocation through `VSphereMachineConfigPool`. ## Scenarios @@ -24,14 +24,14 @@ Use this document in the following scenarios: This document applies to the following deployment model: - CAPV connects directly to vCenter. -- Control plane and worker nodes both use `VSphereResourcePool` for static IP allocation and data disks. +- Control plane and worker nodes both use `VSphereMachineConfigPool` for static IP allocation and data disks. - `ClusterResourceSet` delivers the vSphere CPI component automatically. - The first validation uses one datacenter and one NIC per node. This document does not apply to the following scenarios: - A deployment that depends on vSphere Supervisor or `vm-operator`. -- A deployment that does not use `VSphereResourcePool`. +- A deployment that does not use `VSphereMachineConfigPool`. 
- A first-time deployment that enables multiple datacenters, multiple NICs, and complex disk extensions at the same time. This document is written for the current platform environment. The `kube-ovn` delivery path depends on platform controllers that consume annotations on the `Cluster` resource, so this workflow is not intended to be a generic standalone CAPV deployment guide outside the platform context. @@ -61,12 +61,12 @@ In this workflow, `ClusterResourceSet` is used to deliver the vSphere CPI resour The vSphere CPI component is delivered to the workload cluster through `ClusterResourceSet`. It connects workload nodes to the vSphere infrastructure so the cluster can report infrastructure identities and complete cloud-provider initialization. -### CAPV static allocation pool +### Machine config pool -The CAPV static allocation pool is the `VSphereResourcePool` custom resource. In the baseline workflow: +The machine config pool is the `VSphereMachineConfigPool` custom resource. In the baseline workflow: -- One CAPV static allocation pool is used for control plane nodes. -- One CAPV static allocation pool is used for worker nodes. +- One machine config pool is used for control plane nodes. +- One machine config pool is used for worker nodes. Each node slot includes the hostname, datacenter, static IP assignment, and optional data disk definitions. @@ -84,8 +84,8 @@ Also distinguish the following value formats: In the baseline workflow: -- One `VSphereResourcePool` is used for control plane nodes. -- One `VSphereResourcePool` is used for worker nodes. +- One `VSphereMachineConfigPool` is used for control plane nodes. +- One `VSphereMachineConfigPool` is used for worker nodes. ### VM template requirements @@ -96,6 +96,8 @@ The VM template used by this workflow should meet the following minimum requirem 3. It includes VMware Tools or `open-vm-tools`. 4. It includes `containerd`. 5. It includes the baseline components required by kubeadm bootstrap. +6. 
It includes pre-exported container image tar files under `/root/images/`. These files are imported into containerd by `capv-load-local-images.sh` before kubeadm runs, so that node bootstrap does not depend on pulling images from a remote registry. +7. The `/root/images/*.tar` files **must** include the sandbox (pause) image whose reference exactly matches the `sandbox_image` value (containerd v1) or `sandbox` value (containerd v2) configured in `/etc/containerd/config.toml`. For example, if containerd is configured with `sandbox_image = "registry.example.com/tkestack/pause:3.10"`, one of the tar files must contain that exact image reference. A mismatch causes containerd to pull the sandbox image from the network, which defeats the purpose of local preloading and fails in air-gapped environments. Static IP configuration, hostname injection, and other initialization settings depend on `cloud-init`. Node IP reporting depends on guest tools. @@ -107,8 +109,8 @@ Create a local working directory and store the manifests with the following layo capv-cluster/ ├── 00-namespace.yaml ├── 01-vsphere-credentials-secret.yaml -├── 02-vsphereresourcepool-control-plane.yaml -├── 03-vsphereresourcepool-worker.yaml +├── 02-vspheremachineconfigpool-control-plane.yaml +├── 03-vspheremachineconfigpool-worker.yaml ├── 10-cluster.yaml ├── 15-vsphere-cpi-clusterresourceset.yaml ├── 20-control-plane.yaml @@ -502,17 +504,21 @@ Apply the manifest: kubectl apply -f 15-vsphere-cpi-clusterresourceset.yaml ``` -### Create the static allocation pools +### Create the machine config pools -Create the control plane static allocation pool. +Create the control plane machine config pool. + + +Each node slot declares its NIC layout under `network.primary` (required) and `network.additional` (optional list). 
The primary NIC's `networkName` is required, and the provider derives the Kubernetes node name, the kubelet serving certificate DNS SAN, and the kubelet `node-ip` from `hostname` and the resolved primary NIC addresses. The `hostname` must be a valid DNS-1123 subdomain. + `deviceName` is optional. If you do not need to force the guest NIC name, remove the `deviceName` line from every node slot. The provider assigns NIC names such as `eth0`, `eth1` by NIC order. -```yaml title="02-vsphereresourcepool-control-plane.yaml" +```yaml title="02-vspheremachineconfigpool-control-plane.yaml" apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 -kind: VSphereResourcePool +kind: VSphereMachineConfigPool metadata: name: namespace: @@ -523,16 +529,17 @@ spec: name: datacenter: "" releaseDelayHours: - resources: + configs: - hostname: "" datacenter: "" network: - - networkName: "" - deviceName: "" - ip: "/" - gateway: "" - dns: - - "" + primary: + networkName: "" + deviceName: "" + ip: "/" + gateway: "" + dns: + - "" persistentDisks: - name: var-cpaas sizeGiB: @@ -550,12 +557,13 @@ spec: - hostname: "" datacenter: "" network: - - networkName: "" - deviceName: "" - ip: "/" - gateway: "" - dns: - - "" + primary: + networkName: "" + deviceName: "" + ip: "/" + gateway: "" + dns: + - "" persistentDisks: - name: var-cpaas sizeGiB: @@ -573,12 +581,13 @@ spec: - hostname: "" datacenter: "" network: - - networkName: "" - deviceName: "" - ip: "/" - gateway: "" - dns: - - "" + primary: + networkName: "" + deviceName: "" + ip: "/" + gateway: "" + dns: + - "" persistentDisks: - name: var-cpaas sizeGiB: @@ -595,11 +604,11 @@ spec: wipeFilesystem: true ``` -Create the worker static allocation pool. +Create the worker machine config pool. 
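The DNS-1123 requirement above can be checked locally before the pool manifests are applied. A minimal sketch with hypothetical hostnames; the regex covers the character pattern only, not the 253-character total and 63-character per-label length limits:

```shell
#!/bin/sh
# Sketch: pre-check that each slot hostname is a DNS-1123 subdomain
# (lowercase alphanumerics and '-', dot-separated labels, each label
# starting and ending with an alphanumeric character).
is_dns1123_subdomain() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

# Hypothetical hostnames; replace with the values from your node slots.
for host in cp-01 cp-02 cp-03 worker-01; do
  if is_dns1123_subdomain "$host"; then
    echo "ok: $host"
  else
    echo "invalid hostname: $host" >&2
  fi
done
```

A hostname that fails this check is rejected by the provider, so catching it here avoids a failed slot allocation later.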
-```yaml title="03-vsphereresourcepool-worker.yaml" +```yaml title="03-vspheremachineconfigpool-worker.yaml" apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 -kind: VSphereResourcePool +kind: VSphereMachineConfigPool metadata: name: namespace: @@ -610,16 +619,17 @@ spec: name: datacenter: "" releaseDelayHours: - resources: + configs: - hostname: "" datacenter: "" network: - - networkName: "" - deviceName: "" - ip: "/" - gateway: "" - dns: - - "" + primary: + networkName: "" + deviceName: "" + ip: "/" + gateway: "" + dns: + - "" persistentDisks: - name: var-cpaas sizeGiB: @@ -634,8 +644,8 @@ spec: Apply both manifests: ```bash -kubectl apply -f 02-vsphereresourcepool-control-plane.yaml -kubectl apply -f 03-vsphereresourcepool-worker.yaml +kubectl apply -f 02-vspheremachineconfigpool-control-plane.yaml +kubectl apply -f 03-vspheremachineconfigpool-worker.yaml ``` ### Create the control plane objects @@ -663,9 +673,9 @@ spec: network: devices: - networkName: "" - resourcePoolRef: + machineConfigPoolRef: apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 - kind: VSphereResourcePool + kind: VSphereMachineConfigPool name: namespace: --- @@ -860,13 +870,44 @@ spec: - group: "operator.connectors.alauda.io" resources: ["connectorscores", "connectorsgits", "connectorsocis"] - level: Metadata + - path: /usr/local/bin/capv-load-local-images.sh + owner: "root:root" + permissions: "0755" + content: | + #!/bin/bash + set -euo pipefail + until mountpoint -q /var/lib/containerd; do + echo "waiting for /var/lib/containerd mount" + sleep 1 + done + systemctl restart containerd + until systemctl is-active --quiet containerd; do + echo "waiting for containerd" + sleep 1 + done + if [ ! 
-d "/root/images" ]; then + echo "ERROR: /root/images directory not found" >&2 + exit 1 + fi + image_count=0 + for image_file in /root/images/*.tar; do + if [ -f "$image_file" ]; then + echo "importing image: $image_file" + ctr -n k8s.io images import "$image_file" + image_count=$((image_count + 1)) + fi + done + if [ "$image_count" -eq 0 ]; then + echo "ERROR: no tar files found in /root/images" >&2 + exit 1 + fi + echo "imported $image_count images" preKubeadmCommands: - hostnamectl set-hostname "{{ ds.meta_data.hostname }}" - echo "::1 ipv6-localhost ipv6-loopback localhost6 localhost6.localdomain6" >/etc/hosts - echo "127.0.0.1 {{ ds.meta_data.hostname }} {{ local_hostname }} localhost localhost.localdomain localhost4 localhost4.localdomain4" >>/etc/hosts - while ! ip route | grep -q "default via"; do sleep 1; done; echo "NetworkManager started" - - mkdir -p /run/cluster-api && (command -v restorecon >/dev/null 2>&1 && restorecon -Rv /run/cluster-api || true) - - sed -i 's|sandbox_image = .*|sandbox_image = "/tkestack/pause:"|' /etc/containerd/config.toml && systemctl restart containerd + - /usr/local/bin/capv-load-local-images.sh postKubeadmCommands: - chmod 600 /var/lib/kubelet/config.yaml clusterConfiguration: @@ -910,6 +951,8 @@ spec: initConfiguration: nodeRegistration: criSocket: /var/run/containerd/containerd.sock + ignorePreflightErrors: + - ImagePull kubeletExtraArgs: cloud-provider: external node-labels: kube-ovn/role=master @@ -919,6 +962,8 @@ spec: joinConfiguration: nodeRegistration: criSocket: /var/run/containerd/containerd.sock + ignorePreflightErrors: + - ImagePull kubeletExtraArgs: cloud-provider: external node-labels: kube-ovn/role=master @@ -959,9 +1004,9 @@ spec: network: devices: - networkName: "" - resourcePoolRef: + machineConfigPoolRef: apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 - kind: VSphereResourcePool + kind: VSphereMachineConfigPool name: namespace: --- @@ -987,9 +1032,43 @@ spec: "tlsCertFile": 
"/etc/kubernetes/pki/kubelet.crt", "tlsPrivateKeyFile": "/etc/kubernetes/pki/kubelet.key" } + - path: /usr/local/bin/capv-load-local-images.sh + owner: "root:root" + permissions: "0755" + content: | + #!/bin/bash + set -euo pipefail + until mountpoint -q /var/lib/containerd; do + echo "waiting for /var/lib/containerd mount" + sleep 1 + done + systemctl restart containerd + until systemctl is-active --quiet containerd; do + echo "waiting for containerd" + sleep 1 + done + if [ ! -d "/root/images" ]; then + echo "ERROR: /root/images directory not found" >&2 + exit 1 + fi + image_count=0 + for image_file in /root/images/*.tar; do + if [ -f "$image_file" ]; then + echo "importing image: $image_file" + ctr -n k8s.io images import "$image_file" + image_count=$((image_count + 1)) + fi + done + if [ "$image_count" -eq 0 ]; then + echo "ERROR: no tar files found in /root/images" >&2 + exit 1 + fi + echo "imported $image_count images" joinConfiguration: nodeRegistration: criSocket: /var/run/containerd/containerd.sock + ignorePreflightErrors: + - ImagePull kubeletExtraArgs: cloud-provider: external volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/" @@ -1001,8 +1080,7 @@ spec: - echo "::1 ipv6-localhost ipv6-loopback localhost6 localhost6.localdomain6" >/etc/hosts - echo "127.0.0.1 {{ ds.meta_data.hostname }} {{ local_hostname }} localhost localhost.localdomain localhost4 localhost4.localdomain4" >>/etc/hosts - while ! 
ip route | grep -q "default via"; do sleep 1; done; echo "NetworkManager started" - - mkdir -p /run/cluster-api && (command -v restorecon >/dev/null 2>&1 && restorecon -Rv /run/cluster-api || true) - - sed -i 's|sandbox_image = .*|sandbox_image = "/tkestack/pause:"|' /etc/containerd/config.toml && systemctl restart containerd + - /usr/local/bin/capv-load-local-images.sh postKubeadmCommands: - chmod 600 /var/lib/kubelet/config.yaml users: @@ -1120,7 +1198,7 @@ Prioritize the following checks: - If `ClusterResourceSet` exists but no `ClusterResourceSetBinding` is created, check whether the controller has the required delete permission on the referenced `ConfigMap` and `Secret` resources. - If the network plugin is not installed, verify that the required cluster annotations are present and that the platform controllers processed them. - If the `cpaas.io/registry-address` annotation is missing, verify the public registry credential and the platform controller that injects the annotation. -- If a machine is stuck in `Provisioning`, check `VSphereMachine` conditions for `ResourcePoolReady` — it shows whether slot allocation failed due to pool binding or datacenter mismatch. +- If a machine is stuck in `Provisioning`, check `VSphereMachine` conditions for `MachineConfigPoolReady` — it shows whether slot allocation failed due to pool binding or datacenter mismatch. - If a VM is waiting for IP allocation, verify VMware Tools, the static IP settings, and `VSphereVM.status.addresses`. - If datastore space is exhausted, verify whether old VM directories or `.vmdk` files remain in the target datastore. - If the template system disk size does not match the manifest values, verify that `diskGiB` is not smaller than the template disk size. 
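When bootstrap stalls on image preloading, it helps to read the configured sandbox reference back out of the containerd config and compare it with the preloaded tar contents. A minimal sketch, assuming the key is `sandbox_image` (containerd v1) or `sandbox` (containerd v2) in `/etc/containerd/config.toml`; the `ctr` comparison in the comment is illustrative:

```shell
#!/bin/sh
# Sketch: extract the sandbox image reference from a containerd config file
# so it can be compared against the references preloaded under /root/images/.
# Handles both `sandbox_image = "..."` (v1) and `sandbox = "..."` (v2).
sandbox_image_ref() {
  sed -n 's/^[[:space:]]*sandbox\(_image\)\{0,1\}[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\2/p' "$1" | head -n 1
}

# On a node, the preloaded references could then be checked with something like:
#   ctr -n k8s.io images ls -q | grep -Fx "$(sandbox_image_ref /etc/containerd/config.toml)"
```

If the extracted reference is absent from the imported images, containerd falls back to pulling it over the network, which is exactly the failure mode this workflow avoids.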
diff --git a/docs/en/create-cluster/vmware-vsphere/extension-scenarios.mdx b/docs/en/create-cluster/vmware-vsphere/extension-scenarios.mdx index d3c9a51..aea48a3 100644 --- a/docs/en/create-cluster/vmware-vsphere/extension-scenarios.mdx +++ b/docs/en/create-cluster/vmware-vsphere/extension-scenarios.mdx @@ -34,46 +34,52 @@ Before you begin, ensure the following conditions are met: When nodes require an additional management, storage, or service network, extend the manifests in the following resources: -- `02-vsphereresourcepool-control-plane.yaml` -- `03-vsphereresourcepool-worker.yaml` +- `02-vspheremachineconfigpool-control-plane.yaml` +- `03-vspheremachineconfigpool-worker.yaml` - `20-control-plane.yaml` - `30-workers-md-0.yaml` - `04-failure-domains.yaml` if failure domains are enabled -Add the second NIC to each node slot in the static allocation pools: +Each node slot declares its NIC layout under `network.primary` and `network.additional`. The primary NIC is used to derive the kubelet `node-ip` and remains the node's primary identity; additional NICs are merged after it in the order listed. 
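The merge order can be sketched as follows: the primary NIC is always device 0, and each `additional` entry follows in list order, which is also the order used for default device names when `deviceName` is omitted. The network names below are hypothetical:

```shell
#!/bin/sh
# Illustration only: the primary NIC always becomes the first device
# (eth0 when deviceName is omitted); additional NICs follow in the
# order they appear under network.additional.
primary="mgmt-net"                    # hypothetical network.primary.networkName
additional="storage-net service-net"  # hypothetical network.additional entries

index=0
for net in $primary $additional; do
  echo "eth$index -> $net"
  index=$((index + 1))
done
```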
+ +Add the second NIC to each control plane node slot in the machine config pools: ```yaml network: -- networkName: "" - deviceName: "" - ip: "/" - gateway: "" - dns: - - "" -- networkName: "" - deviceName: "" - ip: "/" - gateway: "" - dns: - - "" + primary: + networkName: "" + deviceName: "" + ip: "/" + gateway: "" + dns: + - "" + additional: + - networkName: "" + deviceName: "" + ip: "/" + gateway: "" + dns: + - "" ``` Apply the same pattern to the worker node slots: ```yaml network: -- networkName: "" - deviceName: "" - ip: "/" - gateway: "" - dns: - - "" -- networkName: "" - deviceName: "" - ip: "/" - gateway: "" - dns: - - "" + primary: + networkName: "" + deviceName: "" + ip: "/" + gateway: "" + dns: + - "" + additional: + - networkName: "" + deviceName: "" + ip: "/" + gateway: "" + dns: + - "" ``` Add the second NIC to the machine templates: @@ -108,7 +114,7 @@ When you move between one NIC and two NICs, apply the following rules: Update all of the following fields together: -1. `VSphereResourcePool.spec.resources[].network` +1. `VSphereMachineConfigPool.spec.configs[].network.additional` (append the second NIC entry; keep `network.primary` unchanged) 2. `VSphereMachineTemplate.spec.template.spec.network.devices` 3. `VSphereFailureDomain.spec.topology.networks` when failure domains are enabled @@ -116,7 +122,7 @@ Update all of the following fields together: Remove the second NIC block from all of the following fields: -1. The second NIC entry in `VSphereResourcePool.spec.resources[].network` +1. The second NIC entry in `VSphereMachineConfigPool.spec.configs[].network.additional` (leave the list empty or remove the `additional` key entirely) 2. The second device entry in `VSphereMachineTemplate.spec.template.spec.network.devices` 3. 
The second network name in `VSphereFailureDomain.spec.topology.networks` @@ -284,14 +290,14 @@ The baseline deployment includes the following required data disks: - **Control plane nodes**: `var-cpaas`, `var-lib-containerd`, and `var-lib-etcd` (3 disks per node). Do not remove any of these disks. The `var-lib-etcd` disk must set `wipeFilesystem: true` to allow `kubeadm join` during rolling updates. - **Worker nodes**: `var-cpaas` and `var-lib-containerd` (2 disks per node). Do not remove any of these disks. -If a node needs additional data disks beyond the required set, append more entries to the same `persistentDisks` list in the corresponding `VSphereResourcePool` node slot. The following optional fields are especially relevant here: +If a node needs additional data disks beyond the required set, append more entries to the same `persistentDisks` list in the corresponding `VSphereMachineConfigPool` node slot. The following optional fields are especially relevant here: - **`mountPath`**: If set, the disk is formatted and mounted at the specified path. If omitted, the disk is attached as a raw device with a symlink at `/dev/disk/by-capv/`, allowing an external process to manage it at runtime. - **`wipeFilesystem`**: When `true`, disk content is wiped on the first boot of a new VM. Normal reboots and manual service restarts are not affected. Defaults to `false`. ## Scale out worker nodes -Worker scale-out depends on the relationship between `MachineDeployment.spec.replicas` and the available node slots in the worker CAPV static allocation pool, `VSphereResourcePool.spec.resources[]`. +Worker scale-out depends on the relationship between `MachineDeployment.spec.replicas` and the available node slots in the worker machine config pool, `VSphereMachineConfigPool.spec.configs[]`. Apply the following rules: @@ -301,7 +307,7 @@ Apply the following rules: Use the following order when you scale out workers: -1. 
Add new worker node slots to `03-vsphereresourcepool-worker.yaml`. +1. Add new worker node slots to `03-vspheremachineconfigpool-worker.yaml`. 2. Increase `MachineDeployment.spec.replicas` in `30-workers-md-0.yaml`. The following example adds a new worker slot: @@ -310,11 +316,12 @@ The following example adds a new worker slot: - hostname: "" datacenter: "" network: - - networkName: "" - ip: "/" - gateway: "" - dns: - - "" + primary: + networkName: "" + ip: "/" + gateway: "" + dns: + - "" persistentDisks: - name: var-cpaas sizeGiB: diff --git a/docs/en/create-cluster/vmware-vsphere/index.mdx b/docs/en/create-cluster/vmware-vsphere/index.mdx index ef7e3f9..1b1af36 100644 --- a/docs/en/create-cluster/vmware-vsphere/index.mdx +++ b/docs/en/create-cluster/vmware-vsphere/index.mdx @@ -31,6 +31,6 @@ The baseline workflow in this section is intentionally limited to a minimum supp - One NIC per node - Three control plane nodes - One worker node -- Static IP allocation through `VSphereResourcePool` +- Static IP allocation through `VSphereMachineConfigPool` When you need a larger or more complex topology, first complete the baseline deployment and then apply the changes described in [Extension Scenarios](./extension-scenarios.mdx). diff --git a/docs/en/create-cluster/vmware-vsphere/parameter-checklist.mdx b/docs/en/create-cluster/vmware-vsphere/parameter-checklist.mdx index 781d399..7f2c097 100644 --- a/docs/en/create-cluster/vmware-vsphere/parameter-checklist.mdx +++ b/docs/en/create-cluster/vmware-vsphere/parameter-checklist.mdx @@ -43,9 +43,9 @@ Use this checklist in the following order: The following terms are used consistently throughout the VMware vSphere cluster-creation documents. -### CAPV static allocation pool +### Machine config pool -A CAPV static allocation pool is the `VSphereResourcePool` custom resource. It predefines node slots. +A machine config pool is the `VSphereMachineConfigPool` custom resource. It predefines node slots. 
Each slot can include: - A node hostname - A target datacenter @@ -53,16 +53,23 @@ A CAPV static allocation pool is the `VSphereResourcePool` custom resource. It p - Persistent disk definitions :::warning -Each `VSphereResourcePool` can only be referenced by a single `KubeadmControlPlane` or a single `MachineDeployment`. Do not share one `VSphereResourcePool` across multiple control plane or worker groups. If a pool is already bound to another consumer, the `VSphereMachine` will report a `ResourcePoolReady=False` condition with reason `PoolBoundToOtherConsumer`. +Each `VSphereMachineConfigPool` can only be referenced by a single `KubeadmControlPlane` or a single `MachineDeployment`. Do not share one `VSphereMachineConfigPool` across multiple control plane or worker groups. If a pool is already bound to another consumer, the `VSphereMachine` will report a `MachineConfigPoolReady=False` condition with reason `PoolBoundToOtherConsumer`. ::: ### Node slot -A node slot is an entry under `VSphereResourcePool.spec.resources[]`. A single slot usually maps to one node, such as `cp-01` or `worker-01`. +A node slot is an entry under `VSphereMachineConfigPool.spec.configs[]`. A single slot usually maps to one node, such as `cp-01` or `worker-01`. The slot `hostname` drives the Kubernetes node name, the kubelet serving certificate DNS SAN, and (combined with the resolved primary NIC addresses) the kubelet `node-ip`; it must be a valid DNS-1123 subdomain. + +### Slot network layout + +Each slot declares its NIC layout under `network.primary` and `network.additional`: + +- `network.primary` is required. Its `networkName` must be set and is used as the node-ip source for the kubelet. +- `network.additional` is an optional list of extra NICs merged after the primary NIC in the order listed. ### `deviceName` -`deviceName` is an optional field in the `VSphereResourcePool` network configuration. 
It is used to control the NIC name seen inside the guest operating system, such as `eth0` or `eth1`. +`deviceName` is an optional field in the `VSphereMachineConfigPool` network configuration. It is used to control the NIC name seen inside the guest operating system, such as `eth0` or `eth1`. Use the following distinctions when you fill the values: @@ -78,7 +85,7 @@ A vCenter resource pool is the native vCenter inventory object, for example: /Datacenter1/host/cluster1/Resources ``` -This value is different from CAPV's `VSphereResourcePool`. In the extension scenarios, this path is used by `VSphereDeploymentZone.spec.placementConstraint.resourcePool`. +In the extension scenarios, this path is used by `VSphereDeploymentZone.spec.placementConstraint.resourcePool`. ### Compute cluster @@ -144,6 +151,8 @@ The template should also meet the following requirements: - It includes VMware Tools or `open-vm-tools`. - It includes `containerd`. - It includes the baseline components required by kubeadm bootstrap. +- It includes pre-exported container image tar files under `/root/images/`. These files are imported into containerd by `capv-load-local-images.sh` before kubeadm runs, so that node bootstrap does not depend on pulling images from a remote registry. +- The `/root/images/*.tar` files **must** include the sandbox (pause) image whose reference exactly matches the `sandbox_image` value (containerd v1) or `sandbox` value (containerd v2) configured in `/etc/containerd/config.toml`. For example, if containerd is configured with `sandbox_image = "registry.example.com/tkestack/pause:3.10"`, one of the tar files must contain that exact image reference. A mismatch causes containerd to pull the sandbox image from the network, which defeats the purpose of local preloading and fails in air-gapped environments. ## Load Balancer Prerequisites @@ -165,7 +174,6 @@ The template should also meet the following requirements: | Pod CIDR | `` | Yes | Must not overlap with existing networks. 
| `10.244.0.0/16` | - | | Service CIDR | `` | Yes | Must not overlap with existing networks. | `10.96.0.0/12` | - | | Image registry | `` | Yes | Private registry address. kubeadm `imageRepository` is set to `/tkestack`. | `registry.example.local` | - | -| Sandbox (pause) image tag | `` | Yes | Tag for the pause image used by containerd as the pod sandbox. The full reference is `/tkestack/pause:`. | `3.10` | - | | `kube-ovn` version | `` | Yes | Must match the platform network plugin requirements. | `v4.2.26` | - | | `kube-ovn-join-cidr` | `` | Yes | Must not overlap with other networks. | `100.64.0.0/16` | - | | CoreDNS image tag | `` | Yes | Use the tag approved for the Kubernetes version. | `1.12.4` | - | @@ -190,27 +198,27 @@ The template should also meet the following requirements: | Prefix length | `` | Yes | Used with each node IP address. | `24` | - | | DNS server 1 | `` | Yes | DNS server for the primary NIC. | `10.10.0.10` | - | -### Control plane static allocation pool +### Control plane machine config pool | Parameter | Placeholder | Required | Validation or Notes | Example | Actual Value | |-----------|-------------|----------|---------------------|---------|--------------| -| Control plane pool name | `` | Yes | CAPV static allocation pool name for control plane nodes. | `demo-cluster-control-plane-pool` | - | -| Control plane node 1 hostname | `` | Yes | Recommended hostname for the first control plane node. | `cp-01` | - | +| Control plane pool name | `` | Yes | Machine config pool name for control plane nodes. | `demo-cluster-control-plane-pool` | - | +| Control plane node 1 hostname | `` | Yes | Node name and kubelet serving cert SAN for the first control plane node. Must be a valid DNS-1123 subdomain. | `cp-01` | - | | Control plane node 1 datacenter | `` | Yes | Usually the same as the default datacenter. | `dc-a` | - | | Control plane node 1 IP address | `` | Yes | IPv4 address only, without the prefix length. 
| `10.10.10.11` | - | -| Control plane node 2 hostname | `` | Yes | Recommended hostname for the second control plane node. | `cp-02` | - | +| Control plane node 2 hostname | `` | Yes | Node name and kubelet serving cert SAN for the second control plane node. Must be a valid DNS-1123 subdomain. | `cp-02` | - | | Control plane node 2 datacenter | `` | Yes | Usually the same as the default datacenter. | `dc-a` | - | | Control plane node 2 IP address | `` | Yes | IPv4 address only, without the prefix length. | `10.10.10.12` | - | -| Control plane node 3 hostname | `` | Yes | Recommended hostname for the third control plane node. | `cp-03` | - | +| Control plane node 3 hostname | `` | Yes | Node name and kubelet serving cert SAN for the third control plane node. Must be a valid DNS-1123 subdomain. | `cp-03` | - | | Control plane node 3 datacenter | `` | Yes | Usually the same as the default datacenter. | `dc-a` | - | | Control plane node 3 IP address | `` | Yes | IPv4 address only, without the prefix length. | `10.10.10.13` | - | -### Worker static allocation pool +### Worker machine config pool | Parameter | Placeholder | Required | Validation or Notes | Example | Actual Value | |-----------|-------------|----------|---------------------|---------|--------------| -| Worker pool name | `` | Yes | CAPV static allocation pool name for worker nodes. | `demo-cluster-worker-pool` | - | -| Worker node 1 hostname | `` | Yes | Recommended hostname for the first worker node. | `worker-01` | - | +| Worker pool name | `` | Yes | Machine config pool name for worker nodes. | `demo-cluster-worker-pool` | - | +| Worker node 1 hostname | `` | Yes | Node name and kubelet serving cert SAN for the first worker node. Must be a valid DNS-1123 subdomain. | `worker-01` | - | | Worker node 1 datacenter | `` | Yes | Usually the same as the default datacenter. | `dc-a` | - | | Worker node 1 IP address | `` | Yes | IPv4 address only, without the prefix length. 
| `10.10.10.21` | - | | Worker node 2 hostname | `` | No | Used when you scale out the worker pool. | `worker-02` | - | @@ -331,7 +339,7 @@ Before you start the deployment, confirm all of the following items: 6. The Pod CIDR, Service CIDR, and `kube-ovn-join-cidr` do not overlap with existing networks. 7. The VM template is available in every required datacenter. 8. The required datastores and vCenter resource pool paths are confirmed. -9. The static allocation pool values for the minimum single-datacenter topology are complete. +9. The machine config pool values for the minimum single-datacenter topology are complete. 10. The baseline system disk and data disk sizing is confirmed. 11. Every required parameter has a real value. diff --git a/docs/en/manage-nodes/vmware-vsphere.mdx b/docs/en/manage-nodes/vmware-vsphere.mdx index fc40b15..51df1dd 100644 --- a/docs/en/manage-nodes/vmware-vsphere.mdx +++ b/docs/en/manage-nodes/vmware-vsphere.mdx @@ -11,14 +11,14 @@ queries: # Managing Nodes on VMware vSphere -This document explains how to manage worker nodes on VMware vSphere after the baseline cluster is running. Node lifecycle operations are managed through `VSphereResourcePool`, `VSphereMachineTemplate`, `KubeadmConfigTemplate`, and `MachineDeployment` resources. +This document explains how to manage worker nodes on VMware vSphere after the baseline cluster is running. Node lifecycle operations are managed through `VSphereMachineConfigPool`, `VSphereMachineTemplate`, `KubeadmConfigTemplate`, and `MachineDeployment` resources. ## Prerequisites Before you begin, ensure the following conditions are met: - The workload cluster was created successfully. See [Creating Clusters on VMware vSphere](../create-cluster/vmware-vsphere/). -- The worker CAPV static allocation pool has enough available slots. +- The worker machine config pool has enough available slots. - The control plane is healthy and reachable. - You know which manifest files currently define the worker nodes. 
@@ -27,20 +27,20 @@ Before you begin, ensure the following conditions are met: ### Scale out worker nodes -When you add more worker nodes, update the worker static allocation pool before you increase the replica count. +When you add more worker nodes, update the worker machine config pool before you increase the replica count. -1. Add one or more new node slots to `03-vsphereresourcepool-worker.yaml`. +1. Add one or more new node slots to `03-vspheremachineconfigpool-worker.yaml`. 2. Update `replicas` in `30-workers-md-0.yaml`. 3. Apply the updated manifests. Use the following order: ```bash -kubectl apply -f 03-vsphereresourcepool-worker.yaml +kubectl apply -f 03-vspheremachineconfigpool-worker.yaml kubectl apply -f 30-workers-md-0.yaml ``` -**Note:** If `MachineDeployment.spec.replicas` is greater than the number of available slots in `VSphereResourcePool.spec.resources[]`, the new worker nodes cannot be assigned correctly. +**Note:** If `MachineDeployment.spec.replicas` is greater than the number of available slots in `VSphereMachineConfigPool.spec.configs[]`, the new worker nodes cannot be assigned correctly. ### Roll out updated worker node configuration \{#roll-out-updated-worker-node-configuration} @@ -178,10 +178,10 @@ Confirm the following results: Use the following checks first when worker node management fails: -- Check `VSphereMachine` conditions for `ResourcePoolReady`. If `False`, the reason indicates why slot allocation failed: +- Check `VSphereMachine` conditions for `MachineConfigPoolReady`. If `False`, the reason indicates why slot allocation failed: - `PoolBoundToOtherConsumer`: the pool is already bound to a different `KubeadmControlPlane` or `MachineDeployment`. - `NoAvailableSlots`: no slots match the required datacenter or failure domain. -- Verify that the worker CAPV static allocation pool still has free slots. +- Verify that the worker machine config pool still has free slots. 
- Verify that the worker IP addresses, gateway, and DNS settings are correct. - Verify that the worker VM template still matches the required Kubernetes version and guest-tools requirements. - Check `VSphereVM.status.addresses` when a node is waiting for IP allocation. diff --git a/docs/en/upgrade-cluster/vmware-vsphere.mdx b/docs/en/upgrade-cluster/vmware-vsphere.mdx index 3d360a0..0a6ffc7 100644 --- a/docs/en/upgrade-cluster/vmware-vsphere.mdx +++ b/docs/en/upgrade-cluster/vmware-vsphere.mdx @@ -30,7 +30,7 @@ Before you begin, ensure the following conditions are met: - The control plane is healthy and reachable. - All nodes are in the `Ready` state. - The target VM template is compatible with the target Kubernetes version. -- The CAPV static allocation pools have enough capacity for rolling updates. +- The machine config pools have enough capacity for rolling updates. :::warning **Templates are immutable**