runtime: Enable VM Templating Support for CLH #461
Open
Camelron wants to merge 3 commits into
Add support for VM Template factory on the clh path.
In order to support snapshot/restore-based VM templating,
the following changes were needed:
1. For clh.go, implement SaveVM, PauseVM, restoreVM, ResumeVM
2. Remove initrd config check for VM Templating path. The
root disk image (when using image mode) is created in memory
and therefore captured in the VM snapshot.
3. Truncate the memory file to the size of the VM at factory VM
create time. This allows CLH to use the memory file
as the backing for the template VM memory, allowing O(1)
snapshot times.
4. CLH uses memory zones as backing for its memory on the template paths
5. Update StartVM in CLH to use the restore path when template is
configured and available
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
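As an illustration of item 3 above, here is a minimal sketch of sizing a memory file to the VM's memory size before it is used as backing storage. The helper names (`prepareMemoryFile`, `demoTruncate`), the 64 MiB size, and the scratch directory are assumptions made for this example and are not taken from the PR; only `os.OpenFile`, `(*os.File).Truncate`, and `os.Stat` are standard library calls.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// prepareMemoryFile (hypothetical helper) creates the file at path if needed
// and sets its length to sizeBytes. No data is written, so on filesystems
// that support sparse files this does not allocate sizeBytes up front.
func prepareMemoryFile(path string, sizeBytes int64) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	return f.Truncate(sizeBytes)
}

// demoTruncate sizes a scratch "memory" file and returns its apparent size.
func demoTruncate(sizeBytes int64) (int64, error) {
	dir, err := os.MkdirTemp("", "template-demo-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	memPath := filepath.Join(dir, "memory")
	if err := prepareMemoryFile(memPath, sizeBytes); err != nil {
		return 0, err
	}
	info, err := os.Stat(memPath)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	const vmMemBytes = 64 << 20 // assumed 64 MiB guest, for illustration only
	size, err := demoTruncate(vmMemBytes)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("memory file apparent size: %d bytes\n", size)
}
```

Because the file already has the right length when the hypervisor maps it, the guest's memory contents live in the file itself, which is what lets the snapshot step avoid copying memory.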
Add k8s-vm-templating-test.bats, which exercises pod creation with the factory initialized on the target node.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Previously, when a starting k8s pod saw enable_template=true, the runtime would:
1. Try NewFactory with fetchOnly=true.
2. When that failed (because template.Fetch could not find the artifacts), retry with fetchOnly=false. This creates a direct factory, which builds the template from scratch (so we pay a full pod sandbox boot time here) and then restores from it.
Hence the boot times are strictly worse on this path.
Now, even when enable_template=true, we don't try to force a direct factory. Instead, we just revert to the standard sandbox boot path.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
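The new decision flow can be sketched roughly as follows. This is not the code from create.go: `chooseBootPath`, `fetchTemplate`, the `bootPath` constants, and `errNoArtifacts` are hypothetical names used only to show the shape of the fallback.

```go
package main

import (
	"errors"
	"fmt"
)

// bootPath names the two outcomes discussed in the commit message.
type bootPath string

const (
	bootFromTemplate bootPath = "restore-from-template"
	bootDirect       bootPath = "standard-sandbox-boot"
)

// errNoArtifacts is a hypothetical error for a missing template.
var errNoArtifacts = errors.New("template artifacts not found")

// fetchTemplate is a hypothetical stand-in for loading an existing factory
// in fetch-only mode; it returns an error when no template is available.
type fetchTemplate func() error

// chooseBootPath mirrors the described behavior: a failed fetch no longer
// triggers creation of a direct factory; it falls back to a standard boot.
func chooseBootPath(enableTemplate bool, fetch fetchTemplate) bootPath {
	if !enableTemplate {
		return bootDirect
	}
	if err := fetch(); err != nil {
		// No retry with fetchOnly=false: building a template here would
		// cost a full boot plus a restore.
		return bootDirect
	}
	return bootFromTemplate
}

func main() {
	found := func() error { return nil }
	missing := func() error { return errNoArtifacts }

	fmt.Println(chooseBootPath(true, found))
	fmt.Println(chooseBootPath(true, missing))
	fmt.Println(chooseBootPath(false, found))
}
```

The point of the design is that a pod start never pays for template creation; the template is expected to be initialized on the node ahead of time.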
Pull request overview
Enable VM templating (factory) workflows for Cloud Hypervisor (CLH) and add CI coverage for templating + EROFS-based Kubernetes integration scenarios. This fits into the runtime’s factory/template stack (CLH snapshot/restore support) and the kata-deploy packaging path (containerd snapshotter configuration) to make templating viable in the targeted CI environments.
Changes:
- Plumb an EROFS “merged vs unmerged” mode from Helm values → kata-deploy env → containerd TOML edits (including safe removal of stale max_unmerged_layers).
- Add/extend CLH templating support via pause/snapshot/restore, update template state handling, and expand unit/integration tests.
- Extend GitHub Actions K8s matrix to run EROFS + VM templating scenarios and register the new Bats test.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/packaging/kata-deploy/helm-chart/kata-deploy/values.yaml | Adds snapshotter.erofsMergeMode chart value documentation. |
| tools/packaging/kata-deploy/helm-chart/kata-deploy/templates/_helpers.tpl | Adds helper to emit EROFS_MERGE_MODE env plus additional shared Helm helper templates. |
| tools/packaging/kata-deploy/binary/src/utils/toml.rs | Adds delete_toml_value() helper and unit tests. |
| tools/packaging/kata-deploy/binary/src/config.rs | Adds EROFS_MERGE_MODE config parsing + validation and logs it. |
| tools/packaging/kata-deploy/binary/src/artifacts/snapshotters.rs | Uses merge-mode to set or delete max_unmerged_layers and updates Go-shim guard logic. |
| tests/integration/kubernetes/run_kubernetes_tests.sh | Registers new k8s-vm-templating.bats in test suite. |
| tests/integration/kubernetes/k8s-vm-templating.bats | Adds Kubernetes integration test for VM templating (factory) in clh/qemu non-confidential + blockfile/erofs setups. |
| tests/gha-run-k8s-common.sh | Adds CI plumbing for EROFS sizing mode + merge mode into Helm values. |
| src/runtime/virtcontainers/factory/template/template_test.go | Expands template factory tests to use per-VM storage paths and validates state file creation. |
| src/runtime/virtcontainers/factory/template/template_linux.go | Updates templating paths/behavior (memory truncation, CLH state filename, DevicesStatePath usage). |
| src/runtime/virtcontainers/factory/factory_linux.go | Resets additional hypervisor sandbox-identifying fields when creating new VMs from base config. |
| src/runtime/virtcontainers/clh.go | Adds CLH pause/snapshot/restore plumbing and restore-from-template detection; adjusts snapshot/save behavior. |
| src/runtime/virtcontainers/clh_test.go | Extends CLH client mock and adds unit tests for restore/snapshot. |
| src/runtime/pkg/katautils/create.go | Changes factory load failure behavior to fall back to direct boot (no “create new factory” retry). |
| src/runtime/pkg/katautils/config.go | Removes factory initrd requirement check. |
| src/runtime/pkg/katautils/config_test.go | Updates factory config test expectations to match validation change. |
| .github/workflows/run-k8s-tests-on-free-runner.yaml | Adds EROFS + unmerged-mode scenarios for clh/qemu lts runs. |
Comment on lines 183 to 191

     func (t *template) checkTemplateVM() error {
     	_, err := os.Stat(t.statePath + "/memory")
     	if err != nil {
     		return err
     	}

    -	_, err = os.Stat(t.statePath + "/state")
    +	_, err = os.Stat(t.deviceStatePath())
     	return err
     }

Comment on lines +563 to +586

    // First call restoreVM without the VM snapshot files (state.json, config.json) present.
    err = clh.restoreVM(context.Background())
    // An error is expected because restoreVM expects the VM snapshot files to be present.
    assert.Error(err)
    assert.Contains(err.Error(), filepath.Join(clhConfig.VMStorePath, "state.json"))

    // Now create the VM snapshot files and call restoreVM again.
    err = os.MkdirAll(clhConfig.VMStorePath, os.ModePerm)
    assert.NoError(err, "failed to create dir %s", clhConfig.VMStorePath)
    stateFile := filepath.Join(clhConfig.VMStorePath, "state.json")
    configFile := filepath.Join(clhConfig.VMStorePath, "config.json")
    err = os.WriteFile(stateFile, []byte("{}"), 0o600)
    assert.NoError(err)
    err = os.WriteFile(configFile, []byte("{}"), 0o600)
    assert.NoError(err)

    // Call restoreVM again, this time it should succeed.
    err = clh.restoreVM(context.Background())
    assert.NoError(err)

    if assert.NotNil(mockClient.restoreRequest) {
    	expectedSourceURL := "file://" + clhConfig.VMStorePath
    	assert.Equal(expectedSourceURL, mockClient.restoreRequest.GetSourceUrl())
    }

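The behavior this test expects (fail when state.json or config.json is missing, otherwise restore from a `file://` URL pointing at the VM store directory) could be satisfied by a pre-check along these lines. This is a sketch inferred from the assertions only, not the restoreVM implementation in the PR; `snapshotSourceURL`, `demoResult`, and `runDemo` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// snapshotSourceURL (hypothetical) verifies that the snapshot files exist in
// vmStorePath and returns the file:// URL a restore request would point at.
func snapshotSourceURL(vmStorePath string) (string, error) {
	for _, name := range []string{"state.json", "config.json"} {
		p := filepath.Join(vmStorePath, name)
		if _, err := os.Stat(p); err != nil {
			// The file path is included so callers can tell which file is missing.
			return "", fmt.Errorf("snapshot file %s not usable: %w", p, err)
		}
	}
	return "file://" + vmStorePath, nil
}

// demoResult records both attempts made by runDemo.
type demoResult struct {
	dir       string
	firstErr  error // attempt before the snapshot files exist
	url       string
	secondErr error // attempt after the snapshot files exist
}

// runDemo replays the two phases of the test against a scratch directory.
func runDemo() (demoResult, error) {
	dir, err := os.MkdirTemp("", "vmstore-demo-")
	if err != nil {
		return demoResult{}, err
	}
	defer os.RemoveAll(dir)

	res := demoResult{dir: dir}
	_, res.firstErr = snapshotSourceURL(dir)

	for _, name := range []string{"state.json", "config.json"} {
		if err := os.WriteFile(filepath.Join(dir, name), []byte("{}"), 0o600); err != nil {
			return res, err
		}
	}
	res.url, res.secondErr = snapshotSourceURL(dir)
	return res, nil
}

func main() {
	res, err := runDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("before files exist:", res.firstErr)
	fmt.Println("after files exist: ", res.url)
}
```

Checking state.json first matches the test's expectation that the first error message names that file.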
Mirror of upstream PR kata-containers#13196
Adds support for VM templating with runtime-go clh. Introduces CI for erofs nodes + a factory VM test. See above PR for full description.