
runtime: Enable VM Templating Support for CLH #461

Open
Camelron wants to merge 3 commits into msft-preview from cameronbaird/msft-preview/clh-go-templating


Camelron commented Jun 24, 2026


Mirror of upstream PR kata-containers#13196

Adds support for VM templating with the Go runtime's Cloud Hypervisor (clh) path. Introduces CI for erofs nodes plus a factory VM test. See the upstream PR above for the full description.

Camelron added 3 commits June 24, 2026 18:36
Add support for VM Template factory on the clh path.

In order to support snapshot/restore-based VM templating,
the following changes were needed:
1. For clh.go, implement SaveVM, PauseVM, restoreVM, ResumeVM
2. Remove initrd config check for VM Templating path. The
        root disk image (when using image mode) is created in memory
        and therefore captured in the VM snapshot.
3. Truncate the memory file to the size of the VM at factory VM
        create time. This allows CLH to use the memory file
        as the backing for the template VM memory, allowing O(1)
        snapshot times.
4. CLH uses memory zones as backing for its memory on the template paths
5. Update StartVM in CLH to use the restore path when template is
        configured and available

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Add k8s-vm-templating-test.bats which exercises pod create
with the factory initialized on the target node.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Previously, when a starting k8s pod saw enable_template=true, the runtime would:

1. Try NewFactory with fetchOnly=true.
2. When that failed (because template.Fetch could not find the artifacts),
	retry with fetchOnly=false. This created a direct factory,
	which built the template from scratch
	(paying a full pod sandbox boot time)
	and then restored from it. Boot times were therefore
	strictly worse on this path.

Now, even when enable_template=true, we don't try to force a direct factory.
Instead we just revert to the standard sandbox boot path.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Copilot AI review requested due to automatic review settings June 24, 2026 18:52

Copilot AI left a comment


Pull request overview

Enable VM templating (factory) workflows for Cloud Hypervisor (CLH) and add CI coverage for templating + EROFS-based Kubernetes integration scenarios. This fits into the runtime’s factory/template stack (CLH snapshot/restore support) and the kata-deploy packaging path (containerd snapshotter configuration) to make templating viable in the targeted CI environments.

Changes:

  • Plumb an EROFS “merged vs unmerged” mode from Helm values → kata-deploy env → containerd TOML edits (including safe removal of stale max_unmerged_layers).
  • Add/extend CLH templating support via pause/snapshot/restore, update template state handling, and expand unit/integration tests.
  • Extend GitHub Actions K8s matrix to run EROFS + VM templating scenarios and register the new Bats test.
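
The pause/snapshot ordering mentioned in the second bullet can be sketched with a small interface. Everything here is hypothetical: the `snapshotter` interface, `saveTemplate`, and the `recorder` fake do not mirror the method set in clh.go. The sketch only illustrates that the VM is paused before the snapshot is requested, and that a failure at either step is reported with context.

```go
package main

import (
	"fmt"
	"strings"
)

// snapshotter is a hypothetical, minimal view of a hypervisor client.
type snapshotter interface {
	Pause() error
	Snapshot(destURL string) error
}

// saveTemplate pauses the VM and then snapshots it to destURL.
// The snapshot is never attempted if the pause fails.
func saveTemplate(h snapshotter, destURL string) error {
	if err := h.Pause(); err != nil {
		return fmt.Errorf("pause before snapshot: %w", err)
	}
	if err := h.Snapshot(destURL); err != nil {
		return fmt.Errorf("snapshot to %s: %w", destURL, err)
	}
	return nil
}

// recorder is a fake client that records the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) Pause() error {
	r.calls = append(r.calls, "pause")
	return nil
}

func (r *recorder) Snapshot(destURL string) error {
	r.calls = append(r.calls, "snapshot "+destURL)
	return nil
}

func main() {
	r := &recorder{}
	// The destination URL is an assumed example value.
	if err := saveTemplate(r, "file:///tmp/template"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(strings.Join(r.calls, " -> "))
}
```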

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `tools/packaging/kata-deploy/helm-chart/kata-deploy/values.yaml` | Adds snapshotter.erofsMergeMode chart value documentation. |
| `tools/packaging/kata-deploy/helm-chart/kata-deploy/templates/_helpers.tpl` | Adds helper to emit EROFS_MERGE_MODE env plus additional shared Helm helper templates. |
| `tools/packaging/kata-deploy/binary/src/utils/toml.rs` | Adds delete_toml_value() helper and unit tests. |
| `tools/packaging/kata-deploy/binary/src/config.rs` | Adds EROFS_MERGE_MODE config parsing + validation and logs it. |
| `tools/packaging/kata-deploy/binary/src/artifacts/snapshotters.rs` | Uses merge-mode to set or delete max_unmerged_layers and updates Go-shim guard logic. |
| `tests/integration/kubernetes/run_kubernetes_tests.sh` | Registers new k8s-vm-templating.bats in test suite. |
| `tests/integration/kubernetes/k8s-vm-templating.bats` | Adds Kubernetes integration test for VM templating (factory) in clh/qemu non-confidential + blockfile/erofs setups. |
| `tests/gha-run-k8s-common.sh` | Adds CI plumbing for EROFS sizing mode + merge mode into Helm values. |
| `src/runtime/virtcontainers/factory/template/template_test.go` | Expands template factory tests to use per-VM storage paths and validates state file creation. |
| `src/runtime/virtcontainers/factory/template/template_linux.go` | Updates templating paths/behavior (memory truncation, CLH state filename, DevicesStatePath usage). |
| `src/runtime/virtcontainers/factory/factory_linux.go` | Resets additional hypervisor sandbox-identifying fields when creating new VMs from base config. |
| `src/runtime/virtcontainers/clh.go` | Adds CLH pause/snapshot/restore plumbing and restore-from-template detection; adjusts snapshot/save behavior. |
| `src/runtime/virtcontainers/clh_test.go` | Extends CLH client mock and adds unit tests for restore/snapshot. |
| `src/runtime/pkg/katautils/create.go` | Changes factory load failure behavior to fall back to direct boot (no “create new factory” retry). |
| `src/runtime/pkg/katautils/config.go` | Removes factory initrd requirement check. |
| `src/runtime/pkg/katautils/config_test.go` | Updates factory config test expectations to match validation change. |
| `.github/workflows/run-k8s-tests-on-free-runner.yaml` | Adds EROFS + unmerged-mode scenarios for clh/qemu lts runs. |


Comment on lines 183 to 191
 func (t *template) checkTemplateVM() error {
 	_, err := os.Stat(t.statePath + "/memory")
 	if err != nil {
 		return err
 	}

-	_, err = os.Stat(t.statePath + "/state")
+	_, err = os.Stat(t.deviceStatePath())
 	return err
 }
Comment on lines +563 to +586
	// First call restoreVM without the VM snapshot files (state.json, config.json) present.
	err = clh.restoreVM(context.Background())
	// An error is expected because restoreVM expects the VM snapshot files to be present.
	assert.Error(err)
	assert.Contains(err.Error(), filepath.Join(clhConfig.VMStorePath, "state.json"))

	// Now create the VM snapshot files and call restoreVM again.
	err = os.MkdirAll(clhConfig.VMStorePath, os.ModePerm)
	assert.NoError(err, "failed to create dir %s", clhConfig.VMStorePath)
	stateFile := filepath.Join(clhConfig.VMStorePath, "state.json")
	configFile := filepath.Join(clhConfig.VMStorePath, "config.json")
	err = os.WriteFile(stateFile, []byte("{}"), 0o600)
	assert.NoError(err)
	err = os.WriteFile(configFile, []byte("{}"), 0o600)
	assert.NoError(err)

	// Call restoreVM again, this time it should succeed.
	err = clh.restoreVM(context.Background())
	assert.NoError(err)

	if assert.NotNil(mockClient.restoreRequest) {
		expectedSourceURL := "file://" + clhConfig.VMStorePath
		assert.Equal(expectedSourceURL, mockClient.restoreRequest.GetSourceUrl())
	}