
@lyarwood lyarwood commented Oct 21, 2025

Summary

This PR introduces an initial KubeVirt toolset for the kubernetes-mcp-server, enabling AI agents to create and manage virtual machines through MCP tools.

New Tools

vm_create

Creates VirtualMachines with intelligent parameter handling:

  • Workload resolution: Accepts OS names (fedora, ubuntu, rhel) or full container disk URLs
  • Automatic resource selection: Resolves preferences and instance types from size/performance hints
  • Flexible configuration: Supports explicit instancetype specification or automatic selection
  • Autostart option: Optional parameter to create running VMs (runStrategy: Always)

Example:

{
  "name": "vm_create",
  "arguments": {
    "namespace": "my-namespace",
    "name": "my-vm",
    "workload": "fedora",
    "size": "large",
    "performance": "compute-optimized"
  }
}
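The workload resolution described above (OS name or full container disk URL) could be sketched roughly as follows. This is an illustration only, not the PR's implementation: `resolveWorkload`, the `knownWorkloads` set, and the `quay.io/containerdisks` default registry are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// containerDiskRegistry is an assumed default registry for well-known OS
// images; the PR does not state which registry the toolset uses.
const containerDiskRegistry = "quay.io/containerdisks"

// knownWorkloads holds the OS names mentioned in the PR description.
var knownWorkloads = map[string]bool{"fedora": true, "ubuntu": true, "rhel": true}

// resolveWorkload is a hypothetical helper: it returns the input unchanged
// when it looks like a full container disk URL, and otherwise maps a known
// OS name onto an image in the assumed default registry.
func resolveWorkload(workload string) (string, error) {
	w := strings.TrimSpace(workload)
	if w == "" {
		return "", fmt.Errorf("workload must not be empty")
	}
	// Anything containing a slash is treated as a full image reference.
	if strings.Contains(w, "/") {
		return w, nil
	}
	name := strings.ToLower(w)
	if !knownWorkloads[name] {
		return "", fmt.Errorf("unknown workload %q", workload)
	}
	return fmt.Sprintf("%s/%s:latest", containerDiskRegistry, name), nil
}

func main() {
	for _, w := range []string{"fedora", "quay.io/example/custom-disk:v1", "windows"} {
		image, err := resolveWorkload(w)
		fmt.Printf("workload=%q image=%q err=%v\n", w, image, err)
	}
}
```

Returning an error for unknown names, rather than guessing an image, keeps the failure visible to the calling agent.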

vm_start

Starts halted VirtualMachines by changing runStrategy to Always.

vm_stop

Stops running VirtualMachines by changing runStrategy to Halted.
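A minimal sketch of how the runStrategy change behind vm_start and vm_stop could be expressed as a JSON merge patch. The PR only says that the tools change runStrategy; using a merge patch, and the helper names `runStrategyPatch` and `needsUpdate`, are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runStrategyPatch builds a JSON merge patch body that sets
// spec.runStrategy on a VirtualMachine to the given value.
func runStrategyPatch(strategy string) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{"runStrategy": strategy},
	}
	return json.Marshal(patch)
}

// needsUpdate reports whether a patch is required at all. Skipping the
// patch when the VM is already in the desired state is one way to keep
// repeated calls safe.
func needsUpdate(current, desired string) bool {
	return current != desired
}

func main() {
	for _, strategy := range []string{"Always", "Halted"} {
		body, err := runStrategyPatch(strategy)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", strategy, body)
	}
	fmt.Println("patch needed:", needsUpdate("Halted", "Always"))
}
```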

vm_pause

Pauses a running VirtualMachine by calling the pause sub-resource API of the VMI.
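As a rough sketch, the pause subresource path for a given VMI can be assembled as below. The path layout follows the endpoint quoted in the vm_pause commit message; the helper name `pausePath` is hypothetical, and the HTTP method used to call the endpoint should be checked against the KubeVirt API reference.

```go
package main

import (
	"fmt"
	"net/url"
)

// pausePath returns the subresource API path used to pause the
// VirtualMachineInstance with the given namespace and name.
// Both segments are path-escaped defensively.
func pausePath(namespace, name string) string {
	return fmt.Sprintf(
		"/apis/subresources.kubevirt.io/v1/namespaces/%s/virtualmachineinstances/%s/pause",
		url.PathEscape(namespace), url.PathEscape(name),
	)
}

func main() {
	fmt.Println(pausePath("my-namespace", "my-vm"))
}
```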

vm_troubleshoot

Provides comprehensive diagnostic guidance for broken VMs:

  • Step-by-step troubleshooting instructions
  • Common issues and solutions
  • Checks for VirtualMachine, VirtualMachineInstance, DataVolumes, virt-launcher pods, and events

Key Features

  • Single-call VM creation: No multi-stage plan/execute workflow
  • Intelligent defaults: Automatic resolution of container disk images, preferences, and instance types
  • Complete lifecycle management: Create → Start → Stop → Troubleshoot
  • Idempotent operations: Tools can be called multiple times safely
  • Non-destructive: Provides safe alternatives to delete/recreate patterns

Testing Results

Validated using the gevals (Generative AI Evaluations) framework across 5 different AI agents/models with 6 tasks each:

| Version | Success Rate | Notes |
| --- | --- | --- |
| Without toolset (baseline) | 23.3% (7/30) | Generic Kubernetes tools only |
| Original toolset (vm_create + vm_troubleshoot) | 93.3% (28/30) | Initial implementation |
| Improved toolset (+ vm_start/stop) | 100% (30/30) | Improved implementation based on testing feedback above |

Agent Performance (Improved Toolset)

| Agent | Success Rate | Improvement from Baseline |
| --- | --- | --- |
| Claude Code | 100% (6/6) | +100% |
| Gemini | 100% (6/6) | +500% |
| gemini-2.0-flash | 100% (6/6) | +500% |
| gemini-2.5-pro | 100% (6/6) | +200% |
| Granite 3.3 8B | 100% (6/6) | ∞ (from 0%) |

Key Achievement: The toolset enables even smaller models like Granite 3.3 8B (which failed all tasks without specialized tools) to achieve perfect success rates.
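For readers checking the "Improvement from Baseline" column: it appears to be a relative improvement, (improved − baseline) / baseline. The sketch below assumes that reading. The per-agent baseline counts (3, 1, 1, 2, and 0 out of 6) are back-calculated from the percentages rather than stated in the PR, although they do sum to the reported 7/30 baseline.

```go
package main

import (
	"fmt"
	"math"
)

// relativeImprovement returns the percentage gain of improved over
// baseline. A zero baseline yields +Inf, matching the "∞ (from 0%)" entry.
func relativeImprovement(baseline, improved float64) float64 {
	if baseline == 0 {
		return math.Inf(1)
	}
	return (improved - baseline) / baseline * 100
}

func main() {
	// Back-calculated baseline successes out of 6 tasks (an assumption).
	agents := []string{"Claude Code", "Gemini", "gemini-2.0-flash", "gemini-2.5-pro", "Granite 3.3 8B"}
	baseline := []float64{3, 1, 1, 2, 0}
	for i, agent := range agents {
		fmt.Printf("%s: %+.0f%%\n", agent, relativeImprovement(baseline[i], 6))
	}
}
```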

Design Principles

  1. Single-purpose tools: Each tool has one clear responsibility
  2. Intelligent parameter handling: Flexible inputs with smart defaults
  3. Abstraction of complexity: Hides KubeVirt API details from agents
  4. Complete domain coverage: No gaps forcing inappropriate tool usage
  5. Error recovery support: Multiple valid approaches enable graceful fallback

Implementation Details

  • Location: pkg/toolsets/kubevirt/
  • Structure:
    • vm/create/ - VM creation with template rendering
    • vm/start/ - Non-destructive VM startup
    • vm/stop/ - Non-destructive VM shutdown
    • vm/troubleshoot/ - Diagnostic guidance generation
  • Dependencies: Uses Kubernetes dynamic client for VirtualMachine resources
  • Testing: Comprehensive unit tests and integration tests via gevals

The gevals result and report documents are kept outside this PR, under pkg/toolsets/kubevirt/tests/results/. They describe how gevals was used first to introduce and then to improve the new toolset.

@lyarwood lyarwood force-pushed the kubevirt branch 2 times, most recently from 9aa2732 to 4eac816 Compare November 3, 2025 20:27
@lyarwood lyarwood force-pushed the kubevirt branch 2 times, most recently from a135353 to 24ad072 Compare November 4, 2025 19:02
@lyarwood

This comment was marked as outdated.

@Cali0707
Collaborator

Cali0707 commented Nov 5, 2025

> it would be nice if gevals included /cost and /context data allowing us to assert against it potentially.

@lyarwood +1 from my side on that being nice. The reason we haven't been able to add it is that gevals runs Claude Code in non-interactive mode, and we haven't figured out how to get that information in that setup. If you have any ideas, let me know and/or open a PR!

@lyarwood lyarwood force-pushed the kubevirt branch 2 times, most recently from 30c0ec0 to c01fe35 Compare November 7, 2025 15:19
@lyarwood lyarwood marked this pull request as ready for review November 7, 2025 15:23
@lyarwood lyarwood changed the title WIP feat(kubevirt): Add VM management toolset feat(kubevirt): Add VM management toolset Nov 7, 2025
@lyarwood lyarwood changed the title feat(kubevirt): Add VM management toolset feat(kubevirt): Add basic VM management toolset Nov 7, 2025

@codingben codingben left a comment

Good work, Lee. Please consider creating a single VM package with a tool.go that holds all VM actions in one place; it would help avoid duplication and would be much better for readability and maintainability.


@codingben codingben left a comment

Thanks Lee.

Just an opinion: for a start, I see too many folders and scripts for just a few VM actions via MCP tooling. I'd ask the project maintainers for their opinion on this, as this repository's codebase could eventually become very large.

@lyarwood
Author

> Thanks Lee.
>
> Just an opinion: for a start, I see too many folders and scripts for just a few VM actions via MCP tooling. I'd ask the project maintainers for their opinion on this, as this repository's codebase could eventually become very large.

Assuming you are talking about the test directory, I agree it is indeed pretty large at the moment but I've already spoken to folks about improvements to the gevals framework that would help reduce that. I plan on working on introducing builtin agent support and configurable models for the openai-agent this week but until that is merged we will need to carry the extra scripts and config for now.

Comment on lines 34 to 40
// RESTConfig returns the Kubernetes REST configuration
func (k *Kubernetes) RESTConfig() *rest.Config {
	if k.manager == nil {
		return nil
	}
	return k.manager.cfg
}
Member

There's already a ToRESTConfig method to cover this functionality and implement the RESTClientGetter interface required for Helm:

// ToRESTConfig returns the rest.Config object (genericclioptions.RESTClientGetter)
func (m *Manager) ToRESTConfig() (*rest.Config, error) {
	return m.cfg, nil
}

Similarly, there's already a ToDiscoveryClient method too (I think you're creating it at some point in the toolset).

Nonetheless, we need to prevent accessing any kubernetes API without checking if the resource is allowed by configuration:

// isAllowed checks the resource is in denied list or not.
// If it is in denied list, this function returns false.
func isAllowed(
	staticConfig *config.StaticConfig, // TODO: maybe just use the denied resource slice
	gvk *schema.GroupVersionKind,
) bool {
	if staticConfig == nil {
		return true
	}
	for _, val := range staticConfig.DeniedResources {
		// If kind is empty, that means Group/Version pair is denied entirely
		if val.Kind == "" {
			if gvk.Group == val.Group && gvk.Version == val.Version {
				return false
			}
		}
		if gvk.Group == val.Group &&
			gvk.Version == val.Version &&
			gvk.Kind == val.Kind {
			return false
		}
	}
	return true
}

In general, we don't want to expose the Kubernetes API directly to toolset implementors; instead, the kubernetes package should provide whatever they need.
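To make the denied-resource check concrete, here is a self-contained sketch of the same logic applied to KubeVirt kinds. The `GroupVersionKind` and `StaticConfig` types are simplified stand-ins for `schema.GroupVersionKind` and `config.StaticConfig`, and the denied entries are invented for the example.

```go
package main

import "fmt"

// GroupVersionKind is a stand-in for schema.GroupVersionKind.
type GroupVersionKind struct{ Group, Version, Kind string }

// StaticConfig is a stand-in for config.StaticConfig; only the denied
// list is modelled here.
type StaticConfig struct{ DeniedResources []GroupVersionKind }

// isAllowed mirrors the check quoted above: an entry with an empty Kind
// denies the whole Group/Version, otherwise the Kind must match too.
func isAllowed(cfg *StaticConfig, gvk *GroupVersionKind) bool {
	if cfg == nil {
		return true
	}
	for _, denied := range cfg.DeniedResources {
		if gvk.Group != denied.Group || gvk.Version != denied.Version {
			continue
		}
		if denied.Kind == "" || denied.Kind == gvk.Kind {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical configuration: deny VirtualMachine specifically and
	// every kind in cdi.kubevirt.io/v1beta1.
	cfg := &StaticConfig{DeniedResources: []GroupVersionKind{
		{Group: "kubevirt.io", Version: "v1", Kind: "VirtualMachine"},
		{Group: "cdi.kubevirt.io", Version: "v1beta1"},
	}}
	for _, gvk := range []GroupVersionKind{
		{Group: "kubevirt.io", Version: "v1", Kind: "VirtualMachine"},
		{Group: "kubevirt.io", Version: "v1", Kind: "VirtualMachineInstance"},
		{Group: "cdi.kubevirt.io", Version: "v1beta1", Kind: "DataVolume"},
	} {
		fmt.Printf("%s/%s %s allowed=%v\n", gvk.Group, gvk.Version, gvk.Kind, isAllowed(cfg, &gvk))
	}
}
```

A toolset that builds its own client from the raw REST config never reaches a check like this, which is the concern raised in the review.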

Author

@lyarwood lyarwood Nov 14, 2025

@manusa ACK apologies, pushed 9a977dc that should address this.


// ResourcesListByGVR lists resources using a GroupVersionResource directly.
// Access control is enforced through the RESTMapper.
// Use this when you need to query arbitrary resources not covered by ResourcesList.
Collaborator

@lyarwood what is the use case where the ResourcesList doesn't work?

Author

Ah apologies, should be fixed by 9cd97b4

Introduces a new KubeVirt toolset providing virtual machine management
capabilities through MCP tools. The vm_create tool generates comprehensive
creation plans with pre-creation validation of instance types, preferences,
and container disk images, enabling AI assistants to help users create
VirtualMachines with appropriate resource configurations.

The tool supports:
- Workload specification via OS names or container disk URLs
- Auto-selection of instance types based on size/performance hints
- DataSource integration for common OS images
- Comprehensive validation and planning before resource creation

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>

Add lifecycle management tools for starting and stopping VirtualMachines.
These tools provide simple, single-action operations that prevent
destructive workarounds like delete/recreate.

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>

Add optional autostart parameter to vm_create tool that sets runStrategy
to Always instead of Halted, allowing VMs to be created and started in
a single operation.

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>

Add GetRequiredString, GetOptionalString, and GetOptionalBool methods to
ToolHandlerParams type to eliminate code duplication across kubevirt VM
tools. These methods provide a cleaner, reusable API for extracting
parameters from tool call arguments.

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>

Extend GetOptionalString method to accept a default value parameter,
allowing callers to specify what value to return when a parameter is
missing or invalid. This simplifies code by eliminating post-call
default value checks.

Use variadic parameters to make the default value argument optional in
GetOptionalString. Callers can now either provide a default value or
omit it to get empty string behavior. This provides more flexibility and
cleaner call sites when empty string is the desired default.

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>
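
The parameter helpers described in the two commit messages above could look roughly like this. It is a sketch only: the `ToolHandlerParams` struct here is a simplified stand-in with an assumed `Arguments` map, and the exact signatures in the PR may differ.

```go
package main

import "fmt"

// ToolHandlerParams is a simplified stand-in for the real type; only the
// tool call arguments are modelled.
type ToolHandlerParams struct {
	Arguments map[string]any
}

// GetRequiredString returns the named string argument or an error when
// it is missing, empty, or not a string.
func (p ToolHandlerParams) GetRequiredString(key string) (string, error) {
	v, ok := p.Arguments[key].(string)
	if !ok || v == "" {
		return "", fmt.Errorf("%s parameter required", key)
	}
	return v, nil
}

// GetOptionalString returns the named string argument. When it is
// missing or not a string, the first variadic default is returned, or
// the empty string if no default was supplied.
func (p ToolHandlerParams) GetOptionalString(key string, defaultValue ...string) string {
	fallback := ""
	if len(defaultValue) > 0 {
		fallback = defaultValue[0]
	}
	if v, ok := p.Arguments[key].(string); ok {
		return v
	}
	return fallback
}

// GetOptionalBool returns the named bool argument, or defaultValue when
// it is missing or not a bool.
func (p ToolHandlerParams) GetOptionalBool(key string, defaultValue bool) bool {
	if v, ok := p.Arguments[key].(bool); ok {
		return v
	}
	return defaultValue
}

func main() {
	params := ToolHandlerParams{Arguments: map[string]any{
		"namespace": "my-namespace",
		"name":      "my-vm",
	}}
	name, err := params.GetRequiredString("name")
	fmt.Println(name, err)
	fmt.Println(params.GetOptionalString("workload", "fedora"))
	fmt.Println(params.GetOptionalBool("autostart", false))
}
```

The variadic form keeps call sites short when the empty string is an acceptable default, at the cost of silently ignoring any extra defaults a caller passes.
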
Refactor all kubevirt VM tool tests to use external test packages (_test
suffix) and test only public behavior through the Tools() API. This
ensures tests verify the public interface rather than implementation
details.

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>

Implements a new KubeVirt VM pause tool that uses the subresource API to
pause a running VirtualMachine's associated VirtualMachineInstance.
Unlike vm_start and vm_stop which modify the VM's runStrategy, this tool
directly calls the VMI pause subresource endpoint at
/apis/subresources.kubevirt.io/v1/namespaces/{namespace}/virtualmachineinstances/{name}/pause.

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>

Introduces a dedicated vm_delete tool to simplify VirtualMachine deletion.
Based on agent evaluation results, multiple agents were searching for a
vm_delete tool instead of using the generic resources_delete tool.

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>

Remove direct RESTConfig() exposure from Kubernetes type and add
RESTConfigForGVK method that validates resource access through
AccessControlRESTMapper before allowing operations. Update all kubevirt
tools to use GVK instead of GVR with ResourcesList and use controlled
RESTConfigForGVK method instead of creating their own uncontrolled
dynamic clients.

This prevents toolsets from bypassing the denied resources configuration
and ensures all API access goes through the access control layer.

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>