Add AgentGroup CRD and a controller to run a fleet of sandboxes by Sanchit2662 · Pull Request #1 · Sanchit2662/agentcube

Sanchit2662 · 2026-05-18T09:55:06Z

Right now agentcube starts one sandbox per task. That single sandbox has to do everything at once, running the code and handling the agent runtime side. For bigger tasks that doesn't really hold up. Usually you want a few agents doing different jobs and working together on the same task.

So I started on the multi-agent side of things. This PR is the first piece of it.

what I added

A new CRD called AgentGroup. You list a bunch of agents in it, and each agent just points at an existing CodeInterpreter or AgentRuntime for its sandbox template. Then there's an AgentGroupController that watches these and actually brings the fleet up.

The controller itself is pretty simple. When you create an AgentGroup it goes to Pending, then Initializing while it creates one Sandbox per agent, and then Running once all of them report Ready. If the spec is wrong or a runtime reference doesn't exist it goes to Failed instead. Every Sandbox it creates gets an owner reference back to the AgentGroup, so deleting the group cleans up all its sandboxes on its own and I didn't need a finalizer for that.
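To make the phase logic above easier to picture, here is a minimal sketch of how the phase could be derived from what the controller observes on one pass. It is not the code in this PR: the `observed` struct and the `nextPhase` helper are hypothetical names, and treating "no sandboxes created yet" as Pending and "some sandbox not Ready" as Initializing are assumptions for illustration.

```go
package main

import "fmt"

// Phase uses the phase names from the description above.
type Phase string

const (
	PhasePending      Phase = "Pending"
	PhaseInitializing Phase = "Initializing"
	PhaseRunning      Phase = "Running"
	PhaseFailed       Phase = "Failed"
)

// observed is a hypothetical summary of one reconcile pass.
type observed struct {
	specValid bool // spec is well formed and every runtime reference resolves
	desired   int  // agents listed in the AgentGroup
	created   int  // Sandboxes that currently exist for the group
	ready     int  // Sandboxes that report Ready
}

// nextPhase is a hypothetical helper that maps observations to a phase.
func nextPhase(o observed) Phase {
	switch {
	case !o.specValid:
		return PhaseFailed
	case o.created == 0:
		return PhasePending
	case o.created == o.desired && o.ready == o.desired:
		return PhaseRunning
	default:
		// Assumption: any missing or not-yet-Ready sandbox keeps the group here.
		return PhaseInitializing
	}
}

func main() {
	steps := []observed{
		{specValid: true, desired: 2},
		{specValid: true, desired: 2, created: 2, ready: 1},
		{specValid: true, desired: 2, created: 2, ready: 2},
		{specValid: false, desired: 2},
	}
	for _, s := range steps {
		fmt.Println(nextPhase(s))
	}
}
```

Deriving the phase from counts rather than from the previous phase keeps the function idempotent, which suits a reconcile loop that may run many times for the same state.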

I followed the same controller patterns the CodeInterpreterReconciler already uses, so it should feel consistent with the rest of the codebase. It uses GenerationChangedPredicate and only writes status when something actually changed, which avoids the controller waking itself up in a loop.
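The two loop-avoidance ideas above can be sketched without any Kubernetes dependencies. This is an illustration only, not the controller code: `GroupStatus`, `statusWriter`, `generationChanged` and `updateIfChanged` are hypothetical names. The first function models the check a generation-based predicate performs (status-only updates do not bump `metadata.generation`), and the second skips the write when the computed status equals the stored one.

```go
package main

import (
	"fmt"
	"reflect"
)

// GroupStatus is a hypothetical, trimmed-down status for illustration.
type GroupStatus struct {
	Phase       string
	ReadyAgents int
	Message     string
}

// generationChanged models the comparison a generation-based predicate makes:
// an update event is only interesting if the spec generation moved.
func generationChanged(oldGen, newGen int64) bool {
	return oldGen != newGen
}

// statusWriter counts writes so the effect of skipping is visible.
type statusWriter struct{ writes int }

// updateIfChanged stores desired into current only when they differ.
// It reports whether a write happened.
func (w *statusWriter) updateIfChanged(current *GroupStatus, desired GroupStatus) bool {
	if reflect.DeepEqual(*current, desired) {
		return false // nothing changed, so no write and no new update event
	}
	*current = desired
	w.writes++
	return true
}

func main() {
	status := GroupStatus{Phase: "Initializing", ReadyAgents: 1}
	w := &statusWriter{}

	// Same status again: skipped.
	fmt.Println(w.updateIfChanged(&status, GroupStatus{Phase: "Initializing", ReadyAgents: 1}))
	// All agents ready: written.
	fmt.Println(w.updateIfChanged(&status, GroupStatus{Phase: "Running", ReadyAgents: 2}))
	// A status-only update leaves the generation untouched.
	fmt.Println(generationChanged(3, 3), w.writes)
}
```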

what I left out on purpose

I kept this small so it's easy to review. It only does the cold path, meaning it creates plain Sandbox objects directly. No warm pool or SandboxClaim batching yet. A few things I want to do in follow-up PRs:

gang scheduling so the whole fleet schedules together or not at all
a shared context store so agents can pass partial results around
a message bus so agents can actually talk to each other
the retry and degraded failure policies (right now everything just behaves like FailFast)

how to try it

kubectl apply -f example/agent-group/agent-group.yaml
kubectl get agentgroup research-task -w

You should see it move from Pending to Initializing to Running, and kubectl get sandboxes -l runtime.agentcube.io/agent-group=research-task shows the sandboxes it created.

tests

I added unit tests for the controller with the controller-runtime fake client. They cover the phase transitions, the all-ready and partially-ready cases, bad specs, a missing runtime reference, losing a sandbox while the group is Running, and checking the sandboxes come out owned by the group. go build, go vet and the tests all pass, and the deepcopy plus CRD manifest are regenerated so codegen is in sync.

Keeping this as a draft since it's the starting point and there's more coming.

Adds an AgentGroup custom resource and a controller that brings up a fleet of agent sandboxes for one task, instead of a single sandbox per task. The controller creates one agent-sandbox Sandbox per agent, watches their Ready condition, and moves the group through Pending -> Initializing -> Running. This is a first slice toward multi-agentcube support (issue volcano-sh#301). Gang scheduling, the shared context store and the inter-agent message bus are intentionally left out of this change.

Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>

Adds a Dependencies field of directed AgentDependency edges to AgentGroupSpec so the Hierarchical agent graph is part of the API contract from the start. The controller validates that every edge references a known agent and rejects self-edges; ordered startup by these edges is left as later work. Peer topology is accepted by the CRD enum but not implemented, so the controller now rejects it explicitly with an UnsupportedTopology failure instead of treating it as a silent no-op.

Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
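The edge validation described in this commit can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the commit: the `From` and `To` field names on `AgentDependency`, the `validateDependencies` helper, and the error wording are all hypothetical.

```go
package main

import "fmt"

// AgentDependency is a directed edge between two agents.
// The field names From and To are assumed for this sketch.
type AgentDependency struct {
	From string
	To   string
}

// validateDependencies rejects self-edges and edges that mention an
// agent name not present in agents. It returns the first problem found.
func validateDependencies(agents []string, deps []AgentDependency) error {
	known := make(map[string]struct{}, len(agents))
	for _, name := range agents {
		known[name] = struct{}{}
	}
	for i, d := range deps {
		if d.From == d.To {
			return fmt.Errorf("dependencies[%d]: self-edge on agent %q", i, d.From)
		}
		if _, ok := known[d.From]; !ok {
			return fmt.Errorf("dependencies[%d]: unknown agent %q", i, d.From)
		}
		if _, ok := known[d.To]; !ok {
			return fmt.Errorf("dependencies[%d]: unknown agent %q", i, d.To)
		}
	}
	return nil
}

func main() {
	agents := []string{"planner", "coder"}

	fmt.Println(validateDependencies(agents, []AgentDependency{{From: "planner", To: "coder"}}))
	fmt.Println(validateDependencies(agents, []AgentDependency{{From: "coder", To: "coder"}}))
	fmt.Println(validateDependencies(agents, []AgentDependency{{From: "planner", To: "reviewer"}}))
}
```

Cycle detection is deliberately absent here, since the commit message only mentions unknown agents and self-edges.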

Sanchit2662 closed this May 18, 2026

Sanchit2662 reopened this May 18, 2026

Sanchit2662 wants to merge 2 commits into main from poc/multi-agentcube
