Skip to content

Proposal for review#349

Open
AAdIprog wants to merge 1 commit into
volcano-sh:mainfrom
AAdIprog:Propasal-Review
Open

Proposal for review#349
AAdIprog wants to merge 1 commit into
volcano-sh:mainfrom
AAdIprog:Propasal-Review

Conversation

@AAdIprog
Copy link
Copy Markdown

What type of PR is this?
/kind documentation

What this PR does / why we need it:
Adds the LFX Mentorship proposal for supporting multi-AgentCube capability. The proposal lays out the problem statement, proposed MultiAgentRuntime design, control-plane/store/router changes, rollout plan, testing strategy, and risks so the work can be reviewed and tracked in one place.

Which issue(s) this PR fixes:
Fixes #301

Special notes for your reviewer:
This is a proposal-only PR, not an implementation PR. It is intended to document the design direction and milestone plan for the multi-agent capability work.

Does this PR introduce a user-facing change?:
Yes

Adds a detailed proposal for MultiAgentRuntime support in AgentCube, including design, implementation plan, testing strategy, and rollout timeline.

Copilot AI review requested due to automatic review settings May 18, 2026 20:47
@volcano-sh-bot volcano-sh-bot added the kind/documentation Improvements or additions to documentation label May 18, 2026
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kevin-wangzefeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot
Copy link
Copy Markdown
Contributor

Welcome @AAdIprog! It looks like this is your first PR to volcano-sh/agentcube 🎉

@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.12%. Comparing base (524e55e) to head (155cd6d).
⚠️ Report is 54 commits behind head on main.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #349      +/-   ##
==========================================
+ Coverage   47.57%   49.12%   +1.55%     
==========================================
  Files          30       30              
  Lines        2819     2858      +39     
==========================================
+ Hits         1341     1404      +63     
+ Misses       1338     1301      -37     
- Partials      140      153      +13     
Flag Coverage Δ
unittests 49.12% <ø> (+1.55%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an LFX mentorship proposal for MultiAgentRuntime / multi-AgentCube support, but the submitted document is currently hard to review due to placement and formatting issues, and the PR also includes unrelated lockfile churn.

Changes:

  • Adds a new multi-agent capability proposal document.
  • Modifies the Docusaurus package-lock.json with peer metadata changes unrelated to the proposal.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 9 comments.

File Description
proposalmd.md New mentorship/design proposal for MultiAgentRuntime support.
docs/agentcube/package-lock.json Unrelated lockfile metadata changes.
Files not reviewed (1)
  • docs/agentcube/package-lock.json: Language not supported
Comments suppressed due to low confidence (11)

proposalmd.md:32

  • The metadata section is flattened into a single run-on line (FieldValueProject...) instead of a readable Markdown table/list, so key fields such as title, applicant, branch, and date cannot be parsed reliably by readers.
FieldValueProjectCNCF Volcano / AgentCubeMentorshipLFX 2026 Term 2 (Jun – Aug 2026)TitleSupport multi-AgentCube CapabilityTracking Issuevolcano-sh/agentcube#301Parent Designvolcano-sh/volcano#4686Target BranchmainApplicantAadi ShahGitHubAAdIprogEmail[aadishah132@gmail.com](mailto:aadishah132@gmail.com)TimezoneIST (UTC+5:30)Weekly Hours30–40 hrs/week (weeks 8 and 11 used as buffer)Mentor@hzxuzhonghu (Zhonghu Xu)Proposal Versionv1.0Date2026-05-16

proposalmd.md:123

  • This code example appears to contain pasted UI language-selector text and malformed backticks rather than a fenced Go block. As written, the CRD example will render as a single unreadable line and readers cannot copy or review the proposed API shape.
go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML``   // +kubebuilder:object:root=true  // +kubebuilder:subresource:status  // +kubebuilder:resource:scope=Namespaced  type MultiAgentRuntime struct {      metav1.TypeMeta   `json:",inline"`      metav1.ObjectMeta `json:"metadata,omitempty"`      Spec   MultiAgentRuntimeSpec   `json:"spec"`      Status MultiAgentRuntimeStatus `json:"status,omitempty"`  }  type MultiAgentRuntimeSpec struct {      // +kubebuilder:validation:MinItems=2      // +kubebuilder:validation:MaxItems=16      Roles         []AgentRoleSpec    `json:"roles"`      Lifecycle     GroupLifecycle     `json:"lifecycle,omitempty"`      // +kubebuilder:default={type: ServiceDNS}      Communication GroupCommunication `json:"communication,omitempty"`      // +kubebuilder:default="15m"      SessionTimeout     *metav1.Duration `json:"sessionTimeout,omitempty"`      // +kubebuilder:default="8h"      MaxSessionDuration *metav1.Duration `json:"maxSessionDuration,omitempty"`  }  type AgentRoleSpec struct {      // +kubebuilder:validation:Pattern="^[a-z0-9]([-a-z0-9]*[a-z0-9])?$"      Name        string           `json:"name"`      RuntimeRef  string           `json:"runtimeRef"`      Coordinator bool             `json:"coordinator,omitempty"`      Dependencies []string        `json:"dependencies,omitempty"`      // +kubebuilder:default=1      // +kubebuilder:validation:Maximum=8      Replicas      *int32         `json:"replicas,omitempty"`      // +kubebuilder:default="Never"      // +kubebuilder:validation:Enum=Never;OnFailure      RestartPolicy RolePolicy     `json:"restartPolicy,omitempty"`      EnvOverrides  []corev1.EnvVar `json:"envOverrides,omitempty"`  }   ``

proposalmd.md:144

  • The createSandboxGroup example has the same corrupted code-block formatting (Plain textANTLR...) and is collapsed onto one line, which makes the central control-plane algorithm unreadable in the proposal.
go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML`   func (s *Server) createSandboxGroup(      ctx context.Context,      dyn dynamic.Interface,      mar *runtimev1alpha1.MultiAgentRuntime,  ) (*types.CreateGroupResponse, error) {      ordered, err := topoSort(mar.Spec.Roles)      if err != nil { return nil, err }      groupSessionID := "grp-" + uuid.New().String()      created := make([]createdRole, 0, len(ordered))      needRollback := true      defer func() {          if !needRollback { return }          rollbackCtx, cancel := context.WithTimeout(              context.Background(), groupRollbackTimeout)          defer cancel()          for i := len(created) - 1; i >= 0; i-- {              s.rollbackSandboxCreation(                  dyn, created[i].sandbox, nil, created[i].sessionID)          }          _ = s.k8sClient.DeleteHeadlessService(rollbackCtx, mar.Namespace, groupSessionID)      }()      svc, err := s.k8sClient.CreateHeadlessService(ctx, mar, groupSessionID)      if err != nil { return nil, err }      var coordinator *createdRole      for _, role := range ordered {          sandbox, entry, err := buildSandboxByAgentRuntime(              mar.Namespace, role.RuntimeRef, s.informers)          if err != nil { return nil, err }          sandbox.Labels[GroupLabel] = groupSessionID          sandbox.Labels[RoleLabel]  = role.Name          sandbox.Spec.PodTemplate.Spec.Subdomain = groupSessionID          sandbox.Spec.PodTemplate.Spec.Hostname  = role.Name          injectGroupEnvVars(&sandbox.Spec.PodTemplate, role, ordered, mar)          resultChan := s.sandboxController.WatchSandboxOnce(ctx, sandbox.Namespace, sandbox.Name)          resp, err := s.createSandbox(ctx, dyn, sandbox, nil, entry, resultChan)          s.sandboxController.UnWatchSandbox(sandbox.Namespace, sandbox.Name)          if err != nil {              if mar.Spec.Lifecycle.StartupPolicy == BestEffort && !role.Coordinator {                  appendDegradedRole(mar, role.Name, err)                  continue              }              return nil, fmt.Errorf("role %s: %w", role.Name, err)          }          cr := createdRole{name: role.Name, sandbox: sandbox,              sessionID: resp.SessionID, resp: resp}          created = append(created, cr)          if role.Coordinator { coordinator = &cr }      }      if coordinator == nil {          return nil, errors.New("group has no coordinator role")      }      if err := s.storeClient.SaveAgentGroup(ctx, groupSessionID,          buildGroupManifest(mar, created, svc, coordinator)); err != nil {          return nil, err      }      needRollback = false      return &types.CreateGroupResponse{          GroupSessionID:       groupSessionID,          CoordinatorSessionID: coordinator.sessionID,          ServiceName:          svc.Name,          Roles:                buildRoleSummaries(created),      }, nil  }   `

proposalmd.md:152

  • The store interface snippet is not a valid fenced code block and is collapsed onto one line with pasted editor metadata, making the proposed API additions difficult to review or copy accurately.
go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML`   SaveAgentGroup(ctx context.Context, manifest *types.AgentGroupManifest) error  GetAgentGroup(ctx context.Context, groupSessionID string) (*types.AgentGroupManifest, error)  DeleteAgentGroup(ctx context.Context, groupSessionID string) error   `

proposalmd.md:160

  • The HTTP route snippet is malformed and flattened into a single line with pasted language-selector text, so the documented routes are not readable as a code example.
go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML`   v1Group.POST("/multi-agent-runtime", s.handleMultiAgentRuntimeCreate)  v1Group.DELETE("/multi-agent-runtime/groups/:groupId", s.handleDeleteAgentGroup)  v1Group.GET("/multi-agent-runtime/groups/:groupId/topology", s.handleGetGroupTopology)   `

proposalmd.md:170

  • The headless Service example is collapsed into a malformed inline block with pasted editor metadata, which prevents the Kubernetes resource design from being reviewed as formatted Go code.
go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML`   func (k *K8sClient) CreateHeadlessService(      ctx context.Context,      mar *runtimev1alpha1.MultiAgentRuntime,      groupSessionID string,  ) (*corev1.Service, error) {      svc := &corev1.Service{          ObjectMeta: metav1.ObjectMeta{              Name:      groupSessionID,              Namespace: mar.Namespace,              OwnerReferences: []metav1.OwnerReference{{                  APIVersion:         mar.APIVersion,                  Kind:               mar.Kind,                  Name:               mar.Name,                  UID:                mar.UID,                  Controller:         ptr.To(true),                  BlockOwnerDeletion: ptr.To(true),              }},              Labels: map[string]string{GroupLabel: groupSessionID},          },          Spec: corev1.ServiceSpec{              ClusterIP: corev1.ClusterIPNone,              Selector:  map[string]string{GroupLabel: groupSessionID},              PublishNotReadyAddresses: false,          },      }      return k.clientset.CoreV1().Services(mar.Namespace).Create(          ctx, svc, metav1.CreateOptions{})  }   `

proposalmd.md:178

  • The files-to-modify section is flattened into a single unstructured line instead of a Markdown table, making it difficult to tell which paths are new, modified, generated, or optional.
PathChangepkg/apis/runtime/v1alpha1/multiagentruntime\_types.goNEW — CRD type definitionpkg/apis/runtime/v1alpha1/register.goMODIFIED — MultiAgentRuntimeKind + GVK blockmanifests/charts/base/crds/runtime.agentcube.volcano.sh\_multiagentruntimes.yamlNEW — generated by make gen-crdpkg/common/types/types.goMODIFIED — MultiAgentRuntimeKind constantpkg/common/types/group.goNEW — AgentGroupManifest, request/response DTOspkg/workloadmanager/multiagent.goNEW — createSandboxGrouppkg/workloadmanager/multiagent\_controller.goNEW (phase 4) — reconcilerpkg/workloadmanager/k8s\_client.goMODIFIED — CreateHeadlessService, DeleteHeadlessServicepkg/workloadmanager/server.goMODIFIED — three new routespkg/workloadmanager/handlers.goMODIFIED — handleMultiAgentRuntimeCreate, handleDeleteAgentGroup, handleGetGroupTopologypkg/store/interface.goMODIFIED — three new methodspkg/store/store\_redis.go / store\_valkey.goMODIFIED — implement three methods + testspkg/router/session\_manager.goMODIFIED — switch-case for MultiAgentRuntimeKindsdk-python/agentcube/multi\_agent.pyNEW — MultiAgentRuntimeClientexample/multi-agent-pcap/NEW — pcap-analyzer rewritten as MultiAgentRuntimedocs/design/multi-agent-design.mdNEW — design docdocs/devguide/multi-agent.mdNEW — user guide with NetworkPolicy exampletest/e2e/e2e\_test.goMODIFIED — TestMultiAgentCreate, TestMultiAgentRollback, TestMultiAgentBestEffort

proposalmd.md:182

  • The 12-week timeline is collapsed into one long line, so week numbers, date ranges, phases, and deliverables run together and the project plan is effectively unreadable.
WeekDatesPhaseKey Deliverable1Jun 2–8BondingDesign doc Discussion published; draft PR open with file layout; notes posted to #3012Jun 9–15Bonding→CorePR 1 merged: CRD-only (multiagentruntime\_types.go, codegen, Kind constants)3Jun 16–22CoreStore extensions (SaveAgentGroup, GetAgentGroup, DeleteAgentGroup) on Redis + Valkey; AgentGroupManifest type4Jun 23–29CorePR 2 opened: createSandboxGroup (Atomic only), 3 new HTTP routes, session\_manager switch-case5Jun 30–Jul 6CoreCreateHeadlessService wired; DNS verified end-to-end; PR 2 merged6Jul 7–13TopologyPR 3: topoSort + cycle detection, dependency env-var injection, handleGetGroupTopology7Jul 14–20Topologypcap-analyzer rewritten as MultiAgentRuntime (example/multi-agent-pcap/); E2E added8Jul 21–27BufferBuffer / mid-project review; load-test (100 groups × 3 roles) if no slippage9Jul 28–Aug 3PolishBestEffort startup policy; admission webhook (coordinator uniqueness, runtimeRef, cycle, role-name)10Aug 4–10PolishPR 4 opened: BestEffort + webhook + Python SDK (MultiAgentRuntimeClient) + docs11Aug 11–17PolishMultiAgentRuntimeReconciler (feature-flagged); Prometheus metrics wired; PR 4 review iterations12Aug 18–24PolishPR 4 merged; blog post draft; LFX final report; issue #301 closed with PR links

proposalmd.md:222

  • The documentation plan is flattened into a single line, losing the intended audience/output/location/phase columns and making the doc deliverables hard to verify.
AudienceOutputWherePhaseMentor + maintainersDesign doc (Sections 5+6 summary)GitHub DiscussionWeek 1Mentor + maintainersdocs/design/multi-agent-design.md (canonical)RepoWeek 10Operatorsdocs/devguide/multi-agent.md (user guide + NetworkPolicy example)RepoWeek 10Operatorsexample/multi-agent-pcap/ with READMERepoWeek 7SDK userssdk-python/examples/multi\_agent\_usage.pyRepoWeek 10Wider communityBlog post on volcano.sh/blogVolcano blogWeek 12

proposalmd.md:226

  • The risk table is flattened into one run-on line, so individual risks, likelihoods, impacts, and mitigations cannot be reliably distinguished.
#RiskLikelihoodImpactMitigationR1Mentor disagrees with headless-Service-DNS transportMediumMediumWeek 1 design doc proposes ServiceDNS with named alternatives; switch cost ≤4 daysR2Another applicant merges similar work in parallelMediumHighCollaboration over competition: will propose pairing on PR 2 if both selectedR3agent-sandbox upstream API changes mid-mentorshipMediumHighPin to v0.1.1 in go.mod; raise flag in mentor sync if upstream cuts v0.2.0R4Scope creep (A2A / MCP / streaming / dashboards)HighMediumSection 14 enumerates follow-ups; new asks become separate issues, not PR ridersR516 roles × 8 replicas = 128 sandboxes in one call (DoS)LowHighSchema-level caps (MaxItems=16, Maximum=8); admission webhook enforces per-namespace quotaR6Personal scheduling conflictMediumLowWeek 8 buffer absorbs; will pre-load PR work by one week if conflict known >2 weeks aheadR7Reconciler scope too large to finish in week 11MediumLowGated behind AGENTCUBE\_GROUP\_SELF\_HEAL flag; can ship as follow-up

proposalmd.md:245

  • The candidate-evidence table is collapsed into a single paragraph, which makes the claimed project needs, evidence, and gaps difficult to map to each other.
Project NeedMy EvidenceGap to CloseGo (1.22+) on production servicesOpen-source PRs above (OpenKruise, Volcano, KubeStellar — all Go-based)NoneKubernetes controller-runtime / kubebuilderOpenKruise #2303, #2313; Volcano #695 — all touch controller reconciliation pathsNoneCRD design, OpenAPI schema, admission webhooksVolcano #692 adds a new API surface; familiarity through coursework and personal K8s projectsNoneclient-go informers, dynamic clientApplied in Volcano #695 E2E scaffolding; personal homelab controllersNoneRedis / Valkey integrationPersonal projects (Valkey API is Redis-compatible)NoneK8s Services, headless DNS, NetworkPolicyStudied in depth for this proposal; applied in homelab multi-service setupsNoneAgent ecosystem (LangGraph, MCP, A2A)Experimented with LangGraph pipelines; followed MCP/A2A specs closelyNoneagent-sandbox (sigs.k8s.io) internalsNo prior PRs — see 2-week reading plan below2-week rampAgentCube codebase specificallyNo prior PRs — all file/line citations in Sections 4–6 personally verified2-week ramp

Comment thread proposalmd.md
@@ -0,0 +1,345 @@
LFX Mentorship 2026 — Term 2
Comment thread proposalmd.md

Open-source contributions demonstrating this background:

Repository / PRDescriptionKubeStellar #648Japanese i18n translations; eliminated externally hardcoded strings across user-facing elements to improve maintainability.KubeStellar #623Italian translations across components; removed hardcoded strings from external sources for better global accessibility.OpenKruise #2303Fixed livenessProbe handling in the manager's YAML processing; improved pod lifecycle reliability with Kubernetes probes.OpenKruise #2302Resolved stuck deployments caused by image-tag updates where the digest was unchanged, ensuring correct rollout execution.OpenKruise #2313Replaced static TODO placeholders in the CR controller with dynamic logic for reliable container-image propagation.Volcano #442Updated the Volcano README with bug fixes, new sections on project structure and contribution guidelines, refined setup instructions.Volcano #692Enabled AI/ML model (v7.8) support in the Kthena router for open-source LLM integration; new server configs, traffic routing examples, and local testing instructions.Volcano #695Implemented E2E tests covering the full ModelServing lifecycle (create/update/delete), validating multi-phase workflows and timeouts.
"resolved": "https://registry.npmjs.org/@algolia/client-search/-/client-search-5.46.2.tgz",
"integrity": "sha512-ZsOJqu4HOG5BlvIFnMU0YKjQ9ZI6r3C31dg2jk5kMWPSdhJpYL9xa5hEe7aieE+707dXeMI4ej3diy6mXdZpgA==",
"license": "MIT",
"peer": true,
Comment thread proposalmd.md

go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (s *Server) createSandboxGroup( ctx context.Context, dyn dynamic.Interface, mar *runtimev1alpha1.MultiAgentRuntime, ) (*types.CreateGroupResponse, error) { ordered, err := topoSort(mar.Spec.Roles) if err != nil { return nil, err } groupSessionID := "grp-" + uuid.New().String() created := make([]createdRole, 0, len(ordered)) needRollback := true defer func() { if !needRollback { return } rollbackCtx, cancel := context.WithTimeout( context.Background(), groupRollbackTimeout) defer cancel() for i := len(created) - 1; i >= 0; i-- { s.rollbackSandboxCreation( dyn, created[i].sandbox, nil, created[i].sessionID) } _ = s.k8sClient.DeleteHeadlessService(rollbackCtx, mar.Namespace, groupSessionID) }() svc, err := s.k8sClient.CreateHeadlessService(ctx, mar, groupSessionID) if err != nil { return nil, err } var coordinator *createdRole for _, role := range ordered { sandbox, entry, err := buildSandboxByAgentRuntime( mar.Namespace, role.RuntimeRef, s.informers) if err != nil { return nil, err } sandbox.Labels[GroupLabel] = groupSessionID sandbox.Labels[RoleLabel] = role.Name sandbox.Spec.PodTemplate.Spec.Subdomain = groupSessionID sandbox.Spec.PodTemplate.Spec.Hostname = role.Name injectGroupEnvVars(&sandbox.Spec.PodTemplate, role, ordered, mar) resultChan := s.sandboxController.WatchSandboxOnce(ctx, sandbox.Namespace, sandbox.Name) resp, err := s.createSandbox(ctx, dyn, sandbox, nil, entry, resultChan) s.sandboxController.UnWatchSandbox(sandbox.Namespace, sandbox.Name) if err != nil { if mar.Spec.Lifecycle.StartupPolicy == BestEffort && !role.Coordinator { appendDegradedRole(mar, role.Name, err) continue } return nil, fmt.Errorf("role %s: %w", role.Name, err) } cr := createdRole{name: role.Name, sandbox: sandbox, sessionID: resp.SessionID, resp: resp} created = append(created, cr) if role.Coordinator { coordinator = &cr } } if coordinator == nil { return nil, errors.New("group has no coordinator role") } if err := s.storeClient.SaveAgentGroup(ctx, groupSessionID, buildGroupManifest(mar, created, svc, coordinator)); err != nil { return nil, err } needRollback = false return &types.CreateGroupResponse{ GroupSessionID: groupSessionID, CoordinatorSessionID: coordinator.sessionID, ServiceName: svc.Name, Roles: buildRoleSummaries(created), }, nil } `
Comment thread proposalmd.md

go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (s *Server) createSandboxGroup( ctx context.Context, dyn dynamic.Interface, mar *runtimev1alpha1.MultiAgentRuntime, ) (*types.CreateGroupResponse, error) { ordered, err := topoSort(mar.Spec.Roles) if err != nil { return nil, err } groupSessionID := "grp-" + uuid.New().String() created := make([]createdRole, 0, len(ordered)) needRollback := true defer func() { if !needRollback { return } rollbackCtx, cancel := context.WithTimeout( context.Background(), groupRollbackTimeout) defer cancel() for i := len(created) - 1; i >= 0; i-- { s.rollbackSandboxCreation( dyn, created[i].sandbox, nil, created[i].sessionID) } _ = s.k8sClient.DeleteHeadlessService(rollbackCtx, mar.Namespace, groupSessionID) }() svc, err := s.k8sClient.CreateHeadlessService(ctx, mar, groupSessionID) if err != nil { return nil, err } var coordinator *createdRole for _, role := range ordered { sandbox, entry, err := buildSandboxByAgentRuntime( mar.Namespace, role.RuntimeRef, s.informers) if err != nil { return nil, err } sandbox.Labels[GroupLabel] = groupSessionID sandbox.Labels[RoleLabel] = role.Name sandbox.Spec.PodTemplate.Spec.Subdomain = groupSessionID sandbox.Spec.PodTemplate.Spec.Hostname = role.Name injectGroupEnvVars(&sandbox.Spec.PodTemplate, role, ordered, mar) resultChan := s.sandboxController.WatchSandboxOnce(ctx, sandbox.Namespace, sandbox.Name) resp, err := s.createSandbox(ctx, dyn, sandbox, nil, entry, resultChan) s.sandboxController.UnWatchSandbox(sandbox.Namespace, sandbox.Name) if err != nil { if mar.Spec.Lifecycle.StartupPolicy == BestEffort && !role.Coordinator { appendDegradedRole(mar, role.Name, err) continue } return nil, fmt.Errorf("role %s: %w", role.Name, err) } cr := createdRole{name: role.Name, sandbox: sandbox, sessionID: resp.SessionID, resp: resp} created = append(created, cr) if role.Coordinator { coordinator = &cr } } if coordinator == nil { return nil, errors.New("group has no coordinator role") } if err := s.storeClient.SaveAgentGroup(ctx, groupSessionID, buildGroupManifest(mar, created, svc, coordinator)); err != nil { return nil, err } needRollback = false return &types.CreateGroupResponse{ GroupSessionID: groupSessionID, CoordinatorSessionID: coordinator.sessionID, ServiceName: svc.Name, Roles: buildRoleSummaries(created), }, nil } `
Comment thread proposalmd.md

go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (s *Server) createSandboxGroup( ctx context.Context, dyn dynamic.Interface, mar *runtimev1alpha1.MultiAgentRuntime, ) (*types.CreateGroupResponse, error) { ordered, err := topoSort(mar.Spec.Roles) if err != nil { return nil, err } groupSessionID := "grp-" + uuid.New().String() created := make([]createdRole, 0, len(ordered)) needRollback := true defer func() { if !needRollback { return } rollbackCtx, cancel := context.WithTimeout( context.Background(), groupRollbackTimeout) defer cancel() for i := len(created) - 1; i >= 0; i-- { s.rollbackSandboxCreation( dyn, created[i].sandbox, nil, created[i].sessionID) } _ = s.k8sClient.DeleteHeadlessService(rollbackCtx, mar.Namespace, groupSessionID) }() svc, err := s.k8sClient.CreateHeadlessService(ctx, mar, groupSessionID) if err != nil { return nil, err } var coordinator *createdRole for _, role := range ordered { sandbox, entry, err := buildSandboxByAgentRuntime( mar.Namespace, role.RuntimeRef, s.informers) if err != nil { return nil, err } sandbox.Labels[GroupLabel] = groupSessionID sandbox.Labels[RoleLabel] = role.Name sandbox.Spec.PodTemplate.Spec.Subdomain = groupSessionID sandbox.Spec.PodTemplate.Spec.Hostname = role.Name injectGroupEnvVars(&sandbox.Spec.PodTemplate, role, ordered, mar) resultChan := s.sandboxController.WatchSandboxOnce(ctx, sandbox.Namespace, sandbox.Name) resp, err := s.createSandbox(ctx, dyn, sandbox, nil, entry, resultChan) s.sandboxController.UnWatchSandbox(sandbox.Namespace, sandbox.Name) if err != nil { if mar.Spec.Lifecycle.StartupPolicy == BestEffort && !role.Coordinator { appendDegradedRole(mar, role.Name, err) continue } return nil, fmt.Errorf("role %s: %w", role.Name, err) } cr := createdRole{name: role.Name, sandbox: sandbox, sessionID: resp.SessionID, resp: resp} created = append(created, cr) if role.Coordinator { coordinator = &cr } } if coordinator == nil { return nil, errors.New("group has no coordinator role") } if err := s.storeClient.SaveAgentGroup(ctx, groupSessionID, buildGroupManifest(mar, created, svc, coordinator)); err != nil { return nil, err } needRollback = false return &types.CreateGroupResponse{ GroupSessionID: groupSessionID, CoordinatorSessionID: coordinator.sessionID, ServiceName: svc.Name, Roles: buildRoleSummaries(created), }, nil } `
Comment thread proposalmd.md
Comment on lines +152 to +154
Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` SaveAgentGroup(ctx context.Context, manifest *types.AgentGroupManifest) error GetAgentGroup(ctx context.Context, groupSessionID string) (*types.AgentGroupManifest, error) DeleteAgentGroup(ctx context.Context, groupSessionID string) error `

Storage layout: group:{grp-xxx} HASH (manifest JSON) + group:active ZSET (for GC scan).
Comment thread proposalmd.md
Comment on lines +162 to +170
The router does not gain new routes. One switch-case addition in session\_manager.go:117–126 routes MultiAgentRuntimeKind create calls to the new endpoint.

#### 6.5 Headless Service per group

CreateHeadlessService in k8s\_client.go (~30 LOC): ClusterIP: None, selector {group: groupSessionID}, OwnerReferences pointing to the MultiAgentRuntime CR. Deleting the CR cascades to the Service via K8s GC. Manual cleanup in the rollback path is only needed if the CR itself was never created.

go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (k *K8sClient) CreateHeadlessService( ctx context.Context, mar *runtimev1alpha1.MultiAgentRuntime, groupSessionID string, ) (*corev1.Service, error) { svc := &corev1.Service{ ObjectMeta: metav1.ObjectMeta{ Name: groupSessionID, Namespace: mar.Namespace, OwnerReferences: []metav1.OwnerReference{{ APIVersion: mar.APIVersion, Kind: mar.Kind, Name: mar.Name, UID: mar.UID, Controller: ptr.To(true), BlockOwnerDeletion: ptr.To(true), }}, Labels: map[string]string{GroupLabel: groupSessionID}, }, Spec: corev1.ServiceSpec{ ClusterIP: corev1.ClusterIPNone, Selector: map[string]string{GroupLabel: groupSessionID}, PublishNotReadyAddresses: false, }, } return k.clientset.CoreV1().Services(mar.Namespace).Create( ctx, svc, metav1.CreateOptions{}) } `
Comment thread proposalmd.md

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` v1Group.POST("/multi-agent-runtime", s.handleMultiAgentRuntimeCreate) v1Group.DELETE("/multi-agent-runtime/groups/:groupId", s.handleDeleteAgentGroup) v1Group.GET("/multi-agent-runtime/groups/:groupId/topology", s.handleGetGroupTopology) `

The router does not gain new routes. One switch-case addition in session\_manager.go:117–126 routes MultiAgentRuntimeKind create calls to the new endpoint.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive mentorship proposal for implementing multi-AgentCube capabilities, featuring a new MultiAgentRuntime CRD and orchestration logic. Review feedback highlights the need to revert unrelated dependency noise in package-lock.json and relocate the proposal file to the documentation directory. Additionally, several corrections are requested to fix broken Markdown formatting in tables and code blocks, and to address a logic gap in the pseudo-code regarding role replicas.

"resolved": "https://registry.npmjs.org/@algolia/client-search/-/client-search-5.46.2.tgz",
"integrity": "sha512-ZsOJqu4HOG5BlvIFnMU0YKjQ9ZI6r3C31dg2jk5kMWPSdhJpYL9xa5hEe7aieE+707dXeMI4ej3diy6mXdZpgA==",
"license": "MIT",
"peer": true,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The package-lock.json file contains numerous changes adding "peer": true to various dependencies. These changes appear unrelated to the addition of the mentorship proposal and add significant noise to the pull request. Please revert these changes unless they are strictly necessary for the documentation site's functionality.

Comment thread proposalmd.md
@@ -0,0 +1,345 @@
LFX Mentorship 2026 — Term 2
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The proposal file proposalmd.md is currently located in the root directory. To maintain a clean project structure and follow existing conventions, consider moving it to a more appropriate location such as docs/proposals/ or docs/agentcube/proposals/.

Comment thread proposalmd.md

Open-source contributions demonstrating this background:

Repository / PRDescriptionKubeStellar #648Japanese i18n translations; eliminated externally hardcoded strings across user-facing elements to improve maintainability.KubeStellar #623Italian translations across components; removed hardcoded strings from external sources for better global accessibility.OpenKruise #2303Fixed livenessProbe handling in the manager's YAML processing; improved pod lifecycle reliability with Kubernetes probes.OpenKruise #2302Resolved stuck deployments caused by image-tag updates where the digest was unchanged, ensuring correct rollout execution.OpenKruise #2313Replaced static TODO placeholders in the CR controller with dynamic logic for reliable container-image propagation.Volcano #442Updated the Volcano README with bug fixes, new sections on project structure and contribution guidelines, refined setup instructions.Volcano #692Enabled AI/ML model (v7.8) support in the Kthena router for open-source LLM integration; new server configs, traffic routing examples, and local testing instructions.Volcano #695Implemented E2E tests covering the full ModelServing lifecycle (create/update/delete), validating multi-phase workflows and timeouts.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The table formatting for the open-source contributions is broken. It appears the content was pasted without the necessary Markdown table syntax (pipes and separators).

Suggested change
Repository / PRDescriptionKubeStellar #648Japanese i18n translations; eliminated externally hardcoded strings across user-facing elements to improve maintainability.KubeStellar #623Italian translations across components; removed hardcoded strings from external sources for better global accessibility.OpenKruise #2303Fixed livenessProbe handling in the manager's YAML processing; improved pod lifecycle reliability with Kubernetes probes.OpenKruise #2302Resolved stuck deployments caused by image-tag updates where the digest was unchanged, ensuring correct rollout execution.OpenKruise #2313Replaced static TODO placeholders in the CR controller with dynamic logic for reliable container-image propagation.Volcano #442Updated the Volcano README with bug fixes, new sections on project structure and contribution guidelines, refined setup instructions.Volcano #692Enabled AI/ML model (v7.8) support in the Kthena router for open-source LLM integration; new server configs, traffic routing examples, and local testing instructions.Volcano #695Implemented E2E tests covering the full ModelServing lifecycle (create/update/delete), validating multi-phase workflows and timeouts.
| Repository / PR | Description |
| :--- | :--- |
| KubeStellar #648 | Japanese i18n translations; eliminated externally hardcoded strings across user-facing elements to improve maintainability. |
| KubeStellar #623 | Italian translations across components; removed hardcoded strings from external sources for better global accessibility. |
| OpenKruise #2303 | Fixed livenessProbe handling in the manager's YAML processing; improved pod lifecycle reliability with Kubernetes probes. |
| OpenKruise #2302 | Resolved stuck deployments caused by image-tag updates where the digest was unchanged, ensuring correct rollout execution. |
| OpenKruise #2313 | Replaced static TODO placeholders in the CR controller with dynamic logic for reliable container-image propagation. |
| Volcano #442 | Updated the Volcano README with bug fixes, new sections on project structure and contribution guidelines, refined setup instructions. |
| Volcano #692 | Enabled AI/ML model (v7.8) support in the Kthena router for open-source LLM integration; new server configs, traffic routing examples, and local testing instructions. |
| Volcano #695 | Implemented E2E tests covering the full ModelServing lifecycle (create/update/delete), validating multi-phase workflows and timeouts. |

Comment thread proposalmd.md

### 1\. Title & Metadata

FieldValueProjectCNCF Volcano / AgentCubeMentorshipLFX 2026 Term 2 (Jun – Aug 2026)TitleSupport multi-AgentCube CapabilityTracking Issuevolcano-sh/agentcube#301Parent Designvolcano-sh/volcano#4686Target BranchmainApplicantAadi ShahGitHubAAdIprogEmail[aadishah132@gmail.com](mailto:aadishah132@gmail.com)TimezoneIST (UTC+5:30)Weekly Hours30–40 hrs/week (weeks 8 and 11 used as buffer)Mentor@hzxuzhonghu (Zhonghu Xu)Proposal Versionv1.0Date2026-05-16
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The metadata table formatting is broken. Please use standard Markdown table syntax for better readability.

Suggested change
FieldValueProjectCNCF Volcano / AgentCubeMentorshipLFX 2026 Term 2 (Jun – Aug 2026)TitleSupport multi-AgentCube CapabilityTracking Issuevolcano-sh/agentcube#301Parent Designvolcano-sh/volcano#4686Target BranchmainApplicantAadi ShahGitHubAAdIprogEmail[aadishah132@gmail.com](mailto:aadishah132@gmail.com)TimezoneIST (UTC+5:30)Weekly Hours30–40 hrs/week (weeks 8 and 11 used as buffer)Mentor@hzxuzhonghu (Zhonghu Xu)Proposal Versionv1.0Date2026-05-16
| Field | Value |
| :--- | :--- |
| Project | CNCF Volcano / AgentCube |
| Mentorship | LFX 2026 Term 2 (Jun – Aug 2026) |
| Title | Support multi-AgentCube Capability |
| Tracking Issue | volcano-sh/agentcube#301 |
| Parent Design | volcano-sh/volcano#4686 |
| Target Branch | main |
| Applicant | Aadi Shah |
| GitHub | AAdIprog |
| Email | [aadishah132@gmail.com](mailto:aadishah132@gmail.com) |
| Timezone | IST (UTC+5:30) |
| Weekly Hours | 30–40 hrs/week (weeks 8 and 11 used as buffer) |
| Mentor | @hzxuzhonghu (Zhonghu Xu) |
| Proposal Version | v1.0 |
| Date | 2026-05-16 |

Comment thread proposalmd.md

#### Pain points

Pain PointWhere in CodeUser must manually orchestrate 3 separate sandbox lifecyclespcap\_analyzer.py (whole file)Inter-agent endpoints must be hand-discovered (env vars or hardcoded)pcap-analyzer/deployment.yaml:50–58Failure in one agent leaves siblings running — no cleanupNot implemented anywhereNo shared session: each agent gets its own x-agentcube-session-idhandlers.go:266–271No group-level GC: sandboxes survive until idle timeout if user code crashesgarbage\_collection.go operates per-sandbox
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pain points table formatting is broken. Proper Markdown table syntax is required for it to render correctly.

Suggested change
Pain PointWhere in CodeUser must manually orchestrate 3 separate sandbox lifecyclespcap\_analyzer.py (whole file)Inter-agent endpoints must be hand-discovered (env vars or hardcoded)pcap-analyzer/deployment.yaml:50–58Failure in one agent leaves siblings running — no cleanupNot implemented anywhereNo shared session: each agent gets its own x-agentcube-session-idhandlers.go:266–271No group-level GC: sandboxes survive until idle timeout if user code crashesgarbage\_collection.go operates per-sandbox
| Pain Point | Where in Code |
| :--- | :--- |
| User must manually orchestrate 3 separate sandbox lifecycles | pcap_analyzer.py (whole file) |
| Inter-agent endpoints must be hand-discovered (env vars or hardcoded) | pcap-analyzer/deployment.yaml:50–58 |
| Failure in one agent leaves siblings running — no cleanup | Not implemented anywhere |
| No shared session: each agent gets its own x-agentcube-session-id | handlers.go:266–271 |
| No group-level GC: sandboxes survive until idle timeout if user code crashes | garbage_collection.go operates per-sandbox |

Comment thread proposalmd.md

go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML`` // +kubebuilder:object:root=true // +kubebuilder:subresource:status // +kubebuilder:resource:scope=Namespaced type MultiAgentRuntime struct { metav1.TypeMeta `json:",inline"` metav1.ObjectMeta `json:"metadata,omitempty"` Spec MultiAgentRuntimeSpec `json:"spec"` Status MultiAgentRuntimeStatus `json:"status,omitempty"` } type MultiAgentRuntimeSpec struct { // +kubebuilder:validation:MinItems=2 // +kubebuilder:validation:MaxItems=16 Roles []AgentRoleSpec `json:"roles"` Lifecycle GroupLifecycle `json:"lifecycle,omitempty"` // +kubebuilder:default={type: ServiceDNS} Communication GroupCommunication `json:"communication,omitempty"` // +kubebuilder:default="15m" SessionTimeout *metav1.Duration `json:"sessionTimeout,omitempty"` // +kubebuilder:default="8h" MaxSessionDuration *metav1.Duration `json:"maxSessionDuration,omitempty"` } type AgentRoleSpec struct { // +kubebuilder:validation:Pattern="^[a-z0-9]([-a-z0-9]*[a-z0-9])?$" Name string `json:"name"` RuntimeRef string `json:"runtimeRef"` Coordinator bool `json:"coordinator,omitempty"` Dependencies []string `json:"dependencies,omitempty"` // +kubebuilder:default=1 // +kubebuilder:validation:Maximum=8 Replicas *int32 `json:"replicas,omitempty"` // +kubebuilder:default="Never" // +kubebuilder:validation:Enum=Never;OnFailure RestartPolicy RolePolicy `json:"restartPolicy,omitempty"` EnvOverrides []corev1.EnvVar `json:"envOverrides,omitempty"` } ``
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The code block for the CRD definition has incorrect formatting, including a long list of language identifiers and missing newlines. This makes the snippet unreadable.

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:scope=Namespaced
type MultiAgentRuntime struct {
    metav1.TypeMeta   `json:",inline"` 
    metav1.ObjectMeta `json:"metadata,omitempty"` 
    Spec   MultiAgentRuntimeSpec   `json:"spec"` 
    Status MultiAgentRuntimeStatus `json:"status,omitempty"` 
}

type MultiAgentRuntimeSpec struct {
    // +kubebuilder:validation:MinItems=2
    // +kubebuilder:validation:MaxItems=16
    Roles         []AgentRoleSpec    `json:"roles"` 
    Lifecycle     GroupLifecycle     `json:"lifecycle,omitempty"` 
    // +kubebuilder:default={type: ServiceDNS}
    Communication GroupCommunication `json:"communication,omitempty"` 
    // +kubebuilder:default="15m"
    SessionTimeout     *metav1.Duration `json:"sessionTimeout,omitempty"` 
    // +kubebuilder:default="8h"
    MaxSessionDuration *metav1.Duration `json:"maxSessionDuration,omitempty"` 
}

type AgentRoleSpec struct {
    // +kubebuilder:validation:Pattern="^[a-z0-9]([-a-z0-9]*[a-z0-9])?$"
    Name        string           `json:"name"` 
    RuntimeRef  string           `json:"runtimeRef"` 
    Coordinator bool             `json:"coordinator,omitempty"` 
    Dependencies []string        `json:"dependencies,omitempty"` 
    // +kubebuilder:default=1
    // +kubebuilder:validation:Maximum=8
    Replicas      *int32         `json:"replicas,omitempty"` 
    // +kubebuilder:default="Never"
    // +kubebuilder:validation:Enum=Never;OnFailure
    RestartPolicy RolePolicy     `json:"restartPolicy,omitempty"` 
    EnvOverrides  []corev1.EnvVar `json:"envOverrides,omitempty"` 
}

Comment thread proposalmd.md

go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (s *Server) createSandboxGroup( ctx context.Context, dyn dynamic.Interface, mar *runtimev1alpha1.MultiAgentRuntime, ) (*types.CreateGroupResponse, error) { ordered, err := topoSort(mar.Spec.Roles) if err != nil { return nil, err } groupSessionID := "grp-" + uuid.New().String() created := make([]createdRole, 0, len(ordered)) needRollback := true defer func() { if !needRollback { return } rollbackCtx, cancel := context.WithTimeout( context.Background(), groupRollbackTimeout) defer cancel() for i := len(created) - 1; i >= 0; i-- { s.rollbackSandboxCreation( dyn, created[i].sandbox, nil, created[i].sessionID) } _ = s.k8sClient.DeleteHeadlessService(rollbackCtx, mar.Namespace, groupSessionID) }() svc, err := s.k8sClient.CreateHeadlessService(ctx, mar, groupSessionID) if err != nil { return nil, err } var coordinator *createdRole for _, role := range ordered { sandbox, entry, err := buildSandboxByAgentRuntime( mar.Namespace, role.RuntimeRef, s.informers) if err != nil { return nil, err } sandbox.Labels[GroupLabel] = groupSessionID sandbox.Labels[RoleLabel] = role.Name sandbox.Spec.PodTemplate.Spec.Subdomain = groupSessionID sandbox.Spec.PodTemplate.Spec.Hostname = role.Name injectGroupEnvVars(&sandbox.Spec.PodTemplate, role, ordered, mar) resultChan := s.sandboxController.WatchSandboxOnce(ctx, sandbox.Namespace, sandbox.Name) resp, err := s.createSandbox(ctx, dyn, sandbox, nil, entry, resultChan) s.sandboxController.UnWatchSandbox(sandbox.Namespace, sandbox.Name) if err != nil { if mar.Spec.Lifecycle.StartupPolicy == BestEffort && !role.Coordinator { appendDegradedRole(mar, role.Name, err) continue } return nil, fmt.Errorf("role %s: %w", role.Name, err) } cr := createdRole{name: role.Name, sandbox: sandbox, sessionID: resp.SessionID, resp: resp} created = append(created, cr) if role.Coordinator { coordinator = &cr } } if coordinator == nil { return nil, errors.New("group has no coordinator role") } if err := s.storeClient.SaveAgentGroup(ctx, groupSessionID, buildGroupManifest(mar, created, svc, coordinator)); err != nil { return nil, err } needRollback = false return &types.CreateGroupResponse{ GroupSessionID: groupSessionID, CoordinatorSessionID: coordinator.sessionID, ServiceName: svc.Name, Roles: buildRoleSummaries(created), }, nil } `
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pseudo-code for createSandboxGroup has formatting issues and appears to miss handling for the Replicas field defined in AgentRoleSpec. If a role specifies multiple replicas, the loop should create the corresponding number of sandboxes. Additionally, it would be beneficial to specify which environment variables are injected via injectGroupEnvVars (e.g., service addresses of other roles) to clarify how inter-agent discovery works.

func (s *Server) createSandboxGroup(
    ctx context.Context,
    dyn dynamic.Interface,
    mar *runtimev1alpha1.MultiAgentRuntime,
) (*types.CreateGroupResponse, error) {
    ordered, err := topoSort(mar.Spec.Roles)
    if err != nil { return nil, err }
    groupSessionID := "grp-" + uuid.New().String()
    created := make([]createdRole, 0, len(ordered))
    needRollback := true
    defer func() {
        if !needRollback { return }
        rollbackCtx, cancel := context.WithTimeout(
            context.Background(), groupRollbackTimeout)
        defer cancel()
        for i := len(created) - 1; i >= 0; i-- {
            s.rollbackSandboxCreation(
                dyn, created[i].sandbox, nil, created[i].sessionID)
        }
        _ = s.k8sClient.DeleteHeadlessService(rollbackCtx, mar.Namespace, groupSessionID)
    }()
    svc, err := s.k8sClient.CreateHeadlessService(ctx, mar, groupSessionID)
    if err != nil { return nil, err }
    var coordinator *createdRole
    for _, role := range ordered {
        sandbox, entry, err := buildSandboxByAgentRuntime(
            mar.Namespace, role.RuntimeRef, s.informers)
        if err != nil { return nil, err }
        sandbox.Labels[GroupLabel] = groupSessionID
        sandbox.Labels[RoleLabel]  = role.Name
        sandbox.Spec.PodTemplate.Spec.Subdomain = groupSessionID
        sandbox.Spec.PodTemplate.Spec.Hostname  = role.Name
        injectGroupEnvVars(&sandbox.Spec.PodTemplate, role, ordered, mar)
        resultChan := s.sandboxController.WatchSandboxOnce(ctx, sandbox.Namespace, sandbox.Name)
        resp, err := s.createSandbox(ctx, dyn, sandbox, nil, entry, resultChan)
        s.sandboxController.UnWatchSandbox(sandbox.Namespace, sandbox.Name)
        if err != nil {
            if mar.Spec.Lifecycle.StartupPolicy == BestEffort && !role.Coordinator {
                appendDegradedRole(mar, role.Name, err)
                continue
            }
            return nil, fmt.Errorf("role %s: %w", role.Name, err)
        }
        cr := createdRole{name: role.Name, sandbox: sandbox,
            sessionID: resp.SessionID, resp: resp}
        created = append(created, cr)
        if role.Coordinator { coordinator = &cr }
    }
    if coordinator == nil {
        return nil, errors.New("group has no coordinator role")
    }
    if err := s.storeClient.SaveAgentGroup(ctx, groupSessionID,
        buildGroupManifest(mar, created, svc, coordinator)); err != nil {
        return nil, err
    }
    needRollback = false
    return &types.CreateGroupResponse{
        GroupSessionID:       groupSessionID,
        CoordinatorSessionID: coordinator.sessionID,
        ServiceName:          svc.Name,
        Roles:                buildRoleSummaries(created),
    }, nil
}

Comment thread proposalmd.md

go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` SaveAgentGroup(ctx context.Context, manifest *types.AgentGroupManifest) error GetAgentGroup(ctx context.Context, groupSessionID string) (*types.AgentGroupManifest, error) DeleteAgentGroup(ctx context.Context, groupSessionID string) error `
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The code block for store extensions has messy language identifiers and missing newlines.

Suggested change
Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` SaveAgentGroup(ctx context.Context, manifest *types.AgentGroupManifest) error GetAgentGroup(ctx context.Context, groupSessionID string) (*types.AgentGroupManifest, error) DeleteAgentGroup(ctx context.Context, groupSessionID string) error `
SaveAgentGroup(ctx context.Context, manifest *types.AgentGroupManifest) error
GetAgentGroup(ctx context.Context, groupSessionID string) (*types.AgentGroupManifest, error)
DeleteAgentGroup(ctx context.Context, groupSessionID string) error

Comment thread proposalmd.md

go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` v1Group.POST("/multi-agent-runtime", s.handleMultiAgentRuntimeCreate) v1Group.DELETE("/multi-agent-runtime/groups/:groupId", s.handleDeleteAgentGroup) v1Group.GET("/multi-agent-runtime/groups/:groupId/topology", s.handleGetGroupTopology) `
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The code block for new HTTP routes has messy language identifiers and missing newlines.

Suggested change
Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` v1Group.POST("/multi-agent-runtime", s.handleMultiAgentRuntimeCreate) v1Group.DELETE("/multi-agent-runtime/groups/:groupId", s.handleDeleteAgentGroup) v1Group.GET("/multi-agent-runtime/groups/:groupId/topology", s.handleGetGroupTopology) `
v1Group.POST("/multi-agent-runtime", s.handleMultiAgentRuntimeCreate)
v1Group.DELETE("/multi-agent-runtime/groups/:groupId", s.handleDeleteAgentGroup)
v1Group.GET("/multi-agent-runtime/groups/:groupId/topology", s.handleGetGroupTopology)

Comment thread proposalmd.md

go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (k *K8sClient) CreateHeadlessService( ctx context.Context, mar *runtimev1alpha1.MultiAgentRuntime, groupSessionID string, ) (*corev1.Service, error) { svc := &corev1.Service{ ObjectMeta: metav1.ObjectMeta{ Name: groupSessionID, Namespace: mar.Namespace, OwnerReferences: []metav1.OwnerReference{{ APIVersion: mar.APIVersion, Kind: mar.Kind, Name: mar.Name, UID: mar.UID, Controller: ptr.To(true), BlockOwnerDeletion: ptr.To(true), }}, Labels: map[string]string{GroupLabel: groupSessionID}, }, Spec: corev1.ServiceSpec{ ClusterIP: corev1.ClusterIPNone, Selector: map[string]string{GroupLabel: groupSessionID}, PublishNotReadyAddresses: false, }, } return k.clientset.CoreV1().Services(mar.Namespace).Create( ctx, svc, metav1.CreateOptions{}) } `
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The code block for CreateHeadlessService has messy language identifiers and missing newlines.

func (k *K8sClient) CreateHeadlessService(
    ctx context.Context,
    mar *runtimev1alpha1.MultiAgentRuntime,
    groupSessionID string,
) (*corev1.Service, error) {
    svc := &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{
            Name:      groupSessionID,
            Namespace: mar.Namespace,
            OwnerReferences: []metav1.OwnerReference{{
                APIVersion:         mar.APIVersion,
                Kind:               mar.Kind,
                Name:               mar.Name,
                UID:                mar.UID,
                Controller:         ptr.To(true),
                BlockOwnerDeletion: ptr.To(true),
            }},
            Labels: map[string]string{GroupLabel: groupSessionID},
        },
        Spec: corev1.ServiceSpec{
            ClusterIP: corev1.ClusterIPNone,
            Selector:  map[string]string{GroupLabel: groupSessionID},
            PublishNotReadyAddresses: false,
        },
    }
    return k.clientset.CoreV1().Services(mar.Namespace).Create(
        ctx, svc, metav1.CreateOptions{})
}

Comment thread proposalmd.md

go

Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (s *Server) createSandboxGroup( ctx context.Context, dyn dynamic.Interface, mar *runtimev1alpha1.MultiAgentRuntime, ) (*types.CreateGroupResponse, error) { ordered, err := topoSort(mar.Spec.Roles) if err != nil { return nil, err } groupSessionID := "grp-" + uuid.New().String() created := make([]createdRole, 0, len(ordered)) needRollback := true defer func() { if !needRollback { return } rollbackCtx, cancel := context.WithTimeout( context.Background(), groupRollbackTimeout) defer cancel() for i := len(created) - 1; i >= 0; i-- { s.rollbackSandboxCreation( dyn, created[i].sandbox, nil, created[i].sessionID) } _ = s.k8sClient.DeleteHeadlessService(rollbackCtx, mar.Namespace, groupSessionID) }() svc, err := s.k8sClient.CreateHeadlessService(ctx, mar, groupSessionID) if err != nil { return nil, err } var coordinator *createdRole for _, role := range ordered { sandbox, entry, err := buildSandboxByAgentRuntime( mar.Namespace, role.RuntimeRef, s.informers) if err != nil { return nil, err } sandbox.Labels[GroupLabel] = groupSessionID sandbox.Labels[RoleLabel] = role.Name sandbox.Spec.PodTemplate.Spec.Subdomain = groupSessionID sandbox.Spec.PodTemplate.Spec.Hostname = role.Name injectGroupEnvVars(&sandbox.Spec.PodTemplate, role, ordered, mar) resultChan := s.sandboxController.WatchSandboxOnce(ctx, sandbox.Namespace, sandbox.Name) resp, err := s.createSandbox(ctx, dyn, sandbox, nil, entry, resultChan) s.sandboxController.UnWatchSandbox(sandbox.Namespace, sandbox.Name) if err != nil { if mar.Spec.Lifecycle.StartupPolicy == BestEffort && !role.Coordinator { appendDegradedRole(mar, role.Name, err) continue } return nil, fmt.Errorf("role %s: %w", role.Name, err) } cr := createdRole{name: role.Name, sandbox: sandbox, sessionID: resp.SessionID, resp: resp} created = append(created, cr) if role.Coordinator { coordinator = &cr } } if coordinator == nil { return nil, errors.New("group has no coordinator role") } if err := s.storeClient.SaveAgentGroup(ctx, groupSessionID, buildGroupManifest(mar, created, svc, coordinator)); err != nil { return nil, err } needRollback = false return &types.CreateGroupResponse{ GroupSessionID: groupSessionID, CoordinatorSessionID: coordinator.sessionID, ServiceName: svc.Name, Roles: buildRoleSummaries(created), }, nil } `
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong format

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/documentation Improvements or additions to documentation size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[lfx-mentorship-2026-June-August] Support multi-AgentCube Capability

5 participants