Proposal for review #349
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Welcome @AAdIprog! It looks like this is your first PR to volcano-sh/agentcube 🎉
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #349 +/- ##
==========================================
+ Coverage 47.57% 49.12% +1.55%
==========================================
Files 30 30
Lines 2819 2858 +39
==========================================
+ Hits 1341 1404 +63
+ Misses 1338 1301 -37
- Partials 140 153 +13
Pull request overview
Adds an LFX mentorship proposal for MultiAgentRuntime / multi-AgentCube support, but the submitted document is currently hard to review due to placement and formatting issues, and the PR also includes unrelated lockfile churn.
Changes:
- Adds a new multi-agent capability proposal document.
- Modifies the Docusaurus `package-lock.json` with peer metadata changes unrelated to the proposal.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| proposalmd.md | New mentorship/design proposal for MultiAgentRuntime support. |
| docs/agentcube/package-lock.json | Unrelated lockfile metadata changes. |
Files not reviewed (1)
- docs/agentcube/package-lock.json: Language not supported
Comments suppressed due to low confidence (11)
proposalmd.md:32
- The metadata section is flattened into a single run-on line (`FieldValueProject...`) instead of a readable Markdown table/list, so key fields such as title, applicant, branch, and date cannot be parsed reliably by readers.
FieldValueProjectCNCF Volcano / AgentCubeMentorshipLFX 2026 Term 2 (Jun – Aug 2026)TitleSupport multi-AgentCube CapabilityTracking Issuevolcano-sh/agentcube#301Parent Designvolcano-sh/volcano#4686Target BranchmainApplicantAadi ShahGitHubAAdIprogEmail[aadishah132@gmail.com](mailto:aadishah132@gmail.com)TimezoneIST (UTC+5:30)Weekly Hours30–40 hrs/week (weeks 8 and 11 used as buffer)Mentor@hzxuzhonghu (Zhonghu Xu)Proposal Versionv1.0Date2026-05-16
proposalmd.md:123
- This code example appears to contain pasted UI language-selector text and malformed backticks rather than a fenced Go block. As written, the CRD example will render as a single unreadable line and readers cannot copy or review the proposed API shape.
go
Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML`` // +kubebuilder:object:root=true // +kubebuilder:subresource:status // +kubebuilder:resource:scope=Namespaced type MultiAgentRuntime struct { metav1.TypeMeta `json:",inline"` metav1.ObjectMeta `json:"metadata,omitempty"` Spec MultiAgentRuntimeSpec `json:"spec"` Status MultiAgentRuntimeStatus `json:"status,omitempty"` } type MultiAgentRuntimeSpec struct { // +kubebuilder:validation:MinItems=2 // +kubebuilder:validation:MaxItems=16 Roles []AgentRoleSpec `json:"roles"` Lifecycle GroupLifecycle `json:"lifecycle,omitempty"` // +kubebuilder:default={type: ServiceDNS} Communication GroupCommunication `json:"communication,omitempty"` // +kubebuilder:default="15m" SessionTimeout *metav1.Duration `json:"sessionTimeout,omitempty"` // +kubebuilder:default="8h" MaxSessionDuration *metav1.Duration `json:"maxSessionDuration,omitempty"` } type AgentRoleSpec struct { // +kubebuilder:validation:Pattern="^[a-z0-9]([-a-z0-9]*[a-z0-9])?$" Name string `json:"name"` RuntimeRef string `json:"runtimeRef"` Coordinator bool `json:"coordinator,omitempty"` Dependencies []string `json:"dependencies,omitempty"` // +kubebuilder:default=1 // +kubebuilder:validation:Maximum=8 Replicas *int32 `json:"replicas,omitempty"` // +kubebuilder:default="Never" // +kubebuilder:validation:Enum=Never;OnFailure RestartPolicy RolePolicy `json:"restartPolicy,omitempty"` EnvOverrides []corev1.EnvVar `json:"envOverrides,omitempty"` } ``
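To make the constraints in the quoted CRD markers easier to review, here is a minimal sketch of the same checks in plain Go. It assumes a trimmed-down `Role` type (hypothetical, not the proposal's `AgentRoleSpec`), a lower bound of 1 on replicas, unique role names, and exactly one coordinator; only the name pattern, `MinItems=2`, `MaxItems=16`, and `Maximum=8` come from the quoted snippet.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Role is a hypothetical, trimmed-down stand-in for the quoted AgentRoleSpec.
type Role struct {
	Name        string
	Coordinator bool
	Replicas    int32 // assumed to be already defaulted to 1
}

// rolePattern is the name pattern from the quoted kubebuilder marker.
var rolePattern = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

const (
	minRoles    = 2  // from MinItems=2
	maxRoles    = 16 // from MaxItems=16
	maxReplicas = 8  // from Maximum=8
)

// validateRoles applies the schema-level caps plus two assumed rules:
// unique role names and exactly one coordinator.
func validateRoles(roles []Role) error {
	if len(roles) < minRoles || len(roles) > maxRoles {
		return fmt.Errorf("want %d..%d roles, got %d", minRoles, maxRoles, len(roles))
	}
	seen := make(map[string]bool, len(roles))
	coordinators := 0
	for _, r := range roles {
		if !rolePattern.MatchString(r.Name) {
			return fmt.Errorf("invalid role name %q", r.Name)
		}
		if seen[r.Name] {
			return fmt.Errorf("duplicate role name %q", r.Name)
		}
		seen[r.Name] = true
		if r.Replicas < 1 || r.Replicas > maxReplicas {
			return fmt.Errorf("role %q: replicas must be 1..%d", r.Name, maxReplicas)
		}
		if r.Coordinator {
			coordinators++
		}
	}
	if coordinators != 1 {
		return errors.New("exactly one coordinator role is required")
	}
	return nil
}

func main() {
	roles := []Role{
		{Name: "planner", Coordinator: true, Replicas: 1},
		{Name: "worker", Replicas: 2},
	}
	fmt.Println(validateRoles(roles)) // prints: <nil>
}
```

With these caps the worst case is 16 roles × 8 replicas = 128 sandboxes per request, which matches the figure quoted in the proposal's risk table.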
proposalmd.md:144
- The `createSandboxGroup` example has the same corrupted code-block formatting (`Plain textANTLR...`) and is collapsed onto one line, which makes the central control-plane algorithm unreadable in the proposal.
go
Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (s *Server) createSandboxGroup( ctx context.Context, dyn dynamic.Interface, mar *runtimev1alpha1.MultiAgentRuntime, ) (*types.CreateGroupResponse, error) { ordered, err := topoSort(mar.Spec.Roles) if err != nil { return nil, err } groupSessionID := "grp-" + uuid.New().String() created := make([]createdRole, 0, len(ordered)) needRollback := true defer func() { if !needRollback { return } rollbackCtx, cancel := context.WithTimeout( context.Background(), groupRollbackTimeout) defer cancel() for i := len(created) - 1; i >= 0; i-- { s.rollbackSandboxCreation( dyn, created[i].sandbox, nil, created[i].sessionID) } _ = s.k8sClient.DeleteHeadlessService(rollbackCtx, mar.Namespace, groupSessionID) }() svc, err := s.k8sClient.CreateHeadlessService(ctx, mar, groupSessionID) if err != nil { return nil, err } var coordinator *createdRole for _, role := range ordered { sandbox, entry, err := buildSandboxByAgentRuntime( mar.Namespace, role.RuntimeRef, s.informers) if err != nil { return nil, err } sandbox.Labels[GroupLabel] = groupSessionID sandbox.Labels[RoleLabel] = role.Name sandbox.Spec.PodTemplate.Spec.Subdomain = groupSessionID sandbox.Spec.PodTemplate.Spec.Hostname = role.Name injectGroupEnvVars(&sandbox.Spec.PodTemplate, role, ordered, mar) resultChan := s.sandboxController.WatchSandboxOnce(ctx, sandbox.Namespace, sandbox.Name) resp, err := s.createSandbox(ctx, dyn, sandbox, nil, entry, resultChan) s.sandboxController.UnWatchSandbox(sandbox.Namespace, sandbox.Name) if err != nil { if mar.Spec.Lifecycle.StartupPolicy == BestEffort && !role.Coordinator { appendDegradedRole(mar, role.Name, err) continue } return nil, fmt.Errorf("role %s: %w", role.Name, err) } cr := 
createdRole{name: role.Name, sandbox: sandbox, sessionID: resp.SessionID, resp: resp} created = append(created, cr) if role.Coordinator { coordinator = &cr } } if coordinator == nil { return nil, errors.New("group has no coordinator role") } if err := s.storeClient.SaveAgentGroup(ctx, groupSessionID, buildGroupManifest(mar, created, svc, coordinator)); err != nil { return nil, err } needRollback = false return &types.CreateGroupResponse{ GroupSessionID: groupSessionID, CoordinatorSessionID: coordinator.sessionID, ServiceName: svc.Name, Roles: buildRoleSummaries(created), }, nil } `
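The quoted function starts by calling `topoSort(mar.Spec.Roles)`, whose body is not shown in the proposal. The following is one possible sketch using Kahn's algorithm with cycle detection; the `Role` type and error messages are hypothetical, and role names are assumed to be unique.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Role is a hypothetical stand-in carrying only what ordering needs.
type Role struct {
	Name         string
	Dependencies []string
}

// topoSort returns roles so that each role appears after all of its
// dependencies. It uses Kahn's algorithm and reports unknown dependencies
// and cycles as errors. Role names are assumed to be unique.
func topoSort(roles []Role) ([]Role, error) {
	byName := make(map[string]Role, len(roles))
	indegree := make(map[string]int, len(roles))
	dependents := make(map[string][]string)
	for _, r := range roles {
		byName[r.Name] = r
		indegree[r.Name] = 0
	}
	for _, r := range roles {
		for _, dep := range r.Dependencies {
			if _, ok := byName[dep]; !ok {
				return nil, fmt.Errorf("role %q depends on unknown role %q", r.Name, dep)
			}
			indegree[r.Name]++
			dependents[dep] = append(dependents[dep], r.Name)
		}
	}
	var ready []string
	for name, d := range indegree {
		if d == 0 {
			ready = append(ready, name)
		}
	}
	sort.Strings(ready) // keep the output deterministic
	ordered := make([]Role, 0, len(roles))
	for len(ready) > 0 {
		name := ready[0]
		ready = ready[1:]
		ordered = append(ordered, byName[name])
		for _, next := range dependents[name] {
			indegree[next]--
			if indegree[next] == 0 {
				ready = append(ready, next)
			}
		}
		sort.Strings(ready)
	}
	if len(ordered) != len(roles) {
		// Some roles never reached indegree 0, so they sit on a cycle.
		return nil, errors.New("dependency cycle detected among roles")
	}
	return ordered, nil
}

func main() {
	ordered, err := topoSort([]Role{
		{Name: "reporter", Dependencies: []string{"analyzer"}},
		{Name: "analyzer", Dependencies: []string{"capture"}},
		{Name: "capture"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	names := make([]string, 0, len(ordered))
	for _, r := range ordered {
		names = append(names, r.Name)
	}
	fmt.Println(names) // prints: [capture analyzer reporter]
}
```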
proposalmd.md:152
- The store interface snippet is not a valid fenced code block and is collapsed onto one line with pasted editor metadata, making the proposed API additions difficult to review or copy accurately.
go
Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` SaveAgentGroup(ctx context.Context, manifest *types.AgentGroupManifest) error GetAgentGroup(ctx context.Context, groupSessionID string) (*types.AgentGroupManifest, error) DeleteAgentGroup(ctx context.Context, groupSessionID string) error `
proposalmd.md:160
- The HTTP route snippet is malformed and flattened into a single line with pasted language-selector text, so the documented routes are not readable as a code example.
go
Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` v1Group.POST("/multi-agent-runtime", s.handleMultiAgentRuntimeCreate) v1Group.DELETE("/multi-agent-runtime/groups/:groupId", s.handleDeleteAgentGroup) v1Group.GET("/multi-agent-runtime/groups/:groupId/topology", s.handleGetGroupTopology) `
proposalmd.md:170
- The headless Service example is collapsed into a malformed inline block with pasted editor metadata, which prevents the Kubernetes resource design from being reviewed as formatted Go code.
go
Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (k *K8sClient) CreateHeadlessService( ctx context.Context, mar *runtimev1alpha1.MultiAgentRuntime, groupSessionID string, ) (*corev1.Service, error) { svc := &corev1.Service{ ObjectMeta: metav1.ObjectMeta{ Name: groupSessionID, Namespace: mar.Namespace, OwnerReferences: []metav1.OwnerReference{{ APIVersion: mar.APIVersion, Kind: mar.Kind, Name: mar.Name, UID: mar.UID, Controller: ptr.To(true), BlockOwnerDeletion: ptr.To(true), }}, Labels: map[string]string{GroupLabel: groupSessionID}, }, Spec: corev1.ServiceSpec{ ClusterIP: corev1.ClusterIPNone, Selector: map[string]string{GroupLabel: groupSessionID}, PublishNotReadyAddresses: false, }, } return k.clientset.CoreV1().Services(mar.Namespace).Create( ctx, svc, metav1.CreateOptions{}) } `
proposalmd.md:178
- The files-to-modify section is flattened into a single unstructured line instead of a Markdown table, making it difficult to tell which paths are new, modified, generated, or optional.
PathChangepkg/apis/runtime/v1alpha1/multiagentruntime\_types.goNEW — CRD type definitionpkg/apis/runtime/v1alpha1/register.goMODIFIED — MultiAgentRuntimeKind + GVK blockmanifests/charts/base/crds/runtime.agentcube.volcano.sh\_multiagentruntimes.yamlNEW — generated by make gen-crdpkg/common/types/types.goMODIFIED — MultiAgentRuntimeKind constantpkg/common/types/group.goNEW — AgentGroupManifest, request/response DTOspkg/workloadmanager/multiagent.goNEW — createSandboxGrouppkg/workloadmanager/multiagent\_controller.goNEW (phase 4) — reconcilerpkg/workloadmanager/k8s\_client.goMODIFIED — CreateHeadlessService, DeleteHeadlessServicepkg/workloadmanager/server.goMODIFIED — three new routespkg/workloadmanager/handlers.goMODIFIED — handleMultiAgentRuntimeCreate, handleDeleteAgentGroup, handleGetGroupTopologypkg/store/interface.goMODIFIED — three new methodspkg/store/store\_redis.go / store\_valkey.goMODIFIED — implement three methods + testspkg/router/session\_manager.goMODIFIED — switch-case for MultiAgentRuntimeKindsdk-python/agentcube/multi\_agent.pyNEW — MultiAgentRuntimeClientexample/multi-agent-pcap/NEW — pcap-analyzer rewritten as MultiAgentRuntimedocs/design/multi-agent-design.mdNEW — design docdocs/devguide/multi-agent.mdNEW — user guide with NetworkPolicy exampletest/e2e/e2e\_test.goMODIFIED — TestMultiAgentCreate, TestMultiAgentRollback, TestMultiAgentBestEffort
proposalmd.md:182
- The 12-week timeline is collapsed into one long line, so week numbers, date ranges, phases, and deliverables run together and the project plan is effectively unreadable.
WeekDatesPhaseKey Deliverable1Jun 2–8BondingDesign doc Discussion published; draft PR open with file layout; notes posted to #3012Jun 9–15Bonding→CorePR 1 merged: CRD-only (multiagentruntime\_types.go, codegen, Kind constants)3Jun 16–22CoreStore extensions (SaveAgentGroup, GetAgentGroup, DeleteAgentGroup) on Redis + Valkey; AgentGroupManifest type4Jun 23–29CorePR 2 opened: createSandboxGroup (Atomic only), 3 new HTTP routes, session\_manager switch-case5Jun 30–Jul 6CoreCreateHeadlessService wired; DNS verified end-to-end; PR 2 merged6Jul 7–13TopologyPR 3: topoSort + cycle detection, dependency env-var injection, handleGetGroupTopology7Jul 14–20Topologypcap-analyzer rewritten as MultiAgentRuntime (example/multi-agent-pcap/); E2E added8Jul 21–27BufferBuffer / mid-project review; load-test (100 groups × 3 roles) if no slippage9Jul 28–Aug 3PolishBestEffort startup policy; admission webhook (coordinator uniqueness, runtimeRef, cycle, role-name)10Aug 4–10PolishPR 4 opened: BestEffort + webhook + Python SDK (MultiAgentRuntimeClient) + docs11Aug 11–17PolishMultiAgentRuntimeReconciler (feature-flagged); Prometheus metrics wired; PR 4 review iterations12Aug 18–24PolishPR 4 merged; blog post draft; LFX final report; issue #301 closed with PR links
proposalmd.md:222
- The documentation plan is flattened into a single line, losing the intended audience/output/location/phase columns and making the doc deliverables hard to verify.
AudienceOutputWherePhaseMentor + maintainersDesign doc (Sections 5+6 summary)GitHub DiscussionWeek 1Mentor + maintainersdocs/design/multi-agent-design.md (canonical)RepoWeek 10Operatorsdocs/devguide/multi-agent.md (user guide + NetworkPolicy example)RepoWeek 10Operatorsexample/multi-agent-pcap/ with READMERepoWeek 7SDK userssdk-python/examples/multi\_agent\_usage.pyRepoWeek 10Wider communityBlog post on volcano.sh/blogVolcano blogWeek 12
proposalmd.md:226
- The risk table is flattened into one run-on line, so individual risks, likelihoods, impacts, and mitigations cannot be reliably distinguished.
#RiskLikelihoodImpactMitigationR1Mentor disagrees with headless-Service-DNS transportMediumMediumWeek 1 design doc proposes ServiceDNS with named alternatives; switch cost ≤4 daysR2Another applicant merges similar work in parallelMediumHighCollaboration over competition: will propose pairing on PR 2 if both selectedR3agent-sandbox upstream API changes mid-mentorshipMediumHighPin to v0.1.1 in go.mod; raise flag in mentor sync if upstream cuts v0.2.0R4Scope creep (A2A / MCP / streaming / dashboards)HighMediumSection 14 enumerates follow-ups; new asks become separate issues, not PR ridersR516 roles × 8 replicas = 128 sandboxes in one call (DoS)LowHighSchema-level caps (MaxItems=16, Maximum=8); admission webhook enforces per-namespace quotaR6Personal scheduling conflictMediumLowWeek 8 buffer absorbs; will pre-load PR work by one week if conflict known >2 weeks aheadR7Reconciler scope too large to finish in week 11MediumLowGated behind AGENTCUBE\_GROUP\_SELF\_HEAL flag; can ship as follow-up
proposalmd.md:245
- The candidate-evidence table is collapsed into a single paragraph, which makes the claimed project needs, evidence, and gaps difficult to map to each other.
Project NeedMy EvidenceGap to CloseGo (1.22+) on production servicesOpen-source PRs above (OpenKruise, Volcano, KubeStellar — all Go-based)NoneKubernetes controller-runtime / kubebuilderOpenKruise #2303, #2313; Volcano #695 — all touch controller reconciliation pathsNoneCRD design, OpenAPI schema, admission webhooksVolcano #692 adds a new API surface; familiarity through coursework and personal K8s projectsNoneclient-go informers, dynamic clientApplied in Volcano #695 E2E scaffolding; personal homelab controllersNoneRedis / Valkey integrationPersonal projects (Valkey API is Redis-compatible)NoneK8s Services, headless DNS, NetworkPolicyStudied in depth for this proposal; applied in homelab multi-service setupsNoneAgent ecosystem (LangGraph, MCP, A2A)Experimented with LangGraph pipelines; followed MCP/A2A specs closelyNoneagent-sandbox (sigs.k8s.io) internalsNo prior PRs — see 2-week reading plan below2-week rampAgentCube codebase specificallyNo prior PRs — all file/line citations in Sections 4–6 personally verified2-week ramp
| @@ -0,0 +1,345 @@ | |||
| LFX Mentorship 2026 — Term 2 | |||
|
|
||
| Open-source contributions demonstrating this background: | ||
|
|
||
| Repository / PRDescriptionKubeStellar #648Japanese i18n translations; eliminated externally hardcoded strings across user-facing elements to improve maintainability.KubeStellar #623Italian translations across components; removed hardcoded strings from external sources for better global accessibility.OpenKruise #2303Fixed livenessProbe handling in the manager's YAML processing; improved pod lifecycle reliability with Kubernetes probes.OpenKruise #2302Resolved stuck deployments caused by image-tag updates where the digest was unchanged, ensuring correct rollout execution.OpenKruise #2313Replaced static TODO placeholders in the CR controller with dynamic logic for reliable container-image propagation.Volcano #442Updated the Volcano README with bug fixes, new sections on project structure and contribution guidelines, refined setup instructions.Volcano #692Enabled AI/ML model (v7.8) support in the Kthena router for open-source LLM integration; new server configs, traffic routing examples, and local testing instructions.Volcano #695Implemented E2E tests covering the full ModelServing lifecycle (create/update/delete), validating multi-phase workflows and timeouts. |
| "resolved": "https://registry.npmjs.org/@algolia/client-search/-/client-search-5.46.2.tgz", | ||
| "integrity": "sha512-ZsOJqu4HOG5BlvIFnMU0YKjQ9ZI6r3C31dg2jk5kMWPSdhJpYL9xa5hEe7aieE+707dXeMI4ej3diy6mXdZpgA==", | ||
| "license": "MIT", | ||
| "peer": true, |
|
|
||
| go | ||
|
|
||
| Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (s *Server) createSandboxGroup( ctx context.Context, dyn dynamic.Interface, mar *runtimev1alpha1.MultiAgentRuntime, ) (*types.CreateGroupResponse, error) { ordered, err := topoSort(mar.Spec.Roles) if err != nil { return nil, err } groupSessionID := "grp-" + uuid.New().String() created := make([]createdRole, 0, len(ordered)) needRollback := true defer func() { if !needRollback { return } rollbackCtx, cancel := context.WithTimeout( context.Background(), groupRollbackTimeout) defer cancel() for i := len(created) - 1; i >= 0; i-- { s.rollbackSandboxCreation( dyn, created[i].sandbox, nil, created[i].sessionID) } _ = s.k8sClient.DeleteHeadlessService(rollbackCtx, mar.Namespace, groupSessionID) }() svc, err := s.k8sClient.CreateHeadlessService(ctx, mar, groupSessionID) if err != nil { return nil, err } var coordinator *createdRole for _, role := range ordered { sandbox, entry, err := buildSandboxByAgentRuntime( mar.Namespace, role.RuntimeRef, s.informers) if err != nil { return nil, err } sandbox.Labels[GroupLabel] = groupSessionID sandbox.Labels[RoleLabel] = role.Name sandbox.Spec.PodTemplate.Spec.Subdomain = groupSessionID sandbox.Spec.PodTemplate.Spec.Hostname = role.Name injectGroupEnvVars(&sandbox.Spec.PodTemplate, role, ordered, mar) resultChan := s.sandboxController.WatchSandboxOnce(ctx, sandbox.Namespace, sandbox.Name) resp, err := s.createSandbox(ctx, dyn, sandbox, nil, entry, resultChan) s.sandboxController.UnWatchSandbox(sandbox.Namespace, sandbox.Name) if err != nil { if mar.Spec.Lifecycle.StartupPolicy == BestEffort && !role.Coordinator { appendDegradedRole(mar, role.Name, err) continue } return nil, fmt.Errorf("role %s: %w", role.Name, err) } cr := 
createdRole{name: role.Name, sandbox: sandbox, sessionID: resp.SessionID, resp: resp} created = append(created, cr) if role.Coordinator { coordinator = &cr } } if coordinator == nil { return nil, errors.New("group has no coordinator role") } if err := s.storeClient.SaveAgentGroup(ctx, groupSessionID, buildGroupManifest(mar, created, svc, coordinator)); err != nil { return nil, err } needRollback = false return &types.CreateGroupResponse{ GroupSessionID: groupSessionID, CoordinatorSessionID: coordinator.sessionID, ServiceName: svc.Name, Roles: buildRoleSummaries(created), }, nil } ` |
| Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` SaveAgentGroup(ctx context.Context, manifest *types.AgentGroupManifest) error GetAgentGroup(ctx context.Context, groupSessionID string) (*types.AgentGroupManifest, error) DeleteAgentGroup(ctx context.Context, groupSessionID string) error ` | ||
|
|
||
| Storage layout: group:{grp-xxx} HASH (manifest JSON) + group:active ZSET (for GC scan). |
| The router does not gain new routes. One switch-case addition in session\_manager.go:117–126 routes MultiAgentRuntimeKind create calls to the new endpoint. | ||
|
|
||
| #### 6.5 Headless Service per group | ||
|
|
||
| CreateHeadlessService in k8s\_client.go (~30 LOC): ClusterIP: None, selector {group: groupSessionID}, OwnerReferences pointing to the MultiAgentRuntime CR. Deleting the CR cascades to the Service via K8s GC. Manual cleanup in the rollback path is only needed if the CR itself was never created. | ||
|
|
||
| go | ||
|
|
||
| Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (k *K8sClient) CreateHeadlessService( ctx context.Context, mar *runtimev1alpha1.MultiAgentRuntime, groupSessionID string, ) (*corev1.Service, error) { svc := &corev1.Service{ ObjectMeta: metav1.ObjectMeta{ Name: groupSessionID, Namespace: mar.Namespace, OwnerReferences: []metav1.OwnerReference{{ APIVersion: mar.APIVersion, Kind: mar.Kind, Name: mar.Name, UID: mar.UID, Controller: ptr.To(true), BlockOwnerDeletion: ptr.To(true), }}, Labels: map[string]string{GroupLabel: groupSessionID}, }, Spec: corev1.ServiceSpec{ ClusterIP: corev1.ClusterIPNone, Selector: map[string]string{GroupLabel: groupSessionID}, PublishNotReadyAddresses: false, }, } return k.clientset.CoreV1().Services(mar.Namespace).Create( ctx, svc, metav1.CreateOptions{}) } ` |
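Because the quoted code sets each pod's `Hostname` to the role name and its `Subdomain` to the group session ID, and names the headless Service after the same ID, Kubernetes DNS should give each role pod a stable name of the form `hostname.subdomain.namespace.svc.<cluster-domain>`. A minimal sketch of that name construction follows; `rolePodFQDN` is a hypothetical helper, and `cluster.local` is only the common default cluster domain, not something the proposal specifies.

```go
package main

import "fmt"

// defaultClusterDomain is the usual Kubernetes default; clusters may
// configure a different domain.
const defaultClusterDomain = "cluster.local"

// rolePodFQDN builds the DNS name a peer would use to reach a role's pod
// behind the per-group headless Service. It is a hypothetical helper.
func rolePodFQDN(role, groupSessionID, namespace, clusterDomain string) string {
	if clusterDomain == "" {
		clusterDomain = defaultClusterDomain
	}
	return fmt.Sprintf("%s.%s.%s.svc.%s", role, groupSessionID, namespace, clusterDomain)
}

func main() {
	fmt.Println(rolePodFQDN("analyzer", "grp-1234", "default", ""))
	// prints: analyzer.grp-1234.default.svc.cluster.local
}
```

With `PublishNotReadyAddresses: false`, as in the quoted Service spec, these records are expected to resolve only once the pod reports Ready, so dependent roles may need to retry lookups during startup.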
|
|
||
| Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` v1Group.POST("/multi-agent-runtime", s.handleMultiAgentRuntimeCreate) v1Group.DELETE("/multi-agent-runtime/groups/:groupId", s.handleDeleteAgentGroup) v1Group.GET("/multi-agent-runtime/groups/:groupId/topology", s.handleGetGroupTopology) ` | ||
Code Review
This pull request introduces a comprehensive mentorship proposal for implementing multi-AgentCube capabilities, featuring a new MultiAgentRuntime CRD and orchestration logic. Review feedback highlights the need to revert unrelated dependency noise in package-lock.json and relocate the proposal file to the documentation directory. Additionally, several corrections are requested to fix broken Markdown formatting in tables and code blocks, and to address a logic gap in the pseudo-code regarding role replicas.
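One reading of the replicas gap is that the proposal's creation loop builds a single sandbox per role with `Hostname = role.Name` and never consults `Replicas`. A minimal sketch of expanding roles into per-replica instances is shown below; the `Role` and `instance` types and the `name-index` hostname scheme are assumptions for illustration, not the proposal's design.

```go
package main

import "fmt"

// Role is a hypothetical stand-in carrying only the fields needed here.
type Role struct {
	Name     string
	Replicas int32
}

// instance is one sandbox to create for a role replica.
type instance struct {
	Role     string
	Hostname string
}

// expandReplicas turns each role into Replicas instances. A single replica
// keeps the bare role name as hostname; multiple replicas get an index
// suffix so hostnames stay unique within the group (assumed naming scheme).
func expandReplicas(roles []Role) []instance {
	var out []instance
	for _, r := range roles {
		n := r.Replicas
		if n < 1 {
			n = 1 // mirrors the quoted default of 1
		}
		for i := int32(0); i < n; i++ {
			host := r.Name
			if n > 1 {
				host = fmt.Sprintf("%s-%d", r.Name, i)
			}
			out = append(out, instance{Role: r.Name, Hostname: host})
		}
	}
	return out
}

func main() {
	for _, inst := range expandReplicas([]Role{{Name: "planner", Replicas: 1}, {Name: "worker", Replicas: 2}}) {
		fmt.Println(inst.Role, inst.Hostname)
	}
	// prints:
	// planner planner
	// worker worker-0
	// worker worker-1
}
```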
| "resolved": "https://registry.npmjs.org/@algolia/client-search/-/client-search-5.46.2.tgz", | ||
| "integrity": "sha512-ZsOJqu4HOG5BlvIFnMU0YKjQ9ZI6r3C31dg2jk5kMWPSdhJpYL9xa5hEe7aieE+707dXeMI4ej3diy6mXdZpgA==", | ||
| "license": "MIT", | ||
| "peer": true, |
The package-lock.json file contains numerous changes adding "peer": true to various dependencies. These changes appear unrelated to the addition of the mentorship proposal and add significant noise to the pull request. Please revert these changes unless they are strictly necessary for the documentation site's functionality.
| @@ -0,0 +1,345 @@ | |||
| LFX Mentorship 2026 — Term 2 | |||
|
|
||
| Open-source contributions demonstrating this background: | ||
|
|
||
| Repository / PRDescriptionKubeStellar #648Japanese i18n translations; eliminated externally hardcoded strings across user-facing elements to improve maintainability.KubeStellar #623Italian translations across components; removed hardcoded strings from external sources for better global accessibility.OpenKruise #2303Fixed livenessProbe handling in the manager's YAML processing; improved pod lifecycle reliability with Kubernetes probes.OpenKruise #2302Resolved stuck deployments caused by image-tag updates where the digest was unchanged, ensuring correct rollout execution.OpenKruise #2313Replaced static TODO placeholders in the CR controller with dynamic logic for reliable container-image propagation.Volcano #442Updated the Volcano README with bug fixes, new sections on project structure and contribution guidelines, refined setup instructions.Volcano #692Enabled AI/ML model (v7.8) support in the Kthena router for open-source LLM integration; new server configs, traffic routing examples, and local testing instructions.Volcano #695Implemented E2E tests covering the full ModelServing lifecycle (create/update/delete), validating multi-phase workflows and timeouts. |
The table formatting for the open-source contributions is broken. It appears the content was pasted without the necessary Markdown table syntax (pipes and separators).
| Repository / PR | Description |
| :--- | :--- |
| KubeStellar #648 | Japanese i18n translations; eliminated externally hardcoded strings across user-facing elements to improve maintainability. |
| KubeStellar #623 | Italian translations across components; removed hardcoded strings from external sources for better global accessibility. |
| OpenKruise #2303 | Fixed livenessProbe handling in the manager's YAML processing; improved pod lifecycle reliability with Kubernetes probes. |
| OpenKruise #2302 | Resolved stuck deployments caused by image-tag updates where the digest was unchanged, ensuring correct rollout execution. |
| OpenKruise #2313 | Replaced static TODO placeholders in the CR controller with dynamic logic for reliable container-image propagation. |
| Volcano #442 | Updated the Volcano README with bug fixes, new sections on project structure and contribution guidelines, refined setup instructions. |
| Volcano #692 | Enabled AI/ML model (v7.8) support in the Kthena router for open-source LLM integration; new server configs, traffic routing examples, and local testing instructions. |
| Volcano #695 | Implemented E2E tests covering the full ModelServing lifecycle (create/update/delete), validating multi-phase workflows and timeouts. |
|
|
||
| ### 1\. Title & Metadata | ||
|
|
||
| FieldValueProjectCNCF Volcano / AgentCubeMentorshipLFX 2026 Term 2 (Jun – Aug 2026)TitleSupport multi-AgentCube CapabilityTracking Issuevolcano-sh/agentcube#301Parent Designvolcano-sh/volcano#4686Target BranchmainApplicantAadi ShahGitHubAAdIprogEmail[aadishah132@gmail.com](mailto:aadishah132@gmail.com)TimezoneIST (UTC+5:30)Weekly Hours30–40 hrs/week (weeks 8 and 11 used as buffer)Mentor@hzxuzhonghu (Zhonghu Xu)Proposal Versionv1.0Date2026-05-16 |
The metadata table formatting is broken. Please use standard Markdown table syntax for better readability.
Suggested change:

| Field | Value |
| :--- | :--- |
| Project | CNCF Volcano / AgentCube |
| Mentorship | LFX 2026 Term 2 (Jun – Aug 2026) |
| Title | Support multi-AgentCube Capability |
| Tracking Issue | volcano-sh/agentcube#301 |
| Parent Design | volcano-sh/volcano#4686 |
| Target Branch | main |
| Applicant | Aadi Shah |
| GitHub | AAdIprog |
| Email | [aadishah132@gmail.com](mailto:aadishah132@gmail.com) |
| Timezone | IST (UTC+5:30) |
| Weekly Hours | 30–40 hrs/week (weeks 8 and 11 used as buffer) |
| Mentor | @hzxuzhonghu (Zhonghu Xu) |
| Proposal Version | v1.0 |
| Date | 2026-05-16 |
|
|
||
| #### Pain points | ||
|
|
||
| Pain PointWhere in CodeUser must manually orchestrate 3 separate sandbox lifecyclespcap\_analyzer.py (whole file)Inter-agent endpoints must be hand-discovered (env vars or hardcoded)pcap-analyzer/deployment.yaml:50–58Failure in one agent leaves siblings running — no cleanupNot implemented anywhereNo shared session: each agent gets its own x-agentcube-session-idhandlers.go:266–271No group-level GC: sandboxes survive until idle timeout if user code crashesgarbage\_collection.go operates per-sandbox |
The pain points table formatting is broken. Proper Markdown table syntax is required for it to render correctly.
Suggested change:

| Pain Point | Where in Code |
| :--- | :--- |
| User must manually orchestrate 3 separate sandbox lifecycles | pcap_analyzer.py (whole file) |
| Inter-agent endpoints must be hand-discovered (env vars or hardcoded) | pcap-analyzer/deployment.yaml:50–58 |
| Failure in one agent leaves siblings running — no cleanup | Not implemented anywhere |
| No shared session: each agent gets its own x-agentcube-session-id | handlers.go:266–271 |
| No group-level GC: sandboxes survive until idle timeout if user code crashes | garbage_collection.go operates per-sandbox |
|
|
||
| go | ||
|
|
||
| Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML`` // +kubebuilder:object:root=true // +kubebuilder:subresource:status // +kubebuilder:resource:scope=Namespaced type MultiAgentRuntime struct { metav1.TypeMeta `json:",inline"` metav1.ObjectMeta `json:"metadata,omitempty"` Spec MultiAgentRuntimeSpec `json:"spec"` Status MultiAgentRuntimeStatus `json:"status,omitempty"` } type MultiAgentRuntimeSpec struct { // +kubebuilder:validation:MinItems=2 // +kubebuilder:validation:MaxItems=16 Roles []AgentRoleSpec `json:"roles"` Lifecycle GroupLifecycle `json:"lifecycle,omitempty"` // +kubebuilder:default={type: ServiceDNS} Communication GroupCommunication `json:"communication,omitempty"` // +kubebuilder:default="15m" SessionTimeout *metav1.Duration `json:"sessionTimeout,omitempty"` // +kubebuilder:default="8h" MaxSessionDuration *metav1.Duration `json:"maxSessionDuration,omitempty"` } type AgentRoleSpec struct { // +kubebuilder:validation:Pattern="^[a-z0-9]([-a-z0-9]*[a-z0-9])?$" Name string `json:"name"` RuntimeRef string `json:"runtimeRef"` Coordinator bool `json:"coordinator,omitempty"` Dependencies []string `json:"dependencies,omitempty"` // +kubebuilder:default=1 // +kubebuilder:validation:Maximum=8 Replicas *int32 `json:"replicas,omitempty"` // +kubebuilder:default="Never" // +kubebuilder:validation:Enum=Never;OnFailure RestartPolicy RolePolicy `json:"restartPolicy,omitempty"` EnvOverrides []corev1.EnvVar `json:"envOverrides,omitempty"` } `` |
The code block for the CRD definition has incorrect formatting, including a long list of language identifiers and missing newlines. This makes the snippet unreadable.
Suggested change:

    // +kubebuilder:object:root=true
    // +kubebuilder:subresource:status
    // +kubebuilder:resource:scope=Namespaced
    type MultiAgentRuntime struct {
        metav1.TypeMeta   `json:",inline"`
        metav1.ObjectMeta `json:"metadata,omitempty"`
        Spec   MultiAgentRuntimeSpec   `json:"spec"`
        Status MultiAgentRuntimeStatus `json:"status,omitempty"`
    }

    type MultiAgentRuntimeSpec struct {
        // +kubebuilder:validation:MinItems=2
        // +kubebuilder:validation:MaxItems=16
        Roles     []AgentRoleSpec `json:"roles"`
        Lifecycle GroupLifecycle  `json:"lifecycle,omitempty"`
        // +kubebuilder:default={type: ServiceDNS}
        Communication GroupCommunication `json:"communication,omitempty"`
        // +kubebuilder:default="15m"
        SessionTimeout *metav1.Duration `json:"sessionTimeout,omitempty"`
        // +kubebuilder:default="8h"
        MaxSessionDuration *metav1.Duration `json:"maxSessionDuration,omitempty"`
    }

    type AgentRoleSpec struct {
        // +kubebuilder:validation:Pattern="^[a-z0-9]([-a-z0-9]*[a-z0-9])?$"
        Name         string   `json:"name"`
        RuntimeRef   string   `json:"runtimeRef"`
        Coordinator  bool     `json:"coordinator,omitempty"`
        Dependencies []string `json:"dependencies,omitempty"`
        // +kubebuilder:default=1
        // +kubebuilder:validation:Maximum=8
        Replicas *int32 `json:"replicas,omitempty"`
        // +kubebuilder:default="Never"
        // +kubebuilder:validation:Enum=Never;OnFailure
        RestartPolicy RolePolicy      `json:"restartPolicy,omitempty"`
        EnvOverrides  []corev1.EnvVar `json:"envOverrides,omitempty"`
    }
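The `Dependencies` field on each role implies a startup order, and the proposal's `createSandboxGroup` calls a `topoSort` helper whose body is not shown. The following is a minimal sketch of what such a helper might look like, using Kahn's algorithm. The `Role` type and the role names in `main` are hypothetical stand-ins for illustration, not taken from the proposal.

```go
package main

import (
	"fmt"
)

// Role is a hypothetical stand-in for AgentRoleSpec, reduced to the two
// fields that matter for ordering.
type Role struct {
	Name         string
	Dependencies []string
}

// topoSort returns the roles ordered so that every role appears after all of
// its dependencies (Kahn's algorithm). It rejects duplicate role names,
// references to unknown roles, and dependency cycles.
func topoSort(roles []Role) ([]Role, error) {
	index := make(map[string]int, len(roles))
	for i, r := range roles {
		if _, dup := index[r.Name]; dup {
			return nil, fmt.Errorf("duplicate role %q", r.Name)
		}
		index[r.Name] = i
	}

	indegree := make([]int, len(roles))     // unmet dependencies per role
	dependents := make([][]int, len(roles)) // roles waiting on each role
	for i, r := range roles {
		for _, dep := range r.Dependencies {
			j, ok := index[dep]
			if !ok {
				return nil, fmt.Errorf("role %q depends on unknown role %q", r.Name, dep)
			}
			indegree[i]++
			dependents[j] = append(dependents[j], i)
		}
	}

	// Seed the queue with roles that have no dependencies, in input order.
	queue := make([]int, 0, len(roles))
	for i := range roles {
		if indegree[i] == 0 {
			queue = append(queue, i)
		}
	}

	ordered := make([]Role, 0, len(roles))
	for len(queue) > 0 {
		i := queue[0]
		queue = queue[1:]
		ordered = append(ordered, roles[i])
		for _, d := range dependents[i] {
			indegree[d]--
			if indegree[d] == 0 {
				queue = append(queue, d)
			}
		}
	}

	// Any role left unplaced is part of a cycle (or depends on one).
	if len(ordered) != len(roles) {
		return nil, fmt.Errorf("dependency cycle among roles")
	}
	return ordered, nil
}

func main() {
	roles := []Role{
		{Name: "reporter", Dependencies: []string{"analyzer"}},
		{Name: "analyzer", Dependencies: []string{"planner"}},
		{Name: "planner"},
	}
	ordered, err := topoSort(roles)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range ordered {
		fmt.Println(r.Name)
	}
}
```

Rejecting cycles and unknown names at this stage would let the API return a validation error before any sandbox is created, so no rollback is needed for a malformed spec.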
|
||
| go | ||
|
|
||
| Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (s *Server) createSandboxGroup( ctx context.Context, dyn dynamic.Interface, mar *runtimev1alpha1.MultiAgentRuntime, ) (*types.CreateGroupResponse, error) { ordered, err := topoSort(mar.Spec.Roles) if err != nil { return nil, err } groupSessionID := "grp-" + uuid.New().String() created := make([]createdRole, 0, len(ordered)) needRollback := true defer func() { if !needRollback { return } rollbackCtx, cancel := context.WithTimeout( context.Background(), groupRollbackTimeout) defer cancel() for i := len(created) - 1; i >= 0; i-- { s.rollbackSandboxCreation( dyn, created[i].sandbox, nil, created[i].sessionID) } _ = s.k8sClient.DeleteHeadlessService(rollbackCtx, mar.Namespace, groupSessionID) }() svc, err := s.k8sClient.CreateHeadlessService(ctx, mar, groupSessionID) if err != nil { return nil, err } var coordinator *createdRole for _, role := range ordered { sandbox, entry, err := buildSandboxByAgentRuntime( mar.Namespace, role.RuntimeRef, s.informers) if err != nil { return nil, err } sandbox.Labels[GroupLabel] = groupSessionID sandbox.Labels[RoleLabel] = role.Name sandbox.Spec.PodTemplate.Spec.Subdomain = groupSessionID sandbox.Spec.PodTemplate.Spec.Hostname = role.Name injectGroupEnvVars(&sandbox.Spec.PodTemplate, role, ordered, mar) resultChan := s.sandboxController.WatchSandboxOnce(ctx, sandbox.Namespace, sandbox.Name) resp, err := s.createSandbox(ctx, dyn, sandbox, nil, entry, resultChan) s.sandboxController.UnWatchSandbox(sandbox.Namespace, sandbox.Name) if err != nil { if mar.Spec.Lifecycle.StartupPolicy == BestEffort && !role.Coordinator { appendDegradedRole(mar, role.Name, err) continue } return nil, fmt.Errorf("role %s: %w", role.Name, err) } cr := 
createdRole{name: role.Name, sandbox: sandbox, sessionID: resp.SessionID, resp: resp} created = append(created, cr) if role.Coordinator { coordinator = &cr } } if coordinator == nil { return nil, errors.New("group has no coordinator role") } if err := s.storeClient.SaveAgentGroup(ctx, groupSessionID, buildGroupManifest(mar, created, svc, coordinator)); err != nil { return nil, err } needRollback = false return &types.CreateGroupResponse{ GroupSessionID: groupSessionID, CoordinatorSessionID: coordinator.sessionID, ServiceName: svc.Name, Roles: buildRoleSummaries(created), }, nil } ` |
The pseudo-code for createSandboxGroup has formatting issues and appears to miss handling for the Replicas field defined in AgentRoleSpec. If a role specifies multiple replicas, the loop should create the corresponding number of sandboxes. Additionally, it would be beneficial to specify which environment variables are injected via injectGroupEnvVars (e.g., service addresses of other roles) to clarify how inter-agent discovery works.
Suggested change:

    func (s *Server) createSandboxGroup(
        ctx context.Context,
        dyn dynamic.Interface,
        mar *runtimev1alpha1.MultiAgentRuntime,
    ) (*types.CreateGroupResponse, error) {
        ordered, err := topoSort(mar.Spec.Roles)
        if err != nil {
            return nil, err
        }
        groupSessionID := "grp-" + uuid.New().String()
        created := make([]createdRole, 0, len(ordered))
        needRollback := true
        defer func() {
            if !needRollback {
                return
            }
            rollbackCtx, cancel := context.WithTimeout(
                context.Background(), groupRollbackTimeout)
            defer cancel()
            for i := len(created) - 1; i >= 0; i-- {
                s.rollbackSandboxCreation(
                    dyn, created[i].sandbox, nil, created[i].sessionID)
            }
            _ = s.k8sClient.DeleteHeadlessService(rollbackCtx, mar.Namespace, groupSessionID)
        }()
        svc, err := s.k8sClient.CreateHeadlessService(ctx, mar, groupSessionID)
        if err != nil {
            return nil, err
        }
        var coordinator *createdRole
        for _, role := range ordered {
            sandbox, entry, err := buildSandboxByAgentRuntime(
                mar.Namespace, role.RuntimeRef, s.informers)
            if err != nil {
                return nil, err
            }
            sandbox.Labels[GroupLabel] = groupSessionID
            sandbox.Labels[RoleLabel] = role.Name
            sandbox.Spec.PodTemplate.Spec.Subdomain = groupSessionID
            sandbox.Spec.PodTemplate.Spec.Hostname = role.Name
            injectGroupEnvVars(&sandbox.Spec.PodTemplate, role, ordered, mar)
            resultChan := s.sandboxController.WatchSandboxOnce(ctx, sandbox.Namespace, sandbox.Name)
            resp, err := s.createSandbox(ctx, dyn, sandbox, nil, entry, resultChan)
            s.sandboxController.UnWatchSandbox(sandbox.Namespace, sandbox.Name)
            if err != nil {
                if mar.Spec.Lifecycle.StartupPolicy == BestEffort && !role.Coordinator {
                    appendDegradedRole(mar, role.Name, err)
                    continue
                }
                return nil, fmt.Errorf("role %s: %w", role.Name, err)
            }
            cr := createdRole{name: role.Name, sandbox: sandbox,
                sessionID: resp.SessionID, resp: resp}
            created = append(created, cr)
            if role.Coordinator {
                coordinator = &cr
            }
        }
        if coordinator == nil {
            return nil, errors.New("group has no coordinator role")
        }
        if err := s.storeClient.SaveAgentGroup(ctx, groupSessionID,
            buildGroupManifest(mar, created, svc, coordinator)); err != nil {
            return nil, err
        }
        needRollback = false
        return &types.CreateGroupResponse{
            GroupSessionID:       groupSessionID,
            CoordinatorSessionID: coordinator.sessionID,
            ServiceName:          svc.Name,
            Roles:                buildRoleSummaries(created),
        }, nil
    }
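To make the two open points in the comment above concrete (per-role `Replicas` and the contents of `injectGroupEnvVars`), here is a rough sketch of one way they could fit together. It relies on standard Kubernetes behaviour: a pod with `hostname` and `subdomain` set, where the subdomain matches a headless Service name in the same namespace, is resolvable as `<hostname>.<subdomain>.<namespace>.svc.<cluster-domain>`. The `Role` and `EnvVar` types, the `-<index>` hostname suffix for multi-replica roles, and the `AGENTCUBE_*` variable names are all assumptions made for illustration; none of them are defined in the proposal.

```go
package main

import (
	"fmt"
	"strings"
)

// Role is a hypothetical stand-in for AgentRoleSpec.
type Role struct {
	Name     string
	Replicas int32
}

// EnvVar mirrors the name/value shape of corev1.EnvVar without importing it.
type EnvVar struct {
	Name  string
	Value string
}

// instanceHostnames expands a role into one pod hostname per replica.
// A single-replica role keeps the bare role name (as in the proposal's
// loop); additional replicas get an index suffix (an assumption here).
func instanceHostnames(r Role) []string {
	n := r.Replicas
	if n < 1 {
		n = 1
	}
	if n == 1 {
		return []string{r.Name}
	}
	hosts := make([]string, 0, n)
	for i := int32(0); i < n; i++ {
		hosts = append(hosts, fmt.Sprintf("%s-%d", r.Name, i))
	}
	return hosts
}

// peerEnvVars builds the variables injected into one role's pods: the group
// ID, the role's own name, and one comma-separated list of DNS names per
// peer role. The variable names are hypothetical.
func peerEnvVars(self Role, roles []Role, group, namespace, clusterDomain string) []EnvVar {
	envs := []EnvVar{
		{Name: "AGENTCUBE_GROUP_ID", Value: group},
		{Name: "AGENTCUBE_ROLE", Value: self.Name},
	}
	for _, peer := range roles {
		if peer.Name == self.Name {
			continue
		}
		hosts := instanceHostnames(peer)
		fqdns := make([]string, len(hosts))
		for i, h := range hosts {
			fqdns[i] = fmt.Sprintf("%s.%s.%s.svc.%s", h, group, namespace, clusterDomain)
		}
		key := "AGENTCUBE_PEER_" + strings.ToUpper(strings.ReplaceAll(peer.Name, "-", "_"))
		envs = append(envs, EnvVar{Name: key, Value: strings.Join(fqdns, ",")})
	}
	return envs
}

func main() {
	roles := []Role{
		{Name: "planner", Replicas: 1},
		{Name: "pcap-worker", Replicas: 2},
	}
	for _, e := range peerEnvVars(roles[0], roles, "grp-123", "default", "cluster.local") {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

With a scheme like this, the creation loop would iterate over `instanceHostnames(role)` rather than creating exactly one sandbox per role, and each sandbox would need a unique `Hostname` so that replicas of the same role do not collide in DNS.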
|
||
| go | ||
|
|
||
| Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` SaveAgentGroup(ctx context.Context, manifest *types.AgentGroupManifest) error GetAgentGroup(ctx context.Context, groupSessionID string) (*types.AgentGroupManifest, error) DeleteAgentGroup(ctx context.Context, groupSessionID string) error ` |
The code block for store extensions has messy language identifiers and missing newlines.
Suggested change:

    SaveAgentGroup(ctx context.Context, manifest *types.AgentGroupManifest) error
    GetAgentGroup(ctx context.Context, groupSessionID string) (*types.AgentGroupManifest, error)
    DeleteAgentGroup(ctx context.Context, groupSessionID string) error
|
|
||
| go | ||
|
|
||
| Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` v1Group.POST("/multi-agent-runtime", s.handleMultiAgentRuntimeCreate) v1Group.DELETE("/multi-agent-runtime/groups/:groupId", s.handleDeleteAgentGroup) v1Group.GET("/multi-agent-runtime/groups/:groupId/topology", s.handleGetGroupTopology) ` |
The code block for new HTTP routes has messy language identifiers and missing newlines.
Suggested change:

    v1Group.POST("/multi-agent-runtime", s.handleMultiAgentRuntimeCreate)
    v1Group.DELETE("/multi-agent-runtime/groups/:groupId", s.handleDeleteAgentGroup)
    v1Group.GET("/multi-agent-runtime/groups/:groupId/topology", s.handleGetGroupTopology)
|
|
||
| go | ||
|
|
||
| Plain textANTLR4BashCC#CSSCoffeeScriptCMakeDartDjangoDockerEJSErlangGitGoGraphQLGroovyHTMLJavaJavaScriptJSONJSXKotlinLaTeXLessLuaMakefileMarkdownMATLABMarkupObjective-CPerlPHPPowerShell.propertiesProtocol BuffersPythonRRubySass (Sass)Sass (Scss)SchemeSQLShellSwiftSVGTSXTypeScriptWebAssemblyYAMLXML` func (k *K8sClient) CreateHeadlessService( ctx context.Context, mar *runtimev1alpha1.MultiAgentRuntime, groupSessionID string, ) (*corev1.Service, error) { svc := &corev1.Service{ ObjectMeta: metav1.ObjectMeta{ Name: groupSessionID, Namespace: mar.Namespace, OwnerReferences: []metav1.OwnerReference{{ APIVersion: mar.APIVersion, Kind: mar.Kind, Name: mar.Name, UID: mar.UID, Controller: ptr.To(true), BlockOwnerDeletion: ptr.To(true), }}, Labels: map[string]string{GroupLabel: groupSessionID}, }, Spec: corev1.ServiceSpec{ ClusterIP: corev1.ClusterIPNone, Selector: map[string]string{GroupLabel: groupSessionID}, PublishNotReadyAddresses: false, }, } return k.clientset.CoreV1().Services(mar.Namespace).Create( ctx, svc, metav1.CreateOptions{}) } ` |
The code block for CreateHeadlessService has messy language identifiers and missing newlines.
Suggested change:

    func (k *K8sClient) CreateHeadlessService(
        ctx context.Context,
        mar *runtimev1alpha1.MultiAgentRuntime,
        groupSessionID string,
    ) (*corev1.Service, error) {
        svc := &corev1.Service{
            ObjectMeta: metav1.ObjectMeta{
                Name:      groupSessionID,
                Namespace: mar.Namespace,
                OwnerReferences: []metav1.OwnerReference{{
                    APIVersion:         mar.APIVersion,
                    Kind:               mar.Kind,
                    Name:               mar.Name,
                    UID:                mar.UID,
                    Controller:         ptr.To(true),
                    BlockOwnerDeletion: ptr.To(true),
                }},
                Labels: map[string]string{GroupLabel: groupSessionID},
            },
            Spec: corev1.ServiceSpec{
                ClusterIP:                corev1.ClusterIPNone,
                Selector:                 map[string]string{GroupLabel: groupSessionID},
                PublishNotReadyAddresses: false,
            },
        }
        return k.clientset.CoreV1().Services(mar.Namespace).Create(
            ctx, svc, metav1.CreateOptions{})
    }
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Adds the LFX Mentorship proposal for supporting multi-AgentCube capability. The proposal lays out the problem statement, proposed MultiAgentRuntime design, control-plane/store/router changes, rollout plan, testing strategy, and risks so the work can be reviewed and tracked in one place.
Which issue(s) this PR fixes:
Fixes #301
Special notes for your reviewer:
This is a proposal-only PR, not an implementation PR. It is intended to document the design direction and milestone plan for the multi-agent capability work.
Does this PR introduce a user-facing change?:
Yes