feat: add NetworkPolicy support for sandbox isolation#291
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request implements network isolation for sandboxes using Kubernetes NetworkPolicies, supporting 'None', 'Restricted', and 'Custom' modes via the AgentRuntime and CodeInterpreter CRDs. The workload-manager is updated to manage the lifecycle of these policies, including automated creation and cleanup. A critical issue was identified in pkg/workloadmanager/workload_builder.go where the per-session NetworkPolicy override is ignored during the creation of CodeInterpreter sandboxes when warm pools are not utilized.
c8b70b4 to
9e2f065
Compare
There was a problem hiding this comment.
Pull request overview
Adds optional Kubernetes NetworkPolicy generation/cleanup to improve sandbox network isolation in AgentCube’s Workload Manager, closing the “no network restrictions” gap for untrusted sandbox code execution.
Changes:
- Introduces
SandboxNetworkPolicyin runtime API templates (AgentRuntime + CodeInterpreter) and a per-session override viaCreateSandboxRequest. - Implements NetworkPolicy building (None/Restricted/Custom) plus lifecycle management (create on sandbox create; best-effort delete on rollback, delete API, and GC).
- Adds router selector/namespace configuration (flags + Helm values) to support cross-namespace router→sandbox ingress allow rules.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/workloadmanager/workload_builder.go | Wires template/session policy into sandbox build flow and returns a NetworkPolicy object for creation. |
| pkg/workloadmanager/server.go | Adds config defaults for router selector/namespace used by Restricted mode ingress. |
| pkg/workloadmanager/networkpolicy_builder.go | Builds NetworkPolicy objects for None/Restricted/Custom modes (router ingress + optional DNS egress). |
| pkg/workloadmanager/networkpolicy_builder_test.go | Unit tests for NetworkPolicy building and “replace, not merge” selection behavior. |
| pkg/workloadmanager/k8s_client.go | Adds helpers to create/delete NetworkPolicies and generalizes clientset type. |
| pkg/workloadmanager/handlers.go | Creates NetworkPolicy during sandbox creation; deletes on rollback and explicit delete handler. |
| pkg/workloadmanager/handlers_test.go | Updates handler tests for new builder signatures and adds session override forwarding test. |
| pkg/workloadmanager/garbage_collection.go | Ensures NetworkPolicy cleanup runs even when Sandbox/SandboxClaim is already NotFound. |
| pkg/workloadmanager/garbage_collection_test.go | Adds fake clientset to cover NP cleanup path without nil panics. |
| pkg/common/types/sandbox.go | Extends CreateSandboxRequest to accept per-session NetworkPolicy override. |
| pkg/apis/runtime/v1alpha1/networkpolicy_types.go | Adds new API types for SandboxNetworkPolicy modes and rules. |
| pkg/apis/runtime/v1alpha1/agent_type.go | Adds NetworkPolicy field to AgentRuntime SandboxTemplate. |
| pkg/apis/runtime/v1alpha1/codeinterpreter_types.go | Adds NetworkPolicy field to CodeInterpreterSandboxTemplate. |
| pkg/apis/runtime/v1alpha1/zz_generated.deepcopy.go | Updates generated deep-copies for new NetworkPolicy fields/types. |
| manifests/charts/base/values.yaml | Adds Helm values for router selector/namespace used in Restricted mode. |
| manifests/charts/base/templates/workloadmanager.yaml | Passes router selector/namespace flags into workload-manager deployment. |
| manifests/charts/base/templates/rbac/workloadmanager.yaml | Grants workload-manager RBAC to manage networkpolicies. |
| manifests/charts/base/crds/runtime.agentcube.volcano.sh_agentruntimes.yaml | Updates AgentRuntime CRD schema with networkPolicy fields. |
| manifests/charts/base/crds/runtime.agentcube.volcano.sh_codeinterpreters.yaml | Updates CodeInterpreter CRD schema with networkPolicy fields. |
| cmd/workload-manager/main.go | Adds flags + selector parsing for router selector/namespace configuration. |
| cmd/workload-manager/main_test.go | Unit tests for selector parsing logic. |
| // createNetworkPolicy creates a NetworkPolicy using the workloadmanager's own client. | ||
| func createNetworkPolicy(ctx context.Context, clientset kubernetes.Interface, np *networkingv1.NetworkPolicy) error { | ||
| _, err := clientset.NetworkingV1().NetworkPolicies(np.Namespace).Create(ctx, np, metav1.CreateOptions{}) | ||
| if err != nil && !apierrors.IsAlreadyExists(err) { | ||
| return fmt.Errorf("failed to create network policy %s/%s: %w", np.Namespace, np.Name, err) | ||
| } | ||
| return nil |
| "fmt" | ||
| "time" | ||
|
|
||
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #291 +/- ##
==========================================
+ Coverage 47.57% 54.44% +6.87%
==========================================
Files 30 35 +5
Lines 2819 3440 +621
==========================================
+ Hits 1341 1873 +532
- Misses 1338 1368 +30
- Partials 140 199 +59
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
hzxuzhonghu
left a comment
There was a problem hiding this comment.
Good design with three modes that maintain backward compatibility. The GC leak fix is a nice bonus. Please confirm behavior of NetworkPolicy for the warm-pool path before merging.
There was a problem hiding this comment.
Pull request overview
Adds Kubernetes NetworkPolicy generation/management to improve sandbox network isolation in the Workload Manager, with configurable router selectors and support for template-level defaults plus per-session overrides.
Changes:
- Introduces
SandboxNetworkPolicyAPI types and addsnetworkPolicyfields to AgentRuntime/CodeInterpreter templates and CreateSandboxRequest. - Implements NetworkPolicy building (Restricted/Custom) and lifecycle management (create, rollback, delete, GC cleanup).
- Adds Helm values/flags for router selector + namespace and updates RBAC/CRDs/examples/tests accordingly.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/workloadmanager/workload_builder.go | Threads session/template NetworkPolicy into sandbox build flow. |
| pkg/workloadmanager/server.go | Adds configurable router selector/namespace defaults to server config. |
| pkg/workloadmanager/networkpolicy_builder.go | Implements NetworkPolicy object construction for sandbox pods. |
| pkg/workloadmanager/networkpolicy_builder_test.go | Unit tests for NetworkPolicy builder + override semantics. |
| pkg/workloadmanager/k8s_client.go | Adds NetworkPolicy create/delete helpers; returns SandboxClaim UID for owner refs. |
| pkg/workloadmanager/handlers.go | Creates NetworkPolicy (with owner refs), adds rollback + delete handler cleanup. |
| pkg/workloadmanager/handlers_test.go | Updates handler/server tests for new create signature + session NP override. |
| pkg/workloadmanager/garbage_collection.go | Ensures NP cleanup runs even when Sandbox/SandboxClaim is already NotFound. |
| pkg/workloadmanager/garbage_collection_test.go | Tests GC deletes associated NetworkPolicies. |
| pkg/common/types/sandbox.go | Extends CreateSandboxRequest with per-session NetworkPolicy override. |
| pkg/apis/runtime/v1alpha1/networkpolicy_types.go | New CRD/API types for sandbox network policy configuration. |
| pkg/apis/runtime/v1alpha1/agent_type.go | Adds template-level networkPolicy to AgentRuntime SandboxTemplate. |
| pkg/apis/runtime/v1alpha1/codeinterpreter_types.go | Adds template-level networkPolicy to CodeInterpreterSandboxTemplate. |
| pkg/apis/runtime/v1alpha1/zz_generated.deepcopy.go | Regenerates DeepCopy for new API fields/types. |
| manifests/charts/base/values.yaml | Adds chart values for router selector/namespace. |
| manifests/charts/base/templates/workloadmanager.yaml | Wires chart values into workload-manager flags. |
| manifests/charts/base/templates/rbac/workloadmanager.yaml | Grants workloadmanager RBAC to manage NetworkPolicies. |
| manifests/charts/base/crds/runtime.agentcube.volcano.sh_agentruntimes.yaml | Updates CRD schema for new template field. |
| manifests/charts/base/crds/runtime.agentcube.volcano.sh_codeinterpreters.yaml | Updates CRD schema for new template field. |
| example/code-interpreter/code-interpreter-network-policy.yaml | Adds usage examples for Restricted/Custom modes. |
| cmd/workload-manager/main.go | Adds flags for router selector/namespace + selector parsing helper. |
| cmd/workload-manager/main_test.go | Tests selector parsing behavior. |
|
Hey @hzxuzhonghu Practically: the NP gets created, sits there with no matching pods, and does nothing. The sandbox runs unprotected in the warm-pool case even if Mode=Restricted. I've got a TODO in the code at the build site marking this. Two possible fixes I'm aware of:
My suggestion: merge this PR as-is since the direct-sandbox path works correctly, and track the warm-pool fix as a follow-up once we confirm what labels agent-sandbox applies. I'd rather not block the whole feature on the warm-pool edge case, but happy to dig into it more if you'd prefer it handled in this PR. |
hzxuzhonghu
left a comment
There was a problem hiding this comment.
The overall design is solid — per-session override, Restricted-mode DNS egress defaults, cross-namespace router peer with NamespaceSelector, and GC/rollback NP cleanup are all well thought out. A few issues below.
Also note dead code in handleDeleteSandbox: deleteSandboxClaim and deleteSandbox in k8s_client.go already return nil for IsNotFound errors, so the if apierrors.IsNotFound(err) branches at lines ~322 and ~334 of handlers.go are unreachable — err can never be a NotFound error at that point. The log lines there will never fire. The dead branches should be removed to avoid confusing future readers who might assume NotFound could surface there.
| // Create NetworkPolicy if one was built from the template spec. | ||
| // Uses the workloadmanager's own client (not the user client) since NP is infra. | ||
| if networkPolicy != nil { | ||
| if err := createNetworkPolicy(ctx, s.k8sClient.clientset, networkPolicy); err != nil { |
There was a problem hiding this comment.
Security: Mode=Custom in CreateSandboxRequest is a privilege-escalation vector when EnableAuth=true.
The NetworkPolicy is created here using s.k8sClient.clientset — the workloadmanager's own elevated service account (which now has networkpolicies create/delete RBAC per this PR's manifest changes). The NP rules, however, come entirely from the user-supplied sandboxReq.NetworkPolicy field.
This means a user who can call the WM sandbox-create API but does not have networkpolicies/create permission in k8s can bypass that restriction by setting networkPolicy.mode=Custom with arbitrary egress rules. The practical impact: they can weaken the isolation the cluster operator intended (e.g., override a template-level Restricted policy with open egress rules that reach the k8s API server, cloud metadata endpoints, or internal services).
Suggested fixes:
- Disallow
Custommode inCreateSandboxRequestwhenEnableAuth=true; reserveCustomfor cluster-admin-controlled CRD templates only and allow onlyNone/Restrictedper-session overrides. - SubjectAccessReview gate: before creating the NP, perform a SAR check on behalf of the requesting user to verify they hold
networking.k8s.io/networkpoliciescreatein the sandbox namespace.
| routerPeer := networkingv1.NetworkPolicyPeer{ | ||
| PodSelector: &metav1.LabelSelector{MatchLabels: routerSelector}, | ||
| } | ||
| if routerNamespace != "" { |
There was a problem hiding this comment.
When routerNamespace is an empty string, NamespaceSelector is not added to the peer. Kubernetes NetworkPolicy semantics for a peer with only PodSelector (no NamespaceSelector): match pods in the same namespace as the NetworkPolicy.
If the router is in a different namespace and routerNamespace is empty, the router ingress rule silently becomes a same-namespace-only rule — cross-namespace router traffic is blocked, defeating the whole point of this cross-namespace logic.
In production this is guarded by NewServer defaulting RouterNamespace to IdentitySecretNamespace, so the empty case never fires at runtime. But the function itself is silently wrong for empty input, and any test or future caller that passes "" will get incorrect policy without any signal. Consider either panic/returning an error for empty routerNamespace, or at minimum adding a comment: // routerNamespace must be non-empty; an empty value restricts the rule to the sandbox's own namespace.
| np.Spec.Egress = buildRestrictedEgressRules(spec) | ||
| case runtimev1alpha1.NetworkPolicyModeCustom: | ||
| if spec.Custom != nil { | ||
| np.Spec.Ingress = spec.Custom.Ingress |
There was a problem hiding this comment.
PolicyTypes: [Ingress, Egress] is set unconditionally (lines 53-56) for both Restricted and Custom modes. In Custom mode, np.Spec.Ingress is set directly from spec.Custom.Ingress. If the caller provides only egress rules and leaves Custom.Ingress nil, the resulting NP has PolicyTypes: [Ingress] with no ingress rules — deny-all ingress, including from the router.
Unlike Restricted mode, Custom mode does not automatically inject the router ingress rule. This is not documented anywhere in the type or the mode description, so an operator writing:
networkPolicy:
mode: Custom
custom:
egress:
- ports: [{port: 443}]...would unknowingly block the router from reaching the sandbox and see requests time out silently.
Please add a note to SandboxNetworkPolicyCustomRules or NetworkPolicyModeCustom warning that callers must explicitly include any required ingress rules (including router ingress) when using Custom mode.
There was a problem hiding this comment.
The overall design is solid — per-session override semantics, rollback logic, GC cleanup, and the cross-namespace router NamespaceSelector are all well-thought-out.
Dead code in handleDeleteSandbox (lines ~322/334 — outside this diff's hunks): The if apierrors.IsNotFound(err) branches after deleteSandboxClaim/deleteSandbox calls are unreachable. Both helpers already swallow IsNotFound and return nil for it. Any error that reaches those if err != nil guards is always a real, non-NotFound error. The branches are harmless today but would cause a silent behavior change if the helpers ever stop suppressing NotFound. Fix: remove the dead IsNotFound branches, or remove NotFound-suppression from the helpers and handle it at each call site.
| // Uses the workloadmanager's own client (not the user client) since NP is infra. | ||
| if networkPolicy != nil { | ||
| if err := createNetworkPolicy(ctx, s.k8sClient.clientset, networkPolicy); err != nil { | ||
| return nil, api.NewInternalError(err) |
There was a problem hiding this comment.
Security: per-session Custom mode allows RBAC bypass when EnableAuth=true.
This call uses s.k8sClient.clientset (the workloadmanager's own elevated SA, which has networking.k8s.io/networkpolicies create RBAC per the manifest added in this PR). A user who calls this API endpoint with networkPolicy.mode=Custom can therefore create an arbitrary NetworkPolicy in the sandbox namespace, even if their own principal lacks that RBAC permission.
In a multi-tenant cluster where an operator sets Mode=Restricted to lock down sandbox egress (e.g., block the metadata endpoint or internal services), any API caller can override that with Mode=Custom and open egress at will — effectively a privilege escalation.
Fix options:
- Disallow
Customin per-session overrides. Only allow it in the operator-controlled template spec; session overrides can only setMode=Restricted/Noneand addallowedEgressentries. - SubjectAccessReview gate. Before using the elevated client, perform an
authorizationv1.SubjectAccessReviewto confirm the requesting user hasnetworking.k8s.io/networkpoliciescreate permission in the target namespace.
| case runtimev1alpha1.NetworkPolicyModeCustom: | ||
| if spec.Custom != nil { | ||
| np.Spec.Ingress = spec.Custom.Ingress | ||
| np.Spec.Egress = spec.Custom.Egress |
There was a problem hiding this comment.
PolicyTypes: [Ingress, Egress] is always set; nil Custom.Ingress = deny-all ingress, including from the router.
PolicyTypes is set unconditionally to [Ingress, Egress] for all non-None modes. In Custom mode, if spec.Custom.Ingress is nil (the user only wants to configure egress), np.Spec.Ingress is nil here. Per k8s NetworkPolicy semantics, PolicyType=Ingress with no ingress rules means deny all ingress — including from the router. The sandbox becomes completely unreachable.
This affects a user who passes {mode: Custom, custom: {egress: [...]}} expecting to add egress allow rules while keeping normal inbound traffic.
The kubebuilder validation only requires custom != null, so {mode: Custom, custom: {}} is accepted and silently creates a deny-all policy.
Fix: add a warning to the SandboxNetworkPolicyCustomRules godoc:
// When mode is Custom, ALL ingress is denied unless rules are explicitly
// specified here. The router ingress rule injected by Restricted mode is NOT
// automatic — include it in Ingress if sandbox reachability is required.
| PodSelector: &metav1.LabelSelector{MatchLabels: routerSelector}, | ||
| } | ||
| if routerNamespace != "" { | ||
| routerPeer.NamespaceSelector = &metav1.LabelSelector{ |
There was a problem hiding this comment.
Silent wrong semantics when routerNamespace is empty.
When routerNamespace == "", NamespaceSelector is skipped. Per k8s NetworkPolicy semantics, a NetworkPolicyPeer with only PodSelector and no NamespaceSelector selects pods in the same namespace as the NetworkPolicy — not all namespaces.
So if routerNamespace is empty, the rule only allows router pods co-located in the sandbox namespace. If the router is in a different namespace (the common multi-tenant layout the comment above describes), it silently cannot reach the sandbox — exactly the case the NamespaceSelector was added to prevent.
server.go/NewServer ensures RouterNamespace always defaults to at least IdentitySecretNamespace, so production is safe today. But the function itself has incorrect semantics for the empty case, making it a hazard for tests and any future direct callers.
Fix: guard at the top of buildRestrictedIngressRules:
if routerNamespace == "" {
// A missing namespace selector restricts the rule to the sandbox namespace only,
// defeating cross-namespace router ingress. Callers must supply a non-empty value.
klog.Fatalf("routerNamespace must not be empty")
}or return an error up the call chain.
| // "Custom": use the raw Ingress/Egress rules in the Custom field. | ||
| // +kubebuilder:default=None | ||
| // +optional | ||
| Mode NetworkPolicyMode `json:"mode,omitempty"` |
There was a problem hiding this comment.
Why do you need this field, i thin none can be represented with unset policy
Custom can be represented by AllowedIngress AllowedEgress
Restricted should be like None
|
|
||
| // AllowedEgress lists additional egress allow rules applied when Mode=Restricted. | ||
| // +optional | ||
| AllowedEgress []SandboxEgressRule `json:"allowedEgress,omitempty"` |
There was a problem hiding this comment.
| AllowedEgress []SandboxEgressRule `json:"allowedEgress,omitempty"` | |
| Egress []SandboxEgressRule `json:"allowedEgress,omitempty"` |
| // AllowDNS controls whether a DNS egress rule (port 53 UDP/TCP) is added when | ||
| // Mode=Restricted. Defaults to true. | ||
| // +optional | ||
| AllowDNS *bool `json:"allowDNS,omitempty"` |
| // AllowedIngress lists additional ingress allow rules when Mode=Restricted. | ||
| // Router ingress is always permitted in Restricted mode regardless of this field. | ||
| // +optional | ||
| AllowedIngress []SandboxIngressRule `json:"allowedIngress,omitempty"` |
There was a problem hiding this comment.
| AllowedIngress []SandboxIngressRule `json:"allowedIngress,omitempty"` | |
| Ingress []SandboxIngressRule `json:"ingress,omitempty"` |
| CIDRs []string `json:"cidrs,omitempty"` | ||
| // Ports restricts the rule to these destination ports. An empty list matches all ports. | ||
| // +optional | ||
| Ports []SandboxNetworkPolicyPort `json:"ports,omitempty"` |
There was a problem hiding this comment.
?? I think the comment is wrong , copied from SandboxEgressRule
| } | ||
|
|
||
| // SandboxIngressRule describes an ingress allow rule for a sandbox pod. | ||
| type SandboxIngressRule struct { |
There was a problem hiding this comment.
This struct is same as SandboxEgressRule, please check k8s NetworkPolicySpec api
| return nil, err | ||
| } | ||
| } else { | ||
| if _, err := createSandbox(ctx, dynamicClient, sandbox); err != nil { |
There was a problem hiding this comment.
This was extracted from createSandbox to get its cyclomatic complexity under the gocyclo lint limit of 15 , it was hitting 16 before. The logic is exactly the same, just pulled out into its own helper. The early return is just how I naturally wrote it in Go, happy to match whatever style you prefer here.
| func TestBuildNetworkPolicy_NilSpec(t *testing.T) { | ||
| // nil spec → no NP (no enforcement) | ||
| np := buildNetworkPolicy("sandbox-1", "default", nil, testRouterSelector, testRouterNamespace) | ||
| assert.Nil(t, np) | ||
| } | ||
|
|
||
| func TestBuildNetworkPolicy_Metadata(t *testing.T) { | ||
| np := buildNetworkPolicy("my-sandbox", "my-ns", &runtimev1alpha1.SandboxNetworkPolicy{}, | ||
| testRouterSelector, testRouterNamespace) | ||
| require.NotNil(t, np) |
| verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] | ||
| - apiGroups: ["networking.k8s.io"] | ||
| resources: ["networkpolicies"] | ||
| verbs: ["get", "list", "watch", "create", "delete"] |
| Spec corev1.PodSpec `json:"spec" protobuf:"bytes,3,opt,name=spec"` | ||
|
|
||
| // NetworkPolicy defines the network isolation policy for sandbox pods created from | ||
| // this template. Defaults to Mode=None (no policy created). |
| Resources corev1.ResourceRequirements `json:"resources,omitempty"` | ||
|
|
||
| // NetworkPolicy defines the network isolation policy for sandbox pods created from | ||
| // this template. Defaults to Mode=None (no policy created). |
| extraEnv: [] | ||
| networkPolicy: | ||
| # routerSelector is the pod label selector used to identify the router pod. | ||
| # Applied as the mandatory ingress allow rule when SandboxNetworkPolicy.Mode=Restricted. |
| routerSelector = flag.String("router-selector", "app=agentcube-router", | ||
| "Pod label selector identifying the router pod, used for the mandatory "+ | ||
| "router→sandbox ingress rule when NetworkPolicy.Mode=Restricted. "+ | ||
| "Format: comma-separated key=value pairs (e.g. app=agentcube-router,tier=proxy)") | ||
| routerNamespace = flag.String("router-namespace", "", | ||
| "Namespace the router runs in. Used to build a cross-namespace NamespaceSelector "+ | ||
| "for the router→sandbox ingress rule. Defaults to the workloadmanager's own namespace.") |
| func validateNetworkPolicyOverride(np *runtimev1alpha1.SandboxNetworkPolicy) error { | ||
| if np == nil { | ||
| return nil | ||
| } | ||
| for _, rule := range np.Egress { |
| // SandboxNetworkPolicy defines the network isolation policy applied to each sandbox pod. | ||
| // When set (non-nil), a deny-all NetworkPolicy is created with explicit allow rules for | ||
| // router ingress and DNS egress. Additional rules can be added via Ingress and | ||
| // Egress. Leave this field unset (nil) to disable network policy enforcement. | ||
| type SandboxNetworkPolicy struct { |
| networkPolicy: | ||
| description: |- | ||
| NetworkPolicy defines the network isolation policy for sandbox pods created from | ||
| this template. Defaults to Mode=None (no policy created). |
| networkPolicy: | ||
| description: |- | ||
| NetworkPolicy defines the network isolation policy for sandbox pods created from | ||
| this template. Defaults to Mode=None (no policy created). |
hzxuzhonghu
left a comment
There was a problem hiding this comment.
Solid feature PR — the three-mode design (None/Restricted/Custom) is clean and the cross-namespace router ingress fix with NamespaceSelector is important for real deployments. The --router-selector / --router-namespace flags are a nice way to keep this configurable. Two things worth looking at below.
| CIDRs is the list of CIDR blocks for this rule (source for ingress, destination for egress). | ||
| Each entry must be valid CIDR notation (e.g. "10.0.0.0/8" or "2001:db8::/32"). | ||
| items: | ||
| pattern: ^(\d{1,3}\.){3}\d{1,3}\/\d{1,2}$|^[0-9a-fA-F:]+\/\d{1,3}$ |
There was a problem hiding this comment.
The IPv4 part of this regex ((\d{1,3}\.){3}\d{1,3}\/\d{1,2}) accepts octets up to 999 — so 999.999.999.999/0 would pass CRD validation. Not a security issue since K8s will reject it at the NetworkPolicy level, but it might be confusing to users who get a mysterious downstream error instead of a clear validation message. A stricter octet bound like (25[0-5]|2[0-4]\d|[01]?\d\d?) would catch it earlier.
| @@ -95,6 +103,11 @@ func main() { | |||
| os.Exit(1) | |||
There was a problem hiding this comment.
Malformed pairs in --router-selector are silently skipped (only an empty-result warning is logged). If someone passes app=agentcube-router,badpair they'd get app=agentcube-router silently, which could be confusing. Maybe worth a klog.Warningf per skipped pair so it's visible in logs?
|
friendly ping @Sanchit2662 .Are u still working on it? |
|
@FAUST-BENCHOU , yes i am working on it ,I will push the changes shorlty. |
|
Thanks btw dont forget to rebase first |
0a71098 to
577409c
Compare
|
Hi @hzxuzhonghu , I have made the required change. |
|
Please resolve the conflicts, i will take a look |
- Add SandboxNetworkPolicy API type (None/Restricted/Custom modes) - Build per-sandbox NetworkPolicy with router ingress always allowed - Support cross-namespace router via NamespaceSelector - Per-session NP override in CreateSandboxRequest (replace semantics) - Configurable router selector/namespace via chart values and CLI flags - Clean up NP on sandbox delete, rollback, and GC - Add networkpolicies RBAC to workloadmanager ClusterRole Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
…eter path - Fix session NP override being ignored in direct sandbox path - Add CIDR format validation markers to SandboxEgressRule and SandboxIngressRule - Add example manifest for NetworkPolicy modes (Restricted, Custom) Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
- Add OwnerReference on NetworkPolicy pointing to Sandbox/SandboxClaim so k8s GC cascades deletion when owner is removed out-of-band - Add CEL validation: custom field must be set when mode is Custom - Add TODO comment on warm-pool NP selector gap pending agent-sandbox fix - Add GC lifecycle tests verifying NP is deleted alongside sandbox - Extend createSandboxClaim to return UID for OwnerReference wiring Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
Extract sandbox/claim creation and NetworkPolicy OwnerReference wiring into createWorkload helper to bring createSandbox complexity from 16 to under the gocyclo limit of 15 Also address Copilot review comments: - Treat AlreadyExists as error in createNetworkPolicy instead of silently reusing a potentially stale or mismatched policy - Validate per-session NetworkPolicy override in CreateSandboxRequest.Validate: reject unknown modes and require custom field when mode is Custom - Fix import ordering to be goimports-compliant across handlers.go, k8s_client.go, sandbox.go and main.go - Warn at startup when --router-selector parses to an empty map Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
- Remove Mode enum (None/Restricted/Custom); nil policy means no enforcement, non-nil enforces deny-all with router ingress and DNS
- Remove AllowDNS toggle; DNS egress is always included when a policy
is set - Rename AllowedEgress/AllowedIngress to Egress/Ingress; update JSON
tags to egress/ingress - Merge SandboxEgressRule and SandboxIngressRule into single
SandboxNetworkPolicyRule type - Block Custom mode in per-session overrides to prevent RBAC privilege
escalation via the workloadmanager elevated SA
- Add klog.Fatalf guard when routerNamespace is empty to prevent silent same-namespace-only router ingress semantics
- Remove dead IsNotFound branches in handleDeleteSandbox; helpers
already swallow NotFound and return nil - Fix incorrect Ports comment copied from SandboxEgressRule into
SandboxIngressRule
Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
- Block per-session networkPolicy override with 403 when auth is enabled, preventing privilege escalation through the workloadmanager's elevated SA - Replace klog.Fatalf in buildIngressRules with a proper error return so a misconfigured routerNamespace fails the request instead of crashing - Propagate buildNetworkPolicy errors up through both builder functions - Validate CIDRs in per-session NetworkPolicy overrides at request time (net.ParseCIDR) so invalid input returns 400 instead of a later 500 - Fix goimports ordering in sandbox.go Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
fb74b67 to
336027d
Compare
|
I resolved the conflicts. |
|
make sure ci green first |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Sandboxes had zero network restrictions. A pod inside a sandbox could reach anything, internal or external. Since sandboxes run untrusted code this is a real isolation gap, so I picked up issue #216 to add NetworkPolicy support.
I added a
SandboxNetworkPolicyfield to bothSandboxTemplate(AgentRuntime) andCodeInterpreterSandboxTemplate. Enforcement is controlled by whether the field is set:EgressandIngressfields.A few specific things I was careful about:
Cross-namespace router ingress — using just
PodSelectorwould only match pods in the same namespace as the NP. In a real deployment the router usually lives in a different namespace than the sandbox pods, so that would silently break the router rule. I fixed this by adding aNamespaceSelectorusingkubernetes.io/metadata.nameso it works across namespaces.Router selector is configurable — you can set it in
values.yamlunderworkloadmanager.networkPolicy.routerSelectorandrouterNamespace, or pass--router-selectorand--router-namespaceflags. Defaults toapp=agentcube-routerand the workloadmanager's own namespace.Per-session override — added
NetworkPolicyfield toCreateSandboxRequest. When set it fully replaces the template-level policy for that session only. No merging, just a clean replace. Overrides are rejected with 403 whenEnableAuth=true.Cleanup — NPs are deleted when you delete the sandbox through the API, on rollback if creation fails partway through, and in the GC loop. I also fixed a bug where the GC was returning early on
IsNotFoundbefore deleting the NP. Since the store record gets purged right after, that would've leaked the NP permanently. Now NP cleanup runs regardless of whether the sandbox was already gone.Rollback — simplified to always attempt NP delete when
networkPolicy != nilrather than tracking anpCreatedflag. This also handles the edge case where the k8s write succeeded but returned an error to us.RBAC — added
networking.k8s.io/networkpoliciescreate/get/list/watch/patch/delete to the workloadmanager ClusterRole.Which issue(s) this PR fixes:
Fixes #216
Special notes for your reviewer:
Known gap: for the warm-pool path (SandboxClaim), the NP gets created but its
PodSelectorusesSandboxNameLabelKeywhich agent-sandbox doesn't propagate onto warm pool pods. So the NP exists but won't enforce anything for warm-pool sessions right now. I want to confirm the label propagation behavior on the agent-sandbox side before fixing this, happy to do it as a follow-up.Does this PR introduce a user-facing change?: