Skip to content

feat: remove default 8h max TTL for AgentRuntime sandboxes#360

Open
HarshitPal25 wants to merge 4 commits into
volcano-sh:mainfrom
HarshitPal25:fix/remove-agentruntime-default-max-ttl
Open

feat: remove default 8h max TTL for AgentRuntime sandboxes#360
HarshitPal25 wants to merge 4 commits into
volcano-sh:mainfrom
HarshitPal25:fix/remove-agentruntime-default-max-ttl

Conversation

@HarshitPal25
Copy link
Copy Markdown
Contributor

What this PR does

Fixes #303

Problem

AgentRuntime.spec.maxSessionDuration had a kubebuilder default of 8h, which caused the workload-manager to always set Sandbox.Spec.Lifecycle.ShutdownTime to now + 8h. This meant every AgentRuntime sandbox was hard-deleted after 8 hours, even if it was still actively serving a long-running agent (e.g. hosting OpenClaw or any persistent agent service).

Solution

Make maxSessionDuration truly opt-in for AgentRuntime:

  • agent_type.go: Removed +kubebuilder:default="8h" from AgentRuntimeSpec.MaxSessionDuration; added +optional and a clear comment that omitting it means no hard cap.
  • workload_builder.go: buildSandboxObject no longer falls back to DefaultSandboxTTL when ttl == 0. ShutdownTime is only set when MaxSessionDuration is explicitly configured by the user.
  • sandbox_helper.go: buildSandboxPlaceHolder and buildSandboxInfo no longer fall back to DefaultSandboxTTL when ShutdownTime is absent. ExpiresAt is left as zero time (meaning no hard expiry).
  • store_redis.go: Skips adding to the expiry sorted-set when ExpiresAt is zero, so the GC loop never forcibly expires these sandboxes.
  • store_valkey.go: Same — zero ExpiresAt bypasses the expiry index, only tracking the session in the activity index for idle-timeout cleanup.
  • Tests: Updated to reflect the new behaviour: no ShutdownTime → zero ExpiresAt → sandbox runs indefinitely, cleaned up only by sessionTimeout idle logic.

CodeInterpreter is unchanged — it retains the 8h default at the buildSandboxByCodeInterpreter call-site.

Behaviour after this change

Resource maxSessionDuration set? Hard delete?
AgentRuntime No (default) Never (idle timeout only)
AgentRuntime Yes (e.g. 24h) After 24h
CodeInterpreter No (default) After 8h (unchanged)
CodeInterpreter Yes After configured duration

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the sandbox session duration logic to make MaxSessionDuration optional, allowing sandboxes to run indefinitely without a hard expiry when omitted. The Redis and Valkey store implementations are updated to skip the expiry index when ExpiresAt is zero, and helper functions are modified to prevent falling back to DefaultSandboxTTL. The reviewer points out that removing the fallback to DefaultSandboxTTL inside buildSandboxObject could lead to CodeInterpreter instances running indefinitely if created without an explicit duration (e.g., in unit tests). They suggest explicitly handling this fallback in buildSandboxByCodeInterpreter to maintain the 8-hour default guarantee.

Comment on lines +203 to +206
if params.ttl > 0 {
shutdownTime := metav1.NewTime(time.Now().Add(params.ttl))
sandbox.Spec.Lifecycle.ShutdownTime = &shutdownTime
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

By removing the fallback to DefaultSandboxTTL when params.ttl == 0 inside buildSandboxObject, any CodeInterpreter created without an explicit MaxSessionDuration (for example, in unit tests or direct client-go calls where kubebuilder defaults are not applied) will now run indefinitely without a hard expiry.

To maintain the 8h default guarantee for CodeInterpreter across all environments, the fallback to DefaultSandboxTTL should be explicitly handled in buildSandboxByCodeInterpreter (both in the warm pool path and the standard path).

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 26, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 54.16667% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.43%. Comparing base (524e55e) to head (dc24233).
⚠️ Report is 103 commits behind head on main.

Files with missing lines Patch % Lines
pkg/store/store_valkey.go 0.00% 8 Missing and 1 partial ⚠️
pkg/workloadmanager/workload_builder.go 77.77% 1 Missing and 1 partial ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #360      +/-   ##
==========================================
+ Coverage   47.57%   55.43%   +7.86%     
==========================================
  Files          30       34       +4     
  Lines        2819     3195     +376     
==========================================
+ Hits         1341     1771     +430     
+ Misses       1338     1244      -94     
- Partials      140      180      +40     
Flag Coverage Δ
unittests 55.43% <54.16%> (+7.86%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Removes the implicit 8-hour hard TTL for AgentRuntime sandboxes so long-running agent services aren’t forcibly deleted, making spec.maxSessionDuration truly opt-in and propagating “no hard expiry” through sandbox info + store GC indexing.

Changes:

  • Make AgentRuntime.spec.maxSessionDuration optional (omit => no hard cap).
  • Only set Sandbox.Spec.Lifecycle.ShutdownTime when a positive TTL is explicitly provided.
  • Allow ExpiresAt to be zero (no expiry) and skip indexing such sessions into the expiry sorted-set in Redis/Valkey.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pkg/apis/runtime/v1alpha1/agent_type.go Makes AgentRuntime max session duration optional in the API type.
pkg/workloadmanager/workload_builder.go Stops defaulting TTL to 8h; only sets ShutdownTime when TTL > 0.
pkg/workloadmanager/sandbox_helper.go Leaves ExpiresAt as zero when ShutdownTime is absent.
pkg/workloadmanager/sandbox_helper_test.go Updates placeholder behavior expectations for zero ExpiresAt.
pkg/store/store_redis.go Allows zero ExpiresAt and skips adding to expiry index when unset.
pkg/store/store_valkey.go Same as Redis: zero ExpiresAt bypasses expiry indexing.
Comments suppressed due to low confidence (1)

pkg/workloadmanager/workload_builder.go:206

  • New behavior (ttl == 0 => no ShutdownTime / no hard expiry) is introduced here but isn’t covered by tests in pkg/workloadmanager/workload_builder_test.go (current tests always pass a positive ttl). Please add a unit test asserting that when ttl is zero, sandbox.Spec.Lifecycle.ShutdownTime remains nil so the sandbox is not hard-expired.
	// Only set ShutdownTime when MaxSessionDuration is explicitly configured.
	// When omitted (ttl == 0), the sandbox runs indefinitely and is only
	// cleaned up by idle timeout (SessionTimeout).
	if params.ttl > 0 {
		shutdownTime := metav1.NewTime(time.Now().Add(params.ttl))
		sandbox.Spec.Lifecycle.ShutdownTime = &shutdownTime
	}

// +kubebuilder:default="8h"
// When omitted, sessions have no maximum duration and will only be cleaned up
// by idle timeout (SessionTimeout).
// +optional
Comment on lines +200 to +206
// Only set ShutdownTime when MaxSessionDuration is explicitly configured.
// When omitted (ttl == 0), the sandbox runs indefinitely and is only
// cleaned up by idle timeout (SessionTimeout).
if params.ttl > 0 {
shutdownTime := metav1.NewTime(time.Now().Add(params.ttl))
sandbox.Spec.Lifecycle.ShutdownTime = &shutdownTime
}
Comment thread pkg/store/store_redis.go
Comment on lines 158 to +167
pipe := rs.cli.Pipeline()
pipe.SetNX(ctx, sessionKey, b, 0)
pipe.ZAdd(ctx, rs.expiryIndexKey, redisv9.Z{
Score: float64(sandboxRedis.ExpiresAt.Unix()),
Member: sandboxRedis.SessionID,
})
// Only add to expiry index when ExpiresAt is set.
// Zero ExpiresAt means the sandbox has no maximum lifetime.
if !sandboxRedis.ExpiresAt.IsZero() {
pipe.ZAdd(ctx, rs.expiryIndexKey, redisv9.Z{
Score: float64(sandboxRedis.ExpiresAt.Unix()),
Member: sandboxRedis.SessionID,
})
}
Comment thread pkg/store/store_valkey.go
Comment on lines +183 to +197
if sandboxStore.ExpiresAt.IsZero() {
// Zero ExpiresAt means no maximum lifetime — skip expiry index.
// The sandbox will still be tracked in the activity index for idle timeout.
commands := make(valkey.Commands, 0, 3)
commands = append(commands, vs.cli.B().Setnx().Key(sessionKey).Value(string(b)).Build())
commands = append(commands, vs.cli.B().Zadd().Key(vs.lastActivityIndexKey).ScoreMember().
ScoreMember(float64(time.Now().Unix()), sandboxStore.SessionID).Build())

for i, resp := range vs.cli.DoMulti(ctx, commands...) {
if err = resp.Error(); err != nil {
return fmt.Errorf("StoreSandbox: DoMulti failed: %w, command index: %v", err, i)
}
}
return nil
}
// +kubebuilder:default="8h"
// When omitted, sessions have no maximum duration and will only be cleaned up
// by idle timeout (SessionTimeout).
// +optional
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs the generated CRD artifact update as well. Right now the installed AgentRuntime CRD still has default: 8h and still lists maxSessionDuration as required, so the API server will keep defaulting omitted values to 8h and this change will not actually fix #303 in a deployed cluster.

if params.ttl > 0 {
shutdownTime := metav1.NewTime(time.Now().Add(params.ttl))
sandbox.Spec.Lifecycle.ShutdownTime = &shutdownTime
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes ttl == 0 mean no hard expiry for every caller of buildSandboxObject. That is correct for AgentRuntime, but CodeInterpreter is supposed to keep the 8h max-session default from the issue discussion and PR description. Can we set DefaultSandboxTTL explicitly in the CodeInterpreter paths when Spec.MaxSessionDuration is nil, including the warm-pool placeholder path?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 It's better to align with CodeInterpreter.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread pkg/store/store_redis.go
})
// Only add to expiry index when ExpiresAt is set.
// Zero ExpiresAt means the sandbox has no maximum lifetime.
if !sandboxRedis.ExpiresAt.IsZero() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add store tests for the zero ExpiresAt branch? The important behavior is that the session is still stored and tracked in the last-activity index, but is not added to the expiry index.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes ofc we can i was fixing the failure test cases till now i would add store test for the zero expireat now

HarshitPal25 added a commit to HarshitPal25/agentcube that referenced this pull request May 26, 2026
Remove the 'default: 8h' value and drop maxSessionDuration from the
required list in the generated CRD YAML. This matches the updated
kubebuilder markers in agent_type.go where +kubebuilder:default was
removed and the field marked as +optional.

Fixes codegen check failure in PR volcano-sh#360.
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kevin-wangzefeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot AI review requested due to automatic review settings May 26, 2026 13:02
HarshitPal25 added a commit to HarshitPal25/agentcube that referenced this pull request May 26, 2026
Remove the 'default: 8h' value and drop maxSessionDuration from the
required list in the generated CRD YAML. This matches the updated
kubebuilder markers in agent_type.go where +kubebuilder:default was
removed and the field marked as +optional.

Fixes codegen check failure in PR volcano-sh#360.

Signed-off-by: HarshitPal25 <harshit13082006@gmail.com>
@HarshitPal25 HarshitPal25 force-pushed the fix/remove-agentruntime-default-max-ttl branch from a3d5351 to 879413e Compare May 26, 2026 13:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 11 changed files in this pull request and generated 2 comments.

Comment thread pkg/store/store_redis.go
Comment on lines 159 to +163
pipe.SetNX(ctx, sessionKey, b, 0)
pipe.ZAdd(ctx, rs.expiryIndexKey, redisv9.Z{
Score: float64(sandboxRedis.ExpiresAt.Unix()),
Member: sandboxRedis.SessionID,
})
// Only add to expiry index when ExpiresAt is set.
// Zero ExpiresAt means the sandbox has no maximum lifetime.
if !sandboxRedis.ExpiresAt.IsZero() {
pipe.ZAdd(ctx, rs.expiryIndexKey, redisv9.Z{
Comment thread pkg/store/store_valkey.go
Comment on lines +183 to +187
if sandboxStore.ExpiresAt.IsZero() {
// Zero ExpiresAt means no maximum lifetime — skip expiry index.
// The sandbox will still be tracked in the activity index for idle timeout.
commands := make(valkey.Commands, 0, 3)
commands = append(commands, vs.cli.B().Setnx().Key(sessionKey).Value(string(b)).Build())
Copy link
Copy Markdown
Contributor

@FAUST-BENCHOU FAUST-BENCHOU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommand to rebase first

Labels: map[string]string{
SessionIdLabelKey: params.sessionID,
WorkloadNameLabelKey: params.workloadName,
"managed-by": "agentcube-workload-manager",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert all this kind of fmt and rebase to the newest upstream first.We have solved all fmt problem in f17fa17 i think

}{
{
name: "no ShutdownTime falls back to DefaultSandboxTTL",
name: "no ShutdownTime means no expiry (sandbox runs indefinitely)",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not remove all DefaultSandboxTTL in ut test like in workload_builder_test.go L44 since we are not using it anymore

type: object
type: array
required:
- maxSessionDuration
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also upd example/agent-runtime/agent-runtime.yaml

if params.ttl > 0 {
shutdownTime := metav1.NewTime(time.Now().Add(params.ttl))
sandbox.Spec.Lifecycle.ShutdownTime = &shutdownTime
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 It's better to align with CodeInterpreter.

AgentRuntime.Spec.MaxSessionDuration had a kubebuilder default of "8h",
meaning every AgentRuntime sandbox was forcefully terminated after 8
hours regardless of whether the user configured this field. For
long-running agents (e.g., hosting OpenClaw within agentcube), this
made AgentRuntime unusable as a persistent runtime.

This change makes MaxSessionDuration truly optional:

- Remove `+kubebuilder:default="8h"` from AgentRuntimeSpec, replacing
  it with `+optional`. When omitted, sandboxes have no maximum lifetime.
- Update buildSandboxObject() to only set Lifecycle.ShutdownTime when
  ttl > 0 (i.e., MaxSessionDuration is explicitly configured). When
  ttl == 0, the sandbox runs indefinitely.
- Update buildSandboxPlaceHolder() and buildSandboxInfo() to use zero
  ExpiresAt when ShutdownTime is nil, instead of falling back to the
  hardcoded DefaultSandboxTTL constant.
- Sandboxes without MaxSessionDuration are still cleaned up by idle
  timeout (SessionTimeout), which defaults to 15 minutes.
- Update tests to reflect the new behavior.

Fixes volcano-sh#303

Signed-off-by: HarshitPal25 <harshit13082006@gmail.com>
Remove the 'default: 8h' value and drop maxSessionDuration from the
required list in the generated CRD YAML. This matches the updated
kubebuilder markers in agent_type.go where +kubebuilder:default was
removed and the field marked as +optional.

Fixes codegen check failure in PR volcano-sh#360.

Signed-off-by: HarshitPal25 <harshit13082006@gmail.com>
@HarshitPal25 HarshitPal25 force-pushed the fix/remove-agentruntime-default-max-ttl branch from 879413e to d5439a1 Compare May 29, 2026 11:55
HarshitPal25 added a commit to HarshitPal25/agentcube that referenced this pull request May 29, 2026
- CodeInterpreter explicitly sets DefaultSandboxTTL (8h) when
  MaxSessionDuration is nil, in both the direct sandbox path and the
  warm-pool placeholder path. This preserves the agreed-upon default
  for CodeInterpreter while AgentRuntime remains unlimited by default.

- Add TestRedisStore_StoreSandbox_ZeroExpiresAt to verify that a
  sandbox with zero ExpiresAt is stored, retrievable, absent from the
  expiry index (not GC'd by TTL), and still tracked in the last-
  activity index (eligible for idle-timeout GC).

- Add TestBuildSandboxObject_NoTTL to verify that ttl==0 produces a
  Sandbox with nil ShutdownTime (AgentRuntime indefinite lifetime path).

- Replace DefaultSandboxTTL with an explicit time.Hour in
  workload_builder_test.go L44 since that constant is no longer the
  implied default for every buildSandboxObject caller.

- Update example/agent-runtime/agent-runtime.yaml: remove hard-coded
  maxSessionDuration: 8h and replace it with a commented-out example
  documenting the optional nature of the field.

Signed-off-by: HarshitPal25 <harshit13082006@gmail.com>
HarshitPal25 added a commit to HarshitPal25/agentcube that referenced this pull request May 29, 2026
…iewers ask volcano-sh#360

- CodeInterpreter now explicitly sets DefaultSandboxTTL (8h) when
  MaxSessionDuration is nil, in both the direct sandbox path and the
  warm-pool placeholder path, preserving the agreed-upon behavior while
  AgentRuntime remains unlimited by default.
- Add TestRedisStore_StoreSandbox_ZeroExpiresAt: verifies that a sandbox
  with zero ExpiresAt is stored and retrievable, absent from the expiry
  index (not hard-deleted), and still tracked in the last-activity index
  (idle-timeout GC still works).
- Add TestBuildSandboxObject_NoTTL: verifies that ttl==0 produces a
  Sandbox with nil ShutdownTime (AgentRuntime indefinite lifetime path).
- Replace DefaultSandboxTTL with time.Hour in workload_builder_test.go
  since DefaultSandboxTTL is no longer the implied default for every
  buildSandboxObject caller.
- Update example/agent-runtime/agent-runtime.yaml: remove hard-coded
  maxSessionDuration: 8h and replace with a commented-out example that
  documents the field is optional.
Signed-off-by: HarshitPal25 <harshit13082006@gmail.com>
@HarshitPal25 HarshitPal25 force-pushed the fix/remove-agentruntime-default-max-ttl branch from d5439a1 to 8ff1fe7 Compare May 29, 2026 12:06
Copilot AI review requested due to automatic review settings May 29, 2026 12:06
…iewers ask volcano-sh#360

- CodeInterpreter now explicitly sets DefaultSandboxTTL (8h) when
  MaxSessionDuration is nil, in both the direct sandbox path and the
  warm-pool placeholder path, preserving the agreed-upon behavior while
  AgentRuntime remains unlimited by default.
- Add TestRedisStore_StoreSandbox_ZeroExpiresAt: verifies that a sandbox
  with zero ExpiresAt is stored and retrievable, absent from the expiry
  index (not hard-deleted), and still tracked in the last-activity index
  (idle-timeout GC still works).
- Add TestBuildSandboxObject_NoTTL: verifies that ttl==0 produces a
  Sandbox with nil ShutdownTime (AgentRuntime indefinite lifetime path).
- Replace DefaultSandboxTTL with time.Hour in workload_builder_test.go
  since DefaultSandboxTTL is no longer the implied default for every
  buildSandboxObject caller.
- Update example/agent-runtime/agent-runtime.yaml: remove hard-coded
  maxSessionDuration: 8h and replace with a commented-out example that
  documents the field is optional.
- Fix go.mod: re-sort require entries alphabetically (go mod tidy).
Signed-off-by: HarshitPal25 <harshit13082006@gmail.com>
@HarshitPal25 HarshitPal25 force-pushed the fix/remove-agentruntime-default-max-ttl branch from 8ff1fe7 to 71a3bdf Compare May 29, 2026 12:16
Signed-off-by: HarshitPal25 <harshit13082006@gmail.com>
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages.

The list of commits with invalid commit messages:

  • 360fd5a feat: remove default 8h max TTL for AgentRuntime sandboxes
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Comment thread test/e2e/echo_agent.yaml
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never used?

Comment thread pkg/store/store_valkey.go
Comment on lines +183 to +194
if sandboxStore.ExpiresAt.IsZero() {
// Zero ExpiresAt means no maximum lifetime — skip expiry index.
// The sandbox will still be tracked in the activity index for idle timeout.
commands := make(valkey.Commands, 0, 3)
commands = append(commands, vs.cli.B().Setnx().Key(sessionKey).Value(string(b)).Build())
commands = append(commands, vs.cli.B().Zadd().Key(vs.lastActivityIndexKey).ScoreMember().
ScoreMember(float64(time.Now().Unix()), sandboxStore.SessionID).Build())

for i, resp := range vs.cli.DoMulti(ctx, commands...) {
if err = resp.Error(); err != nil {
return fmt.Errorf("StoreSandbox: DoMulti failed: %w, command index: %v", err, i)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am seeing this is a little different from redis, is that because valkey sdk does not align with redis?

Copy link
Copy Markdown
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point from me, we can later expose sandbox lifecycle managerment api through sdk. Otherwise for agentruntime, user may not be able to delete the sandbox

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove AgentRuntime default max ttl

7 participants