Merged
18 commits
- ac598eb feat(kiloclaw): add multi-instance identity plumbing (PR1) (pandemicsyn, Mar 25, 2026)
- 22db005 Merge origin/main into florian/chore/org-support (pandemicsyn, Mar 25, 2026)
- a4c91bb fix(kiloclaw): validate instanceId format on user-facing routes (pandemicsyn, Mar 26, 2026)
- 28bdd28 Merge branch 'main' into florian/chore/org-support (pandemicsyn, Mar 26, 2026)
- e0e1be0 refactor(kiloclaw): widen instanceId to full 32-char hex UUID (pandemicsyn, Mar 27, 2026)
- 14d028f refactor(kiloclaw): use DB row UUID as instanceId, drop instance_id c… (pandemicsyn, Mar 27, 2026)
- 83f5b20 Merge origin/main — resolve stream-chat-credentials conflicts (pandemicsyn, Mar 27, 2026)
- 7e47d4f docs(kiloclaw): update AGENTS.md for multi-instance migration (pandemicsyn, Mar 27, 2026)
- 0aa6979 refactor(kiloclaw): validate instanceId with zod at IO boundaries (pandemicsyn, Mar 27, 2026)
- bdba5d7 fix(kiloclaw): return 400 on invalid instanceId in platform routes (pandemicsyn, Mar 27, 2026)
- dd62edd style(kiloclaw): remove as-unknown-as-Response cast in parseInstanceI… (pandemicsyn, Mar 27, 2026)
- 0fa0d70 fix(kiloclaw): thread instanceId through /api/kiloclaw/chat-credentials (pandemicsyn, Mar 27, 2026)
- 1768279 Merge origin/main — renumber migration to 0062, fix ActiveKiloClawIns… (pandemicsyn, Mar 27, 2026)
- 415a6f8 Merge origin/main — resolve customSecretMeta conflicts (pandemicsyn, Mar 27, 2026)
- c05211c Merge origin/main — renumber migration to 0063 (pandemicsyn, Mar 27, 2026)
- 5e84305 refactor(kiloclaw): deduplicate instance identity helpers into worker… (pandemicsyn, Mar 27, 2026)
- 6346ede fix: use subpath export for worker-utils/instance-id (pandemicsyn, Mar 27, 2026)
- e616332 fix: inline instance-id in Next.js sandbox-id.ts (pandemicsyn, Mar 27, 2026)
36 changes: 33 additions & 3 deletions kiloclaw/AGENTS.md
@@ -9,8 +9,9 @@ KiloClaw is a Cloudflare Worker that runs per-user OpenClaw AI assistant instanc
These are non-negotiable. Do not reintroduce shared/fallback paths.

- **No shared mode.** Every request, DO, and machine is user-scoped. There is no global machine, no shared fallback, no optional userId parameters.
- **User scoping.** Each user gets a dedicated Fly App (`acct-{hash}` in production, `dev-{hash}` in development), managed by the `KiloClawApp` DO. Instance DOs (`KiloClawInstance`) are keyed by `idFromName(userId)` (one instance per user). Machine names use `sandboxIdFromUserId(userId)`. Both are deterministic. **Known limitation**: when multi-sandbox-per-user is needed, the Instance DO key should change to `sandboxId` or an instance ID, and the platform API will need to accept a sandbox/instance identifier alongside userId. The App DO already supports this (one app per user, multiple instances per app).
- **Per-user Fly Apps.** New instances get a per-user Fly app created by `KiloClawApp.ensureApp()`. The app name (`flyAppName`) is cached in the Instance DO for proxy routing. Legacy instances without `flyAppName` fall back to `FLY_APP_NAME`. Apps are kept alive after instance destroy (empty apps cost nothing) and reused on re-provision.
- **User scoping (transitioning to multi-instance).** Legacy: Instance DOs are keyed by `idFromName(userId)` (one instance per user), sandboxId = `sandboxIdFromUserId(userId)`. **Multi-instance (in progress):** Instance DOs will be keyed by `idFromName(instanceId)` where `instanceId` = `kiloclaw_instances.id` UUID. sandboxId = `sandboxIdFromInstanceId(instanceId)` → `ki_{uuid-no-dashes}` (35 chars). The `ki_` prefix distinguishes instance-keyed sandboxIds from legacy userId-derived ones (base64url). All routes accept an optional `?instanceId=` query param; when absent, they fall back to the legacy userId-keyed path. See the "Multi-Instance Migration" section below for details.
- **Per-user Fly Apps (unchanged by multi-instance).** Each user gets a dedicated Fly App (`acct-{hash}` in production, `dev-{hash}` in development), managed by the `KiloClawApp` DO. Multiple instances per user share one Fly app. The App DO stays user-keyed.
- **Per-user Fly Apps (legacy detail).** New instances get a per-user Fly app created by `KiloClawApp.ensureApp()`. The app name (`flyAppName`) is cached in the Instance DO for proxy routing. Legacy instances without `flyAppName` fall back to `FLY_APP_NAME`. Apps are kept alive after instance destroy (empty apps cost nothing) and reused on re-provision.
- **`buildEnvVars` requires `sandboxId` and `gatewayTokenSecret`.** Returns `{ env, sensitive }` split. Sensitive values are AES-256-GCM encrypted and prefixed with `KILOCLAW_ENC_` before placement in machine config.env. Gateway token and `AUTO_APPROVE_DEVICES` are always set. No fallback to worker-level channel tokens.
- **Env var name constraints.** User-provided `envVars` and `encryptedSecrets` keys must be valid shell identifiers (`/^[A-Za-z_][A-Za-z0-9_]*$/`) and must not use reserved prefixes `KILOCLAW_ENC_` or `KILOCLAW_ENV_`. Validated at schema level (ingest) and runtime (decrypt block).
- **Token comparisons must be timing-safe.** Never compare auth/proxy tokens with `===`/`!==`. Use `timingSafeTokenEqual` from `controller/src/auth.ts` (or an equivalent `crypto.timingSafeEqual`-based helper) for bearer/proxy token validation.
@@ -65,7 +66,7 @@ src/
│ ├── middleware.ts # JWT auth + pepper validation via Hyperdrive
│ ├── jwt.ts # Token parsing/verification
│ ├── gateway-token.ts # HMAC-SHA256 derivation for per-sandbox tokens
│ └── sandbox-id.ts # userId <-> sandboxId (base64url, reversible)
│ └── sandbox-id.ts # userId <-> sandboxId (base64url) + instanceId validation/derivation
├── durable-objects/
│ ├── kiloclaw-app.ts # DO: per-user Fly App lifecycle (create app, allocate IPs, env key)
│ └── kiloclaw-instance.ts # DO: lifecycle state machine, reconciliation, two-phase destroy
@@ -175,6 +176,7 @@ Before submitting any change:
3. Do not reintroduce optional `userId` or `sandboxId` parameters (they are always required)
4. If changing bootstrap behavior, update `controller/src/bootstrap.ts` and its tests
5. If adding or changing user-facing features, add a changelog entry to `src/app/(app)/claw/components/changelog-data.ts` (newest first)
6. If adding a new route that resolves a KiloClawInstance DO stub, accept optional `?instanceId=` and use it as the DO key when present (see "Multi-Instance Migration" section)

## Test Targets by Change Type

@@ -192,6 +194,34 @@ Before submitting any change:
| Sandbox ID derivation | `src/auth/sandbox-id.test.ts` |
| Gateway token derivation | `src/auth/gateway-token.test.ts` |

## Multi-Instance Migration (In Progress)

KiloClaw is transitioning from one-instance-per-user to N-instances-per-owner (personal + org). This affects how you write new code:

### Identity model

| Concept | Legacy (current default) | Multi-instance (new path) |
| --------------- | ----------------------------------------------------- | -------------------------------------------------------------------------- |
| **DO key** | `idFromName(userId)` | `idFromName(instanceId)` where `instanceId` = `kiloclaw_instances.id` UUID |
| **sandboxId** | `sandboxIdFromUserId(userId)` — base64url, reversible | `sandboxIdFromInstanceId(instanceId)` → `ki_{uuid-no-dashes}` (35 chars) |
| **Proxy route** | Catch-all `/*` | `/i/:instanceId/*` |
| **Ownership** | Implicit (DO keyed by authed userId) | Explicit check: `status.userId === authed userId` |

### What to know when making changes

- **All platform/user/admin routes accept optional `?instanceId=`**. When present, it's used as the DO key instead of userId. When absent, legacy userId-keyed behavior applies. If you add a new route that resolves a DO stub, follow this pattern.
- **The `ki_` prefix on sandboxIds is load-bearing.** It distinguishes instance-keyed sandboxIds from legacy ones. Gateway token derivation, Fly metadata recovery, and volume naming all key off sandboxId. Do not remove or change the prefix scheme.
- **`orgId` is threaded through DO state and env vars.** The DO persists `orgId`, and `buildEnvVars` injects `KILOCODE_ORGANIZATION_ID` when present. If you add new env var logic, account for org instances.
- **Ownership checks are required on user-facing routes that accept `instanceId`.** The `/i/:instanceId/*` proxy route, `kiloclaw.ts` routes, and `api.ts` admin routes all verify `status.userId === authenticated userId` before allowing access. New routes must do the same.
- **The Instance DO `provision()` method accepts `opts.instanceId` and `opts.orgId`.** When `instanceId` is provided, sandboxId is derived from it instead of userId. When `orgId` is provided, it's persisted in DO state and injected as an env var.
- **Postgres `kiloclaw_instances.id` IS the instanceId.** There is no separate `instance_id` column. The existing UUID primary key is the routing identity.
- **Next.js is still the sole Postgres writer.** The `ensureActiveInstance()` function returns the row's `id` which callers use as the instanceId for worker API calls.

### Upcoming PRs (do not implement yet)

- **KiloClawRegistry DO** — SQLite-backed DO that indexes instances per owner (`user:{userId}` or `org:{orgId}`). Will replace direct `idFromName(userId)` lookups in the catch-all proxy.
- **Org instances** — `organization_id` column (already added to schema) links instances to orgs. Org tRPC router, org membership checks, and org member removal cleanup are pending.

## Code Style

- See `/.kilocode/rules/coding-style.md` for project-wide rules
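The AGENTS.md rules in this file (optional `?instanceId=` used as the DO key, the `/i/:instanceId/*` proxy route, and the `status.userId === authenticated userId` ownership check) can be sketched as three small route helpers. This is a minimal sketch, not the worker's route code: `resolveInstanceDoKey`, `parseInstanceProxyPath`, and `isOwner` are hypothetical names, and the lowercase dashed-UUID format check is an assumption based on `instanceId` being the `kiloclaw_instances.id` UUID and on the commits that return 400 for invalid ids.

```typescript
// Hypothetical route helpers illustrating the AGENTS.md rules; not the worker's code.

// Assumption: instanceId is a lowercase, dashed UUID (the kiloclaw_instances.id value).
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;

type DoKeyResult =
  | { ok: true; key: string; instanceKeyed: boolean }
  | { ok: false; status: 400; error: string };

/** Choose the KiloClawInstance DO key: `?instanceId=` when present, else the authed userId. */
function resolveInstanceDoKey(url: URL, userId: string): DoKeyResult {
  const instanceId = url.searchParams.get('instanceId');
  if (instanceId === null) {
    // Legacy path: one DO per user, keyed by idFromName(userId).
    return { ok: true, key: userId, instanceKeyed: false };
  }
  if (!UUID_RE.test(instanceId)) {
    // Reject malformed ids before any DO stub is derived from them.
    return { ok: false, status: 400, error: 'Invalid instanceId' };
  }
  return { ok: true, key: instanceId, instanceKeyed: true };
}

/** Split `/i/:instanceId/*` into the instanceId and the path forwarded to the sandbox. */
function parseInstanceProxyPath(pathname: string): { instanceId: string; rest: string } | null {
  const match = /^\/i\/([^/]+)(\/.*)?$/.exec(pathname);
  const instanceId = match?.[1];
  if (instanceId === undefined || !UUID_RE.test(instanceId)) return null;
  return { instanceId, rest: match?.[2] ?? '/' };
}

/** Ownership rule for instance-keyed requests: status.userId must equal the authed userId. */
function isOwner(status: { userId: string | null }, authedUserId: string): boolean {
  return status.userId !== null && status.userId === authedUserId;
}
```

A route would pass the resolved key to `idFromName()` and, for instance-keyed requests, check `isOwner` against the DO's `getStatus()` result before doing anything else.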
6 changes: 6 additions & 0 deletions kiloclaw/src/auth/sandbox-id.ts
@@ -42,3 +42,9 @@ export function userIdFromSandboxId(sandboxId: string): string {
const bytes = base64urlToBytes(sandboxId);
return new TextDecoder('utf-8', { fatal: false, ignoreBOM: true }).decode(bytes);
}

// ─── Instance-scoped identity ───────────────────────────────────────
// Canonical implementation lives in @kilocode/worker-utils/instance-id;
// re-exported here so existing imports within the worker package continue to work.

export { isValidInstanceId, sandboxIdFromInstanceId } from '@kilocode/worker-utils/instance-id';
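The canonical `isValidInstanceId` and `sandboxIdFromInstanceId` live in `@kilocode/worker-utils/instance-id` and are not part of this diff. Below is a hypothetical re-implementation of the documented contract (`ki_` plus the UUID with dashes removed, 35 characters) for illustration only. The real helpers may differ, for example in case handling or in whether derivation validates its input.

```typescript
// Hypothetical re-implementation for illustration only; the canonical helpers are
// exported from @kilocode/worker-utils/instance-id and may differ in detail.

// Assumption: a valid instanceId is a lowercase, dashed UUID (kiloclaw_instances.id).
const INSTANCE_ID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;

function isValidInstanceId(value: string): boolean {
  return INSTANCE_ID_RE.test(value);
}

/** `ki_` + the UUID's 32 hex digits with dashes removed: 3 + 32 = 35 characters. */
function sandboxIdFromInstanceId(instanceId: string): string {
  // Assumption: this sketch validates; the real helper may expect pre-validated input.
  if (!isValidInstanceId(instanceId)) {
    throw new Error(`Invalid instanceId: ${instanceId}`);
  }
  return `ki_${instanceId.replace(/-/g, '')}`;
}
```

With this sketch, `sandboxIdFromInstanceId('0f8fad5b-d9cb-469f-a165-70867728950e')` yields `ki_0f8fad5bd9cb469fa16570867728950e`.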
50 changes: 50 additions & 0 deletions kiloclaw/src/db/index.ts
@@ -76,6 +76,56 @@ export async function getActiveInstance(db: WorkerDb, userId: string) {
return { id: row.id, sandboxId: row.sandbox_id };
}

/**
* Look up an active instance by its sandboxId.
* Used for DO restore when the DO has a stored sandboxId but lost other state.
*/
export async function getInstanceBySandboxId(db: WorkerDb, sandboxId: string) {
const row = await db
.select({
id: kiloclaw_instances.id,
sandbox_id: kiloclaw_instances.sandbox_id,
user_id: kiloclaw_instances.user_id,
})
.from(kiloclaw_instances)
.where(
and(eq(kiloclaw_instances.sandbox_id, sandboxId), isNull(kiloclaw_instances.destroyed_at))
)
.limit(1)
.then(rows => rows[0] ?? null);

if (!row) return null;
return {
id: row.id,
sandboxId: row.sandbox_id,
userId: row.user_id,
};
}

/**
* Look up an active instance by its primary key UUID.
* Used for DO restore when the caller knows the instanceId (= DB row id).
*/
export async function getInstanceById(db: WorkerDb, instanceId: string) {
const row = await db
.select({
id: kiloclaw_instances.id,
sandbox_id: kiloclaw_instances.sandbox_id,
user_id: kiloclaw_instances.user_id,
})
.from(kiloclaw_instances)
.where(and(eq(kiloclaw_instances.id, instanceId), isNull(kiloclaw_instances.destroyed_at)))
.limit(1)
.then(rows => rows[0] ?? null);

if (!row) return null;
return {
id: row.id,
sandboxId: row.sandbox_id,
userId: row.user_id,
};
}

export async function markInstanceDestroyed(db: WorkerDb, userId: string, sandboxId: string) {
await db
.update(kiloclaw_instances)
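`getInstanceBySandboxId` and `getInstanceById` give DO restore two lookup paths. The diff does not show how restore chooses between them, so the following is hypothetical: `classifySandboxId` uses the `ki_` prefix to recognise an instance-keyed sandboxId and recovers the dashed `kiloclaw_instances.id` by reinserting dashes at the 8-4-4-4-12 positions, assuming the hex digits are stored lowercase.

```typescript
// Hypothetical helper; the diff adds the two DB lookups but not the restore logic
// that chooses between them.

type SandboxIdentity =
  | { kind: 'instance'; instanceId: string }
  | { kind: 'legacy'; sandboxId: string };

// Instance-keyed sandboxIds are `ki_` + 32 hex digits (assumed lowercase).
const INSTANCE_SANDBOX_RE = /^ki_([0-9a-f]{32})$/;

function classifySandboxId(sandboxId: string): SandboxIdentity {
  const hex = INSTANCE_SANDBOX_RE.exec(sandboxId)?.[1];
  if (hex === undefined) {
    // Not of the `ki_` + 32-hex form: treat as a legacy userId-derived sandboxId.
    return { kind: 'legacy', sandboxId };
  }
  // Reinsert dashes at the 8-4-4-4-12 boundaries to recover kiloclaw_instances.id.
  const instanceId = [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join('-');
  return { kind: 'instance', instanceId };
}
```

Restore code could then call `getInstanceById(db, instanceId)` for the `instance` case and fall back to `getInstanceBySandboxId(db, sandboxId)` for legacy ids.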
1 change: 1 addition & 0 deletions kiloclaw/src/durable-objects/kiloclaw-instance/config.ts
@@ -146,6 +146,7 @@ export async function buildUserEnvVars(
instanceFeatures: state.instanceFeatures,
execSecurity: state.execSecurity ?? undefined,
execAsk: state.execAsk ?? undefined,
orgId: state.orgId,
customSecretMeta: state.customSecretMeta ?? undefined,
}
);
20 changes: 17 additions & 3 deletions kiloclaw/src/durable-objects/kiloclaw-instance/index.ts
@@ -21,7 +21,7 @@ import type {
import { DEFAULT_INSTANCE_FEATURES } from '../../schemas/instance-config';
import type { FlyVolume, FlyVolumeSnapshot } from '../../fly/types';
import * as fly from '../../fly/client';
import { sandboxIdFromUserId } from '../../auth/sandbox-id';
import { sandboxIdFromUserId, sandboxIdFromInstanceId } from '../../auth/sandbox-id';
import { resolveLatestVersion, resolveVersionByTag } from '../../lib/image-version';
import { lookupCatalogVersion } from '../../lib/catalog-registration';
import { ImageVariantSchema } from '../../schemas/image-version';
@@ -199,7 +199,11 @@ export class KiloClawInstance extends DurableObject<KiloClawEnv> {
// Lifecycle methods (called by platform API routes via RPC)
// ========================================================================

async provision(userId: string, config: InstanceConfig): Promise<{ sandboxId: string }> {
async provision(
userId: string,
config: InstanceConfig,
opts?: { orgId?: string | null; instanceId?: string }
): Promise<{ sandboxId: string }> {
const provisionStart = performance.now();
await this.loadState();

@@ -210,7 +214,11 @@ export class KiloClawInstance extends DurableObject<KiloClawEnv> {
throw new Error('Cannot provision: instance is restoring from snapshot');
}

const sandboxId = sandboxIdFromUserId(userId);
// For instance-keyed DOs (instanceId provided), derive sandboxId from instanceId.
// For legacy userId-keyed DOs, derive from userId.
const sandboxId = opts?.instanceId
? sandboxIdFromInstanceId(opts.instanceId)
: sandboxIdFromUserId(userId);
const isNew = !this.s.status;

// Ensure per-user Fly App exists on first provision only.
Expand Down Expand Up @@ -322,6 +330,7 @@ export class KiloClawInstance extends DurableObject<KiloClawEnv> {
const configFields = {
userId,
sandboxId,
orgId: opts?.orgId ?? null,
status: (this.s.status ?? 'provisioned') satisfies InstanceStatus,
envVars: config.envVars ?? null,
encryptedSecrets: config.encryptedSecrets ?? null,
@@ -371,6 +380,7 @@ export class KiloClawInstance extends DurableObject<KiloClawEnv> {

this.s.userId = userId;
this.s.sandboxId = sandboxId;
this.s.orgId = opts?.orgId ?? null;
this.s.status = this.s.status ?? 'provisioned';
this.s.envVars = config.envVars ?? null;
this.s.encryptedSecrets = config.encryptedSecrets ?? null;
@@ -1278,6 +1288,7 @@ export class KiloClawInstance extends DurableObject<KiloClawEnv> {
async getStatus(): Promise<{
userId: string | null;
sandboxId: string | null;
orgId: string | null;
status: InstanceStatus | null;
provisionedAt: number | null;
lastStartedAt: number | null;
@@ -1314,6 +1325,7 @@ export class KiloClawInstance extends DurableObject<KiloClawEnv> {
return {
userId: this.s.userId,
sandboxId: this.s.sandboxId,
orgId: this.s.orgId,
status: this.s.status,
provisionedAt: this.s.provisionedAt,
lastStartedAt: this.s.lastStartedAt,
@@ -1374,6 +1386,7 @@ export class KiloClawInstance extends DurableObject<KiloClawEnv> {
async getDebugState(): Promise<{
userId: string | null;
sandboxId: string | null;
orgId: string | null;
status: InstanceStatus | null;
provisionedAt: number | null;
lastStartedAt: number | null;
@@ -1417,6 +1430,7 @@ export class KiloClawInstance extends DurableObject<KiloClawEnv> {
return {
userId: this.s.userId,
sandboxId: this.s.sandboxId,
orgId: this.s.orgId,
status: this.s.status,
provisionedAt: this.s.provisionedAt,
lastStartedAt: this.s.lastStartedAt,
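The `provision()` change reduces to one branch: derive the sandboxId from `opts.instanceId` when it is present, otherwise from `userId`, and persist `opts.orgId ?? null`. The sketch isolates that branch as a pure function. `selectProvisionIdentity` is a hypothetical name and both derivation helpers are stand-ins; in particular, treating the legacy sandboxId as unpadded base64url of the UTF-8 userId is an assumption.

```typescript
// Hypothetical stand-ins for the helpers imported by kiloclaw-instance/index.ts.

/** Unpadded base64url (assumption about the legacy sandboxId encoding). */
function base64url(bytes: Uint8Array): string {
  let binary = '';
  bytes.forEach(b => {
    binary += String.fromCharCode(b);
  });
  return btoa(binary).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

function sandboxIdFromUserId(userId: string): string {
  return base64url(new TextEncoder().encode(userId));
}

function sandboxIdFromInstanceId(instanceId: string): string {
  return `ki_${instanceId.replace(/-/g, '')}`;
}

type ProvisionOpts = { orgId?: string | null; instanceId?: string };

/** The identity branch added to provision(): instanceId wins, else the legacy userId path. */
function selectProvisionIdentity(
  userId: string,
  opts?: ProvisionOpts
): { sandboxId: string; orgId: string | null } {
  const sandboxId = opts?.instanceId
    ? sandboxIdFromInstanceId(opts.instanceId)
    : sandboxIdFromUserId(userId);
  // Persisted as this.s.orgId and later passed to buildEnvVars via buildUserEnvVars.
  return { sandboxId, orgId: opts?.orgId ?? null };
}
```

Because the branch tests truthiness, an empty-string `instanceId` falls back to the legacy userId path, just as the `opts?.instanceId ? ... : ...` expression in the diff does.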
3 changes: 3 additions & 0 deletions kiloclaw/src/durable-objects/kiloclaw-instance/state.ts
@@ -29,6 +29,7 @@ export async function loadState(ctx: DurableObjectState, s: InstanceMutableState
const d = parsed.data;
s.userId = d.userId || null;
s.sandboxId = d.sandboxId || null;
s.orgId = d.orgId;
s.status = d.userId ? d.status : null;
s.envVars = d.envVars;
s.encryptedSecrets = d.encryptedSecrets;
@@ -103,6 +104,7 @@ export async function loadState(ctx: DurableObjectState, s: InstanceMutableState
export function resetMutableState(s: InstanceMutableState): void {
s.userId = null;
s.sandboxId = null;
s.orgId = null;
s.status = null;
s.envVars = null;
s.encryptedSecrets = null;
@@ -168,6 +170,7 @@ export function createMutableState(): InstanceMutableState {
loaded: false,
userId: null,
sandboxId: null,
orgId: null,
status: null,
envVars: null,
encryptedSecrets: null,
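`loadState` copies `d.orgId` straight into mutable state, which is only safe if the persisted-state schema yields `string | null` for records written before `orgId` existed. Whether the real schema already defaults `orgId` is not visible in this diff; the hypothetical normaliser below makes the expectation explicit in the same `|| null` style that `loadState` already uses for `userId` and `sandboxId`.

```typescript
// Hypothetical shape of a parsed persisted record; the real PersistedState
// schema is defined elsewhere in the worker.
type PersistedIdentity = {
  userId?: string | null;
  sandboxId?: string | null;
  orgId?: string | null;
};

type LoadedIdentity = { userId: string | null; sandboxId: string | null; orgId: string | null };

/** Map missing or empty identity fields to null, mirroring `d.userId || null` in loadState. */
function loadIdentity(d: PersistedIdentity): LoadedIdentity {
  return {
    userId: d.userId || null,
    sandboxId: d.sandboxId || null,
    // Records persisted before orgId existed have no key at all: undefined becomes null.
    orgId: d.orgId || null,
  };
}
```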
1 change: 1 addition & 0 deletions kiloclaw/src/durable-objects/kiloclaw-instance/types.ts
@@ -41,6 +41,7 @@ export type InstanceMutableState = {
loaded: boolean;
userId: string | null;
sandboxId: string | null;
orgId: string | null;
status: InstanceStatus | null;
envVars: PersistedState['envVars'];
encryptedSecrets: PersistedState['encryptedSecrets'];
7 changes: 7 additions & 0 deletions kiloclaw/src/gateway/env.ts
@@ -30,6 +30,8 @@ export type UserConfig = {
instanceFeatures?: string[];
execSecurity?: string | null;
execAsk?: string | null;
/** Organization ID — injected as KILOCODE_ORGANIZATION_ID for org instances. */
orgId?: string | null;
customSecretMeta?: Record<string, { configPath?: string }> | null;
};

@@ -188,6 +190,11 @@ export async function buildEnvVars(
}
}

// Org identity (non-sensitive, plaintext)
if (userConfig?.orgId) {
plainEnv.KILOCODE_ORGANIZATION_ID = userConfig.orgId;
}

// Worker-level passthrough (non-sensitive)
if (env.TELEGRAM_DM_POLICY) plainEnv.TELEGRAM_DM_POLICY = env.TELEGRAM_DM_POLICY;
if (env.DISCORD_DM_POLICY) plainEnv.DISCORD_DM_POLICY = env.DISCORD_DM_POLICY;
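`buildEnvVars` injects `KILOCODE_ORGANIZATION_ID` into the plaintext env only when `orgId` is truthy, so personal instances (null or absent `orgId`) get no such variable. A minimal sketch of that step in isolation; `applyOrgIdentity` and `UserConfigLike` are hypothetical names, not exports of `src/gateway/env.ts`.

```typescript
// Hypothetical isolation of the org-identity step in buildEnvVars.
type UserConfigLike = { orgId?: string | null };

function applyOrgIdentity(plainEnv: Record<string, string>, userConfig?: UserConfigLike): void {
  // Non-sensitive, so it goes to the plaintext env rather than the KILOCLAW_ENC_ set.
  if (userConfig?.orgId) {
    plainEnv['KILOCODE_ORGANIZATION_ID'] = userConfig.orgId;
  }
}
```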