
Lazy agent assignment — scheduler assigns agents when beads are ready, not at creation #1249

@jrf0110

Parent

Part of #204 (Phase 4: Hardening)

Problem

Agent assignment (hooking) is eager — it happens at bead creation time, not when the bead is ready to be worked on. slingBead() and slingConvoy() both call getOrCreateAgent() + hookBead() immediately for every bead, including beads blocked by dependencies.

For a 5-bead convoy (A → B → C → D → E), this creates 5 polecats at sling time. Only A can run — B through E sit idle with agents hooked but not dispatched, consuming name pool slots and unable to be reused for other work.
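
A minimal sketch of that difference, using a hypothetical in-memory bead model. The Bead shape and the helper names are assumptions for illustration, not the real Town.do.ts types. Eager assignment hooks every open bead at sling time. Lazy assignment only needs agents for beads whose blockers have all closed.

```typescript
// Hypothetical in-memory model; not the real Town.do.ts schema.
type Status = "open" | "in_progress" | "closed";

interface Bead {
  id: string;
  blockedBy: string[]; // ids of beads that must close before this one can run
  status: Status;
}

// Builds a serial chain A -> B -> C -> ..., where each bead blocks the next.
function serialConvoy(ids: string[]): Bead[] {
  return ids.map((id, i): Bead => ({
    id,
    blockedBy: i === 0 ? [] : [ids[i - 1]],
    status: "open",
  }));
}

// A bead is runnable only when every blocker is closed.
function isUnblocked(bead: Bead, byId: Map<string, Bead>): boolean {
  return bead.blockedBy.every((dep) => byId.get(dep)?.status === "closed");
}

// Eager (today): every open bead gets an agent hooked at sling time.
function agentsHookedEager(beads: Bead[]): number {
  return beads.filter((b) => b.status === "open").length;
}

// Lazy (proposed): only open beads with no open blockers need an agent.
function agentsHookedLazy(beads: Bead[]): number {
  const byId = new Map<string, Bead>();
  for (const b of beads) byId.set(b.id, b);
  return beads.filter((b) => b.status === "open" && isUnblocked(b, byId)).length;
}

const convoy = serialConvoy(["A", "B", "C", "D", "E"]);
console.log(agentsHookedEager(convoy)); // 5
console.log(agentsHookedLazy(convoy)); // 1
```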

The scheduler (schedulePendingWork in scheduling.ts:250-321) never assigns agents to beads — it only dispatches agents that are already hooked. It's a retry mechanism, not a scheduling mechanism.

Why This Is Wrong

  1. Wasted agent slots — idle polecats hooked to blocked beads can't accept other work. A convoy of 10 beads ties up 10 agents even though only 1-3 are running at any time.
  2. Name pool exhaustion — the 20-name polecat pool fills up with idle agents. After 20 concurrent hooked beads (across all rigs), new polecats get generic Polecat-N names.
  3. Premature commitment — if the convoy is cancelled after bead A completes, 4 agents were created for nothing. If bead priorities change, the hooked agent can't be reassigned without unhooking.
  4. Blocks serial reuse — the persona/serial convoy optimization (#447, Cloud Gastown: Future ideas) wants to re-hook the agent that just completed bead A to bead B. Eager assignment hooks a different agent to B at sling time, before A even starts.
  5. Inconsistency — staged convoys (staged: true) correctly defer agent assignment. Non-staged convoys and slingBead don't.
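
To make the name-pool point concrete, here is a sketch of a fixed pool with a generic fallback. The placeholder pool contents, the exact Polecat-N numbering, and allocateNameSketch itself are assumptions for illustration; this is not the real allocatePolecatName().

```typescript
// Hypothetical stand-in for polecat name allocation; not allocatePolecatName().
function allocateNameSketch(pool: readonly string[], inUse: Set<string>): string {
  // Prefer the first pool name that no live agent is holding.
  const free = pool.find((name) => !inUse.has(name));
  if (free !== undefined) {
    inUse.add(free);
    return free;
  }
  // Pool exhausted: fall back to the lowest unused generic Polecat-N name.
  let n = 1;
  while (inUse.has(`Polecat-${n}`)) n++;
  const generic = `Polecat-${n}`;
  inUse.add(generic);
  return generic;
}

// A 20-name pool with placeholder names.
const pool = Array.from({ length: 20 }, (_, i) => `Name${i + 1}`);
const inUse = new Set<string>();

// Eagerly hooking five serial 5-bead convoys allocates 25 names, even though
// only five of those beads (one per convoy) can actually run.
const names = Array.from({ length: 25 }, () => allocateNameSketch(pool, inUse));
console.log(names[19]); // Name20
console.log(names[20]); // Polecat-1
```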

Current Assignment Points

| Path | What it does | Should it assign? |
|---|---|---|
| slingBead() (Town.do.ts:1600-1601) | Creates agent, hooks immediately | No; the scheduler should handle it |
| slingConvoy() non-staged (Town.do.ts:2269-2270) | Hooks an agent to ALL beads, including blocked ones | No; only unblocked beads need agents |
| slingConvoy() staged (Town.do.ts:2258-2264) | Creates beads with agent: null | Correct (deferred) |
| startConvoy() (Town.do.ts:2349-2355) | Hooks agents to ALL beads on start | Should hook only unblocked beads |
| schedulePendingWork() (scheduling.ts:250-321) | Never assigns; only dispatches already-hooked agents | Should assign unhooked, unblocked beads |
| dispatchUnblockedBeads() (scheduling.ts:214-237) | Dispatches pre-hooked agents when blockers close | Should also assign if no agent is hooked |
| feedStrandedConvoys() (patrol.ts:564-565) | Safety net; hooks orphaned convoy beads | Keep as safety net |

Solution: Lazy Assignment via the Scheduler

1. Remove eager assignment from slingBead and slingConvoy

slingBead() should:

  • Create the bead with status = 'open', assignee_agent_bead_id = null
  • Arm the alarm to trigger schedulePendingWork on the next tick
  • Not call getOrCreateAgent or hookBead

slingConvoy() (non-staged) should follow the same pattern as staged — create all beads with no agents.
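
A minimal sketch of the lazy creation path, assuming a simplified bead row and an in-memory stand-in for the alarm. The BeadRow shape, TownSketch class, and armAlarm method are hypothetical; the real Town.do.ts code and its Durable Object alarm handling may look different. Dependency rows are left out to keep it short.

```typescript
// Hypothetical row shape and alarm hook; the real Town.do.ts code may differ.
type BeadStatus = "open" | "in_progress" | "closed" | "failed";

interface BeadRow {
  bead_id: string;
  type: "issue" | "molecule";
  status: BeadStatus;
  assignee_agent_bead_id: string | null;
  priority: number;
  created_at: number;
}

class TownSketch {
  readonly beads = new Map<string, BeadRow>();
  alarmArmed = false;
  private nextId = 1;

  // Lazy slingBead: persist the bead unassigned and wake the scheduler.
  // There is deliberately no getOrCreateAgent() or hookBead() call here.
  slingBead(type: BeadRow["type"], priority = 0): BeadRow {
    const bead: BeadRow = {
      bead_id: `bead-${this.nextId++}`,
      type,
      status: "open",
      assignee_agent_bead_id: null,
      priority,
      created_at: Date.now(),
    };
    this.beads.set(bead.bead_id, bead);
    this.armAlarm();
    return bead;
  }

  // Non-staged slingConvoy now matches the staged path: every bead starts
  // with no agent. (Dependency rows are omitted from this sketch.)
  slingConvoy(types: BeadRow["type"][]): BeadRow[] {
    return types.map((t) => this.slingBead(t));
  }

  // Stand-in for arming the next alarm tick, which runs schedulePendingWork.
  private armAlarm(): void {
    this.alarmArmed = true;
  }
}

const town = new TownSketch();
const convoyBeads = town.slingConvoy(["issue", "issue", "molecule"]);
console.log(convoyBeads.every((b) => b.assignee_agent_bead_id === null)); // true
console.log(town.alarmArmed); // true
```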

2. Extend schedulePendingWork to assign unhooked beads

Currently, the scheduler only queries for idle agents with current_hook_bead_id IS NOT NULL. Add a second query:

```sql
SELECT b.* FROM beads b
WHERE b.status = 'open'
  AND b.assignee_agent_bead_id IS NULL
  AND b.type IN ('issue', 'molecule')
  AND NOT EXISTS (
    SELECT 1 FROM bead_dependencies bd
    JOIN beads blocker ON bd.depends_on_bead_id = blocker.bead_id
    WHERE bd.bead_id = b.bead_id
      AND bd.dependency_type = 'blocks'
      AND blocker.status NOT IN ('closed', 'failed')
  )
ORDER BY b.priority DESC, b.created_at ASC
```
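
The same readiness rule can be expressed over in-memory rows, which may be handy for unit-testing the predicate without a database. This is a sketch: the row interfaces below mirror only the columns the query uses and are assumptions about the real schema.

```typescript
// Hypothetical row shapes mirroring the columns used in the SQL above.
type Status = "open" | "in_progress" | "closed" | "failed";

interface Bead {
  bead_id: string;
  type: string;
  status: Status;
  assignee_agent_bead_id: string | null;
  priority: number;
  created_at: number;
}

interface Dependency {
  bead_id: string;            // the dependent bead
  depends_on_bead_id: string; // the blocker
  dependency_type: string;    // only 'blocks' gates scheduling
}

function readyUnassignedBeads(beads: Bead[], deps: Dependency[]): Bead[] {
  const byId = new Map<string, Bead>();
  for (const b of beads) byId.set(b.bead_id, b);

  // Blocked = some 'blocks' dependency points at a blocker that is neither
  // closed nor failed. Like the inner JOIN, a missing blocker row is ignored.
  const isBlocked = (b: Bead): boolean =>
    deps.some((d) => {
      if (d.bead_id !== b.bead_id || d.dependency_type !== "blocks") return false;
      const blocker = byId.get(d.depends_on_bead_id);
      return blocker !== undefined && blocker.status !== "closed" && blocker.status !== "failed";
    });

  return beads
    .filter(
      (b) =>
        b.status === "open" &&
        b.assignee_agent_bead_id === null &&
        (b.type === "issue" || b.type === "molecule") &&
        !isBlocked(b),
    )
    // ORDER BY priority DESC, created_at ASC
    .sort((a, b) => b.priority - a.priority || a.created_at - b.created_at);
}

const rows: Bead[] = [
  { bead_id: "A", type: "issue", status: "closed", assignee_agent_bead_id: "agent-1", priority: 0, created_at: 1 },
  { bead_id: "B", type: "issue", status: "open", assignee_agent_bead_id: null, priority: 0, created_at: 2 },
  { bead_id: "C", type: "issue", status: "open", assignee_agent_bead_id: null, priority: 0, created_at: 3 },
  { bead_id: "D", type: "molecule", status: "open", assignee_agent_bead_id: null, priority: 5, created_at: 4 },
];
const links: Dependency[] = [
  { bead_id: "B", depends_on_bead_id: "A", dependency_type: "blocks" },
  { bead_id: "C", depends_on_bead_id: "B", dependency_type: "blocks" },
];
console.log(readyUnassignedBeads(rows, links).map((b) => b.bead_id).join(",")); // D,B
```

Here C stays out because its blocker B is still open, and D sorts ahead of B on priority.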

For each unblocked, unassigned bead:

  1. getOrCreateAgent() — find or create a polecat
  2. hookBead() — assign the agent
  3. dispatchAgent() — start the work

This is the same hook + dispatch that slingBead does today, just moved to the scheduler where it belongs.
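
A sketch of the extended scheduler loop, with its dependencies injected so the control flow is visible on its own. Only the names getOrCreateAgent, hookBead, and dispatchAgent come from this issue; the SchedulerOps interface, the query helpers, and all signatures are assumptions.

```typescript
// Hypothetical shapes; only the function names come from the issue.
interface Agent {
  agent_id: string;
  current_hook_bead_id: string | null;
}

interface ReadyBead {
  bead_id: string;
}

interface SchedulerOps {
  findIdleHookedAgents(): Agent[];         // today's query
  findReadyUnassignedBeads(): ReadyBead[]; // the new query above
  getOrCreateAgent(): Agent;
  hookBead(agent: Agent, bead: ReadyBead): void;
  dispatchAgent(agent: Agent): void;
}

function schedulePendingWorkSketch(ops: SchedulerOps): { retried: number; assigned: number } {
  // Pass 1 (existing behaviour): retry dispatch for agents that are already hooked.
  const hooked = ops.findIdleHookedAgents();
  for (const agent of hooked) ops.dispatchAgent(agent);

  // Pass 2 (new): give each ready, unassigned bead an agent, then dispatch it.
  const ready = ops.findReadyUnassignedBeads();
  for (const bead of ready) {
    const agent = ops.getOrCreateAgent();
    ops.hookBead(agent, bead);
    ops.dispatchAgent(agent);
  }
  return { retried: hooked.length, assigned: ready.length };
}

const calls: string[] = [];
let created = 0;
const result = schedulePendingWorkSketch({
  findIdleHookedAgents: () => [{ agent_id: "polecat-0", current_hook_bead_id: "A" }],
  findReadyUnassignedBeads: () => [{ bead_id: "B" }],
  getOrCreateAgent: () => ({ agent_id: `polecat-${++created}`, current_hook_bead_id: null }),
  hookBead: (agent, bead) => {
    agent.current_hook_bead_id = bead.bead_id;
    calls.push(`hook ${agent.agent_id} ${bead.bead_id}`);
  },
  dispatchAgent: (agent) => {
    calls.push(`dispatch ${agent.agent_id}`);
  },
});
console.log(calls.join("; ")); // dispatch polecat-0; hook polecat-1 B; dispatch polecat-1
```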

3. Extend dispatchUnblockedBeads to handle unhooked beads

When a blocker closes and beads become unblocked, dispatchUnblockedBeads() currently assumes the bead already has an agent (checks assignee_agent_bead_id). It should also handle the case where the bead has no agent — call getOrCreateAgent + hookBead before dispatching.
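
A sketch of that extra branch, assuming the function receives the id of the blocker that just closed. The Bead and DispatchOps shapes and the signatures are hypothetical; the point is the `assignee === null` branch that assigns lazily before dispatching.

```typescript
// Hypothetical shapes; names mirror the issue, signatures are assumptions.
interface Agent {
  agent_id: string;
}

interface Bead {
  bead_id: string;
  status: "open" | "in_progress" | "closed" | "failed";
  assignee: Agent | null;
  blockedBy: string[]; // ids of 'blocks' dependencies
}

interface DispatchOps {
  getOrCreateAgent(): Agent;
  hookBead(agent: Agent, bead: Bead): void;
  dispatchAgent(agent: Agent): void;
}

// Called when `closedBlockerId` closes; returns the ids of beads dispatched.
function dispatchUnblockedBeadsSketch(closedBlockerId: string, beads: Bead[], ops: DispatchOps): string[] {
  const byId = new Map<string, Bead>();
  for (const b of beads) byId.set(b.bead_id, b);

  const dispatched: string[] = [];
  for (const bead of beads) {
    if (bead.status !== "open" || bead.blockedBy.indexOf(closedBlockerId) === -1) continue;

    // Skip beads that still have another unfinished blocker.
    const stillBlocked = bead.blockedBy.some((id) => {
      const status = byId.get(id)?.status;
      return status !== undefined && status !== "closed" && status !== "failed";
    });
    if (stillBlocked) continue;

    // Existing behaviour dispatches the pre-hooked agent; the new branch
    // assigns one lazily when nothing was hooked at sling time.
    let agent = bead.assignee;
    if (agent === null) {
      agent = ops.getOrCreateAgent();
      ops.hookBead(agent, bead);
    }
    ops.dispatchAgent(agent);
    dispatched.push(bead.bead_id);
  }
  return dispatched;
}

let created = 0;
const hooks: string[] = [];
const beads: Bead[] = [
  { bead_id: "A", status: "closed", assignee: { agent_id: "toast" }, blockedBy: [] },
  { bead_id: "B", status: "open", assignee: null, blockedBy: ["A"] },
  { bead_id: "C", status: "open", assignee: null, blockedBy: ["B"] },
];
const ids = dispatchUnblockedBeadsSketch("A", beads, {
  getOrCreateAgent: () => ({ agent_id: `polecat-${++created}` }),
  hookBead: (agent, bead) => {
    bead.assignee = agent;
    hooks.push(`${agent.agent_id}->${bead.bead_id}`);
  },
  dispatchAgent: () => {},
});
console.log(ids.join(",")); // B
console.log(hooks.join(",")); // polecat-1->B
```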

4. Keep feedStrandedConvoys as a safety net

The patrol function that catches beads with no agent should remain — it's the catch-all for any edge case where the scheduler misses a bead.

Benefits

  • Agents only created when needed — a 10-bead serial convoy creates 1 agent at a time, not 10
  • Name pool preserved — polecats aren't wasted on blocked beads
  • Enables serial reuse — when bead A completes and bead B becomes unblocked, the scheduler can prefer re-hooking A's idle agent to B (the persona/serial optimization from #447, Cloud Gastown: Future ideas)
  • Consistent with staged convoys — all creation paths behave the same way
  • Cancellation is clean — cancel a convoy and no unnecessary agents exist
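
One way the scheduler could express the serial-reuse preference is a small agent picker. This is only a sketch of the #447 idea. The preference order, the rig matching, and the last_bead_id field are assumptions, not existing behaviour.

```typescript
// Hypothetical agent shape and preference order; a sketch of the #447 idea.
interface Agent {
  agent_id: string;
  rig: string;
  idle: boolean;
  last_bead_id: string | null; // bead this agent most recently completed
}

interface ReadyBead {
  bead_id: string;
  rig: string;
  blockedBy: string[];
}

// Preference order (assumed): 1) an idle agent that just finished one of this
// bead's blockers (serial reuse), 2) any idle agent in the same rig,
// 3) create a new polecat.
function pickAgent(bead: ReadyBead, agents: Agent[], create: () => Agent): Agent {
  const idle = agents.filter((a) => a.idle && a.rig === bead.rig);
  const serialMatch = idle.find(
    (a) => a.last_bead_id !== null && bead.blockedBy.indexOf(a.last_bead_id) !== -1,
  );
  if (serialMatch !== undefined) return serialMatch;
  if (idle.length > 0) return idle[0];
  return create();
}

const agents: Agent[] = [
  { agent_id: "nux", rig: "web", idle: true, last_bead_id: "X" },
  { agent_id: "toast", rig: "web", idle: true, last_bead_id: "A" },
];
let createdCount = 0;
const chosen = pickAgent({ bead_id: "B", rig: "web", blockedBy: ["A"] }, agents, () => {
  createdCount++;
  return { agent_id: "new", rig: "web", idle: true, last_bead_id: null };
});
console.log(chosen.agent_id); // toast
```

Preferring the blocker's agent over any idle agent keeps a serial convoy on one polecat, so a 10-bead chain would create a single agent in total rather than one per bead.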

Acceptance Criteria

  • slingBead() creates beads with no agent — no getOrCreateAgent or hookBead
  • slingConvoy() (non-staged) creates beads with no agent
  • schedulePendingWork() picks up unblocked, unassigned beads and assigns + dispatches them
  • dispatchUnblockedBeads() handles beads with no pre-assigned agent
  • Serial convoy beads get agents one at a time as they become unblocked
  • No regression: beads still get dispatched promptly (within one alarm tick of becoming unblocked)
  • feedStrandedConvoys safety net retained

Notes

  • No data migration needed
  • The alarm interval (5s active, 30s idle) means beads may wait up to one tick before getting an agent. This is acceptable — the current fire-and-forget dispatch already has this latency when it fails.
  • The slingBead fire-and-forget dispatch (Town.do.ts:1609) is an optimization to avoid waiting for the next alarm tick. We could keep an optional "dispatch immediately if possible" path while still making the scheduler the primary assignment mechanism.
  • This is a prerequisite for the serial convoy agent reuse optimization in #447 (Cloud Gastown: Future ideas)
  • Preserving the "Toast is on it!" UX: For single beads with no blockers, the gt_sling handler can run the scheduler inline — assign an agent and dispatch synchronously, returning the agent info in the response. The Mayor gets the agent name immediately and can say "Toast is on it!" without any async delay. For convoy beads and blocked beads, the response returns agent: null and the Mayor announces assignments as they happen via hooked events. This gives us the best of both worlds: lazy assignment as the architectural pattern, but instant feedback for the common single-bead case.
  • Agent name uniqueness is cosmetic, not functional: allocatePolecatName() enforces unique names per rig, but the real identifier is the bead_id UUID. There's no technical reason you can't have multiple "Toasts"; names could be display names rather than unique identifiers. This becomes more interesting with the personas concept in #447 (Cloud Gastown: Future ideas), where "Toast the Frontend Master" is meaningful and you might want 3 of them running in parallel.
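
The "Toast is on it!" note above can be sketched as a handler that always creates the bead lazily and only runs an inline scheduler pass for standalone, unblocked beads. The request and response shapes and the SlingOps methods are hypothetical; the issue does not show gt_sling's real types.

```typescript
// Hypothetical request/response shapes; gt_sling's real types are not shown in this issue.
interface SlingRequest {
  title: string;
  convoyId: string | null;
  blockedBy: string[];
}

interface SlingResponse {
  bead_id: string;
  agent: { name: string } | null; // null means "announce later via hooked events"
}

interface SlingOps {
  createUnassignedBead(req: SlingRequest): string; // lazy slingBead, returns bead_id
  armAlarm(): void;
  assignAndDispatchNow(beadId: string): { name: string }; // one inline scheduler pass
}

function handleGtSlingSketch(req: SlingRequest, ops: SlingOps): SlingResponse {
  const beadId = ops.createUnassignedBead(req);
  // Always arm the alarm so the scheduler stays the primary path and the fallback.
  ops.armAlarm();

  const standalone = req.convoyId === null && req.blockedBy.length === 0;
  if (!standalone) {
    // Convoy or blocked bead: the scheduler assigns it when it becomes ready.
    return { bead_id: beadId, agent: null };
  }
  // Common single-bead case: assign inline so the Mayor can name the agent now.
  return { bead_id: beadId, agent: ops.assignAndDispatchNow(beadId) };
}

let inlineCalls = 0;
const ops: SlingOps = {
  createUnassignedBead: () => "bead-1",
  armAlarm: () => {},
  assignAndDispatchNow: () => {
    inlineCalls++;
    return { name: "Toast" };
  },
};
const single = handleGtSlingSketch({ title: "Fix login", convoyId: null, blockedBy: [] }, ops);
const blocked = handleGtSlingSketch({ title: "Deploy", convoyId: null, blockedBy: ["bead-1"] }, ops);
console.log(single.agent?.name); // Toast
console.log(blocked.agent); // null
```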

Metadata

Assignees

No one assigned

    Labels

    P1 (Should fix before soft launch), enhancement (New feature or request), gt:core (Reconciler, state machine, bead lifecycle, convoy flow), kilo-auto-fix (Auto-generated label by Kilo), kilo-triaged (Auto-generated label by Kilo)

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests