Lazy agent assignment — scheduler assigns agents when beads are ready, not at creation #1249
Description
Parent
Part of #204 (Phase 4: Hardening)
Problem
Agent assignment (hooking) is eager — it happens at bead creation time, not when the bead is ready to be worked on. slingBead() and slingConvoy() both call getOrCreateAgent() + hookBead() immediately for every bead, including beads blocked by dependencies.
For a 5-bead convoy (A → B → C → D → E), this creates 5 polecats at sling time. Only A can run — B through E sit idle with agents hooked but not dispatched, consuming name pool slots and unable to be reused for other work.
The scheduler (schedulePendingWork in scheduling.ts:250-321) never assigns agents to beads — it only dispatches agents that are already hooked. It's a retry mechanism, not a scheduling mechanism.
Why This Is Wrong
- Wasted agent slots: idle polecats hooked to blocked beads can't accept other work. A convoy of 10 beads ties up 10 agents even though only 1-3 are running at any time.
- Name pool exhaustion: the 20-name polecat pool fills up with idle agents. After 20 concurrent hooked beads (across all rigs), new polecats get generic `Polecat-N` names.
- Premature commitment: if the convoy is cancelled after bead A completes, 4 agents were created for nothing. If bead priorities change, the hooked agent can't be reassigned without unhooking.
- Blocks serial reuse: the persona/serial convoy optimization in #447 (Cloud Gastown: Future ideas) wants to re-hook the agent that just completed bead A to bead B. Eager assignment hooks a different agent to B at sling time, before A even starts.
- Inconsistency: staged convoys (`staged: true`) correctly defer agent assignment. Non-staged convoys and `slingBead` don't.
Current Assignment Points
| Path | What it does | Should it assign? |
|---|---|---|
| `slingBead()` (Town.do.ts:1600-1601) | Creates agent, hooks immediately | No, the scheduler should handle it |
| `slingConvoy()` non-staged (Town.do.ts:2269-2270) | Hooks agent to ALL beads including blocked | No, only unblocked beads need agents |
| `slingConvoy()` staged (Town.do.ts:2258-2264) | Creates beads with `agent: null` | Correct (deferred) |
| `startConvoy()` (Town.do.ts:2349-2355) | Hooks agents to ALL beads on start | Should only hook unblocked |
| `schedulePendingWork()` (scheduling.ts:250-321) | Never assigns, only dispatches hooked agents | Should assign unhooked+unblocked beads |
| `dispatchUnblockedBeads()` (scheduling.ts:214-237) | Dispatches pre-hooked agents when blockers close | Should also assign if no agent hooked |
| `feedStrandedConvoys()` (patrol.ts:564-565) | Safety net: hooks orphaned convoy beads | Keep as safety net |
Solution: Lazy Assignment via the Scheduler
1. Remove eager assignment from slingBead and slingConvoy
slingBead() should:
- Create the bead with `status = 'open'`, `assignee_agent_bead_id = null`
- Arm the alarm to trigger `schedulePendingWork` on the next tick
- Not call `getOrCreateAgent` or `hookBead`
slingConvoy() (non-staged) should follow the same pattern as staged — create all beads with no agents.
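To make the intended shape concrete, here is a minimal sketch of lazy creation. `TownSketch`, `BeadRecord`, and `armAlarm` are hypothetical stand-ins rather than the real `Town.do.ts` code, and convoy dependencies are omitted for brevity.

```typescript
// Hypothetical, simplified stand-in for the Town durable object.
interface BeadRecord {
  beadId: string;
  title: string;
  status: "open";
  assigneeAgentBeadId: string | null;
}

class TownSketch {
  readonly beads: BeadRecord[] = [];
  alarmArmed = false;
  private nextId = 1;

  // Lazy version: record the bead and wake the scheduler; no agent is touched.
  slingBead(title: string): BeadRecord {
    const bead: BeadRecord = {
      beadId: `bead-${this.nextId++}`,
      title,
      status: "open",
      assigneeAgentBeadId: null,
    };
    this.beads.push(bead);
    this.armAlarm();
    return bead;
  }

  // Same pattern for a non-staged convoy: every bead starts unassigned.
  slingConvoy(titles: string[]): BeadRecord[] {
    return titles.map((title) => this.slingBead(title));
  }

  private armAlarm(): void {
    // Assumption: the real code would schedule schedulePendingWork here.
    this.alarmArmed = true;
  }
}
```

Neither method references an agent, so the scheduler is the only place where assignment can happen.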
2. Extend schedulePendingWork to assign unhooked beads
Currently, the scheduler only queries for idle agents with current_hook_bead_id IS NOT NULL. Add a second query:
```sql
SELECT b.* FROM beads b
WHERE b.status = 'open'
  AND b.assignee_agent_bead_id IS NULL
  AND b.type IN ('issue', 'molecule')
  AND NOT EXISTS (
    SELECT 1 FROM bead_dependencies bd
    JOIN beads blocker ON bd.depends_on_bead_id = blocker.bead_id
    WHERE bd.bead_id = b.bead_id
      AND bd.dependency_type = 'blocks'
      AND blocker.status NOT IN ('closed', 'failed')
  )
ORDER BY b.priority DESC, b.created_at ASC
```

For each unblocked, unassigned bead:

1. `getOrCreateAgent()`: find or create a polecat
2. `hookBead()`: assign the agent
3. `dispatchAgent()`: start the work
This is the same hook + dispatch that slingBead does today, just moved to the scheduler where it belongs.
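The query and the three steps can be sketched against an in-memory model to make the selection rule easy to test. All types here are assumptions for illustration: the `"in_progress"` status, the `"convoy"` bead type, the `"related"` dependency type, and the `AssignmentHooks` signatures are hypothetical and may not match the real schema or helpers.

```typescript
// Hypothetical in-memory mirror of the beads and bead_dependencies tables.
type BeadStatus = "open" | "in_progress" | "closed" | "failed";

interface Bead {
  beadId: string;
  status: BeadStatus;
  type: "issue" | "molecule" | "convoy";
  priority: number;
  createdAt: number;
  assigneeAgentBeadId: string | null;
}

interface Dependency {
  beadId: string; // the blocked bead
  dependsOnBeadId: string; // the blocker
  dependencyType: "blocks" | "related";
}

// Same filter as the SQL: open, unassigned, issue/molecule, no unresolved blocker.
function findAssignableBeads(beads: Bead[], deps: Dependency[]): Bead[] {
  const byId = new Map(beads.map((b): [string, Bead] => [b.beadId, b]));
  const isBlocked = (bead: Bead): boolean =>
    deps.some((d) => {
      if (d.beadId !== bead.beadId || d.dependencyType !== "blocks") return false;
      const blocker = byId.get(d.dependsOnBeadId);
      // Like the inner JOIN, a dependency on a missing bead does not block.
      return blocker !== undefined && blocker.status !== "closed" && blocker.status !== "failed";
    });
  return beads
    .filter(
      (b) =>
        b.status === "open" &&
        b.assigneeAgentBeadId === null &&
        (b.type === "issue" || b.type === "molecule") &&
        !isBlocked(b),
    )
    .sort((a, b) => b.priority - a.priority || a.createdAt - b.createdAt);
}

// Assumed signatures; the real helpers may take different arguments.
interface AssignmentHooks {
  getOrCreateAgent(bead: Bead): string;
  hookBead(agentId: string, bead: Bead): void;
  dispatchAgent(agentId: string, bead: Bead): void;
}

// One scheduler pass: hook and dispatch every ready bead, in priority order.
function assignReadyBeads(beads: Bead[], deps: Dependency[], hooks: AssignmentHooks): string[] {
  const assigned: string[] = [];
  for (const bead of findAssignableBeads(beads, deps)) {
    const agentId = hooks.getOrCreateAgent(bead);
    hooks.hookBead(agentId, bead);
    hooks.dispatchAgent(agentId, bead);
    assigned.push(bead.beadId);
  }
  return assigned;
}
```

For a serial convoy, a pass over this model selects only the head of the chain; the next bead becomes selectable once its blocker is closed or failed.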
3. Extend dispatchUnblockedBeads to handle unhooked beads
When a blocker closes and beads become unblocked, dispatchUnblockedBeads() currently assumes the bead already has an agent (checks assignee_agent_bead_id). It should also handle the case where the bead has no agent — call getOrCreateAgent + hookBead before dispatching.
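A minimal sketch of that fallback, assuming hypothetical `ReadyBead` and `AgentOps` shapes (the real `dispatchUnblockedBeads()` signature and helper arguments are not shown in this issue):

```typescript
// Hypothetical shapes for illustration only.
interface ReadyBead {
  beadId: string;
  assigneeAgentBeadId: string | null;
}

interface AgentOps {
  getOrCreateAgent(): string;
  hookBead(agentId: string, beadId: string): void;
  dispatchAgent(agentId: string, beadId: string): void;
}

// Dispatch newly unblocked beads, assigning an agent first when none is hooked.
function dispatchUnblocked(beads: ReadyBead[], ops: AgentOps): void {
  for (const bead of beads) {
    let agentId = bead.assigneeAgentBeadId;
    if (agentId === null) {
      // New path: the bead was created lazily and has no agent yet.
      agentId = ops.getOrCreateAgent();
      ops.hookBead(agentId, bead.beadId);
      bead.assigneeAgentBeadId = agentId;
    }
    // Existing path: a hooked agent is dispatched as before.
    ops.dispatchAgent(agentId, bead.beadId);
  }
}
```

Beads that were hooked before the change keep working unchanged, which matters while eagerly hooked beads are still in flight.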
4. Keep feedStrandedConvoys as a safety net
The patrol function that catches beads with no agent should remain — it's the catch-all for any edge case where the scheduler misses a bead.
Benefits
- Agents only created when needed — a 10-bead serial convoy creates 1 agent at a time, not 10
- Name pool preserved — polecats aren't wasted on blocked beads
- Enables serial reuse — when bead A completes and bead B becomes unblocked, the scheduler can prefer re-hooking A's idle agent to B (the Cloud Gastown: Future ideas — capabilities unique to or enhanced by the cloud model #447 persona/serial optimization)
- Consistent with staged convoys — all creation paths behave the same way
- Cancellation is clean — cancel a convoy and no unnecessary agents exist
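The difference in agent usage can be estimated with a small model. This is a simplified sketch, not a measurement: `peakAgents` is a hypothetical helper that assumes every ready bead runs and closes within the same wave, and that eager assignment hooks one agent per bead at sling time.

```typescript
// blockers maps each bead id to the ids of the beads that block it.
function peakAgents(blockers: Record<string, string[]>): { eager: number; lazy: number } {
  const ids = Object.keys(blockers);
  const closed = new Set<string>();
  let lazy = 0;
  while (closed.size < ids.length) {
    // A bead is ready when all of its blockers have closed.
    const ready = ids.filter(
      (id) => !closed.has(id) && (blockers[id] ?? []).every((b) => closed.has(b)),
    );
    if (ready.length === 0) throw new Error("dependency cycle or unknown blocker");
    lazy = Math.max(lazy, ready.length);
    for (const id of ready) closed.add(id);
  }
  // Eager assignment hooks one agent per bead up front.
  return { eager: ids.length, lazy };
}

// The 5-bead serial convoy from the Problem section: A → B → C → D → E.
const serial = peakAgents({ A: [], B: ["A"], C: ["B"], D: ["C"], E: ["D"] });
console.log(serial); // { eager: 5, lazy: 1 }
```

Under this model a serial chain never needs more than one hooked agent at a time, while wider dependency graphs need as many agents as their widest wave.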
Acceptance Criteria
- `slingBead()` creates beads with no agent (no `getOrCreateAgent` or `hookBead`)
- `slingConvoy()` (non-staged) creates beads with no agent
- `schedulePendingWork()` picks up unblocked, unassigned beads and assigns + dispatches them
- `dispatchUnblockedBeads()` handles beads with no pre-assigned agent
- Serial convoy beads get agents one at a time as they become unblocked
- No regression: beads still get dispatched promptly (within one alarm tick of becoming unblocked)
- `feedStrandedConvoys` safety net retained
Notes
- No data migration needed
- The alarm interval (5s active, 30s idle) means beads may wait up to one tick before getting an agent. This is acceptable — the current fire-and-forget dispatch already has this latency when it fails.
- The `slingBead` fire-and-forget dispatch (Town.do.ts:1609) is an optimization to avoid waiting for the next alarm tick. We could keep an optional "dispatch immediately if possible" path while still making the scheduler the primary assignment mechanism.
- This is a prerequisite for the serial convoy agent reuse optimization in #447 (Cloud Gastown: Future ideas).
- Preserving the "Toast is on it!" UX: For single beads with no blockers, the `gt_sling` handler can run the scheduler inline: assign an agent and dispatch synchronously, returning the agent info in the response. The Mayor gets the agent name immediately and can say "Toast is on it!" without any async delay. For convoy beads and blocked beads, the response returns `agent: null` and the Mayor announces assignments as they happen via `hooked` events. This gives us the best of both worlds: lazy assignment as the architectural pattern, but instant feedback for the common single-bead case.
- Agent name uniqueness is cosmetic, not functional: `allocatePolecatName()` enforces unique names per rig, but the real identifier is the `bead_id` UUID. There's no technical reason you can't have multiple "Toasts"; names could be display names rather than unique identifiers. This becomes more interesting with the personas concept in #447, where "Toast the Frontend Master" is meaningful and you might want 3 of them running in parallel.
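The inline path from the "Toast is on it!" note could look roughly like the sketch below. `SlingRequest`, `SlingResponse`, `handleSling`, and the `assignInline` callback are hypothetical names; the real `gt_sling` handler's request and response shapes are not shown in this issue.

```typescript
// Hypothetical request/response shapes for the gt_sling handler.
interface SlingRequest {
  blockerIds: string[];
  convoyId: string | null;
}

interface SlingResponse {
  beadId: string;
  agent: string | null;
}

// Assign inline only for a standalone bead with no blockers; otherwise defer
// to the scheduler and report agent: null.
function handleSling(
  beadId: string,
  req: SlingRequest,
  assignInline: (beadId: string) => string,
): SlingResponse {
  const canAssignInline = req.convoyId === null && req.blockerIds.length === 0;
  return { beadId, agent: canAssignInline ? assignInline(beadId) : null };
}
```

Keeping the decision in one predicate means the scheduler stays the primary assignment mechanism, and the inline call is just an early run of the same hook + dispatch.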