Simulation MCP tools, game mode, and streaming #17
Open
freesig wants to merge 18 commits into
…grounding Simulations can now be backgrounded (Esc) instead of destroyed, with a global notification bar showing when responses arrive on any screen. Ctrl+s opens a session picker overlay to switch between background sessions. Q permanently ends a simulation.
…n ready The bar now always displays when any simulation is backgrounded, showing the count and status (ready/running) with a Ctrl+s hint. Previously it only appeared when a response notification fired.
Replace verbose footer bars (15+ keys per line) with minimal badge-styled hints showing only 3-5 essential keys. Press ? on any non-text-input screen to open a context-aware help popup listing all available key bindings.
…ction Extract shared orchestration logic from TUI into spec-forest lib so both TUI and MCP can drive simulations. Add fire-and-poll MCP tools: sim_create_session, sim_start, sim_send_input, sim_ask_report, sim_update_scenario, sim_get_status, sim_get_channels, sim_get_report, sim_list_sessions, sim_end.
When explore_code is true, sim_start loads the spec's project directory from the database so the simulation agent can read the codebase. Errors if the spec has no directory set.
Replace sequential request-response simulation with a pre-computed interaction tree. The AI now generates a tree of likely user interactions and their resulting outputs in a single call, so picking a predicted interaction is instant (no AI latency). Custom input falls back to AI generation. When the user reaches a leaf node, the next tree is pre-generated in the background. New MCP tool: sim_get_interactions returns predicted choices at the current position. sim_send_input now returns tree_hit/at_leaf fields. sim_create_session accepts tree_depth (1-3) and tree_branching (2-4).
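The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Interaction` and `InputOutcome` types and the `send_input` function are hypothetical, and only the `tree_hit` and `at_leaf` field names come from the commit message.

```rust
// Hypothetical node type: one predicted interaction with its pre-computed result.
#[derive(Debug)]
struct Interaction {
    label: String,
    result: String,
    children: Vec<Interaction>,
}

// Hypothetical response shape; `tree_hit` and `at_leaf` mirror the fields
// that sim_send_input is said to return.
#[derive(Debug, PartialEq)]
struct InputOutcome {
    tree_hit: bool,
    at_leaf: bool,
    output: Option<String>,
}

// Look the input up among the predicted children of the current node.
// A hit is answered from the tree with no AI call; a miss returns no output,
// and the caller would fall back to AI generation.
fn send_input(current: &Interaction, input: &str) -> InputOutcome {
    match current.children.iter().find(|c| c.label == input) {
        Some(child) => InputOutcome {
            tree_hit: true,
            // A leaf has no further predictions, so the next tree
            // would need to be generated in the background.
            at_leaf: child.children.is_empty(),
            output: Some(child.result.clone()),
        },
        None => InputOutcome { tree_hit: false, at_leaf: false, output: None },
    }
}
```

In this sketch a caller that sees `at_leaf: true` would kick off pre-generation of the next tree, and one that sees `tree_hit: false` would send the custom input to the AI.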
Add an interactions panel between decisions and the input area that displays the pre-computed interaction choices from the tree. Users can navigate with arrow keys and press Enter to select, getting an instant response without AI latency. The 'i' key enters custom input mode for interactions not in the tree.
…on, and leaf interactions
- Make PredictedInteraction.result optional so leaf-depth nodes can suggest interactions without pre-computed results
- Change defaults to depth=4, branching=2 for deeper exploration (31 nodes)
- Add Backspace back-navigation through the interaction tree
- Eagerly pre-generate subtrees when the most likely path leads to a dead end, grafting results onto the existing tree instead of replacing it
- Show spinner + "Expanding tree..." during background pregeneration
- Dim shallow interactions with "(generates)" suffix in the interactions panel
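The 31-node figure is consistent with counting every level of a full tree, root included: 1 + 2 + 4 + 8 + 16. A small sketch of that arithmetic, where `tree_node_count` is a hypothetical helper and the assumption is that every node has exactly `branching` children:

```rust
// Number of nodes in a full tree, counting the root as level 0.
// Assumes every non-leaf node has exactly `branching` children.
fn tree_node_count(depth: u32, branching: u64) -> u64 {
    (0..=depth).map(|level| branching.pow(level)).sum()
}
```

With the new defaults, `tree_node_count(4, 2)` gives 31.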
Game mode lets users play through their spec by choosing interaction+outcome pairs. Each choice feeds back into the spec DAG, iteratively refining the specification through play. Players can also reject outcomes with corrections.
- New data types: GameChoiceGroup, GameOutcome, GameTreeRoot, GameSpecUpdate
- Game-specific AI prompts presenting alternative outcomes per interaction
- Backend orchestration for game turns with background spec updates
- 4 new MCP tools: game_get_choices, game_select_outcome, game_reject_outcome, game_get_spec_updates
- TUI: grouped choices panel, reject overlay, spec update log overlay
- Channel picker toggle (g key) to enable game mode
Update game mode cardinal rule and tree rules to steer the AI toward interactions that expose genuine spec ambiguity rather than obvious outcomes. At least half of interactions should target decision points where the player's choice resolves a meaningful specification question.
Show a breadcrumb bar (Start > Login > Email > Submit) in the TUI simulation view, allowing users to see their full interaction path and jump to any previous point. Press 'b' to focus the trail, use left/right to select, Enter to jump. Also exposes breadcrumbs via the sim_get_status MCP tool and adds a sim_navigate_to tool.
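One way to model the trail is a plain vector of labels. The sketch below uses a hypothetical `Breadcrumbs` type, and it assumes that jumping back discards the crumbs after the selected one; the PR does not say whether the later path is kept.

```rust
// Hypothetical breadcrumb trail: the labels of the interactions taken so far.
struct Breadcrumbs {
    trail: Vec<String>,
}

impl Breadcrumbs {
    fn new(start: &str) -> Self {
        Breadcrumbs { trail: vec![start.to_string()] }
    }

    // Record one more step along the interaction path.
    fn push(&mut self, label: &str) {
        self.trail.push(label.to_string());
    }

    // Render in the "Start > Login > Email > Submit" style.
    fn render(&self) -> String {
        self.trail.join(" > ")
    }

    // Jump to the crumb at `index`. Assumption: everything after it is dropped.
    // Returns false and leaves the trail untouched if the index is out of range.
    fn jump_to(&mut self, index: usize) -> bool {
        if index < self.trail.len() {
            self.trail.truncate(index + 1);
            true
        } else {
            false
        }
    }
}
```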
LLM responses sometimes include prose or markdown code fences around JSON, causing parse failures in simulation and game mode. Strengthen all prompts to explicitly forbid non-JSON output and extract a shared generic `extract_json<T>()` helper to replace duplicated 4-step extraction logic across all four parse functions.
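A rough sketch of what such a shared helper might look like. The PR does not list its four extraction steps, so the candidates below (whole response, fenced body, outermost braces) are assumptions, and `json_candidates`, `fenced_body`, and `extract_with` are hypothetical names. To stay dependency-free the parser is passed in as a closure; with serde one could pass something like `|s| serde_json::from_str::<T>(s)`.

```rust
// Candidate substrings to try, in order. The exact steps are an assumption.
fn json_candidates(raw: &str) -> Vec<&str> {
    let trimmed = raw.trim();
    let mut out = vec![trimmed]; // 1. the whole response, trimmed
    if let Some(inner) = fenced_body(trimmed) {
        out.push(inner); // 2. the body of the first markdown code fence
    }
    if let (Some(start), Some(end)) = (trimmed.find('{'), trimmed.rfind('}')) {
        if start < end {
            out.push(&trimmed[start..=end]); // 3. first '{' through last '}'
        }
    }
    out
}

// Body of the first fenced block, skipping the optional language tag line.
fn fenced_body(text: &str) -> Option<&str> {
    let fence = "`".repeat(3);
    let open = text.find(fence.as_str())?;
    let after_open = &text[open + fence.len()..];
    let body_start = after_open.find('\n')? + 1;
    let body = &after_open[body_start..];
    let close = body.find(fence.as_str())?;
    Some(body[..close].trim())
}

// Generic over the target type: return the first candidate that parses.
fn extract_with<T, E>(raw: &str, parse: impl Fn(&str) -> Result<T, E>) -> Option<T> {
    json_candidates(raw).into_iter().find_map(|c| parse(c).ok())
}
```

Keeping the candidate search separate from parsing means each of the four parse functions only has to supply its own target type.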
Simulation init was slow with no way to see what Claude was doing. Switch from --output-format json to stream-json, parse NDJSON to extract the same final result while logging tool calls at info level and all intermediate events at debug level.
Replace cmd.output() with spawn + BufReader line-by-line streaming so stream-json events are logged as they arrive, not after the full response completes. Tool use events now appear immediately in logs.
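The change from buffered to streamed output can be sketched with the standard library alone. The function names `stream_lines` and `run_streaming` are hypothetical, and the callback stands in for whatever logging the PR does per event.

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, ExitStatus, Stdio};

// Hand each non-empty line to `on_line` as soon as it has been read.
// Returns the number of lines delivered.
fn stream_lines<R: BufRead>(reader: R, mut on_line: impl FnMut(&str)) -> std::io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue; // skip blank lines between NDJSON events
        }
        on_line(&line);
        count += 1;
    }
    Ok(count)
}

// Spawn the command with piped stdout and stream it line by line,
// instead of waiting for `Command::output()` to collect everything.
fn run_streaming(mut cmd: Command, on_line: impl FnMut(&str)) -> std::io::Result<ExitStatus> {
    let mut child = cmd.stdout(Stdio::piped()).spawn()?;
    let stdout = child.stdout.take().expect("stdout was configured as piped");
    stream_lines(BufReader::new(stdout), on_line)?;
    child.wait()
}
```

Because `stream_lines` accepts any `BufRead`, the line handling can be exercised against an in-memory buffer without spawning a process.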
Byte-index slicing panicked on multi-byte UTF-8 characters (e.g. '…') when truncating log lines to 200 bytes.
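The panic comes from slicing a `&str` at a byte offset that falls inside a multi-byte character. One common fix, sketched here with a hypothetical `truncate_at_char_boundary` helper (the PR's actual fix may differ), is to back off to the nearest character boundary before slicing:

```rust
// Truncate to at most `max_bytes` bytes without splitting a UTF-8 character.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For example, '…' occupies three bytes, so truncating "ab…" to 3 bytes yields "ab" rather than panicking.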
LLMs struggle to produce valid deeply-nested JSON at scale (~32KB),
causing consistent parse failures like "key must be a string at column
18790". Replace the nested tree format with a flat {nodes, edges}
adjacency list that eliminates nesting entirely.
- Add FlatTree/FlatNode/FlatEdge wire types for Claude's output
- Add flat_to_sim_tree and flat_to_game_tree conversion functions
- Parse functions try flat format first, fall back to nested (legacy)
- Update prompt schemas for both sim and game mode
- Include serde error and full response in parse failure messages
- Add tracing to extract_json for step-by-step diagnostics
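To illustrate the flat-to-nested conversion, here is a minimal sketch. The type names `FlatTree`, `FlatNode`, and `FlatEdge` come from the commit message, but their fields (`id`, `label`, `from`, `to`, `root`), the `TreeNode` output type, and the `flat_to_tree` function are assumptions made for this example; the PR's `flat_to_sim_tree` and `flat_to_game_tree` may be shaped differently.

```rust
use std::collections::HashMap;

// Field names below are assumptions, not the PR's actual wire format.
struct FlatNode { id: String, label: String }
struct FlatEdge { from: String, to: String }
struct FlatTree { root: String, nodes: Vec<FlatNode>, edges: Vec<FlatEdge> }

// Hypothetical nested form that the rest of the program would consume.
#[derive(Debug, PartialEq)]
struct TreeNode { id: String, label: String, children: Vec<TreeNode> }

fn flat_to_tree(flat: &FlatTree) -> Result<TreeNode, String> {
    // Index nodes by id and group edge targets by their source id.
    let nodes: HashMap<&str, &FlatNode> =
        flat.nodes.iter().map(|n| (n.id.as_str(), n)).collect();
    let mut children: HashMap<&str, Vec<&str>> = HashMap::new();
    for e in &flat.edges {
        children.entry(e.from.as_str()).or_default().push(e.to.as_str());
    }
    let mut path = Vec::new();
    build(flat.root.as_str(), &nodes, &children, &mut path)
}

fn build<'a>(
    id: &'a str,
    nodes: &HashMap<&'a str, &'a FlatNode>,
    children: &HashMap<&'a str, Vec<&'a str>>,
    path: &mut Vec<&'a str>,
) -> Result<TreeNode, String> {
    // LLM output is untrusted: reject cycles and edges to unknown nodes.
    if path.contains(&id) {
        return Err(format!("cycle at node {id}"));
    }
    let node = nodes.get(id).ok_or_else(|| format!("unknown node {id}"))?;
    path.push(id);
    let mut kids = Vec::new();
    if let Some(ids) = children.get(id) {
        for &child in ids {
            kids.push(build(child, nodes, children, path)?);
        }
    }
    path.pop();
    Ok(TreeNode { id: node.id.clone(), label: node.label.clone(), children: kids })
}
```

The model only has to emit two shallow arrays, and all nesting is rebuilt locally, where malformed references can be reported with a specific error.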
Summary
Stack
Test plan