fix: disable tool definitions during final answer/summary generation #159
Closed
octo-patch wants to merge 18 commits into
Conversation
- Fix server_name routing: dynamically parse the system prompt to build a tool→server mapping, and auto-correct a wrong server_name in LLM responses
- Fix tool_name hallucination: python/python_code → run_python_code (only when the system prompt defines run_python_code)
- Fix parameter names: code → code_block, add a default sandbox_id
- Fix scrape_and_extract_info params: description → info_to_extract
- Add a stateless sandbox fallback for invalid sandbox_id

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
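A minimal sketch of the correction pass this commit describes. The prompt format parsed by build_tool_server_map, the alias tables, and the call-dict shape are all illustrative assumptions; only the renames themselves (python → run_python_code, code → code_block, description → info_to_extract) come from the commit message.

```python
import re

# Hypothetical aliases for names the model tends to hallucinate.
TOOL_ALIASES = {"python": "run_python_code", "python_code": "run_python_code"}
PARAM_ALIASES = {"run_python_code": {"code": "code_block"},
                 "scrape_and_extract_info": {"description": "info_to_extract"}}

def build_tool_server_map(system_prompt: str) -> dict[str, str]:
    """Scan the system prompt for server/tool blocks (assumed format)."""
    mapping, current_server = {}, None
    for line in system_prompt.splitlines():
        if m := re.match(r"^## Server: (\S+)", line):
            current_server = m.group(1)
        elif (m := re.match(r"^- (\w+)", line)) and current_server:
            mapping[m.group(1)] = current_server
    return mapping

def correct_tool_call(call: dict, system_prompt: str) -> dict:
    tool_map = build_tool_server_map(system_prompt)
    name = call["tool_name"]
    # Rewrite a hallucinated alias only when the real tool is actually defined.
    if name in TOOL_ALIASES and TOOL_ALIASES[name] in tool_map:
        name = call["tool_name"] = TOOL_ALIASES[name]
    # Auto-correct a wrong server_name using the parsed mapping.
    if name in tool_map:
        call["server_name"] = tool_map[name]
    # Rename known-bad parameter keys.
    for old, new in PARAM_ALIASES.get(name, {}).items():
        if old in call.get("arguments", {}):
            call["arguments"][new] = call["arguments"].pop(old)
    return call
```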
Add a description of the best-practice recommendation.
…iroMindAI#137)

Previously, execute_tool_call() spawned a new subprocess and performed a full stdio handshake on every single tool invocation (~400 times per BC task), adding 2-5 minutes of overhead per task. This commit introduces persistent session management to ToolManager:

- Add _get_or_create_session() that lazily opens a stdio/SSE session and keeps it alive in an AsyncExitStack for the lifetime of the ToolManager
- Refactor execute_tool_call() to reuse the cached session instead of opening a new connection per call
- Refactor get_all_tool_definitions() similarly, so sessions opened at startup are immediately available for subsequent tool calls
- Add a close() method to cleanly shut down all sessions (including the browser)
- Call close() in the execute_task_pipeline() finally block to guarantee cleanup

This reduces N subprocess spawns (N = tool calls) down to one per server, matching the approach already used by the playwright server.
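A minimal sketch of the lazy session cache this commit describes, assuming the MCP Python SDK's stdio client. The class and method names mirror the commit message, but the bodies are illustrative, not the repository's actual code.

```python
from contextlib import AsyncExitStack
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

class ToolManager:
    """Illustrative: one live MCP session per server, reused across calls."""

    def __init__(self, server_params: dict[str, StdioServerParameters]):
        self._server_params = server_params
        self._sessions: dict[str, ClientSession] = {}
        self._stack = AsyncExitStack()  # keeps transports alive until close()

    async def _get_or_create_session(self, server_name: str) -> ClientSession:
        if server_name not in self._sessions:
            # Subprocess spawn + stdio handshake happen once, then are cached.
            read, write = await self._stack.enter_async_context(
                stdio_client(self._server_params[server_name]))
            session = await self._stack.enter_async_context(
                ClientSession(read, write))
            await session.initialize()
            self._sessions[server_name] = session
        return self._sessions[server_name]

    async def execute_tool_call(self, server_name: str, tool_name: str,
                                arguments: dict):
        session = await self._get_or_create_session(server_name)
        return await session.call_tool(tool_name, arguments)

    async def close(self) -> None:
        # Tears down every cached session and its subprocess in reverse order.
        await self._stack.aclose()
        self._sessions.clear()
```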
When generating the final answer or sub-agent summary, tool_definitions was still passed to the LLM, allowing the model to make tool calls instead of producing a text response. This caused intermittent failures where the model would output tool call XML after the summarize prompt even though the prompt explicitly forbade it.

Passing an empty list for tool_definitions at the API level enforces that no tools are available during final summary generation, ensuring the model produces a text answer as intended.

Fixes MiroMindAI#158
Fixes #158
Problem
When generate_agent_summarize_prompt is invoked at the end of the main agent loop or sub-agent loop to request a final text answer, tool_definitions is still passed to the underlying LLM call. This means the model remains capable of invoking tools at the API level, even though the summarize prompt explicitly instructs it not to.

In practice this causes intermittent failures where the model, especially after hitting max_turns, enters a <think> block acknowledging it should produce a report, but then still emits a <use_mcp_tool> block. Because the model receives an empty tool result (no executor handles tool calls in this phase), it produces no usable text, causing the retry loop to exhaust its attempts and return a format error.

Solution
Pass [] for tool_definitions in both locations where the final answer is generated:

- answer_generator.py → generate_final_answer_with_retries: the main-agent retry loop
- orchestrator.py → handle_llm_call: the sub-agent final summary

By removing tool definitions at the API level, the model is physically prevented from calling tools and is forced to produce a text response, matching the intent of the summarize prompt. A sketch of the change is shown below.
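A minimal sketch of the two call sites after the change. The llm_client.chat interface, the retry shape, and the is_final_summary flag are assumptions for illustration; only the substitution of [] for tool_definitions comes from this PR.

```python
# answer_generator.py → generate_final_answer_with_retries (main-agent retry loop)
async def generate_final_answer_with_retries(llm_client, messages, max_retries=3):
    for _ in range(max_retries):
        response = await llm_client.chat(
            messages=messages,
            tool_definitions=[],  # was tool_definitions: no tools during the final answer
        )
        if response.text:  # a plain text answer, not a tool call
            return response.text
    raise RuntimeError("model never produced a text answer")

# orchestrator.py → handle_llm_call (sub-agent final summary)
async def handle_llm_call(llm_client, messages, tool_definitions, *, is_final_summary):
    return await llm_client.chat(
        messages=messages,
        # Drop tool definitions at the API level for the final summary so the
        # model cannot emit a tool call even if it ignores the prompt.
        tool_definitions=[] if is_final_summary else tool_definitions,
    )
```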
Testing
The change is a two-line substitution (tool_definitions → []). Existing unit tests continue to pass. The fix removes the mismatch between the prompt-level instruction and API-level tool availability, so the model can no longer emit a tool call during summary generation.