perf: parallelize tool server init and reduce LLM retry overhead #139
wangbinluo wants to merge 16 commits into
Conversation
- Fix server_name routing: dynamically parse the system prompt to build a tool→server mapping, and auto-correct a wrong server_name in LLM responses
- Fix tool_name hallucination: python/python_code → run_python_code (only when the system prompt defines run_python_code; see the sketch below)
- Fix parameter names: code → code_block, add a default sandbox_id
- Fix scrape_and_extract_info params: description → info_to_extract
- Add a stateless sandbox fallback for invalid sandbox_id

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
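A partial sketch of the name/parameter corrections described above (the server_name remapping is omitted); the helper name normalize_tool_call, the mapping tables, and the placeholder sandbox id are illustrative assumptions, not the PR's actual code:

TOOL_NAME_FIXES = {
    # Names the LLM sometimes hallucinates for the sandbox tool
    "python": "run_python_code",
    "python_code": "run_python_code",
}
PARAM_NAME_FIXES = {
    "run_python_code": {"code": "code_block"},
    "scrape_and_extract_info": {"description": "info_to_extract"},
}

def normalize_tool_call(tool_name: str, args: dict, defined_tools: set) -> tuple:
    # Only rewrite python/python_code when the system prompt actually defines run_python_code
    if tool_name in TOOL_NAME_FIXES and TOOL_NAME_FIXES[tool_name] in defined_tools:
        tool_name = TOOL_NAME_FIXES[tool_name]
    renames = PARAM_NAME_FIXES.get(tool_name, {})
    args = {renames.get(k, k): v for k, v in args.items()}
    if tool_name == "run_python_code":
        # "default" is a placeholder; the stateless fallback handles invalid ids downstream
        args.setdefault("sandbox_id", "default")
    return tool_name, args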
…oMindAI#137)
- P0: Parallelize MCP tool server initialization with asyncio.gather() (saves ~40-50s per task, previously ~71s sequential; sketched below)
- P0: Reduce LLM retry base_wait_time from 30s to 10s, max_retries from 10 to 5
- P1: Add httpx connection pooling for Serper API requests (reuse TCP connections across ~346 search calls per task)

Ref: MiroMindAI#137
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
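A minimal sketch of the gather-based initialization, assuming a per-server _get_single_server_tools() coroutine (referenced in the review below) and a self.server_configs mapping; not the PR's exact code:

import asyncio

async def get_all_tool_definitions(self) -> dict:
    # Initialize all tool servers concurrently instead of one at a time.
    server_names = list(self.server_configs)
    # return_exceptions=True so one failing server doesn't abort the rest
    results = await asyncio.gather(
        *(self._get_single_server_tools(name) for name in server_names),
        return_exceptions=True,
    )
    definitions = {}
    # zip keeps each result paired with its server name; failed servers get
    # a fallback entry instead of crashing the whole init.
    for name, result in zip(server_names, results):
        definitions[name] = [] if isinstance(result, Exception) else result
    return definitions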
@shawnlimn @xingxuanli Could you review this PR when you get a chance? This PR addresses the performance bottlenecks identified in #137 with two optimizations:
1. Parallelize MCP tool server initialization with asyncio.gather()
2. Reduce LLM retry overhead (base_wait_time 30s → 10s, max_retries 10 → 5)

Note: PR #138 by @JasonOA888 covers only item 1; this PR is a superset with additional optimization and benchmark data.
Algorithm team confirmed these values are needed for reliability under high load with self-hosted sglang servers.

Ref: MiroMindAI#137
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iroMindAI#137)
Previously every tool call (except playwright) spawned a new MCP server subprocess, initialized it, called the tool, then killed it. BC tasks average ~300+ tool calls, so the spawn/teardown overhead adds up.

Introduce PersistentMCPSession, which keeps the subprocess alive for the entire task lifetime. On connection failure it transparently reconnects once. Sessions are cleaned up via close_all_sessions() at task end.

This is the P0 "MCP server connection reuse" item from MiroMindAI#137, estimated to save 2-5 min per task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
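A rough skeleton of what such a persistent session can look like, using the method names discussed in the review below (_connect, ensure_connected, call_tool, close) and the MCP Python SDK's stdio_client/ClientSession; the details are illustrative, not the PR's exact implementation:

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

class PersistentMCPSession:
    # Keeps one MCP server subprocess alive for the whole task instead of
    # spawning and tearing down a process for every tool call.

    def __init__(self, server_name: str, params: StdioServerParameters):
        self.server_name = server_name
        self._params = params
        self._client = None
        self._session = None

    async def _connect(self):
        # Enter the stdio client and session manually so the connection
        # outlives any single async-with block.
        self._client = stdio_client(self._params)
        read, write = await self._client.__aenter__()
        self._session = ClientSession(read, write, sampling_callback=None)
        await self._session.__aenter__()
        await self._session.initialize()

    async def ensure_connected(self):
        if self._session is None:
            await self._connect()

    async def call_tool(self, tool_name: str, arguments: dict):
        await self.ensure_connected()
        try:
            return await self._session.call_tool(tool_name, arguments)
        except Exception:
            # One transparent reconnect, then let the error propagate.
            await self.close()
            await self.ensure_connected()
            return await self._session.call_tool(tool_name, arguments)

    async def close(self):
        if self._session is not None:
            await self._session.__aexit__(None, None, None)
            self._session = None
        if self._client is not None:
            await self._client.__aexit__(None, None, None)
            self._client = None

(The review below points out a subprocess leak in this _connect pattern and a missing lock around the reconnect path.)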
Vanint left a comment
Solid PR — persistent sessions + parallel init is a big perf win, and this supersedes #138 with better exception handling. A few issues to address:
- Subprocess leak in _connect if session init fails: __aenter__/__aexit__ are called manually, bypassing async with safety. If session.__aenter__() or initialize() throws, client.__aexit__() is never called and the subprocess leaks. A fix along these lines:
async def _connect(self):
    ...
    read, write = await self._client.__aenter__()
    try:
        self._session = ClientSession(read, write, sampling_callback=None)
        await self._session.__aenter__()
        await self._session.initialize()
    except Exception:
        # Session setup failed: close the stdio client so the subprocess
        # doesn't leak, then re-raise.
        await self._client.__aexit__(None, None, None)
        self._client = None
        raise
- Sessions not cleaned up on abnormal exit: close_all_sessions() is only called at the end of a successful run_main_agent. If the orchestrator throws mid-execution, subprocesses are leaked. Wrap in try/finally:
try:
    # ... existing run_main_agent logic ...
finally:
    await self.main_agent_tool_manager.close_all_sessions()
    for tm in self.sub_agent_tool_managers.values():
        await tm.close_all_sessions()
- Reconnect logic in call_tool is not concurrency-safe: if two coroutines fail concurrently, both enter close() → ensure_connected(), potentially tearing down a connection the other just established. Currently the orchestrator calls tools sequentially so this won't bite, but PersistentMCPSession looks like a reusable component — worth either adding a lock around the reconnect path (see the sketch below) or documenting the single-caller assumption.
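A minimal sketch of the locked reconnect path; it assumes a self._reconnect_lock = asyncio.Lock() created in __init__ and is illustrative rather than a concrete patch:

async def call_tool(self, tool_name: str, arguments: dict):
    await self.ensure_connected()
    try:
        return await self._session.call_tool(tool_name, arguments)
    except Exception:
        # Serialize teardown + reconnect so two concurrently failing
        # coroutines can't tear down a connection the other just rebuilt.
        async with self._reconnect_lock:
            await self.close()
            await self.ensure_connected()
        return await self._session.call_tool(tool_name, arguments)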
- Reconnect failure lacks context: if the retry in call_tool also fails, the exception propagates without any indication that a reconnect was attempted. Consider logging or wrapping the error (e.g., raise RuntimeError(f"Reconnect to {self.server_name} also failed") from e).
- get_all_tool_definitions connects twice per server: _get_single_server_tools opens a temporary connection to list tools, then execute_tool_call creates a new persistent session. Is this intentional (to avoid holding long-lived connections during init)? If so, a brief comment would help; if not, the persistent session could be reused for the initial tool listing too.
- Re: #138: this PR covers everything in #138 with better exception handling (uses zip to preserve server name, retains fallback error entries). If this lands, #138 can be closed.
Fix these problems, and this is good to merge.
Summary
Addresses evaluation pipeline performance bottlenecks identified in #137.
Three changes targeting the slowest parts of BC benchmark evaluation.
Closes #137
Changes
- manager.py: get_all_tool_definitions() now uses asyncio.gather() instead of a sequential for loop.
- openai_client.py: base_wait_time 30s → 10s, max_retries 10 → 5. Prevents 60-90s wasted on retries.
- search_and_scrape_webpage.py: reuses a shared httpx.AsyncClient instead of creating a new one per request (~346 search calls per BC task); see the sketch below.
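A minimal sketch of the shared-client pattern for the Serper calls; the module-level helper and its settings are illustrative assumptions, not the file's actual code:

import httpx

# One module-level client so TCP/TLS connections are reused across the
# ~346 search calls in a BC task, instead of a fresh client per request.
_shared_client = None

def get_client() -> httpx.AsyncClient:
    global _shared_client
    if _shared_client is None:
        _shared_client = httpx.AsyncClient(
            timeout=30.0,
            limits=httpx.Limits(max_keepalive_connections=10, max_connections=20),
        )
    return _shared_client

async def serper_search(query: str, api_key: str) -> dict:
    resp = await get_client().post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": api_key},
        json={"q": query},
    )
    resp.raise_for_status()
    return resp.json()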
Benchmark Results
Tool server initialization time (3 runs on dev_wbl)
Comparison with main branch (extracted from existing evaluation logs)
Test plan
bench_init_time.py on dev_wbl — 17.0s avg vs main baseline 234s avg