
perf: parallelize tool server init and reduce LLM retry overhead #139

Closed
wangbinluo wants to merge 16 commits into MiroMindAI:main from wangbinluo:dev_wbl

Conversation

@wangbinluo
Collaborator

@wangbinluo wangbinluo commented Mar 17, 2026

Summary

Addresses evaluation pipeline performance bottlenecks identified in #137.

Three changes targeting the slowest parts of BC benchmark evaluation.

Closes #137

Changes

  • P0: Parallelize MCP tool server initialization. In manager.py, get_all_tool_definitions() now uses asyncio.gather() instead of a sequential for loop (see the sketch after this list).
  • P0: Reduce LLM retry overhead. In openai_client.py, base_wait_time 30s → 10s and max_retries 10 → 5. Prevents 60-90s wasted on retries.
  • P1: httpx connection pooling. search_and_scrape_webpage.py reuses a shared httpx.AsyncClient instead of creating a new one per request (~346 search calls per BC task).
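
The parallel-init change, roughly sketched. get_all_tool_definitions and _get_single_server_tools are names used in this PR; the surrounding attribute names (self.server_configs, server.name) are illustrative and not the exact diff:

import asyncio

async def get_all_tool_definitions(self):
    # Previously: a sequential `for server in self.server_configs: await ...` loop.
    # Now every server's tool listing is started concurrently.
    results = await asyncio.gather(
        *(self._get_single_server_tools(server) for server in self.server_configs),
        return_exceptions=True,  # one slow or failing server no longer blocks the rest
    )
    tool_definitions = {}
    for server, result in zip(self.server_configs, results):
        if isinstance(result, Exception):
            # keep a fallback error entry so the failure stays visible downstream
            tool_definitions[server.name] = {"error": str(result)}
        else:
            tool_definitions[server.name] = result
    return tool_definitions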

Benchmark Results

Tool server initialization time (3 runs on dev_wbl)

Run                   dev_wbl (parallel)
Run 1 (cold start)    29.3s
Run 2                 12.1s
Run 3                 9.6s
Average               17.0s

Comparison with main branch (extracted from existing evaluation logs)

          main (sequential)   dev_wbl (parallel)   Speedup
Average   234s                17.0s                13.8x
Min       22s                 9.6s                 2.3x
Max       945s (15.8 min)     29.3s                32.3x

Note: main branch baseline was extracted from BC-ZH evaluation logs (qwen_xxg_negative_r10_new1_step50, 30 tasks). The high average (234s) includes E2B sandbox queueing under high concurrency (MAX_CONCURRENT=60). Per-server breakdown on main: tool-python avg=145s (max=631s), search avg=69s, jina avg=19s.

Test plan

  • Run bench_init_time.py on dev_wbl — 17.0s avg vs main baseline 234s avg
  • Run BC-ZH evaluation end-to-end and verify no regression in accuracy
  • Verify LLM retry behavior works correctly with reduced parameters
  • Check that Serper API connection pooling doesn't cause stale connection issues (a pooling sketch follows this list)
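
For the pooling check above, a minimal sketch of the shared-client pattern. The function names (get_shared_client, serper_search) and the timeout/limit values are illustrative assumptions, not the PR's exact code:

import httpx

_shared_client: httpx.AsyncClient | None = None

def get_shared_client() -> httpx.AsyncClient:
    # One client per process: keep-alive connections (and TLS sessions) are
    # reused across the ~346 Serper calls a BC task makes.
    global _shared_client
    if _shared_client is None or _shared_client.is_closed:
        _shared_client = httpx.AsyncClient(
            timeout=httpx.Timeout(30.0),
            limits=httpx.Limits(max_connections=20, max_keepalive_connections=10),
        )
    return _shared_client

async def serper_search(query: str, api_key: str) -> dict:
    client = get_shared_client()
    resp = await client.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        json={"q": query},
    )
    resp.raise_for_status()
    return resp.json()

The is_closed check guards against reusing a client that was already closed, which is one way stale-connection issues can surface.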

shawnlimn and others added 13 commits February 4, 2026 16:17
- Fix server_name routing: dynamically parse system prompt to build
  tool→server mapping, auto-correct wrong server_name in LLM responses
- Fix tool_name hallucination: python/python_code → run_python_code
  (only when system prompt defines run_python_code)
- Fix parameter names: code → code_block, add default sandbox_id
- Fix scrape_and_extract_info params: description → info_to_extract
- Add stateless sandbox fallback for invalid sandbox_id

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oMindAI#137)

- P0: Parallelize MCP tool server initialization with asyncio.gather()
  (saves ~40-50s per task, previously ~71s sequential)
- P0: Reduce LLM retry base_wait_time from 30s to 10s, max_retries from 10 to 5
- P1: Add httpx connection pooling for Serper API requests
  (reuse TCP connections across ~346 search calls per task)

Ref: MiroMindAI#137

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wangbinluo
Collaborator Author

wangbinluo commented Mar 17, 2026

@shawnlimn @xingxuanli Could you review this PR when you get a chance?

This PR addresses the performance bottlenecks identified in #137 with two optimizations:

  1. Parallel tool server init (asyncio.gather) — measured 13.8x speedup (234s → 17s avg)
  2. httpx connection pooling — reuse TCP connections for Serper API (~346 calls per BC task)

Note: PR #138 by @JasonOA888 covers only item 1; this PR is a superset with additional optimization and benchmark data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wangbinluo and others added 2 commits March 17, 2026 14:54
Algorithm team confirmed these values are needed for reliability under
high load with self-hosted sglang servers. Ref: MiroMindAI#137

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iroMindAI#137)

Previously every tool call (except playwright) spawned a new MCP server
subprocess, initialized it, called the tool, then killed it.  BC tasks
average ~300+ tool calls, so the spawn/teardown overhead adds up.

Introduce PersistentMCPSession that keeps the subprocess alive for the
entire task lifetime.  On connection failure it transparently reconnects
once.  Sessions are cleaned up via close_all_sessions() at task end.

This is the P0 "MCP server connection reuse" item from MiroMindAI#137, estimated
to save 2-5 min per task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
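
Based on this commit message and the snippets in the review below, the persistent-session shape is roughly as follows. This is a sketch using the MCP Python SDK's stdio_client; error handling and the per-tool-manager session registry are simplified, and the missing cleanup paths are exactly what the review below calls out:

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

class PersistentMCPSession:
    """Keeps one MCP server subprocess alive for the whole task."""

    def __init__(self, server_name: str, params: StdioServerParameters):
        self.server_name = server_name
        self._params = params
        self._client = None
        self._session = None

    async def _connect(self):
        # Spawn the server subprocess once and initialize the session.
        self._client = stdio_client(self._params)
        read, write = await self._client.__aenter__()
        self._session = ClientSession(read, write, sampling_callback=None)
        await self._session.__aenter__()
        await self._session.initialize()

    async def ensure_connected(self):
        if self._session is None:
            await self._connect()

    async def call_tool(self, name: str, arguments: dict):
        await self.ensure_connected()
        try:
            return await self._session.call_tool(name, arguments)
        except Exception:
            # One transparent reconnect, then retry the call.
            await self.close()
            await self.ensure_connected()
            return await self._session.call_tool(name, arguments)

    async def close(self):
        if self._session is not None:
            await self._session.__aexit__(None, None, None)
            self._session = None
        if self._client is not None:
            await self._client.__aexit__(None, None, None)
            self._client = None

close_all_sessions() (a tool-manager method, per the review below) would iterate the live sessions and call close() on each at task end.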
@wangbinluo wangbinluo marked this pull request as draft April 10, 2026 08:26
Member

@Vanint Vanint left a comment


Solid PR — persistent sessions + parallel init is a big perf win, and this supersedes #138 with better exception handling. A few issues to address:

  1. Subprocess leak in _connect if session init fails: __aenter__/__aexit__ are called manually, bypassing async with safety. If session.__aenter__() or initialize() throws, client.__aexit__ is never called and the subprocess leaks:
async def _connect(self):
    ...
    read, write = await self._client.__aenter__()
    try:
        self._session = ClientSession(read, write, sampling_callback=None)
        await self._session.__aenter__()
        await self._session.initialize()
    except Exception:
        await self._client.__aexit__(None, None, None)
        self._client = None
        raise
  2. Sessions not cleaned up on abnormal exit: close_all_sessions() is only called at the end of a successful run_main_agent. If the orchestrator throws mid-execution, subprocesses are leaked. Wrap in try/finally:
try:
    # ... existing run_main_agent logic ...
finally:
    await self.main_agent_tool_manager.close_all_sessions()
    for tm in self.sub_agent_tool_managers.values():
        await tm.close_all_sessions()
  3. Reconnect logic in call_tool is not concurrency-safe: if two coroutines fail concurrently, both enter close() → ensure_connected(), potentially tearing down a connection the other just established. Currently the orchestrator calls tools sequentially so this won't bite, but PersistentMCPSession looks like a reusable component — worth either adding a lock around the reconnect path or documenting the single-caller assumption (a lock sketch follows this list).

  4. Reconnect failure lacks context: If the retry in call_tool also fails, the exception propagates without any indication that a reconnect was attempted. Consider logging or wrapping the error (e.g., raise RuntimeError(f"Reconnect to {self.server_name} also failed") from e).

  5. get_all_tool_definitions connects twice per server: _get_single_server_tools opens a temporary connection to list tools, then execute_tool_call creates a new persistent session. Intentional (to avoid holding long-lived connections during init)? If so, a brief comment would help. If not, could reuse the persistent session for the initial tool listing too.

  6. Re: #138: This PR covers everything in #138 with better exception handling (uses zip to preserve server name, retains fallback error entries). If this lands, #138 can be closed.
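
On points 3 and 4, a minimal sketch of the locked reconnect path. It assumes a self._reconnect_lock = asyncio.Lock() created in __init__; the other names come from this thread:

async def call_tool(self, name, arguments):
    await self.ensure_connected()
    session = self._session
    try:
        return await session.call_tool(name, arguments)
    except Exception:
        async with self._reconnect_lock:
            # Rebuild only if another coroutine hasn't already replaced the broken session.
            if self._session is session:
                await self.close()
                await self._connect()
        try:
            return await self._session.call_tool(name, arguments)
        except Exception as e:
            raise RuntimeError(f"Reconnect to {self.server_name} also failed") from e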

Fix these problems, and this is good to merge.

