Harden tmux MCP tool surface#76
Merged
Merged
Conversation
why: search_panes is a readonly tool, but the tmux fast path embeds literal patterns in a format filter. Patterns containing #() can reach tmux's format job evaluator unless they take the Python search path. what: - Treat #( as unsafe for the tmux search fast path - Add a NamedTuple fast-path regression case for format jobs - Add a live marker regression proving #() does not spawn a job
why: display_message is registered as readonly, but tmux #() formats schedule shell jobs. The MCP tool should preserve metadata queries without handing shell-job formats to tmux. what: - Reject #() format jobs before pane resolution - Keep normal tmux format variable expansion unchanged - Add a marker regression for display_message format jobs
why: FastMCP validates structured output for schema-bearing tools. A 50 KB global limiter truncated successful results into text-only responses before tool-level caps could preserve structured metadata. what: - Raise the tail-preserving response backstop to 1 MB - Keep small-cap middleware tests for truncation behavior - Add a client-path regression for schema-bearing successes
why: Invalid LIBTMUX_SAFETY values are security configuration mistakes. Falling back to mutating can expose write tools when the user intended a restricted surface. what: - Resolve invalid safety values to readonly - Make SafetyMiddleware direct invalid tiers fail closed - Update safety text and add import-time visibility regression
why: Inherited FastMCP transport environment can change the server's startup surface and break stdio MCP clients. Local CLI flags should also resolve without starting the MCP server. what: - Parse --help and --version in the console entry point - Run FastMCP with an explicit stdio transport - Add regressions for CLI flags and transport pinning
why: Agents need a one-call terminal workflow that runs a command, waits for completion, captures output, and reports shell exit status without composing send_keys, wait_for_channel, and capture_pane manually. what: - Add RunCommandResult and the async run_command pane tool - Register run_command as a mutating shell-driving tool - Add tmux-backed regressions for status, timeout, truncation, and annotations - Document the new tool and surface it in server instructions
why: clear_pane clears scrollback and should use libtmux's one-call reset path instead of split tmux IPC calls. Its MCP annotations also need to disclose the state-losing behavior. what: - Replace separate reset and clear-history calls with Pane.reset - Register clear_pane with destructive non-idempotent hints in the mutating tier - Add regressions for reset delegation and annotation metadata
why: Agents need window and pane result metadata for reliable follow-up targeting and liveness decisions, and the package advertises typed support. what: - Add active_pane_id to WindowInfo serialization - Add pane_pid, pane_dead, and alternate_on to PaneSnapshot - Ship py.typed and regress stale completion and pagination docs claims - Refresh README and response examples for the current tool and model surface
why: CI exposed that long bash prompts can wrap private run_command synchronization commands, leaving private prompt tails after filtering and splitting user command output from small max_lines tails. what: - Run the user command and private status/wait plumbing as one shell compound list - Add a regression with a CI-shaped wrapped prompt and max_lines=2
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #76 +/- ##
==========================================
+ Coverage 84.84% 85.07% +0.22%
==========================================
Files 42 42
Lines 2772 2881 +109
Branches 372 385 +13
==========================================
+ Hits 2352 2451 +99
- Misses 314 322 +8
- Partials 106 108 +2 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
a39b4c0 to
1c43cb9
Compare
Member
Author
Code reviewFound 1 issue:
libtmux-mcp/src/libtmux_mcp/middleware.py Lines 350 to 369 in 1c43cb9 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
why: run_command's command argument carries the same secret-bearing shell payloads as keys, text, and shell, but it is not in _SENSITIVE_ARG_NAMES, so AuditMiddleware logs it in cleartext to long-lived audit archives. what: - Add an xfail regression asserting _summarize_args digests command
why: command carries the same secret-bearing shell payloads already digested for keys, text, and shell; logging it in cleartext leaks credentials passed to run_command into long-lived audit archives. what: - Add command to _SENSITIVE_ARG_NAMES and name it in the docstring - Drop the xfail and cover command in the sensitive-keys regression
why: _filter_run_command_internal_lines drops any captured line containing a generic substring (mcp_status, "tmux set-option -p", "tmux wait-for -S", "libtmux_mcp_"), silently losing legitimate command output that merely mentions them, with no signal in RunCommandResult. what: - Add an xfail regression keeping output lines that match the generic markers but not the per-call channel or status option
why: joining wrapped capture rows (tmux -J, as search and wait already do) keeps the private sync line one logical row carrying the full per-call channel and status option, so the over-broad substring markers are unnecessary and only caused silent false drops of real output. what: - Capture with join_wrapped=True - Match only the per-call channel and status option - Drop the xfail and assert the sync line is dropped
why: run_command is registered with ANNOTATIONS_SHELL, making six shell consumers, but the comment still said five and omitted it. what: - List run_command in the canonical shell-driving group and fix the count
why: the injection guard now also routes around ``#(`` (format job
execution), but the comment described only ``}`` and ``#{`` and said
"either sequence".
what:
- Document ``#(`` and reword to "any of these sequences"
why: Live MCP client verification showed wrapped private run_command sync fragments could leak into later command output, even after valid wrapper-like user output was preserved. what: - Filter exact run_command wrapper fragments across current and prior calls - Shorten private status and wait markers to avoid 80-column orphan wraps - Add regressions for wrapped and previous-call sync fragments
Member
Author
Code reviewFound 1 issue:
libtmux-mcp/src/libtmux_mcp/tools/pane_tools/io.py Lines 297 to 314 in 1e512b8 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
why: run_command should report exit status even when the user command mutates shell state before the private completion trailer runs. what: - Add strict xfail coverage for PATH mutation before the trailer - Add strict xfail coverage for errexit stopping the current shell list
why: run_command ran the private status trailer in the same interactive shell state as the user command, so PATH changes and errexit could block status reporting. what: - Run the user command in a subshell before reading its status - Invoke private tmux status and wait commands with resolved argv - Promote shell-state regressions from xfail to passing tests
why: run_command sends secret-bearing commands through the pane shell but exposes no equivalent to send_keys suppress_history. what: - Add strict xfail coverage for bash HISTCONTROL ignorespace behavior - Verify the command payload is absent after forcing history to disk
why: run_command command payloads can be persisted by interactive shell history even though MCP audit logs redact the command argument. what: - Add a suppress_history option matching send_keys leading-space behavior - Document the shell support caveat for run_command callers - Promote the shell history regression to a passing test
why: run_command output filtering can still drop legitimate hex output and tmux-like user text while hiding private synchronization fragments. what: - Add strict xfail coverage for hex output after the current sync line - Add strict xfail coverage for user tmux status snippets
why: run_command filtering still treated some current-call sync matches as wrapped fragments and matched broad tmux-looking user output. what: - Keep hex-like command output after the current sync line - Match prior private wrapper fragments without generic @s_ overmatching - Refresh stale filter comments and test descriptions
why: clear_pane now advertises destructive non-idempotent MCP annotations, but the public safety table still listed the mutating defaults. what: - Mark clear-pane destructive=true in the annotation table - Mark clear-pane idempotent=false to match runtime registration
why: the capture comment lost the load-bearing reason for join_wrapped — it keeps the per-call markers on one logical row so the filter's exact-marker match holds even when a wide prompt wraps. what: - Note that join_wrapped preserves the exact-marker match path
why: the docstring named HISTCONTROL (bash-specific) as the requirement, implying zsh and other shells cannot suppress history when the leading-space convention works wherever the shell ignores space-prefixed commands. what: - Describe the generic requirement instead of the bash variable in send_keys and run_command
why: run_command now wraps the command in a subshell, so cd, export, and other shell state changes no longer persist between calls; the docstring and tool page implied execution in the live interactive shell. what: - State the subshell isolation in the run_command docstring and tool page
why: run_command can execute in a requested pane while storing its private exit-status option on tmux's default pane when the target shell lacks TMUX_PANE. what: - Add a strict xfail that targets an inactive pane with TMUX_PANE removed - Clean up the temporary split pane so later pane tests keep their layout
why: run_command sends its private status trailer from inside the target shell, where TMUX_PANE can be missing or wrong. Untargeted set-option can store the exit status on tmux's active pane instead of the requested pane. what: - Pass the resolved pane id to shell-side set-option -p - Teach the sync-line filter the targeted set-option wrapper shape - Graduate the pane-target regression from strict xfail to passing
why: The or "" fallback is unreachable because target_pane_id is guaranteed to be non-None earlier in the run_command function. what: - Update RunCommandResult instantiation in pane_tools/io.py to use target_pane_id
0965733 to
839b13e
Compare
why: the 0.1.x line hardens the read-only and safety surface and adds run_command, but the unreleased CHANGES section had no entry for it. what: - Add What's new for run_command and typed pane/window metadata - Add Fixes for format-job blocking, fail-closed safety, structured response preservation, clear_pane reset, and stdio transport pinning
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
run_commandfor command completion with exit status, history suppression, and output filteringclear_panepy.typed, and refresh stale docsIssues
Closes #68
Closes #69
Closes #70
Closes #71
Closes #72
Closes #73
Closes #74
Closes #75
Verification
$ rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run mypy .; uv run py.test --reruns 0 -vvv; just build-docs;Passed locally: ruff clean, format unchanged, mypy clean, 577 tests passed, docs build succeeded.