
Speed up app start/open #3

Description

@sanjarcode

Here's the fully updated spec:


Feature Spec: Add Shell Command Tools to Android MCP

Background / Problem

Current tools (Snapshot → Click) require multiple round-trips to launch an app:

  • Snapshot to read screen
  • Find target coordinates or selectors
  • Click
  • Snapshot again to verify

This is slow, fragile (coordinate/layout-dependent), and requires the target element to be visible on screen.


Current Tool Inventory (android-mcp-sanjar)

Tool | Description
-- | --
ListDevices | List connected ADB devices
ConnectDevice(serial) | Connect to a device by serial
Device(action) | list / connect / disconnect
Snapshot(use_vision, use_annotation) | Screenshot + accessibility tree
Click(x, y) | Tap at coordinates
LongClick(x, y) | Long press at coordinates
ClickBySelector(text, resourceId, className, description) | Tap by UI element attributes
Press(button) | Hardware button press (home, back, etc.)
Type | Type text input
Swipe | Swipe gesture
Drag | Drag gesture
Notification | Read device notifications
Wait | Fixed time wait
WaitForElement(...) | Wait for element to appear

Clarification on the "adb shell" method: the adb client runs on the laptop/host machine (where adb is installed, e.g. ~/Library/Android/sdk/platform-tools/adb) and talks to the phone over ADB (USB or Wireless Debugging). The command string is executed by the device's shell, but it is issued from the host; no on-device shell is exposed as an MCP tool. This was validated by using the Macos:Shell MCP tool to invoke the local adb binary directly.

The goal of this spec is to bring this capability natively into the android-mcp server, so callers don't need Macos:Shell or a local adb binary exposed to Claude as a separate escape hatch.


Workaround (Current State)

Without a ShellCommand tool, the only way to do fast app launching today is via Macos:Shell calling the host's adb binary:

~/Library/Android/sdk/platform-tools/adb -s <serial> shell am start -n com.whatsapp/.HomeActivity

This works but has two problems:

  1. It requires Macos:Shell (or equivalent host shell access) to be available as an MCP tool — a separate, unrelated server
  2. It leaks host machine details and is not portable across setups
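Problem 2 can be reduced if the server locates adb itself instead of relying on a host-specific path. The sketch below assumes the conventional ANDROID_HOME / ANDROID_SDK_ROOT environment variables and a POSIX host (on Windows the binary is adb.exe); `find_adb` is a hypothetical helper, not an existing android-mcp function.

```python
import os
import shutil
from typing import Mapping, Optional


def find_adb(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a path to the adb binary, or None if it cannot be found.

    Hypothetical helper: checks PATH first, then <sdk>/platform-tools/adb
    for the SDK location given by ANDROID_HOME or ANDROID_SDK_ROOT.
    """
    env = os.environ if env is None else env
    # 1. adb already on PATH (e.g. installed via a package manager).
    on_path = shutil.which("adb", path=env.get("PATH"))
    if on_path:
        return on_path
    # 2. Fall back to the SDK's platform-tools directory.
    for var in ("ANDROID_HOME", "ANDROID_SDK_ROOT"):
        sdk = env.get(var)
        if sdk:
            candidate = os.path.join(sdk, "platform-tools", "adb")
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None
```

Resolving once at startup keeps host paths out of tool calls; the server can then report a clear error when adb is missing instead of failing on the first command.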

Requested Features

1. ShellCommand tool — executes adb shell <command> from the MCP host machine against the connected device.

ShellCommand(command: str) -> stdout: str, exit_code: int

Enables: am start, monkey, input, dumpsys, pm list packages, etc.

2. LaunchApp tool (convenience wrapper) — launches an app by package name.

LaunchApp(package: str, activity: str = None) -> success: bool

If activity is omitted, the tool launches the package's default launcher activity automatically via monkey.

3. OpenDeepLink tool (convenience wrapper) — launches directly into a specific screen via URI.

OpenDeepLink(uri: str) -> success: bool

Example: blinkit://search?q=popcorn

Priority: ShellCommand is foundational — the other two are thin wrappers on top of it.
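Because both wrappers reduce to building one device-side command string, they can be sketched as pure string builders plus a runner. This is a minimal sketch, not the server's actual code: the function names are hypothetical, `run` stands in for whatever executes a ShellCommand (it takes the command string and returns the combined output), and the success checks are heuristics based on the error text that `am start` and `monkey` print.

```python
import shlex
from typing import Callable, Optional

# Hypothetical stand-in for ShellCommand: takes a device-side command string
# and returns the combined stdout/stderr text.
Runner = Callable[[str], str]


def build_launch_command(package: str, activity: Optional[str] = None) -> str:
    """Device-side command for LaunchApp."""
    if activity:
        # Explicit component, e.g. com.whatsapp/.HomeActivity
        return "am start -n " + shlex.quote(f"{package}/{activity}")
    # No activity given: have monkey send one LAUNCHER intent to the package.
    return ("monkey -p " + shlex.quote(package)
            + " -c android.intent.category.LAUNCHER 1")


def build_deeplink_command(uri: str) -> str:
    """Device-side command for OpenDeepLink (VIEW intent with a data URI)."""
    # Quote the URI: characters such as ?, & and ; are special to the shell.
    return "am start -a android.intent.action.VIEW -d " + shlex.quote(uri)


def _looks_successful(output: str) -> bool:
    # Heuristic: am start reports failures with "Error", and monkey prints
    # "monkey aborted" when the package has no launchable activity.
    return "Error" not in output and "aborted" not in output


def launch_app(package: str, activity: Optional[str] = None, *,
               run: Runner) -> bool:
    return _looks_successful(run(build_launch_command(package, activity)))


def open_deep_link(uri: str, *, run: Runner) -> bool:
    return _looks_successful(run(build_deeplink_command(uri)))
```

For the example above, `build_deeplink_command("blinkit://search?q=popcorn")` yields `am start -a android.intent.action.VIEW -d 'blinkit://search?q=popcorn'`; the quoting matters because the string is interpreted by the device shell.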

Implementation sketch

The MCP server already has adb wired internally (it's how Snapshot, Click, etc. work), so adding ShellCommand is roughly a 10-line change:

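import subprocess

# `mcp` (the server instance) and DEVICE_SERIAL (the active device's serial)
# are assumed to be defined elsewhere in the server module.
@mcp.tool()
def shell_command(command: str) -> dict:
    """Run an adb shell command on the connected device."""
    # Pass the command as one argument so it reaches the device shell unchanged;
    # command.split() would collapse whitespace inside quoted arguments.
    result = subprocess.run(
        ["adb", "-s", DEVICE_SERIAL, "shell", command],
        capture_output=True, text=True, timeout=30,
    )
    return {
        "stdout": result.stdout.strip(),
        # adb typically propagates the remote exit status only on Android 7.0+
        # (shell protocol v2); older devices may report 0 even on failure.
        "exit_code": result.returncode,
    }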

Failure Modes

1. Package name must be known a priori. adb shell am start requires the exact package name (e.g. com.whatsapp). For popular apps this works from model memory, but for less common, enterprise, or region-specific apps the model may not know the package name. Without ShellCommand, there is no way to run pm list packages to discover it dynamically, which makes this approach unreliable in the general case. Once ShellCommand exists, this failure mode largely disappears, since the model can enumerate installed packages first.

2. Accessibility/description tree dumping is unreliable. Using uiautomator dump or the accessibility tree (as surfaced by Snapshot) to find elements by description is fragile. Many apps set empty, generic, or non-deterministic content descriptions, and system UI elements, custom views, and WebView-rendered content are particularly bad offenders. In these cases ClickBySelector with description= will silently time out or match the wrong element, which makes selector-based navigation an unreliable fallback when shell access is unavailable.

3. Multiple devices require explicit serial targeting. As encountered in testing, adb fails with "more than one device/emulator" when multiple devices are connected. The MCP server must either (a) always pass -s <serial> for the currently active device, or (b) expose a SetActiveDevice tool. Currently ConnectDevice exists, but it's unclear whether it sets a global active serial for subsequent commands.
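With ShellCommand available, failure modes 1 and 3 reduce to parsing adb output. A minimal sketch, assuming the usual output formats (`package:<name>` lines from plain `pm list packages` without `-f`, and `<serial> <state>` lines from `adb devices`); the helper names are hypothetical and not part of android-mcp-sanjar.

```python
from typing import List, Optional


def find_packages(pm_output: str, hint: str) -> List[str]:
    """Failure mode 1: installed packages whose name contains `hint`.

    `pm_output` is the text of `pm list packages` ("package:<name>" per line).
    """
    prefix = "package:"
    needle = hint.lower()
    names = (line.strip()[len(prefix):] for line in pm_output.splitlines()
             if line.strip().startswith(prefix))
    return sorted(n for n in names if needle in n.lower())


def connected_serials(devices_output: str) -> List[str]:
    """Serials in the "device" state from `adb devices` output."""
    serials = []
    for line in devices_output.splitlines():
        parts = line.split()
        # Skips the header line, blank lines, and offline/unauthorized devices.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def resolve_serial(devices_output: str, active: Optional[str] = None) -> str:
    """Failure mode 3, option (a): choose the serial to pass as `adb -s`."""
    serials = connected_serials(devices_output)
    if active is not None:
        if active not in serials:
            raise RuntimeError(f"active device {active!r} is not connected")
        return active
    if len(serials) == 1:
        return serials[0]
    raise RuntimeError(
        f"{len(serials)} devices connected; select one explicitly")
```

On the device side, `pm list packages <filter>` can also narrow the list before it is parsed.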

Metadata


Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
