Render tool_result image content blocks inline in chat #1119

Open
Christophy wants to merge 2 commits into getpaseo:main from Christophy:fix/tool-call-image-rendering

@Christophy

Linked issue

Closes #1118

Type of change

  • Bug fix
  • New feature (with prior issue + design alignment)
  • Refactor / code improvement
  • Docs

What does this PR do

Renders Anthropic image content blocks returned in tool_result (e.g. Read on a PNG, MCP screenshot tools) inline in the chat instead of stringifying them as raw JSON. Two commits:

  • Server: new extractToolResultParts helper splits tool_result.content into text + base64 images. Adds optional images?: ToolCallImage[] on ToolCallBase and threads it through the Claude provider into the timeline. Wire schema kept backward-compatible: ToolCallBasePayloadSchema drops its .strict() so older clients silently strip the new field (per CLAUDE.md protocol contract). Inbound ImageAttachmentSchema is left as-is; a separate outbound ToolCallImageSchema with .min(1) is used so existing request schemas aren't narrowed. Unrecognized block types are preserved as a deterministic JSON fallback appended to text — nothing is silently dropped.
  • App: new ToolCallImages component renders inline at container width, full resolution, tap-to-zoom. Cross-platform RN primitives only, no new dependencies. Web onLoad falls back to DOM target.naturalWidth/Height because react-native-web doesn't populate nativeEvent.source. Reducer merge preserves images across the running → completed lifecycle.
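For readers skimming the server side, here is a minimal sketch of what a helper like extractToolResultParts could look like. It is not the code from this PR: the `ToolCallImage` shape (`mimeType` plus raw base64 `data`), the loose `RawBlock` view, the MIME allow-list, and `stableStringify` (sorted keys, a `[Circular]` marker) are all assumptions chosen to match the behavior described above: text and base64 images are split out, and everything else is appended as deterministic JSON.

```typescript
// Hypothetical shapes for illustration; the real ToolCallImage lives in
// agent-sdk-types.ts and may differ.
interface ToolCallImage {
  mimeType: string;
  data: string; // raw base64, no "data:" prefix
}

interface ToolResultParts {
  text: string;
  images: ToolCallImage[];
}

// Loose view of an Anthropic content block; any field may be missing.
interface RawBlock {
  type?: unknown;
  text?: unknown;
  source?: { type?: unknown; media_type?: unknown; data?: unknown };
}

// Assumed allow-list: the image media types Anthropic documents.
const SUPPORTED_MIME_TYPES = new Set(["image/png", "image/jpeg", "image/gif", "image/webp"]);

// Deterministic JSON: keys are sorted so equal blocks always serialize the
// same way, and a reference cycle becomes "[Circular]" instead of throwing.
function stableStringify(value: unknown): string {
  const ancestors = new WeakSet<object>();
  const normalize = (v: unknown): unknown => {
    if (v === null || typeof v !== "object") return v;
    if (ancestors.has(v)) return "[Circular]";
    ancestors.add(v);
    const out = Array.isArray(v)
      ? v.map(normalize)
      : Object.fromEntries(
          Object.keys(v)
            .sort()
            .map((k) => [k, normalize((v as Record<string, unknown>)[k])]),
        );
    ancestors.delete(v); // only ancestors count, so shared references are fine
    return out;
  };
  return JSON.stringify(normalize(value));
}

function extractToolResultParts(content: unknown): ToolResultParts {
  // tool_result.content may be a plain string rather than a block array.
  if (typeof content === "string") return { text: content, images: [] };
  if (!Array.isArray(content)) {
    return { text: content == null ? "" : stableStringify(content), images: [] };
  }

  const texts: string[] = [];
  const images: ToolCallImage[] = [];
  const unrecognized: unknown[] = [];

  for (const block of content) {
    const b = (block ?? {}) as RawBlock;
    const src = b.source;
    if (b.type === "text" && typeof b.text === "string") {
      texts.push(b.text);
    } else if (
      b.type === "image" &&
      src != null &&
      src.type === "base64" &&
      typeof src.media_type === "string" &&
      SUPPORTED_MIME_TYPES.has(src.media_type) &&
      typeof src.data === "string" &&
      src.data.length > 0
    ) {
      images.push({ mimeType: src.media_type, data: src.data });
    } else {
      // URL-source images, documents, empty data, unknown types: keep as JSON text.
      unrecognized.push(block);
    }
  }

  if (unrecognized.length > 0) texts.push(stableStringify(unrecognized));
  return { text: texts.join("\n"), images };
}
```

Sorting keys is what makes the fallback deterministic: two blocks with the same content but a different key order serialize identically, which keeps assertions on the fallback text stable.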

How did you verify it

Built locally and installed on each surface, then ran an agent issuing Read /tmp/thumbs_up.jpg. Before this PR the tool call card showed [{"source":{"data":"iVBORw0..."}}] as text. After, the image renders inline; tapping opens a full-screen zoom modal that dismisses on backdrop tap.

  • Desktop (macOS, Paseo.app) [screenshot: desktop-tool-call-image]
  • Web (Chrome via npm run dev:app) [screenshot: web-tool-call-image]
  • Android (debug APK on a real device) [screenshot: android-tool-call-image]
  • iOS — not tested locally (no device). Component uses only cross-platform RN primitives. Happy to fix follow-ups if iOS-specific issues turn up.

Tests:

  • Unit: npx vitest run packages/server/src/server/agent/providers/claude/tool-result-content.test.ts packages/server/src/server/agent/providers/claude/tool-call-mapper.test.ts packages/server/src/shared/messages.tool-call-schema.test.ts packages/app/src/types/stream.test.ts packages/app/src/components/tool-call-images.test.tsx
  • Real-API e2e (needs a Claude credential): npx vitest run packages/server/src/server/daemon-e2e/tool-result-images.real.e2e.test.ts — spawns a real daemon, runs a real Claude agent reading a 1×1 PNG fixture, asserts the timeline tool_call carries images: [{ mimeType: "image/png", ... }].

AI assistance disclosure: I used Claude Code on this change. The code, the tests, and the cross-surface installs are all things I ran and observed myself.

Checklist

  • One focused change. Unrelated cleanups split out.
  • npm run typecheck passes
  • npm run lint passes
  • npm run format ran (Biome)
  • UI changes include screenshots or video for every affected platform
  • Tests added or updated where it made sense

Christophy changed the title from "Fix/tool call image rendering" to "Render tool_result image content blocks inline in chat" on May 20, 2026.
@Christophy
Author

@paseo-ai hello

@Christophy
Author

@paseo-ai hello, rebase this branch.

Christophy force-pushed the fix/tool-call-image-rendering branch 3 times, most recently from 307ac91 to c345ac5, on May 22, 2026 00:14.

When a Claude tool returns an image content block (Read on a PNG, MCP
screenshot tools, etc.), the daemon was stringifying the entire block array
into the chat timeline as raw JSON. Users saw `[{"source":{"data":"iVBOR..."}}]`
instead of the image.

Changes:

- packages/server/src/server/agent/agent-sdk-types.ts: add ToolCallImage
  type and an optional images? field on ToolCallBase, sibling to detail.
- packages/server/src/shared/messages.ts: mirror the wire schema. Drop
  .strict() from ToolCallBasePayloadSchema so older clients can still parse
  newer fields (forward-compat per CLAUDE.md "Protocol contract"). Split
  the image attachment schema: ImageAttachmentSchema stays permissive for
  inbound SendAgentMessage* requests; ToolCallImageSchema (with .min(1) on
  both fields) is used for outbound timeline items so the daemon never
  emits empty data URIs.
- packages/server/src/server/agent/providers/claude/tool-result-content.ts:
  new extractToolResultParts helper that splits tool_result.content into
  text and base64 images. Unrecognized block types (document, URL-source
  images, future shapes) are preserved as a deterministic JSON fallback
  appended to text so payloads are never silently dropped.
- packages/server/src/server/agent/providers/claude/agent.ts: route
  handleToolResult through the new helper; pass images to the mapper.
  Delete the now-unreferenced coerceToolResultContentToString and the three
  helpers it depended on (noUnusedLocals enforces this).
- packages/server/src/server/agent/providers/claude/tool-call-mapper.ts:
  thread images? through MapperParams onto the returned timeline item.
- tests: unit coverage in tool-result-content.test.ts (text-only, image-only,
  mixed, empty data, unsupported MIME type, URL source, unknown blocks, and
  circular-reference fallback). Schema round-trip + validation tests in
  messages.tool-call-schema.test.ts. New real-API e2e
  tool-result-images.real.e2e.test.ts runs a live Claude agent reading a 1x1
  PNG fixture and asserts the timeline tool_call item carries the image
  with mimeType image/png.
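The two schema moves in this commit hinge on zod's object semantics: a plain `z.object` strips unknown keys on parse, `.strict()` rejects them, and `.min(1)` rejects empty strings. The dependency-free sketch below imitates those behaviors so the effect is visible without zod. `parseToolCallImage`, `pickKnownKeys`, and the `OLD_CLIENT_KEYS` key names are hypothetical; they are not the schemas in messages.ts.

```typescript
interface ToolCallImage {
  mimeType: string;
  data: string;
}

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Mirrors `.min(1)` on both fields of an outbound ToolCallImageSchema:
// empty strings fail, so no empty data: URI can be emitted.
function parseToolCallImage(input: unknown): ParseResult<ToolCallImage> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "expected an object" };
  }
  const { mimeType, data } = input as Record<string, unknown>;
  if (typeof mimeType !== "string" || mimeType.length < 1) {
    return { ok: false, error: "mimeType must be a non-empty string" };
  }
  if (typeof data !== "string" || data.length < 1) {
    return { ok: false, error: "data must be a non-empty string" };
  }
  return { ok: true, value: { mimeType, data } };
}

// Mirrors z.object(...) vs z.object(...).strict(): non-strict parsing drops
// unknown keys from the output, strict parsing fails on them.
function pickKnownKeys(
  input: Record<string, unknown>,
  known: readonly string[],
  strict: boolean,
): ParseResult<Record<string, unknown>> {
  const unknownKeys = Object.keys(input).filter((k) => !known.includes(k));
  if (strict && unknownKeys.length > 0) {
    return { ok: false, error: `unrecognized keys: ${unknownKeys.join(", ")}` };
  }
  const out: Record<string, unknown> = {};
  for (const k of known) {
    if (k in input) out[k] = input[k];
  }
  return { ok: true, value: out };
}

// Hypothetical key set known to a client built before `images` existed.
const OLD_CLIENT_KEYS = ["name", "status", "detail"] as const;
```

The strict variant turns a newer payload carrying `images` into a parse failure, while the non-strict one simply returns the payload without the field, which is the tolerance the protocol contract asks for.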

No UI changes here — the renderer will land as a follow-up PR once this
field is on main.

Consumes the tool_call.images field added in the prior commit. Tool-result
images (Read on a PNG, MCP screenshot tools, etc.) now render as inline
images in the chat timeline, sized to container width with aspect-ratio
preserved from the natural image dimensions. Tap to open a full-screen zoom
modal.
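A sketch of the sizing logic described above, written as plain functions so it reads without React Native. The event shape, the names `resolveNaturalSize`, `inlineImageBox`, and `toDataUri`, and the choice to shrink the width when the 480 cap applies are assumptions; only the `nativeEvent.source` versus DOM `naturalWidth`/`naturalHeight` fallback and the 480 cap come from the PR text.

```typescript
// Hypothetical, minimal view of an Image onLoad event. Native React Native
// reports the decoded size on nativeEvent.source; per the PR, react-native-web
// leaves that unset, so the DOM <img> on `target` is consulted instead.
interface LoadEventLike {
  nativeEvent?: { source?: { width?: number; height?: number } };
  target?: { naturalWidth?: number; naturalHeight?: number };
}

interface Size {
  width: number;
  height: number;
}

function resolveNaturalSize(e: LoadEventLike): Size | null {
  const w = e.nativeEvent?.source?.width;
  const h = e.nativeEvent?.source?.height;
  if (w && h) return { width: w, height: h };
  const nw = e.target?.naturalWidth;
  const nh = e.target?.naturalHeight;
  if (nw && nh) return { width: nw, height: nh };
  return null; // size unknown; the caller keeps a placeholder box
}

// Cap taken from the commit message (maxHeight 480).
const MAX_INLINE_HEIGHT = 480;

// Fit the image to the container width with its aspect ratio preserved,
// then clamp the height and shrink the width by the same factor.
function inlineImageBox(size: Size, containerWidth: number, maxHeight = MAX_INLINE_HEIGHT) {
  const aspectRatio = size.width / size.height;
  const height = Math.min(containerWidth / aspectRatio, maxHeight);
  return { aspectRatio, width: height * aspectRatio, height };
}

// Hypothetical helper: build the data: URI used as the Image source.
function toDataUri(image: { mimeType: string; data: string }): string {
  return `data:${image.mimeType};base64,${image.data}`;
}
```

In a component these results would typically feed the Image style and `source={{ uri: toDataUri(image) }}`; the exact wiring in tool-call-images.tsx may differ.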

Changes:

- packages/app/src/components/tool-call-images.tsx: new component. Inline
  Image per entry, dynamic aspectRatio via onLoad, maxHeight cap of 480 to
  bound vertical space. Tap opens a Modal with the full-resolution image.
- packages/app/src/components/tool-call-details.tsx: thread an optional
  images prop through ToolCallDetailsContent and buildDetailSections so the
  image grid renders above each tool-detail variant when present.
- packages/app/src/types/stream.ts: carry images on AgentToolCallData and
  merge it across the running -> completed lifecycle. The merge uses
  data.images ?? existing so an event that omits the field preserves the
  existing array.
- packages/app/src/components/{message,tool-call-sheet,agent-stream-view}.tsx:
  wire the field through the existing tool-call call sites. The
  PermissionRequestCard call is intentionally left without images since
  AgentPermissionRequest does not carry one.
- packages/app/src/components/tool-call-images.test.tsx: unit test verifies
  empty input renders nothing and that each ToolCallImage produces an
  <img> with the expected data: URI.
- packages/app/src/types/stream.test.ts: reducer tests for the three image
  lifecycle paths (attach on completed, preserve across omitted update,
  replace on completed-replay).
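The lifecycle rule `data.images ?? existing` is easy to get subtly wrong, so here is a minimal sketch of it. `ToolCallState` and `mergeToolCall` are hypothetical stand-ins for the reducer in stream.ts. The point is that `??` falls back only on `null`/`undefined`: an omitted field preserves the prior images, while any array the event carries (even an empty one) replaces them.

```typescript
interface ToolCallImage {
  mimeType: string;
  data: string;
}

// Hypothetical, trimmed-down stand-in for AgentToolCallData.
interface ToolCallState {
  id: string;
  status: "running" | "completed";
  images?: ToolCallImage[];
}

// Merge an incoming event into the stored entry. An event that omits
// `images` keeps the stored array; an event that carries one replaces it.
function mergeToolCall(existing: ToolCallState | undefined, incoming: ToolCallState): ToolCallState {
  return {
    ...existing,
    ...incoming,
    images: incoming.images ?? existing?.images,
  };
}
```

This covers the three reducer paths listed above: attach on completed, preserve across an update that omits the field, and replace on a completed replay.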

No protocol changes — this PR strictly consumes the field added in the
prior server PR.

Christophy force-pushed the fix/tool-call-image-rendering branch from c345ac5 to 08e7a8a on May 24, 2026 01:31.

Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close the issue "feat: Render tool_result image content blocks inline in chat"
1 participant