Render tool_result image content blocks inline in chat #1119
Open
Christophy wants to merge 2 commits into
Author: @paseo-ai hello
Author: @paseo-ai hello, rebase this branch.
Branch updated from 307ac91 to c345ac5.
When a Claude tool returns an image content block (Read on a PNG, MCP
screenshot tools, etc.), the daemon was stringifying the entire block array
into the chat timeline as raw JSON. Users saw `[{"source":{"data":"iVBOR..."}}]`
instead of the image.
Changes:
- packages/server/src/server/agent/agent-sdk-types.ts: add ToolCallImage
type and an optional images? field on ToolCallBase, sibling to detail.
- packages/server/src/shared/messages.ts: mirror the wire schema. Drop
.strict() from ToolCallBasePayloadSchema so older clients can still parse
newer fields (forward-compat per CLAUDE.md "Protocol contract"). Split
the image attachment schema: ImageAttachmentSchema stays permissive for
inbound SendAgentMessage* requests; ToolCallImageSchema (with .min(1) on
both fields) is used for outbound timeline items so the daemon never
emits empty data URIs.
- packages/server/src/server/agent/providers/claude/tool-result-content.ts:
new extractToolResultParts helper that splits tool_result.content into
text and base64 images. Unrecognized block types (document, URL-source
images, future shapes) are preserved as a deterministic JSON fallback
appended to text so payloads are never silently dropped.
- packages/server/src/server/agent/providers/claude/agent.ts: route
handleToolResult through the new helper; pass images to the mapper.
Deletes the now-unreferenced coerceToolResultContentToString and three
helpers it depended on (noUnusedLocals enforces this).
- packages/server/src/server/agent/providers/claude/tool-call-mapper.ts:
thread images? through MapperParams onto the returned timeline item.
- tests: unit coverage in tool-result-content.test.ts (text-only, image-only,
mixed, empty data, unsupported mime, URL source, unknown blocks, circular
fallback). Schema round-trip + validation tests in
messages.tool-call-schema.test.ts. New real-API e2e
tool-result-images.real.e2e.test.ts runs a live Claude agent reading a 1x1
PNG fixture and asserts the timeline tool_call item carries the image
with mimeType image/png.
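To make the forward-compat reasoning behind dropping `.strict()` concrete, here is a dependency-free sketch of the two unknown-key policies: zod's `z.object()` strips unrecognized keys by default, while `.strict()` rejects them. The payload fields (`callId`, `name`, `status`) and the `ToolCallImage` field `data` are hypothetical stand-ins rather than the real schema; only `mimeType` comes from this PR.

```typescript
// Sketch only: mimics zod's default "strip" vs ".strict()" unknown-key handling
// without depending on zod. Field names below are hypothetical.

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Fields an *older* client knows about (hypothetical subset of ToolCallBase).
const KNOWN_KEYS = ["callId", "name", "status"] as const;
type OldToolCall = { callId: string; name: string; status: string };

function parseOldToolCall(
  input: Record<string, unknown>,
  mode: "strict" | "strip",
): ParseResult<OldToolCall> {
  const unknownKeys = Object.keys(input).filter(
    (key) => !(KNOWN_KEYS as readonly string[]).includes(key),
  );
  // ".strict()" behaviour: any field added by a newer server is a hard error.
  if (mode === "strict" && unknownKeys.length > 0) {
    return { ok: false, error: `unrecognized keys: ${unknownKeys.join(", ")}` };
  }
  const { callId, name, status } = input;
  if (typeof callId !== "string" || typeof name !== "string" || typeof status !== "string") {
    return { ok: false, error: "missing or invalid known field" };
  }
  // Default (strip) behaviour: copy only known keys, dropping e.g. `images`.
  return { ok: true, value: { callId, name, status } };
}

// Outbound check mirroring `.min(1)` on both image fields, so the daemon
// never emits an empty `data:` URI. `data` is an assumed field name.
type ToolCallImage = { mimeType: string; data: string };

function isEmittableImage(image: ToolCallImage): boolean {
  return image.mimeType.length > 0 && image.data.length > 0;
}
```

A newer payload carrying `images` fails under the strict policy but parses under strip, which is why an older client keeps working once the server starts sending the field.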
No UI changes here — the renderer will land as a follow-up PR once this
field is on main.
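The `extractToolResultParts` behaviour described in this commit can be approximated as below. This is a sketch, not the code in the PR: the block shapes follow the Anthropic Messages API (`{ type: "image", source: { type: "base64", media_type, data } }`), while the supported-MIME allowlist, the newline join, the `[Circular]` marker, and routing empty-data images to the JSON fallback are assumptions.

```typescript
interface ToolCallImage { mimeType: string; data: string }
interface ToolResultParts { text: string; images: ToolCallImage[] }

// Assumed allowlist; the real helper's list may differ.
const SUPPORTED_MIME_TYPES = new Set(["image/png", "image/jpeg", "image/gif", "image/webp"]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Deterministic JSON: object keys sorted, cycles replaced by a marker string.
function stableStringify(value: unknown, ancestors: object[] = []): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value) ?? "null"; // undefined and functions have no JSON form
  }
  if (ancestors.includes(value)) return JSON.stringify("[Circular]");
  const path = [...ancestors, value];
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item, path)).join(",")}]`;
  }
  const record = value as Record<string, unknown>;
  const fields = Object.keys(record)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key], path)}`);
  return `{${fields.join(",")}}`;
}

// Returns an image only for base64 sources with a supported MIME type and non-empty data.
function toImage(block: Record<string, unknown>): ToolCallImage | null {
  const source = block.source;
  if (block.type !== "image" || !isRecord(source) || source.type !== "base64") return null;
  const mimeType = source.media_type;
  const data = source.data;
  if (typeof mimeType !== "string" || typeof data !== "string") return null;
  if (data.length === 0 || !SUPPORTED_MIME_TYPES.has(mimeType)) return null;
  return { mimeType, data };
}

function extractToolResultParts(content: unknown): ToolResultParts {
  if (typeof content === "string") return { text: content, images: [] };
  if (!Array.isArray(content)) return { text: stableStringify(content), images: [] };

  const texts: string[] = [];
  const images: ToolCallImage[] = [];
  const unrecognized: unknown[] = []; // document blocks, URL-source images, future shapes
  for (const block of content) {
    const text = isRecord(block) && block.type === "text" ? block.text : undefined;
    if (typeof text === "string") {
      texts.push(text);
      continue;
    }
    const image = isRecord(block) ? toImage(block) : null;
    if (image) images.push(image);
    else unrecognized.push(block);
  }
  // Append leftovers as JSON so nothing is silently dropped.
  if (unrecognized.length > 0) texts.push(stableStringify(unrecognized));
  return { text: texts.join("\n"), images };
}
```

Sorting keys makes the fallback text stable across runs, and tracking only the current ancestor path means shared (non-cyclic) references are still printed in full.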
Consumes the tool_call.images field added in the prior commit. Tool-result
images (Read on a PNG, MCP screenshot tools, etc.) now render as inline
images in the chat timeline, sized to container width with aspect-ratio
preserved from the natural image dimensions. Tap to open a full-screen zoom
modal.
Changes:
- packages/app/src/components/tool-call-images.tsx: new component. Inline
Image per entry, dynamic aspectRatio via onLoad, maxHeight cap of 480 to
bound vertical space. Tap opens a Modal with the full-resolution image.
- packages/app/src/components/tool-call-details.tsx: thread an optional
images prop through ToolCallDetailsContent and buildDetailSections so the
image grid renders above each tool-detail variant when present.
- packages/app/src/types/stream.ts: carry images on AgentToolCallData and
merge it across the running -> completed lifecycle. The merge uses
data.images ?? existing so an event that omits the field preserves the
existing array.
- packages/app/src/components/{message,tool-call-sheet,agent-stream-view}.tsx:
wire the field through the existing tool-call call sites. The
PermissionRequestCard call is intentionally left without images since
AgentPermissionRequest does not carry one.
- packages/app/src/components/tool-call-images.test.tsx: unit test verifies
empty input renders nothing and that each ToolCallImage produces an
<img> with the expected data: URI.
- packages/app/src/types/stream.test.ts: reducer tests for the three image
lifecycle paths (attach on completed, preserve across omitted update,
replace on completed-replay).
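The `data.images ?? existing` merge rule and the three lifecycle paths covered in `stream.test.ts` can be sketched as a small pure function. `mergeToolCall`, `callId`, and the status values are hypothetical names for illustration, not the actual reducer in `packages/app/src/types/stream.ts`.

```typescript
interface ToolCallImage { mimeType: string; data: string }
type ToolCallStatus = "running" | "completed" | "failed"; // hypothetical
interface AgentToolCallData {
  callId: string; // hypothetical identifier field
  status: ToolCallStatus;
  images?: ToolCallImage[];
}

// Merge an incoming tool-call event into the previously stored entry (if any).
function mergeToolCall(
  existing: AgentToolCallData | undefined,
  data: AgentToolCallData,
): AgentToolCallData {
  return {
    ...existing,
    ...data,
    // An event that omits `images` keeps the stored array; one that carries
    // `images` (e.g. a completed-replay) replaces it wholesale.
    images: data.images ?? existing?.images,
  };
}
```

Using `??` rather than `||` means an explicit empty array on a later event still counts as a value and replaces earlier images, while only a missing field falls back to the stored one.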
No protocol changes — this PR strictly consumes the field added in the
prior server PR.
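The sizing logic in `ToolCallImages` (aspect ratio from `onLoad`, a web fallback to `naturalWidth`/`naturalHeight`, a 480 max height) and the `data:` URI checked by the component test can be sketched as pure helpers. The event shape here is a deliberately minimal assumption rather than React Native's full `onLoad` event type, and all function and constant names are hypothetical.

```typescript
interface ToolCallImage { mimeType: string; data: string }

// Assumed minimal onLoad event: native fills nativeEvent.source,
// react-native-web leaves it empty but exposes the DOM <img> as `target`.
interface ImageLoadEventLike {
  nativeEvent?: { source?: { width?: number; height?: number } };
  target?: { naturalWidth?: number; naturalHeight?: number };
}

interface NaturalSize { width: number; height: number }
interface InlineImageStyle { width: "100%"; maxHeight: number; aspectRatio?: number }

const MAX_INLINE_HEIGHT = 480; // cap from this PR, bounds vertical space

function toDataUri(image: ToolCallImage): string {
  return `data:${image.mimeType};base64,${image.data}`;
}

// Prefer the native source size; fall back to the DOM image's natural size.
function resolveNaturalSize(event: ImageLoadEventLike): NaturalSize | null {
  const width = event.nativeEvent?.source?.width;
  const height = event.nativeEvent?.source?.height;
  if (width && height) return { width, height };
  const naturalWidth = event.target?.naturalWidth;
  const naturalHeight = event.target?.naturalHeight;
  if (naturalWidth && naturalHeight) return { width: naturalWidth, height: naturalHeight };
  return null;
}

// Full container width; aspect ratio only once the natural size is known.
function inlineImageStyle(size: NaturalSize | null): InlineImageStyle {
  return size
    ? { width: "100%", maxHeight: MAX_INLINE_HEIGHT, aspectRatio: size.width / size.height }
    : { width: "100%", maxHeight: MAX_INLINE_HEIGHT };
}
```

Returning `null` when neither source reports a size lets the component keep a width-only style instead of dividing by zero or guessing a ratio.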
Branch updated from c345ac5 to 08e7a8a.
Linked issue
Closes #1118
Type of change
What does this PR do
Renders Anthropic image content blocks returned in `tool_result` (e.g. `Read` on a PNG, MCP screenshot tools) inline in the chat instead of stringifying them as raw JSON. Two commits:
1. Server: an `extractToolResultParts` helper splits `tool_result.content` into text + base64 images. Adds optional `images?: ToolCallImage[]` on `ToolCallBase` and threads it through the Claude provider into the timeline. The wire schema stays backward-compatible: `ToolCallBasePayloadSchema` drops its `.strict()` so older clients silently strip the new field (per the `CLAUDE.md` protocol contract). Inbound `ImageAttachmentSchema` is left as-is; a separate outbound `ToolCallImageSchema` with `.min(1)` is used so existing request schemas aren't narrowed. Unrecognized block types are preserved as a deterministic JSON fallback appended to text, so nothing is silently dropped.
2. App: a `ToolCallImages` component renders images inline at container width, full resolution, with tap-to-zoom. Cross-platform RN primitives only, no new dependencies. On web, `onLoad` falls back to DOM `target.naturalWidth`/`naturalHeight` because react-native-web doesn't populate `nativeEvent.source`. The reducer merge preserves `images` across the running → completed lifecycle.
How did you verify it
Built locally and installed on each surface, then ran an agent issuing `Read /tmp/thumbs_up.jpg`. Before this PR the tool call card showed `[{"source":{"data":"iVBORw0..."}}]` as text. After, the image renders inline; tapping opens a full-screen zoom modal that dismisses on backdrop tap.
… (`npm run dev:app`)
Tests:
- Unit: `npx vitest run packages/server/src/server/agent/providers/claude/tool-result-content.test.ts packages/server/src/server/agent/providers/claude/tool-call-mapper.test.ts packages/server/src/shared/messages.tool-call-schema.test.ts packages/app/src/types/stream.test.ts packages/app/src/components/tool-call-images.test.tsx`
- Real-API e2e: `npx vitest run packages/server/src/server/daemon-e2e/tool-result-images.real.e2e.test.ts` spawns a real daemon, runs a real Claude agent reading a 1×1 PNG fixture, and asserts the timeline `tool_call` carries `images: [{ mimeType: "image/png", ... }]`.
AI assistance disclosure: I used Claude Code on this change. The code, the tests, and the cross-surface installs are all things I ran and observed myself.
Checklist
- `npm run typecheck` passes
- `npm run lint` passes
- `npm run format` ran (Biome)