Description
Is your feature request related to a problem? Please describe.
Yes. The protocol currently supports inputModes and outputModes (e.g., supported mime types) for agent skills, but lacks a structured way for agents to optionally declare named input and output fields they expect or produce, along with their types or schemas.
This makes it harder for clients to know the exact structure and meaning of inputs an agent expects or outputs it produces, especially when multiple fields of the same type are involved.
Motivation
- Enable agents to declare named input/output fields with type information.
- Support schemas for better validation, logging, and debugging.
- Let clients render structured forms to collect user input appropriately.
- Reduce ambiguity in how message/artifact parts are composed and interpreted.
- Provide a more semantic layer on top of the existing input/output modes.
Describe the solution you'd like
Introduce optional structured field definitions for agent input and output, using a new FieldDefinition model. Agents can declare named fields, their content kind (text, file, or data), optional mime types or schemas, and a description. These would complement the existing inputModes/outputModes on the AgentSkill:
from typing import Any, Literal

from pydantic import BaseModel


class FieldDefinition(BaseModel):
    """
    Describes an expected input or output field.
    """
    name: str | None = None
    kind: Literal['text', 'file', 'data'] | None = None
    mimeTypes: list[str] | None = None
    # For data parts. Note: a field named `schema` shadows a BaseModel
    # attribute and may warn or fail depending on the pydantic version.
    schema: dict[str, Any] | None = None
    description: str | None = None
    optional: bool | None = False
class AgentSkill(BaseModel):
    """
    Represents a unit of capability that an agent can perform.
    """
    description: str
    examples: list[str] | None = None
    id: str
    name: str
    tags: list[str]
    # Existing simple mime-type lists remain supported:
    inputModes: list[str] | None = None
    outputModes: list[str] | None = None
    # New optional structured fields for richer metadata:
    inputFields: list[FieldDefinition] | None = None
    outputFields: list[FieldDefinition] | None = None

Message senders can then match named parts accordingly. The protocol may also define conventions for how these fields map to Part.name or to the keys of DataPart.data.
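As a rough illustration of what "matching named parts" could look like on the client side, the sketch below checks a DataPart-style payload against a skill's declared inputFields. It is only a sketch under assumptions: the helper `check_input_fields` is hypothetical and not part of the protocol or any SDK, field definitions are represented as plain dicts in the same shape as the proposed FieldDefinition, and the payload is assumed to be keyed by field name. The field names are borrowed from the book review example in this issue.

```python
from typing import Any


def check_input_fields(
    input_fields: list[dict[str, Any]], data: dict[str, Any]
) -> list[str]:
    """Return human-readable problems; an empty list means the payload matches.

    Hypothetical helper: assumes `data` is keyed by the declared field names.
    """
    problems: list[str] = []
    declared = {f["name"] for f in input_fields if f.get("name")}

    # Every declared, non-optional field must be present in the payload.
    for field in input_fields:
        name = field.get("name")
        if name and name not in data and not field.get("optional", False):
            problems.append(f"missing required field: {name}")

    # Keys the agent never declared are flagged rather than silently dropped.
    for key in data:
        if key not in declared:
            problems.append(f"undeclared field: {key}")

    return problems


input_fields = [
    {"name": "bookTitle", "kind": "text"},
    {"name": "authorName", "kind": "text"},
]
problems = check_input_fields(
    input_fields, {"bookTitle": "Charlotte's Web", "isbn": "unknown"}
)
```

Whether undeclared keys should be rejected, ignored, or passed through is a design choice the protocol would need to settle; this sketch simply reports them.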
Example: Book Review Agent
An agent expects two text inputs:
- bookTitle: The book’s title
- authorName: The author’s name
And returns text outputs:
- review-chunk: A chunk of the book's review (sent when streaming)
- review-html: The whole review as HTML (sent when not streaming)
- publicationDate: When the book was published
{
  "inputModes": ["text/plain"],
  "outputModes": ["text/plain", "text/html"],
  "inputFields": [
    { "name": "bookTitle", "kind": "text", "description": "The title of the book" },
    { "name": "authorName", "kind": "text", "description": "The author's full name" }
  ],
  "outputFields": [
    {
      "name": "review-chunk",
      "kind": "text",
      "mimeTypes": ["text/plain"],
      "description": "A chunk of the book's review. Sent when streaming.",
      "optional": true
    },
    {
      "name": "review-html",
      "kind": "text",
      "mimeTypes": ["text/html"],
      "description": "HTML with the whole book's review. Sent when not streaming.",
      "optional": true
    },
    { "name": "publicationDate", "kind": "text", "description": "Publication date" }
  ]
}
Client sending a message to the agent:
message = Message(
    parts=[
        Part(
            root=DataPart(
                data={
                    "bookTitle": TextPart(text="Charlotte's Web"),
                    "authorName": TextPart(text="E. B. White")
                }
            )
        )
    ],
    # other Message fields as needed
)
Agent response with Task and Artifacts:
result = Task(
    taskId="task1",
    contextId="context1",
    final=True,
    status=TaskStatus(state="completed"),
    artifacts=[
        Artifact(
            artifactId="artifact1",
            name="review-html",
            parts=[Part(root=TextPart(text="<div><h3>Review</h3><p>E.B. White’s <em>Charlotte’s Web</em> is a timeless classic that weaves together friendship, loss, and the quiet beauty of ordinary life.</p></div>"))]
        ),
        Artifact(
            artifactId="artifact2",
            name="publicationDate",
            parts=[Part(root=TextPart(text="October 15, 1952"))]
        ),
    ],
)
Agent streaming response with Task and Artifacts:
# First event: Task
result = Task(
    taskId="task1",
    contextId="context1",
    final=False,
    status=TaskStatus(state="submitted")
)
# Second event: Task artifact update event (with first part of review)
result = TaskArtifactUpdateEvent(
    taskId="task1",
    contextId="context1",
    artifact=Artifact(
        artifactId="artifact1",
        name="review-chunk",
        parts=[
            Part(root=TextPart(text="E.B. White’s Charlotte’s Web is a timeless classic that ")),
        ]
    )
)
# Third event: Task artifact update event (with publication date)
result = TaskArtifactUpdateEvent(
    taskId="task1",
    contextId="context1",
    artifact=Artifact(
        artifactId="artifact2",
        name="publicationDate",
        parts=[Part(root=TextPart(text="October 15, 1952"))]
    )
)
# Fourth event: Task artifact update event (with second part of review)
result = TaskArtifactUpdateEvent(
    taskId="task1",
    contextId="context1",
    artifact=Artifact(
        artifactId="artifact3",
        name="review-chunk",
        parts=[Part(root=TextPart(text="weaves together friendship, loss, and the quiet beauty of ordinary life"))]
    )
)
...
Describe alternatives you've considered
- Relying only on inputModes/outputModes: insufficient for cases where structure matters.
- Encoding structure in skill descriptions or agent documentation: not machine-readable.
- Inferring field structure from examples: unreliable and inconsistent.
Additional context
No response