[Feat]: Optional Structured Field Definitions for Agent Inputs and Outputs #813

@inesmcm26

Is your feature request related to a problem? Please describe.

Yes. The protocol currently supports inputModes and outputModes (e.g., supported mime types) for agent skills, but lacks a structured way for agents to optionally declare named input and output fields they expect or produce, along with their types or schemas.

This makes it harder for clients to know the exact structure and meaning of inputs an agent expects or outputs it produces, especially when multiple fields of the same type are involved.

Motivation

  • Enable agents to declare named input/output fields with type information.
  • Support schemas for better validation, logging, and debugging.
  • Let clients render structured forms to collect user input appropriately.
  • Reduce ambiguity in how message/artifact parts are composed and interpreted.
  • Provide a more semantic layer on top of the existing input/output modes.

Describe the solution you'd like

Introduce optional structured field definitions for agent input and output, using a new FieldDefinition model. Agents can declare named fields, their content kind (text, file, data), optional MIME types or schemas, and a description. These would complement the inputModes/outputModes on the AgentSkill.

from typing import Any, Literal

from pydantic import BaseModel


class FieldDefinition(BaseModel):
    """
    Describes an expected input or output field.
    """
    name: str | None = None
    kind: Literal['text', 'file', 'data'] | None = None
    mimeTypes: list[str] | None = None
    # For data parts. Pydantic flags `schema` as shadowing a BaseModel
    # attribute, so a real implementation may need an alias here.
    schema: dict[str, Any] | None = None
    description: str | None = None
    optional: bool | None = False

class AgentSkill(BaseModel):
    """
    Represents a unit of capability that an agent can perform.
    """

    description: str
    examples: list[str] | None = None
    id: str
    name: str
    tags: list[str]

    # Existing simple mime-type lists remain supported:
    inputModes: list[str] | None = None
    outputModes: list[str] | None = None

    # New optional structured fields for richer metadata:
    inputFields: list[FieldDefinition] | None = None
    outputFields: list[FieldDefinition] | None = None

Message senders can then match named parts accordingly. The protocol may also define conventions for how these fields map to a Part.name or DataPart.data.keys.
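As a rough illustration of what "matching named parts" could look like on the client side, the sketch below checks a data payload against a list of declared input fields before sending. `FieldSpec` and `check_payload` are hypothetical names used only for this example (a plain dataclass stands in for the proposed FieldDefinition), and the rule that payload keys equal field names is an assumption, not something the protocol defines today.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """Hypothetical, simplified stand-in for the proposed FieldDefinition."""
    name: str
    kind: str = "text"
    optional: bool = False


def check_payload(fields: list[FieldSpec], payload: dict[str, object]) -> list[str]:
    """Return a list of problems; an empty list means the payload fits the fields."""
    problems = []
    declared = {f.name for f in fields}
    # Every non-optional declared field must be present in the payload.
    for f in fields:
        if f.name not in payload and not f.optional:
            problems.append(f"missing required field: {f.name}")
    # Keys the agent never declared are reported rather than silently sent.
    for key in payload:
        if key not in declared:
            problems.append(f"undeclared field: {key}")
    return problems


input_fields = [FieldSpec("bookTitle"), FieldSpec("authorName")]
ok = check_payload(
    input_fields,
    {"bookTitle": "Charlotte's Web", "authorName": "E. B. White"},
)
bad = check_payload(
    input_fields,
    {"bookTitle": "Charlotte's Web", "isbn": "unknown"},
)
```

Here `ok` is empty, while `bad` reports the missing `authorName` and the undeclared `isbn` key. A client could run a check like this before building the message, or use the same field list to render a form.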

Example: Book Review Agent

An agent expects two text inputs:

  • bookTitle: The book’s title
  • authorName: The author’s name

And returns up to three text outputs:

  • review-chunk: A chunk of the book's review, sent when streaming
  • review-html: The whole review as HTML, sent when not streaming
  • publicationDate: When the book was published
{
  "inputModes": ["text/plain"],
  "outputModes": ["text/plain", "text/html"],
  "inputFields": [
    { "name": "bookTitle", "kind": "text", "description": "The title of the book" },
    { "name": "authorName", "kind": "text", "description": "The author's full name" }
  ],
  "outputFields": [
    { "name": "review-chunk",
      "kind": "text",
      "mimeTypes": ["text/plain"],
      "description": "A chunk of the book's review. Sent when streaming.",
      "optional": true
    },
    { "name": "review-html",
      "kind": "text",
      "mimeTypes": ["text/html"],
      "description": "HTML with the whole book's review. Sent when not streaming.",
      "optional": true
    },
    { "name": "publicationDate", "kind": "text", "description": "Publication date" }
  ]
}
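Because the structured fields are meant to complement inputModes/outputModes rather than replace them, one plausible sanity check is that every MIME type a field lists also appears in the skill's modes. The sketch below does this for a trimmed copy of the skill fragment; `undeclared_mime_types` is a hypothetical helper, and the subset rule itself is an assumption, not part of the proposal.

```python
import json

# Trimmed copy of the example skill fragment, output side only.
skill_fragment = json.loads("""
{
  "outputModes": ["text/plain", "text/html"],
  "outputFields": [
    {"name": "review-chunk", "kind": "text", "mimeTypes": ["text/plain"], "optional": true},
    {"name": "review-html", "kind": "text", "mimeTypes": ["text/html"], "optional": true},
    {"name": "publicationDate", "kind": "text"}
  ]
}
""")


def undeclared_mime_types(skill: dict, fields_key: str, modes_key: str) -> dict[str, list[str]]:
    """Map each field name to the MIME types it lists that the skill's modes lack."""
    modes = set(skill.get(modes_key) or [])
    result = {}
    for field in skill.get(fields_key) or []:
        # Fields without mimeTypes are treated as unconstrained.
        extra = [m for m in field.get("mimeTypes") or [] if m not in modes]
        if extra:
            result[field["name"]] = extra
    return result


consistent = undeclared_mime_types(skill_fragment, "outputFields", "outputModes")

# Narrow the modes to show what a mismatch would look like.
skill_fragment["outputModes"] = ["text/plain"]
mismatch = undeclared_mime_types(skill_fragment, "outputFields", "outputModes")
```

With the modes from the example, `consistent` is empty. After narrowing `outputModes` to plain text only, `mismatch` flags `review-html` because it lists `text/html`.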

Client sending a message to the agent:

message = Message(
    parts=[
        Part(
            root=DataPart(
                data={
                    "bookTitle": TextPart(text="Charlotte's Web"),
                    "authorName": TextPart(text="E. B. White"),
                }
            )
        )
    ],
    # other Message fields as needed
)

Agent response with Task and Artifacts:

result = Task(
    taskId="task1",
    contextId="context1",
    final=True,
    status=TaskStatus(state="completed"),
    artifacts=[
        Artifact(
            artifactId="artifact1",
            name="review-html",
            parts=[Part(root=TextPart(text="<div><h3>Review</h3><p>E.B. White’s <em>Charlotte’s Web</em> is a timeless classic that weaves together friendship, loss, and the quiet beauty of ordinary life.</p></div>"))],
        ),
        Artifact(
            artifactId="artifact2",
            name="publicationDate",
            parts=[Part(root=TextPart(text="October 15, 1952"))],
        ),
    ],
)

Agent streaming response with Task and Artifacts:

# First event: Task
result = Task(
    taskId="task1",
    contextId="context1",
    final=False,
    status=TaskStatus(state="submitted"),
)

# Second event: Task artifact update event (with first part of review)
result = TaskArtifactUpdateEvent(
    taskId="task1",
    contextId="context1",
    artifact=Artifact(
        artifactId="artifact1",
        name="review-chunk",
        parts=[
            Part(root=TextPart(text="E.B. White’s Charlotte’s Web is a timeless classic that ")),
        ],
    ),
)


# Third event: Task artifact update event (with publication date)
result = TaskArtifactUpdateEvent(
    taskId="task1",
    contextId="context1",
    artifact=Artifact(
        artifactId="artifact2",
        name="publicationDate",
        parts=[Part(root=TextPart(text="October 15, 1952"))],
    ),
)

# Fourth event: Task artifact update event (with second part of review)
result = TaskArtifactUpdateEvent(
    taskId="task1",
    contextId="context1",
    artifact=Artifact(
        artifactId="artifact3",
        name="review-chunk",
        parts=[Part(root=TextPart(text="weaves together friendship, loss, and the quiet beauty of ordinary life"))],
    ),
)

...
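On the receiving side, a client could use the artifact names to route streamed updates to the right output field. The sketch below is a simplified illustration, not SDK code: events are reduced to hypothetical `(name, text)` pairs, and concatenating chunks in arrival order is an assumption about how `review-chunk` updates would be combined.

```python
from collections import defaultdict

# Hypothetical, simplified events: (artifact name, text) pairs in arrival order.
events = [
    ("review-chunk", "E.B. White's Charlotte's Web is a timeless classic that "),
    ("publicationDate", "October 15, 1952"),
    ("review-chunk", "weaves together friendship, loss, and the quiet beauty of ordinary life"),
]


def collect_by_field(stream):
    """Concatenate text per field name, keeping arrival order within each field."""
    collected = defaultdict(list)
    for name, text in stream:
        collected[name].append(text)
    return {name: "".join(chunks) for name, chunks in collected.items()}


outputs = collect_by_field(events)
```

After the three events, `outputs` holds one assembled string per declared output field, so the interleaved `publicationDate` update does not disturb the review text.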

Describe alternatives you've considered

  • Relying only on inputModes/outputModes: insufficient for cases where structure matters.
  • Encoding structure in skill descriptions or agent documentation: not machine-readable.
  • Inferring field structure from examples: unreliable and inconsistent.

Additional context

No response
