A2A protocol: the agent interoperability standard, full anatomy, and how LLM4Agents builds on it
If MCP is how an agent talks to its tools, A2A is how an agent talks to another agent. Google announced the protocol in April 2025 with fifty technology partners on the launch slide, donated it to the Linux Foundation in June 2025, and shipped v1.0 in early 2026 with more than 150 organisations supporting it — Microsoft, AWS, Salesforce, SAP, ServiceNow, Workday, IBM, MongoDB, Atlassian, PayPal among them. In the year since launch A2A absorbed the role nobody else's spec could quite fill: the horizontal interoperability layer for autonomous agents. This post walks the protocol end-to-end — Agent Card schema, eleven JSON-RPC methods, eight-state task lifecycle, three update mechanisms, five auth schemes — explains why this layer mattered and not another, shows how LLM4Agents wires A2A into Agent Gen, x402 and ERC-8004, and ends with the four concrete improvements the protocol still needs.
Why A2A and not yet another bespoke API
Before A2A, every commercial agent platform shipped its own RPC. Salesforce's Einstein agents talked to Salesforce things. ServiceNow's Now Assist agents talked to ServiceNow things. Microsoft Copilot Studio agents talked to Microsoft Graph. Cross-vendor agent communication was a backlog of one-off integrations, each one a brittle adapter against a private schema, none of them composable.
This is the same shape of problem the web had in the early 2000s before REST conventions made cross-vendor service composition routine. A2A's bet — and the reason 150 organisations signed up so fast — is that the same convergence happens to agents now: a single transport, a single discovery mechanism, a single task model, and the network effect does the rest. An agent that speaks A2A can be hired by any other agent that speaks A2A, regardless of who built either. The first vendor to refuse interoperability pays the price; the last vendor to refuse is unhirable.
The technical decisions that made A2A succeed where prior attempts (LangChain agent protocols, AutoGen handoffs, IBM ACP) failed are all conservative. HTTPS as the transport. JSON-RPC 2.0 as the envelope. SSE as the streaming mechanism. Webhooks as the async callback. Enterprise auth schemes — OAuth 2.0, OIDC, mTLS, API keys — instead of inventing new ones. None of these are novel. They are the boring, battle-tested primitives every enterprise integration team already knows how to operate. The protocol shipped as something existing infrastructure could absorb on day one.
The Agent Card
An Agent Card is a JSON document an agent serves at a well-known URL — by convention /.well-known/agent-card.json — that describes everything another agent needs to interact with this one. It is the file equivalent of a DNS TXT record for capability discovery: machine-readable, cacheable, signable.
A minimal but realistic Agent Card looks like this:
{
  "name": "research-agent",
  "description": "Long-running research agent specialising in token analytics",
  "version": "1.4.2",
  "provider": {
    "organization": "LLM4Agents",
    "url": "https://llm4agents.com"
  },
  "url": "https://agent.example/a2a",
  "capabilities": {
    "streaming": true,
    "pushNotifications": true,
    "extendedAgentCard": true
  },
  "defaultInputModes": ["text", "data", "file"],
  "defaultOutputModes": ["text", "data"],
  "securitySchemes": {
    "oauth2": {
      "type": "oauth2",
      "flows": {
        "clientCredentials": {
          "tokenUrl": "https://auth.example/token",
          "scopes": { "agent:invoke": "Invoke the agent" }
        }
      }
    }
  },
  "security": [{ "oauth2": ["agent:invoke"] }],
  "skills": [
    {
      "id": "market-research",
      "name": "Token market research",
      "description": "Compile a structured report on a token's market dynamics, liquidity, top holders and recent on-chain activity.",
      "tags": ["crypto", "research", "on-chain"],
      "inputModes": ["text", "data"],
      "outputModes": ["text", "data"],
      "examples": [
        "Research SOL liquidity across Solana DEXs over the past 30 days"
      ]
    }
  ],
  "extensions": [
    { "uri": "https://llm4agents.com/ext/x402-payable", "required": false },
    { "uri": "https://llm4agents.com/ext/erc8004-bound", "required": false }
  ]
}
The structure is intentionally narrow. A consumer reads the card and learns five things in order: what the agent is, where to call it, how to authenticate, what it can do (the skills array), and which optional extensions it supports. Skills are the unit of capability advertisement. Each skill is a named, tagged, described capability with concrete input/output modalities and natural-language examples. A discovery service can index skills the same way a search engine indexes web pages.
The extendedAgentCard capability is the privacy gate. A baseline Agent Card may declare only what the agent wants the open internet to see. Callers who authenticate get a richer card — internal tools, premium skills, higher rate limits — by calling GetExtendedAgentCard. This solves the "do I have to publish every internal capability to the world" objection enterprises raise at the first design review.
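How a consumer might read a card can be sketched in a few lines. This is a minimal TypeScript sketch, assuming a card shaped like the example above. The `AgentCard` interface covers only the fields used here, and `agentCardUrl`, `findSkillsByTag` and `shouldFetchExtendedCard` are hypothetical helper names for illustration, not part of any A2A SDK.

```typescript
// Partial view of an Agent Card: only the fields this sketch reads.
interface Skill {
  id: string;
  name: string;
  description: string;
  tags: string[];
}

interface AgentCard {
  name: string;
  url: string;
  capabilities: { streaming?: boolean; pushNotifications?: boolean; extendedAgentCard?: boolean };
  skills: Skill[];
}

// Build the conventional well-known location from an agent's origin.
function agentCardUrl(origin: string): string {
  return origin.replace(/\/+$/, "") + "/.well-known/agent-card.json";
}

// Return the skills carrying a given tag, the way a discovery index might.
function findSkillsByTag(card: AgentCard, tag: string): Skill[] {
  return card.skills.filter((skill) => skill.tags.some((t) => t === tag));
}

// Only an authenticated caller has a reason to ask for the richer card.
function shouldFetchExtendedCard(card: AgentCard, authenticated: boolean): boolean {
  return authenticated && card.capabilities.extendedAgentCard === true;
}

// Trimmed copy of the example card above.
const card: AgentCard = {
  name: "research-agent",
  url: "https://agent.example/a2a",
  capabilities: { streaming: true, pushNotifications: true, extendedAgentCard: true },
  skills: [
    {
      id: "market-research",
      name: "Token market research",
      description: "Compile a structured report on a token's market dynamics.",
      tags: ["crypto", "research", "on-chain"],
    },
  ],
};

const onChainSkills = findSkillsByTag(card, "on-chain");
```

The helpers are pure functions on purpose: fetching the card over HTTPS and verifying any signature are separate concerns that a real client would layer on top.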
The eleven JSON-RPC methods
A2A is a JSON-RPC 2.0 protocol. Every method is a POST to the agent's URL with a JSON body that names the method, sets an id, and carries typed params. The full method surface is eleven calls, split into four groups.
Messaging (2):
- SendMessage: submit a message to the agent, get back either a direct Message reply or a Task handle for long-running work
- SendStreamingMessage: same submission, but the response is a Server-Sent Events stream of incremental updates (requires capabilities.streaming)
Task lifecycle (4):
- GetTask: fetch a task's current state, artifacts, and optional history
- ListTasks: paginated, filterable listing of an agent's tasks (cursor-based)
- CancelTask: request cancellation, idempotent
- SubscribeToTask: open an SSE stream of incremental updates for an already-running task
Push notifications (4):
- CreateTaskPushNotificationConfig: register a webhook for async updates to a task
- GetTaskPushNotificationConfig: retrieve a registered webhook configuration
- ListTaskPushNotificationConfigs: list all webhooks on a task
- DeleteTaskPushNotificationConfig: remove a webhook
Discovery (1):
- GetExtendedAgentCard: fetch the privileged Agent Card after authenticating
Eleven methods is small enough to implement in a weekend and complete enough to operate at scale. The decision to keep the surface narrow was deliberate: every additional method is an additional thing 150 implementations have to agree on.
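The request envelope itself is plain JSON-RPC 2.0. The sketch below shows one way to build it in TypeScript. The method names come from the list above; the `params` shape (a `message` with `role` and `parts`) is an assumption for illustration, and `buildRequest` is a hypothetical helper, so check the spec for the exact field names before relying on it.

```typescript
// The eleven method names listed above.
type A2AMethod =
  | "SendMessage"
  | "SendStreamingMessage"
  | "GetTask"
  | "ListTasks"
  | "CancelTask"
  | "SubscribeToTask"
  | "CreateTaskPushNotificationConfig"
  | "GetTaskPushNotificationConfig"
  | "ListTaskPushNotificationConfigs"
  | "DeleteTaskPushNotificationConfig"
  | "GetExtendedAgentCard";

// Standard JSON-RPC 2.0 request envelope.
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number;
  method: A2AMethod;
  params: P;
}

let nextId = 1;

// Wrap typed params in an envelope with a fresh request id.
function buildRequest<P>(method: A2AMethod, params: P): JsonRpcRequest<P> {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

// Assumed params shape, for illustration only.
const request = buildRequest("SendMessage", {
  message: {
    role: "user",
    parts: [{ type: "text", text: "Research SOL liquidity across Solana DEXs over the past 30 days" }],
  },
});

// This string would be the body of the HTTPS POST to the agent's URL.
const body = JSON.stringify(request);
```

Restricting `method` to a closed union means a typo in a method name fails at compile time rather than as a JSON-RPC "method not found" error at runtime.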
The task lifecycle
Tasks are A2A's first-class abstraction for work that does not return synchronously. An agent receives a SendMessage, decides that the work is asynchronous, returns a Task handle, and emits state transitions until the task hits a terminal state.
The eight defined states form a small state machine:
Initial:
  TASK_STATE_SUBMITTED        // task accepted, queued
Active:
  TASK_STATE_WORKING          // agent is doing the work
Interruptable (non-terminal):
  TASK_STATE_INPUT_REQUIRED   // waiting for caller to provide more info
  TASK_STATE_AUTH_REQUIRED    // waiting for caller to authenticate
Terminal:
  TASK_STATE_COMPLETED        // success, artifacts attached
  TASK_STATE_FAILED           // agent could not complete
  TASK_STATE_CANCELED         // caller cancelled or system aborted
  TASK_STATE_REJECTED         // agent refused (policy, capacity, scope)
The interruptable states are the design move that separates A2A from a naive "call the agent and wait" RPC. A task is allowed to pause and ask its caller for more input or for authentication, then resume. The caller resumes the task by sending a new SendMessage against the same task id with the requested input. This handles the dominant real-world failure modes (the agent ran into ambiguity halfway through; the agent needed a tool that requires the user's OAuth consent) without forcing them into either "fail the task" or "guess and silently degrade."
REJECTED as a distinct terminal state matters more than it looks. It is the protocol-level affordance for an agent to say "I will not do this work" — over capacity, scope mismatch, policy violation — without claiming failure. The distinction is what lets downstream orchestrators route correctly: a failed task may be worth retrying with a different agent; a rejected task is a definitive answer that should not be retried against the same agent.
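An orchestrator's reaction to each state can be written as a small lookup. The following TypeScript sketch uses the eight state names from the list above; the `NextStep` labels and the routing policy are assumptions that mirror the reasoning in this section, not anything the protocol mandates.

```typescript
type TaskState =
  | "TASK_STATE_SUBMITTED"
  | "TASK_STATE_WORKING"
  | "TASK_STATE_INPUT_REQUIRED"
  | "TASK_STATE_AUTH_REQUIRED"
  | "TASK_STATE_COMPLETED"
  | "TASK_STATE_FAILED"
  | "TASK_STATE_CANCELED"
  | "TASK_STATE_REJECTED";

// Hypothetical orchestrator actions, named for this sketch only.
type NextStep =
  | "wait"
  | "send_input"
  | "authenticate"
  | "collect_artifacts"
  | "retry_with_other_agent"
  | "drop_this_agent"
  | "stop";

function isTerminal(state: TaskState): boolean {
  return (
    state === "TASK_STATE_COMPLETED" ||
    state === "TASK_STATE_FAILED" ||
    state === "TASK_STATE_CANCELED" ||
    state === "TASK_STATE_REJECTED"
  );
}

// Map each state to the caller's next move.
function nextStep(state: TaskState): NextStep {
  switch (state) {
    case "TASK_STATE_SUBMITTED":
    case "TASK_STATE_WORKING":
      return "wait";
    case "TASK_STATE_INPUT_REQUIRED":
      return "send_input"; // resume with a new SendMessage on the same task id
    case "TASK_STATE_AUTH_REQUIRED":
      return "authenticate";
    case "TASK_STATE_COMPLETED":
      return "collect_artifacts";
    case "TASK_STATE_FAILED":
      return "retry_with_other_agent"; // failure may be worth another attempt
    case "TASK_STATE_REJECTED":
      return "drop_this_agent"; // definitive refusal: do not retry against this agent
    case "TASK_STATE_CANCELED":
      return "stop";
  }
}
```

Keeping FAILED and REJECTED on separate branches is the point: collapsing them into one "error" case would discard exactly the routing signal the distinct terminal state exists to carry.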
Messages, parts, and artifacts
The message structure is intentionally close to the structure of Claude's content blocks or OpenAI's message parts, which means agents authored against an existing LLM API can serialize to A2A without much translation.
A Message has a role (user or agent) and an array of Parts. Each Part is one of four types:
- text: UTF-8 string
- data: structured JSON (object, array, scalar, null)
- raw: base64-encoded binary file content, inline
- url: file reference, with mediaType
An Artifact is the output container an agent attaches to a completed (or in-progress) task. Artifacts have their own ID, a human-readable name and description, and an array of Parts. The shape mirrors Messages on purpose: an artifact is what an agent produces; a message is what an agent receives or emits in conversation.
The separation of artifacts from messages is the spec's answer to "where does the work product go." A long agent task may emit dozens of incremental messages along the way (progress, intermediate reasoning, asks for input) and a small number of artifacts at the end (the report, the diff, the plan). Callers polling the task can read artifacts independently from the conversational history.
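The four Part kinds map naturally onto a discriminated union. The TypeScript sketch below is illustrative: the `type` discriminant and the `artifactId` field name are assumptions chosen to match the prose in this section, and the sample artifact contains placeholder values only.

```typescript
// One variant per Part kind described above.
type Part =
  | { type: "text"; text: string }
  | { type: "data"; data: unknown }
  | { type: "raw"; raw: string } // base64-encoded content, inline
  | { type: "url"; url: string; mediaType: string }; // reference to a file

interface Message {
  role: "user" | "agent";
  parts: Part[];
}

// Same Parts array as a Message, plus its own identity and description.
interface Artifact {
  artifactId: string;
  name: string;
  description: string;
  parts: Part[];
}

// Concatenate the text parts, skipping data, raw and url parts.
function textOf(parts: Part[]): string {
  const chunks: string[] = [];
  for (const part of parts) {
    if (part.type === "text") chunks.push(part.text);
  }
  return chunks.join("\n");
}

// Placeholder artifact for illustration.
const report: Artifact = {
  artifactId: "artifact-1",
  name: "sol-liquidity-report",
  description: "Placeholder report artifact",
  parts: [
    { type: "text", text: "Summary of SOL liquidity across Solana DEXs." },
    { type: "data", data: { placeholder: true } },
    { type: "text", text: "See the data part for structured figures." },
  ],
};

const progress: Message = { role: "agent", parts: [{ type: "text", text: "Collecting pool data" }] };

const summary = textOf(report.parts);
```

Because Messages and Artifacts share the same Parts array, one helper such as `textOf` serves both the conversational history and the work product.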
Three update mechanisms, picked deliberately
How does a caller learn that a task progressed? A2A gives three choices and lets the agent advertise which ones it supports.
Polling. The boring baseline. The caller invokes GetTask on a schedule. Always available. Highest latency, simplest infrastructure.
Server-Sent Events (SSE). The caller opens a long-lived HTTP connection via SendStreamingMessage or SubscribeToTask and reads incremental updates as they arrive. Requires capabilities.streaming. Right for interactive UIs and short-to-medium running tasks where the caller is alive for the duration.
Push notifications (webhooks). The caller registers an HTTPS webhook on the task via CreateTaskPushNotificationConfig. The agent POSTs updates to the webhook as state changes. Requires capabilities.pushNotifications. Right for long-running tasks where the caller cannot stay connected (overnight jobs, multi-day workflows, agent-to-agent handoffs).
The three-channel approach is the right answer because no single one covers all the real workloads. SSE breaks when the caller's process dies. Webhooks break when the caller is behind a NAT or does not run a server. Polling works everywhere but wastes round-trips. A protocol that supports all three and lets the caller pick is the protocol an enterprise can deploy without rewriting their event infrastructure.
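The trade-off above reduces to a short decision function. This TypeScript sketch assumes the two capability flags from the Agent Card; the `CallerProfile` fields and the preference order (SSE, then webhooks, then polling) are assumptions for illustration.

```typescript
type UpdateMechanism = "polling" | "sse" | "webhook";

// Capability flags as advertised in the Agent Card.
interface Capabilities {
  streaming?: boolean;
  pushNotifications?: boolean;
}

// Hypothetical description of the caller's runtime constraints.
interface CallerProfile {
  staysConnected: boolean; // caller process lives for the task's duration
  hasPublicWebhook: boolean; // caller can receive inbound HTTPS POSTs
}

function chooseUpdateMechanism(caps: Capabilities, caller: CallerProfile): UpdateMechanism {
  // SSE needs agent support and a caller that stays alive.
  if (caps.streaming === true && caller.staysConnected) return "sse";
  // Webhooks need agent support and a reachable caller endpoint.
  if (caps.pushNotifications === true && caller.hasPublicWebhook) return "webhook";
  // Polling with GetTask is always available.
  return "polling";
}

const interactiveUi = chooseUpdateMechanism(
  { streaming: true, pushNotifications: true },
  { staysConnected: true, hasPublicWebhook: false },
);

const overnightJob = chooseUpdateMechanism(
  { streaming: true, pushNotifications: true },
  { staysConnected: false, hasPublicWebhook: true },
);
```

Polling is the final fallback rather than an error case, which matches its role as the baseline that works behind NATs and without a server.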
Authentication, the unglamorous detail that mattered
A2A reuses the OpenAPI security schemes verbatim:
- API key — header or query parameter, fine for service-to-service inside a trust boundary
- HTTP auth — Basic or Bearer tokens
- OAuth 2.0 — authorization code, client credentials, device code flows; the standard for cross-org agent-to-agent calls
- OIDC — OAuth 2.0 with identity claims, used when the agent needs to assert an end-user principal
- Mutual TLS — certificate-based, for high-security enterprise mesh deployments
Reusing OpenAPI's schemes means every API gateway, every identity provider, every enterprise security team already knows how to operate A2A. This sounds like an unsexy detail. It is the reason 150 organisations could ship support in their existing products without auditing a new protocol from scratch.
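On the caller side, resolving the card's `security` array works the way it does in OpenAPI: the array lists alternatives, and each entry names the schemes (with scopes) that must all be satisfied. The TypeScript sketch below assumes the shape used in the example Agent Card earlier in this post; `CallerCredentials`, `satisfies` and `pickRequirement` are hypothetical names, and `agent:read` is a made-up scope.

```typescript
// One alternative: scheme name -> scopes required under that scheme.
type SecurityRequirement = Record<string, string[]>;

// Scopes the caller holds per scheme name (empty array: credential present, no scopes).
type CallerCredentials = Record<string, string[]>;

// True when the caller holds every scheme and every scope in the requirement.
function satisfies(requirement: SecurityRequirement, creds: CallerCredentials): boolean {
  return Object.keys(requirement).every((scheme) => {
    const held = creds[scheme];
    if (held === undefined) return false;
    return requirement[scheme].every((scope) => held.indexOf(scope) !== -1);
  });
}

// Return the first alternative the caller can satisfy, or null if none fits.
function pickRequirement(
  security: SecurityRequirement[],
  creds: CallerCredentials,
): SecurityRequirement | null {
  for (const requirement of security) {
    if (satisfies(requirement, creds)) return requirement;
  }
  return null;
}

// The "security" array from the example card.
const security: SecurityRequirement[] = [{ oauth2: ["agent:invoke"] }];

const chosen = pickRequirement(security, { oauth2: ["agent:invoke", "agent:read"] });
const none = pickRequirement(security, { apiKey: [] });
```

A `null` result is the signal to stop before sending anything: the caller has no credential the agent accepts, so a SendMessage would only come back unauthorized.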
A2A vs MCP, the right mental model
The most common question we get from teams adopting both is "which one do I use." The answer is "both, at different layers."
MCP, vertical layer: agent ↔ tools
  agent ──MCP──> filesystem, database, search, custom API
A2A, horizontal layer: agent ↔ agent
  agent A ──A2A──> agent B
                     ↓
                     └──MCP──> its own tools
An agent typically speaks MCP downward (to consume tools and resources) and A2A outward (to delegate to or be delegated by other agents). The two protocols do not compete; they compose. PayPal's production deployment is a clean example: a sales agent at the buyer end speaks A2A to PayPal's hosted invoice agent, which internally speaks MCP to the underlying PayPal billing systems. The buyer's agent never touches PayPal's MCP server — that boundary is owned by PayPal — but it can still delegate work into PayPal's territory because both sides speak A2A.
The other way to state the relationship: MCP is the protocol you implement to expose an agent's tools; A2A is the protocol you implement to expose an agent's skills. Tools are unit operations. Skills are agent-mediated outcomes. The two abstractions correspond to the two scopes at which an agent system is composable, and the field is converging on running both side by side.
How LLM4Agents builds on A2A
Every agent generated by Agent Gen ships with an A2A endpoint by default. The endpoint lives at the agent's public URL, the Agent Card is served at /.well-known/agent-card.json, and the agent's ERC-8004 registration file references the A2A endpoint under endpoints.a2a. The three pieces (endpoint, Agent Card, registration file) are wired together so a counterparty discovering the agent through the on-chain registry can pull the Agent Card without an additional out-of-band step.
The integration with x402 is where the composition gets interesting. A2A's spec does not include payment primitives — and deliberately so; protocols that try to do everything end up doing nothing well. We bridge the gap with a published extension URI, https://llm4agents.com/ext/x402-payable, declared in the extensions array of the Agent Card. An agent that supports the extension accepts the following pattern:
- Caller sends a SendMessage to a paid skill.
- Agent responds with a Task in state TASK_STATE_INPUT_REQUIRED and a message Part of type data containing an x402 payment requirement.
- Caller signs the x402 payment, posts the proof to the agent via a second SendMessage.
- Agent verifies via the x402 facilitator, transitions the task to TASK_STATE_WORKING, and finishes normally.
This is a pure-A2A interaction with one extension flag. No new RPC was needed. The interruptable INPUT_REQUIRED state was designed for exactly this kind of pause-and-collect-from-caller flow, and we are using it as intended.
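The agent side of that four-step pattern can be sketched as two pure functions. This is a simplified TypeScript illustration, not the extension's actual implementation: `PaymentRequirement` is a cut-down stand-in for the real x402 schema, the `x402Proof` key is hypothetical, and the facilitator call is replaced by an injected `verify` function so the sketch runs offline.

```typescript
type TaskState = "TASK_STATE_INPUT_REQUIRED" | "TASK_STATE_WORKING";

// Simplified, illustrative payment requirement; the real x402 schema has more fields.
interface PaymentRequirement {
  amount: string;
  asset: string;
  payTo: string;
}

type Part =
  | { type: "text"; text: string }
  | { type: "data"; data: { x402?: PaymentRequirement; x402Proof?: string } };

interface Task {
  id: string;
  state: TaskState;
  parts: Part[];
}

// Stand-in for the x402 facilitator check.
type VerifyPayment = (proof: string, requirement: PaymentRequirement) => boolean;

// Steps 1-2: a message hits a paid skill, so the agent pauses and asks for payment.
function requestPayment(taskId: string, requirement: PaymentRequirement): Task {
  return {
    id: taskId,
    state: "TASK_STATE_INPUT_REQUIRED",
    parts: [{ type: "data", data: { x402: requirement } }],
  };
}

// Steps 3-4: the caller's second message carries the proof; verify, then resume.
function submitPayment(
  task: Task,
  reply: Part[],
  requirement: PaymentRequirement,
  verify: VerifyPayment,
): Task {
  let proof: string | undefined;
  for (const part of reply) {
    if (part.type === "data") {
      const candidate = part.data.x402Proof;
      if (candidate !== undefined) proof = candidate;
    }
  }
  // No proof or a bad proof: the task keeps waiting for input.
  if (proof === undefined || !verify(proof, requirement)) return task;
  return { id: task.id, state: "TASK_STATE_WORKING", parts: [] };
}

// Placeholder values for illustration.
const requirement: PaymentRequirement = { amount: "0.05", asset: "USDC", payTo: "0xRecipient" };
const acceptNonEmptyProof: VerifyPayment = (proof) => proof.length > 0;

const paused = requestPayment("task-1", requirement);
const resumed = submitPayment(
  paused,
  [{ type: "data", data: { x402Proof: "signed-payload" } }],
  requirement,
  acceptNonEmptyProof,
);
```

Note that an invalid or missing proof leaves the task in TASK_STATE_INPUT_REQUIRED rather than failing it, so the caller can try again on the same task id.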
The ERC-8004 binding works the same way. An extension URI, https://llm4agents.com/ext/erc8004-bound, declares that the agent's identity is anchored on-chain. The Agent Card includes the agent's identity token URI; counterparties can resolve reputation and validation records before they decide to send the first SendMessage. We push validation receipts back to ERC-8004 after every completed task, which closes the loop: an agent's track record from A2A interactions becomes queryable from a smart contract.
The composition pattern we ship by default in the SDK looks like this:
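// Pseudocode for the LLM4Agents SDK composition flow.
// erc8004, a2a and x402 stand for the SDK's protocol clients.
async function hireResearchAgent() {
  // Discovery: find registered agents that claim the skill
  const registry = await erc8004.identityRegistry()
  const candidates = await registry.search({ skill: "market-research",
                                             minReputation: 90 })
  for (const cand of candidates) {
    // Capability check: skip agents whose card does not advertise the skill
    const card = await a2a.getAgentCard(cand.endpoint)
    if (!card.skills.some(s => s.id === "market-research")) continue
    const task = await a2a.sendMessage(cand.endpoint, {
      skill: "market-research",
      parts: [{ type: "text", text: "Research SOL liquidity..." }]
    })
    // Payment: a paid skill pauses the task and asks for an x402 payment
    const requirement = task.message?.parts?.[0]?.data?.x402
    if (task.state === "TASK_STATE_INPUT_REQUIRED" && requirement) {
      await x402.payAndSubmit(cand.endpoint, task.id, requirement)
    }
    const result = await a2a.waitForTerminal(cand.endpoint, task.id)
    // Validation: push the receipt back to ERC-8004
    await erc8004.validation.attest(cand.tokenId, result.success, result.metrics)
    return result
  }
  return null // no candidate advertised the skill
}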
Discovery via ERC-8004, capability check via A2A's Agent Card, payment via x402 inside an A2A task, validation back to ERC-8004 once the task lands. Four steps, three protocols composing through their well-defined seams. None of them owned by a single vendor.
Where A2A still needs to improve
A2A v1.0 is production-grade for its scope. The honest critique — and the work the protocol's working group has in front of it — is in what the scope deliberately excludes.
Payment-aware methods. The x402-via-INPUT_REQUIRED pattern works, but it is a workaround. Agent commerce is going to be the dominant use case for agent-to-agent traffic, and the protocol should standardize the payment handshake instead of leaving every implementation to invent its own extension URI. A native method (something like SendPaidMessage with an embedded payment requirement schema) would let downstream marketplaces filter, audit and reconcile agent payments without each one parsing a different vendor's extension.
Negotiation primitives. Project Deal made it concrete that agent-to-agent commerce is dominated by negotiation. Today, negotiation happens inside the text Parts of free-form messages — the agents bargain in natural language, and the orchestrator has no protocol-level structure to record or audit. A small extension defining a negotiation Part type with structured offer/counter-offer fields, optional reservation-price disclosure flags, and a settle/refuse outcome would give marketplaces the shape they need to make agent commerce auditable.
Semantic alignment. A2A standardizes the wire. It does not standardize what the words across the wire mean. Two agents can complete an A2A task perfectly while disagreeing on what "approved vendor" or "delivered" means. The field will need either a shared ontology layer (in the W3C / schema.org tradition) or a per-domain semantic extension registry where industries publish their reference dictionaries. Without it, agent ecosystems fragment into bilateral "we both know what this term means" pairs.
Identity binding. A2A's Agent Card has a signature field for card authenticity but no canonical way to bind an agent's A2A identity to a portable, marketplace-readable identity. ERC-8004 is the obvious candidate — chain-anchored, vendor-neutral, already deployed — and we expect the next minor version of A2A to either reference it directly or define a generic identity-binding extension that ERC-8004 (and any future equivalent) can implement.
None of these gaps is a flaw in v1.0. They are the right things to leave out of a v1.0. They are the work the field will fill in over v1.x as agent commerce moves from pilot to volume.
Closing
The shape of the agentic stack in 2026 is finally legible. MCP carries the agent-to-tool boundary. A2A carries the agent-to-agent boundary. x402 carries the payment that closes a deal. ERC-8004 carries the identity, reputation and validation that let two agents who have never met decide to transact in the first place. Each protocol covers one layer; none of them is a vendor's product; all of them are governed by the Linux Foundation, the Ethereum Foundation, or an equivalent neutral home.
A2A is the layer that sat empty the longest, and the one whose absence kept agent platforms walled off from each other for the longest. Its v1.0 release is the moment the agent-to-agent boundary finally has a spec the field can converge on. The remaining work — payment-aware methods, negotiation primitives, semantic alignment, identity binding — is the kind of work that gets done in v1.x of a successful protocol, not the kind that gets re-litigated. The composition story is settled. Now we build.
The spec is at a2a-protocol.org/latest/specification. The reference implementation is at github.com/a2aproject/A2A. If you are building anything that another agent will eventually need to call, both belong in your stack this quarter.