MCP deep dive: the Model Context Protocol, end to end, for the agent operator
Every post we have written about the agentic stack — A2A, AP2, x402, ERC-8004 — referenced MCP as if everyone already knew it. This is the post we should have written first. MCP is the protocol that lets an agent talk to its tools. If the rest of the stack standardises how agents talk to each other, how they pay each other, and how they prove their identity, MCP standardises how each one of them gets work done. This post is the comprehensive walkthrough we owe every future agent operator: architecture, the six primitives, the two transports, OAuth 2.1, the security model, the current 2025-11-25 spec, the upcoming 2026-07-28 release candidate (stateless core, MCP Apps, Tasks as a formal extension), the operator concerns the spec leaves for you to solve, and how Agent Builder turns MCP from a protocol you implement into a control surface you configure.
The shape of the problem MCP solves
Before MCP, every LLM application that wanted to connect to an external data source or a tool wrote a bespoke adapter. The agent calling GitHub had its own GitHub integration. The agent calling Notion had its own Notion integration. The agent reading from Postgres had its own database wrapper. When you wanted to plug a new tool into the agent, you wrote new code. When you wanted to plug your agent into a different LLM platform, you rewrote the integration on the other side. The problem was the same shape the early web had before HTTP: every client and every server invented their own protocol; nothing composed.
Anthropic open-sourced MCP in November 2024 with the explicit goal of being "USB-C for AI." The metaphor is exact. USB-C did not solve the problem of how to charge a laptop or how to drive a monitor; it standardised the connector and the negotiation protocol so that any device could connect to any host. MCP does not invent new tools, new resources, or new prompts; it standardises the contract by which a tool exposes itself, so that any LLM application can connect to any tool without bespoke code.
By mid-2026 every major LLM platform — Claude, ChatGPT, Gemini, Copilot, the open-source LLM frameworks — speaks MCP natively. The official registry lists hundreds of production servers. Anthropic, OpenAI, Microsoft, Google and AWS all maintain client libraries. The spec is governed under an open governance model with Working Groups, Standards Enhancement Proposals (SEPs), and a published roadmap. The day-one bet — that a protocol with the right primitives would absorb the tool-integration space the way HTTP absorbed the document-integration space — landed.
The architecture: hosts, clients, servers
MCP defines three roles. Understanding the separation is the load-bearing part of every other section.
MCP architecture:
┌──────────────────────────────────────────────────────────┐
│ HOST (Claude Desktop, Cursor, VS Code, your agent)       │
│                                                          │
│  ┌──────────┐     ┌──────────┐     ┌──────────┐          │
│  │ Client A │     │ Client B │     │ Client C │          │
│  └────┬─────┘     └────┬─────┘     └────┬─────┘          │
└───────┼────────────────┼────────────────┼────────────────┘
        │ JSON-RPC 2.0   │ JSON-RPC 2.0   │ JSON-RPC 2.0
        ▼                ▼                ▼
   ┌──────────┐     ┌──────────┐     ┌──────────┐
   │  Server  │     │  Server  │     │  Server  │
   │ (GitHub) │     │ (Notion) │     │(Postgres)│
   └──────────┘     └──────────┘     └──────────┘
A host is the LLM application the user actually interacts with — Claude Desktop, Cursor, VS Code with an MCP extension, an Agent Builder agent in production. The host owns the user's trust, manages the LLM that does the reasoning, and decides which servers the agent is allowed to talk to.
A client is the connector inside the host. There is one client per server connection. The client speaks the MCP protocol; it does not speak the language of the underlying tool. From the host's perspective, every connected server has the same shape because every client implements the same protocol surface.
A server is the service that exposes capabilities. A GitHub MCP server exposes GitHub's API as MCP primitives. A Postgres MCP server exposes the database. A filesystem MCP server exposes local files. The server author writes the translation from MCP-shaped requests to the underlying API or data store. After that, every LLM application in the world can use the server through the same code.
The right mental model is that the host is the conductor, the clients are the musicians' technique, and the servers are the instruments. The conductor does not need to know how each instrument is built; the technique encapsulates that. Swap an instrument, and the technique still works.
The six primitives
MCP standardises six primitives split into two halves: three that the server offers the client, and three that the client offers the server. The split matters because it is the structural answer to "how does a tool talk back to the agent."
Server primitives: what the server offers
Tools. Executable actions the LLM can decide to call. A tool has a name, a description (which the LLM reads to decide when to invoke it), and a JSON Schema input definition. Calling a tool runs server-side code with the LLM-provided arguments and returns a result. Tools are the unit of agency: anything that changes the world — sending an email, writing a file, executing a query, posting a transaction — is a tool.
// Example tool definition served by a Postgres MCP server
{
"name": "query",
"description": "Run a read-only SQL query against the configured database. Use this when the user asks about data.",
"inputSchema": {
"type": "object",
"properties": {
"sql": { "type": "string",
"description": "A SELECT statement. INSERT/UPDATE/DELETE are rejected." }
},
"required": ["sql"]
},
"annotations": {
"readOnlyHint": true,
"openWorldHint": false
}
}
The annotations are hints the host uses to decide policy: a read-only tool can be auto-approved; a destructive tool requires explicit consent. The host treats annotations as untrusted unless the server itself is trusted.
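The following is a minimal sketch of how a host might turn annotations into an approval decision. It assumes the host keeps a hypothetical `trusted_servers` allow-list. `readOnlyHint` and `destructiveHint` are the spec's annotation names, and the sketch applies the spec's defaults when a hint is absent. The decision labels are illustrative.

```python
from typing import Any


def approval_policy(tool: dict[str, Any], server: str, trusted_servers: set[str]) -> str:
    """Return 'auto', 'ask_once' or 'ask_every_time' for a tool offered by `server`."""
    # Annotations from an untrusted server are ignored: assume the worst case.
    if server not in trusted_servers:
        return "ask_every_time"
    ann = tool.get("annotations", {})
    # Defaults when a hint is absent: readOnlyHint=False, destructiveHint=True.
    if ann.get("readOnlyHint", False):
        return "auto"
    if ann.get("destructiveHint", True):
        return "ask_every_time"
    # A write the (trusted) server says is non-destructive, e.g. purely additive.
    return "ask_once"


query_tool = {"name": "query",
              "annotations": {"readOnlyHint": True, "openWorldHint": False}}
print(approval_policy(query_tool, "postgres", {"postgres"}))        # auto
print(approval_policy(query_tool, "unknown-server", {"postgres"}))  # ask_every_time
```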
Resources. Read-only data the LLM (or the user) can pull into context. A resource has a URI, a MIME type, and contents. Resources are the unit of state: anything that exists and that the agent might want to read (a file, a document, a row, a search result) is a resource. Resources can be subscribed to: the client tells the server "tell me when this changes," and the server sends an update notification over the same connection, after which the client re-reads the resource.
// Example resource served by a filesystem MCP server
{
"uri": "file:///workspace/contracts/lease.pdf",
"name": "lease.pdf",
"mimeType": "application/pdf",
"size": 982341
}
// Subscribe to resource changes
{ "jsonrpc": "2.0", "id": 7,
"method": "resources/subscribe",
"params": { "uri": "file:///workspace/contracts/lease.pdf" } }
The tools-vs-resources distinction is not academic. Tools are how an agent acts; resources are how an agent reads. A well-designed server keeps the two separated: data the agent reads by address lives in a resource; an operation with side effects lives in a tool. (A read that needs model-computed arguments, such as the SQL query above, can still be a tool, annotated readOnlyHint so hosts treat it as a read.) Hosts apply different consent policies to each, and operators rely on that separation when scoping which capabilities an agent is allowed to use.
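On the server side, subscription bookkeeping can be very small. The sketch below is a single-connection, in-memory class. It is hypothetical and not an SDK API. It acknowledges `resources/subscribe` and queues a `notifications/resources/updated` notification when a subscribed URI changes. The client then fetches fresh contents with `resources/read`.

```python
import json


class ResourceSubscriptions:
    """In-memory sketch of server-side resource subscriptions for one connection."""

    def __init__(self) -> None:
        self.subscribed: set[str] = set()
        self.outbox: list[str] = []  # serialized messages waiting to go to the client

    def handle(self, request: dict) -> dict:
        method, msg_id = request["method"], request["id"]
        if method == "resources/subscribe":
            self.subscribed.add(request["params"]["uri"])
            return {"jsonrpc": "2.0", "id": msg_id, "result": {}}
        if method == "resources/unsubscribe":
            self.subscribed.discard(request["params"]["uri"])
            return {"jsonrpc": "2.0", "id": msg_id, "result": {}}
        return {"jsonrpc": "2.0", "id": msg_id,
                "error": {"code": -32601, "message": "Method not found"}}

    def on_change(self, uri: str) -> None:
        # Notifications carry no id; the client re-reads the resource to get new contents.
        if uri in self.subscribed:
            self.outbox.append(json.dumps({
                "jsonrpc": "2.0",
                "method": "notifications/resources/updated",
                "params": {"uri": uri}}))
```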
Prompts. Reusable, parameterised message templates that the user (not the model) can invoke. Prompts are the unit of user-facing workflow: a "summarise this PR" prompt is a one-click invocation that constructs a properly-scoped message the LLM can act on. Prompts are explicitly user-initiated — the LLM cannot call a prompt — which makes them safer than tools and more discoverable than free-form messages.
// Example prompt served by a GitHub MCP server
{
"name": "summarize_pr",
"description": "Summarise a pull request including the diff, comments and CI status.",
"arguments": [
{ "name": "repo", "description": "owner/name", "required": true },
{ "name": "prNumber", "description": "PR number", "required": true }
]
}
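When the user picks the prompt, the host sends `prompts/get` with the argument values, and the server returns ready-to-send messages. The sketch below shows a minimal server-side handler for the `summarize_pr` prompt. The rendered wording is an assumption. A real server would typically fetch the diff and CI status and embed them, rather than only describing them.

```python
def get_prompt(name: str, arguments: dict[str, str]) -> dict:
    """Sketch of a prompts/get handler for the summarize_pr prompt."""
    if name != "summarize_pr":
        raise ValueError(f"unknown prompt: {name}")
    missing = [arg for arg in ("repo", "prNumber") if arg not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    text = (f"Summarise pull request #{arguments['prNumber']} in {arguments['repo']}, "
            "covering the diff, the review comments and the CI status.")
    # A prompts/get result is an optional description plus role/content messages.
    return {"description": "Summarise a pull request",
            "messages": [{"role": "user",
                          "content": {"type": "text", "text": text}}]}
```

Prompt arguments travel as strings, which is why `prNumber` is not converted to an integer.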
Client primitives: what the client offers
The other half of the protocol is what the client offers the server. This is the part most introductions to MCP skip; it is also what separates MCP from a one-way "agent calls tools" model. Servers can ask the client for things back.
Sampling. The server can ask the host to run an LLM call on its behalf. The classic example: a GitHub MCP server tasked with "review this PR" can sample the host's LLM with the diff and a review prompt, get back a structured review, and return it as the tool's result. The server author does not have to ship their own model API key, and the user has a single consent surface ("this server wants to make an LLM call on your behalf, do you allow it?"). The host retains control of which model, which budget, and which content actually goes to the model.
// Server requests sampling from the host
{ "jsonrpc": "2.0", "id": 11,
"method": "sampling/createMessage",
"params": {
"messages": [{
"role": "user",
"content": { "type": "text",
"text": "Review this PR and flag risks: ..." }
}],
"systemPrompt": "You are a senior reviewer.",
"maxTokens": 2000,
"modelPreferences": { "intelligencePriority": 0.8,
"costPriority": 0.2 }
} }
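On the host side, the consent and budget controls can be sketched as a handler that asks the user first and clamps `maxTokens` before calling the model. `approve` and `call_model` are hypothetical callbacks that stand in for the host's UI and model client. The 1,000-token cap is an arbitrary example. The result fields (`role`, `content`, `model`, `stopReason`) follow the shape of a `sampling/createMessage` result.

```python
from typing import Callable

MAX_TOKENS_CAP = 1000  # hypothetical host-wide cap per sampling request


def handle_sampling(request: dict,
                    approve: Callable[[dict], bool],
                    call_model: Callable[[list, str, int], str]) -> dict:
    """Gate sampling/createMessage on user approval and a host-side token cap."""
    params = request["params"]
    # The user sees the request (and can reject it) before any model call happens.
    if not approve(params):
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -1, "message": "User rejected sampling request"}}
    # The server's maxTokens is a request; the host keeps the final say on budget.
    max_tokens = min(params["maxTokens"], MAX_TOKENS_CAP)
    text = call_model(params["messages"], params.get("systemPrompt", ""), max_tokens)
    return {"jsonrpc": "2.0", "id": request["id"],
            "result": {"role": "assistant",
                       "content": {"type": "text", "text": text},
                       "model": "host-selected-model",  # placeholder model name
                       "stopReason": "endTurn"}}
```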
Roots. The server can ask the host for the URI or filesystem boundaries it is allowed to operate within. A filesystem MCP server invoked by a host should not be allowed to read arbitrary files; the host responds to a roots/list request with the project directories the user has consented to expose. The server then restricts its operations to those roots.
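Enforcing roots is the server's job. The sketch below assumes the roots arrive as `file://` URIs in a `roots/list` result. It resolves every requested path, which collapses `..` and follows symlinks, and refuses anything that does not sit under one of the roots. The function names are hypothetical, and `Path.is_relative_to` needs Python 3.9 or later.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def roots_to_paths(roots_result: dict) -> list[Path]:
    """Turn a roots/list result into resolved local directories (file:// only)."""
    paths = []
    for root in roots_result.get("roots", []):
        parsed = urlparse(root["uri"])
        if parsed.scheme == "file":
            paths.append(Path(unquote(parsed.path)).resolve())
    return paths


def is_within_roots(candidate: str, roots: list[Path]) -> bool:
    """True only if the resolved candidate path lies inside an allowed root."""
    target = Path(candidate).resolve()
    return any(target.is_relative_to(root) for root in roots)
```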
Elicitation. The server can ask the host to ask the user for additional information. The classic example: a payment-collecting server needs an email for the receipt, the LLM is mid-conversation and does not have it. Instead of inventing one or failing, the server emits an elicitation request, the host shows the user a small form, the user fills it in, the value is returned to the server. This is the protocol-level mechanism that prevents the agent from hallucinating user data.
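The payment example can be sketched from the server's side. The request carries a message and a flat JSON Schema that describes the fields the host should collect. The user's reply comes back with an `action` of `accept`, `decline` or `cancel`. The helper names and the receipt wording are hypothetical.

```python
from typing import Optional


def build_email_elicitation(request_id: int) -> dict:
    """Ask the host to collect a receipt email address from the user."""
    return {"jsonrpc": "2.0", "id": request_id,
            "method": "elicitation/create",
            "params": {
                "message": "Which email address should the receipt go to?",
                "requestedSchema": {
                    "type": "object",
                    "properties": {"email": {"type": "string", "format": "email"}},
                    "required": ["email"]}}}


def receipt_email_from(response: dict) -> Optional[str]:
    """Use the email only if the user explicitly accepted; never invent one."""
    result = response.get("result", {})
    if result.get("action") != "accept":
        return None  # 'decline' or 'cancel': continue without an email
    email = result.get("content", {}).get("email", "")
    return email if "@" in email else None
```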
The three client primitives together close a loop the early agent ecosystem fumbled badly. The agent calls a tool; the tool needs more from the model, or wants the user's input, or wants to know what files it can touch. Without protocol primitives, every server invented its own out-of-band channel — environment variables, side servers, "please call me later" patterns. With sampling, roots and elicitation, all three flows happen over the same connection with the same consent gates.
The lifecycle and the base protocol
MCP rides on JSON-RPC 2.0 over a duplex transport. The client and server perform a three-step handshake on connection, exchange capabilities, and then operate as a long-lived stateful session (in the 2025-11-25 spec; we will get to the 2026-07-28 stateless redesign).
Handshake:
Client                           Server
──────                           ──────
  │                                │
  ├── initialize (capabilities) ──►│
  │◄── initialize response ────────┤
  ├── notifications/initialized ──►│
  │                                │
  │       (normal protocol)        │
  ├── tools/list ─────────────────►│
  │◄── tool list ──────────────────┤
  ├── tools/call ─────────────────►│
  │◄── tool result ────────────────┤
  │                                │
During initialize, both sides declare which capabilities they support. The client says "I implement sampling, roots and elicitation"; the server says "I expose tools, resources and prompts, with subscriptions enabled on resources." The negotiation lets old clients connect to new servers gracefully — anything one side does not understand is simply not used.
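Concretely, the three handshake messages look roughly like the following, written as Python dictionaries. The client and server names are placeholders. `usable_server_features` is a hypothetical helper that shows the negotiation rule: the client only uses what the server declared.

```python
PROTOCOL_VERSION = "2025-11-25"

# 1. Client -> server: the client declares what it can do for the server.
initialize_request = {
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {"roots": {"listChanged": True},
                         "sampling": {}, "elicitation": {}},
        "clientInfo": {"name": "example-host", "version": "0.1.0"}}}

# 2. Server -> client: the server declares what it offers.
initialize_response = {
    "jsonrpc": "2.0", "id": 1,
    "result": {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {"tools": {"listChanged": True},
                         "resources": {"subscribe": True, "listChanged": True},
                         "prompts": {}},
        "serverInfo": {"name": "example-server", "version": "0.1.0"}}}

# 3. Client -> server: a notification (no id) that completes the handshake.
initialized_notification = {"jsonrpc": "2.0", "method": "notifications/initialized"}


def usable_server_features(response: dict) -> set[str]:
    """Only features the server declared may be used; anything else is skipped."""
    caps = response["result"]["capabilities"]
    features = set(caps)
    if caps.get("resources", {}).get("subscribe"):
        features.add("resources/subscribe")
    return features
```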
The same JSON-RPC connection then carries every primitive call (tools/list, tools/call, resources/read, resources/subscribe, prompts/list, prompts/get, sampling/createMessage, roots/list, elicitation/create) plus the utility messages (ping, notifications/cancelled, notifications/progress, logging/setLevel) until one side closes.
Transports: stdio and Streamable HTTP
The protocol is transport-agnostic, but two transports dominate in practice.
stdio. The host launches the server as a child process and the two exchange JSON-RPC messages over stdin/stdout. This is the local-only case: a developer's machine running an MCP server that exposes the local filesystem, the local git repository, or a Postgres on localhost. Latency is microseconds, deployment is trivial, security is whatever the host process trust boundary already is.
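Over stdio, each JSON-RPC message is one line of JSON, and stdout is reserved for protocol traffic; logs go to stderr. The sketch below shows the core dispatch of a tiny stdio server with one hypothetical `echo` tool. It skips the `initialize` exchange for brevity, although a real server must answer it before anything else. In practice you would reach for an official SDK instead.

```python
import json
import sys
from typing import Optional

TOOLS = [{"name": "echo",
          "description": "Echo the given text back.",
          "inputSchema": {"type": "object",
                          "properties": {"text": {"type": "string"}},
                          "required": ["text"]}}]


def _result(msg_id, result: dict) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": msg_id, "result": result})


def _error(msg_id, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": msg_id,
                       "error": {"code": code, "message": message}})


def handle_line(line: str) -> Optional[str]:
    """Handle one JSON-RPC message; return the reply line, or None if none is due."""
    msg = json.loads(line)
    # Notifications (no id) and responses to our own requests (no method) get no reply.
    if "id" not in msg or "method" not in msg:
        return None
    method, params = msg["method"], msg.get("params", {})
    if method == "ping":
        return _result(msg["id"], {})
    if method == "tools/list":
        return _result(msg["id"], {"tools": TOOLS})
    if method == "tools/call":
        if params.get("name") != "echo":
            return _error(msg["id"], -32602, "Unknown tool")
        text = params.get("arguments", {}).get("text", "")
        return _result(msg["id"], {"content": [{"type": "text", "text": text}],
                                   "isError": False})
    return _error(msg["id"], -32601, "Method not found")


def serve() -> None:
    """Read newline-delimited JSON-RPC from stdin and write replies to stdout."""
    for line in sys.stdin:
        if line.strip():
            reply = handle_line(line)
            if reply is not None:
                sys.stdout.write(reply + "\n")
                sys.stdout.flush()
```

A host would launch the file as a child process, with `serve()` called at the bottom, and write one request per line to its stdin.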
Streamable HTTP. The server runs as a remote service behind a single HTTP endpoint. In the current spec the client POSTs each JSON-RPC message to that endpoint. The server answers either with a plain JSON response or by opening a Server-Sent Events stream for server-to-client messages, and it tracks the session with an Mcp-Session-Id header. The 2026-07-28 release candidate reworks this into a stateless model (every request carries its own routing context, with no sticky sessions and no Mcp-Session-Id header), which lets MCP servers run horizontally behind a load balancer the way every other HTTP service does.
The 2026 roadmap is explicit: no new official transports are being added this cycle. The two that exist cover the two real cases (local process, remote service); adding a third would fragment the implementation surface for marginal gain.
OAuth 2.1, scopes, and the authorization stack
For remote MCP servers, authorization is the load-bearing problem. The spec adopts OAuth 2.1 with PKCE, separating the roles of resource server (the MCP server itself) and authorization server (which may be the same or a different service). The flow is the standard OAuth one: the host redirects the user to the authorization server, the user consents, the host receives an access token, the host attaches it to every subsequent MCP request.
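The PKCE part of that flow is small enough to show in full. The sketch derives an S256 code challenge per RFC 7636. It then builds an authorization URL that also carries the RFC 8707 `resource` parameter, which names the MCP server the token is for. All endpoints, the client ID and the scope string are placeholders. A real client discovers the endpoints from the server's protected-resource metadata and the authorization server's metadata.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def s256_challenge(verifier: str) -> str:
    """BASE64URL(SHA256(verifier)) without padding, per RFC 7636."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def new_verifier() -> str:
    return secrets.token_urlsafe(64)  # 86 URL-safe chars, inside the 43-128 limit


def authorization_url(auth_endpoint: str, client_id: str, redirect_uri: str,
                      resource: str, scope: str, verifier: str, state: str) -> str:
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "code_challenge": s256_challenge(verifier),
        "code_challenge_method": "S256",
        "resource": resource,  # RFC 8707 resource indicator
    })
    return f"{auth_endpoint}?{query}"
```

The host keeps `verifier` private and sends it later with the token request. The authorization server checks it against the challenge it saw earlier.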
What MCP adds on top of OAuth is scope discipline. A server can require different OAuth scopes for different tools. In an illustrative GitHub MCP server, create_pr might require repo:write while list_issues requires only repo:read. The host can then request only the scopes the workflow at hand actually needs, which is what enables least-privilege access: an agent that only reads issues never gets the credentials to open a PR.
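On the host side, least privilege then reduces to set arithmetic. The tool-to-scope map below is hypothetical. It mirrors the illustrative GitHub example and does not use GitHub's real scope names. The host requests the union of scopes for the tools a workflow uses and refuses any call whose scopes were not granted.

```python
# Hypothetical tool-to-scope map, e.g. derived from reading the server's source.
TOOL_SCOPES = {
    "list_issues": {"repo:read"},
    "get_file": {"repo:read"},
    "create_pr": {"repo:read", "repo:write"},
}


def scopes_for_workflow(tools: list[str]) -> set[str]:
    """Smallest scope set that covers every tool the workflow needs."""
    needed: set[str] = set()
    for tool in tools:
        needed |= TOOL_SCOPES[tool]
    return needed


def may_call(tool: str, granted: set[str]) -> bool:
    """A call is allowed only if every scope it needs was granted."""
    return TOOL_SCOPES[tool] <= granted
```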
The 2026-07-28 release candidate adds six OAuth hardening enhancements. Clients must validate the iss parameter per RFC 9207 (mitigating mix-up attacks). Clients declare OpenID Connect application_type at registration time. Credentials bind to issuer URLs with re-registration required on migration. Refresh-token flows and step-up authentication scope accumulation are clarified. None of these are exotic; they are the kind of OAuth hygiene that any operator deploying remote MCP servers should adopt.
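The first of those checks takes only a few lines in the redirect handler. The sketch below assumes the client remembers the `state` it sent and the issuer it sent the user to. RFC 9207 adds an `iss` parameter to the authorization response. A mismatch means the response may come from a different authorization server than the one the client started with, which is the mix-up attack. The function name and URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def check_callback(callback_url: str, expected_state: str, expected_issuer: str) -> str:
    """Return the authorization code only if both state and iss match expectations."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch")
    # RFC 9207: iss must equal the issuer the client sent the user to.
    if params.get("iss", [None])[0] != expected_issuer:
        raise ValueError("issuer mismatch: possible mix-up attack")
    return params["code"][0]
```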
The 2026-07-28 release candidate: what changes
The release candidate locked on 21 May 2026 with the final spec landing on 28 July 2026. Five of its changes are big enough to call out by name.
Stateless protocol core. The biggest architectural shift in MCP's history. The initialize handshake disappears from the wire; Mcp-Session-Id is gone. Each request carries an MCP-Protocol-Version header and the routing metadata it needs (Mcp-Method, Mcp-Name) so any server instance can handle any request. Applications that need state — long-running tasks, subscriptions, conversational context — adopt the "explicit-handle pattern": the server mints an identifier on the first call and the model threads it through subsequent calls as an ordinary argument. The win is that MCP servers can now scale horizontally the way any other HTTP service does. The cost is that every server author has to rethink anything that previously relied on session affinity.
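The explicit-handle pattern is easiest to see in code. In this sketch, `SharedStore` is an in-memory stand-in for an external store such as Redis, and `start_search` and `next_page` are hypothetical tools. The first call mints a handle and persists state outside the process. Later calls take the handle as an ordinary argument, so it does not matter which server instance receives them.

```python
import uuid


class SharedStore:
    """In-memory stand-in for an external store (e.g. Redis) all instances share."""

    def __init__(self) -> None:
        self.data: dict[str, dict] = {}


def start_search(store: SharedStore, query: str) -> dict:
    """First call: mint a handle and keep the state outside the server process."""
    handle = f"search_{uuid.uuid4().hex}"
    store.data[handle] = {"query": query, "page": 0}
    return {"handle": handle, "page": 0}


def next_page(store: SharedStore, handle: str) -> dict:
    """Later calls: the model passes the handle back as an ordinary argument."""
    state = store.data.get(handle)
    if state is None:
        raise KeyError(f"unknown or expired handle: {handle}")
    state["page"] += 1
    return {"handle": handle, "page": state["page"]}
```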
// Before — 2025-11-25, stateful
POST /mcp
Mcp-Session-Id: 1868a90c-3a3f-4f5b
MCP-Protocol-Version: 2025-11-25
Content-Type: application/json
{"jsonrpc":"2.0","id":2,"method":"tools/call",
"params":{"name":"search","arguments":{"q":"otters"}}}
// After — 2026-07-28, stateless
POST /mcp
MCP-Protocol-Version: 2026-07-28
Mcp-Method: tools/call
Mcp-Name: search
Content-Type: application/json
{"jsonrpc":"2.0","id":1,"method":"tools/call",
"params":{"name":"search","arguments":{"q":"otters"}}}
MCP Apps. Servers can now ship interactive HTML interfaces that the host renders in a sandboxed iframe. A tool declares its UI template up-front so the host can pre-fetch and security-review it. The rendered UI talks back to the host over the same JSON-RPC connection. The use case is the entire class of agent flows that need richer-than-text interaction: showing a map for a route, rendering a cart for review, displaying a chart for a financial query. Before MCP Apps, the server returned plain text or markdown and the host had to guess how to render it; with MCP Apps, the server controls the UI but the host controls the sandbox.
Tasks as a formal extension. Long-running tool calls (anything beyond a few seconds) now use the Tasks extension. The flow: tools/call returns a task handle instead of an immediate result; the client polls with tasks/get or subscribes via tasks/update; the client can tasks/cancel at any time. The semantics are deliberately aligned with A2A's task model (same eight states, same interruptible transitions), which is the foundation for the MCP-A2A composition we use heavily in Agent Builder.
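From the client's side, the Tasks flow reduces to a polling loop with a deadline. This sketch rests on several assumptions. `send` is a hypothetical function that issues one JSON-RPC request and returns its result. The `taskId` and `status` field names and the state strings are illustrative, not taken from the extension's schema.

```python
import time
from typing import Callable

# Assumed state names: the terminal ones, plus input-required, which needs the caller.
STOP_STATES = {"completed", "failed", "canceled", "input-required"}


def wait_for_task(send: Callable[[str, dict], dict], task_id: str,
                  timeout_s: float = 60.0, interval_s: float = 1.0) -> dict:
    """Poll tasks/get until the task stops; cancel it if the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        task = send("tasks/get", {"taskId": task_id})
        if task["status"] in STOP_STATES:
            return task
        time.sleep(interval_s)
    send("tasks/cancel", {"taskId": task_id})
    raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")
```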
Routing headers. Mcp-Method and Mcp-Name let load balancers route MCP traffic without inspecting JSON payloads. This is operationally enormous for anyone running an MCP server at scale — your nginx config can route tools/call name=search to a different fleet than resources/read.
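The routing decision a load balancer makes can be sketched in a few lines. The pool names and the routing table are hypothetical. In production this logic would live in the proxy's configuration rather than in Python. The point is that only headers are read, never the JSON body.

```python
# Hypothetical routing table: (Mcp-Method, Mcp-Name) -> backend pool.
POOLS = {
    ("tools/call", "search"): "search-fleet",
    ("resources/read", None): "resource-fleet",  # None matches any name
}
DEFAULT_POOL = "general-fleet"


def route(headers: dict[str, str]) -> str:
    """Choose a backend pool from the routing headers alone; the body is never parsed."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    method, name = h.get("mcp-method"), h.get("mcp-name")
    return POOLS.get((method, name)) or POOLS.get((method, None)) or DEFAULT_POOL
```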
Formal deprecation policy. Features now follow Active → Deprecated → Removed with a minimum 12-month window between stages. Roots, Sampling, and Logging are entering deprecation in 2026-07-28, replaced by tool-parameter conventions, direct LLM API integration, and stderr/OpenTelemetry respectively. The deprecation policy is the most important piece of governance discipline MCP has added; it tells operators they will have a year of notice before any feature their server depends on disappears.
The security model the spec ships, and what it leaves to you
MCP's security model is structured around four key principles spelled out in the spec.
User consent and control. Every data access, every tool invocation, every sampling request must be auditable and approvable by the user. The spec does not enforce this at the protocol level (that is the host's job), but it directs implementors to build consent and authorization flows that do.
Data privacy. Hosts must obtain explicit consent before exposing user data to servers. Hosts must not relay resource data elsewhere without consent. This is the explicit answer to the obvious question "if my agent can read my filesystem, what stops the agent's vendor from shipping its contents to their analytics pipeline." The protocol says: nothing technically stops them, but the consent model requires them to ask.
Tool safety. Tools are arbitrary code execution. The protocol explicitly states that tool descriptions and annotations are untrusted unless the server is trusted. A malicious server can lie about what its tool does in the description; the host has to assume the worst until it has reasons not to. The implication for operators is that you trust the server, not the description.
Sampling controls. The user must approve any sampling request. The user controls whether sampling occurs, the actual prompt that gets sent, and what the server gets to see of the result. The protocol intentionally limits server visibility into prompts to prevent a malicious server from harvesting them.
What the spec leaves to operators is everything below those principles. The spec tells you to require user consent; it does not tell you how to implement a good consent UX. The spec tells you to scope OAuth tokens; it does not stop your operator from granting repo:admin to a server that only needed repo:read. The spec tells you to treat tool descriptions as untrusted; it does not stop you from connecting to a server you found on GitHub yesterday without reading what it actually does.
The operator concerns the spec does not solve
An agent operator running MCP in production has to solve a small set of problems the spec deliberately leaves open. We have run into all of these on our own infrastructure and they are the unwritten knowledge that separates an operator whose fleet is durable from one whose fleet is a liability.
Secret management. An MCP server that talks to GitHub needs a GitHub token. An MCP server that talks to Stripe needs an API key. The temptation is to drop the secret into an environment variable and start the server. The right answer is a secret manager (Doppler, 1Password, AWS Secrets Manager, Vault) that the server reads at runtime, that rotates the credential periodically, and that records every read. Long-lived plaintext credentials on disk are the most common cause of MCP-related security incidents in 2026.
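The following is a minimal sketch of the runtime-read pattern. It assumes `fetch` wraps whatever client your secret manager provides; here it is a hypothetical callback, not a real SDK call. The credential is fetched when first used and cached for a short TTL, so rotations are picked up without a restart. Every fetch is recorded, although the secret manager's own audit log remains the authoritative record.

```python
import time
from typing import Callable, Optional


class RuntimeSecret:
    """Read a credential from a secret manager at use time, never from disk."""

    def __init__(self, name: str, fetch: Callable[[str], str], ttl_s: float = 300.0):
        self.name = name
        self._fetch = fetch  # hypothetical wrapper around your secret manager client
        self._ttl_s = ttl_s
        self._value: Optional[str] = None
        self._expires_at = 0.0
        self.fetch_times: list[float] = []  # wall-clock time of each fetch

    def get(self) -> str:
        now = time.monotonic()
        if self._value is None or now >= self._expires_at:
            # Re-fetch after the TTL so a rotated credential replaces the old one.
            self._value = self._fetch(self.name)
            self._expires_at = now + self._ttl_s
            self.fetch_times.append(time.time())
        return self._value
```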
Scope minimisation. Every OAuth scope you grant is a potential blast radius. Servers asking for broad scopes ("admin," "all-repo," "read-and-write") deserve scrutiny. The discipline operators we work with adopt: grant the smallest scope that actually works, observe the server's behaviour for two weeks, only widen the scope if the operator hits a genuine limit. Reading the server's source code (most are open source) to verify the listed required scope is the load-bearing step.
Tool-call observability. Every MCP call your agent makes should be logged. The arguments, the result, the latency, the cost. An agent operator who cannot answer "what tools did my agent call on Tuesday between 14:00 and 15:00" is operating blind. The 2026-07-28 spec standardises W3C Trace Context propagation, which means modern observability stacks (Datadog, Honeycomb, OpenTelemetry collectors) can correlate MCP calls into the same traces as everything else your agent does. Use it.
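The following is a minimal sketch of per-call logging with trace propagation. The `traceparent` format is W3C Trace Context: a version, a 32-hex-digit trace ID, a 16-hex-digit parent ID and flags. How the value travels inside an MCP request is left as an assumption here, so the hypothetical `call` callback simply receives it. `CALL_LOG` stands in for your log pipeline.

```python
import secrets
import time
from typing import Any, Callable, Optional

CALL_LOG: list[dict] = []  # stand-in for a real log/trace exporter


def child_traceparent(parent: Optional[str] = None) -> str:
    """Keep the caller's trace ID (or start a new trace) and mint a new span ID."""
    if parent:
        version, trace_id, _parent_id, flags = parent.split("-")
    else:
        version, trace_id, flags = "00", secrets.token_hex(16), "01"
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def logged_call(call: Callable[[str, dict, str], Any], tool: str, args: dict,
                parent: Optional[str] = None) -> Any:
    """Invoke a tool and record tool, arguments, outcome, latency and trace context."""
    traceparent = child_traceparent(parent)
    record = {"tool": tool, "args": args, "traceparent": traceparent}
    start = time.perf_counter()
    try:
        result = call(tool, args, traceparent)
        record["status"] = "ok"
        return result
    except Exception as exc:
        record["status"] = f"error: {type(exc).__name__}"
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 3)
        CALL_LOG.append(record)
```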
Server identity and provenance. An MCP server is a piece of software running on a machine. Knowing who built it, what version it is, and that the binary you are running matches the source you reviewed is the basic supply-chain discipline that the protocol does not enforce. Sign your server containers. Pin the version in your host configuration. Watch the upstream repository for unannounced changes. The most insidious MCP attack vector is a server that was benign when you reviewed it and that later receives a malicious update from the maintainer (or from whoever compromised the maintainer's account).
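Pinning can be enforced mechanically before the host launches a server. The sketch below assumes you record the SHA-256 of the artifact you reviewed, such as a binary, wheel or tarball. For container images, the equivalent is pulling by `@sha256:` digest rather than by tag. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large artifacts need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_pinned(path: Path, pinned_sha256: str) -> None:
    """Refuse to launch a server artifact whose digest differs from the reviewed one."""
    actual = sha256_of(path)
    if actual != pinned_sha256.lower():
        raise RuntimeError(f"{path}: digest {actual} does not match pin {pinned_sha256}")
```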
Connection-time consent vs runtime consent. A naive host asks the user "do you allow this server?" at connection time and then runs everything without further confirmation. A well-designed host asks once for read-only operations and asks every time for operations annotated as destructive or potentially expensive. Operators picking a host should care about this distinction; users picking an operator should care about which host they are operating under.
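The distinction is easy to encode. In this sketch, `ask_user` is a hypothetical callback that shows the consent prompt and returns the user's answer. Approvals for read-only tools are remembered per server and tool, while everything else prompts on every call. The sketch assumes the server is trusted, since annotations from an untrusted server are only claims.

```python
from typing import Callable


class ConsentGate:
    """Remember approvals for read-only tools; ask every time for everything else."""

    def __init__(self, ask_user: Callable[[str, str, dict], bool]):
        self._ask_user = ask_user  # hypothetical UI callback: (server, tool, args) -> bool
        self._remembered: set[tuple[str, str]] = set()

    def allow(self, server: str, tool: dict, args: dict) -> bool:
        read_only = tool.get("annotations", {}).get("readOnlyHint", False)
        key = (server, tool["name"])
        if read_only and key in self._remembered:
            return True
        approved = self._ask_user(server, tool["name"], args)
        if approved and read_only:
            self._remembered.add(key)
        return approved
```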
Sandbox and resource limits. An MCP server that can execute arbitrary code (a Python server, a shell server, an arbitrary-tool server) should run in a sandbox with CPU, memory, and network limits. Firecracker microVMs (which we covered in our microVM post) are the right primitive. A misbehaving server that burns 100% CPU for an hour is a misbehaving server; a misbehaving server that escapes the sandbox is a compromised host. The difference is the limits you set up before the bad day.
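Where a microVM is not yet available, POSIX resource limits are a cheap first layer for a locally launched stdio server. This sketch is for Linux and other POSIX systems only. It caps CPU time and address space in the child process before it execs. It provides no filesystem or network isolation, so it complements a sandbox rather than replacing one. The limit values are arbitrary examples.

```python
import resource
import subprocess

CPU_SECONDS = 10
ADDRESS_SPACE_BYTES = 2 << 30  # 2 GiB


def _apply_limits() -> None:
    # Runs in the child process after fork and before exec (POSIX only).
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
    resource.setrlimit(resource.RLIMIT_AS, (ADDRESS_SPACE_BYTES, ADDRESS_SPACE_BYTES))


def launch_limited(argv: list[str]) -> subprocess.Popen:
    """Start an MCP server over stdio with CPU-time and memory caps applied."""
    return subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            preexec_fn=_apply_limits)
```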
Composition with A2A and the rest of the stack
MCP is the vertical layer of the agentic stack — agent to tools. The horizontal layer, agent to agent, is A2A. The two compose in a clean pattern: an agent uses MCP to interact with its own tools and consumes A2A to delegate work to other agents.
Composition pattern in production:
┌──── Your Agent ────┐
│                    │
│ ▲ A2A ◄────────────┼── another agent
│ │                  │
│ │ MCP (tools)      │
│ ▼                  │
│ ┌──┐ ┌──┐ ┌──┐     │
│ │GH│ │DB│ │FS│     │ ← MCP servers
│ └──┘ └──┘ └──┘     │
└────────────────────┘
From inside the agent's perspective, MCP is how it reaches its own tools and resources; A2A is how it reaches other agents. PayPal's production deployment is the canonical example: a sales agent at the buyer end speaks A2A to PayPal's hosted invoice agent, which internally speaks MCP to the underlying PayPal billing systems. The buyer's agent never touches PayPal's MCP server — that boundary is owned by PayPal — but it can still delegate work into PayPal's territory because both sides speak A2A.
The same composition with AP2 layers payments on top: an MCP tool call that costs money emits a Cart Mandate; the user (or the user's delegated authorization) signs it; the Payment Mandate flows through x402 or a card rail; the result of the tool call lands. The composition reads bottom-up: x402/AP2 for payment, MCP for the actual work, A2A for the cross-agent coordination, ERC-8004 for the identity that lets all of them trust each other.
How Agent Builder wires MCP for you
The honest acknowledgement: most operators do not want to implement MCP. They want their agent to have GitHub access without writing a server, Postgres access without configuring OAuth, Notion access without managing tokens. Agent Builder is what makes that possible.
Three concrete things Agent Builder does on top of MCP.
The catalog of pre-connected servers. Pick from the catalog (GitHub, Notion, Postgres, Stripe, Slack, Linear, Google Drive, S3, web search, web browse, filesystem, shell, python) and a server is provisioned for your agent. The OAuth flow is initiated when you connect; the tokens are stored in the secret store; the scopes are minimised to what the workflow you described actually needs. You did not write any MCP code; you used the protocol.
Per-agent scope and rate caps. Each agent in your fleet gets its own MCP client with its own scope grants and its own daily-call budget. An agent that should only read your inbox cannot call send tools, even if the underlying server exposes them. An agent that misbehaves and starts looping on a tool hits the rate cap and stops before it drains anything. The cap is configurable per tool; reasonable defaults ship in the catalog entries.
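The following is a generic sketch of a per-agent, per-tool daily cap; it is not Agent Builder's actual implementation. Counters are keyed by agent, tool and calendar day. Tools without an explicit cap fall back to a default, and a call is refused once its counter reaches the cap.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailyToolBudget:
    """Per-agent, per-tool daily call caps: a looping agent stops at the cap."""

    def __init__(self, caps: dict[str, int], default_cap: int = 100):
        self.caps = caps
        self.default_cap = default_cap
        self._counts: defaultdict = defaultdict(int)  # (agent, tool, day) -> calls

    def try_consume(self, agent: str, tool: str, today: Optional[date] = None) -> bool:
        """Count the call and return True, or return False once the cap is reached."""
        key = (agent, tool, today or date.today())
        if self._counts[key] >= self.caps.get(tool, self.default_cap):
            return False
        self._counts[key] += 1
        return True
```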
Full observability surface. Every MCP call your agent makes shows up in the operator dashboard: which tool, which arguments, which result, how long it took, how much it cost. Failures are categorised. Cost over time is graphed. The 2026-07-28 W3C Trace Context propagation is enabled by default, so if you ship traces to your own backend, MCP traffic correlates with everything else.
BYO server. Operators who want to expose an internal API to their agents — a private CRM, a custom database, a proprietary scoring service — point Agent Builder at a self-hosted MCP server and it joins the catalog like any other. The same scope discipline, rate caps and observability apply. You write the server once; every agent in your fleet can use it.
Practical advice for the operator reading this on Monday
The shortest version of "how to start with MCP" we can fit in a paragraph: install one MCP-capable host (Claude Desktop is free, Cursor is the developer-friendly one, Agent Builder is the operator-friendly one); connect three servers (filesystem for local work, GitHub or Linear for project state, a search server for web context); make ten tool calls; read the transcripts; notice what the model got right and what it got wrong. By the end of a working morning you have the intuition for the protocol, and the intuition is what every more advanced topic in this post is built on.
The longer version is a sequence we have watched dozens of operators run through productively.
Read three servers' source code before you trust them. Not the README; the source. Look at what scopes the server asks for, what the tools actually do, what the resource URIs cover. This is the cheapest way to learn the shape of a real MCP server and the most reliable way to develop a feel for which third-party servers are worth installing.
Write one tiny server. Pick something you already automate with a shell script or a small Python program — a daily report, a CSV converter, an internal API call. Wrap it as an MCP server. The exercise of declaring an input schema, deciding what is a tool vs a resource, and writing the consent-appropriate annotations is the fastest path to understanding why the protocol made the choices it did.
Hook up observability before your first production agent. Token spend, tool-call rate, exception rate, latency p95. The five-minute version is Agent Builder's built-in dashboard; the production version is shipping the traces to whatever observability you already operate. The point is that you should not run an MCP-using agent in production without being able to answer "what did it do."
Adopt the deprecation policy in your own server. If you ship a server other agents will use, follow the same Active → Deprecated → Removed discipline the spec adopts. Your downstream consumers depend on a stable surface; breaking changes without notice are how server ecosystems die.
Treat security as a budget item, not an afterthought. Secret managers cost money; sandboxes cost compute; observability costs storage. They all cost less than the incident that they would have prevented. Operators serious about this work allocate two hours a week to reviewing logs and one hour a month to revisiting scope grants. The compounding return is enormous.
Closing
MCP is the protocol you would design if you sat down on a Monday and asked "what is the shape of the contract every LLM application would need with every tool, if we did this once and only once." Anthropic asked that question in late 2024; by mid-2026, every major platform, every framework, every operator-facing tool stack has converged on the answer. The 2025-11-25 specification is the stable surface most operators are running today; the 2026-07-28 release candidate is the rewrite that makes the protocol production-grade for scale (stateless transport, MCP Apps, Tasks as a formal extension, six OAuth hardening proposals).
For an operator reading this with a fleet of agents to run, the practical takeaways are short. Pick a host that takes consent seriously. Connect the smallest set of servers your agents actually need. Read the source of the servers you do not author. Hook observability up before you scale. Treat secrets, scopes and sandboxes as durable infrastructure, not as one-time setup.
For an operator reading this who has not started yet, the practical takeaway is even shorter. Install one host, connect three servers, make ten tool calls, read the transcripts. By Wednesday you will know more about how agents actually work than 90% of the people writing about them.
The spec is at modelcontextprotocol.io. The reference implementations are at github.com/modelcontextprotocol. Agent Builder's MCP catalog is at llm4agents.com. The protocol is the one piece of the agentic stack that an operator can adopt today, in production, with no regret-prone decisions left to make.