What an agent fleet actually costs: real numbers for one, ten, and thirty agents
We have said "inference is cheap" at least eight times across this blog without ever putting numbers on the page. The tutorial post mentioned Mariana's net of $575/month at week eight but did not break it down. The layoff editorial argued the economics work without showing the spreadsheet. This is the spreadsheet. Real mid-2026 prices for every layer of an agent fleet, three concrete budgets at three different scales, and the four cost anti-patterns that compound invisibly until the bill arrives. If you are deciding whether to commit a quarter of your life to running agents, this is the post you read before the commit.
Pricing by layer, mid-2026
The agent fleet has roughly six cost categories. Before assembling any budget you need defensible numbers for each one. Below are the prices we are seeing across customers as of June 2026. Your mileage will vary by provider and volume tier; these are realistic central values, not promotional minimums.
Model inference, per million tokens.
Frontier / heavy reasoning tier:
Claude Opus 4.5 $15 in / $75 out
GPT-5.4 $12 in / $60 out
Gemini 3.1 Pro $10 in / $50 out
Mid tier:
Claude Sonnet 4.6 $3 in / $15 out
GPT-5.4 Mini $2 in / $10 out
Gemini 3.1 Flash $1.5 in / $7.5 out
Cheap / routine tier:
Claude Haiku 4.5 $0.80 in / $4 out
GPT-5.4 Nano $0.50 in / $2.5 out
Gemini 3.1 Nano $0.40 in / $2 out
A frontier model costs roughly 20x what the cheapest tier costs. The Project Deal finding matters here: in routine work, the cheap tier is good enough that the operator should be defaulting to it. In adversarial work (negotiation, contract review, fraud screening), the frontier model's premium pays for itself many times over. Routing well is worth two-to-five times in margin.
Infrastructure and tooling, monthly fixed.
Per-agent isolated execution:
Firecracker microVM, bundled ~$0 (included)
Self-hosted equivalent ~$5/mo per agent
Vector store (RAG corpus):
Pinecone serverless $25-100/mo for < 1M vectors
Weaviate Cloud $25-150/mo
Self-hosted Qdrant $10-30/mo (infra only)
Secrets management:
Doppler Team $15-50/mo
1Password Business $8 per user
Cloud KMS (AWS / GCP) $1-10/mo
Observability:
Honeycomb free tier $0 (up to 20M events/mo)
Datadog APM $15-31 per host/mo
Built-in dashboard $0 (included)
Tool marketplaces (third-party MCP servers). Most MCP servers are free open source. A growing fraction of premium servers charge: enriched LinkedIn data ($50-200/mo), specialised legal corpora ($100-500/mo), high-volume web scraping ($30-300/mo), market-data APIs ($50-2000/mo depending on the dataset). Budget around $30-200/mo per agent that consumes premium tools.
Settlement and on-chain.
x402 settlement fees per transaction:
Base mainnet $0.001 - 0.005
Solana $0.0001 - 0.001
Polygon $0.005 - 0.02
ERC-8004 attestations per write:
Identity / Reputation update $0.05 - 0.20 on L2
Validation receipt $0.10 - 0.30 on L2
AP2 via card rail (PSP fees):
Standard processing 2.9% + $0.30 per tx
High volume / enterprise ~2.4% + $0.10 per tx
For an agent doing 10,000 settlements a month on x402 / Base, settlement fees come out to $10-50 — negligible. For the same volume on a card rail with $20 average ticket, the PSP fees are $5,000-6,000 — material. The choice of settlement rail is one of the largest cost-structure decisions an operator makes.
Step economics — the routing math
Before assembling the budget tables, one more building block: how much does a single agent step actually cost? It depends on what kind of step it is, and the variance is enormous.
Typical cost per step, by category:
Step Tier Tokens (in/out) Cost per step
──────────────────────────────────────────────────────────────────────
RAG retrieval + summary Haiku 2000 / 300 $0.0028
Classification Haiku 500 / 50 $0.0006
Tool selection + planning Sonnet 3000 / 500 $0.0165
Multi-step reasoning Sonnet 5000 / 1500 $0.0375
Code generation Sonnet 4000 / 2000 $0.042
Final synthesis (report) Opus 8000 / 3000 $0.345
Negotiation step Opus 4000 / 800 $0.120
Contract review Opus 15000 / 2000 $0.375
The orchestration patterns from our orchestration post determine which steps run how often. A supervisor-worker pattern with five workers running classification (Haiku) and one supervisor running synthesis (Opus) costs ~$0.35 per request — almost all of the cost is in the supervisor. The cheap workers are essentially free. An ill-designed pattern that runs Opus on every worker turns the same request into $1.70 — five times more for the same output.
Budget 1: Solo operator, months 1-3
This is Mariana from the tutorial post, modeled at month three with eight paying customers at $80/month for an inbox-triage service. Conservative assumptions throughout.
Solo operator — month 3 — inbox triage @ $80/customer
REVENUE
8 customers x $80 $640.00
VARIABLE COSTS
Model inference (mostly Haiku, weekly synthesis $32.00
in Sonnet, ~80K tokens/customer/month)
RAG embedding refreshes (Haiku) $4.00
Vector store (Pinecone serverless, 8 customers $25.00
x ~80K vectors)
Per-agent microVM time (bundled) $0.00
x402 fees (no settlements in this niche) $0.00
Stripe (8 invoices x 2.9% + $0.30 x $80) $20.96
──────────────────────────────────────────────────────
Subtotal variable $81.96
FIXED COSTS
Agent Builder, free tier $0.00
Doppler personal $7.00
Honeycomb free tier $0.00
Cloudflare Workers (small auxiliary functions) $5.00
Domain + email $2.00
──────────────────────────────────────────────────────
Subtotal fixed $14.00
TOTAL COSTS $95.96
NET $544.04
MARGIN 85.0%
The margin at this scale is misleading because the operator's time is not in the spreadsheet. If Mariana spends 20 hours a week running the operation (customer onboarding, transcript reading, prompt tuning, sales), her hourly rate is around $6.80/hour at month three. Real but not livable yet — which is why the trajectory matters more than the snapshot. By month eight at 25 customers, the same operating cost structure produces $1,940 net at the same 20 hours/week, or $24/hour. By month twelve, the customer base typically reaches 50 with some premium tier conversions, and the hourly rate clears $45/hour. The economics are real; they take a quarter to compound to a livable wage.
Budget 2: Small operation, 10 agents and 60 customers
This is the operator who is past the solo phase: ten distinct agents (each serving a sub-niche or providing supervisory functions), sixty customers across the agents, ARPU of $150/month. The operator has hired one part-time customer-success person ($1,500/month). Tools have grown to support multi-tenant operation.
Small operation — month 9 — 10 agents, 60 customers @ $150 avg
REVENUE
60 customers x $150 $9,000.00
VARIABLE COSTS
Model inference (mix of tiers, ~150K tok/cust) $580.00
Premium MCP servers (LinkedIn enriched data, $260.00
web scraping for 2 of the 10 agents)
Vector store (Pinecone, multi-tenant) $180.00
Document parsing services (OCR for AP agent) $90.00
x402 settlements (one of the agents bills $35.00
per-call via x402, ~15K txs/mo)
ERC-8004 attestations (reputation updates, $20.00
~100 validation receipts/mo)
Stripe fees $315.00
──────────────────────────────────────────────────────
Subtotal variable $1,480.00
FIXED COSTS
Agent Builder, Pro tier $99.00
Doppler Team $32.00
Honeycomb startup tier $130.00
Cloud infra (Workers, R2, etc.) $45.00
Domain, email, SSO $50.00
Part-time CS person $1,500.00
──────────────────────────────────────────────────────
Subtotal fixed $1,856.00
TOTAL COSTS $3,336.00
NET $5,664.00
MARGIN 62.9%
Three things deserve attention in this budget. First, the variable cost as a percentage of revenue stays under 20%, which is the hallmark of healthy SaaS unit economics. Second, the fixed cost is dominated by the part-time hire — the actual platform fees ($99/mo Pro tier, $32/mo Doppler) are noise. Third, the model inference budget supports ten agents serving sixty customers on $580/month. The "inference is cheap" claim has receipts.
Breakeven for this shape of business is around ARPU x customer count = $3,300, which is reached at 22 customers at the same $150 ARPU. The operator who hires the part-time CS person before reaching 30 customers is over-extended; the operator who delays the hire past 50 is leaving growth on the table.
Budget 3: Multi-fleet, 30+ agents and a real business
This is the operator who has grown into multiple fleets — a customer-facing fleet, an internal-research fleet, a monitoring fleet — with 30+ agents total, 200+ customers across multiple service lines, ARPU $300/month, and a team of four (operator + technical lead + two CS / sales).
Multi-fleet — year 2 — 30 agents, 200 customers @ $300 avg
REVENUE
200 customers x $300 $60,000.00
VARIABLE COSTS
Model inference (heavy Opus for negotiation $4,800.00
agents, Sonnet for analysis, Haiku for
routine — ~$24/customer/month)
Premium MCP servers + APIs $1,800.00
Vector store (enterprise tier) $850.00
x402 + ERC-8004 (high volume) $420.00
Document services $300.00
Stripe enterprise (negotiated 2.4%) $1,560.00
──────────────────────────────────────────────────────
Subtotal variable $9,730.00
FIXED COSTS
Agent Builder, Enterprise tier $499.00
Doppler Enterprise $180.00
Honeycomb Enterprise $750.00
Cloud infra $400.00
SOC2 audit (amortised monthly) $700.00
Operator salary (founder draw) $8,000.00
Technical lead $11,000.00
CS lead $6,500.00
Sales rep $5,500.00
Office + benefits + misc $2,400.00
──────────────────────────────────────────────────────
Subtotal fixed $35,929.00
TOTAL COSTS $45,659.00
NET $14,341.00
MARGIN 23.9%
The margin compresses at this scale because the operator is reinvesting in people. The same fleet run with the founder alone would net 50%+, but the operator's time is the binding constraint at this scale and the hires are the only way through. Variable cost remains under 17% — the unit economics are still healthy. The interesting line is "operator salary" — at this scale, the operator is finally drawing a real compensation, which the previous two budgets did not show because it does not exist yet.
This shape of business is where most operators we work with plateau or break through. Past 200 customers, the business either becomes a stable lifestyle business at the $60-100K MRR range (the operator's choice; many prefer this) or it raises capital and tries to grow into a real SaaS valuation. Both are valid; the third option — staying solo past 50 customers — is the one that burns people out.
The four cost anti-patterns
Operators do not usually fail because the unit economics are bad. They fail because four invisible cost paths compound past the point of correction. Each one has shown up in real customer post-mortems on our side.
Context bloat. The agent's working context grows linearly with the conversation, and frontier model token costs grow linearly with context. An agent designed without sliding-window memory or explicit context pruning will see a 5x cost increase between week one and week eight for the same user activity. Fix: cap context per request, summarize older turns, evict tool-call histories aggressively. The discipline from the memory post applies here.
Model over-sizing. The operator picks Opus because it works on the demo and never re-evaluates whether Sonnet or Haiku would work as well for the routine 80% of requests. The bill is 5-15x what it needs to be. Fix: every step in the agent's flow should be tagged with its tier requirement, and routing should default to the cheapest tier that meets the bar. See the routing table in the Project Deal post.
Tool-call loops. An agent that retries on failure without backoff, or that calls the same tool repeatedly because the result was ambiguous, can burn through a month's budget in a single afternoon. We have seen $4,000 in a four-hour incident from one mis-prompted retry loop. Fix: hard caps on tool-calls per request (3-10 depending on workload), explicit error handling for each tool, and a kill-switch in the dashboard that any operator can hit when costs spike.
Underestimated RAG corpus growth. Vector storage cost scales with corpus size, and per-customer corpora silently grow as the customer adds documents, mailboxes, or data sources. The pattern from Mariana's tutorial — one customer connected a second mailbox and tripled the corpus — is normative, not exceptional. Fix: hard caps per customer, explicit pricing tiers for corpus growth beyond a baseline, monitoring the corpus-size graph as carefully as the token-spend graph.
The costs the platform pricing page does not mention
Three categories of cost that no platform vendor lists on their pricing page but that every operator pays.
Operator time. The single largest cost line for the first eighteen months is the operator's own time. At 25 hours/week and a notional $50/hour opportunity cost, the operator is "spending" $5,000/month before any platform fee is charged. The economics of the solo budget above only work if you accept this opportunity cost willingly — i.e., you believe in the trajectory.
Compliance overhead. Working in regulated industries (healthcare, finance, EU users) carries fixed costs the SaaS pricing pages skip: legal review of contracts ($500-3,000 one-time), AI Act technical file preparation ($200-1,000 one-time per agent, see the EU AI Act post), SOC2 audit at the multi-fleet stage ($8,000-25,000 first year), data processing agreements with each customer (your time). Budget for these from month one if your niche touches regulated data.
Customer acquisition. The first ten customers in the solo budget cost nothing because they come from the operator's first-degree network. Customers eleven through fifty start to cost. Cold outreach at scale costs $100-500 per customer acquired; paid ads in the relevant niches cost more. The healthiest operators we see treat content (this blog format) as the long-term acquisition strategy and budget zero for paid ads for the first year. The unhealthy ones spend their margin on Google Ads chasing growth that never compounds.
The honest answer to "is this worth it"
The arithmetic above suggests three honest conclusions for operators considering this path.
Month one through three is a near-zero-margin training phase by design. The operator is paying their dues in customer-facing time. If you cannot float $500-1,000 of personal expenses for a quarter while the business reaches breakeven, this path is not the right path right now — the bridge financing is what makes the compounding possible.
Month four through twelve is when the math starts to make sense for the operator who picked a real niche and stayed focused. The solo budget reaching $5K-10K MRR is achievable in this window. Hitting that range requires saying no to twenty distractions for every one yes — niche discipline is itself an economic discipline at this stage.
Year two is the bifurcation point. Operators who stayed focused reach the small-operation budget and choose between staying lifestyle (60-80% margin, livable income, controlled hours) or scaling into the multi-fleet model with hires and lower per-dollar margin but higher absolute return. Both are real businesses; neither is wrong.
How Agent Builder maps to these line items
Three concrete things the platform pricing does and does not absorb.
What is included in the bundled fee. microVM execution per agent (no per-VM fees as long as you stay within the soft caps the tier specifies), the multi-agent operator dashboard, the W3C Trace Context observability stack with retention up to the tier limit, the catalog of pre-connected MCP servers, the eval-suite and canary deployment tooling, ERC-8004 identity minting, and the AP2 mandate signing infrastructure.
What you pay for separately. Model tokens at the model provider's published rates (we do not mark up, we pass through). Premium MCP server fees go directly to their providers. Vector store fees if you use a managed service (you can self-host if you want to absorb that cost). Settlement rail fees (x402 chain fees, AP2 PSP fees, Stripe). On-chain attestation costs.
Where the platform absorbs cost on your behalf. ERC-8004 reads (used heavily by the routing layer) are batched and absorbed up to your tier limit. Failed inference calls due to provider errors are credited back. Dashboard observability storage stays within the tier even if you spike. The intent is that the platform fee is predictable and operators do not have to model platform charges per request — that line should sit at "Agent Builder $X/month" and that is it.
Closing
The economics of running agents in 2026 are no longer a debate about whether the unit math works. The unit math works at three different scales for three different shapes of business. The real question is the one the spreadsheet does not answer: are you willing to invest the time and discipline through the early-quarter trough where the margin exists on paper but the operator is still under-compensated, in exchange for the compound margin that emerges from month four onward.
If you have read this far, the next move is to pull up your own spreadsheet and run the budget against the niche you have actually chosen from the niches post. Plug in the realistic ARPU for your chosen vertical, the realistic token spend per request, the realistic settlement profile. The number you arrive at is the number that matters. Ours are the central values; yours are the ones you have to defend in front of yourself before you commit.
The next post in this series leaves the operator-facing economics behind and zooms out to the ecosystem: the competitive map of which framework, which platform, which builder fits where. After you know what to build, who to serve, how much it costs, and how to ship in five days, the last question is which tools to use alongside us. See you there.