Context Handoff Contracts: A Practical Pattern for Preventing Multi-Agent Context Explosion
The Hidden Scaling Failure
You have two agents.
A ResearchAgent searches customer records, pulls pricing, reads support history. After fifteen tool calls, three LLM responses, and a brainstorming session, it hands off to a ComplianceReviewAgent.
Here’s what the compliance agent inherits:
- 22KB of raw transcript
- Brainstorming notes (“What if we just skip the compliance check?”)
- Rejected assumptions (“Pricing is the only factor”)
- Twelve sessions of unrelated user history
- Forty-five stale tool outputs from thirty minutes ago
- An internal debate log about budget approval
The compliance agent spends 40% of its context window on junk. It reads the brainstorming note about skipping compliance and gets confused. Does that mean the team already decided to bypass policy? It finds a stale pricing quote and recommends an approval threshold that expired last quarter.
This is not a model problem. It’s a handoff problem.
Most multi-agent architectures fail not because the LLM is weak, but because each agent inherits too much irrelevant state from the one before it. The pattern looks clean on a slide. Box A passes to Box B. In practice, Box B gets everything Box A ever touched.
And that breaks things.
The Anti-Pattern: Full Transcript Inheritance
“Just pass the whole conversation history” is the default. It’s also the wrong default.
Here’s what happens when every agent gets everything.
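Token cost compounds with pipeline depth. If agent 1 generates 10K tokens of context and agent 2 generates another 8K, agent 3 sees 18K tokens before it does anything. By agent 5, you can easily be at 50K+ tokens of context, most of it irrelevant. You pay for all of it, at every hop. Each agent's inherited context grows linearly with its position in the chain, so the total input tokens billed across the whole pipeline grow roughly quadratically.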
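Here is the arithmetic as a small sketch. Agents 1 and 2 use the 10K and 8K figures above. The sizes for agents 3 to 5 are hypothetical, chosen so that agent 5 lands at 50K. The function returns what each agent inherits before writing anything, and summing those values gives the input-token bill for the run.

```typescript
// Per-agent output sizes in tokens. Agents 1 and 2 match the text;
// agents 3 to 5 are hypothetical.
const generated = [10_000, 8_000, 14_000, 18_000, 9_000];

// Under full-transcript inheritance, agent k reads everything that
// agents 1..k-1 produced before it does any work of its own.
function inheritedContext(perAgent: number[]): number[] {
  const inherited: number[] = [];
  let running = 0;
  for (const tokens of perAgent) {
    inherited.push(running); // what this agent has to read first
    running += tokens;       // what it passes on to the next agent
  }
  return inherited;
}

const inherited = inheritedContext(generated);
// Every agent pays for its own inherited context, so the bill is the sum.
const totalInputTokens = inherited.reduce((sum, t) => sum + t, 0);

console.log(inherited.join(", ")); // 0, 10000, 18000, 32000, 50000
console.log(totalInputTokens);     // 110000
```

Across five agents, 110K input tokens are billed even though no single agent ever sees more than 50K.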
Latency compounds. More tokens means longer input processing. Each agent takes longer to parse what it received. In a chain of six agents, that adds seconds per run.
Role confusion. A research agent thinks freely. It explores bad ideas, writes messy notes, contradicts itself. That’s fine for research. But when a compliance agent reads “What if we skip compliance?” it doesn’t know that note was rejected two rounds ago. It treats it as live context.
Privacy leakage. User history that was relevant to the first agent is not relevant to the fifth. But it’s in the transcript. The fifth agent can surface information it had no business seeing.
Action attribution problems. When something goes wrong, you can’t tell which agent acted on which piece of context. Was the bad decision based on a stale fact from agent 2 or a brainstorming hallucination from agent 3? Good luck tracing that.
Google’s ADK (Agent Development Kit) explicitly addresses this. It separates durable session state from working context. It supports scoped handoffs where the calling agent decides what the next agent sees. The guidance is clear: scope what sub-agents see, and build context through explicit processors rather than ad-hoc prompt concatenation.
Anthropic’s research on context agrees. Context is finite. It’s central to how agents behave. Every token of irrelevant context is a token you can’t use for reasoning.
The fix is not to make context bigger. The fix is to be precise about what crosses the boundary.
Introduce the Context Handoff Contract
A handoff contract is a typed payload. It replaces “everything from the last agent” with “only what the next agent needs.”
The contract is not a prompt. It’s not a system message. It’s a structured data object that both agents (and the system) agree on. It defines:
- What the receiving agent is supposed to do
- What facts it’s allowed to see
- What context was deliberately excluded
- Where each fact came from
- How confident the system is in each fact
- Which tools it may (and may not) call
- What shape the output should take
- When the contract expires
The receiving agent does not see raw history. It sees a contract. The contract is generated by a context compiler—not written by hand inside prompts.
This is not a new idea in software engineering. It’s the same principle as API contracts between services. You don’t give a downstream service your full internal state. You give it a typed request with the fields it needs. Agents should work the same way.
Contract Fields
Let me walk through each field. The strongest part of this pattern is the schema itself.
```json
{
  "handoffId": "risk-review-2026-001",
  "receivingAgent": "ComplianceReviewAgent",
  "task": "Evaluate whether the proposed outbound email violates internal policy.",
  "facts": [
    {
      "claim": "The recipient is an existing enterprise customer.",
      "source": {
        "source": "crm.lookup_customer",
        "timestamp": "2026-06-27T09:20:00Z",
        "method": "api_call"
      },
      "confidence": "high",
      "eventId": "evt-9a3b2c"
    },
    {
      "claim": "The customer signed the DPA in 2024.",
      "source": {
        "source": "contracts.search",
        "timestamp": "2026-06-27T09:21:00Z",
        "method": "tool_call:contracts.search"
      },
      "confidence": "high",
      "eventId": "evt-7d8e1f"
    }
  ],
  "excludedContext": [
    "type:brainstorm",
    "type:internal_note",
    "stale:crm.lookup_pricing"
  ],
  "allowedTools": ["policy.search", "crm.read_customer_status"],
  "forbiddenTools": ["email.send", "crm.update_customer"],
  "outputSchema": {
    "schemaName": "ComplianceDecisionV1",
    "version": "1.2"
  },
  "expiresAfterMinutes": 30,
  "createdAt": "2026-06-27T09:25:00Z",
  "traceId": "trace-risk-review-2026-001"
}
```
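JSON by itself enforces nothing. Below is one way to write the same shape as TypeScript types so the compiler catches a missing or misspelled field. This is a sketch, not the repository's Zod schema: the type names follow the repo description, the field names mirror the example contract, and making `outputSchema` optional is my assumption, based on the validator only checking it "if the contract references" one.

```typescript
// Sketch only: field names mirror the example contract above.
type Confidence = "high" | "medium" | "low" | "unverified";

interface SourceProvenance {
  source: string;    // originating system or tool, e.g. "crm.lookup_customer"
  timestamp: string; // ISO-8601 time the data was retrieved
  method: string;    // e.g. "api_call" or "tool_call:contracts.search"
}

interface Fact {
  claim: string;
  source: SourceProvenance;
  confidence: Confidence;
  eventId: string; // links back to the originating session event
}

interface HandoffContract {
  handoffId: string;
  receivingAgent: string;
  task: string;
  facts: Fact[];
  excludedContext: string[]; // audit labels such as "type:brainstorm"
  allowedTools: string[];
  forbiddenTools: string[];
  outputSchema?: { schemaName: string; version: string }; // optional by assumption
  expiresAfterMinutes: number;
  createdAt: string;
  traceId: string;
}

// The example contract, trimmed to one fact, type-checks against these types.
const example: HandoffContract = {
  handoffId: "risk-review-2026-001",
  receivingAgent: "ComplianceReviewAgent",
  task: "Evaluate whether the proposed outbound email violates internal policy.",
  facts: [
    {
      claim: "The recipient is an existing enterprise customer.",
      source: {
        source: "crm.lookup_customer",
        timestamp: "2026-06-27T09:20:00Z",
        method: "api_call",
      },
      confidence: "high",
      eventId: "evt-9a3b2c",
    },
  ],
  excludedContext: ["type:brainstorm", "type:internal_note", "stale:crm.lookup_pricing"],
  allowedTools: ["policy.search", "crm.read_customer_status"],
  forbiddenTools: ["email.send", "crm.update_customer"],
  outputSchema: { schemaName: "ComplianceDecisionV1", version: "1.2" },
  expiresAfterMinutes: 30,
  createdAt: "2026-06-27T09:25:00Z",
  traceId: "trace-risk-review-2026-001",
};
```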
handoffId. A unique identifier for this specific handoff. Used in logs, traces, and audit records. Every action the receiving agent takes should include this ID.
receivingAgent. The target agent name. The system uses this to route the contract and check that the agent’s capabilities match the requested task.
task. The single, precise task. Keep it under 500 characters. If you need more than that, the task is probably too broad. Split it.
facts. An array of factual claims. Each fact has four parts:
- claim: The actual information.
- source: Where it came from. System, timestamp, method. This is provenance. Without it, the receiving agent can’t judge reliability.
- confidence: high, medium, low, or unverified. A high-confidence fact came from a deterministic tool call that succeeded. Medium came from an LLM. Low or unverified came from user input or fallback logic.
- eventId: Links back to the originating session event for traceability.
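The confidence rules above fit in one small, pure function. `api_call` and `tool_call:<name>` appear in the example contract. The method names `llm` and `user_input` are hypothetical, and sending user input to `low` while fallback logic goes to `unverified` is my own split of the "low or unverified" bucket.

```typescript
type Confidence = "high" | "medium" | "low" | "unverified";

// Maps a provenance method to a confidence level.
// "llm" and "user_input" are hypothetical method names.
function confidenceFor(method: string, succeeded: boolean): Confidence {
  if (!succeeded) return "unverified"; // never promote a failed call
  if (method === "api_call" || method.startsWith("tool_call:")) return "high"; // deterministic tool
  if (method === "llm") return "medium";     // generated, not retrieved
  if (method === "user_input") return "low"; // asserted by a user, not checked
  return "unverified";                       // fallback logic or an unrecognized method
}

console.log(confidenceFor("tool_call:contracts.search", true)); // high
console.log(confidenceFor("llm", true));                        // medium
console.log(confidenceFor("user_input", true));                 // low
console.log(confidenceFor("fallback", true));                   // unverified
```

Falling through to `unverified` keeps the default conservative: a new or misspelled method can never quietly produce a high-confidence fact.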
excludedContext. A list of what was deliberately not passed. This is critical for audit. When someone asks “Why didn’t the compliance agent see the brainstorming notes?”, you have a record that they were explicitly excluded, not accidentally forgotten.
allowedTools and forbiddenTools. Explicit allow and deny lists. Never let the receiving agent discover tools through the transcript. Tell it what it can use and what it cannot. If you rely on the transcript for tool permissions, you’re one injection or one transcript formatting change away from a problem.
outputSchema. The expected shape of the agent’s output. Register schemas in a central registry. The contract references them by name and version.
expiresAfterMinutes. Contracts are time-bound. A thirty-minute expiry means stale facts trigger a re-compilation instead of being used as-is. This forces fresh context on long-running workflows.
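Checking expiry only takes `createdAt` and `expiresAfterMinutes`. In this sketch an unparseable timestamp counts as expired, which is my assumption, so a malformed contract forces a recompile instead of slipping through.

```typescript
// True once `now` is at or past createdAt + expiresAfterMinutes.
function isExpired(
  contract: { createdAt: string; expiresAfterMinutes: number },
  now: Date = new Date(),
): boolean {
  const created = Date.parse(contract.createdAt);
  if (Number.isNaN(created)) return true; // unparseable timestamp: treat as expired
  const expiresAt = created + contract.expiresAfterMinutes * 60_000;
  return now.getTime() >= expiresAt;
}

const contract = { createdAt: "2026-06-27T09:25:00Z", expiresAfterMinutes: 30 };
console.log(isExpired(contract, new Date("2026-06-27T09:40:00Z"))); // false
console.log(isExpired(contract, new Date("2026-06-27T09:55:00Z"))); // true
```

When this returns true, the caller re-runs the context compiler instead of handing off the old contract.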
traceId. Every contract carries a trace ID that connects it to the parent workflow, originating user request, and downstream actions.
Where the Contract Lives
The contract should be generated by a context compiler, not written by hand inside prompts.
A context compiler sits between agents. It receives session events (tool calls, LLM responses, user messages, internal notes) and produces a HandoffContract. The compiler:
- Filters events that are excluded by type (brainstorm, internal_note)
- Removes events explicitly marked as excludeFromHandoff
- Rejects stale tool results based on age
- Extracts factual claims from remaining events with appropriate confidence levels
- Sets provenance metadata on every fact
- Collects excluded-context labels for the audit trail
- Builds the tool allowlist from the tools that were actually used
- Validates the contract before handoff
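Here is a minimal sketch of those compiler steps, with validation left as a separate step. It is deterministic: the same events and clock always give the same output. The event shape, the per-tool staleness table, the type-to-confidence mapping, and the `event:<id>` label format are all hypothetical. As the step list describes, the allowlist here is just the set of fresh tools that were actually used.

```typescript
type EventType = "tool_call" | "llm_response" | "user_message" | "brainstorm" | "internal_note";
type Confidence = "high" | "medium" | "low" | "unverified";

interface SessionEvent {
  id: string;
  type: EventType;
  content: string;              // the claim this event supports
  timestamp: string;            // ISO-8601 time the data was retrieved
  tool?: string;                // set on tool_call events
  excludeFromHandoff?: boolean; // explicit opt-out flag
}

interface Fact {
  claim: string;
  source: { source: string; timestamp: string; method: string };
  confidence: Confidence;
  eventId: string;
}

const EXCLUDED_TYPES = new Set<EventType>(["brainstorm", "internal_note"]);
// Hypothetical staleness rules: maximum age in minutes, per tool.
const MAX_AGE_MINUTES: Record<string, number> = { "crm.lookup_pricing": 10 };
const DEFAULT_MAX_AGE_MINUTES = 60;
// Brainstorm and internal_note entries exist only to make the record total;
// those events are filtered out before any fact is built.
const CONFIDENCE_BY_TYPE: Record<EventType, Confidence> = {
  tool_call: "high", llm_response: "medium", user_message: "low",
  brainstorm: "unverified", internal_note: "unverified",
};

function compileContext(events: SessionEvent[], now: Date) {
  const facts: Fact[] = [];
  const excluded = new Set<string>(); // audit labels, in first-seen order
  const toolsUsed = new Set<string>();

  for (const e of events) {
    if (EXCLUDED_TYPES.has(e.type)) { excluded.add(`type:${e.type}`); continue; }
    if (e.excludeFromHandoff) { excluded.add(`event:${e.id}`); continue; }
    if (e.type === "tool_call" && e.tool) {
      const ageMinutes = (now.getTime() - Date.parse(e.timestamp)) / 60_000;
      const maxAge = MAX_AGE_MINUTES[e.tool] ?? DEFAULT_MAX_AGE_MINUTES;
      // Written so that an unparseable timestamp (NaN age) also counts as stale.
      if (!(ageMinutes <= maxAge)) { excluded.add(`stale:${e.tool}`); continue; }
      toolsUsed.add(e.tool);
    }
    facts.push({
      claim: e.content,
      source: {
        source: e.tool ?? e.type,
        timestamp: e.timestamp,
        method: e.tool ? `tool_call:${e.tool}` : e.type,
      },
      confidence: CONFIDENCE_BY_TYPE[e.type],
      eventId: e.id,
    });
  }
  return { facts, excludedContext: Array.from(excluded), allowedTools: Array.from(toolsUsed) };
}

const out = compileContext(
  [
    { id: "evt-9a3b2c", type: "tool_call", tool: "crm.lookup_customer",
      timestamp: "2026-06-27T09:20:00Z", content: "The recipient is an existing enterprise customer." },
    { id: "evt-b1", type: "brainstorm", timestamp: "2026-06-27T09:10:00Z",
      content: "What if we just skip the compliance check?" },
    { id: "evt-p1", type: "tool_call", tool: "crm.lookup_pricing",
      timestamp: "2026-06-27T08:50:00Z", content: "Illustrative pricing quote." },
  ],
  new Date("2026-06-27T09:25:00Z"),
);
console.log(out.facts.length);               // 1
console.log(out.excludedContext.join(", ")); // type:brainstorm, stale:crm.lookup_pricing
console.log(out.allowedTools.join(", "));    // crm.lookup_customer
```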
This is close to what Google recommends: use named, ordered context processors that build context from sources, not from raw history concatenation. The compiler is a context processor. It takes messy session state and produces a structured, minimal payload.
The compiler does not need to be an LLM call. It’s a deterministic function. Take events in, apply rules, output contract. This keeps it testable and cheap.
The full schema, compiler, and validator are in the sample code repository.
Validation and Observability
Before a contract is handed to an agent, validate it.
Required fields check. handoffId, receivingAgent, task, traceId must all be present and non-empty. No defaults for these — if they’re missing, something is wrong upstream.
Source provenance. Every fact must have a source. If a fact has no source, it cannot be trusted. The receiving agent should treat unsourced facts as low confidence or ignore them entirely.
No forbidden context. Check that excluded context labels match what was actually filtered. If you see something that should have been excluded but wasn’t, reject the contract.
Token budget. Estimate the token cost of the contract (task + all facts). If it exceeds the agent’s budget, either trim facts or split the work.
Tool permission alignment. No tool should appear in both allowedTools and forbiddenTools. That’s a configuration error.
Output schema registration. If the contract references an output schema, confirm it exists in the registry. A reference to a schema that doesn’t exist means the agent’s output won’t be parsable downstream.
Trace ID propagation. The trace ID must be present and linked to the parent workflow. Without it, you lose the causal chain.
The validator returns errors and warnings. Errors prevent handoff. Warnings let it proceed but flag potential issues (no facts, no allowed tools, expired contract).
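Here is a sketch of the validator covering most of the checks above: required fields (which includes the trace ID), provenance, token budget, tool alignment, schema registration, and the listed warnings. The "no forbidden context" check is left out because it needs the original session events, and trace linkage is reduced to a presence check. The schema registry contents and the four-characters-per-token estimate are assumptions. A real budget check should use the receiving model's tokenizer.

```typescript
interface Fact {
  claim: string;
  source?: { source: string; timestamp: string; method: string };
  confidence: string;
  eventId: string;
}
interface HandoffContract {
  handoffId: string; receivingAgent: string; task: string; traceId: string;
  facts: Fact[]; allowedTools: string[]; forbiddenTools: string[];
  outputSchema?: { schemaName: string; version: string };
  createdAt: string; expiresAfterMinutes: number;
}
interface ValidationResult {
  errors: string[];   // any error blocks the handoff
  warnings: string[]; // handoff proceeds, but the issue is flagged
}

// Hypothetical schema registry, keyed as "name@version".
const SCHEMA_REGISTRY = new Set<string>(["ComplianceDecisionV1@1.2"]);
// Crude heuristic: roughly 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function validateContract(c: HandoffContract, tokenBudget: number, now: Date): ValidationResult {
  const errors: string[] = [];
  const warnings: string[] = [];

  // Required fields: no defaults, so a missing value means an upstream bug.
  for (const field of ["handoffId", "receivingAgent", "task", "traceId"] as const) {
    if (!c[field] || c[field].trim() === "") errors.push(`missing required field: ${field}`);
  }
  const task = c.task || "";
  if (task.length > 500) warnings.push("task is over 500 characters; consider splitting it");

  // Source provenance: every fact needs a known source.
  c.facts.forEach((f, i) => {
    if (!f.source || !f.source.source || f.source.source === "unknown") {
      errors.push(`fact ${i} has no source provenance`);
    }
  });

  // Token budget: task plus all fact claims.
  const tokens = estimateTokens(task) + c.facts.reduce((n, f) => n + estimateTokens(f.claim), 0);
  if (tokens > tokenBudget) errors.push(`estimated ${tokens} tokens exceeds budget of ${tokenBudget}`);

  // Tool permission alignment.
  const overlap = c.allowedTools.filter((t) => c.forbiddenTools.indexOf(t) !== -1);
  if (overlap.length > 0) errors.push(`tools both allowed and forbidden: ${overlap.join(", ")}`);

  // Output schema registration.
  if (c.outputSchema) {
    const key = `${c.outputSchema.schemaName}@${c.outputSchema.version}`;
    if (!SCHEMA_REGISTRY.has(key)) errors.push(`unregistered output schema: ${key}`);
  }

  // Warnings: these can proceed but are usually bugs.
  if (c.facts.length === 0) warnings.push("contract has no facts");
  if (c.allowedTools.length === 0) warnings.push("contract allows no tools");
  const expiresAt = Date.parse(c.createdAt) + c.expiresAfterMinutes * 60_000;
  if (!(now.getTime() < expiresAt)) warnings.push("contract has expired");

  return { errors, warnings };
}

const result = validateContract(
  {
    handoffId: "risk-review-2026-001",
    receivingAgent: "ComplianceReviewAgent",
    task: "Evaluate whether the proposed outbound email violates internal policy.",
    traceId: "trace-risk-review-2026-001",
    facts: [{
      claim: "The recipient is an existing enterprise customer.",
      source: { source: "crm.lookup_customer", timestamp: "2026-06-27T09:20:00Z", method: "api_call" },
      confidence: "high",
      eventId: "evt-9a3b2c",
    }],
    allowedTools: ["policy.search", "email.send"], // misconfigured on purpose
    forbiddenTools: ["email.send", "crm.update_customer"],
    outputSchema: { schemaName: "ComplianceDecisionV1", version: "1.2" },
    createdAt: "2026-06-27T09:25:00Z",
    expiresAfterMinutes: 30,
  },
  2_000,
  new Date("2026-06-27T09:30:00Z"),
);
console.log(result.errors.join("; ")); // tools both allowed and forbidden: email.send
```

A non-empty `errors` array stops the handoff. Both arrays get logged either way.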
Log every validation result. Every error, every warning, every successful handoff. This is your observability layer. When something goes wrong, you can look at the validation log and see exactly what the contract looked like and why it was accepted or rejected.
Failure Modes
The pattern helps, but it’s not foolproof. Here are the failure modes I’ve seen.
Contract drift. Teams update the schema on the sending side but not on the receiving side. The receiving agent expects a field that doesn’t exist, or ignores a field that was renamed. Version your contract schema. Run compatibility tests.
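The example contract versions its output schema but not itself. The sketch below assumes a hypothetical `schemaVersion` on the contract, in `major.minor` form. The compatibility rule is also an assumption: renames and removals bump the major version, and a receiver must understand at least the sender's minor version.

```typescript
// Parses a "major.minor" version string; returns null if malformed.
function parseVersion(v: string): [number, number] | null {
  const m = /^(\d+)\.(\d+)$/.exec(v);
  return m ? [Number(m[1]), Number(m[2])] : null;
}

// Hypothetical rule: majors must match, and the receiver's minor version
// must be at least the sender's.
function isCompatible(senderVersion: string, receiverVersion: string): boolean {
  const s = parseVersion(senderVersion);
  const r = parseVersion(receiverVersion);
  if (s === null || r === null) return false; // malformed versions never pass
  return s[0] === r[0] && s[1] <= r[1];
}

console.log(isCompatible("1.2", "1.3")); // true
console.log(isCompatible("1.3", "1.2")); // false
console.log(isCompatible("2.0", "1.9")); // false
```

Run this as a compatibility test across every sender/receiver pair whenever either side changes its schema version.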
Stale facts. The compiler marks a fact as fresh because the session event was recent, but the underlying data has already changed. The contract is valid but the world moved on. Short expiry windows help. So does timestamping every fact with when it was retrieved, not when it was generated.
Accidental over-redaction. The compiler excludes too much. The receiving agent gets a contract with no facts and one vague task statement. It has nothing to work with. Monitor contracts with zero facts. They’re usually a bug.
Provenance loss. The source system goes down or changes its API. The compiler can’t attach provenance. Facts end up with source: "unknown". Set a policy: if confidence is low and provenance is missing, don’t hand off. Fail fast.
Agent role confusion. The contract says “ComplianceReviewAgent” but the runtime routes it to a different agent with similar capabilities. The receiving agent executes the wrong task with the wrong tools. Tie receivingAgent to a registry of agent capabilities and validate before routing.
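Here is a sketch of that pre-routing check. The registry contents are hypothetical, and "capabilities" is reduced to the set of tools each agent can use. The check rejects a contract that was routed to a different agent, one that names an unknown agent, or one that grants tools the named agent does not have.

```typescript
// Hypothetical capability registry: agent name -> tools that agent can use.
const AGENT_REGISTRY = new Map<string, string[]>([
  ["ComplianceReviewAgent", ["policy.search", "crm.read_customer_status"]],
  ["ResearchAgent", ["crm.lookup_customer", "crm.lookup_pricing", "contracts.search"]],
]);

// Returns routing errors; an empty array means the handoff may proceed.
function checkRouting(receivingAgent: string, routedTo: string, allowedTools: string[]): string[] {
  const errors: string[] = [];
  if (routedTo !== receivingAgent) {
    errors.push(`contract for ${receivingAgent} was routed to ${routedTo}`);
  }
  const capabilities = AGENT_REGISTRY.get(receivingAgent);
  if (capabilities === undefined) {
    errors.push(`unknown agent: ${receivingAgent}`);
    return errors;
  }
  for (const tool of allowedTools) {
    if (capabilities.indexOf(tool) === -1) {
      errors.push(`${receivingAgent} has no capability for ${tool}`);
    }
  }
  return errors;
}

console.log(checkRouting("ComplianceReviewAgent", "ComplianceReviewAgent", ["policy.search"]).length); // 0
console.log(checkRouting("ComplianceReviewAgent", "ResearchAgent", ["policy.search"]).join("; "));
// contract for ComplianceReviewAgent was routed to ResearchAgent
```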
Implementation Checklist
Use this when adding handoff contracts to your multi-agent system.
Schema and compiler:
- Define the HandoffContract schema with all required fields (Zod, Pydantic, or equivalent)
- Build a context compiler that takes session events and produces a contract
- Set excluded event types (brainstorm, internal_note, rejected assumptions)
- Implement staleness rules per tool result type
Validation:
- Validate contracts before handoff: required fields, source provenance, no forbidden context, token budget, tool alignment, output schema, trace ID
- Log every validation result (errors, warnings, accept)
- Fail handoff on validation errors
Observability:
- Propagate trace ID through every contract
- Log excluded context labels for audit
- Monitor contracts with zero facts or zero allowed tools
Operations:
- Version the contract schema
- Set default expiry (30 minutes is a good start)
- Register output schemas in a central registry
- Tie receivingAgent names to a capability registry
- Run compatibility tests when schema changes
Testing:
- Prove sensitive events are excluded from contracts
- Prove stale facts are rejected
- Prove valid contracts pass validation
- Prove the compile → validate → handoff pipeline works end to end
The Takeaway
Context is the most precious resource in an agent system. Every token of irrelevant context is a token the model can’t use for reasoning. It’s also a token you pay for, a token that increases latency, and a token that can confuse downstream agents.
Handoff contracts fix this by making context transfer explicit, typed, and auditable. The receiving agent gets only what it needs. The excluded context is documented. Every fact has provenance. Every contract has an expiration. Every handoff is traceable.
This is not a theoretical pattern. The sample code includes a working compiler, validator, and test suite. Clone it, adapt it to your agent framework, and start scoping your handoffs.
Your compliance agent will thank you.
Code Samples
The GitHub repository contains a complete TypeScript project with:
- Typed handoff schema: Zod models for HandoffContract, Fact, SourceProvenance, and SessionEvent
- Context compiler: converts raw session events into a minimal handoff contract, filtering out brainstorming notes, stale tool results, and sensitive decisions
- Contract validator: pre-handoff checks for required fields, source provenance, token budget, tool permission alignment, and expiry
- Unit tests: prove that sensitive events are excluded, stale facts are rejected, and the compile → validate pipeline works end to end
Run it locally:
```shell
cd context-handoff-contracts
npm install
npm test
```