Most AI agent security writing jumps straight to specific tactics: prompt hardening, content filtering, retrieval sandboxing. Those are useful, but they are leaf concerns. The architectural question - where does each security concern live, who owns it, what fails when it fails - is the one teams actually need to answer before they can discuss tactics usefully.
Key concepts
This post is a reference architecture for AI agent runtime security, oriented around the actuation boundary. It covers six components, their responsibilities, their failure modes, and the two deployment shapes that hold up in production. The goal is a picture of the pieces that is specific enough to compare against your own system and see what is missing.
For the conceptual foundation see runtime authorization for AI agents and the TrigGuard architecture page. This post is the decomposition view.
The component set
A production AI agent has at least six components involved in the actuation path. In many systems they are fused into one or two processes; calling them out separately clarifies the responsibility boundaries:
- Planner / agent runtime. The code that manages the model, the plan state, and the emission of tool-call intents.
- Tool broker. The code that presents a typed tool API to the planner, routes calls to implementations, and manages tool lifecycle.
- Authorization gate. The control point that decides whether each intent may dispatch.
- Policy engine. The evaluator used by the gate to compute decisions.
- Receipt store. The append-only record of authorization decisions, signed and retrievable.
- Verification surface. The independent service that validates receipts on demand.
Each of these has a narrow responsibility and a failure mode that stays inside that responsibility if the architecture is clean. Confusion about which component does what is the most common cause of production defects.
Planner and agent runtime
The planner owns plan state, model invocation, and intent generation. Its inputs are user goals, retrieved context, and previous tool-call outcomes. Its outputs are tool-call intents that the tool broker dispatches.
Responsibilities
- Manage the plan DAG or loop.
- Call the model with the right prompt and context.
- Emit structured tool-call intents, not raw strings.
- Handle tool-call outcomes and decide next steps.
- Terminate cleanly when the plan is done or the budget is exhausted.
Explicit non-responsibilities
- Does not decide whether a tool call is allowed. That is the gate.
- Does not produce evidence. That is the receipt store.
- Does not dispatch tool calls directly. That is the tool broker.
Common failure mode
Planners that short-circuit the broker by calling tool implementations directly break the entire control model. This is the defect in many "agentic" library integrations. Fix: make the broker the only caller of tool implementations, full stop.
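The "broker is the only caller" rule can be sketched in a few lines. All names here (`ToolBroker`, `submit`, `echo`) are illustrative, not a real agent-framework API; the point is that tool implementations live in a registry the planner never sees.

```python
# Sketch: the broker holds the only references to tool implementations.
# The planner receives a broker handle, never the tools themselves.

class ToolBroker:
    def __init__(self):
        self._impls = {}  # private registry: name -> callable

    def register(self, name, impl):
        self._impls[name] = impl

    def submit(self, intent):
        # The planner's only entry point. Validation and the
        # authorization gate would sit here; this sketch dispatches
        # directly to keep the ownership boundary visible.
        impl = self._impls.get(intent["tool"])
        if impl is None:
            return {"ok": False, "reason": "unknown tool"}
        return {"ok": True, "result": impl(**intent["args"])}

broker = ToolBroker()
broker.register("echo", lambda text: text.upper())

# The planner sees only broker.submit; it has no path to the registry.
outcome = broker.submit({"tool": "echo", "args": {"text": "hi"}})
```

A planner written against this interface cannot short-circuit the control path, because there is nothing else to call.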
Tool broker
The tool broker is the typed interface the planner sees. It registers tools, exposes their schemas, validates arguments, and dispatches calls. It is the thin layer between the agent's intent and the authorization gate.
Responsibilities
- Maintain a registry of tools with schemas.
- Validate intent shape and argument types.
- Submit each validated intent to the authorization gate.
- Dispatch to the tool implementation only on PERMIT.
- Return the tool's result, or a structured denial, to the planner.
Explicit non-responsibilities
- Does not make security decisions. That is the gate.
- Does not log evidence. That is the receipt store (via the gate).
Common failure mode
Tool brokers that implement a "fallback mode" on gate failures - dispatching the call anyway - destroy the fail-closed property of the whole system. The only acceptable broker behavior on a gate failure is to return a denial to the planner. See fail-closed AI systems for the property this enforces.
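The acceptable behavior can be made concrete with a small sketch. The names (`broker_dispatch`, `GateUnavailable`) are hypothetical; the property being illustrated is that a gate failure maps to a structured denial, never to dispatch.

```python
# Sketch: fail-closed broker behavior. Any exception from the gate maps
# to a denial; the tool implementation is never invoked "anyway".

class GateUnavailable(Exception):
    pass

def broker_dispatch(intent, gate_check, impl):
    try:
        decision = gate_check(intent)
    except GateUnavailable:
        # The only acceptable behavior on gate failure: deny.
        return {"outcome": "DENY", "reason": "gate unavailable"}
    if decision != "PERMIT":
        return {"outcome": decision, "reason": "policy"}
    return {"outcome": "PERMIT", "result": impl(intent)}

def broken_gate(intent):
    raise GateUnavailable()

# The tool never runs when the gate is down.
result = broker_dispatch({"tool": "x"}, broken_gate, lambda i: "ran")
```

A "fallback mode" would replace the `except` branch with a call to `impl`, which is exactly the defect the section describes.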
Authorization gate
The gate is the control point. Every intent that the broker submits goes to the gate. The gate validates the request, assembles the evaluation context, invokes the policy engine, maps the result to one of PERMIT, DENY, SILENCE, and emits a signed receipt.
Responsibilities
- Validate the request against the pinned request shape.
- Assemble the full evaluation input: identity, surface, action, target, context.
- Pin the policy version.
- Invoke the policy engine.
- Map results to the three-valued decision enum.
- Enforce timeout-as-denial.
- Produce a signed receipt for every decision.
- Return the decision and receipt to the broker.
Explicit non-responsibilities
- Does not evaluate policies. That is the policy engine.
- Does not dispatch actions. That is the broker.
- Does not store receipts long-term. That is the receipt store.
Common failure mode
Gates that emit a decision without a receipt, or with a receipt that does not cover the full request, create evidence gaps. Every decision must have a complete receipt. "Best-effort receipt issuance" is not a mode; it is a bug.
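The gate's responsibilities can be sketched end to end: pin the policy version, invoke the engine under a deadline, map the result to the three-valued enum, and attach a receipt to every decision. This is a minimal stdlib sketch; a SHA-256 digest stands in for the signature, and all names are illustrative.

```python
# Sketch of the gate's decision flow: pinned version, timeout-as-denial,
# enum mapping, and a receipt on every decision (hashing stands in for
# signing).
import hashlib
import json
import time

POLICY_VERSION = "v82"  # pinned per decision

def evaluate(request, engine, timeout_s=0.05):
    start = time.monotonic()
    try:
        raw = engine(request)                # engine-native result
        if time.monotonic() - start > timeout_s:
            outcome = "SILENCE"              # timeout-as-denial
        else:
            outcome = "PERMIT" if raw is True else "DENY"
    except Exception:
        outcome = "SILENCE"                  # engine failure fails closed
    body = {"request": request, "outcome": outcome,
            "policy_version": POLICY_VERSION}
    receipt = {"body": body,
               "digest": hashlib.sha256(
                   json.dumps(body, sort_keys=True).encode()).hexdigest()}
    return outcome, receipt                  # every decision has a receipt

outcome, receipt = evaluate({"action": "read"}, lambda r: True)
```

Note that the receipt is constructed unconditionally: there is no code path that returns a decision without one.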
Policy engine
The policy engine is the evaluator. It takes a structured input and a policy bundle and returns a decision. It is a computation, not a system. OPA, Cedar, and AWS IAM policy evaluators are all examples.
Responsibilities
- Evaluate the provided policy bundle against the provided input.
- Return a result that the gate can map to the decision enum.
- Be deterministic: same input, same bundle, same result.
Explicit non-responsibilities
- Does not know what PERMIT, DENY, and SILENCE mean to the rest of the system. It returns a value; the gate interprets it.
- Does not produce receipts.
- Does not know about actions, side effects, or agents.
Common failure mode
Using the engine's native output semantics directly in external systems creates coupling between the engine's conventions and the broader authorization contract. Always map the engine's output to the system's decision enum at the gate layer, so you can swap engines without changing the contract.
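A sketch of what that mapping layer looks like. The result shapes below are simplified stand-ins for OPA-style and Cedar-style outputs, not their exact wire formats; the point is that both engines end up speaking the same three-valued contract at the gate.

```python
# Sketch: adapters that map engine-native outputs to the system's
# decision enum at the gate layer, so engines can be swapped without
# changing the authorization contract.

def map_opa_style(result):
    # OPA-style: {"result": true/false}, or undefined (no "result" key)
    if "result" not in result:
        return "SILENCE"
    return "PERMIT" if result["result"] is True else "DENY"

def map_cedar_style(result):
    # Cedar-style: a decision string such as "Allow" / "Deny"
    return {"Allow": "PERMIT", "Deny": "DENY"}.get(result, "SILENCE")

decisions = [map_opa_style({"result": True}),
             map_cedar_style("Deny"),
             map_opa_style({})]  # undefined -> SILENCE, never PERMIT
```

Anything the adapter does not recognize maps to SILENCE, preserving the fail-closed default through an engine swap.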
Receipt store
The receipt store is the append-only record. Every signed receipt the gate produces is appended. Receipts are retrievable by request ID, by actor, by surface, by time range. The store is the single source of truth for what decisions were made.
Responsibilities
- Accept signed receipts and append them durably.
- Provide retrieval by the relevant indices.
- Preserve content integrity: a receipt, once stored, is not mutated.
- Retain for the defined audit window (often years for regulated industries).
Explicit non-responsibilities
- Does not verify signatures. That is the verifier.
- Does not decide access. The retrieval API has its own authorization, separate from the gate's.
- Does not summarize. Summaries are derived views, not authoritative records.
Common failure mode
Using a mutable database for receipts. The moment a receipt can be updated in place, the evidence chain is broken. The store must be append-only, or must be backed by a content-addressed immutable backend.
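One way to make mutation detectable even in a simple backend is to chain each appended receipt to its predecessor by content hash. This in-memory sketch is illustrative only; a production store would be a durable log or a content-addressed backend.

```python
# Sketch: an append-only receipt store with a hash chain, so in-place
# mutation of any stored receipt is detectable on verification.
import hashlib
import json

class ReceiptStore:
    def __init__(self):
        self._log = []  # append-only; no update or delete methods exist

    def append(self, receipt):
        prev = self._log[-1]["chain"] if self._log else "genesis"
        entry = {"receipt": receipt,
                 "chain": hashlib.sha256(
                     (prev + json.dumps(receipt, sort_keys=True)).encode()
                 ).hexdigest()}
        self._log.append(entry)
        return entry["chain"]

    def verify_chain(self):
        prev = "genesis"
        for e in self._log:
            expect = hashlib.sha256(
                (prev + json.dumps(e["receipt"], sort_keys=True)).encode()
            ).hexdigest()
            if e["chain"] != expect:
                return False  # a stored receipt was altered
            prev = e["chain"]
        return True

store = ReceiptStore()
store.append({"request_id": "r1", "outcome": "PERMIT"})
store.append({"request_id": "r2", "outcome": "DENY"})
intact = store.verify_chain()
```

Editing any stored receipt breaks every subsequent link, which is the property a mutable row update silently destroys.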
Verification surface
The verifier is an independent service or library that takes a receipt and validates it. It checks the signature against the authority's public key, confirms the content hash matches the receipt body, and optionally checks that the receipt's policy version reference corresponds to an archived bundle.
Responsibilities
- Validate signatures using the authority's public key.
- Confirm content hashes.
- Optionally re-evaluate the request against the archived policy and compare to the stored outcome (this is the replay test).
- Provide a simple API or CLI so second-line risk, audit, and external parties can use it without privileged access.
Explicit non-responsibilities
- Does not issue decisions. That is the gate.
- Does not write to the receipt store.
- Does not need to see the full receipt store. It only needs the specific receipts being checked.
Common failure mode
Building the verifier into the gate as a shared library. This works, but it defeats the independence property - the thing verifying the decisions is the thing that made them. The verifier should be a separate deployable, or at least a separately-trusted library that downstream parties can run independently. See verify for the TrigGuard verifier surface.
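The verifier's two local checks can be sketched with the standard library. HMAC stands in here for the authority's asymmetric signature so the example stays self-contained; a real verifier would validate against the authority's public key (for example, Ed25519), and all names are illustrative.

```python
# Sketch of the verifier's checks: (1) content hash matches the receipt
# body, (2) signature is valid. HMAC is a stand-in for a real signature.
import hashlib
import hmac
import json

AUTHORITY_KEY = b"demo-key"  # stand-in for the signing authority's key

def sign_receipt(body):
    payload = json.dumps(body, sort_keys=True).encode()
    return {"body": body,
            "digest": hashlib.sha256(payload).hexdigest(),
            "sig": hmac.new(AUTHORITY_KEY, payload,
                            hashlib.sha256).hexdigest()}

def verify_receipt(receipt, key):
    payload = json.dumps(receipt["body"], sort_keys=True).encode()
    if hashlib.sha256(payload).hexdigest() != receipt["digest"]:
        return False  # content hash does not match the body
    expect = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expect, receipt["sig"])

r = sign_receipt({"request_id": "r1", "outcome": "PERMIT"})
ok = verify_receipt(r, AUTHORITY_KEY)
```

Because `verify_receipt` needs only the receipt and the key, it can ship as a separate deployable that audit or external parties run without any access to the gate or the store.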
Deployment shapes
Two deployment shapes account for essentially all production AI agent authorization topologies.
Sidecar gate
Each agent process is colocated with an authorization sidecar. The broker calls the sidecar locally over gRPC or HTTP. The sidecar embeds the policy engine and a cached policy bundle. The sidecar streams receipts to the receipt store asynchronously.
Latency: low single-digit milliseconds per decision.
Failure isolation: a single agent's sidecar failure affects only that agent.
Suited for: high-throughput workloads, agents deployed across many processes or nodes, service mesh users familiar with the sidecar pattern.
Central decision service
A dedicated authorization service handles decisions for many agents. Agents call the service over the network. The service embeds the policy engine and serves receipts to the store synchronously or via a tail-writer.
Latency: tens of milliseconds per decision.
Failure isolation: the service is a shared dependency; an outage affects all agents.
Suited for: cross-agent coordination (global rate limits, cluster-wide risk signals), smaller-scale deployments where sidecar overhead is not justified, organizations that prefer centralized control planes.
Both shapes use the same six components and enforce the same decision contract. The choice is about locality, availability, and cross-agent visibility, not about the architecture.
End-to-end trace of a single decision
Putting the pieces together, here is what happens when an agent emits a tool call:
1. Planner emits intent: { "tool": "payments.transfer", "args": { ... } }
2. Broker validates the intent against the registered tool schema
3. Broker constructs an authorization request: { actor, surface, action, target, context }
4. Broker submits the request to the gate (sidecar or central)
5. Gate pins policy version v82
6. Gate invokes policy engine with request + bundle v82
7. Engine returns "permit" with rule identifier
8. Gate wraps in { outcome: "PERMIT", policy_version: "v82", rule: "payments.transfer.self_service", receipt: { ... signed } }
9. Gate appends receipt to receipt store
10. Gate returns decision to broker
11. Broker dispatches to the tool implementation
12. Tool result flows back through broker to planner
13. Planner emits next intent or terminates
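The trace above can be compressed into one stub per component. Everything here (function names, request shape, the `payments.transfer` tool) is illustrative, not the TrigGuard API; the numbered comments map back to the steps.

```python
# Compact sketch of the trace, one stub per component.
import hashlib
import json

def engine(request, bundle):                    # policy engine (step 7)
    return "permit" if request["action"] == "transfer" else "deny"

def gate(request, store):                       # steps 5-10
    raw = engine(request, bundle="v82")         # pin version + invoke
    outcome = {"permit": "PERMIT", "deny": "DENY"}.get(raw, "SILENCE")
    body = {"request": request, "outcome": outcome, "policy_version": "v82"}
    receipt = {"body": body, "digest": hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()}
    store.append(receipt)                       # receipt before return
    return outcome

def broker(intent, store):                      # steps 2-4, 11-12
    request = {"actor": "agent-1", "surface": intent["tool"],
               "action": intent["tool"].split(".")[-1],
               "target": intent["args"], "context": {}}
    if gate(request, store) != "PERMIT":
        return {"ok": False}
    return {"ok": True, "result": tools[intent["tool"]](**intent["args"])}

tools = {"payments.transfer": lambda amount: f"transferred {amount}"}
receipts = []                                   # stand-in receipt store
outcome = broker({"tool": "payments.transfer",  # steps 1, 13
                  "args": {"amount": 10}}, receipts)
```

Even in this compressed form, the ownership boundaries hold: only the broker dispatches, only the gate decides, and every decision leaves a receipt behind.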
Thirteen steps, each owned by one component, each with a crisp responsibility. That is what a clean architecture looks like on the actuation path.
Failure modes across the architecture
The clean mapping above makes failure analysis tractable. For each component, the question is: if this component fails or is compromised, what is the blast radius? The answers are:
- Planner fails. Agent stalls. No actions commit. No compromise of control.
- Broker fails. Tool dispatch halts. No actions commit. No compromise of control.
- Gate fails. All decisions are denied by fail-closed default. No actions commit. Operational incident, but not a control incident.
- Policy engine fails (inside gate). Gate returns SILENCE. No actions commit.
- Receipt store fails for writes. Gate stalls decisions (receipts must commit before PERMIT returns in strict mode) or degrades to batch retry. In degraded mode, PERMIT is held until the receipt commits.
- Verifier fails. Decisions continue to be made and logged; only the downstream check is offline. No effect on live control.
Note the pattern: every failure mode halts actions or has no effect on the control. There is no failure mode that opens actions. That is the architectural property. It is not free - it is achieved by the fail-closed defaults at every component boundary - but it is the property that justifies the design.
Frequently asked questions
Is this overbuilt for a small team?
For an early product with reversible actions and low stakes, yes, this is too much architecture. The components collapse: planner and broker can be the same library, gate and policy engine can be a single process, receipt store can be a file, verifier can be a CLI. The components' responsibilities are still there, just fused. The architecture becomes relevant as soon as you have irreversible actions in the loop.
Can existing service mesh tooling serve as the gate?
Partially. Istio, Linkerd, and similar meshes can enforce request-level policies, and for internal service-to-service calls they are a reasonable starting point. They are not sufficient for the full agent actuation model, because (a) their policy semantics do not natively support PERMIT/DENY/SILENCE, (b) they do not produce signed receipts, and (c) many agent actions go to surfaces the mesh does not front.
What if we add a new tool?
New tools become new surfaces. The surface is added to the registry, a policy is written for it, receipts for the surface are retained alongside the rest. The architecture does not need to change; the artifacts (registry, policy bundle) do.
Next step
For the conceptual foundation see runtime authorization for AI agents. For the deployment-shape detail see architecture. For the decision contract see pre-execution authorization and deterministic authorization.
Compare your runtime security decomposition against a durable reference architecture with engineering.