
AI Agent Runtime Security Architecture

Most AI agent security writing jumps straight to specific tactics: prompt hardening, content filtering, retrieval sandboxing. Those are useful, but they are leaf concerns. The architectural question - where does each security concern live, who owns it, what fails when it fails - is the one teams actually need to answer before they can discuss tactics usefully.

Key concepts

This post is a reference architecture for AI agent runtime security, oriented around the actuation boundary. It covers six components, their responsibilities, their failure modes, and the two deployment shapes that hold up in production. The goal is a picture of the pieces that is specific enough to compare against your own system and see what is missing.

For the conceptual foundation see runtime authorization for AI agents and the TrigGuard architecture page. This post is the decomposition view.

The component set

A production AI agent has at least six components involved in the actuation path: the planner, the tool broker, the authorization gate, the policy engine, the receipt store, and the verification surface. In many systems they are fused into one or two processes; calling them out separately clarifies the responsibility boundaries.

Each of these has a narrow responsibility and a failure mode that stays inside that responsibility if the architecture is clean. Confusion about which component does what is the most common cause of production defects.

Planner and agent runtime

The planner owns plan state, model invocation, and intent generation. Its inputs are user goals, retrieved context, and previous tool-call outcomes. Its outputs are tool-call intents that the tool broker dispatches.

Responsibilities

- Maintain plan state across the task
- Invoke the model with user goals, retrieved context, and prior tool-call outcomes
- Emit structured tool-call intents for the broker

Explicit non-responsibilities

- Calling tool implementations directly
- Making, caching, or overriding authorization decisions

Common failure mode

Planners that short-circuit the broker by calling tool implementations directly break the entire control model. This is the defect in many "agentic" library integrations. Fix: make the broker the only caller of tool implementations, full stop.

Tool broker

The tool broker is the typed interface the planner sees. It registers tools, exposes their schemas, validates arguments, and dispatches calls. It is the thin layer between the agent's intent and the authorization gate.

Responsibilities

- Register tools and expose their schemas to the planner
- Validate intent arguments against the registered schema
- Construct the authorization request and submit it to the gate
- Dispatch permitted calls to tool implementations

Explicit non-responsibilities

- Evaluating policy or making authorization decisions
- Dispatching anything without a PERMIT from the gate

Common failure mode

Tool brokers that implement a "fallback mode" on gate failures - dispatching the call anyway - destroy the fail-closed property of the whole system. The only acceptable broker behavior on a gate failure is to return a denial to the planner. See fail-closed AI systems for the property this enforces.
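A minimal sketch of the fail-closed broker behavior described above, in Python. The `Decision` type and the `gate` callable are illustrative names, not from any specific library; the point is the shape of the error path: a gate failure maps to a denial, never to a dispatch.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    outcome: str          # "PERMIT", "DENY", or "SILENCE"
    reason: str = ""


class Broker:
    def __init__(self, gate, tools):
        self.gate = gate      # callable: request dict -> Decision
        self.tools = tools    # tool name -> implementation

    def dispatch(self, request):
        # Any gate failure maps to a denial. There is no fallback path
        # that reaches a tool implementation without a PERMIT.
        try:
            decision = self.gate(request)
        except Exception as exc:
            return Decision("DENY", f"gate unreachable: {exc}")
        if decision.outcome != "PERMIT":
            return decision
        result = self.tools[request["tool"]](request["args"])
        return Decision("PERMIT", f"dispatched: {result}")
```

The `except` clause is the whole design decision: the broker's only legal response to a gate error is a denial returned to the planner.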

Authorization gate

The gate is the control point. Every intent that the broker submits goes to the gate. The gate validates the request, assembles the evaluation context, invokes the policy engine, maps the result to one of PERMIT, DENY, or SILENCE, and emits a signed receipt.

Responsibilities

- Validate and canonicalize the authorization request
- Assemble the evaluation context and pin the policy version
- Invoke the policy engine and map its output to PERMIT, DENY, or SILENCE
- Sign and emit a receipt covering every decision

Explicit non-responsibilities

- Implementing policy logic itself
- Dispatching tool calls or handling their results

Common failure mode

Gates that emit a decision without a receipt, or with a receipt that does not cover the full request, create evidence gaps. Every decision must have a complete receipt. "Best-effort receipt issuance" is not a mode; it is a bug.
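A sketch of receipt issuance that covers the full request, assuming a hypothetical `issue_receipt` helper. HMAC over a shared key stands in for the asymmetric signature a real gate would use; the field names are illustrative. The key point is that the hash is computed over a canonical encoding of the entire request, so no part of it falls outside the receipt.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in for the gate's private signing key


def issue_receipt(request: dict, outcome: str, policy_version: str) -> dict:
    # Canonicalize the full request so the receipt covers every field;
    # a receipt over a partial request is exactly the evidence gap above.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    content_hash = hashlib.sha256(canonical.encode()).hexdigest()
    body = {
        "outcome": outcome,
        "policy_version": policy_version,
        "request_hash": content_hash,
    }
    payload = json.dumps(body, sort_keys=True).encode()
    body["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return body
```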

Policy engine

The policy engine is the evaluator. It takes a structured input and a policy bundle and returns a decision. It is a computation, not a system. OPA, Cedar, and AWS IAM policy evaluators are all examples.

Responsibilities

- Evaluate a structured input against a policy bundle, deterministically
- Return a decision with the matching rule identifier

Explicit non-responsibilities

- Network I/O, context assembly, or receipt issuance
- Defining the external decision contract

Common failure mode

Using the engine's native output semantics directly in external systems creates coupling between the engine's conventions and the broader authorization contract. Always map the engine's output to the system's decision enum at the gate layer, so you can swap engines without changing the contract.
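The mapping layer can be a single function at the gate boundary. A sketch in Python, assuming an OPA-style response shape of `{"result": {"allow": ...}}` (the `allow` field is a common convention, not a guarantee; adjust for your engine):

```python
from enum import Enum


class Outcome(Enum):
    PERMIT = "PERMIT"
    DENY = "DENY"
    SILENCE = "SILENCE"


def map_opa_result(result: dict) -> Outcome:
    # Engine-specific conventions stay behind this one function, so
    # swapping engines changes the mapping, never the external contract.
    allow = result.get("result", {}).get("allow")
    if allow is True:
        return Outcome.PERMIT
    if allow is False:
        return Outcome.DENY
    # Undefined or missing result: say nothing rather than guess.
    return Outcome.SILENCE
```

Note the third branch: an engine result that is absent or malformed maps to SILENCE, not to a permissive default.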

Receipt store

The receipt store is the append-only record. Every signed receipt the gate produces is appended. Receipts are retrievable by request ID, by actor, by surface, by time range. The store is the single source of truth for what decisions were made.

Responsibilities

- Append every signed receipt the gate produces
- Serve retrieval by request ID, actor, surface, and time range

Explicit non-responsibilities

- Updating or deleting receipts in place
- Making or verifying decisions

Common failure mode

Using a mutable database for receipts. The moment a receipt can be updated in place, the evidence chain is broken. The store must be append-only, or must be backed by a content-addressed immutable backend.
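One way to get the append-only property by construction is content addressing: the receipt's ID is the hash of its bytes, so "updating" a receipt can only produce a new ID. A minimal in-memory sketch (a real store would back this with an immutable object store or log):

```python
import hashlib
import json


class ReceiptStore:
    """Append-only, content-addressed receipt store. A receipt's ID is
    the SHA-256 of its canonical bytes, so an in-place update is
    impossible by construction: changed content means a changed ID."""

    def __init__(self):
        self._by_id = {}   # receipt ID -> canonical bytes
        self._log = []     # append order, for time-range queries

    def append(self, receipt: dict) -> str:
        blob = json.dumps(receipt, sort_keys=True).encode()
        receipt_id = hashlib.sha256(blob).hexdigest()
        if receipt_id not in self._by_id:
            self._by_id[receipt_id] = blob
            self._log.append(receipt_id)
        return receipt_id

    def get(self, receipt_id: str) -> dict:
        return json.loads(self._by_id[receipt_id])
```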

Verification surface

The verifier is an independent service or library that takes a receipt and validates it. It checks the signature against the authority's public key, confirms the content hash matches the receipt body, and optionally checks that the receipt's policy version reference corresponds to an archived bundle.

Responsibilities

- Check the receipt signature against the authority's public key
- Confirm the content hash matches the receipt body
- Optionally confirm the policy version reference maps to an archived bundle

Explicit non-responsibilities

- Participating in the decision path
- Sharing a deployable or trust boundary with the gate

Common failure mode

Building the verifier into the gate as a shared library. This works, but it defeats the independence property - the thing verifying the decisions is the thing that made them. The verifier should be a separate deployable, or at least a separately-trusted library that downstream parties can run independently. See verify for the TrigGuard verifier surface.
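The verifier's core check fits in a few lines. A sketch in Python, again using HMAC with a shared key as a stand-in for the public-key signature check described above; the structure (strip the signature, recompute over the body, constant-time compare) is the same either way:

```python
import hashlib
import hmac
import json


def verify_receipt(receipt: dict, authority_key: bytes) -> bool:
    """Validate a receipt independently of the gate that issued it:
    recompute the signature over the receipt body and compare."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(authority_key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(expected, receipt.get("signature", ""))
```

Because this depends only on the receipt bytes and the authority's key material, any downstream party can run it without trusting the gate's process.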

Deployment shapes

Two deployment shapes account for essentially all production AI agent authorization topologies.

Sidecar gate

Each agent process is colocated with an authorization sidecar. The broker calls the sidecar locally over gRPC or HTTP. The sidecar embeds the policy engine and a cached policy bundle. The sidecar streams receipts to the receipt store asynchronously.

Latency: low single-digit milliseconds per decision.

Failure isolation: a single agent's sidecar failure affects only that agent.

Suited for: high-throughput workloads, agents deployed across many processes or nodes, service mesh users familiar with the sidecar pattern.

Central decision service

A dedicated authorization service handles decisions for many agents. Agents call the service over the network. The service embeds the policy engine and writes receipts to the store synchronously or via a tail-writer.

Latency: tens of milliseconds per decision.

Failure isolation: the service is a shared dependency; an outage affects all agents.

Suited for: cross-agent coordination (global rate limits, cluster-wide risk signals), smaller-scale deployments where sidecar overhead is not justified, organizations that prefer centralized control planes.

Both shapes use the same six components and enforce the same decision contract. The choice is about locality, availability, and cross-agent visibility, not about the architecture.

End-to-end trace of a single decision

Putting the pieces together, here is what happens when an agent emits a tool call:

1. Planner emits intent: { "tool": "payments.transfer", "args": { ... } }
2. Broker validates the intent against the registered tool schema
3. Broker constructs an authorization request: { actor, surface, action, target, context }
4. Broker submits the request to the gate (sidecar or central)
5. Gate pins policy version v82
6. Gate invokes policy engine with request + bundle v82
7. Engine returns "permit" with rule identifier
8. Gate wraps in { outcome: "PERMIT", policy_version: "v82", rule: "payments.transfer.self_service", receipt: { ... signed } }
9. Gate appends receipt to receipt store
10. Gate returns decision to broker
11. Broker dispatches to the tool implementation
12. Tool result flows back through broker to planner
13. Planner emits next intent or terminates

Thirteen steps, each owned by one component, each with a crisp responsibility. That is what a clean architecture looks like on the actuation path.
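The trace above can be condensed into a runnable sketch. Everything here is illustrative: the toy policy, the field names, and the HMAC signature are stand-ins for a real engine, contract, and signing key. Comments mark which numbered steps each piece corresponds to.

```python
import hashlib
import hmac
import json

KEY = b"demo-key"          # stand-in for the gate's signing key
POLICY_VERSION = "v82"     # step 5: pinned policy version
RECEIPTS = []              # stand-in for the receipt store
TOOLS = {                  # tool implementations, reachable only via the broker
    "payments.transfer": lambda args: f"transferred {args['amount']}",
}


def engine(request):
    # Steps 6-7: toy policy; permit transfers only to the actor's own account.
    if request["action"] == "transfer" and request["context"].get("own_account"):
        return "permit", "payments.transfer.self_service"
    return "deny", "default"


def gate(request):
    verdict, rule = engine(request)
    outcome = "PERMIT" if verdict == "permit" else "DENY"
    body = {                                       # step 8: wrap and sign
        "outcome": outcome,
        "policy_version": POLICY_VERSION,
        "rule": rule,
        "request_hash": hashlib.sha256(
            json.dumps(request, sort_keys=True).encode()).hexdigest(),
    }
    body["signature"] = hmac.new(
        KEY, json.dumps(body, sort_keys=True).encode(), hashlib.sha256
    ).hexdigest()
    RECEIPTS.append(body)                          # step 9: append receipt
    return body                                    # step 10: return decision


def broker(intent):
    # Steps 2-3: validate the intent and build the authorization request.
    request = {
        "actor": "user-1",
        "surface": intent["tool"],
        "action": "transfer",
        "target": intent["args"]["to"],
        "context": {"own_account": intent["args"]["to"] == "user-1"},
    }
    decision = gate(request)                       # step 4: submit to gate
    if decision["outcome"] != "PERMIT":
        return decision, None                      # denial back to planner
    return decision, TOOLS[intent["tool"]](intent["args"])  # steps 11-12
```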

Failure modes across the architecture

The clean mapping above makes failure analysis tractable. For each component, the question is: if this component fails or is compromised, what is the blast radius?

- Planner compromised: it can emit malicious intents, but every intent still passes through the gate; the blast radius is whatever policy permits.
- Broker down: no calls are dispatched; actions halt.
- Gate unreachable or erroring: the broker returns denials; actions halt.
- Policy engine failure: the gate maps the failure to a denial; actions halt.
- Receipt store unavailable: the gate stops issuing decisions rather than issuing unreceipted ones; actions halt.
- Verifier down: verification is offline and out-of-band; the actuation path is unaffected.

Note the pattern: every failure mode halts actions or has no effect on the control. There is no failure mode that opens actions. That is the architectural property. It is not free - it is achieved by the fail-closed defaults at every component boundary - but it is the property that justifies the design.

Frequently asked questions

Is this overbuilt for a small team?

For an early product with reversible actions and low stakes, yes, this is too much architecture. The components collapse: planner and broker can be the same library, gate and policy engine can be a single process, receipt store can be a file, verifier can be a CLI. The components' responsibilities are still there, just fused. The architecture becomes relevant as soon as you have irreversible actions in the loop.

Can existing service mesh tooling serve as the gate?

Partially. Istio, Linkerd, and similar meshes can enforce request-level policies, and for internal service-to-service calls they are a reasonable starting point. They are not sufficient for the full agent actuation model, because (a) their policy semantics do not natively support PERMIT/DENY/SILENCE, (b) they do not produce signed receipts, and (c) many agent actions go to surfaces the mesh does not front.

What if we add a new tool?

New tools become new surfaces. The surface is added to the registry, a policy is written for it, receipts for the surface are retained alongside the rest. The architecture does not need to change; the artifacts (registry, policy bundle) do.
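As a sketch of "the artifacts change, not the architecture": registering a new surface touches the registry and the policy bundle and nothing else. The structures and names here are hypothetical, chosen only to make the point concrete.

```python
# Adding a tool changes artifacts, not architecture: the registry gains a
# surface entry and the policy bundle gains a rule for it. No component
# code changes.
REGISTRY = {
    "payments.transfer": {"schema": {"to": str, "amount": int}},
}
POLICY = {
    "payments.transfer": lambda ctx: ctx.get("own_account", False),
}


def register_surface(name, schema, rule):
    REGISTRY[name] = {"schema": schema}
    POLICY[name] = rule


# A new tool becomes a new surface with its own policy rule.
register_surface(
    "email.send",
    {"to": str, "body": str},
    lambda ctx: ctx.get("internal_recipient", False),
)
```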

Related reading

For the conceptual foundation see runtime authorization for AI agents. For the deployment-shape detail see architecture. For the decision contract see pre-execution authorization and deterministic authorization.

Next step

Compare your runtime security decomposition against a durable reference architecture with engineering.
