
AI agent safety

Agents compress the gap between recommendation and execution. Safety is whether dangerous or irreversible effects are blocked at the execution boundary, not whether the model sounds polite.

When recommendations become actions

Recommendation systems score and rank. Agentic systems issue actions: trades, deploys, tickets, exports, privilege changes. The execution layer introduces irreversible risk because external systems treat a successful call as commitment. That is why AI execution governance centers on authorization before commit, not on nicer wording in the model output.
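The authorization-before-commit pattern can be sketched in a few lines. This is an illustrative toy, not TrigGuard's API: the names `Decision`, `authorize`, and `execute_transfer`, and the single-limit policy, are all assumptions for the example.

```python
# Hypothetical sketch of authorization before commit.
# Decision, authorize, and the limit policy are illustrative, not a real API.
from enum import Enum

class Decision(Enum):
    PERMIT = "PERMIT"
    DENY = "DENY"
    SILENCE = "SILENCE"

def authorize(action: str, amount: float, limit: float = 1000.0) -> Decision:
    """Toy policy: permit small transfers, deny anything over the limit."""
    if action != "transfer":
        return Decision.SILENCE  # no applicable policy: withhold execution
    return Decision.PERMIT if amount <= limit else Decision.DENY

def execute_transfer(amount: float) -> str:
    decision = authorize("transfer", amount)
    if decision is not Decision.PERMIT:
        return f"blocked: {decision.value}"  # the call never reaches the external API
    return "committed"  # only a PERMIT lets the irreversible call proceed

print(execute_transfer(250.0))   # committed
print(execute_transfer(9000.0))  # blocked: DENY
```

The point is structural: the external system only ever sees requests that passed the gate, so "nicer wording in the model output" never becomes a commitment.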

Failure modes in production agent systems

Concrete patterns where speed meets consequence:

  • trading or treasury agents placing orders or transfers;
  • CI/CD bots merging, deploying, or rotating credentials;
  • support copilots mutating accounts, refunds, or entitlements;
  • infrastructure agents changing firewalls, DNS, scaling, or data stores.

In each case, the failure is not bad text; it is an unauthorized execution. See pre-execution authorization for the gate model.

Why monitoring alone is not execution safety

Dashboards, logs, and anomaly detectors tell you what happened. They do not, by themselves, withhold a commit at the moment a tool issues an irreversible call. Execution safety requires a fail-closed authorization step before the execution surface accepts work, not only post-hoc review. Read fail-closed AI systems for default posture.

Why sandboxing and rate limits are not enough

Sandboxes, throttles, and prompt guardrails reduce blast radius and slow abuse. They do not replace a deterministic PERMIT / DENY / SILENCE decision bound to the exact request context at commit time. Policy prompts drift; sandboxes leak when production credentials are reachable. Mitigation without an execution gate still allows almost-safe paths to become incidents.
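Determinism bound to request context can be illustrated by hashing a canonical form of the request and deriving the decision from the same inputs every time. The field names and threshold below are assumptions, not a real wire format.

```python
# Illustrative only: a deterministic decision bound to the exact request
# context via a content hash. Field names and the policy are assumptions.
import hashlib
import json

def request_hash(frame: dict) -> str:
    # Canonical serialization: key order and separators are fixed so the
    # same request always hashes to the same value.
    canonical = json.dumps(frame, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def evaluate(frame: dict) -> tuple[str, str]:
    h = request_hash(frame)
    decision = "PERMIT" if frame.get("amount", 0) <= 500 else "DENY"
    return decision, h  # the decision is bound to this exact context

frame = {"intent": "refund", "amount": 120, "actor": "support-bot"}
assert evaluate(frame) == evaluate(frame)  # same inputs, same decision and hash
```

Contrast this with a policy prompt: rewording the same request can change a model's answer, but it cannot change this function's output without changing the hashed context.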

The agent execution pipeline

Stages on the hot path: signal frame (declared intent and state), policy evaluation, explicit PERMIT / DENY / SILENCE, optional execution surface on PERMIT only, then a verifiable receipt. The figure below shows the full agent-oriented stack for this spoke; the pillar hub keeps a lighter path diagram.
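The hot path above can be sketched end to end. `TGSignalFrame` is named in the figure; the evaluation policy, the `run` wrapper, and the receipt shape are illustrative assumptions.

```python
# Sketch of the hot path: frame -> evaluate -> decide -> execute on PERMIT
# -> receipt material. Only TGSignalFrame is a documented name; the rest
# is an assumed shape for illustration.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class TGSignalFrame:
    intent: str   # declared intent
    state: dict   # relevant state at request time

def evaluate(frame: TGSignalFrame) -> str:
    # Deterministic toy policy: same inputs always yield the same decision.
    if frame.intent not in {"deploy", "refund"}:
        return "SILENCE"                    # out of scope: withhold execution
    return "PERMIT" if frame.state.get("approved") else "DENY"

def run(frame: TGSignalFrame, surface: Callable[[TGSignalFrame], str]) -> dict:
    decision = evaluate(frame)
    result: Optional[str] = surface(frame) if decision == "PERMIT" else None
    return {"decision": decision, "result": result}  # inputs to the receipt

receipt = run(TGSignalFrame("deploy", {"approved": True}), lambda f: "deployed")
assert receipt == {"decision": "PERMIT", "result": "deployed"}
```

Note that the execution surface is only invoked inside the PERMIT branch; DENY and SILENCE both resolve to non-execution, matching the stack in the figure.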

[Figure: Execution governance stack, agent to execution surface. Flow: autonomous agent / tool caller → signal frame (TGSignalFrame: declared intent + state) → TrigGuard evaluation (deterministic policy: same inputs → same decision) → PERMIT / DENY / SILENCE → execution surface on PERMIT only → signed decision receipt (audit, verify).]
Figure: deterministic outcomes before irreversible effects. The pillar page uses a simpler path diagram; this is the full agent-oriented stack.

Irreversible execution surfaces

Classes of actions that need deterministic authorization include payments and treasury movements, production deployments, regulated data exports, and identity or permission changes. These are execution surfaces: once called, downstream systems assume commitment. Map your surfaces, then bind policy enforcement at the choke points.
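Binding enforcement at a choke point can look like a wrapper that irreversible calls must pass through. The decorator, the `policy` callable, and `export_dataset` are hypothetical names for the sketch.

```python
# Hypothetical choke-point binding: the irreversible call can only run
# behind an authorization check. All names here are illustrative.
from functools import wraps

def gated(policy):
    """Wrap an execution surface so a policy check runs before every call."""
    def wrap(fn):
        @wraps(fn)
        def inner(*args, **kwargs):
            if not policy(*args, **kwargs):
                # Deny before the surface is touched; downstream systems
                # never see the request at all.
                raise PermissionError(f"DENY: {fn.__name__}")
            return fn(*args, **kwargs)
        return inner
    return wrap

@gated(policy=lambda dataset, **_: dataset != "pii")
def export_dataset(dataset: str) -> str:
    return f"exported {dataset}"   # stand-in for the irreversible export

print(export_dataset("metrics"))   # exported metrics
# export_dataset("pii") raises PermissionError before the export starts
```

The design choice is that the surface itself is unreachable except through the gate, which is what "bind policy enforcement at the choke points" means in practice.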

Fail-closed control for autonomous systems

Autonomous systems run without a human in the loop for every step. The safe default is not to execute when evaluation is incomplete, ambiguous, or unsafe. That posture is fail-closed execution governance, not "fail until someone notices." See fail-closed AI systems for semantics and operating defaults.
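A minimal fail-closed sketch, under assumed names: any missing input or evaluator error resolves to a non-execution outcome, never to PERMIT.

```python
# Fail-closed sketch: incomplete input or an evaluator failure never
# resolves to PERMIT. Names and the toy policy are illustrative.
from typing import Optional

def evaluate_fail_closed(frame: Optional[dict]) -> str:
    try:
        if frame is None or "intent" not in frame:
            return "SILENCE"        # incomplete evaluation: do not execute
        if frame["intent"] == "rotate-credentials" and frame.get("ticket"):
            return "PERMIT"
        return "DENY"
    except Exception:
        return "DENY"               # an evaluator crash is never a PERMIT

assert evaluate_fail_closed(None) == "SILENCE"
assert evaluate_fail_closed({"intent": "rotate-credentials",
                             "ticket": "OPS-1"}) == "PERMIT"
```

The invariant worth testing in any real implementation is the same one shown here: every error path terminates in DENY or SILENCE.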

Verifiable decisions and receipts

Execution decisions must be tamper-evident: auditors, counterparties, and internal risk teams need to prove what was authorized, under which policy version, for which request hash. TrigGuard issues signed receipts consumable by Verify; the wire format and verification rules live in the protocol overview and receipt sections.
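Tamper evidence can be illustrated with an HMAC over the receipt body. This is a toy, not TrigGuard's wire format; the real format and verification rules live in the protocol overview and receipt sections, and the key handling here is deliberately simplistic.

```python
# Illustrative tamper-evident receipt using an HMAC over a canonical body.
# Not TrigGuard's wire format; the key and field names are assumptions.
import hashlib
import hmac
import json

KEY = b"demo-key"  # stand-in for a real signing key with proper management

def issue_receipt(decision: str, policy_version: str, request_hash: str) -> dict:
    body = {"decision": decision, "policy": policy_version,
            "request": request_hash}
    payload = json.dumps(body, sort_keys=True).encode()
    body["sig"] = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return body

def verify_receipt(receipt: dict) -> bool:
    body = {k: v for k, v in receipt.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(receipt["sig"], expected)

r = issue_receipt("PERMIT", "v3", "abc123")
assert verify_receipt(r)
r["decision"] = "DENY"       # any tampering breaks verification
assert not verify_receipt(r)
```

The receipt binds exactly the three facts the paragraph names: what was authorized, under which policy version, for which request hash.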

Where TrigGuard sits in the agent stack

Conceptual stack: application and UX, agent orchestration (planners, tool routers), the execution governance layer (TrigGuard evaluation and receipts), then execution surfaces (payments, cloud control planes, data planes). TrigGuard is not another model; it is the control plane in front of irreversible APIs.

Category pillar

Return to the cluster hub: AI execution governance.