9 min read · Ruakiel Team

HOW TO CONTAIN AI AGENT FAILURES IN PRODUCTION

AI agents fail differently from deterministic software. A malicious tool response, a poisoned retrieval result, or an adversarial prompt can cascade silently. Here is how Ruakiel bounds the blast radius before the failure occurs.

Agent Safety · Containment · Production AI

WHEN AGENTS FAIL

AI agents fail differently from deterministic software. A traditional service that encounters invalid input returns an error. An agent that encounters a malicious tool response, a poisoned retrieval result, or an adversarial user prompt may continue executing — taking increasingly damaging actions while appearing to operate normally.

The question is not whether your agents will fail. It is how much damage they can cause before the failure is detected and stopped.

THREE FAILURE MODES WORTH TAKING SERIOUSLY

Most production agent failures fall into three categories:

  • Model hallucination. The model constructs a tool call that is syntactically valid but semantically wrong — querying the wrong record, writing to the wrong document, or sending a message to the wrong recipient. The system executes it without complaint because the call itself is well-formed.
  • Prompt injection. A user message, a tool result, or a retrieved document contains adversarial instructions. The model treats those instructions as authoritative and executes actions the user never intended. When the instructions arrive via a tool response or a retrieved document, the injection is indirect: the user never wrote it.
  • Unconstrained execution. An agent with access to write and destructive operations can be manipulated into using them in contexts that should be strictly read-only. Without operation classification enforced at the infrastructure layer, the agent does not know it is crossing a severity threshold — and neither does your audit log.

All three share a common solution: architectural containment that operates independently of the model’s reasoning. The model cannot be reasoned with. The architecture can be enforced.

THE CONTAINMENT ARCHITECTURE

Containment requires that the blast radius of any failure is bounded before the failure occurs — not detected afterward. Ruakiel enforces this at four levels:

┌─────────────────────────────────────────────┐
│ 1. Operation Tier Classification            │
│    READ_ONLY < WRITE < DESTRUCTIVE          │
│    Context type determines available tier   │
├─────────────────────────────────────────────┤
│ 2. Per-Objective Checkpointing              │
│    Failure replans — it does not cascade    │
├─────────────────────────────────────────────┤
│ 3. Human-in-the-Loop Gates                  │
│    High-impact operations require approval  │
├─────────────────────────────────────────────┤
│ 4. Per-Tenant Wall-Clock Timeout            │
│    Unbounded runs are structurally blocked  │
└─────────────────────────────────────────────┘

Tier classification is the first gate. Tools are classified as read-only, write, or destructive. When an agent is operating in a read-only context — handling a lookup, a summary, or a direct answer — only read-only tools exist from the model’s perspective. There is no write tool to be manipulated into using. The destructive tier is not restricted; it is absent.
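The filtering above can be sketched in a few lines. This is a minimal illustration, not Ruakiel's actual implementation: the `Tier` ordering, the `TOOLS` registry, and `tools_for_context` are all hypothetical names chosen for the example.

```python
from enum import IntEnum

class Tier(IntEnum):
    """Ordered severity: READ_ONLY < WRITE < DESTRUCTIVE."""
    READ_ONLY = 0
    WRITE = 1
    DESTRUCTIVE = 2

# Hypothetical tool registry mapping each tool to its classified tier.
TOOLS = {
    "search_records": Tier.READ_ONLY,
    "update_document": Tier.WRITE,
    "delete_record": Tier.DESTRUCTIVE,
}

def tools_for_context(max_tier: Tier) -> list[str]:
    # Only tools at or below the context's tier are exposed to the model.
    # In a read-only context, write and destructive tools simply do not exist.
    return [name for name, tier in TOOLS.items() if tier <= max_tier]
```

The key design point is that the filter runs before the model ever sees a tool list, so there is nothing for an adversarial prompt to target.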

Per-objective checkpointing means that when one step in a multi-step plan fails, the failure is scoped to that objective. A replanning step evaluates the failure and updates the remaining plan without re-executing destructive operations that already completed. The execution history is preserved, and results marked critical are retained even when older context is compressed to manage the model’s context window.
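The scoping described above can be sketched as a loop that never re-executes completed work. This is an illustrative simplification, assuming hypothetical `execute` and `replan` callables rather than any specific Ruakiel API:

```python
def run_plan(objectives, execute, replan):
    """Execute objectives one at a time; a failure replans, it does not cascade."""
    completed = []
    remaining = list(objectives)
    while remaining:
        obj = remaining.pop(0)
        try:
            completed.append((obj, execute(obj)))
        except Exception as err:
            # Failure is scoped to this objective: completed results are
            # preserved, and the planner revises only the remaining steps.
            remaining = replan(obj, err, completed, remaining)
    return completed
```

Because `completed` is never passed back through `execute`, a destructive operation that already ran cannot be run a second time during replanning.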

Human-in-the-loop gates are not an optional overlay — they are a structural step in the orchestration pipeline. An operation that exceeds a configured severity threshold is paused until explicit human confirmation arrives. Without approval, the plan terminates cleanly. The agent cannot proceed around the gate.
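As a structural step, the gate is just a function the pipeline must pass through, with no code path around it. A minimal sketch, with `gate`, `ApprovalRequired`, and the `approve` callback all hypothetical names for illustration:

```python
class ApprovalRequired(Exception):
    """Raised when a gated operation lacks human confirmation."""

def gate(operation: str, severity: int, threshold: int, approve) -> str:
    # Structural pipeline step: operations above the configured severity
    # threshold pause until explicit human confirmation arrives.
    if severity >= threshold and not approve(operation):
        # No approval: the plan terminates cleanly rather than proceeding.
        raise ApprovalRequired(operation)
    return operation
```

The agent cannot "decide" to skip this check, because the decision lives in the orchestration layer, not in the model's reasoning.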

Wall-clock timeouts enforce a per-tenant execution budget on every orchestration cycle. A run that has not completed within the configured limit is terminated, the elapsed time is logged against the budget, and a 504 is returned. An LLM loop cannot consume unbounded compute at a tenant’s expense — or at any other tenant’s expense.
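A per-tenant budget of this kind reduces to checking a monotonic clock between steps. The sketch below is an assumption about the shape of the mechanism, not Ruakiel's code; `run_with_budget`, `charge`, and the mapping of `BudgetExceeded` to a 504 are all hypothetical:

```python
import time

class BudgetExceeded(Exception):
    """Terminates the run; assumed to map to a 504 at the API boundary."""

def run_with_budget(step_fn, steps, budget_s, charge):
    """Run steps under a wall-clock budget, always logging elapsed time."""
    start = time.monotonic()
    for step in steps:
        elapsed = time.monotonic() - start
        if elapsed > budget_s:
            charge(elapsed)  # log elapsed time against the tenant's budget
            raise BudgetExceeded()
        step_fn(step)
    charge(time.monotonic() - start)
```

Checking between steps (rather than preempting mid-step) keeps the state at termination consistent with the checkpointing guarantees above.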

WHY IMMUTABLE STATE MATTERS

Containment depends on reasoning about what happened and when. If orchestration state is mutable — if nodes can overwrite each other’s results, or if state is shared by reference across graph boundaries — you lose the ability to reconstruct the execution sequence reliably after a failure.

Ruakiel’s orchestration state is a frozen model. Every node receives the current state and returns an updated copy. No node mutates state in place. Checkpoints are always consistent, and failures can be diagnosed from the state at the moment of failure — not from a partially mutated object somewhere downstream.
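The copy-on-update pattern is easy to see with a frozen dataclass. A minimal sketch of the idea, with `OrchestrationState` and `node` as hypothetical stand-ins for the real model:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class OrchestrationState:
    step: int
    results: tuple  # tuples, not lists, so nested state is immutable too

def node(state: OrchestrationState, result: str) -> OrchestrationState:
    # Every node receives the current state and returns an updated copy.
    # The input state is never mutated, so any checkpoint of it stays valid.
    return replace(state, step=state.step + 1,
                   results=state.results + (result,))
```

Attempting to assign to a field of a frozen instance raises `dataclasses.FrozenInstanceError`, so in-place mutation fails loudly instead of silently corrupting a checkpoint.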

Production AI safety is not a property of the model. It is a property of the architecture surrounding the model. The model will do what the model does. The architecture determines whether that matters.