AI Agents · Production Engineering · Reliability · Observability · Best Practices

Why AI Agents Fail in Production: 6 Patterns Every Engineer Must Know

Six concrete failure patterns that bring AI agents down in production, with working code examples showing how to detect, prevent, and recover from each one.

iBuidl Research · 2026-03-16 · 13 min read
TL;DR
  • Agents fail differently from APIs: The failure modes for agentic systems are fundamentally different from stateless API calls — loops, context corruption, and missing rollbacks create cascading failures that are hard to debug without observability.
  • The 6 patterns: Context pollution, tool call infinite loops, hallucinated function signatures, missing rollback mechanisms, no observability, and over-automation of high-stakes decisions.
  • Each has a fix: These are engineering problems with engineering solutions — circuit breakers, idempotency keys, structured logging, and human-in-the-loop gates all have clear implementation patterns.
  • Bottom line: Build agents like distributed systems engineers build services — defensive, observable, and with explicit failure budgets. The AI part is just another component that can misbehave.

Why Agent Failures Are Different

When a REST API fails, it fails fast and loudly. An HTTP 500 response, a stack trace, a clear timestamp. You know something went wrong, and you know when.

AI agents fail slowly, expensively, and quietly. A context that drifts over 40 turns. A tool that gets called 12 times in a loop before the bill arrives. A rollback that never happened because no one wrote the undo path. A decision that should have had a human review it, that didn't.

The teams hitting these problems in 2026 aren't inexperienced — they're experienced engineers who correctly understood how to build stateless services, but underestimated how different stateful, multi-step AI systems are. These are the six patterns killing agent deployments right now.


Failure Pattern 1 — Context Pollution

What it looks like

An agent handles customer support tickets. After 30 turns, it starts giving advice about a completely different customer's account. Or it answers question 47 using terminology from question 3 that no longer applies. The agent's outputs drift from correct to plausible-but-wrong over the course of a long session.

Why it happens

LLMs are stateless. The "memory" of a conversation is the context window — a raw concatenation of all previous messages. As conversations grow, earlier messages from different contexts, completed tasks, or retracted assumptions continue to influence outputs. The model can't distinguish "this instruction was superseded" from "this instruction is still active."

The fix

Implement context hygiene: summarize and prune on a sliding window, and explicitly mark completed task sections as closed.

from anthropic import Anthropic
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class ManagedContext:
    messages: list[dict] = field(default_factory=list)
    system_prompt: str = ""
    max_messages: int = 20  # hard cap before summarization
    client: Anthropic = field(default_factory=Anthropic)

    def add_message(self, role: Literal["user", "assistant"], content: str):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.max_messages:
            self._summarize_and_prune()

    def _summarize_and_prune(self):
        # Summarize the oldest half of messages
        to_summarize = self.messages[: self.max_messages // 2]
        summary_prompt = (
            "Summarize the following conversation history into a concise paragraph "
            "capturing all decisions made, facts established, and open tasks. "
            "Be precise — this summary will replace the original messages.\n\n"
            + "\n".join(f"{m['role']}: {m['content']}" for m in to_summarize)
        )

        summary_response = self.client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=512,
            messages=[{"role": "user", "content": summary_prompt}],
        )
        summary_text = summary_response.content[0].text

        # Replace old messages with the summary as a system-level note
        summary_message = {
            "role": "user",
            "content": f"[CONTEXT SUMMARY — previous {len(to_summarize)} messages]: {summary_text}",
        }
        self.messages = [summary_message] + self.messages[self.max_messages // 2 :]
        print(f"Context pruned: {len(to_summarize)} messages → 1 summary block")

    def complete(self, user_message: str) -> str:
        self.add_message("user", user_message)
        response = self.client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=self.system_prompt,
            messages=self.messages,
        )
        reply = response.content[0].text
        self.add_message("assistant", reply)
        return reply
Explicit Task Closure

For multi-task agents, append a structured marker when a sub-task completes: [TASK: "order_lookup" STATUS: "COMPLETE" RESULT: "order #4421 shipped 2026-03-14"]. This gives the model a clear signal that prior context for that task is resolved and should not contaminate future tasks.
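A minimal sketch of appending such a closure marker to the message list (the `TaskTracker` helper and its method names are illustrative, not part of any SDK):

```python
from dataclasses import dataclass, field

@dataclass
class TaskTracker:
    """Appends structured closure markers so resolved tasks stop influencing the model."""
    messages: list[dict] = field(default_factory=list)

    def close_task(self, task_name: str, result: str) -> None:
        # A structured, machine-readable marker the model can recognize as final
        marker = f'[TASK: "{task_name}" STATUS: "COMPLETE" RESULT: "{result}"]'
        self.messages.append({"role": "user", "content": marker})

tracker = TaskTracker()
tracker.close_task("order_lookup", "order #4421 shipped 2026-03-14")
print(tracker.messages[-1]["content"])
```

The marker is plain text in the conversation, so it survives summarization and pruning along with everything else.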


Failure Pattern 2 — Tool Call Infinite Loops

What it looks like

Your agent calls search_database, gets no results, calls it again with slightly different parameters, gets no results, calls it again... 200 times later your bill is $47 and the task never completed. Or more subtly: an agent calls write_file, verifies it with read_file, finds a discrepancy (due to its own imprecise verification logic), calls write_file again, ad infinitum.

Why it happens

Agents without loop detection will retry indefinitely when their success condition is never satisfied. The model doesn't have a built-in "I've tried enough" — it follows the instruction to "keep trying until the file is correctly written" literally.

The fix

Circuit breakers at the tool dispatcher level, not at the prompt level. Prompt instructions like "don't loop more than 5 times" are suggestions. Code-level limits are guarantees.

from collections import defaultdict
from functools import wraps
import time

class CircuitBreaker:
    def __init__(self, max_calls_per_tool: int = 5, window_seconds: int = 60):
        self.max_calls = max_calls_per_tool
        self.window = window_seconds
        self.call_log: dict[str, list[float]] = defaultdict(list)

    def check(self, tool_name: str) -> None:
        now = time.time()
        # Evict calls outside the window
        self.call_log[tool_name] = [
            t for t in self.call_log[tool_name] if now - t < self.window
        ]
        if len(self.call_log[tool_name]) >= self.max_calls:
            raise RuntimeError(
                f"Circuit breaker tripped: '{tool_name}' called "
                f"{len(self.call_log[tool_name])} times in {self.window}s. "
                "Stopping agent to prevent runaway execution."
            )
        self.call_log[tool_name].append(now)

breaker = CircuitBreaker(max_calls_per_tool=5, window_seconds=120)

def protected_tool(tool_name: str):
    """Decorator that wraps any tool function with circuit breaker protection."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            breaker.check(tool_name)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@protected_tool("search_database")
def search_database(query: str) -> list[dict]:
    # actual implementation
    ...

@protected_tool("write_file")
def write_file(path: str, content: str) -> bool:
    # actual implementation
    ...

Additionally, implement idempotency tracking — if the same tool is called with identical arguments twice in a row, that's almost always a bug:

class IdempotencyGuard:
    def __init__(self, max_repeated_calls: int = 2):
        self.last_calls: dict[str, tuple] = {}
        self.repeat_counts: dict[str, int] = defaultdict(int)
        self.max_repeats = max_repeated_calls

    def check(self, tool_name: str, args: tuple) -> None:
        key = tool_name
        if self.last_calls.get(key) == args:
            self.repeat_counts[key] += 1
            if self.repeat_counts[key] >= self.max_repeats:
                raise RuntimeError(
                    f"Idempotency violation: '{tool_name}' called with identical "
                    f"arguments {self.max_repeats}+ times in sequence. "
                    f"Args: {args[:2]}..."  # truncate for log safety
                )
        else:
            self.repeat_counts[key] = 0
        self.last_calls[key] = args
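Wiring the guard into a tool dispatcher can look like the sketch below. The `dispatch` function and the kwargs canonicalization are illustrative, and the guard class is repeated so the sketch runs standalone:

```python
from collections import defaultdict

class IdempotencyGuard:
    """Raises when the same tool is called with identical args too many times in a row."""
    def __init__(self, max_repeated_calls: int = 2):
        self.last_calls: dict[str, tuple] = {}
        self.repeat_counts: dict[str, int] = defaultdict(int)
        self.max_repeats = max_repeated_calls

    def check(self, tool_name: str, args: tuple) -> None:
        if self.last_calls.get(tool_name) == args:
            self.repeat_counts[tool_name] += 1
            if self.repeat_counts[tool_name] >= self.max_repeats:
                raise RuntimeError(f"Idempotency violation: '{tool_name}' repeated with identical args")
        else:
            self.repeat_counts[tool_name] = 0
        self.last_calls[tool_name] = args

guard = IdempotencyGuard(max_repeated_calls=2)

def dispatch(tool_name: str, **kwargs):
    # Canonicalize kwargs into a hashable, order-independent tuple before checking
    guard.check(tool_name, tuple(sorted(kwargs.items())))
    print(f"dispatching {tool_name}({kwargs})")

dispatch("search_database", query="refund policy")      # executes
dispatch("search_database", query="refund policy")      # executes (first repeat)
try:
    dispatch("search_database", query="refund policy")  # second repeat trips the guard
except RuntimeError as e:
    print(f"blocked: {e}")
```

Note the canonicalization step: kwargs must be reduced to a stable, hashable form, or semantically identical calls will slip past the comparison.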

Failure Pattern 3 — Hallucinated Function Signatures

What it looks like

Your agent calls create_ticket(title="Bug report", priority="high", assignee_id=4821) but your actual function signature is create_ticket(title: str, severity: Literal["low", "medium", "critical"]). The call fails, the agent invents a different set of parameters, fails again, and either loops or silently no-ops.

Why it happens

Models are trained to call tools based on schema descriptions. If the schema is ambiguous, outdated, or the model hasn't been fine-tuned on your exact API, it will hallucinate parameter names and values with high confidence.

The fix

Strict schema validation at the dispatcher, with structured error feedback to the model — not generic error messages.

import { z } from "zod";

// Define strict schemas for every tool
const CreateTicketSchema = z.object({
  title: z.string().min(1).max(200),
  severity: z.enum(["low", "medium", "critical"]),
  description: z.string().optional(),
});

type CreateTicketInput = z.infer<typeof CreateTicketSchema>;

function dispatchTool(
  toolName: string,
  rawInput: unknown
): { success: true; result: unknown } | { success: false; error: string } {
  if (toolName === "create_ticket") {
    const parsed = CreateTicketSchema.safeParse(rawInput);
    if (!parsed.success) {
      // Return structured error that the model can understand and correct
      const issues = parsed.error.issues
        .map((i) => `  - '${i.path.join(".")}': ${i.message}`)
        .join("\n");
      return {
        success: false,
        error:
          `Schema validation failed for create_ticket:\n${issues}\n` +
          `Valid schema: { title: string, severity: 'low'|'medium'|'critical', description?: string }`,
      };
    }
    const result = createTicket(parsed.data);
    return { success: true, result };
  }
  return { success: false, error: `Unknown tool: ${toolName}` };
}

The key insight: return the valid schema alongside the error. Models can self-correct tool calls when they receive precise, structured feedback — they cannot recover from generic "invalid parameters" messages.
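The correction loop that consumes this feedback is only a few lines. A sketch in Python — `fake_dispatch` and `fake_model_call` are stand-ins for the real dispatcher and model API call, stubbed here so the loop can be shown end to end:

```python
def run_tool_with_retries(dispatch, model_call, tool_name, raw_input, max_attempts=3):
    """Retry a tool call, feeding structured validation errors back to the model."""
    for _ in range(max_attempts):
        outcome = dispatch(tool_name, raw_input)
        if outcome["success"]:
            return outcome["result"]
        # The structured error (including the valid schema) goes back to the model,
        # which proposes corrected arguments for the next attempt
        raw_input = model_call(tool_name, outcome["error"])
    raise RuntimeError(f"'{tool_name}' failed schema validation {max_attempts} times")

# Stubbed demonstration: the first call uses a hallucinated 'priority' field
def fake_dispatch(tool_name, raw_input):
    if "severity" in raw_input:
        return {"success": True, "result": f"ticket created: {raw_input['title']}"}
    return {"success": False,
            "error": "Schema validation failed: unknown field 'priority'. "
                     "Valid schema: { title, severity: 'low'|'medium'|'critical' }"}

def fake_model_call(tool_name, error):
    # A real model reads the structured error and emits corrected arguments
    return {"title": "Bug report", "severity": "critical"}

result = run_tool_with_retries(fake_dispatch, fake_model_call,
                               "create_ticket", {"title": "Bug report", "priority": "high"})
print(result)  # ticket created: Bug report
```

The cap on attempts matters: without it, this loop is Failure Pattern 2 waiting to happen.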


Failure Pattern 4 — Missing Rollback Mechanism

What it looks like

An agent is halfway through a multi-step database migration: 3 of 6 tables updated, step 4 fails. The agent reports an error and stops. Now your database is in a partially-migrated state that no one designed for, and rollback requires manual forensic work.

Why it happens

Developers design the happy path. AI agents feel more "intelligent" than scripts, so teams underestimate how often they'll need to undo agent actions. Every irreversible action taken by an agent needs a corresponding undo action defined at design time.

The fix

Saga pattern: every agent step registers its compensating action before executing.

from dataclasses import dataclass, field
from typing import Any, Callable
import logging

logger = logging.getLogger(__name__)

@dataclass
class AgentSaga:
    """Tracks executed steps and their compensating actions for rollback."""
    completed_steps: list[tuple[str, Callable]] = field(default_factory=list)

    def execute_step(
        self,
        step_name: str,
        action: Callable,
        compensating_action: Callable,
    ) -> Any:
        """Execute an action and register its rollback before running."""
        logger.info(f"Executing step: {step_name}")
        # Register compensation BEFORE executing — so even partial failures are recoverable
        self.completed_steps.append((step_name, compensating_action))
        try:
            result = action()
            logger.info(f"Step completed: {step_name}")
            return result
        except Exception as e:
            logger.error(f"Step failed: {step_name} — {e}")
            self.rollback()
            raise

    def rollback(self):
        """Execute compensating actions in reverse order."""
        logger.warning(f"Rolling back {len(self.completed_steps)} completed steps")
        for step_name, compensate in reversed(self.completed_steps):
            try:
                logger.info(f"Compensating: {step_name}")
                compensate()
            except Exception as e:
                logger.error(f"Compensation failed for '{step_name}': {e} — MANUAL INTERVENTION REQUIRED")
        self.completed_steps.clear()

# Usage
saga = AgentSaga()

saga.execute_step(
    step_name="create_user_record",
    action=lambda: db.insert("users", user_data),
    compensating_action=lambda: db.delete("users", where={"id": user_data["id"]}),
)

saga.execute_step(
    step_name="send_welcome_email",
    action=lambda: email_service.send(welcome_email),
    compensating_action=lambda: email_service.cancel_if_undelivered(welcome_email["id"]),
)
Not Every Action is Reversible

Some actions (sent emails, published posts, wire transfers) cannot be meaningfully compensated. For these, implement confirmation gates before execution rather than rollback after. The saga pattern handles reversible actions; human approval handles irreversible ones.
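One way to combine the two is to route irreversible steps through a confirmation callback before they execute at all, instead of registering a compensation. The function and callback names here are illustrative:

```python
from typing import Callable

def gated_irreversible_step(step_name: str, action: Callable, confirm: Callable[[str], bool]):
    """Irreversible actions get a confirmation gate instead of a compensating action."""
    if not confirm(f"About to execute irreversible step: {step_name}"):
        raise PermissionError(f"Operator declined irreversible step '{step_name}'")
    return action()

# Stubbed demonstration — a real confirm() would prompt a human operator
sent = []
result = gated_irreversible_step(
    "send_welcome_email",
    action=lambda: sent.append("welcome") or "sent",
    confirm=lambda msg: True,  # operator approves
)
print(result)
```

Raising on rejection, rather than returning a sentinel, guarantees the agent loop cannot accidentally treat a declined step as a success.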


Failure Pattern 5 — No Observability

What it looks like

Your agent is running in production. A customer reports incorrect behavior on a ticket from two weeks ago. You look at your logs and find: INFO agent_run complete. That's it. You have no idea what tools were called, what the model was thinking, what context was in the prompt, or why it made the decision it made.

Why it happens

Observability for agents requires logging structured data at every step of the reasoning loop — not just inputs and outputs. Most teams apply their existing service logging patterns to agents, which were designed for stateless request/response flows.

The fix

Structured trace logging at every decision point.

import json
import uuid
from datetime import datetime, timezone
from dataclasses import dataclass, asdict

@dataclass
class AgentTrace:
    trace_id: str
    session_id: str
    step_number: int
    timestamp: str
    event_type: str  # "tool_call" | "model_response" | "error" | "human_gate"
    tool_name: str | None = None
    tool_input: dict | None = None
    tool_output: str | None = None
    model_thinking: str | None = None
    context_token_count: int | None = None
    duration_ms: int | None = None
    metadata: dict | None = None

class ObservableAgent:
    def __init__(self, session_id: str | None = None):
        self.session_id = session_id or str(uuid.uuid4())
        self.trace_id = str(uuid.uuid4())
        self.step = 0
        self.traces: list[AgentTrace] = []

    def log(self, event_type: str, **kwargs) -> AgentTrace:
        self.step += 1
        trace = AgentTrace(
            trace_id=self.trace_id,
            session_id=self.session_id,
            step_number=self.step,
            timestamp=datetime.now(timezone.utc).isoformat(),
            event_type=event_type,
            **kwargs,
        )
        self.traces.append(trace)
        # Emit to your observability platform (Datadog, OpenTelemetry, etc.)
        print(json.dumps(asdict(trace), default=str))
        return trace

    def log_tool_call(self, tool_name: str, input_data: dict, output: str, duration_ms: int):
        self.log(
            "tool_call",
            tool_name=tool_name,
            tool_input=input_data,
            tool_output=output[:2000],  # truncate long outputs
            duration_ms=duration_ms,
        )

    def log_model_response(self, thinking: str, token_count: int):
        self.log(
            "model_response",
            model_thinking=thinking[:1000],
            context_token_count=token_count,
        )

With structured traces, you can reconstruct any agent session deterministically, measure tool call frequency, catch loop patterns by querying event_type = "tool_call" AND tool_name = X GROUP BY trace_id, and alert on sessions exceeding token or step thresholds.
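With traces in hand, loop detection is a small aggregation. A sketch over the trace shape above (dicts rather than dataclasses for brevity; the threshold value is arbitrary):

```python
from collections import Counter

def find_loop_suspects(traces: list[dict], threshold: int = 5) -> list[tuple[str, str]]:
    """Return (trace_id, tool_name) pairs where a tool was called suspiciously often."""
    counts = Counter(
        (t["trace_id"], t["tool_name"])
        for t in traces
        if t["event_type"] == "tool_call"
    )
    return [key for key, n in counts.items() if n >= threshold]

# Stubbed traces: one session hammers search_database six times
traces = (
    [{"trace_id": "abc", "event_type": "tool_call", "tool_name": "search_database"}] * 6
    + [{"trace_id": "abc", "event_type": "tool_call", "tool_name": "read_file"}]
)
print(find_loop_suspects(traces))  # [('abc', 'search_database')]
```

The same aggregation, run as a streaming query in your observability platform, is what turns traces into the real-time alerts discussed below.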


Failure Pattern 6 — Over-Automation of High-Stakes Decisions

What it looks like

An agent is managing customer refunds. It autonomously refunds $50,000 to a customer based on a misinterpreted complaint. Or an agent is managing infrastructure and terminates a production database because it matched a cleanup rule. These are rare — but the impact is disproportionate.

Why it happens

The path of least resistance when building agents is to give them all the tools they need and let them run. Human approval feels like friction. It is friction — but in some cases that friction is the entire point.

The fix

Explicit policy configuration for action risk tiers, with human-in-the-loop gates for high-stakes decisions.

from enum import Enum
from typing import Any, Callable
import logging

logger = logging.getLogger(__name__)

class RiskTier(Enum):
    LOW = "low"        # Auto-execute: read-only, reversible, low blast radius
    MEDIUM = "medium"  # Log + execute: write, reversible, limited blast radius
    HIGH = "high"      # Human approval required: irreversible or large blast radius

TOOL_RISK_TIERS: dict[str, RiskTier] = {
    "search_database": RiskTier.LOW,
    "read_file": RiskTier.LOW,
    "send_notification": RiskTier.MEDIUM,
    "update_record": RiskTier.MEDIUM,
    "issue_refund": RiskTier.HIGH,
    "delete_resource": RiskTier.HIGH,
    "send_bulk_email": RiskTier.HIGH,
    "terminate_instance": RiskTier.HIGH,
}

def execute_with_gate(
    tool_name: str,
    tool_fn: Callable,
    args: dict,
    human_approval_fn: Callable[[str, dict], bool],  # implement per your platform
) -> Any:
    tier = TOOL_RISK_TIERS.get(tool_name, RiskTier.HIGH)  # default to HIGH if unknown

    if tier == RiskTier.LOW:
        return tool_fn(**args)

    elif tier == RiskTier.MEDIUM:
        logger.info(f"MEDIUM risk action: {tool_name} with args {args}")
        return tool_fn(**args)

    elif tier == RiskTier.HIGH:
        approved = human_approval_fn(
            f"Agent requests HIGH-RISK action: {tool_name}",
            args,
        )
        if not approved:
            raise PermissionError(
                f"Human operator rejected execution of '{tool_name}'. "
                "Agent should inform the user and stop."
            )
        logger.warning(f"HIGH risk action APPROVED: {tool_name} with args {args}")
        return tool_fn(**args)

The pattern is simple: unknown tools default to HIGH risk tier. You must explicitly whitelist tools as low or medium risk. This inverts the default from "everything is allowed until blocked" to "everything is blocked until explicitly permitted."


Practical Takeaways

Building reliable agents in 2026 is a systems engineering problem, not a prompt engineering problem. The six patterns above share a common thread: they all arise from treating agent actions as trustworthy, reversible, and observable by default, when they are none of those things.

The defense posture that works:

  1. Context: sliding-window summarization + explicit task closure markers
  2. Loops: circuit breakers in code, not in prompts
  3. Tool calls: zod/pydantic validation with structured error feedback
  4. Rollbacks: saga pattern — compensating actions registered before execution
  5. Observability: structured traces per step, not per session
  6. High-stakes actions: explicit risk tiers with human gates, default-deny for unknowns

None of these are novel ideas — distributed systems engineers have applied them for 20 years. The novelty is applying them to an agent loop where the "coordinator" is a probabilistic language model rather than deterministic code. That difference makes the failures more subtle and the fixes more critical, not less so.

The One Metric That Matters

Track mean time to detect (MTTD) for agent failures in production. If your MTTD is measured in hours or days, you don't have observability — you have logs. Structured traces with real-time alerting should get your MTTD below 5 minutes for any of the six patterns above.
