Mukunda Rao Katta

Bounding the Blast Radius of Autonomous AI Agents: A Three-Layer Runtime Safety Framework

Abstract:

Autonomous AI agents are increasingly given authority to act in the real world: calling APIs, spending money, sending messages, and triggering workflows whose effects cannot be reversed. A wrong answer from a chatbot inconveniences a user; a wrong action from an agent charges a card, leaks data, or sends an unintended message. Most popular agent frameworks ship with optimistic defaults: no enforced budget, no allowlist of reachable domains, no schema validation of tool arguments. A growing class of agent failures observed in deployments consists not of model errors but of containment errors. This work asks what minimum runtime layer is required to bound an autonomous agent’s real-world impact under realistic threat conditions, including misbehaving models, prompt-injected inputs, and adversarial tool responses, without crippling agent utility. The proposed answer is a three-layer runtime safety harness placed between the agent and its environment. A budget layer enforces hard ceilings on monetary spend and per-session tool-call counts. An egress layer restricts the external hosts the agent can reach. A validation layer checks tool arguments against a declared schema before execution and, on failure, returns a structured correction to the model. Every action is written to an append-only audit log, with arguments hashed by default to preserve reviewability without leaking sensitive payloads. The framework is released as MIT-licensed open-source libraries on GitHub, npm, and crates.io, with a Python composition integrating all three layers and standalone TypeScript and Rust implementations covering the egress and validation layers.
In demonstration agents, the harness contained runaway tool-call loops, blocked egress to disallowed hosts, and converted malformed tool calls into structured retries rather than silent execution errors. These results support the claim that a large fraction of practical agent failures are containment failures, solvable at the runtime layer with a small number of well-placed guards.
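
As a rough illustration of how the three layers might compose, the following Python sketch places a budget check, an egress check, and an argument check in front of every tool call and records each outcome in a hashed audit log. It is not the API of the released libraries: the names Harness, Blocked, and call, the flat {argument: type} schema format, and the choice to count every attempt (including rejected ones) against the call ceiling are all assumptions made for this example.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Optional
from urllib.parse import urlparse


class Blocked(Exception):
    """Raised when the budget or egress layer refuses an action (hypothetical name)."""


@dataclass
class Harness:
    max_spend: float                      # hard ceiling on monetary spend
    max_calls: int                        # per-session tool-call ceiling
    allowed_hosts: set[str]               # egress allowlist of hostnames
    schemas: dict[str, dict[str, type]]   # tool name -> {argument name: expected type}
    spent: float = 0.0
    calls: int = 0
    audit: list[dict] = field(default_factory=list)  # append-only by convention

    def _log(self, tool: str, args: dict, outcome: str) -> None:
        # Store a hash of the arguments, not the arguments themselves.
        payload = json.dumps(args, sort_keys=True, default=str).encode()
        digest = hashlib.sha256(payload).hexdigest()
        self.audit.append({"tool": tool, "args_sha256": digest, "outcome": outcome})

    def call(self, tool: str, fn: Callable[..., Any], args: dict,
             cost: float = 0.0, url: Optional[str] = None) -> dict:
        # Layer 1: budget. Refuse before doing anything else.
        if self.calls >= self.max_calls or self.spent + cost > self.max_spend:
            self._log(tool, args, "blocked:budget")
            raise Blocked("budget exceeded")
        # Assumption: every attempt counts, so retry loops are bounded too.
        self.calls += 1

        # Layer 2: egress. Only allowlisted hosts are reachable.
        if url is not None and urlparse(url).hostname not in self.allowed_hosts:
            self._log(tool, args, "blocked:egress")
            raise Blocked(f"host not allowed: {urlparse(url).hostname}")

        # Layer 3: validation. Check arguments against the declared schema.
        schema = self.schemas.get(tool)
        if schema is None:
            errors = [f"no schema declared for tool '{tool}'"]
        else:
            errors = []
            for name, expected in schema.items():
                if name not in args:
                    errors.append(f"missing argument '{name}'")
                elif not isinstance(args[name], expected):
                    errors.append(f"argument '{name}' must be {expected.__name__}")
            errors += [f"unexpected argument '{n}'" for n in args if n not in schema]
        if errors:
            # Return a structured correction the model can act on; do not execute.
            self._log(tool, args, "rejected:validation")
            return {"ok": False, "correction": {"tool": tool, "errors": errors}}

        self.spent += cost
        result = fn(**args)
        self._log(tool, args, "ok")
        return {"ok": True, "result": result}
```

In this sketch the budget and egress layers fail closed by raising, while the validation layer returns a correction instead of raising, so the agent loop can feed the error list back to the model and retry within the remaining call budget.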

Profile:

Senior AI/ML Engineer and Architect with 8+ years of experience building production-grade machine learning systems at Fortune 100 scale. Currently leading the design of Agentic AI systems, RAG pipelines, and predictive analytics solutions at Southwest Airlines, with prior experience as a Software Development Engineer at Amazon Web Services. Expertise includes Generative AI, LangGraph-based multi-stage retrieval systems, hybrid search, model evaluation frameworks, scalable ML infrastructure on AWS and Kubernetes, and large-scale data engineering using PySpark and AWS Glue. Passionate about developing ethical, high-impact AI solutions and collaborating on innovative Generative AI projects during hackathons.