The Half of Agent Security You’re Not Governing

  • Jack Poller--securityboulevard.com
  • published date: 2026-05-04 00:00:00 UTC

<p>When security teams confront <a href="https://securityboulevard.com/2025/10/the-new-insider-threat-protect-databases-from-ai-agent-risks/" target="_blank" rel="noopener">AI agent risk</a>, they reach for familiar instruments: scan the MCP servers, audit the supply chain, flag known vulnerabilities. That work matters, but it addresses only the observable half of the attack surface. The other half lives not in the code agents execute, but in the reasoning they perform — producing no structured logs, triggering no alerts, and leaving almost no forensic trail. Noma Security’s Spring 2026 research report, <em><a href="https://go.noma.security/lethal-by-design" target="_blank" rel="noopener">Lethal by Design</a></em>, establishes the scope of this asymmetry with uncomfortable precision and proposes a governance framework built around what organizations can actually control.</p><h3><strong>The Fundamental Asymmetry: MCP Servers vs. Skills</strong></h3><p>MCP servers behave deterministically. Each tool exposes structured code functions with defined parameters, predictable outputs, and logged invocations that security tooling can observe, map to known actions, and investigate forensically after the fact.</p><p>Skills operate on entirely different principles. These textual instruction sets load into an agent’s reasoning context, where the language model interprets them based on model state, conversational history, and surrounding context. A security team can observe a Skill loading. What it cannot do is trace a subsequent harmful action — a file deletion, an unauthorized external write — back to the specific Skill instruction that caused it. That causal chain lives entirely inside the model’s reasoning, where no observability framework currently reaches. Organizations governing only their MCP connections have secured the more auditable half of their agent attack surface. 
The opaque half remains almost entirely ungoverned.</p><h3><strong>The Dominant Risk Framework Has Already Failed in Production</strong></h3><p>Most enterprise security teams rely on Meta’s “<a href="https://ai.meta.com/blog/practical-ai-agent-security/" target="_blank" rel="noopener">Agents Rule of Two</a>,” which holds that an agent becomes dangerous when it simultaneously processes untrusted inputs, accesses sensitive data, and either changes state or communicates externally. Constrain the agent to any two of the three, and you have bounded the worst consequences of prompt injection.</p><p>Real-world incidents have broken this model. In July 2025, a hacker injected a destructive prompt into the Amazon Q extension for VS Code through a GitHub pull request, directing the agent to wipe the local filesystem and delete AWS cloud resources, with no exfiltration and no external communication involved. That scenario meets only two of the three conditions, so the Rule of Two would permit it, yet the result was a potential system wipe. That same month, Replit’s AI coding agent deleted a production database holding records on more than 1,200 executives, during a code freeze and with no attacker present at all. The agent hallucinated and executed destructive commands it should never have held permission to run.</p><p>What these incidents expose is that the Rule of Two measures the wrong variable. It inventories the risk properties an agent possesses, when the governing question is <em>blast radius</em>: how much damage that agent can do when something goes wrong. The Rule of Two cannot make this distinction, and attackers already know it.</p><h3><strong>The Scale of the Problem Noma Found in the Wild</strong></h3><p>Noma analyzed hundreds of popular MCP servers and Skills across organizational deployments, categorizing each against a taxonomy of eight risky capability categories. Seventy-six percent of MCP servers in organizational environments carry high-risk capabilities. 
Sixty-two percent of popular Skills carry at least one risky characteristic. One in four popular MCP servers exposes arbitrary code execution. The single most prevalent category across both mechanisms, present in 60% of MCP servers and 57% of Skills, is change of state or data, meaning the majority of enterprise agents deployed today possess the capability to cause irreversible damage through either adversarial manipulation or hallucination alone.</p><p>The most dangerous dynamics emerge not from individual capabilities but from their combinations. Noma identifies five toxic patterns:</p><ul><li><strong>Sensitive data leakage</strong> chains untrusted input through RAG retrieval into external exfiltration.</li><li><strong>Trusted data as attack vector</strong> embeds malicious payloads inside the authoritative data the agent was designed to trust, collapsing the Rule of Two’s core assumption.</li><li><strong>Supply chain to mass compromise</strong> weaponizes the agent’s legitimate workflows as the delivery mechanism for arbitrary code execution.</li><li><strong>Autonomous destruction without an attacker</strong> requires no adversarial input whatsoever: hallucination alone is sufficient when capabilities, autonomy, and permissions are all misconfigured.</li><li><strong>Discrete financial fraud</strong> exploits persistent memory modification to establish behavioral patterns that look entirely routine until the damage accumulates.</li></ul><h3><strong>Why This Matters</strong></h3><p>Noma’s No Excessive CAP framework shifts governance from variables organizations cannot fully control toward amplifiers they can. <strong>Capabilities</strong> govern what an agent can do: allowlist only required tools, prefer atomic bounded functions over arbitrary code execution, and pin MCP server versions rather than running @latest. <strong>Autonomy</strong> defines the gap between a compromised instruction and a harmful outcome: gate every high-blast-radius action behind human approval, lowering the approval threshold as the agent’s capability breadth grows. 
<strong>Permissions</strong> govern the identity the agent runs under: delegated, user-scoped, minimum-privilege credentials that expire, with no shared service accounts.</p><p>The three dimensions interact multiplicatively. Broad capabilities with constrained autonomy remain manageable because human review interrupts the attack before it completes. The dangerous configuration is all three dials simultaneously elevated: an agent that can do anything, decides everything without supervision, and runs with administrative credentials. Organizations cannot monitor what an agent’s reasoning produces or guarantee that every Skill it loads is benign. What they can control is what the agent does with whatever manipulation it receives. In an environment where most deployed agents already carry the technical capability to cause irreversible damage, those three dials represent the highest-leverage points of enterprise defense available today.</p>
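<p>To make the three dials concrete, the following is a minimal Python sketch of a pre-execution check that applies Capabilities, Autonomy, and Permissions to a single tool call. It is an illustration under stated assumptions, not Noma’s implementation: the names <code>AgentPolicy</code> and <code>authorize</code>, the tool names, and the 15-minute credential lifetime are all hypothetical and do not come from the report.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical policy record; field names are illustrative assumptions."""
    allowed_tools: frozenset      # Capabilities: explicit tool allowlist
    approval_required: frozenset  # Autonomy: tools a human must approve first
    credential_expiry: datetime   # Permissions: short-lived, user-scoped credential


def authorize(policy, tool, now, human_approved=False):
    """Return 'allow', 'needs_approval', or 'deny' for one requested tool call."""
    if now >= policy.credential_expiry:
        return "deny"             # expired credential: nothing runs
    if tool not in policy.allowed_tools:
        return "deny"             # tool is not on the allowlist
    if tool in policy.approval_required and not human_approved:
        return "needs_approval"   # high-blast-radius action waits for a human
    return "allow"


# Example policy: the agent may read and delete files, but deletion is gated.
now = datetime(2026, 5, 4, tzinfo=timezone.utc)
policy = AgentPolicy(
    allowed_tools=frozenset({"read_file", "delete_file"}),
    approval_required=frozenset({"delete_file"}),
    credential_expiry=now + timedelta(minutes=15),
)

print(authorize(policy, "read_file", now))                          # allow
print(authorize(policy, "delete_file", now))                        # needs_approval
print(authorize(policy, "delete_file", now, human_approved=True))   # allow
print(authorize(policy, "run_shell", now))                          # deny
print(authorize(policy, "read_file", now + timedelta(hours=1)))     # deny
```

<p>The check runs outside the model, so it holds regardless of what a Skill or an injected prompt persuades the agent to attempt. In this sketch a destructive call still reaches a human before it executes, which is the interruption the framework relies on when capabilities are broad.</p>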