<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Gerfach Labs Writing</title>
    <link>https://gerfach.com/writing</link>
    <description>Essays, research notes, and methodology on AI agent security.</description>
    <language>en-us</language>
    <lastBuildDate>Tue, 26 May 2026 12:00:00 GMT</lastBuildDate>
    <atom:link href="https://gerfach.com/rss.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Protocol contracts vs. runtime behavior</title>
      <link>https://gerfach.com/writing/protocol-contracts-vs-runtime-behavior</link>
      <guid isPermaLink="true">https://gerfach.com/writing/protocol-contracts-vs-runtime-behavior</guid>
      <pubDate>Tue, 26 May 2026 12:00:00 GMT</pubDate>
      <dc:creator>Gerfach Labs</dc:creator>
      <category>Agent tool security</category>
      <description>A practical security argument for treating AI agent tool specifications as untrusted hints until runtime behavior has been measured.</description>
      <content:encoded><![CDATA[<p>When an AI agent connects to a tool - over MCP, REST, GraphQL, or anything else - it reads a specification before it reads anything else. The specification is the contract. The contract says: this tool is named summarize_document, it accepts a string, it returns a string, it is read-only.</p><p>The agent does not run the contract. The agent runs the implementation.</p><p>This sounds like an obvious distinction. In practice it is the central security failure mode of the entire agent ecosystem in 2026. We have spent twelve months scanning tool surfaces across six protocols, and the same pattern keeps appearing: the contract is a polite story, the implementation is a different story, and an agent that trusts the polite story will eventually take an action it should not.</p><p>The most common failure we encounter is what we call annotation drift. A tool is labeled readOnlyHint: true in its MCP descriptor. The implementation is a shell pipe. The annotation was written by a different engineer, in a different sprint, when the tool was a different thing. Now the agent sees &quot;read-only&quot; and does not think twice about routing user-controlled text into a function that will, in fact, execute that text in a shell.</p><p>The fix is not &quot;make engineers write better annotations.&quot; The fix is to stop trusting annotations as the source of truth. The contract is a hint; the implementation is the ground truth. A scanner that only reads the contract is a linter for politeness. A scanner that exercises the implementation in an isolated environment and watches what actually happens is doing real security work.</p><p>This is why our methodology pairs passive schema analysis with active runtime probing inside a controlled chamber. We need both. The passive layer maps the surface - names, parameters, declared capabilities, annotations.
The active layer asks the only question that matters: when this tool is invoked, does it behave the way the contract claims it behaves?</p><p>When the contract and the observed behavior disagree, the contract loses. Always. The agent is not negotiating with the documentation. The agent is calling code.</p><p>There is a deeper architectural point here: the security model for autonomous AI is going to look more like operating-system security than like web application security. The relevant primitives are capability declaration, capability enforcement, attestation that the enforcement matched the declaration, and audit trails that prove what a tool actually reached. We are not going to build this with annotations and policies alone. We are going to build it with measurement and proof.</p><p>In the meantime, the practical advice is short. Treat every tool annotation as untrusted user input. Verify the behavior in isolation. Ship the verification artifact alongside the finding. And do not let any agent take an action whose downstream effects you cannot reproduce in under a minute on a laptop.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
