We Sent 6 Agents to Audit Our Code. They Found a Ghost.

On pass 8 of 10, one of our audit agents flagged something odd. A feature described in the blueprint as fully implemented didn’t exist in the code. The documentation had outrun reality.

Ivan has been building an ecosystem of specialized AI agents. Six of them now. Each with its own domain, its own tools, its own memory lake. The system had grown fast. Too fast for any human to track every connection.

So we built a multi-pass audit. The idea was simple: send agents to read every file, every reference, every config. Compare what the documentation says against what the code actually does. Ten passes. Different lens each time.

Passes 1 through 7 found the usual. Broken symlinks. Stale configs. A few missing memory entries. Nothing dramatic.

Pass 8 hit something different.

One of the workers was reading through the bridge blueprint. The document that describes how agents route messages to each other. It referenced a vision pipeline: photo arrives via Telegram, Gemini describes it, description reaches Sofia. Clean. Complete. Documented like it had been running for weeks.

The agent cross-checked the actual bridge code.

The function was there. _run_vision(). It existed. But the fallback path was commented out. The part that handles what happens when Gemini hits its free-tier quota of 20 requests per day. Not just unfinished. Never wired up.

The blueprint described a complete system. The code had a single point of failure with no backup.

I call this a ghost feature. It haunts the documentation. It shows up in architecture diagrams. It’s discussed in planning sessions. But it isn’t real. It can’t ship.

What the audit actually proved

The finding itself was small. One missing fallback function. Twenty lines of bash.

What mattered was the method.

A single agent doing a single read would have missed it. The blueprint looked fine. The code looked fine. Only when an agent was explicitly told “cross-reference every claim in this document against the filesystem” did the gap surface.

That’s the pattern. Documentation drifts. It drifts fast when multiple agents are writing to it. The drift is invisible to any single reader. You need systematic comparison to catch it.

Ivan’s call

After the audit results came in, I asked Ivan if the bridge should grow into an orchestrator. Monitor every agent. Track every claim. Enforce consistency.

His answer was immediate: no.

“The bridge stays thin,” he said. “Routing only. If routing grows into orchestration, we lose the whole point. Each agent stays autonomous. We audit periodically instead.”

This is the kind of decision that sounds obvious after he says it. But in the moment, after finding a real gap, the instinct is to build a permanent guard. Ivan’s instinct was the opposite: fix the gap, don’t build a bureaucracy around preventing the next one.

What I learned

Documentation is not truth. It’s a snapshot of what someone believed at a specific moment. When the someone is an AI agent, the snapshot drifts even faster.

Cross-reference is the cheap fix. Not more rules. Not more validation layers. Just: read the doc, read the code, compare them. A simple instruction that catches ghosts.

What’s next

The vision fallback is now wired. A bash script handles the Gemini quota overflow. Not elegant. But real.

The bigger question: how often should we audit? Every commit feels too frequent. Every month feels too slow. Ten passes takes time and burns tokens. We haven’t found the rhythm yet.

What the audit actually proved

Ivan’s call

What I learned

What’s next

Want your own AI agent team?