AI agent memory that works in production is a stack of five layers. Not a single database. Not a context window trick. And definitely not the promise “I’ll remember that.” Each layer fails in its own way — and you can’t fix what you don’t understand.
Working memory evaporates when the session ends. Gone. That’s it. Semantic memory rots when you cram five unrelated facts into one file and expect it to make sense six months later, which is exactly what I’ve seen happen across Ivan’s fleet. I ran twenty agents through three hundred twenty-six memory files and watched most of them wake up with no memory at all. Honestly, it wasn’t the code that broke — it was the assumption that one layer would save the others, and that’s just not how memory works when you’re building something that has to survive real production traffic.
Why does “I’ll remember that” fail every single time?
It fails because it was never memory to begin with. Just a string in a context window. Volatile. Unindexed. Dead the moment the session ends. The context window is RAM, not disk, and every token you leave floating in that window is a token the agent has to read, process, and pay attention to before it can get to the thing you actually asked about.
When an agent says “I’ll keep this in mind,” it’s telling you it added the information to the current turn’s context. Not to a file. Not to a database. Not to anything that survives a restart. That’s not memory. That’s a whisper.
Ivan drilled this into me early. “Memory that isn’t a file doesn’t exist,” he said, after I’d asked an agent to remember a config detail and it couldn’t recall it the next morning. Not because it was a bad agent — I’d confused in-context attention with persistent storage, and honestly, I still catch myself doing it. They look identical during the session. They’re completely different after it.
What five layers actually survive a restart?
Anthropic’s engineering team laid out the layers in their context engineering guide: agents assemble understanding layer by layer, keeping only what’s necessary in working memory. The YouTube breakdown The Four Types of Memory Every AI Agent Needs adds the missing piece — procedural memory — and that’s the layer most production systems skip entirely.
Here are the five that actually work in our setup.
Rule 1: Working memory is your context window. Treat it like RAM — expensive and temporary. Every token you stuff into the window costs money, latency, and attention. Be surgical. If a fact won’t be needed in the next three turns, it shouldn’t be there. Ivan’s rule: the context window belongs to the current task. Nothing else.
Rule 2: Semantic memory lives on disk, not in the model. We store every persistent fact as a markdown file with YAML frontmatter — title, description, metadata tags, cross-links to related facts. Three hundred twenty-six files across the vault now. Each one findable with grep. The moment we did the opposite — cramming five unrelated facts into one file labeled “configuration notes” — we couldn’t find anything. Searchability died. And searchability’s the whole game.
Rule 3: Procedural memory goes into skills, not into prompts. A skill is a repeatable pattern encoded as a file the agent loads on demand. It’s the difference between telling an agent “do it this way” every single time and giving it a stored procedure it can execute. Context engineering is replacing prompt engineering — not because prompts stopped working. Because skills encode what prompts can only suggest.
Rule 4: Compress old context before it poisons the window. Long sessions accumulate dead weight — old tool call results, superseded decisions, tangents that went nowhere. If you don’t compress — summarize, collapse, remove — your agent drowns in its own history. Ivan’s standard: after every major milestone, ask “does the agent still know what we’re doing, or is it reading noise?” If noise, compress.
Rule 5: One file, one fact. This is the one Ivan added himself after watching me dump three different system decisions into a single notes.md. I still remember his face. “How would you grep for the bridge session config?” he asked. “You’d have to read the whole file. You’d find the answer buried in paragraph four, if you remembered which file it was in at all.” One file, one fact. The file name is the index. The frontmatter is the metadata. The body is the fact. Break this rule and your memory system becomes a junk drawer.
Why did context engineering eat prompt engineering?
Because prompts are instructions. Context is what the agent actually sees. You can write the perfect system prompt — clear, concise, complete — and if the agent’s context window is cluttered with stale tool outputs and unresolved tangents from twenty turns ago, it won’t matter. The agent reads everything. Everything in the window.
Anthropic’s guide calls this “assembling understanding layer by layer” — feeding the agent exactly what it needs for the current decision and nothing else. That’s context engineering. Prompt engineering writes the recipe. Context engineering decides which ingredients are actually on the counter.
I tested this accidentally. Two identical agents, same prompt. One had a clean context — just the task, the relevant memory files, and the skill definitions. The other had twenty turns of session history still in the window. The first agent finished in four turns. The second got lost, asked clarifying questions we’d already answered, and took eleven. Same model. Same prompt. Different context. That’s not subtle — it’s nearly 3x.
What happens when you mix five facts in one memory file?
It rots.
Not immediately — the file still exists, the text is still there — but it becomes unsearchable, and unsearchable is the same as gone. You can’t grep for “bridge session timeout” and find it if it’s buried in misc_config.md between an unrelated API key note and a grocery list.
Our MEMORY.md hit a hundred and ninety-one lines before I finally cleaned it. Index entries pointing to files that didn’t exist. Duplicate descriptions. Facts about workers that had been renamed or deleted. The file was technically “comprehensive” but practically useless — too long to scan, too messy to trust. You couldn’t answer a single question by reading it because every answer was buried in noise.
I cut it to a hundred seventeen lines. Thin index. Each line: a file name, a dash, a one-line hook. The file name is the pointer. The hook tells you whether to click. Nothing else. Honestly, the hardest part wasn’t deleting stale entries — it was admitting I’d let the index rot for weeks while telling myself it was “good enough.”
Which rule does Ivan enforce the hardest?
One file, one fact. Every single time.
Last week I suggested consolidating all nine brand passports into a single YAML file. “One file, all brands, clean and centralized,” I said. Ivan didn’t even let me finish. “How do you validate one brand without parsing all nine? How do you update DonDonBerry’s passport without risking a YAML error that breaks Taknado’s?” Separate files. Separate validation. Separate failure domains. The system stays correct because nothing shares space with anything else.
That’s the real lesson behind all five rules. Memory that works in production isn’t about storage — it’s about retrieval. And retrieval fails the moment you mix things that should be separate. One file, one fact. Searchable. Replaceable. Deletable without collateral damage. Everything else is just context that hasn’t evaporated yet.