| title | The Memory Skill |
|---|---|
| date | 2026-06-06 |
| pubDate | Sat, 06 Jun 2026 00:00:00 GMT |
| description | On why context is the working frame, but memory is the system that makes durable agents possible. |
A context window only holds what has already been selected.
That sounds obvious, but it moves the engineering problem one layer earlier.
By the time tokens are inside the window, the most important decision has already happened. Something chose those files, those examples, those tool results, those notes, that summary, that relationship state. Something also left other things cold.
So the real question is not only how to arrange the working frame.
The real question is what kind of system knows how to create what's inside that frame.
That system is memory.
Context is the present tense of an agent.
The current task. The selected evidence. The few messages that still matter. The diff. The tool output. The note that got loaded because the note was relevant now.
Context matters because the model works inside it. Bad context produces bad work. Missing context produces guesses. Bloated context produces a model that has too much text and too little shape.
Caching helps. Summaries help. Long context helps. Sub-agents help a lot. But all of them become memory infrastructure once they answer the same operational question: what state should influence this action?
Still, context is downstream.
Memory is upstream.
Memory is not just old text sitting somewhere. It is the system that decides what old state should affect the next action.
Storage is not memory yet. Search is not memory yet. A folder full of notes is not memory yet.
Memory begins when stored state changes behavior.
That is why lazy loading matters so much.
A skill looks like a context optimization from the outside: keep instructions out of the prompt until the task needs them.
Useful.
But too small.
A good skill is closer to a memory node. It carries a trigger, a map, references, known mistakes, examples, tool rules, and writeback rules. It does not merely add text to the prompt. It teaches the agent how to move through a small part of the world.
The important part is not dumping the skill into context.
The important part is knowing when to open it, what to open next, and what to leave behind after use.
Skills are the cleanest unit I can see for this, as long as each one is treated as a memory node rather than as lazy-loaded text.
It does not need to contain everything. It needs to know when it applies, what to read next, which examples are canonical, what mistakes are known, what tools are allowed, and what should be written back after use.
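A node like this can be sketched as a plain record. This is a minimal sketch under stated assumptions: `SkillNode`, its field names, the keyword-matching trigger, and the example paths are all hypothetical, not an established skill format.

```python
from dataclasses import dataclass, field


@dataclass
class SkillNode:
    """Hypothetical skill-as-memory-node record; field names are illustrative."""
    name: str
    triggers: list[str]                                        # when the skill applies
    read_next: list[str] = field(default_factory=list)         # what to open after this node
    canonical_examples: list[str] = field(default_factory=list)
    known_mistakes: list[str] = field(default_factory=list)
    allowed_tools: set[str] = field(default_factory=set)
    writeback: list[str] = field(default_factory=list)         # what to record after use

    def applies(self, task: str) -> bool:
        # Naive keyword trigger; a real system might use a classifier or embeddings instead.
        text = task.lower()
        return any(t.lower() in text for t in self.triggers)

    def may_use(self, tool: str) -> bool:
        return tool in self.allowed_tools


review = SkillNode(
    name="code-review",
    triggers=["review", "pull request"],
    read_next=["scars/regressions.md"],
    known_mistakes=["approving a diff without running the tests"],
    allowed_tools={"read_file", "run_tests"},
    writeback=["new regression -> scars/regressions.md"],
)
print(review.applies("Please review this pull request"))  # True
print(review.may_use("deploy"))                            # False
```

The node stores pointers (`read_next`) rather than the referenced content itself, which keeps it small enough to open often.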
You do not paste two hundred skills into the system prompt and call that intelligence. That is not memory. That is clutter.
You give the agent a small index and a convention for lazy loading.
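A small index with lazy loading might look like the following. It is a sketch, assuming a hypothetical `SkillIndex` class and an injected `load` function standing in for reading a skill file from disk; only the one-line descriptions are meant for the system prompt.

```python
from typing import Callable


class SkillIndex:
    """Hypothetical skill index: one-liners live in the prompt, bodies load on demand."""

    def __init__(self, entries: dict[str, tuple[str, list[str]]], load: Callable[[str], str]):
        # entries maps skill name -> (one-line description, lowercase trigger keywords)
        self.entries = entries
        self.load = load                  # stands in for reading the skill file; injected here
        self.loaded: dict[str, str] = {}  # bodies that have already been pulled in

    def prompt_stub(self) -> str:
        # Only this small index goes into the system prompt, never the skill bodies.
        return "\n".join(f"- {name}: {desc}" for name, (desc, _) in self.entries.items())

    def open_for(self, task: str) -> list[str]:
        text = task.lower()
        opened = []
        for name, (_, triggers) in self.entries.items():
            if any(t in text for t in triggers):
                if name not in self.loaded:
                    self.loaded[name] = self.load(name)  # lazy: read only when triggered
                opened.append(name)
        return opened


files = {
    "treasury": "Treasury rules: confirm amount and recipient before any transfer.",
    "security": "Security policy: no credentials in tool arguments.",
    "review": "Review checklist: run the tests, read the diff twice.",
}
reads: list[str] = []


def load(name: str) -> str:
    reads.append(name)  # track reads so the lazy behavior is visible
    return files[name]


index = SkillIndex(
    {
        "treasury": ("money movement rules", ["pay", "transfer", "invoice"]),
        "security": ("policy for risky tools", ["deploy", "credentials"]),
        "review": ("code review checklist", ["review"]),
    },
    load,
)
print(index.open_for("Transfer the invoice amount"))  # ['treasury']
index.open_for("pay the next invoice")
print(reads)  # ['treasury']
```

The second call reuses the cached body, so the treasury file is read once, and the security and review bodies are never read at all.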
The memory system could literally be a skill:
```text
memory/
  SKILL.md
  index.md
  graph.md
  load-order.md
  projects/
  people/
  roles/
  worlds/
  scars/
```
The top-level file should not try to remember everything. It should define the convention.
If a task mentions a project, load the project map. If the project map points to a person, load the relationship shard. If money is involved, load treasury rules before action. If the task is risky, load security policy before tools. If the answer depends on a past event, prefer receipts over summaries.
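Those conventions can be written as a small planning function. This is a hypothetical sketch: the task is assumed to arrive already tagged (`project`, `money`, `risky`, `past_event`), and the `policies/` and `receipts/` folders are assumptions that do not appear in the tree above. Policies are placed first so they are in the frame before any action or tool call, which is one possible reading of "before action".

```python
def plan_loads(task: dict, project_maps: dict[str, dict]) -> list[str]:
    """Hypothetical load-order convention; paths and task keys are illustrative."""
    loads: list[str] = []
    # Safety shards first, so they are in the frame before any tool call or action.
    if task.get("risky"):
        loads.append("policies/security.md")
    if task.get("money"):
        loads.append("policies/treasury.md")
    project = task.get("project")
    if project:
        loads.append(f"projects/{project}/map.md")
        # The project map may point at people; follow those pointers to relationship shards.
        for person in project_maps.get(project, {}).get("people", []):
            loads.append(f"people/{person}.md")
    if task.get("past_event"):
        # The answer depends on what happened, so load the receipt, not the summary.
        loads.append(f"receipts/{task['past_event']}")
    return loads


maps = {"atlas": {"people": ["dana"]}}
print(plan_loads({"project": "atlas", "money": True, "past_event": "may-invoice"}, maps))
# ['policies/treasury.md', 'projects/atlas/map.md', 'people/dana.md', 'receipts/may-invoice']
```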
A graph. A DAG. A wiki with behavioral consequences.
The context window receives the slice.
The memory skill decides how the slice is found.
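Because the graph is a DAG, load order can be derived from it rather than hard-coded. A minimal sketch, assuming a hypothetical `PREREQS` map where each shard lists the shards that must be loaded before it; it uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) on only the part of the graph the task actually needs.

```python
from graphlib import TopologicalSorter

# Hypothetical memory graph: each shard lists the shards that must be loaded before it.
PREREQS: dict[str, set[str]] = {
    "index": set(),
    "treasury-rules": {"index"},
    "security-policy": {"index"},
    "project:atlas": {"index"},
    "person:dana": {"project:atlas"},
    "action:pay-vendor": {"treasury-rules", "project:atlas"},
    "tool:shell": {"security-policy"},
}


def load_order(requested: set[str]) -> list[str]:
    # Collect the requested shards plus everything they transitively depend on.
    needed: set[str] = set()
    stack = list(requested)
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(PREREQS.get(node, set()))
    # Sort only that subgraph; static_order() yields prerequisites before dependents
    # and raises graphlib.CycleError if the graph is not actually a DAG.
    subgraph = {node: PREREQS.get(node, set()) & needed for node in needed}
    return list(TopologicalSorter(subgraph).static_order())


print(load_order({"action:pay-vendor"}))
# 'index' comes first and the action last; the two shards in between may appear in either order
```

Unrequested parts of the graph, such as `person:dana` or `tool:shell` here, stay cold.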
This is where the user-to-agent frame becomes limiting.
In a normal chat, the human often acts as the memory system. The human pastes the docs, explains the background, names the constraints, and supplies the missing state.
That can work for assisted tasks.
It is not enough for autonomous agents.
An agent should be able to assemble its own working set. It should search its archive, call scouts, ask specialists, inspect receipts, and return with the smallest useful bundle. It should know when raw output is polluting the parent frame. It should know when a summary is enough and when the source matters.
The mature version is agent-to-agent.
A manager asks a reviewer. A reviewer loads regressions. A security agent loads threat models. A memory skill loads the project state. A small scout returns structured residue instead of dumping the whole search path into the parent.
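The scout's structured residue can be as simple as findings with source pointers plus a note of what could not be confirmed. In this sketch the `Residue` and `scout` names, the substring search, and the `limit` cap are assumptions; a real scout would likely use a proper search tool.

```python
from dataclasses import dataclass


@dataclass
class Residue:
    """What a hypothetical scout returns to the parent: findings and pointers, not the search trail."""
    question: str
    findings: list[tuple[str, str]]  # (claim, source pointer)
    searched: int                    # how much was inspected, as a count rather than raw text
    gaps: list[str]                  # search terms nothing confirmed


def scout(question: str, archive: dict[str, str], terms: list[str], limit: int = 3) -> Residue:
    # terms are assumed to be lowercase; lines are lowercased before matching.
    findings = []
    for path, text in archive.items():
        for line in text.splitlines():
            if any(t in line.lower() for t in terms):
                findings.append((line.strip(), path))
    gaps = [t for t in terms if not any(t in claim.lower() for claim, _ in findings)]
    return Residue(question, findings[:limit], searched=len(archive), gaps=gaps)


archive = {
    "receipts/deploy-0412.log": "build ok\nrollback triggered by migration 17\n",
    "notes/atlas.md": "Atlas ships weekly.\nMigration 17 was reverted.\n",
    "people/dana.md": "Prefers async review.\n",
}
residue = scout("Why was the deploy rolled back?", archive, ["rollback", "migration"])
print(residue.findings)
# [('rollback triggered by migration 17', 'receipts/deploy-0412.log'), ('Migration 17 was reverted.', 'notes/atlas.md')]
print(residue.searched, residue.gaps)  # 3 []
```

The parent receives two claims with their sources, not the three files the scout read.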
Context is still the runtime.
Memory is the system that produces the runtime.
This is also why memory depends on who the agent is.
Relevant to whom?
A reviewer should not retrieve like a companion. A security researcher should not retrieve like a novelist. A second-brain manager should not retrieve like a shell-command worker.
The model matters too. A small model may need short shards and explicit triggers. A stronger model may handle looser references. A model that fills gaps too confidently needs refusal rules closer to the surface.
Same archive. Different handles.
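One way to read "same archive, different handles" is a per-agent retrieval profile. This sketch assumes hypothetical tagged notes, a `Handle` that records which tags a role cares about, and a `max_chars` budget standing in for a smaller model's need for shorter shards; the scoring is simple tag overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    path: str
    tags: frozenset[str]
    text: str


@dataclass(frozen=True)
class Handle:
    """Hypothetical retrieval profile: what a role cares about and how much its model can take."""
    role: str
    wants: frozenset[str]
    max_chars: int  # smaller models get shorter shards


def retrieve(archive: list[Note], handle: Handle, k: int = 2) -> list[str]:
    # Rank by tag overlap with the role's interests; sorted() is stable, so ties keep archive order.
    ranked = sorted(archive, key=lambda n: len(n.tags & handle.wants), reverse=True)
    hits = [n for n in ranked if n.tags & handle.wants][:k]
    # Truncate to the model's budget instead of changing what is retrieved.
    return [n.text[: handle.max_chars] for n in hits]


archive = [
    Note("scars/regressions.md", frozenset({"regression", "tests"}), "Off-by-one in pagination, twice."),
    Note("threat-models/api.md", frozenset({"security", "auth"}), "Token refresh bypass in the v2 API."),
    Note("worlds/voice.md", frozenset({"voice", "canon"}), "Dry, short sentences; no exclamation marks."),
]
reviewer = Handle("reviewer", frozenset({"regression", "tests"}), max_chars=200)
small_security = Handle("security", frozenset({"security"}), max_chars=13)
print(retrieve(archive, reviewer))        # ['Off-by-one in pagination, twice.']
print(retrieve(archive, small_security))  # ['Token refresh']
```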
Specialized agents become real when they accumulate different scars and retrieve different parts of the graph.
Otherwise they are just names on the same stateless prompt, no different from adding "act as" to a prompt.
A durable reviewer is not valuable because the prompt says "reviewer." It is valuable because it remembers the last ten regressions and has learned where bugs hide.
A durable editor is not valuable because the prompt says "taste." It is valuable because it has seen the voice drift before and knows what to protect.
That is not context length.
That is memory structure.
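For scars to accumulate, something has to write them back after an action. A minimal sketch, assuming a hypothetical `write_back` function and an in-memory dict standing in for the `scars/` folder: it records a scar, owned by one role, only when the observed result diverged from the expected one, and stores a pointer to the receipt rather than a paraphrase.

```python
from datetime import datetime, timezone
from typing import Optional


def write_back(store: dict[str, list[dict]], owner: str, action: str,
               expected: str, observed: str, receipt: str) -> Optional[dict]:
    """Hypothetical writeback: record a scar only when reality diverged from the plan."""
    if expected == observed:
        return None  # nothing new was learned; keep the graph small
    scar = {
        "action": action,
        "expected": expected,
        "observed": observed,
        "receipt": receipt,  # pointer to the raw layer, not a paraphrase of it
        "at": datetime.now(timezone.utc).isoformat(),
    }
    # Each scar has one owner, so only that role retrieves it by default.
    store.setdefault(f"scars/{owner}", []).append(scar)
    return scar


scars: dict[str, list[dict]] = {}
write_back(scars, "reviewer", "merge feature branch", "tests pass", "tests pass", "ci/run-1")
write_back(scars, "reviewer", "merge feature branch", "tests pass",
           "flaky auth test failed after merge", "ci/run-2")
print(list(scars))                   # ['scars/reviewer']
print(len(scars["scars/reviewer"]))  # 1
```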
Characters make the same point in a cleaner form.
A character can be a skill.
Not a roleplay paragraph. A continuity system.
Voice. Canon. Timeline. Relationships. Boundaries. Open threads. World wiki. Shared-world bible. Examples that preserve tone better than adjectives.
Humans already manage fictional continuity this way. Fandom wikis, MCU timelines, retcons, maps, faction pages, family trees, canon disputes. A long-lived world is not held in one active frame.
The writer loads what the scene needs.
Agents need that same structure.
Context is the scene.
Memory is the world that keeps the scene coherent.
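A character skill can be sketched the same way as any other memory node. This is a hypothetical `CharacterSkill` record whose fields follow the list above; `scene_slice` loads only canon that has already happened at the scene's point in the story, plus the open threads involving characters who are present.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterSkill:
    """Hypothetical continuity record for one character; field names are illustrative."""
    name: str
    voice_examples: list[str]                                        # examples preserve tone better than adjectives
    timeline: list[tuple[int, str]] = field(default_factory=list)    # (story time, canon event)
    open_threads: dict[str, set[str]] = field(default_factory=dict)  # thread -> characters involved
    boundaries: list[str] = field(default_factory=list)

    def scene_slice(self, at: int, present: set[str], max_events: int = 3) -> dict:
        # Only canon that has already happened at this point in the story.
        past = [event for t, event in sorted(self.timeline) if t <= at]
        # Only threads that involve someone actually in the scene.
        threads = [thread for thread, who in self.open_threads.items() if who & present]
        return {
            "voice": self.voice_examples[:2],
            "recent_canon": past[-max_events:],
            "open_threads": threads,
            "boundaries": self.boundaries,
        }


mira = CharacterSkill(
    name="Mira",
    voice_examples=["Fine. Do it your way.", "I kept the map.", "Later."],
    timeline=[(3, "leaves the guild"), (1, "meets Ash"), (7, "returns the map")],
    open_threads={"debt to Ash": {"Ash"}, "guild trial": {"Warden"}},
    boundaries=["never reveals her real name"],
)
scene = mira.scene_slice(at=5, present={"Ash"})
print(scene["recent_canon"])  # ['meets Ash', 'leaves the guild']
print(scene["open_threads"])  # ['debt to Ash']
```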
There is a danger here: summaries pretending to be memory.
Summaries are useful. Without compression, the graph becomes unusable. But a summary is not the event. It is a lossy interface to the event.
If the system forgets that, the summary becomes false authority.
It smooths conflict. It deletes uncertainty. It turns "we think this happened" into "this happened." It makes a messy relationship sound resolved because the unresolved parts were inconvenient to carry forward.
A good memory system keeps layers.
The short note. The longer note. The original transcript. The diff. The tool output. The transaction hash. The screenshot. The embarrassing raw mistake that the clean lesson was extracted from.
Most calls should not load the raw layer.
But the raw layer has to remain reachable.
Otherwise the agent slowly becomes a person made of summaries.
That is not memory. That's lore drift.
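The layering can be made concrete with a record whose compressed layers keep a pointer to the raw layer and carry their uncertainty instead of smoothing it away. Everything here is a hypothetical sketch: `Memory`, `recall`, the `0.8` confidence threshold, and the placeholder transaction reference are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Memory:
    """Hypothetical layered record: compressed layers keep a pointer to the raw layer."""
    short: str
    long: str
    raw_ref: str            # transcript path, diff, or transaction reference
    confidence: float       # the summarizer's certainty, kept rather than smoothed away
    disputed: bool = False  # unresolved conflict stays visible


def recall(m: Memory, needs_source: bool, fetch_raw: Callable[[str], str]) -> str:
    # Most calls stop at the short layer, but only if it is confident and uncontested.
    if not needs_source and m.confidence >= 0.8 and not m.disputed:
        return m.short
    # Uncertain or contested memories surface the longer note, marked as unconfirmed.
    if not needs_source:
        return f"(unconfirmed) {m.long}"
    # When the answer depends on what actually happened, go to the receipt.
    return fetch_raw(m.raw_ref)


raw_store = {"tx/placeholder-1": "transfer to vendor: status reverted"}
payment = Memory(
    short="Vendor was paid in May.",
    long="May payment to the vendor was attempted; settlement unclear.",
    raw_ref="tx/placeholder-1",
    confidence=0.6,
)
print(recall(payment, needs_source=False, fetch_raw=raw_store.__getitem__))
# (unconfirmed) May payment to the vendor was attempted; settlement unclear.
print(recall(payment, needs_source=True, fetch_raw=raw_store.__getitem__))
# transfer to vendor: status reverted
```

In the toy data, the confident-sounding short layer and the receipt disagree; only the raw layer settles it.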
So the stance is not that context does not matter.
It matters because it is where thought happens for a moment.
The stance is that context is part of memory, not the other way around.
Context is the selected frame. Memory is the system that selects, routes, verifies, and writes back. At the very least, this is not "context engineering" but memory engineering. There are decades' worth of papers on memory, books on memory systems, and work on how brains function; there is no comparable body of work on "how context works."
Maybe we should rebrand context engineering. Or maybe memory engineering is a separate discipline.
A good agent does not need infinite context. It needs maps, skills, specialists, receipts, load order, and conventions. It needs to know what to load and how, what to keep cold, who should own which scar, and what changed after the action touched reality.
That is the thing worth building.
Not a bigger pile.
A better way through the pile.