Persistent, cross-runtime, typed memory for AI agents — local, offline, no cloud.
Tool 10 in the UnityInFlow ecosystem.
AI agents forget everything between sessions. agent-memory is a small
MCP stdio server backed by an embedded SQLite
database: any MCP-speaking runtime can store, search, list, and forget structured
memories, and every runtime pointed at the same database file shares the same
memory — so a decision recorded by one agent is recalled by the next, across
sessions and across tools.
- Local & offline. Embedded SQLite, zero accounts, zero cloud. Your memory never leaves your machine.
- Typed memories. Six categories (`DECISION`, `PATTERN`, `ERROR`, `TODO`, `ARCHITECTURE`, `CONSTRAINT`), so retrieval is structured, not a soup of text.
- Decay, not deletion. Memories fade in ranking over time (exponential decay; high-value pinned types fade slower) but are never deleted by decay. The only ways a memory is removed are an explicit TTL expiry or `memory_forget`.
- Search by meaning. Semantic search via local Ollama embeddings when available, with automatic fallback to SQLite FTS5 keyword ranking when it isn't; both are blended with decay. No Ollama, no network? Everything still works.
```bash
brew install unityinflow/tap/agent-memory
```

Or download the tarball for your platform from the latest release:
| Platform | Asset |
|---|---|
| macOS arm64 (Apple Silicon) | agent-memory-aarch64-apple-darwin.tar.gz |
| macOS x86_64 (Intel) | agent-memory-x86_64-apple-darwin.tar.gz |
| Linux x86_64 (gnu) | agent-memory-x86_64-unknown-linux-gnu.tar.gz |
| Linux aarch64 (gnu) | agent-memory-aarch64-unknown-linux-gnu.tar.gz |
| Linux x86_64 (musl, static) | agent-memory-x86_64-unknown-linux-musl.tar.gz (when available) |
| Linux aarch64 (musl, static) | agent-memory-aarch64-unknown-linux-musl.tar.gz (when available) |
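Every asset in the table follows the pattern `agent-memory-<triple>.tar.gz`, where `<triple>` is a Rust target triple. The sketch below picks the asset name for a given platform, assuming that pattern holds for every release. `asset_name` is a hypothetical helper, not part of agent-memory.

```rust
use std::env::consts::{ARCH, OS};

/// Hypothetical helper: map (os, arch, prefer_musl) to a release asset name,
/// following the `agent-memory-<triple>.tar.gz` pattern from the table.
/// Returns None for platforms the table does not list.
fn asset_name(os: &str, arch: &str, prefer_musl: bool) -> Option<String> {
    let triple = match (os, arch, prefer_musl) {
        ("macos", "aarch64", _) => "aarch64-apple-darwin",
        ("macos", "x86_64", _) => "x86_64-apple-darwin",
        ("linux", "x86_64", false) => "x86_64-unknown-linux-gnu",
        ("linux", "aarch64", false) => "aarch64-unknown-linux-gnu",
        // Static musl builds; the table marks these "when available".
        ("linux", "x86_64", true) => "x86_64-unknown-linux-musl",
        ("linux", "aarch64", true) => "aarch64-unknown-linux-musl",
        _ => return None,
    };
    Some(format!("agent-memory-{triple}.tar.gz"))
}

fn main() {
    // std::env::consts::OS is e.g. "macos" or "linux"; ARCH is e.g. "aarch64" or "x86_64".
    match asset_name(OS, ARCH, false) {
        Some(name) => println!("download {name}"),
        None => println!("no prebuilt asset for {OS}/{ARCH}; build from source"),
    }
}
```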
Verify the checksum against SHA256SUMS.txt (shipped with every release),
then extract:
```bash
shasum -a 256 -c SHA256SUMS.txt --ignore-missing   # (sha256sum on Linux)
tar -xzf agent-memory-<triple>.tar.gz
./agent-memory --version
```

To build from source, you need a stable Rust toolchain (edition 2021):

```bash
git clone https://github.com/UnityInFlow/agent-memory
cd agent-memory
cargo build --release
# binary at target/release/agent-memory
```

Usage:

```bash
# Serve the MCP stdio server (creates the DB on first run).
agent-memory serve

# Choose the database file (default: your OS data dir; honors $AGENT_MEMORY_DB).
agent-memory --db /path/to/memory.db serve
AGENT_MEMORY_DB=/path/to/memory.db agent-memory serve
```

The server speaks JSON-RPC over stdout; all logs go to stderr, so stdout stays a pure protocol channel.
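The database location above can come from `--db`, from `$AGENT_MEMORY_DB`, or from a default in the OS data dir, but the README does not say which wins when both the flag and the variable are set. The sketch below assumes the flag takes precedence. `resolve_db_path` and the `agent-memory/memory.db` default filename are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Treat an unset or empty value as "not provided".
fn non_empty(v: Option<&str>) -> Option<&str> {
    v.filter(|p| !p.is_empty())
}

/// Hypothetical lookup order (an assumption; the README does not state it):
/// 1. the `--db` flag, 2. `$AGENT_MEMORY_DB`, 3. a file under the OS data dir.
fn resolve_db_path(flag: Option<&str>, env_var: Option<&str>, data_dir: &Path) -> PathBuf {
    non_empty(flag)
        .or(non_empty(env_var))
        .map(PathBuf::from)
        .unwrap_or_else(|| data_dir.join("agent-memory").join("memory.db"))
}

fn main() {
    let env_value = std::env::var("AGENT_MEMORY_DB").ok();
    // Stand-in for the platform data dir; a real build would ask the OS.
    let data_dir = Path::new("/tmp/agent-memory-demo");
    let path = resolve_db_path(None, env_value.as_deref(), data_dir);
    // Diagnostics go to stderr, keeping stdout free for the JSON-RPC protocol.
    eprintln!("using database {}", path.display());
}
```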
Point every runtime at one database via AGENT_MEMORY_DB and they share memory:
```json
{
  "mcpServers": {
    "agent-memory": {
      "command": "agent-memory",
      "args": ["serve"],
      "env": {
        "AGENT_MEMORY_DB": "/Users/you/.agent-memory/memory.db"
      }
    }
  }
}
```

Drop the same block into each agent runtime's MCP config. Because they all open the same `AGENT_MEMORY_DB` file (WAL-mode SQLite with a serialized writer), a memory stored in one is immediately retrievable in another.
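"Serialized writer" means writes are applied one at a time, while reads can proceed concurrently. The sketch below illustrates the single-writer pattern inside one process: a dedicated thread owns the store and receives writes over an `mpsc` channel. It is a conceptual illustration only. The `Vec` stands in for the SQLite table, none of these names come from agent-memory, and across processes SQLite itself allows only one writer at a time.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// A write request plus a channel on which the writer reports the new row id.
struct WriteCmd {
    content: String,
    reply: Sender<u64>,
}

/// Spawn the single writer thread. It owns the store outright, so writes are
/// applied strictly one at a time, in arrival order. The handle yields the
/// final store once every sender has been dropped.
fn spawn_writer() -> (Sender<WriteCmd>, JoinHandle<Vec<String>>) {
    let (tx, rx) = mpsc::channel::<WriteCmd>();
    let handle = thread::spawn(move || {
        let mut rows: Vec<String> = Vec::new(); // stand-in for the SQLite table
        for cmd in rx {
            rows.push(cmd.content);
            // Ids start at 1, like a fresh SQLite rowid sequence.
            let _ = cmd.reply.send(rows.len() as u64);
        }
        rows
    });
    (tx, handle)
}

/// Submit one write and block until the writer acknowledges it.
fn store(writer: &Sender<WriteCmd>, content: &str) -> u64 {
    let (reply_tx, reply_rx) = mpsc::channel();
    writer
        .send(WriteCmd { content: content.to_string(), reply: reply_tx })
        .expect("writer thread alive");
    reply_rx.recv().expect("writer replied")
}

fn main() {
    let (writer, handle) = spawn_writer();
    // Two "runtimes" writing concurrently still receive distinct ids.
    let clients: Vec<_> = (0..2)
        .map(|i| {
            let w = writer.clone();
            thread::spawn(move || store(&w, &format!("memory from runtime {i}")))
        })
        .collect();
    for c in clients {
        println!("stored id {}", c.join().unwrap());
    }
    drop(writer); // closing the channel lets the writer thread finish
    println!("{} rows", handle.join().unwrap().len());
}
```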
| Tool | Purpose |
|---|---|
| `memory_store` | Persist a typed memory; returns its id. |
| `memory_search` | Semantic (embedding) search when Ollama is reachable, FTS5 keyword search otherwise; results are ranked by relevance × decay and recency-bumped on retrieval. |
| `memory_list` | List memories newest-first, with optional type/tag/scope filters. |
| `memory_forget` | Delete a memory by id (clean not-found for unknown ids). |
`memory_store` arguments:

```json
{
  "content": "We chose io.github.unityinflow as the Maven group.",
  "type": "DECISION",
  "tags": ["maven", "release"],
  "scope": "kore-runtime",
  "ttl_secs": null
}
```

`content` and `type` are required; `tags`, `source`, `scope`, and `ttl_secs` are optional. A non-null `ttl_secs` makes the memory expire that many seconds after it is stored; the background sweep then removes it.
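The required/optional split and the TTL arithmetic above can be sketched as plain Rust, assuming the JSON has already been parsed (a real server would likely use a JSON library). `MemoryType`, `StoreRequest`, and `expires_at` are illustrative names; the six type names and the pinned set (`DECISION`, `ARCHITECTURE`, `CONSTRAINT`) come from this README.

```rust
use std::str::FromStr;

/// The six memory categories from the README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryType {
    Decision,
    Pattern,
    Error,
    Todo,
    Architecture,
    Constraint,
}

impl MemoryType {
    /// Pinned types decay markedly slower than the rest.
    fn is_pinned(self) -> bool {
        matches!(self, MemoryType::Decision | MemoryType::Architecture | MemoryType::Constraint)
    }
}

impl FromStr for MemoryType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "DECISION" => Ok(MemoryType::Decision),
            "PATTERN" => Ok(MemoryType::Pattern),
            "ERROR" => Ok(MemoryType::Error),
            "TODO" => Ok(MemoryType::Todo),
            "ARCHITECTURE" => Ok(MemoryType::Architecture),
            "CONSTRAINT" => Ok(MemoryType::Constraint),
            other => Err(format!("unknown memory type: {other}")),
        }
    }
}

/// Hypothetical parsed form of a `memory_store` call.
struct StoreRequest {
    content: String,
    kind: MemoryType, // `type` in the JSON; `type` is a Rust keyword
    tags: Vec<String>,
    source: Option<String>,
    scope: Option<String>,
    ttl_secs: Option<u64>,
}

impl StoreRequest {
    /// `content` and `type` are required; the optional fields start empty.
    /// Assumption: whitespace-only content counts as missing.
    fn new(content: &str, type_name: &str) -> Result<Self, String> {
        if content.trim().is_empty() {
            return Err("content is required".to_string());
        }
        Ok(StoreRequest {
            content: content.to_string(),
            kind: type_name.parse()?,
            tags: Vec::new(),
            source: None,
            scope: None,
            ttl_secs: None,
        })
    }

    /// Unix-seconds expiry: store time plus `ttl_secs`, or None when there is no TTL.
    fn expires_at(&self, stored_at: u64) -> Option<u64> {
        self.ttl_secs.map(|ttl| stored_at.saturating_add(ttl))
    }
}

fn main() {
    let mut req =
        StoreRequest::new("We chose io.github.unityinflow as the Maven group.", "DECISION")
            .expect("valid request");
    req.tags = vec!["maven".into(), "release".into()];
    req.scope = Some("kore-runtime".into());
    req.ttl_secs = Some(3600);
    println!(
        "{:?} pinned={} tags={:?} scope={:?} source={:?} expires_at={:?} content={}",
        req.kind, req.kind.is_pinned(), req.tags, req.scope, req.source,
        req.expires_at(1_700_000_000), req.content
    );
}
```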
`memory_search` arguments:

```json
{ "query": "maven group", "type": "DECISION", "limit": 10 }
```

Returns an envelope carrying the mode actually used plus the ranked results, each with a freshly recomputed `decay_score`:

```json
{ "search_mode": "semantic", "results": [ ... ] }
```

`search_mode` is `"semantic"` when the query was answered via embeddings and `"keyword"` when the FTS5 path served it (Ollama absent or unreachable). A query that matches nothing returns an empty list, never an error.
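Here is one way the envelope and the relevance × decay ranking might fit together. Each candidate's relevance is assumed to be already normalized so that higher is better. Candidates are sorted by relevance multiplied by decay, truncated to `limit`, and returned together with the mode that served the query. The normalization and every name below are assumptions. Only the blend, the two mode strings, and the empty-list-on-no-match behaviour come from this README.

```rust
/// Which path answered the query; serialized as the `search_mode` string.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SearchMode {
    Semantic,
    Keyword,
}

impl SearchMode {
    fn as_str(self) -> &'static str {
        match self {
            SearchMode::Semantic => "semantic",
            SearchMode::Keyword => "keyword",
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Hit {
    id: u64,
    relevance: f64,   // assumed normalized: higher is better
    decay_score: f64, // freshly recomputed before ranking
}

struct Envelope {
    search_mode: SearchMode,
    results: Vec<Hit>,
}

/// Rank by relevance × decay (descending) and keep the top `limit`.
/// No candidates simply yields an empty `results` list, not an error.
fn build_envelope(mode: SearchMode, mut hits: Vec<Hit>, limit: usize) -> Envelope {
    hits.sort_by(|a, b| {
        let (sa, sb) = (a.relevance * a.decay_score, b.relevance * b.decay_score);
        sb.total_cmp(&sa) // descending; total_cmp never panics on NaN
    });
    hits.truncate(limit);
    Envelope { search_mode: mode, results: hits }
}

fn main() {
    let hits = vec![
        Hit { id: 1, relevance: 0.9, decay_score: 0.2 }, // strong match, long faded
        Hit { id: 2, relevance: 0.6, decay_score: 0.9 }, // weaker match, recent
    ];
    let env = build_envelope(SearchMode::Keyword, hits, 10);
    println!("search_mode={}", env.search_mode.as_str());
    for h in &env.results {
        println!("id={} score={:.2}", h.id, h.relevance * h.decay_score);
    }
}
```

Multiplying the two factors means a strong but long-untouched match can rank below a weaker, recent one, which is the intent of blending with decay.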
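`memory_list` arguments (all filters optional):

```json
{ "type": "TODO", "scope": "kore-runtime", "limit": 20 }
```

`memory_forget` arguments:

```json
{ "id": 42 }
```

Returns `{ "id": 42, "deleted": true }`, or `"deleted": false` for an unknown id.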
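`memory_list` returns memories newest-first, with optional type/tag/scope filters and a limit. The sketch below shows those semantics over an in-memory slice. It assumes filters combine with AND and that `tag` matches when any of a memory's tags equals it. `Memory`, `ListFilter`, and `list` are hypothetical, and the real server presumably does this in SQL.

```rust
#[derive(Debug)]
struct Memory {
    id: u64,
    kind: String,
    tags: Vec<String>,
    scope: Option<String>,
    created_at: u64, // Unix seconds
}

/// Convenience constructor for the examples.
fn mem(id: u64, kind: &str, tags: &[&str], scope: Option<&str>, created_at: u64) -> Memory {
    Memory {
        id,
        kind: kind.to_string(),
        tags: tags.iter().map(|t| t.to_string()).collect(),
        scope: scope.map(str::to_string),
        created_at,
    }
}

/// Optional filters; `None` means "don't filter on this".
#[derive(Default)]
struct ListFilter<'a> {
    kind: Option<&'a str>,
    tag: Option<&'a str>,
    scope: Option<&'a str>,
    limit: Option<usize>,
}

/// Newest-first listing. Every filter that is set must match (AND semantics, an assumption).
fn list<'m>(memories: &'m [Memory], f: &ListFilter<'_>) -> Vec<&'m Memory> {
    let mut out: Vec<&Memory> = memories
        .iter()
        .filter(|m| f.kind.map_or(true, |k| m.kind == k))
        .filter(|m| f.tag.map_or(true, |t| m.tags.iter().any(|x| x == t)))
        .filter(|m| f.scope.map_or(true, |s| m.scope.as_deref() == Some(s)))
        .collect();
    // Newest first; ties broken by the higher id (an assumption).
    out.sort_by(|a, b| b.created_at.cmp(&a.created_at).then(b.id.cmp(&a.id)));
    out.truncate(f.limit.unwrap_or(usize::MAX));
    out
}

fn main() {
    let memories = vec![
        mem(1, "TODO", &["release"], Some("kore-runtime"), 100),
        mem(2, "TODO", &[], Some("kore-runtime"), 300),
        mem(3, "DECISION", &["release"], None, 200),
    ];
    let filter = ListFilter {
        kind: Some("TODO"),
        scope: Some("kore-runtime"),
        limit: Some(20),
        ..Default::default()
    };
    for m in list(&memories, &filter) {
        println!("{} {} {:?}", m.id, m.kind, m.tags);
    }
}
```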
Out of the box, search uses SQLite FTS5 keyword ranking — fully offline, no extra software. To rank by meaning instead, run a local Ollama and pull the embedding model:
```bash
ollama pull nomic-embed-text
```

That's the whole setup. When Ollama is reachable, stores embed content automatically and searches use vector KNN (cosine similarity) blended with decay. Point at a non-default Ollama with either:

```bash
agent-memory --ollama-url http://localhost:11434 serve
AGENT_MEMORY_OLLAMA_URL=http://localhost:11434 agent-memory serve
```

Graceful fallback, always. If Ollama is stopped, absent, or the model isn't pulled:

- stores still succeed: content is queued (`embedding_status = 0`) and backfilled by the background sweep once Ollama returns;
- searches transparently fall back to the FTS5 keyword path and report it via `"search_mode": "keyword"`;
- nothing errors, nothing blocks. The tool works fully offline without Ollama.
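The semantic path is described as "vector KNN (cosine) blended with decay", with fallback to keyword search when the embedder is down. Below is a brute-force sketch of that path. It assumes embeddings are stored as `Vec<f32>` and that memories still queued for embedding (`embedding_status = 0`) are skipped. The function names and the skip rule are assumptions, and a real store might use a vector index instead of a linear scan.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

struct Stored {
    id: u64,
    embedding: Option<Vec<f32>>, // None while queued (embedding_status = 0)
    decay_score: f32,
}

/// Top-k by cosine × decay. Returns None when there is no query embedding
/// (embedder unreachable), signalling the caller to take the keyword path.
fn semantic_top_k(query: Option<&[f32]>, rows: &[Stored], k: usize) -> Option<Vec<(u64, f32)>> {
    let q = query?;
    let mut scored: Vec<(u64, f32)> = rows
        .iter()
        .filter_map(|r| r.embedding.as_deref().map(|e| (r.id, cosine(q, e) * r.decay_score)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // best first
    scored.truncate(k);
    Some(scored)
}

fn main() {
    let rows = vec![
        Stored { id: 1, embedding: Some(vec![1.0, 0.0]), decay_score: 0.5 },
        Stored { id: 2, embedding: Some(vec![0.6, 0.8]), decay_score: 1.0 },
        Stored { id: 3, embedding: None, decay_score: 1.0 }, // not yet backfilled
    ];
    let query: &[f32] = &[1.0, 0.0];
    match semantic_top_k(Some(query), &rows, 5) {
        Some(hits) => println!("search_mode=semantic hits={hits:?}"),
        None => println!("search_mode=keyword (use the FTS5 path)"),
    }
}
```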
For non-MCP integrations, the same store and logic are exposed over HTTP:
```bash
agent-memory serve-rest                         # binds 127.0.0.1:7437 by default
agent-memory serve-rest --addr 127.0.0.1:8080   # custom loopback bind
```

| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/memories` | Store a memory (same fields as `memory_store`) |
| `GET` | `/api/memories` | List, with optional `type`/`tag`/`scope`/`limit` query params |
| `POST` | `/api/search` | Search; returns the same `{search_mode, results}` envelope |
| `DELETE` | `/api/memories/{id}` | Forget by id (404 with a structured body for unknown ids) |
| `GET` | `/health` | Liveness + embedder reachability status |
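The endpoint table maps onto a small router. Below is a std-only sketch of the matching logic, including the `{id}` segment of `DELETE /api/memories/{id}` and query strings on `GET /api/memories`. `Route` and `route` are hypothetical names. Treating a non-numeric id as "no route" is an assumption; the README only specifies the 404 for unknown ids, which would be handled after matching.

```rust
#[derive(Debug, PartialEq)]
enum Route {
    StoreMemory,
    ListMemories,
    Search,
    Forget(u64),
    Health,
}

/// Match a request method and target to a route.
fn route(method: &str, target: &str) -> Option<Route> {
    // Query params (e.g. ?type=TODO&limit=20) don't affect which route matches.
    let path = target.split_once('?').map_or(target, |(p, _)| p);
    match (method, path) {
        ("POST", "/api/memories") => Some(Route::StoreMemory),
        ("GET", "/api/memories") => Some(Route::ListMemories),
        ("POST", "/api/search") => Some(Route::Search),
        ("GET", "/health") => Some(Route::Health),
        ("DELETE", p) => p
            .strip_prefix("/api/memories/")
            .and_then(|id| id.parse::<u64>().ok())
            .map(Route::Forget),
        _ => None,
    }
}

fn main() {
    let requests = [
        ("GET", "/api/memories?type=TODO"),
        ("DELETE", "/api/memories/42"),
        ("PUT", "/health"),
    ];
    for (method, target) in requests {
        println!("{method} {target} -> {:?}", route(method, target));
    }
}
```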
The REST daemon and the MCP stdio server can run concurrently against the same database file (WAL-mode SQLite with a serialized writer).
Security: the API is unauthenticated and binds loopback-only by
default. Binding a non-loopback address requires the explicit
--allow-remote flag, and the server logs a loud warning naming the exposure
when you do. Never expose the REST API publicly — anyone who can reach it
can read and delete every memory. If you need remote access, keep it behind a
VPN/SSH tunnel or an authenticating reverse proxy.
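The loopback-only default with an `--allow-remote` override can be checked with the standard library's `IpAddr::is_loopback` before the listener is bound. A sketch, assuming the check runs at startup. `check_bind` and the warning text are illustrative, not agent-memory's actual code or log output.

```rust
use std::net::SocketAddr;

/// Refuse non-loopback binds unless `allow_remote` is set; when it is, warn loudly on stderr.
fn check_bind(addr: SocketAddr, allow_remote: bool) -> Result<(), String> {
    if addr.ip().is_loopback() {
        return Ok(()); // 127.0.0.0/8 or ::1
    }
    if !allow_remote {
        return Err(format!(
            "refusing to bind non-loopback address {addr} without --allow-remote"
        ));
    }
    eprintln!(
        "WARNING: unauthenticated REST API exposed on {addr}; \
         anyone who can reach it can read and delete every memory"
    );
    Ok(())
}

fn main() {
    let addr: SocketAddr = "0.0.0.0:7437".parse().expect("valid socket address");
    match check_bind(addr, false) {
        Ok(()) => println!("binding {addr}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Note that `0.0.0.0` (all interfaces) is not a loopback address, so it needs the explicit flag.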
Bootstrap your memory from an existing GSD project's state:
```bash
agent-memory import --from gsd-state .planning/STATE.md
```

| `STATE.md` section | Imported as |
|---|---|
| `### Decisions` bullets | `DECISION` |
| `### Blockers/Concerns` bullets | `CONSTRAINT` |
| `### Pending Todos` bullets | `TODO` |
| `## Deferred Items` table rows | `TODO` (tagged `deferred`) |
Every imported memory carries source = "gsd-state", the gsd tag, and a
scope (the project directory containing .planning by default; override with
--scope). Re-running the same import is a no-op: unchanged items are
deduplicated on the exact (source, type, content) key and reported as skipped.
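The section-to-type mapping and the `(source, type, content)` dedup key can be sketched with a `HashSet`. This illustrates the stated rules, not the importer itself. `classify`, `Importer`, and the exact heading strings matched are assumptions. Markdown parsing is left out, the example items are made up, and a real importer would seed the seen-set from the database so that a second run skips everything.

```rust
use std::collections::HashSet;

/// Map a STATE.md section heading to (memory type, extra tag), per the import table.
fn classify(heading: &str) -> Option<(&'static str, Option<&'static str>)> {
    match heading.trim() {
        "### Decisions" => Some(("DECISION", None)),
        "### Blockers/Concerns" => Some(("CONSTRAINT", None)),
        "### Pending Todos" => Some(("TODO", None)),
        "## Deferred Items" => Some(("TODO", Some("deferred"))),
        _ => None,
    }
}

/// Tracks the exact (source, type, content) keys already stored.
struct Importer {
    seen: HashSet<(String, String, String)>,
    imported: usize,
    skipped: usize,
}

impl Importer {
    fn new() -> Self {
        Importer { seen: HashSet::new(), imported: 0, skipped: 0 }
    }

    /// True if the item is new and would be stored; false if it is a duplicate.
    fn offer(&mut self, kind: &str, content: &str) -> bool {
        let key = ("gsd-state".to_string(), kind.to_string(), content.to_string());
        if self.seen.insert(key) {
            self.imported += 1;
            true
        } else {
            self.skipped += 1;
            false
        }
    }
}

fn main() {
    let items = [("### Decisions", "Use WAL mode"), ("## Deferred Items", "Add a Windows build")];
    let mut imp = Importer::new();
    for _run in 0..2 {
        // The second run re-offers the same items and should skip them all.
        for (heading, text) in items {
            if let Some((kind, _tag)) = classify(heading) {
                imp.offer(kind, text);
            }
        }
    }
    println!("imported {}, skipped {}", imp.imported, imp.skipped);
}
```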
A background task sweeps on an interval while the server runs:
- TTL expiry: rows whose `expires_at` is in the past are deleted.
- Decay materialization: every survivor's `decay_score` is recomputed so ranking stays cheap between reads. Pinned types (`DECISION`, `ARCHITECTURE`, `CONSTRAINT`) decay markedly slower.
- Embedding backfill: memories stored while Ollama was unreachable are embedded in batches once it comes back.
Decay never deletes. A memory with no TTL stays retrievable forever, however
low its decay score falls — it simply ranks lower. Removal happens only via TTL or
memory_forget.
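The sweep's three passes can be sketched as one function. The README says decay is exponential and that pinned types decay markedly slower, but gives no rates. The half-lives below (30 days, 180 days) are therefore placeholders, and measuring age from the last access is an assumption based on the recency bump on retrieval. `Row`, `sweep`, and the `embed` callback are hypothetical. Decay only rescales `decay_score`; rows leave the vector solely through the TTL check.

```rust
const DAY: f64 = 86_400.0;

struct Row {
    id: u64,
    content: String,
    pinned: bool,                // DECISION, ARCHITECTURE or CONSTRAINT
    last_access: u64,            // Unix seconds (assumed bumped when a search returns the row)
    expires_at: Option<u64>,     // derived from ttl_secs at store time
    embedding: Option<Vec<f32>>, // None = still queued (embedding_status = 0)
    decay_score: f64,
}

/// Exponential decay that halves every half-life. The half-lives are placeholders.
fn decay(age_secs: f64, pinned: bool) -> f64 {
    let half_life = if pinned { 180.0 * DAY } else { 30.0 * DAY };
    0.5_f64.powf(age_secs / half_life)
}

/// One sweep pass. `embed` stands in for an Ollama call and returns None while it is unreachable.
fn sweep(rows: &mut Vec<Row>, now: u64, embed: &dyn Fn(&str) -> Option<Vec<f32>>) {
    // 1. TTL expiry: the only removal performed here.
    rows.retain(|r| r.expires_at.map_or(true, |t| t > now));
    for r in rows.iter_mut() {
        // 2. Decay materialization: rescale the score, never delete.
        let age = now.saturating_sub(r.last_access) as f64;
        r.decay_score = decay(age, r.pinned);
        // 3. Embedding backfill (row by row here; the README says batches).
        if r.embedding.is_none() {
            r.embedding = embed(&r.content);
        }
    }
}

fn main() {
    let mut rows = vec![
        Row { id: 1, content: "temporary note".into(), pinned: false, last_access: 0,
              expires_at: Some(60), embedding: None, decay_score: 1.0 },
        Row { id: 2, content: "use WAL mode".into(), pinned: true, last_access: 0,
              expires_at: None, embedding: None, decay_score: 1.0 },
    ];
    let ollama_down = |_text: &str| -> Option<Vec<f32>> { None };
    sweep(&mut rows, 90 * 86_400, &ollama_down);
    for r in &rows {
        println!("id={} decay={:.3} embedded={}", r.id, r.decay_score, r.embedding.is_some());
    }
}
```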
```bash
cargo test --workspace                                 # unit + integration (incl. stdout-purity)
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --check
cargo llvm-cov --workspace --fail-under-lines 80       # 80% line-coverage gate (CI-enforced)
```

CI runs the format check, clippy with `-D warnings`, the full test suite, and the coverage gate on the UnityInFlow self-hosted runners. See CONTRIBUTING.md.
MIT © Jiří Hermann