agent-memory

Persistent, cross-runtime, typed memory for AI agents — local, offline, no cloud.

Tool 10 in the UnityInFlow ecosystem.

AI agents forget everything between sessions. agent-memory is a small MCP stdio server backed by an embedded SQLite database: any MCP-speaking runtime can store, search, list, and forget structured memories, and every runtime pointed at the same database file shares the same memory — so a decision recorded by one agent is recalled by the next, across sessions and across tools.

  • Local & offline. Embedded SQLite, zero accounts, zero cloud. Your memory never leaves your machine.
  • Typed memories. Six categories — DECISION, PATTERN, ERROR, TODO, ARCHITECTURE, CONSTRAINT — so retrieval is structured, not a soup of text.
  • Decay, not deletion. Memories fade in ranking over time (exponential decay; high-value pinned types fade slower) but are never deleted by decay. The only ways a memory is removed are an explicit TTL expiry or memory_forget.
  • Search by meaning. Semantic search via local Ollama embeddings when available, with automatic fallback to SQLite FTS5 keyword ranking when it isn't — both blended with decay. No Ollama, no network? Everything still works.

Install

Homebrew (macOS + Linux)

brew install unityinflow/tap/agent-memory

Pre-built binaries

Download the tarball for your platform from the latest release:

| Platform | Asset |
| --- | --- |
| macOS arm64 (Apple Silicon) | agent-memory-aarch64-apple-darwin.tar.gz |
| macOS x86_64 (Intel) | agent-memory-x86_64-apple-darwin.tar.gz |
| Linux x86_64 (gnu) | agent-memory-x86_64-unknown-linux-gnu.tar.gz |
| Linux aarch64 (gnu) | agent-memory-aarch64-unknown-linux-gnu.tar.gz |
| Linux x86_64 (musl, static) | agent-memory-x86_64-unknown-linux-musl.tar.gz (when available) |
| Linux aarch64 (musl, static) | agent-memory-aarch64-unknown-linux-musl.tar.gz (when available) |

Verify the checksum against SHA256SUMS.txt (shipped with every release), then extract:

shasum -a 256 -c SHA256SUMS.txt --ignore-missing   # (sha256sum on Linux)
tar -xzf agent-memory-<triple>.tar.gz
./agent-memory --version

From source

Requires a stable Rust toolchain (edition 2021).

git clone https://github.com/UnityInFlow/agent-memory
cd agent-memory
cargo build --release
# binary at target/release/agent-memory

Run

# Serve the MCP stdio server (creates the DB on first run).
agent-memory serve

# Choose the database file (default: your OS data dir; honors $AGENT_MEMORY_DB).
agent-memory --db /path/to/memory.db serve
AGENT_MEMORY_DB=/path/to/memory.db agent-memory serve

The server speaks JSON-RPC over stdin/stdout: requests arrive on stdin and responses leave on stdout. All logs go to stderr, so stdout stays a pure protocol channel.
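To make the protocol channel concrete, here is a minimal sketch of how a client might frame one tool call. It assumes the standard MCP `tools/call` method and newline-delimited JSON framing on stdio; both come from the MCP specification rather than from this README, and the helper name `frame_tool_call` is hypothetical.

```python
import json

def frame_tool_call(request_id: int, tool: str, arguments: dict) -> bytes:
    """Build one newline-delimited JSON-RPC 2.0 request for an MCP tool call."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # MCP method for invoking a tool (assumed from the MCP spec)
        "params": {"name": tool, "arguments": arguments},
    }
    # Compact single-line JSON: json.dumps escapes newlines inside strings,
    # so the only raw newline is the frame terminator appended here.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")

line = frame_tool_call(
    1,
    "memory_store",
    {"content": "We chose io.github.unityinflow as the Maven group.", "type": "DECISION"},
)
print(line.decode("utf-8"), end="")
```

A real session would first complete the MCP `initialize` handshake before sending tool calls; most runtimes handle that for you once the server is listed in their MCP config.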

MCP configuration (.mcp.json)

Point every runtime at one database via AGENT_MEMORY_DB and they share memory:

{
  "mcpServers": {
    "agent-memory": {
      "command": "agent-memory",
      "args": ["serve"],
      "env": {
        "AGENT_MEMORY_DB": "/Users/you/.agent-memory/memory.db"
      }
    }
  }
}

Drop the same block into each agent runtime's MCP config; because they open the same AGENT_MEMORY_DB file (WAL-mode SQLite with a serialized writer), a memory stored in one is immediately retrievable in another.
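The sharing behavior rests on ordinary SQLite semantics, which can be sketched with Python's standard `sqlite3` module. The table layout below is a hypothetical stand-in, not agent-memory's actual schema; the point is only that a second connection to the same WAL-mode file sees rows as soon as the first connection commits.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "memory.db")

# "Runtime A": opens the file, switches it to WAL mode, and stores a memory.
writer = sqlite3.connect(db_path)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute(  # hypothetical schema, for illustration only
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, type TEXT, content TEXT)"
)
writer.execute(
    "INSERT INTO memories (type, content) VALUES (?, ?)",
    ("DECISION", "Use WAL mode"),
)
writer.commit()

# "Runtime B": a separate connection to the same file reads the committed row.
reader = sqlite3.connect(db_path)
rows = reader.execute("SELECT type, content FROM memories").fetchall()
print(rows)

writer.close()
reader.close()
```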

The four tools

| Tool | Purpose |
| --- | --- |
| memory_store | Persist a typed memory; returns its id. |
| memory_search | Semantic (embedding) search when Ollama is reachable, FTS5 keyword otherwise; ranked by relevance × decay, recency-bumped on retrieval. |
| memory_list | List memories newest-first, with optional type/tag/scope filters. |
| memory_forget | Delete a memory by id (clean not-found for unknown ids). |

memory_store

{
  "content": "We chose io.github.unityinflow as the Maven group.",
  "type": "DECISION",
  "tags": ["maven", "release"],
  "scope": "kore-runtime",
  "ttl_secs": null
}

content and type are required; tags, source, scope, and ttl_secs are optional. A non-null ttl_secs makes the memory expire that many seconds after it is stored — the background sweep then removes it.
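The TTL rule can be sketched in a few lines. This is an illustration under assumptions, not the server's code: the `expires_at` column name appears in this README, but the rest of the table layout and both helper names are hypothetical.

```python
import sqlite3

def expires_at(stored_at: int, ttl_secs):
    """Return the absolute expiry time, or None when the memory has no TTL."""
    return None if ttl_secs is None else stored_at + ttl_secs

def sweep_expired(conn: sqlite3.Connection, now: int) -> int:
    """Delete rows whose expiry lies in the past; return how many were removed."""
    cur = conn.execute(
        "DELETE FROM memories WHERE expires_at IS NOT NULL AND expires_at < ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(  # hypothetical schema, for illustration only
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, expires_at INTEGER)"
)
stored_at = 1_000
for content, ttl in [("no ttl", None), ("short ttl", 60), ("long ttl", 3_600)]:
    conn.execute(
        "INSERT INTO memories (content, expires_at) VALUES (?, ?)",
        (content, expires_at(stored_at, ttl)),
    )

# Two minutes after storing, only the 60-second memory has expired.
removed = sweep_expired(conn, now=stored_at + 120)
remaining = [row[0] for row in conn.execute("SELECT content FROM memories ORDER BY id")]
print(removed, remaining)
```

A memory stored with `ttl_secs: null` has no expiry value at all, so the sweep never matches it.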

memory_search

{ "query": "maven group", "type": "DECISION", "limit": 10 }

Returns an envelope carrying the mode actually used plus the ranked results, each with a freshly-recomputed decay_score:

{ "search_mode": "semantic", "results": [ ... ] }

search_mode is "semantic" when the query was answered via embeddings and "keyword" when the FTS5 path served it (Ollama absent or unreachable). A query that matches nothing returns an empty list, never an error.

memory_list

{ "type": "TODO", "scope": "kore-runtime", "limit": 20 }

memory_forget

{ "id": 42 }

Returns { "id": 42, "deleted": true }, or deleted: false for an unknown id.

Semantic search (optional Ollama)

Out of the box, search uses SQLite FTS5 keyword ranking — fully offline, no extra software. To rank by meaning instead, run a local Ollama and pull the embedding model:

ollama pull nomic-embed-text

That's the whole setup. When Ollama is reachable, stores embed content automatically and searches use vector KNN (cosine) blended with decay. To point at a non-default Ollama, pass its URL in either form (the value shown is Ollama's default address; substitute your own):

agent-memory --ollama-url http://localhost:11434 serve
AGENT_MEMORY_OLLAMA_URL=http://localhost:11434 agent-memory serve
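As a rough picture of "vector KNN (cosine) blended with decay", the sketch below scores each memory as cosine similarity multiplied by its decay score, following the "relevance × decay" wording used for memory_search. The brute-force scan, the toy two-dimensional vectors, and the function names are all assumptions; the server may well use a vector index and a different blend.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query_vec, memories, limit=10):
    """Score each memory as cosine similarity times decay score; best first."""
    scored = [
        (cosine(query_vec, m["embedding"]) * m["decay_score"], m["id"])
        for m in memories
    ]
    scored.sort(reverse=True)
    return [memory_id for _, memory_id in scored[:limit]]

memories = [
    {"id": 1, "embedding": [1.0, 0.0], "decay_score": 0.2},  # exact match, but faded
    {"id": 2, "embedding": [0.8, 0.6], "decay_score": 1.0},  # close match, fresh
    {"id": 3, "embedding": [0.0, 1.0], "decay_score": 1.0},  # unrelated
]
print(rank([1.0, 0.0], memories))
```

In this toy data the fresh, close match outranks the faded exact match, which is the practical effect of blending relevance with decay.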

Graceful fallback, always. If Ollama is stopped, absent, or the model isn't pulled:

  • stores still succeed — content is queued (embedding_status = 0) and backfilled by the background sweep once Ollama returns;
  • searches transparently fall back to the FTS5 keyword path and report it via "search_mode": "keyword";
  • nothing errors, nothing blocks. The tool works fully offline without Ollama.
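The fallback behavior listed above can be sketched as a single try/except around the embedding call. The function and parameter names are hypothetical, and treating every `OSError` as "Ollama unreachable" is an assumption made for the illustration.

```python
def search(query, embed, semantic_search, keyword_search):
    """Try the embedding path first; if the embedder is unreachable, use keywords."""
    try:
        vector = embed(query)
    except OSError:  # parent of ConnectionError, refused sockets, and timeouts
        return {"search_mode": "keyword", "results": keyword_search(query)}
    return {"search_mode": "semantic", "results": semantic_search(vector)}

def offline_embed(_query):
    """Stand-in embedder that behaves like a stopped Ollama."""
    raise ConnectionRefusedError("Ollama is not reachable")

envelope = search(
    "maven group",
    embed=offline_embed,
    semantic_search=lambda vector: [],
    keyword_search=lambda q: [
        {"id": 1, "content": "We chose io.github.unityinflow as the Maven group."}
    ],
)
print(envelope["search_mode"])
```

The caller never sees the connection failure; it only learns which path served the query by reading `search_mode`.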

REST API

For non-MCP integrations, the same store and logic are exposed over HTTP:

agent-memory serve-rest                       # binds 127.0.0.1:7437 by default
agent-memory serve-rest --addr 127.0.0.1:8080 # custom loopback bind

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/memories | Store a memory (same fields as memory_store) |
| GET | /api/memories | List, with optional type/tag/scope/limit query params |
| POST | /api/search | Search; returns the same {search_mode, results} envelope |
| DELETE | /api/memories/{id} | Forget by id (404 with a structured body for unknown ids) |
| GET | /health | Liveness + embedder reachability status |

The REST daemon and the MCP stdio server can run concurrently against the same database file (WAL-mode SQLite with a serialized writer).
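For a non-MCP client, a store call might be built as follows. This sketch assumes the endpoint accepts a JSON body with a `Content-Type: application/json` header, which the README implies ("same fields as memory_store") but does not spell out; `build_store_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7437"  # default serve-rest bind address

def build_store_request(content: str, memory_type: str, tags=None) -> urllib.request.Request:
    """Build (but do not send) a POST /api/memories request."""
    body = {"content": content, "type": memory_type, "tags": tags or []}
    return urllib.request.Request(
        f"{BASE_URL}/api/memories",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )

request = build_store_request("Ship musl builds", "TODO", ["release"])
print(request.get_method(), request.full_url)
```

With the daemon running, passing `request` to `urllib.request.urlopen` would send it.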

Security: the API is unauthenticated and binds loopback-only by default. Binding a non-loopback address requires the explicit --allow-remote flag, and the server logs a loud warning naming the exposure when you do. Never expose the REST API publicly — anyone who can reach it can read and delete every memory. If you need remote access, keep it behind a VPN/SSH tunnel or an authenticating reverse proxy.
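The loopback rule can be illustrated with Python's `ipaddress` module. This is a sketch of the policy as described, not the server's implementation; `check_bind` and its error message are hypothetical, and it only handles numeric `host:port` addresses.

```python
import ipaddress

def check_bind(addr: str, allow_remote: bool = False) -> str:
    """Return addr if binding is permitted under the loopback rule, else raise."""
    host, _, _port = addr.rpartition(":")
    ip = ipaddress.ip_address(host.strip("[]"))  # strip brackets from IPv6 literals
    if not ip.is_loopback and not allow_remote:
        raise ValueError(f"{addr} is not loopback; pass --allow-remote to bind it")
    return addr

print(check_bind("127.0.0.1:7437"))
print(check_bind("[::1]:8080"))
print(check_bind("0.0.0.0:7437", allow_remote=True))
```

Calling `check_bind("0.0.0.0:7437")` without the override raises `ValueError`, mirroring the explicit opt-in that `--allow-remote` represents.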

Import from GSD

Bootstrap your memory from an existing GSD project's state:

agent-memory import --from gsd-state .planning/STATE.md

| STATE.md section | Imported as |
| --- | --- |
| ### Decisions bullets | DECISION |
| ### Blockers/Concerns bullets | CONSTRAINT |
| ### Pending Todos bullets | TODO |
| ## Deferred Items table rows | TODO (tagged deferred) |

Every imported memory carries source = "gsd-state", the gsd tag, and a scope (the project directory containing .planning by default; override with --scope). Re-running the same import is a no-op: unchanged items are deduplicated on the exact (source, type, content) key and reported as skipped.
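The idempotent re-import can be sketched as a set lookup on the (source, type, content) key. The parser below is a simplified, hypothetical stand-in: it handles only the three bullet sections, skips the Deferred Items table, and keeps the seen keys in memory rather than in the database.

```python
import re

SECTION_TYPES = {  # STATE.md heading -> memory type, per the mapping above
    "### Decisions": "DECISION",
    "### Blockers/Concerns": "CONSTRAINT",
    "### Pending Todos": "TODO",
}

def parse_state(text: str):
    """Yield (source, type, content) for each bullet under a known heading."""
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            current = SECTION_TYPES.get(stripped)  # unknown headings reset to None
        elif current and (match := re.match(r"[-*]\s+(.*)", stripped)):
            yield ("gsd-state", current, match.group(1))

def import_items(items, seen: set):
    """Insert unseen keys, skip exact duplicates; return (imported, skipped)."""
    imported = skipped = 0
    for key in items:
        if key in seen:
            skipped += 1
        else:
            seen.add(key)
            imported += 1
    return imported, skipped

state = """### Decisions
- Use SQLite in WAL mode
### Pending Todos
- Publish musl builds
"""
seen = set()
first = import_items(parse_state(state), seen)
second = import_items(parse_state(state), seen)
print(first, second)
```

The first run imports both items; the second run finds the same keys and reports both as skipped.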

Lifecycle: TTL & decay

A background task sweeps on an interval while the server runs:

  1. TTL expiry — rows whose expires_at is in the past are deleted.
  2. Decay materialization — every survivor's decay_score is recomputed so ranking stays cheap between reads. Pinned types (DECISION, ARCHITECTURE, CONSTRAINT) decay markedly slower.
  3. Embedding backfill — memories stored while Ollama was unreachable are embedded in batches once it comes back.

Decay never deletes. A memory with no TTL stays retrievable forever, however low its decay score falls — it simply ranks lower. Removal happens only via TTL or memory_forget.
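A small sketch shows why decay alone never removes anything: an exponential score shrinks toward zero but stays positive. The half-life and the pinned multiplier below are assumed values chosen for illustration, not agent-memory's actual parameters, and `decay_score` here is a hypothetical function that only shares its name with the field.

```python
PINNED = {"DECISION", "ARCHITECTURE", "CONSTRAINT"}
HALF_LIFE_DAYS = 30.0    # assumed value, not taken from agent-memory
PINNED_MULTIPLIER = 4.0  # assumed: pinned types get a four times longer half-life

def decay_score(memory_type: str, age_days: float) -> float:
    """Exponential decay: the score halves once per half-life."""
    half_life = HALF_LIFE_DAYS * (PINNED_MULTIPLIER if memory_type in PINNED else 1.0)
    return 0.5 ** (age_days / half_life)

for memory_type in ("TODO", "DECISION"):
    print(memory_type, [round(decay_score(memory_type, d), 3) for d in (0, 30, 120)])
```

Under these assumed numbers a TODO drops to half its score after 30 days while a DECISION needs 120 days to do the same, and neither ever reaches zero.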

Development

cargo test --workspace          # unit + integration (incl. stdout-purity)
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --check
cargo llvm-cov --workspace --fail-under-lines 80   # coverage gate: fails below 80% line coverage (CI-enforced)

CI runs fmt-check, clippy -D warnings, the full test suite, and the coverage gate on the UnityInFlow self-hosted runners. See CONTRIBUTING.md.

License

MIT © Jiří Hermann