Implement the following plan:
Build a dual-layer memory system (personal + shared) that operates as an MCP server, integrable with any IDE (Cursor, VS Code, Claude Code) and droppable into any project.
After analyzing the Mem0 repository, here's what I found:
1. Fact Extraction Prompt (prompts.py:14-59)
Extracts atomic facts from conversations:
Input: "Hi, my name is John. I am a software engineer."
Output: {"facts": ["Name is John", "Is a Software engineer"]}
2. Memory Update Prompt (prompts.py:175-323)
Decides ADD/UPDATE/DELETE/NONE for each fact:
- ADD: New fact not in memory
- UPDATE: Fact contradicts/extends existing (e.g., "likes Python" → "loves Python and Rust")
- DELETE: Fact contradicts existing (e.g., "loves pizza" → "hates pizza")
- NONE: Fact already exists
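The four-way decision can be sketched as a small dispatcher over the LLM's JSON output. This is a toy illustration, not Mem0's actual code: the payload shape mirrors the ADD/UPDATE/DELETE/NONE events above, and the in-memory `dict` store is a stand-in for the real vector store.

```python
import json

def apply_memory_events(llm_output: str, store: dict) -> dict:
    """Apply ADD/UPDATE/DELETE/NONE events to a toy in-memory store."""
    events = json.loads(llm_output)["memory"]
    for ev in events:
        if ev["event"] == "ADD":
            store[ev["id"]] = ev["text"]
        elif ev["event"] == "UPDATE":
            store[ev["id"]] = ev["text"]   # overwrite with the extended fact
        elif ev["event"] == "DELETE":
            store.pop(ev["id"], None)      # drop the contradicted fact
        # "NONE" falls through: fact already exists, nothing to do
    return store

store = {"0": "Likes Python"}
llm_output = json.dumps({"memory": [
    {"id": "0", "text": "Loves Python and Rust", "event": "UPDATE"},
    {"id": "1", "text": "Name is John", "event": "ADD"},
]})
print(apply_memory_events(llm_output, store))
# → {'0': 'Loves Python and Rust', '1': 'Name is John'}
```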
| File | Lines | Purpose | Can Extract? |
|---|---|---|---|
| mem0/configs/prompts.py | 460 | The prompts | ✅ Yes (no deps) |
| mem0/memory/main.py | 800 | Memory class | |
| mem0/memory/storage.py | 219 | SQLite history | ✅ Yes (just sqlite3) |
```python
# Simplified from main.py lines 370-530
def add(messages, user_id, ...):
    # 1. Extract facts using LLM
    facts = llm.generate(FACT_EXTRACTION_PROMPT + messages)
    # → ["Name is John", "Prefers Python"]

    # 2. For each fact, search existing memory
    for fact in facts:
        existing = vector_store.search(fact, limit=5)

    # 3. Ask LLM to decide action
    actions = llm.generate(UPDATE_MEMORY_PROMPT + existing + facts)
    # → [{"id": "new", "text": "Name is John", "event": "ADD"},
    #    {"id": "0", "text": "Loves Python", "event": "UPDATE"}]

    # 4. Execute actions
    for action in actions:
        if action.event == "ADD": create_memory(...)
        elif action.event == "UPDATE": update_memory(...)
        elif action.event == "DELETE": delete_memory(...)
```

What Mem0 lacks:
- ❌ No scheduled decay - only deletes on contradiction
- ❌ No time-based pruning - old memories stay forever
- ❌ No consolidation jobs - no nightly/weekly processing
- ❌ No access-count tracking - no "promote hot memories"
The decay mechanisms mentioned in their docs are:
- LRU (manual, not automatic)
- Contradiction-based deletion (real-time, not scheduled)
Core (required):
- qdrant-client # Vector store (8MB)
- pydantic # Data validation
- openai # LLM calls
- posthog # Telemetry (can disable)
- pytz # Timezone
- sqlalchemy # Not actually used much?
Transitive: ~20 packages, ~50MB total

| Approach | Pros | Cons |
|---|---|---|
| Use `pip install mem0ai` | Battle-tested, maintained, 5-min setup | Extra 50MB of deps, tied to their API |
| Extract core files | Smaller, more control | Need to maintain it ourselves, miss upstream updates |
| Adapt prompts only | Best of both worlds | Still need vector store + LLM logic |
My recommendation: Use as dependency + add our scheduler layer
Why:
- The prompts are the value - and they're just strings (we can see/adapt them)
- The vector store abstraction is useful (supports many backends)
- Maintenance burden shifts to them
- We focus on what they DON'T have: scheduled decay/consolidation
CEMS = Your custom layer on top of Mem0:
┌─────────────────────────────────────────┐
│ CEMS (What We Build) │
│ • MCP Server interface │
│ • Scheduled decay/consolidation │
│ • Personal/Shared namespacing │
│ • Claude Code skills │
└────────────────┬────────────────────────┘
│ calls
▼
┌─────────────────────────────────────────┐
│ Mem0 (pip install) │
│ • Fact extraction prompts │
│ • ADD/UPDATE/DELETE logic │
│ • Vector store abstraction │
│ • History tracking │
└─────────────────────────────────────────┘
So: CEMS is not a fork of Mem0. It's a thin orchestration layer that adds what Mem0 lacks.
| System | Accuracy | Latency | Open Source | Strengths |
|---|---|---|---|---|
| Mem0 | 66.9% | 1.4s | Apache 2.0 | Production-ready, best accuracy-speed balance |
| Letta/MemGPT | ~48% | 4.4s | True OSS | Self-directed memory, OS-like architecture |
| Zep | Best open-domain | Async delays | Community Ed | Temporal knowledge graph (Graphiti engine) |
| Cognee | Good | Good | Open | Vector + Graph hybrid, LlamaIndex integration |
| Server | Features |
|---|---|
| mcp-mem0 | Mem0 integration, template for Python MCP |
| mcp-memory-service | Semantic search, 13+ AI tools, cloud sync |
| OpenMemory MCP | Local-first, cross-client (Cursor, Claude, Windsurf) |
| simple-memory-mcp | Knowledge graph, JSON persistence |
| mcp-memory-keeper | Claude Code specific, session context |
Recommendation: Build on Mem0 (or Cognee) as the core memory engine, add custom layers for:
- Dual-tenant architecture (personal + shared)
- Memory decay/consolidation
- MCP server wrapper
- IDE commands/skills
The article proposes explicit scheduled maintenance - treating memory like a system that needs garbage collection:
| Schedule | Task | Purpose |
|---|---|---|
| Nightly (3 AM) | Consolidation | Merge duplicates, promote hot memories |
| Weekly (Sunday) | Summarization | Compress old items, prune 90-day stale |
| Monthly (1st) | Re-indexing | Rebuild embeddings, reweight graph, archive 180-day dead |
| System | Decay/Forgetting | Consolidation | Scheduled Jobs |
|---|---|---|---|
| Mem0 | ✅ LRU decay, auto-filtering | ✅ Real-time (ADD/UPDATE/DELETE decisions) | ❌ |
| Letta/MemGPT | ✅ Strategic eviction | ✅ Self-directed by LLM | ❌ |
| Zep | ✅ Graph-based | ❌ Heavy async processing | ❌ |
| Cognee | ❌ None | | ❌ |
Mem0's approach:
- Has "filtering & decay" that automatically prunes outdated memories
- Uses LRU (least-recently-used) policies
- Background summarizer runs asynchronously (not on a schedule)
- Limitation: "Consolidation is not fully automated—duplicate and semantically similar memories may accumulate"
Letta/MemGPT's approach (closest to the article):
- LLM manages its own memory via tool calls
- Periodic summarization when context gets full
- "Sleep-time agents" can consolidate memory
- Most sophisticated but requires more infrastructure
None of the existing systems have explicit cron-based maintenance like the article describes. This is the custom layer we add:
```python
# scheduler.py - What we build on top of Mem0/Letta
from apscheduler.schedulers.asyncio import AsyncIOScheduler

scheduler = AsyncIOScheduler()

# Nightly: 3 AM - Merge duplicates, promote hot
@scheduler.scheduled_job('cron', hour=3)
async def nightly_consolidation():
    for user_id in get_active_users():
        await consolidate_memories(user_id)
        await promote_hot_memories(user_id)

# Weekly: Sunday 4 AM - Summarize old, prune stale
@scheduler.scheduled_job('cron', day_of_week='sun', hour=4)
async def weekly_summarization():
    for user_id in get_active_users():
        await resummarize_categories(user_id)
        await prune_stale_memories(user_id, days=90)

# Monthly: 1st of month 5 AM - Full reindex
@scheduler.scheduled_job('cron', day=1, hour=5)
async def monthly_reindex():
    for user_id in get_active_users():
        await rebuild_embeddings(user_id)
        await archive_dead_nodes(user_id, days=180)
```

Recommendation: Use Mem0 as the core (best retrieval accuracy) and add custom scheduled workers for the decay/consolidation behavior you liked from the article.

This gives you:
- ✅ Production-ready retrieval (Mem0)
- ✅ Real-time ADD/UPDATE/DELETE decisions (Mem0)
- ✅ Explicit scheduled maintenance (custom)
- ✅ Full control over decay policies (custom)
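The duplicate-merging step the nightly job relies on can be approximated with pairwise cosine similarity over stored embeddings. A stdlib-only sketch under stated assumptions: real embeddings would come from the embedding model, `find_semantic_duplicates` is our own helper (not a Mem0 API), and 0.92 is the similarity threshold used in the scheduler above.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def find_semantic_duplicates(memories: dict[str, list[float]],
                             threshold: float = 0.92) -> list[list[str]]:
    """Group memory ids whose embeddings exceed the similarity threshold."""
    ids = list(memories)
    groups, seen = [], set()
    for i, a in enumerate(ids):
        if a in seen:
            continue
        group = [a]
        for b in ids[i + 1:]:
            if b not in seen and cosine(memories[a], memories[b]) >= threshold:
                group.append(b)
                seen.add(b)
        if len(group) > 1:
            groups.append(group)
        seen.add(a)
    return groups

embeds = {
    "m1": [1.0, 0.0, 0.1],
    "m2": [0.99, 0.01, 0.12],  # near-duplicate of m1
    "m3": [0.0, 1.0, 0.0],
}
print(find_semantic_duplicates(embeds))  # → [['m1', 'm2']]
```

In production the pairwise scan would be replaced by the vector store's own nearest-neighbor query, but the grouping logic stays the same.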
Think of it like organizing a library:
| Type | What It Does | Real-World Analogy |
|---|---|---|
| Relational DB (PostgreSQL/SQLite) | Stores structured data with relationships | Excel spreadsheet with linked tables |
| Vector Store (Qdrant/LanceDB) | Stores "meaning" of text for similarity search | Finding books by "vibe" not just keywords |
| Graph DB (Neo4j/Kuzu) | Stores connections between things | Mind map showing how ideas connect |
Memory: "Razvan prefers Python over JavaScript for backend"
│
├─► Relational DB: Stores metadata
│ user_id: razvan, created: 2025-01-19, category: "preferences"
│
├─► Vector Store: Stores meaning for search
│ "What does Razvan like?" → finds this memory
│
└─► Graph DB: Stores relationships
[Razvan] ─prefers─► [Python] ─for─► [Backend]
[Razvan] ─avoids─► [JavaScript] ─for─► [Backend]
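A stdlib-only sketch of that fan-out, with SQLite standing in for the relational layer and graph triples stored as plain rows rather than a real graph engine (the vector layer is omitted since it needs an embedding model; the schema here is illustrative, not CEMS's final one):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE memories (
    id INTEGER PRIMARY KEY, user_id TEXT, category TEXT,
    content TEXT, created TEXT)""")
conn.execute("CREATE TABLE edges (src TEXT, relation TEXT, dst TEXT)")

# Relational layer: the memory row plus its metadata
conn.execute(
    "INSERT INTO memories (user_id, category, content, created) VALUES (?, ?, ?, ?)",
    ("razvan", "preferences", "Prefers Python over JavaScript for backend", "2025-01-19"),
)
# Graph layer: the extracted relationships as triples
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("Razvan", "prefers", "Python"),
    ("Razvan", "avoids", "JavaScript"),
])

rows = conn.execute(
    "SELECT relation, dst FROM edges WHERE src = 'Razvan'").fetchall()
print(rows)  # → [('prefers', 'Python'), ('avoids', 'JavaScript')]
```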
Option A: Embedded (Simpler, Local-First)
SQLite → Stores user data, metadata (single file)
LanceDB → Vector search (single folder, no server)
Kuzu → Graph queries (embedded, no server)
✅ Zero external services
✅ Works offline
✅ Single `~/.cems/` folder
❌ Doesn't scale to team use
Option B: Full Stack (Powerful, Team-Ready)
PostgreSQL → User data, metadata, multi-user
Qdrant → Vector search (dedicated server)
Neo4j → Graph queries (dedicated server)
✅ Scales to team/company
✅ Better performance
✅ Production-grade
❌ Needs Docker or cloud
My Recommendation for You:
Since you want local-first personal initially → Start with Embedded (Option A)
Later, for team/company use → Migrate to Full Stack (Option B)
The code stays the same - just change connection strings.
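One way to make that migration a pure config change is to keep both stacks behind a single config object. The connection strings below are illustrative placeholders, not tested endpoints:

```python
from dataclasses import dataclass

@dataclass
class StorageConfig:
    relational: str
    vector: str
    graph: str

# Option A: embedded, everything under ~/.cems/
embedded = StorageConfig(
    relational="sqlite:///~/.cems/memory.db",
    vector="lancedb://~/.cems/vectors",
    graph="kuzu://~/.cems/graph",
)

# Option B: full stack - same shape, server URLs instead of file paths
full_stack = StorageConfig(
    relational="postgresql://cems@db:5432/cems",
    vector="http://qdrant:6333",
    graph="bolt://neo4j:7687",
)
```

Application code takes a `StorageConfig` and never branches on which backend it is; switching from Option A to Option B means swapping one object.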
┌─────────────────────────────────────────────────────────────────┐
│ CEMS Platform │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Personal │ │ Shared │ │ MCP Server │ │
│ │ Memory │◄──►│ Memory │◄──►│ Interface │ │
│ │ (per-user) │ │ (company) │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘ │
│ │ │ │ │
│ └─────────┬─────────┘ │ │
│ │ │ │
│ ┌─────────▼─────────┐ │ │
│ │ Memory Engine │◄─────────────────────┘ │
│ │ (Mem0/Cognee) │ │
│ └─────────┬─────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────┐ ┌──────────┐ ┌──────────┐ │
│ │Vector│ │Knowledge │ │ Temporal │ │
│ │Store │ │ Graph │ │ Index │ │
│ └──────┘ └──────────┘ └──────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Background Workers │ │
│ │ • Nightly Consolidation • Weekly Summarization │ │
│ │ • Monthly Re-indexing • Memory Decay │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Client Integrations │
├──────────┬──────────┬──────────┬──────────┬────────────────────┤
│ Cursor │ VS Code │ Claude │ Windsurf│ Custom Apps │
│ IDE │ IDE │ Code │ │ (via SDK) │
└──────────┴──────────┴──────────┴──────────┴────────────────────┘
Personal memory:

scope: Individual user
isolation: Complete - only accessible by owner
content:
  - Personal preferences (coding style, tools, patterns)
  - Project-specific context (current work, decisions)
  - Learning history (what worked, what failed)
  - Private notes and todos

Shared memory:

scope: Organization or team
access: Read by all, write with permissions
content:
  - Codebase patterns and conventions
  - Architecture decisions (ADRs)
  - Team knowledge base
  - Shared learnings and best practices
  - Project documentation

Access matrix:

┌────────────────┬─────────────┬──────────────┐
│ Memory Type    │ Read Access │ Write Access │
├────────────────┼─────────────┼──────────────┤
│ Personal       │ Owner only  │ Owner only   │
│ Shared (Team)  │ Team        │ Team         │
│ Shared (Org)   │ Org         │ Admins       │
└────────────────┴─────────────┴──────────────┘
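The write side of the matrix reduces to one small check. The scope and role names below are illustrative, not a final schema:

```python
def can_write(memory_scope: str, requester: dict, memory_owner: str) -> bool:
    """Write access per the matrix: personal = owner only,
    team-shared = any team member, org-shared = admins only."""
    if memory_scope == "personal":
        return requester["user_id"] == memory_owner
    if memory_scope == "shared_team":
        return requester.get("team_member", False)
    if memory_scope == "shared_org":
        return requester.get("role") == "admin"
    return False  # unknown scope: deny by default

alice = {"user_id": "alice", "team_member": True, "role": "member"}
print(can_write("personal", alice, "alice"))    # → True
print(can_write("shared_team", alice, "bob"))   # → True
print(can_write("shared_org", alice, "bob"))    # → False
```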
```python
from datetime import datetime
from typing import Literal

# Layer 1: Resource - immutable, timestamped source of truth
class Resource:
    id: str
    user_id: str
    scope: Literal["personal", "shared"]
    content: str          # Raw conversation/document
    timestamp: datetime
    source: str           # "conversation" | "document" | "code"
    metadata: dict

# Layer 2: MemoryItem - extracted facts from resources
class MemoryItem:
    id: str
    resource_id: str      # Traceability back to the source resource
    user_id: str
    scope: Literal["personal", "shared"]
    category: str         # e.g., "coding_preferences", "project_decisions"
    content: str          # Atomic fact
    confidence: float
    embedding: list[float]
    created_at: datetime
    accessed_at: datetime
    access_count: int

# Layer 3: CategorySummary - high-level context, actively maintained
class CategorySummary:
    id: str
    user_id: str
    scope: Literal["personal", "shared"]
    category: str
    summary: str          # Markdown summary
    item_count: int
    last_updated: datetime
    version: int
```

```typescript
// Memory Write Operations
interface MemoryTools {
  // Add memory (auto-extracts facts)
  "memory_add": {
    content: string;
    scope: "personal" | "shared";
    category?: string;
  };

  // Search memory
  "memory_search": {
    query: string;
    scope?: "personal" | "shared" | "both";
    category?: string;
    limit?: number;
  };

  // Get category summary
  "memory_get_summary": {
    category: string;
    scope: "personal" | "shared";
  };

  // List categories
  "memory_list_categories": {
    scope?: "personal" | "shared" | "both";
  };

  // Forget specific memory
  "memory_forget": {
    memory_id: string;
  };

  // Switch context (for multi-project)
  "memory_set_context": {
    project_id?: string;
    team_id?: string;
  };
}
```

```typescript
// Read-only context
interface MemoryResources {
  "memory://personal/summary": string;   // Personal memory overview
  "memory://shared/summary": string;     // Shared memory overview
  "memory://categories": CategoryList;   // All categories
  "memory://recent": RecentMemories;     // Recently accessed
}
```

Claude Code MCP configuration (`~/.claude/mcp_config.json`):

```json
{
  "mcpServers": {
    "cems": {
      "command": "cems-server",
      "args": ["--user", "${USER}", "--team", "myteam"],
      "env": {
        "CEMS_API_KEY": "${CEMS_API_KEY}",
        "CEMS_SHARED_ENDPOINT": "https://memory.mycompany.com"
      }
    }
  }
}
```

Claude Code skills (`~/.claude/skills/memory/skill.md`):
```
/remember <fact>   # Add to personal memory
/share <fact>      # Add to shared memory
/recall <query>    # Search memory
/context           # Show current memory context
/forget <id>       # Remove specific memory
```

Cursor configuration (`.cursor/mcp.json` or `settings.json`):

```json
{
  "mcp.servers": {
    "cems": {
      "command": "npx",
      "args": ["-y", "@cems/mcp-server"],
      "env": {
        "CEMS_USER": "razvan",
        "CEMS_TEAM": "engineering"
      }
    }
  }
}
```

Scheduled maintenance workers:

```python
# Nightly: merge duplicates, promote hot memories
async def nightly_consolidation(user_id: str):
    # 1. Get today's memories
    recent = await get_memories_since(user_id, hours=24)

    # 2. Find and merge duplicates
    duplicates = find_semantic_duplicates(recent, threshold=0.92)
    for group in duplicates:
        merged = merge_memories(group)
        await replace_memories(group, merged)

    # 3. Promote frequently accessed
    hot_memories = await get_high_access_memories(user_id, threshold=5)
    for mem in hot_memories:
        await increase_priority(mem)

# Weekly: summarize old categories, prune stale items
async def weekly_summarization(user_id: str):
    # 1. Get memories older than 30 days
    old_memories = await get_memories_older_than(user_id, days=30)

    # 2. Re-summarize categories
    categories = group_by_category(old_memories)
    for category, memories in categories.items():
        summary = await generate_summary(memories)
        await update_category_summary(user_id, category, summary)

    # 3. Prune stale memories (not accessed in 90 days)
    stale = await get_stale_memories(user_id, days=90)
    await archive_memories(stale)

# Monthly: full re-index
async def monthly_reindex(user_id: str):
    # 1. Rebuild embeddings with the latest model
    all_memories = await get_all_memories(user_id)
    for mem in all_memories:
        mem.embedding = await generate_embedding(mem.content)

    # 2. Re-weight graph edges by access patterns
    await reweight_graph_edges(user_id)

    # 3. Archive dead nodes (180+ days unused)
    dead_nodes = await find_dead_nodes(user_id, days=180)
    await archive_nodes(dead_nodes)
```

Based on your requirements:
- ✅ Local-first personal use
- ✅ Claude Code only (initially)
- ✅ Memory decay/consolidation (the key feature)
Recommended stack:

```yaml
core_engine: mem0
why: Best retrieval accuracy (66.9%), Apache 2.0

storage:  # Embedded stack (local-first)
  relational: SQLite (~/.cems/memory.db)
  vector: LanceDB (~/.cems/vectors/)
  graph: Kuzu (~/.cems/graph/)  # optional, adds relationships

mcp_server: Python (FastMCP)
why: Native mem0 SDK, single-command startup

background_jobs: APScheduler (embedded)
why: No Redis/Celery needed for local use

deployment: Single Python package
```

```bash
pip install cems
cems-server start  # That's it
```

| Component | Build or Use? | Details |
|---|---|---|
| Memory retrieval | Use Mem0 | Don't reinvent - it's proven |
| MCP interface | Build | Thin wrapper around Mem0 |
| Scheduled decay jobs | Build | The article's approach (custom) |
| Personal/Shared tiers | Build | Namespace isolation layer |
| Claude Code skills | Build | /remember, /recall, /forget |
Local-first (personal):

```bash
# Single user, everything local
cems-server start --mode personal --storage ~/.cems
```

- SQLite + local embeddings
- No network required
- Data stays on machine

Self-hosted (team):

```bash
# Team deployment with shared memory
docker-compose up -d
# Includes: Postgres, Qdrant, Neo4j, CEMS API, Workers
```

- Shared memory accessible by team
- Personal memories isolated per user
- Central dashboard for management

Cloud (hosted):

```bash
# Connect to hosted service
cems-server start --cloud --api-key $CEMS_API_KEY
```

- No infrastructure management
- Multi-region support
- SOC 2 compliant
┌─────────────────────────────────────────────────────────────┐
│ CEMS (What We Build) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────────────────┐ │
│ │ MCP Server │ │ Scheduler │ │ Claude Code Skills │ │
│ │ (FastMCP) │ │ (APScheduler│ │ /remember /recall │ │
│ └──────┬──────┘ └──────┬──────┘ └──────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Memory Wrapper (namespace isolation) ││
│ │ personal:{user_id} | shared:{team_id} ││
│ └──────────────────────────┬──────────────────────────────┘│
└─────────────────────────────┼───────────────────────────────┘
│ pip install mem0ai
▼
┌─────────────────────────────────────────────────────────────┐
│ Mem0 (Dependency) │
│ • Fact extraction prompts │
│ • ADD/UPDATE/DELETE/NONE logic │
│ • Qdrant vector store │
│ • SQLite history │
└─────────────────────────────────────────────────────────────┘
Files to create:
├── cems/
│ ├── __init__.py
│ ├── memory.py # Mem0 wrapper with namespace isolation
│ ├── config.py # Configuration dataclasses
│ └── models.py # Extended metadata models
Key code:
```python
# cems/memory.py
from mem0 import Memory

class CEMSMemory:
    def __init__(self, user_id: str, team_id: str | None = None):
        # Personal memory namespace
        self.personal = Memory()
        self.user_id = user_id
        # Shared memory (optional)
        self.team_id = team_id
        self.shared = Memory() if team_id else None

    def add(self, content: str, scope: str = "personal"):
        """Add memory to the personal or shared namespace."""
        if scope == "personal":
            return self.personal.add(content, user_id=self.user_id)
        elif scope == "shared" and self.shared:
            return self.shared.add(content, user_id=f"team:{self.team_id}")

    def search(self, query: str, scope: str = "both"):
        """Search across namespaces."""
        results = []
        if scope in ("personal", "both"):
            results.extend(self.personal.search(query, user_id=self.user_id))
        if scope in ("shared", "both") and self.shared:
            results.extend(self.shared.search(query, user_id=f"team:{self.team_id}"))
        return results
```

Checklist:
- Install mem0ai dependency
- Create CEMSMemory wrapper class
- Implement namespace isolation (personal/shared)
- Add extended metadata tracking (access_count, last_accessed)
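The extended-metadata item can be sketched as a thin record the search path updates on every hit. Field names follow the MemoryItem model defined earlier; `touch` is a hypothetical helper, and persistence back to storage is elided:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class TrackedMemory:
    id: str
    content: str
    access_count: int = 0
    accessed_at: Optional[datetime] = None

def touch(mem: TrackedMemory) -> TrackedMemory:
    """Bump access metadata; call whenever a search returns this memory."""
    mem.access_count += 1
    mem.accessed_at = datetime.now(timezone.utc)
    return mem

mem = TrackedMemory(id="m1", content="Prefers Python for backend")
for _ in range(3):
    touch(mem)
print(mem.access_count)  # → 3
```

The nightly job's "promote hot memories" step then only has to query for records whose `access_count` crosses a threshold.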
Files to create:
├── cems/
│ ├── scheduler.py # APScheduler jobs
│ ├── maintenance/
│ │ ├── __init__.py
│ │ ├── consolidation.py # Nightly: merge duplicates
│ │ ├── summarization.py # Weekly: compress old memories
│ │ └── reindex.py # Monthly: rebuild embeddings
Key code:
```python
# cems/scheduler.py
from apscheduler.schedulers.background import BackgroundScheduler
from cems.maintenance import consolidation, summarization, reindex

scheduler = BackgroundScheduler()

# Nightly: 3 AM - Merge semantic duplicates, promote hot memories
@scheduler.scheduled_job('cron', hour=3)
def nightly_job():
    for user_id in get_active_users():
        consolidation.merge_duplicates(user_id, similarity_threshold=0.92)
        consolidation.promote_hot_memories(user_id, access_threshold=5)

# Weekly: Sunday 4 AM - Summarize old categories, prune stale
@scheduler.scheduled_job('cron', day_of_week='sun', hour=4)
def weekly_job():
    for user_id in get_active_users():
        summarization.compress_old_memories(user_id, older_than_days=30)
        summarization.prune_stale(user_id, not_accessed_days=90)

# Monthly: 1st at 5 AM - Full reindex
@scheduler.scheduled_job('cron', day=1, hour=5)
def monthly_job():
    for user_id in get_active_users():
        reindex.rebuild_embeddings(user_id)
        reindex.archive_dead(user_id, not_accessed_days=180)
```

Checklist:
- Nightly consolidation job
- Weekly summarization job
- Monthly re-indexing job
- Manual trigger CLI (`cems-cli run-maintenance`)
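The manual-trigger CLI is a small argparse shim over the same job functions. A sketch: the job names mirror the maintenance jobs above, and the lambdas are placeholders for the real entry points in `cems.maintenance`.

```python
import argparse

# Placeholders for the real maintenance entry points in cems.maintenance
JOBS = {
    "run-consolidation": lambda: "merging duplicates, promoting hot memories",
    "run-summarization": lambda: "compressing old categories, pruning stale items",
    "run-reindex": lambda: "rebuilding embeddings, archiving dead nodes",
}

def main(argv=None) -> str:
    parser = argparse.ArgumentParser(prog="cems-cli")
    parser.add_argument("job", choices=JOBS)
    args = parser.parse_args(argv)
    return JOBS[args.job]()

print(main(["run-consolidation"]))  # → merging duplicates, promoting hot memories
```

Forcing a job by hand is useful both for testing and for recovering after the scheduler host was offline at the cron time.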
Files to create:
├── cems/
│ ├── server.py # FastMCP server
│ └── tools.py # MCP tools implementation
Key code:
```python
# cems/server.py
import os

from mcp.server.fastmcp import FastMCP
from cems.memory import CEMSMemory

mcp = FastMCP("CEMS Memory Server")
# Authentication: user/team identity comes from the environment
memory = CEMSMemory(user_id=os.environ["CEMS_USER"],
                    team_id=os.environ.get("CEMS_TEAM"))

@mcp.tool()
def memory_add(content: str, scope: str = "personal") -> dict:
    """Add a memory to your personal or shared namespace."""
    return memory.add(content, scope=scope)

@mcp.tool()
def memory_search(query: str, scope: str = "both", limit: int = 5) -> list:
    """Search your memories."""
    return memory.search(query, scope=scope)[:limit]

@mcp.tool()
def memory_forget(memory_id: str) -> dict:
    """Delete a specific memory."""
    return memory.delete(memory_id)
```

Checklist:
- FastMCP server with memory tools
- Authentication (user_id from env)
- MCP config generator
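The config generator only has to emit the JSON shown in the Claude Code setup section. A sketch that writes to a temp path (the real target would be the user's MCP config file; `generate_mcp_config` is a hypothetical helper name):

```python
import json
import os
import tempfile

def generate_mcp_config(user: str, team: str) -> dict:
    """Build the mcpServers entry shown in the Claude Code setup section."""
    return {
        "mcpServers": {
            "cems": {
                "command": "cems-server",
                "args": ["--user", user, "--team", team],
                "env": {"CEMS_API_KEY": "${CEMS_API_KEY}"},
            }
        }
    }

config = generate_mcp_config("razvan", "myteam")
path = os.path.join(tempfile.mkdtemp(), "mcp_config.json")
with open(path, "w") as f:
    json.dump(config, f, indent=2)

print(config["mcpServers"]["cems"]["args"])
# → ['--user', 'razvan', '--team', 'myteam']
```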
Files to create:
├── ~/.claude/
│ ├── mcp_config.json # CEMS server config
│ └── skills/cems/
│ └── skill.md # /remember, /recall, /forget commands
Skill definitions:
# /remember <fact>
Add a memory to your personal store.
Uses: memory_add tool with scope="personal"

# /share <fact>
Add a memory to the shared team store.
Uses: memory_add tool with scope="shared"

# /recall <query>
Search your memories for relevant information.
Uses: memory_search tool

# /forget <id>
Remove a specific memory.
Uses: memory_forget tool

Checklist:
- MCP config template
- Skill definitions
- Installation script
After implementation, test with:

```bash
# 1. Start the server
cems-server start

# 2. In Claude Code, verify MCP connection
#    Should see "cems" in available tools

# 3. Test memory operations
/remember "I prefer Python for backend work"
/recall "What do I prefer for backend?"
#    Should return the memory

# 4. Test scheduled jobs manually
cems-cli run-consolidation   # Force nightly job
cems-cli run-summarization   # Force weekly job

# 5. Check storage
ls ~/.cems/
#    Should see: memory.db, vectors/, (optionally graph/)
```

References:
- Mem0 GitHub
- Letta/MemGPT GitHub
- Cognee GitHub
- AI Memory Benchmark Comparison
- Survey of AI Agent Memory Frameworks
- MCP Memory Service
- OpenMemory MCP
- Collaborative Memory: Multi-User Memory Sharing
- Memory Engineering for Multi-Agent Systems
If you need specific details from before exiting plan mode (like exact code snippets, error messages, or content you generated), read the full transcript at:
/Users/razvan/.claude/projects/-Users-razvan-Development-llm-memory/4c35153c-caa9-4e6a-a319-13f37949f9cf.jsonl
- Phase 1: Core Memory Wrapper
  - Install mem0ai dependency
  - Create CEMSMemory wrapper class
  - Implement namespace isolation (personal/shared)
  - Add extended metadata tracking
- Phase 2: Scheduled Maintenance
  - Nightly consolidation job
  - Weekly summarization job
  - Monthly re-indexing job
  - Manual trigger CLI
- Phase 3: MCP Server
  - FastMCP server with memory tools
  - Authentication
  - MCP config generator
- Phase 4: Claude Code Integration
  - MCP config template
  - Skill definitions
  - Installation script