Cleverse/honcho

 
 

Static Badge PyPI version NPM version Discord

Honcho is memory infrastructure for building stateful agents that understand changing people, agents, groups, projects, and ideas over time.

Store messages and events, let Honcho reason in the background, then query peer representations, session context, search results, or natural-language insights from any model or framework. Use it managed at api.honcho.dev or self-host the FastAPI server yourself.

Using Honcho as your memory system can earn your agents higher retention and more trust, and help you build data moats to out-compete incumbents.

Honcho has defined the Pareto Frontier of Agent Memory. Watch the video, check out our evals page, and read the blog post for more detail.

The Honcho project is split across several repositories. This one hosts the core service logic, implemented as a FastAPI server. Client SDKs for Python and TypeScript live in the sdks/ directory.

Start Here

| I want to... | Path | Get started |
|---|---|---|
| Give my coding agent persistent memory | Claude Code, OpenCode, OpenClaw, Hermes, or any MCP client | Integrations |
| Add memory to my product | Python or TypeScript SDK | Quickstart |
| Self-host Honcho | Docker / local development | Self-hosting |

Why Honcho

| Capability | What it means |
|---|---|
| Reasoning-first memory | Extracts conclusions from conversations and events, not just matching chunks. |
| Peer-centric model | Tracks users, agents, groups, projects, and ideas as entities that change over time. |
| Multi-peer perspective | Models what one peer knows about another when configured. |
| Managed or self-hosted | Use api.honcho.dev or run the FastAPI server yourself. |
| Agent-tool integrations | MCP, Claude Code, OpenCode, OpenClaw, Hermes, Cursor-compatible clients. |

The Honcho Loop

  1. Store conversations, events, documents, or tool traces as messages on a session.
  2. Reason — Honcho processes the queue in the background and updates peer representations.
  3. Query — ask Honcho for context, search results, peer representations, or a natural-language answer.
  4. Inject — drop the result into any LLM call or agent framework.

Concretely: workspaces hold peers, peers participate in sessions, messages live on sessions, and Honcho builds a per-peer representation that you query through the Chat Endpoint or directly.

Quickstart

Get an API key at app.honcho.dev. When you sign up you'll be prompted to join an organization, which gets its own dedicated Honcho instance and $100 in free credits. Alternatively, self-host and run against http://localhost:8000.

Python

pip install honcho-ai
# or: uv add honcho-ai
# or: poetry add honcho-ai

import os
from honcho import Honcho

# Managed service uses api.honcho.dev by default. For self-hosted, pass
# base_url="http://localhost:8000" or set HONCHO_URL.
honcho = Honcho(
    workspace_id="my-app-testing",
    api_key=os.environ["HONCHO_API_KEY"],
)

# 1. Store: peers and messages on a session
alice = honcho.peer("alice")
tutor = honcho.peer("tutor")
session = honcho.session("session-1")
session.add_messages([
    alice.message("Hey there — can you help me with my math homework?"),
    tutor.message("Absolutely. Send me your first problem!"),
])

# 2. Reason: happens asynchronously in the background.

# 3. Query: ask Honcho what it knows, or pull prompt-ready context.
answer = alice.chat("What learning styles does the user respond to best?")
context = session.context(summary=True, tokens=10_000)

# 4. Inject: hand the context to your model of choice.
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
    model=os.environ.get("OPENAI_MODEL", "gpt-4o-mini"),
    messages=context.to_openai(assistant=tutor),
)

TypeScript

npm install @honcho-ai/sdk
# or: bun add @honcho-ai/sdk

import { Honcho } from "@honcho-ai/sdk";
import OpenAI from "openai";

const honcho = new Honcho({
  workspaceId: "my-app-testing",
  apiKey: process.env.HONCHO_API_KEY,
});

const alice = await honcho.peer("alice");
const tutor = await honcho.peer("tutor");
const session = await honcho.session("session-1");
await session.addMessages([
  alice.message("Hey there — can you help me with my math homework?"),
  tutor.message("Absolutely. Send me your first problem!"),
]);

const answer = await alice.chat(
  "What learning styles does the user respond to best?",
);
const context = await session.context({ summary: true, tokens: 10_000 });

const openai = new OpenAI();
const completion = await openai.chat.completions.create({
  model: process.env.OPENAI_MODEL ?? "gpt-4o-mini",
  messages: context.toOpenAI({ assistant: tutor }),
});

Note: background reasoning is asynchronous. Newly added messages may take a moment to be reflected in chat and representation responses. For low-latency reads, use the representation endpoint.
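Because reasoning runs in the background, a caller that needs freshly derived results may want to wait until processing has caught up. The following is a minimal polling sketch, not part of the Honcho SDK. The `is_done` callable is hypothetical: you would implement it yourself, for example by wrapping `honcho.queue_status(...)`, whose return shape is not shown in this README.

```python
import time
from typing import Callable


def wait_until(is_done: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll is_done() until it returns True or the timeout elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage would look like `wait_until(my_queue_is_empty, timeout=60)` before calling `alice.chat(...)`, where `my_queue_is_empty` is your own wrapper.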

What Honcho Gives You

| Need | API |
|---|---|
| Save interaction history | session.add_messages(...) |
| Ask what Honcho knows about a peer | peer.chat(...) |
| Get prompt-ready context | session.context(...).to_openai(...) / .to_anthropic(...) |
| Hybrid search (BM25 + vector) | peer.search(...), session.search(...), honcho.search(...) |
| Low-latency static representations | peer.representation(...), session.representation(...) |
| Import documents | session.upload_file(...) |
| Inspect background processing | honcho.queue_status(...) |

See the full SDK Reference and API Reference.

Integrations

Claude Code

Two ways, depending on how deep you want to go:

Plugin (richer integration — recommended for Claude Code users):

/plugin marketplace add plastic-labs/claude-honcho
/plugin install honcho@honcho

Raw MCP (works in any MCP client — Cursor, Cline, Windsurf, etc.):

claude mcp add honcho \
  --transport http \
  --url "https://mcp.honcho.dev" \
  --header "Authorization: Bearer hch-your-key-here" \
  --header "X-Honcho-User-Name: YourName"

Details: Claude Code guide · MCP guide.

OpenCode

opencode plugin "@honcho-ai/opencode-honcho" --global

Details: OpenCode guide.

OpenClaw

openclaw plugins install @honcho-ai/openclaw-honcho
openclaw honcho setup
openclaw gateway --force

openclaw honcho setup prompts for your API key, writes the config, and optionally migrates legacy MEMORY.md / USER.md / IDENTITY.md files into Honcho (non-destructive — originals are never deleted). Details: OpenClaw guide.

Hermes

hermes memory setup   # select "honcho", point at api.honcho.dev or your local server

Details: Hermes guide.

Add Honcho to your own codebase (agent skill)

For wiring the Honcho SDK into an existing application, install the integration skill — it explores your codebase, asks about integration preferences, generates the SDK setup, and verifies it works:

npx skills add plastic-labs/honcho

Then invoke /honcho-integration in Claude Code (or /honcho-dev:integrate via the plugin marketplace). Details: agentic development guide.

Other MCP clients

The same claude mcp add form (or its client-specific equivalent) works in any MCP-compatible client. See MCP guide.

Core Concepts

Honcho organises everything around peers — humans and AI agents alike are first-class entities. The peer model enables:

  • Multi-participant sessions with mixed human and AI agents
  • Configurable observation settings (which peers observe which others)
  • Flexible identity management for all participants
  • Support for complex multi-agent interactions

Peers exchange messages within sessions; Honcho reasons over those messages to build a representation of each peer that you can query.

  • Workspace (formerly App): top-level container; isolates data between use cases.
  • Peer (formerly User): any participant — human user or AI agent.
  • Session: a conversation context; many-to-many with peers.
  • Message: an atomic data unit (peer-to-peer communication or ingested document chunk).

What you query out of Honcho:

  • Conclusions: what Honcho has extracted about a peer (deductive and inductive). Exposed via the conclusions API.
  • Representations: static, low-latency snapshots of what Honcho knows about a peer (optionally session-scoped).
  • Peer Cards: compact identity summaries.
  • Session context / summaries: prompt-ready bundles for long-running conversations.

Internal storage (Collections & Documents)

Internally, Honcho stores peer-related observations in collections of vector-embedded documents. Collections are keyed by (observer, observed) peer pairs — the same mechanism powers self-representation (observer == observed) and cross-peer modelling (peer X's understanding of peer Y). These primitives are not exposed directly; the Conclusions API is the public surface.
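To make the (observer, observed) keying concrete, here is a small in-memory sketch. It is a hypothetical stand-in, not Honcho's storage code: the real collections hold vector-embedded documents in the database, and the names `collections`, `record`, and `is_self_representation`, along with the sample observations, are invented for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the internal collections described above.
# Keys are (observer, observed) peer-ID pairs; values are lists of observations.
collections: dict[tuple[str, str], list[str]] = defaultdict(list)


def record(observer: str, observed: str, observation: str) -> None:
    """Store an observation that `observer` holds about `observed`."""
    collections[(observer, observed)].append(observation)


def is_self_representation(key: tuple[str, str]) -> bool:
    """A collection is a self-representation when observer == observed."""
    observer, observed = key
    return observer == observed


# Illustrative data only.
record("alice", "alice", "Prefers worked examples")   # self-representation
record("tutor", "alice", "Struggles with fractions")  # tutor's view of alice
```

The same dictionary shape serves both cases, which mirrors the point above that one mechanism covers self-representation and cross-peer modelling.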

Benchmarks & Evals

Honcho's evals span LongMemEval, LoCoMo, and other long-conversation benchmarks. See the evals page, the research blog post, and the Pareto-frontier announcement video for methodology and reproducible results.

Self-hosting

Honcho is open source under AGPL-3.0. You can run the full server locally with Docker, then point the SDKs at http://localhost:8000.

Quick start (Docker)

git clone https://github.com/plastic-labs/honcho.git
cd honcho
cp docker-compose.yml.example docker-compose.yml
cp .env.template .env       # fill in LLM_GEMINI_API_KEY / LLM_ANTHROPIC_API_KEY / LLM_OPENAI_API_KEY
docker compose up

Then point the SDKs at it:

honcho = Honcho(workspace_id="my-app-testing", base_url="http://localhost:8000")
# or: export HONCHO_URL=http://localhost:8000

Local development without Docker

Below is a guide on setting up a local environment for running the Honcho Server without Docker.

Prerequisites and Dependencies

Honcho is developed using Python and uv.

The minimum Python version is 3.10. The minimum uv version is 0.5.0.

Setup

Once the dependencies are installed on the system, run the following steps to set up the local project.

  1. Clone the repository

git clone https://github.com/plastic-labs/honcho.git

  2. Enter the repository and install the Python dependencies

We recommend using a virtual environment to isolate the dependencies for Honcho from other projects on the same system. uv will create a virtual environment when you sync your dependencies in the project.

cd honcho
uv sync

This creates a virtual environment and installs the dependencies for Honcho. By default the virtual environment is located at honcho/.venv. Because you are already inside the honcho directory, activate it via:

source .venv/bin/activate

  3. Set up a database

Honcho uses Postgres with the pgvector extension as its database. An easy way to get started with a Postgres database is to create a project with Supabase.

Alternatively, a docker-compose template is available with a sample database configuration. To use Docker:

cp docker-compose.yml.example docker-compose.yml
docker compose up -d database

  4. Edit the environment variables

Honcho uses a .env file for managing runtime environment variables. A .env.template file is included for convenience. Several settings are optional and are needed only for additional logging, monitoring, and security.

Below are the required configurations:

DB_CONNECTION_URI= # Connection uri for a postgres database (with postgresql+psycopg prefix)

# LLM Provider API Keys
LLM_GEMINI_API_KEY= # API Key for Google Gemini (used for deriver, summary, and dialectic minimal/low by default)
LLM_ANTHROPIC_API_KEY= # API Key for Anthropic (used for dialectic medium/high/max and dream by default)
LLM_OPENAI_API_KEY= # API Key for OpenAI (used for embeddings when EMBED_MESSAGES=true)

Note that DB_CONNECTION_URI must start with the postgresql+psycopg prefix to function properly. This is a requirement imposed by SQLAlchemy.
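A quick way to catch a wrong prefix before starting the server is to check the URI scheme. This is a minimal sketch using only the standard library. The helper name `check_db_uri` is hypothetical and not part of Honcho.

```python
from urllib.parse import urlsplit

REQUIRED_SCHEME = "postgresql+psycopg"


def check_db_uri(uri: str) -> str:
    """Return the URI unchanged if its scheme is postgresql+psycopg, else raise."""
    scheme = urlsplit(uri).scheme
    if scheme != REQUIRED_SCHEME:
        raise ValueError(
            f"DB_CONNECTION_URI must start with {REQUIRED_SCHEME}://, got scheme {scheme!r}"
        )
    return uri
```

For example, `check_db_uri("postgresql://localhost/honcho_dev")` raises, because the driver suffix is missing.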

The template has the additional functionality disabled by default. To confirm it is disabled, verify that the following environment variables are set to false:

AUTH_USE_AUTH=false
SENTRY_ENABLED=false

If you set AUTH_USE_AUTH to true you will need to generate a JWT secret. You can do this with the following command:

python scripts/generate_jwt_secret.py

The script generates a JWT secret and prints it to the console. Set it as the AUTH_JWT_SECRET environment variable, which is required when AUTH_USE_AUTH is true:

AUTH_JWT_SECRET=<generated_secret>

  5. Run database migrations

With the database set up and environment variables configured, run the migrations to create the necessary tables:

uv run alembic upgrade head

This will create all tables for Honcho including workspaces, peers, sessions, messages, and the queue system.

  6. Launch Honcho

With everything set up, you can now launch a local instance of Honcho. In addition to the database, two components need to be running:

Start the API server:

uv run fastapi dev src/main.py

This is a development server that will reload whenever code is changed.

Start a background worker (deriver):

In a separate terminal, run:

uv run python -m src.deriver

The deriver generates representations, summaries, and peer cards, and manages dreaming tasks. You can run additional deriver processes to increase throughput.

Contributors: see CONTRIBUTING.md for pre-commit setup. Deploying to Fly.io: see Self-hosting docs → Deploying on Fly.io.

Configuration

Honcho uses a flexible configuration system that supports both TOML files and environment variables. Configuration values are loaded in priority order: environment variables > .env file > config.toml > defaults.

Full configuration reference

Using config.toml

Copy the example configuration file to get started:

cp config.toml.example config.toml

Then modify the values as needed. The TOML file is organized into sections:

  • [app] - Application-level settings (log level, session limits, embedding settings, namespace)
  • [db] - Database connection and pool settings
  • [auth] - Authentication configuration
  • [cache] - Redis cache configuration
  • [llm] - LLM provider API keys and general settings
  • [deriver] - Background worker settings and representation configuration
  • [peer_card] - Peer card generation settings
  • [dialectic] - Chat Endpoint configuration with per-level reasoning settings
  • [summary] - Session summarization settings
  • [dream] - Dream processing configuration (including specialist models and surprisal settings)
  • [webhook] - Webhook configuration
  • [metrics] - Prometheus pull-based metrics
  • [telemetry] - CloudEvents telemetry for analytics
  • [vector_store] - Vector store configuration (pgvector, turbopuffer, or lancedb)
  • [sentry] - Error tracking and monitoring settings

Using Environment Variables

All configuration values can be overridden using environment variables. The environment variable names follow this pattern:

  • {SECTION}_{KEY} for top-level section settings
  • Use __ inside {KEY} for nested settings
  • Just {KEY} for app-level settings

Examples:

  • DB_CONNECTION_URI - Database connection string
  • AUTH_JWT_SECRET - JWT secret key
  • DERIVER_MODEL_CONFIG__TRANSPORT - Transport for the background deriver
  • SUMMARY_MODEL_CONFIG__MODEL - Summary model override
  • DIALECTIC_LEVELS__low__MODEL_CONFIG__MODEL - Model for low reasoning level
  • LOG_LEVEL - Application log level
  • METRICS_ENABLED - Enable Prometheus metrics
  • TELEMETRY_ENABLED - Enable CloudEvents telemetry
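The naming pattern can be expressed as a small function. This is an illustrative sketch of the rules listed above, not code from Honcho. The function name `env_var_name` is hypothetical, and it assumes that app-level settings are the ones in the `[app]` section.

```python
def env_var_name(section: str, *key_path: str) -> str:
    """Build an environment variable name from a TOML section and key path.

    Nested key parts are joined with "__". Settings in the "app" section
    use the bare key with no section prefix.
    """
    if not key_path:
        raise ValueError("at least one key is required")
    key = "__".join(key_path)
    if section.lower() == "app":
        return key
    return f"{section.upper()}_{key}"


print(env_var_name("db", "CONNECTION_URI"))
print(env_var_name("deriver", "MODEL_CONFIG", "TRANSPORT"))
print(env_var_name("dialectic", "LEVELS", "low", "MODEL_CONFIG", "MODEL"))
print(env_var_name("app", "LOG_LEVEL"))
```

The four calls reproduce DB_CONNECTION_URI, DERIVER_MODEL_CONFIG__TRANSPORT, DIALECTIC_LEVELS__low__MODEL_CONFIG__MODEL, and LOG_LEVEL from the examples list.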

Example

If you have this in config.toml:

[db]
CONNECTION_URI = "postgresql+psycopg://localhost/honcho_dev"
POOL_SIZE = 10

You can override just the connection URI in production:

export DB_CONNECTION_URI="postgresql+psycopg://prod-server/honcho_prod"

The application will use the production connection URI while keeping the pool size from config.toml.
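The priority order (environment variables > .env file > config.toml > defaults) behaves like a layered lookup. The sketch below models it with `collections.ChainMap` using the values from the example. It illustrates the lookup order only and is not how Honcho loads its settings. The default pool size of 5 is a made-up placeholder.

```python
from collections import ChainMap

# Each layer maps flattened setting names to values.
defaults = {"DB_POOL_SIZE": 5}  # hypothetical default, not Honcho's actual value
config_toml = {
    "DB_CONNECTION_URI": "postgresql+psycopg://localhost/honcho_dev",
    "DB_POOL_SIZE": 10,
}
dotenv_file = {}  # nothing set in .env for this example
environment = {"DB_CONNECTION_URI": "postgresql+psycopg://prod-server/honcho_prod"}

# ChainMap searches its maps left to right, so earlier layers win.
settings = ChainMap(environment, dotenv_file, config_toml, defaults)

print(settings["DB_CONNECTION_URI"])  # postgresql+psycopg://prod-server/honcho_prod
print(settings["DB_POOL_SIZE"])       # 10
```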

Architecture

Honcho splits into two services: Storage (workspaces, peers, sessions, messages, internal collections) and Insights (reasoning, conclusions, representations, summaries, the chat endpoint). Storage is synchronous via the API; Insights is asynchronous via a background queue consumed by the deriver worker process.
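The split between synchronous storage and asynchronous insights can be pictured with a queue and a worker thread. This is a toy sketch under stated assumptions, not Honcho's implementation: the real system persists to Postgres and runs the deriver as a separate process, and the "representation" here is just a per-peer message count standing in for real reasoning output. All names are hypothetical.

```python
import queue
import threading

messages: list[tuple[str, str]] = []   # synchronous storage: (peer_id, content)
representations: dict[str, int] = {}   # stand-in insight: messages seen per peer
tasks: queue.Queue = queue.Queue()     # background work queue


def add_message(peer_id: str, content: str) -> None:
    """Storage path: persist immediately, then enqueue background work."""
    messages.append((peer_id, content))
    tasks.put((peer_id, content))


def deriver_worker() -> None:
    """Insights path: consume tasks in FIFO order until a None sentinel arrives."""
    while True:
        task = tasks.get()
        try:
            if task is None:
                return
            peer_id, _content = task
            representations[peer_id] = representations.get(peer_id, 0) + 1
        finally:
            tasks.task_done()


worker = threading.Thread(target=deriver_worker, daemon=True)
worker.start()

add_message("alice", "Can you help me with my math homework?")
add_message("tutor", "Absolutely. Send me your first problem!")
add_message("alice", "Here is problem one.")

tasks.join()     # wait for the background queue to drain
tasks.put(None)  # tell the worker to stop
worker.join()
```

Note that `add_message` returns as soon as the message is stored. The derived state only catches up once the worker has processed the queue, which is the same reason fresh messages may not appear in chat responses right away.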

Key features:

  • Rich Reasoning System: multiple implementation methods that extract conclusions from interactions and build comprehensive representations of peers
  • Chat Endpoint: reasoning-informed responses that integrate conclusions with current context
  • Background Processing: asynchronous processing pipeline for expensive operations like representation updates and session summarization
  • Multi-Provider Support: configurable LLM providers for different use cases

Storage primitives in detail

Honcho contains several different primitives used for storing application and peer data. This data is used for managing conversations, modeling peer identity, building RAG applications, and more.

The philosophy behind Honcho is to provide a platform that is peer-centric and easily scalable from a single user to a million.

Below is a mapping of the different primitives and their relationships.

Workspaces
├── Peers ←──────────────────┐
│   ├── Sessions             │
│   └── (internal collections, keyed by observer/observed peer pair)
│                            │
│                            │
└── Sessions ←───────────────┤ (many-to-many)
    ├── Peers ───────────────┘
    └── Messages (session-level)

Relationship Details:

  • A Workspace contains multiple Peers.
  • Peers and Sessions have a many-to-many relationship (peers can participate in multiple sessions, sessions can have multiple peers).
  • Messages belong to a session and are labelled by their source peer.
  • Internal collections of vector-embedded documents are keyed by (observer, observed) peer pairs. They are not directly exposed via the API; the observations stored in them are exposed as Conclusions.
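The relationships above can be sketched with plain dataclasses. This is a hypothetical in-memory model for illustration only. The class and method names are not Honcho's schema or SDK, and internal collections are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    peer_id: str   # source peer label
    content: str


@dataclass
class Session:
    id: str
    peer_ids: set[str] = field(default_factory=set)
    messages: list[Message] = field(default_factory=list)

    def add_message(self, peer_id: str, content: str) -> None:
        self.peer_ids.add(peer_id)  # in this sketch, posting implies participation
        self.messages.append(Message(peer_id, content))


@dataclass
class Workspace:
    id: str
    sessions: dict[str, Session] = field(default_factory=dict)

    def session(self, session_id: str) -> Session:
        """Get or create a session by ID."""
        return self.sessions.setdefault(session_id, Session(session_id))

    def sessions_for(self, peer_id: str) -> list[str]:
        """Many-to-many lookup: every session a peer participates in."""
        return sorted(s.id for s in self.sessions.values() if peer_id in s.peer_ids)


ws = Workspace("my-app-testing")
ws.session("session-1").add_message("alice", "Can you help me with my math homework?")
ws.session("session-1").add_message("tutor", "Absolutely. Send me your first problem!")
ws.session("session-2").add_message("alice", "Back with another question.")
```

Here `ws.sessions_for("alice")` covers both sessions while `ws.session("session-1").peer_ids` contains both peers, which is the many-to-many relationship in both directions.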

Users familiar with APIs such as the OpenAI Assistants API will recognize much of the mapping here.

Workspaces

This is the top level construct of Honcho. Developers can register different Workspaces for different assistants, agents, AI enabled features, etc. It is a way to isolate data between use cases and provide multi-tenant capabilities.

Peers

Within a Workspace everything revolves around a Peer. The Peer object represents any participant in the system — whether human users or AI agents. This unified model enables complex multi-participant interactions.

Sessions

The Session object represents a set of interactions between Peers within a Workspace. Other applications may refer to this as a thread or conversation. Sessions can involve multiple peers with configurable observation settings.

Messages

The Message represents an atomic data unit that exists at the session level: communication between peers within a session context. All messages are labelled by their source peer and can be processed asynchronously to update peer representations. This flexible design allows for both conversational interactions and broader data ingestion for personality modelling.

Reasoning pipeline

The reasoning functionality of Honcho is built on top of the Storage service. As Messages and Sessions are created for Peers, Honcho will asynchronously reason about peer psychology to derive facts about them and store them in reserved internal collections.

A high level summary of the pipeline is as follows:

  1. Messages are created via the API.
  2. Derivation tasks are enqueued for background processing, including:
    • representation: update representations of Peers.
    • summary: create summaries of Sessions.
  3. Session-based queue processing ensures proper ordering.
  4. Results are stored internally and surfaced via the Conclusions API, Representations, Peer Cards, and the Chat Endpoint.

Retrieving data and insights

Honcho exposes several different ways to retrieve data from the system to best serve the needs of any given application.

Get Context

In long-running conversations with an LLM, the context window can fill up quickly. To address this, Honcho provides a context endpoint that returns a combination of messages, conclusions, and summaries from a session, up to a provided token limit.

Use this to keep sessions going indefinitely. If you'd like to see this in action, try out Honcho Chat.
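The idea of fitting context under a token limit can be sketched as follows. This is a simplified illustration, not the context endpoint's actual selection logic, which is not described here. It assumes a summary is always included first and approximates tokens by whitespace-separated words. The function name `build_context` is hypothetical.

```python
def build_context(summary: str, messages: list[str], token_limit: int) -> list[str]:
    """Return the summary plus the most recent messages that fit within token_limit.

    Token counts are approximated by whitespace-separated words; a real
    tokenizer would give different numbers.
    """
    def count(text: str) -> int:
        return len(text.split())

    budget = token_limit - count(summary)
    if budget < 0:
        return []  # not even the summary fits

    kept: list[str] = []
    for message in reversed(messages):  # walk from newest to oldest
        cost = count(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return [summary] + kept[::-1]       # restore chronological order
```

Older messages drop out first, so the summary carries the long-range history while recent turns stay verbatim.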

Search

There are several search endpoints that let developers query messages at the Workspace, Session, or Peer level using a hybrid search strategy.

Requests can include advanced filters to further refine the results.
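This README says the search is hybrid (BM25 + vector) but does not say how the two rankings are merged. One common, generic way to combine a keyword ranking with a vector ranking is reciprocal rank fusion. The sketch below shows that technique as an illustration and should not be read as a description of Honcho's implementation. The message IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists into one using reciprocal rank fusion.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["m1", "m2", "m3"]    # illustrative keyword ranking
vector_hits = ["m2", "m4", "m1"]  # illustrative embedding ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Items that appear near the top of both lists rise above items that appear in only one.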

Chat API

The flagship interface for using these insights is the Chat Endpoint (POST /peers/{peer_id}/chat). It takes natural-language requests to get data about a peer and returns reasoning-grounded responses. Examples:

  • Asking Honcho for a generic or specific insight about the peer.
  • Asking Honcho to hydrate a prompt with data about the peer's behaviour.
  • Asking Honcho for a second opinion on how to respond.
  • Getting personalised responses that incorporate long-term facts and context.

Representations

For low-latency use cases, Honcho provides access to a representation endpoint that returns a static document with insights about a peer in the context of a particular session. Use this to quickly add context to a prompt without having to wait for an LLM response.

SDKs

SDKs are versioned independently of the server. Current SDK versions track each other; the server badge above reflects the deployed server version.

See the SDK Reference for full API surface, the API Reference for the raw HTTP API, and per-SDK example folders for runnable demos.

Contributing

We welcome contributions to Honcho! Please read our Contributing Guide for details on our development process, coding conventions, and how to submit pull requests.

License

Honcho is licensed under the AGPL-3.0 License. Learn more at the License file.
