Aetheris

The reliability layer your AI agents are missing.

Your agent is processing 1,000 customer records. It reaches record 847 — and the process dies.

Without Aetheris: start over from record 1. Re-run 847 LLM calls. Pay twice. Pray nothing was written twice.

With Aetheris: restart. It resumes from record 847. Zero duplicates. Zero data loss.

The problem with AI agents in production

Every production AI agent eventually hits the same three walls:

Failure mode	What happens today
Process crash mid-task	Restart from the beginning; re-run all LLM calls
Retry after tool failure	Email sent twice, order created twice, payment charged twice
"Why did the AI do that?"	No visibility, no audit trail, no replay

Aetheris is an open-source runtime that solves all three — without requiring you to rewrite your agent.

Quickstart — no Docker required

Requirements: Go 1.26.1+, Git

git clone https://github.com/Colin4k1024/Aetheris.git
cd Aetheris
make run-embedded        # starts with embedded SQLite, no external services

curl http://localhost:8080/api/health   # {"status":"ok", ...}

From Python (pip install aetheris):

from aetheris import AetherisClient

client = AetherisClient("http://localhost:8080")
job = client.run("my-agent", "Summarize the Q3 earnings report")
result = job.wait()
print(result.output)

From any language: Aetheris exposes a REST API. Register your existing agent with a few lines of config:

# configs/api.embedded.yaml
agents:
  agents:
    my_python_agent:
      type: "external_http"
      external:
        url: "http://localhost:9000/invoke"
        timeout: "120s"

Then submit a job:

curl -X POST http://localhost:8080/api/agents/my_python_agent/message \
  -H "Idempotency-Key: task-001" \
  -H "Content-Type: application/json" \
  -d '{"message": "Process customer batch #42"}'
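If you would rather not install the SDK, you can make the same submission with Python's standard library. This sketch mirrors the curl command above. `build_submit_request` and `submit_job` are hypothetical helper names. We also assume the response body is JSON; the API reference does not say so here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default address used in the quickstart


def build_submit_request(agent_id: str, message: str, idempotency_key: str) -> urllib.request.Request:
    """Build a POST that mirrors the curl example (hypothetical helper)."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/agents/{agent_id}/message",
        data=body,
        method="POST",
        headers={
            # Reuse the same key when retrying so the server can recognise the duplicate.
            "Idempotency-Key": idempotency_key,
            "Content-Type": "application/json",
        },
    )


def submit_job(agent_id: str, message: str, idempotency_key: str) -> dict:
    """Send the request and decode the reply (JSON response shape is assumed)."""
    req = build_submit_request(agent_id, message, idempotency_key)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running server, `submit_job("my_python_agent", "Process customer batch #42", "task-001")` sends the same request as the curl example. When retrying, pass the same idempotency key instead of generating a new one.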

→ Full quickstart guide

Core guarantees

1. Crash recovery

Every job step is checkpointed. If the worker dies, the next worker picks up from the last checkpoint — not the beginning.

Job progress:  ████████████████████░░░░░░░░░░  (step 16/25)
Worker crash!  💀
Restart:       ████████████████████            (resumes at step 16)
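The resume behaviour in the diagram comes down to one rule. Persist a checkpoint only after a step finishes, and on restart begin right after the last one. The toy below shows that rule using Python's built-in `sqlite3`. It illustrates the technique and is not Aetheris's checkpoint code. The table layout and the `run_job` / `process_step` names are hypothetical.

```python
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a checkpoint store; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "job_id TEXT PRIMARY KEY, last_step INTEGER NOT NULL)"
    )
    return conn


def last_completed_step(conn: sqlite3.Connection, job_id: str) -> int:
    """Return the last checkpointed step, or 0 if the job has none yet."""
    row = conn.execute(
        "SELECT last_step FROM checkpoints WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else 0


def run_job(
    conn: sqlite3.Connection,
    job_id: str,
    total_steps: int,
    process_step: Callable[[int], None],
) -> None:
    """Run steps 1..total_steps, skipping any an earlier attempt checkpointed."""
    start = last_completed_step(conn, job_id) + 1
    for step in range(start, total_steps + 1):
        process_step(step)  # if this raises (a "crash"), no checkpoint is written
        with conn:  # commits this checkpoint as its own transaction
            conn.execute(
                "INSERT OR REPLACE INTO checkpoints (job_id, last_step) VALUES (?, ?)",
                (job_id, step),
            )
```

Suppose the first run uses a `process_step` that raises at step 16. The table then holds 15. Calling `run_job` again picks up at step 16 without redoing steps 1 to 15.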

2. At-most-once tool execution

External API calls (payments, emails, order creation) are wrapped in an invocation ledger. Even if a step is retried, each side effect runs at most once: the retry gets the recorded result instead of calling the API again.

# Without Aetheris:  retry → email sent twice
# With Aetheris:     retry → ledger returns cached result, email sent once
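One way to get at-most-once semantics from a ledger is to record two things: that a call has started, before making it, and its result, once it returns. The in-memory sketch below illustrates only that idea. The `(job_id, step_id, tool_name)` key and the `InvocationLedger` class are hypothetical. A real ledger would persist both records durably around the external call.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Set, Tuple

# (job_id, step_id, tool_name): a hypothetical key layout for ledger entries.
Key = Tuple[str, str, str]


class AmbiguousInvocation(Exception):
    """An earlier attempt started this call but never recorded a result."""


@dataclass
class InvocationLedger:
    started: Set[Key] = field(default_factory=set)
    results: Dict[Key, Any] = field(default_factory=dict)

    def invoke(self, key: Key, tool: Callable[[], Any]) -> Any:
        # Already completed: hand back the recorded result, do not call the tool.
        if key in self.results:
            return self.results[key]
        # Started but never completed: the side effect may already have happened,
        # so refuse to repeat it and surface the key for reconciliation.
        if key in self.started:
            raise AmbiguousInvocation(key)
        # A durable ledger would persist this marker before calling out.
        self.started.add(key)
        result = tool()
        self.results[key] = result
        return result
```

The ambiguous case is the price of at-most-once. If a worker dies between the call and recording its result, the ledger cannot tell whether the email went out. It therefore refuses to send it again and leaves the decision to reconciliation.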

3. Full decision audit trail

Every LLM call, tool invocation, and checkpoint is appended to an immutable event log. You can replay any job from any point — without re-calling LLMs or external APIs.

aetheris trace <job-id>    # view the full decision timeline
aetheris replay <job-id>   # replay without side effects
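Replay can skip side effects because the log already holds every LLM and tool output. The sketch below records outputs during a live run, then answers the same calls from the log. The event fields (`type`, `step`, `output`) and the `Recorder` / `Replayer` classes are hypothetical stand-ins, not the `aetheris replay` implementation.

```python
from typing import Any, Callable, Dict, List, Tuple


class ReplayMismatch(Exception):
    """The code being replayed asked for a call that is not in the log."""


class Recorder:
    """Live mode: performs each call and appends its output to an event list."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def call(self, kind: str, step: int, fn: Callable[..., Any], *args: Any) -> Any:
        output = fn(*args)  # the real LLM/tool call happens only here
        self.events.append({"type": kind, "step": step, "output": output})
        return output


class Replayer:
    """Replay mode: answers each call from the recorded log, never calling fn."""

    def __init__(self, events: List[Dict[str, Any]]) -> None:
        self._outputs: Dict[Tuple[str, int], Any] = {
            (e["type"], e["step"]): e["output"] for e in events
        }

    def call(self, kind: str, step: int, fn: Callable[..., Any], *args: Any) -> Any:
        try:
            return self._outputs[(kind, step)]
        except KeyError:
            raise ReplayMismatch((kind, step)) from None


def agent(runtime: Any, llm: Callable[[str], str], search: Callable[[str], str], question: str) -> str:
    """A two-step toy agent that runs unchanged against either runtime."""
    plan = runtime.call("llm", 1, llm, question)
    return runtime.call("tool", 2, search, plan)
```

`agent(Recorder(), ...)` calls the model and the tool once each. `agent(Replayer(recorder.events), ...)` with the same inputs returns the same answer while calling neither.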

Connect your existing agent

Aetheris works with any agent, in any language. You don't need to change your agent code.

For split API/Worker deployments, load the same external_http agent definition into both processes so the API can accept /api/agents/:id/message and the Worker can execute the job.

Python (LangChain / any agent)

# Your existing LangChain agent — unchanged
from langchain_openai import ChatOpenAI
from langchain.agents import create_react_agent

# `tools` and `prompt` are your existing tool list and ReAct prompt
agent = create_react_agent(ChatOpenAI(), tools, prompt)

# Expose it as an HTTP endpoint (one function)
from aetheris.integrations.langchain import serve
serve(agent, port=9000)   # Aetheris will call this endpoint durably

→ Full LangChain integration guide

Any HTTP service

# Add to configs/api.embedded.yaml
agents:
  agents:
    my_agent:
      type: "external_http"
      external:
        url: "http://your-agent:9000/invoke"

Your agent receives a job envelope with message, job_id, and idempotency_key. It returns {"answer": "...", "final": true}.
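To show what the agent side can look like, here is a minimal endpoint that uses only Python's standard library. It assumes the envelope arrives as a flat JSON object carrying the `message`, `job_id`, and `idempotency_key` fields named above. It also assumes that `{"answer": ..., "final": true}` is the whole reply contract. Check docs/adapters/external-http-agent.md for the exact shape. The `/invoke` path and port 9000 match the config example. `handle_envelope` is a hypothetical placeholder for your real agent.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_envelope(envelope: dict) -> dict:
    """Placeholder agent logic (hypothetical): replace with a call into your agent."""
    message = envelope.get("message", "")
    job_id = envelope.get("job_id")
    # envelope.get("idempotency_key") can also be used to dedupe your own side effects.
    return {"answer": f"[{job_id}] processed: {message}", "final": True}


class InvokeHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/invoke":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            envelope = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        body = json.dumps(handle_envelope(envelope)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host: str = "127.0.0.1", port: int = 9000) -> HTTPServer:
    """Bind the handler; call serve_forever() on the result to start serving."""
    return HTTPServer((host, port), InvokeHandler)
```

Start it with `make_server().serve_forever()` and point the `url` in the agent config at `http://localhost:9000/invoke`.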

→ External HTTP adapter docs

Go (Eino / native)

# Built-in via AgentFactory, config-driven
# configs/agents.yaml
agents:
  my_eino_agent:
    type: "react"
    llm: "default"
    tools: ["web_search", "calculator"]

→ Eino integration guide

How it works

Your Agent (Python/JS/Go/any)
        │
        ▼
  Aetheris API ──── idempotency key ──▶ Invocation Ledger
        │                                    (at-most-once)
        ▼
  Durable Worker ──── checkpoint ──────▶ Event Store
        │                                    (crash recovery)
        ▼
  Trace & Replay API ───────────────────────────────▶ Audit

The runtime is event-sourced: every state transition is an append-only event. This enables deterministic replay — the same job can be re-run at any time without re-calling LLMs or APIs.
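In an event-sourced design, the current job state is not stored directly; it is recomputed by folding over the event log. The event type names used here (`job_created`, `step_completed`, `job_failed`, `job_completed`) are hypothetical stand-ins for Aetheris's actual schema. The fold is a pure function of the log, so rebuilding from the same events always yields the same state.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Any, Iterable, Mapping


@dataclass(frozen=True)
class JobState:
    status: str = "unknown"
    last_checkpoint: int = 0


def apply_event(state: JobState, event: Mapping[str, Any]) -> JobState:
    """Pure transition: previous state plus one event gives the next state."""
    kind = event["type"]
    if kind == "job_created":
        return replace(state, status="pending")
    if kind == "step_completed":
        return replace(state, status="running", last_checkpoint=event["step"])
    if kind == "job_failed":
        return replace(state, status="failed")
    if kind == "job_completed":
        return replace(state, status="completed")
    return state  # unknown event types leave the state unchanged


def rebuild(events: Iterable[Mapping[str, Any]]) -> JobState:
    """Replay the whole log from an empty state."""
    return reduce(apply_event, events, JobState())
```

A worker that takes over a job can call `rebuild` on its log and resume after `last_checkpoint`.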

vs. LangGraph Platform / Temporal / vanilla frameworks

	Aetheris	LangGraph Platform	Temporal
Open source + self-hosted	✅	❌ (cloud only)	✅
No infrastructure for local dev	✅ (embedded SQLite)	❌	❌ (requires server)
At-most-once tool execution	✅ built-in	⚠️ manual	⚠️ manual
Works with any agent framework	✅	❌ LangGraph only	❌ requires SDK
LLM decision audit trail	✅	✅	❌
Deterministic replay	✅	❌	✅ (workflow history)

Explore the external_http batch demo

See the current boundary of the black-box external_http adapter in about two minutes:

cd examples/crash_recovery
pip install aetheris
python demo.py
# Starts a local external_http demo agent and submits one durable batch job

The example shows durable submission and trace visibility around one external HTTP call. For true per-step checkpoint resume inside the work itself, use native Aetheris tools/workflows instead of a single external_http call.

→ External HTTP batch demo

Repository map

Path	Purpose
cmd/api	HTTP API service
cmd/worker	Background job worker
cmd/cli	CLI: `aetheris trace/replay/jobs/chat`
configs	Runtime configs (embedded, Docker, production)
examples	Working examples for each integration pattern
sdk/python	Python SDK (`pip install aetheris`)
docs	Guides, API reference, design notes
internal/agent	Core runtime engine

Documentation

Goal	Link
Get started in 5 minutes	docs/guides/quickstart.md
Connect an existing HTTP agent	docs/adapters/external-http-agent.md
Connect a LangChain agent	docs/adapters/langchain.md
Understand crash recovery	docs/guides/runtime-guarantees.md
Deploy to production (Docker)	docs/guides/deployment.md
API reference	docs/reference/api.md

License

Apache 2.0 — free to use, self-host, and modify.
