decko/raki

RAKI -- Retrieval Assessment for Knowledge Impact

A CLI tool that evaluates agentic RAG quality from session transcripts.

Report Preview

[Image: RAKI HTML Report]

Three tiers of metrics

| Tier | What you need | Metrics |
|------|---------------|---------|
| Operational | Nothing (zero config) | Verify rate, rework cycles, cost, severity, latency, tokens, self-correction |
| Knowledge | --docs-path | Knowledge gap rate, knowledge miss rate |
| Analytical | --judge | Faithfulness, answer relevancy, context precision, context recall |
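
As a rough illustration of how the tiers stack, the sketch below maps the two options from the table to the set of tiers they unlock. It is not RAKI's code: the function name `enabled_tiers` and its parameters are hypothetical and exist only to restate the table in executable form.

```python
from typing import Optional


def enabled_tiers(docs_path: Optional[str] = None, judge: bool = False) -> list[str]:
    """Return the metric tiers implied by the given options (illustrative only)."""
    tiers = ["operational"]  # always on: needs no configuration
    if docs_path:
        tiers.append("knowledge")  # unlocked by --docs-path
    if judge:
        tiers.append("analytical")  # unlocked by --judge
    return tiers


print(enabled_tiers())
print(enabled_tiers(docs_path="./docs"))
print(enabled_tiers(docs_path="./docs", judge=True))
```

The tiers are additive: each option adds metrics on top of the operational baseline and does not replace it.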

Features

  • Operational metrics -- verify rate, rework cycles, severity, cost, latency, tokens, self-correction (no LLM required)
  • Knowledge metrics -- gap rate and miss rate based on project documentation coverage
  • Analytical metrics -- Ragas-backed context precision/recall, faithfulness, answer relevancy (LLM judge)
  • HTML reports -- interactive reports with session-level detail and color-coded thresholds
  • Quality gates -- per-metric --gate thresholds and --fail-on-regression for CI
  • Pluggable adapters -- bring any session format; built-in support for session-schema and Alcove

Quick Start

# Install
uv pip install 'raki[html]'

# Validate manifest
uv run raki validate --manifest raki.yaml

# Run operational metrics (default, no API keys needed)
uv run raki run --manifest raki.yaml

# Add knowledge metrics
uv run raki run --manifest raki.yaml --docs-path ./docs

# Add analytical metrics (requires LLM credentials)
uv run raki run --manifest raki.yaml --judge --judge-provider anthropic

# Add analytical metrics via LiteLLM (e.g. OpenAI)
uv run raki run --manifest raki.yaml --judge --judge-provider litellm --judge-model gpt-4o

Usage

# Run all tiers (operational + knowledge + analytical)
uv run raki run --manifest raki.yaml --docs-path ./docs --judge

# Run with direct Anthropic API
uv run raki run --manifest raki.yaml --judge --judge-provider anthropic

# Run with LiteLLM (any LiteLLM-supported model)
uv run raki run --manifest raki.yaml --judge --judge-provider litellm --judge-model gpt-4o

# Run specific metrics only
uv run raki run --manifest raki.yaml --metrics cost_efficiency,rework_cycles

# Quality gates for CI
uv run raki run --manifest raki.yaml \
  --gate 'first_pass_verify_rate>0.85' \
  --gate 'rework_cycles<1.5' \
  --quiet

# List available metrics
uv run raki metrics

# Validate manifest and session data
uv run raki validate --manifest raki.yaml --deep

# Compare two evaluation runs
uv run raki report --diff results/baseline.json results/compare.json --fail-on-regression

# List available adapters
uv run raki adapters
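
The `--gate` expressions in the examples above follow a `metric<op>threshold` pattern. The Python sketch below shows one way such expressions could be parsed and checked against a set of scores. It is an assumption-laden illustration, not RAKI's implementation: the names `Gate`, `parse_gate`, and `failed_gates` are hypothetical, only the `>` and `<` operators seen above are handled, and the scores are made up.

```python
import re
from typing import NamedTuple


class Gate(NamedTuple):
    metric: str
    op: str
    threshold: float


# Assumed grammar: <metric name><'>' or '<'><number>, e.g. "rework_cycles<1.5".
_GATE_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*([<>])\s*(-?\d+(?:\.\d+)?)\s*$")

_OPS = {
    ">": lambda value, limit: value > limit,
    "<": lambda value, limit: value < limit,
}


def parse_gate(expr: str) -> Gate:
    """Split a gate expression into metric, operator, and numeric threshold."""
    match = _GATE_RE.match(expr)
    if match is None:
        raise ValueError(f"invalid gate expression: {expr!r}")
    metric, op, threshold = match.groups()
    return Gate(metric, op, float(threshold))


def failed_gates(gates: list[str], scores: dict[str, float]) -> list[str]:
    """Return the gate expressions that the scores do not satisfy."""
    failures = []
    for expr in gates:
        gate = parse_gate(expr)
        value = scores.get(gate.metric)
        # In this sketch a metric with no score counts as a failed gate.
        if value is None or not _OPS[gate.op](value, gate.threshold):
            failures.append(expr)
    return failures


# Made-up scores, for illustration only.
scores = {"first_pass_verify_rate": 0.90, "rework_cycles": 2.0}
gates = ["first_pass_verify_rate>0.85", "rework_cycles<1.5"]
print(failed_gates(gates, scores))  # ['rework_cycles<1.5']
```

In a CI job, a non-empty failure list would typically translate into a non-zero exit status. Whether RAKI reports gate failures that way is not stated here.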

Documentation

Development

uv sync --python 3.12 --all-extras
uv run pytest tests/ -v
uv run ruff check src/ tests/
uv run ruff format src/ tests/
uv run ty check src/raki/

See CONTRIBUTING.md for the full contribution workflow.

License

Apache 2.0 -- see LICENSE for details.
