112 changes: 112 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,112 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What this project is

MaxKB (Max Knowledge Brain) is an enterprise agent platform combining a RAG pipeline, an agentic workflow engine, and MCP tool-use. The repo is a single deployable that ships a Django/DRF backend and a Vue 3 SPA frontend built into Django's `staticfiles`.

- **Backend**: Python 3.11, Django 5.2, DRF, LangChain/LangGraph, Celery + django-celery-beat + django-apscheduler, PostgreSQL (with pgvector), Redis. Dependency manager is **uv** (see `pyproject.toml`, `tool.uv` sections — torch is pinned to CPU index on Linux/Win and the default index on macOS).
- **Frontend**: Vue 3 + Vite + Element Plus + Pinia, in `ui/`. Two entry HTMLs: `admin.html` (full console) and `chat.html` (embed/widget). The `build` script type-checks and bundles both; `build-chat` builds only the chat entry (`--mode chat`).
- **Packaging**: `installer/` contains the Dockerfile(s) and the layered start scripts (`start-all.sh` → orchestrates `start-postgres.sh`, `start-redis.sh`, `start-maxkb.sh`). The published image is `1panel/maxkb`.

## Common commands

Backend (run from repo root; `uv sync` first to install deps):

```bash
# Dev server (Django runserver on 0.0.0.0:8080) — also runs collectstatic + migrate first
python main.py dev web

# Dev: Celery worker (named "celery") or the local model service
python main.py dev celery
python main.py dev local_model

# Production-style: start everything (web + task workers)
python main.py start all # add -d for daemon, -w N for worker count, -f to force
python main.py start web
python main.py start task

# DB / static only
python main.py upgrade_db
python main.py collect_static

# Standard Django/DRF tooling (manage.py lives in apps/, not repo root)
python apps/manage.py <command>
python apps/manage.py test <app_label> # e.g. application, knowledge, chat
python apps/manage.py test application.tests # single module/class/method
```

`main.py` is the canonical entrypoint; it inserts `apps/` onto `sys.path`, sets `DJANGO_SETTINGS_MODULE=maxkb.settings`, and dispatches to custom management commands in `apps/common/management/commands/` (`start`, `stop`, `restart`, `status`, `celery`). Don't bypass it for `start`/`stop` — those commands manage daemonization and worker pools.

Frontend (run from `ui/`):

```bash
npm install
npm run dev # admin app (vite default mode)
npm run chat # chat embed (vite --mode chat)
npm run build # type-check + build admin
npm run build-chat # type-check + build chat embed
npm run lint # eslint --fix
npm run format # prettier write src/
npm run type-check # vue-tsc --build, no emit
```

Lint (Python uses ruff, configured in `pyproject.toml` — `line-length = 120`):

```bash
uv run ruff check .
uv run ruff format .
```

Docker quickstart (what users actually run):

```bash
docker run -d --name=maxkb --restart=always -p 8080:8080 -v ~/.maxkb:/opt/maxkb 1panel/maxkb
```

## Architecture

### Process model & settings split

`apps/maxkb/` is the Django project. Settings and URL routing branch on `SERVER_NAME` (set in `main.py` from the requested service):

- `SERVER_NAME=web` (default) → loads `settings/base/web.py`, `urls/web.py`. Full app: REST API, RAG, workflow engine, chat, Celery integration.
- `SERVER_NAME=local_model` → loads `settings/base/model.py`, `urls/model.py`. A separate, much smaller Django process that hosts local embedding/rerank/STT/TTS models (so the heavy ML deps don't have to live in every web worker). Bind host/port come from `LOCAL_MODEL_HOST`/`LOCAL_MODEL_PORT`.

`settings/__init__.py` composes `base + logging + auth + lib + mem`. `lib.py` builds the Celery broker URL (with Redis Sentinel support via `MAXKB_REDIS_SENTINEL_SENTINELS`). Config is read by `apps/maxkb/conf.py:ConfigManager` from env vars and `/opt/maxkb/conf` (overridable via `MAXKB_CONFIG`); key env vars are `MAXKB_DB_*`, `MAXKB_REDIS_*`, `MAXKB_CORE_WORKER`.
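The lookup order can be sketched minimally. This is an illustrative helper, not the real `ConfigManager` API, and it assumes environment variables take precedence over values from the config file:

```python
import os

# Illustrative env-first config resolution (assumption: env vars override
# values read from /opt/maxkb/conf; not the real ConfigManager API)
def resolve(key, file_config, default=None):
    if key in os.environ:
        return os.environ[key]
    return file_config.get(key, default)

os.environ['MAXKB_DB_HOST'] = 'db.internal'
print(resolve('MAXKB_DB_HOST', {'MAXKB_DB_HOST': '127.0.0.1'}))  # db.internal
print(resolve('MAXKB_DB_PORT', {'MAXKB_DB_PORT': '5432'}))       # 5432
```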

`installer/start-all.sh` is what runs inside the official image: it conditionally launches embedded Postgres/Redis (only when their configured host is `127.0.0.1`), then `start-maxkb.sh` runs init-shell hooks from `MAXKB_INIT_SHELL_DIR` before calling `python main.py start`. Note the v1→v2 guard: if a `PG_VERSION` file is present in legacy data paths, startup aborts.

### Django apps (under `apps/`)

Each is a self-contained DRF app with the same shape: `api/` (serializers + permissions + view inputs), `views/` (DRF views and routers), `models/`, `serializers/`, `migrations/`, plus optional `sql/`, `template/`, `task/`.

- **`application`** — Agents/applications themselves. Two big engines live here:
- `chat_pipeline/` — the request-time chat orchestration pipeline (`pipeline_manage.py` + `step/`). This is the simpler "RAG chat" path.
- `flow/` — the agentic workflow engine. `workflow_manage.py` (plus `knowledge_workflow_manage.py`, `tool_workflow_manage.py`, `*_loop_workflow_manage.py`) executes graphs built from `step_node/` node types (`ai_chat_step_node`, `condition_node`, `intent_node`, `loop_node`/`loop_start_node`/`loop_break_node`/`loop_continue_node`, `mcp_node`, `parameter_extraction_node`, `search_knowledge_node`, `tool_lib_node`, multimodal `image_*`/`speech_*`/`text_to_*` nodes, etc.). Default workflows are JSON: `default_workflow{,_en,_zh,_zh_Hant}.json`. `i_step_node.py` is the node contract; new node types subclass it and register a folder under `step_node/`.
- `long_term_memory/` — agent-level memory beyond a single chat.
- **`chat`** — Runtime chat sessions, message persistence, MCP client integration (`mcp/`), and chat-page templates (`template/`).
- **`knowledge`** — Knowledge bases, documents, paragraphs, and the vector layer (`vector/`). Background indexing in `task/`.
- **`models_provider`** — LLM/embedding/rerank/STT/TTS provider abstraction. `base_model_provider.py` is the interface; the langchain-* deps (`openai`, `anthropic`, `deepseek`, `google-genai`, `community`, `huggingface`, `ollama`, `aws`, plus `qianfan`, `zhipuai`, `volcengine`, `dashscope`, `cohere`, `tencentcloud`, `xinference-client`) plug in here.
- **`tools`** — Function/tool library exposed to workflows and MCP.
- **`users`**, **`system_manage`**, **`folders`**, **`oss`**, **`trigger`** — auth/RBAC, platform settings, folder/tree organization (uses `django-mptt`), object storage, and scheduled/event triggers (built on `django-celery-beat` and `django-apscheduler`).
- **`local_model`** — Server-side model-serving views (only mounted when `SERVER_NAME=local_model`).
- **`common`** — Cross-cutting infrastructure used by every app. Notable subpackages: `auth/`, `cache/`, `chunk/` (text splitting), `db/`, `encoder/`, `event/`, `exception/`, `field/`, `handle/` (document parsers), `init/` (bootstrapping), `job/`, `lock/`, `log/`, `management/commands/` (the `start`/`stop`/`celery` commands), `middleware/`, `mixins/`, `result/` (standard API response envelope).

### Frontend (`ui/`)

`vite.config.ts` switches entry/output between admin and chat modes. Logic-flow editor for workflows uses `@logicflow/core` + `@logicflow/extension`. Markdown rendering uses `md-editor-v3` + `marked` + `highlight.js` + `katex` + `mermaid`. Audio recording (for STT nodes) via `recorder-core`. PDFs via `pdfjs-dist`. State is Pinia, routing is `vue-router`, i18n is `vue-i18n` with locale files under `src/` (also note backend locales at `apps/locales/`).

### Adding a workflow node

A new workflow step is a directory under `apps/application/flow/step_node/<name>_node/` implementing `i_step_node.INode` (look at `ai_chat_step_node` or `condition_node` as templates). The frontend counterpart lives under `ui/src/` in the workflow editor — both sides share the same node `type` string, and the node's JSON schema drives the editor form.
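In outline, a node pairs a `type` string with an execute hook. The sketch below uses a stand-in base class: the real `INode` contract in `i_step_node.py` is richer (serializers, context, streaming), and `UpperCaseNode` is purely hypothetical:

```python
from abc import ABC, abstractmethod

class INode(ABC):
    # Stand-in for i_step_node.INode; the real contract differs in detail
    type: str  # must match the frontend node type string

    @abstractmethod
    def execute(self, **kwargs) -> dict:
        ...

class UpperCaseNode(INode):
    # Hypothetical node: upper-cases an incoming text field
    type = 'upper-case-node'

    def execute(self, text: str = '', **kwargs) -> dict:
        return {'text': text.upper()}

print(UpperCaseNode().execute(text='hello'))  # {'text': 'HELLO'}
```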

## Conventions worth knowing

- All Python code uses `ruff` with 120-char lines — run `uv run ruff format` before committing.
- `main.py` rewrites `HF_HOME` to `/opt/maxkb-app/model/base` and `TMPDIR` to `/opt/maxkb-app/tmp`. Local dev outside Docker generally needs those directories to exist and be writable, or those env vars set before launch.
- `collectstatic` runs on every `start`/`dev`, and it expects the built Vue assets in `ui/dist`. The Docker build produces them; for local dev, run `npm run build` first (or use `npm run dev` with its proxy) if you need the bundled UI served by Django.
- The `migrate` step in `main.py:perform_db_migrate` retries up to 10×5s while Postgres is in crash-recovery startup — useful to remember when debugging container boot loops.
- Celery task serializer is `hmac_signed_serializer` (custom); broker URL is built from the same Redis env vars as the Django cache and supports Sentinel.
- License is GPLv3. Contributions are expected to be small, incremental PRs (see `CONTRIBUTING.md`).
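The `perform_db_migrate` retry noted above is a plain bounded retry loop; a generic sketch (the attempt count and delay mirror the 10×5s described, but the real implementation may differ):

```python
import time

def retry(fn, attempts=10, delay=5.0, sleep=time.sleep):
    # Bounded retry: swallow failures until attempts are exhausted,
    # then re-raise the last error
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_error = e
            sleep(delay)
    raise last_error

calls = []
def flaky_migrate():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError('postgres still in crash recovery')
    return 'migrated'

print(retry(flaky_migrate, sleep=lambda _: None))  # migrated
```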
9 changes: 9 additions & 0 deletions apps/common/handle/impl/ocr/__init__.py
@@ -0,0 +1,9 @@
# coding=utf-8
"""
@project: maxkb
@file: __init__.py
@desc: OCR provider package entry point.
"""
from .provider import OcrProvider, get_ocr_provider, OcrConfigError, OcrError

__all__ = ['OcrProvider', 'get_ocr_provider', 'OcrConfigError', 'OcrError']
56 changes: 56 additions & 0 deletions apps/common/handle/impl/ocr/local_provider.py
@@ -0,0 +1,56 @@
# coding=utf-8
"""
@project: maxkb
@file: local_provider.py
@desc: Local OCR provider, built on rapidocr-onnxruntime.

The dependency uses a **lazy import** strategy:
- rapidocr-onnxruntime is deliberately not listed in pyproject.toml (~150 MB even CPU-only)
- When "local OCR" mode is selected in system settings, the user must first run inside the container:
      pip install rapidocr-onnxruntime
- If it is not installed, LocalOcrProvider raises OcrConfigError at init time,
  so the frontend receives a clear message when testing the OCR configuration.
"""
from common.handle.impl.ocr.provider import OcrProvider, OcrError, OcrConfigError
from common.utils.logger import maxkb_logger

_INSTALL_HINT = (
    "本地 OCR 依赖未安装。请在 MaxKB 运行环境中执行:"
    "pip install rapidocr-onnxruntime onnxruntime"
)


class LocalOcrProvider(OcrProvider):
    def __init__(self, language: str = 'ch'):
        self.language = language
        try:
            from rapidocr_onnxruntime import RapidOCR  # type: ignore
        except ImportError as e:
            raise OcrConfigError(_INSTALL_HINT) from e
        # Keep a single engine instance so the model is not reloaded on every OCR call
        self._engine = RapidOCR()

    def recognize(self, image_bytes: bytes) -> str:
        if not image_bytes:
            return ''
        try:
            # RapidOCR accepts bytes / PIL.Image / np.ndarray / file paths,
            # so the raw bytes can be passed straight through
            result, _elapse = self._engine(image_bytes)
        except Exception as e:
            maxkb_logger.error(f"OCR (local): recognize failed: {e}")
            raise OcrError(f"本地 OCR 识别失败:{e}") from e

        if not result:
            return ''
        # result is [[bbox, text, score], ...]; concatenate top-to-bottom, left-to-right
        lines = []
        for item in result:
            if not item or len(item) < 2:
                continue
            text = item[1]
            if text:
                lines.append(str(text))
        return '\n'.join(lines).strip()
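The result-flattening step assumes RapidOCR's `[[bbox, text, score], ...]` shape; extracted as a standalone function it can be exercised with a stub result (the bbox coordinates below are made up):

```python
def flatten_ocr_result(result):
    # Mirrors the tail of LocalOcrProvider.recognize: keep item[1] (the text)
    # from each [bbox, text, score] triple, skipping malformed entries
    if not result:
        return ''
    lines = []
    for item in result:
        if not item or len(item) < 2:
            continue
        text = item[1]
        if text:
            lines.append(str(text))
    return '\n'.join(lines).strip()

stub = [
    [[[0, 0], [80, 0], [80, 20], [0, 20]], 'Hello', 0.99],
    [[[0, 24], [80, 24], [80, 44], [0, 44]], 'World', 0.97],
    [[[0, 48], [80, 48], [80, 68], [0, 68]], '', 0.10],  # empty text dropped
]
print(flatten_ocr_result(stub))  # Hello\nWorld (two lines)
```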
82 changes: 82 additions & 0 deletions apps/common/handle/impl/ocr/provider.py
@@ -0,0 +1,82 @@
# coding=utf-8
"""
@project: maxkb
@file: provider.py
@desc: OCR provider abstraction + factory.

Design:
- OcrProvider is a minimal ABC with a single method: recognize(image_bytes) -> str
- Two implementations:
  * VisionLlmOcrProvider — reuses vision LLMs already wired into models_provider
  * LocalOcrProvider — rapidocr-onnxruntime (lazy import, clear error when not installed)
- Dispatch reads mode + model_id + workspace_id from SystemSetting(type=OCR).meta
"""
from abc import ABC, abstractmethod
from typing import Optional


class OcrError(Exception):
    """Recoverable error raised while running OCR recognition."""


class OcrConfigError(Exception):
    """OCR configuration is missing or invalid; the frontend should send the
    user to system settings to finish configuration first."""


class OcrProvider(ABC):
    @abstractmethod
    def recognize(self, image_bytes: bytes) -> str:
        """Recognize a single image and return its text. Implementations may
        raise OcrError on failure, but should not leak raw SDK exception
        types, so callers can catch uniformly."""


# OCR mode constants (kept in sync with the frontend enum)
MODE_VISION_LLM = 'vision_model'
MODE_LOCAL = 'local'
ALL_MODES = (MODE_VISION_LLM, MODE_LOCAL)


# OCR prompt: steer the vision model toward faithful transcription, not summarization
DEFAULT_OCR_PROMPT = (
    "请把图片中的所有可见文字完整、忠实地识别并输出,保持原始的段落结构、列表项、表格行列关系;"
    "不要做总结、解释、翻译或添加任何额外说明;"
    "不能识别的部分用 [unclear] 占位。"
)


def get_ocr_provider(config: Optional[dict] = None) -> OcrProvider:
    """Return the provider instance matching the OCR system configuration.

    config comes straight from OcrSettingSerializer.one():
        {
            "mode": "vision_model" | "local",
            "model_id": "<uuid>",          # required when mode=vision_model
            "workspace_id": "default",     # required when mode=vision_model
            "prompt": "..."                # optional, defaults to DEFAULT_OCR_PROMPT
        }
    """
    if not config or not isinstance(config, dict):
        raise OcrConfigError("OCR 未配置:请先到「系统设置 → OCR 设置」选择视觉模型或启用本地 OCR")

    mode = config.get('mode')
    if mode not in ALL_MODES:
        raise OcrConfigError(f"OCR 配置无效:mode 必须是 {ALL_MODES} 之一,当前为 {mode!r}")

    if mode == MODE_VISION_LLM:
        model_id = config.get('model_id')
        workspace_id = config.get('workspace_id') or 'default'
        prompt = config.get('prompt') or DEFAULT_OCR_PROMPT
        if not model_id:
            raise OcrConfigError("OCR 配置不完整:选择了视觉大模型但未指定 model_id")
        # Import lazily to avoid a circular import (provider.py can be pulled
        # in via the settings import chain)
        from common.handle.impl.ocr.vision_llm_provider import VisionLlmOcrProvider
        return VisionLlmOcrProvider(model_id=model_id, workspace_id=workspace_id, prompt=prompt)

    if mode == MODE_LOCAL:
        from common.handle.impl.ocr.local_provider import LocalOcrProvider
        return LocalOcrProvider(language=config.get('language') or 'ch')

    # Unreachable (the ALL_MODES check above catches this); kept as a safeguard
    raise OcrConfigError(f"未知 OCR 模式:{mode!r}")
59 changes: 59 additions & 0 deletions apps/common/handle/impl/ocr/vision_llm_provider.py
@@ -0,0 +1,59 @@
# coding=utf-8
"""
@project: maxkb
@file: vision_llm_provider.py
@desc: Vision-LLM OCR provider. Reuses the multimodal models already wired
       into models_provider (OpenAI gpt-4o, Anthropic, Gemini, Qwen-VL, Zhipu glm-4v, etc.).
"""
import base64
# Note: imghdr is deprecated (removed in Python 3.13); it is still available
# on the Python 3.11 this project pins.
from imghdr import what

from langchain_core.messages import HumanMessage

from common.handle.impl.ocr.provider import OcrProvider, OcrError, DEFAULT_OCR_PROMPT
from common.utils.logger import maxkb_logger


class VisionLlmOcrProvider(OcrProvider):
    def __init__(self, model_id: str, workspace_id: str = 'default', prompt: str = DEFAULT_OCR_PROMPT):
        self.model_id = model_id
        self.workspace_id = workspace_id
        self.prompt = prompt

    def recognize(self, image_bytes: bytes) -> str:
        if not image_bytes:
            return ''
        # Detect the image format; imghdr returns lowercase names like 'png'/'jpeg'
        img_format = what(None, image_bytes) or 'png'
        b64 = base64.b64encode(image_bytes).decode('utf-8')
        data_url = f'data:image/{img_format};base64,{b64}'

        try:
            # Import lazily to avoid triggering the model-provider chain
            # during early Django startup
            from models_provider.tools import get_model_instance_by_model_workspace_id
            model = get_model_instance_by_model_workspace_id(self.model_id, self.workspace_id)
        except Exception as e:
            maxkb_logger.error(f"OCR: failed to load vision model {self.model_id}: {e}")
            raise OcrError(f"加载视觉模型失败:{e}") from e

        message = HumanMessage(content=[
            {'type': 'text', 'text': self.prompt},
            {'type': 'image_url', 'image_url': {'url': data_url}},
        ])
        try:
            response = model.invoke([message])
        except Exception as e:
            maxkb_logger.error(f"OCR: vision model invoke failed: {e}")
            raise OcrError(f"视觉模型识别失败:{e}") from e

        # langchain AIMessage.content may be a str or a list[dict]
        content = response.content if hasattr(response, 'content') else str(response)
        if isinstance(content, list):
            parts = []
            for chunk in content:
                if isinstance(chunk, dict):
                    parts.append(chunk.get('text', ''))
                else:
                    parts.append(str(chunk))
            content = '\n'.join(parts)
        return (content or '').strip()
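The content normalization at the end of `recognize` handles both shapes; pulled out as a standalone function (a sketch, not part of the diff) it behaves like this:

```python
def normalize_llm_content(content):
    # Mirrors the tail of VisionLlmOcrProvider.recognize: langchain message
    # content may be a plain string or a list of chunks ({'type', 'text'} dicts)
    if isinstance(content, list):
        parts = []
        for chunk in content:
            if isinstance(chunk, dict):
                parts.append(chunk.get('text', ''))
            else:
                parts.append(str(chunk))
        content = '\n'.join(parts)
    return (content or '').strip()

print(normalize_llm_content('  plain text  '))                      # plain text
print(normalize_llm_content([{'type': 'text', 'text': 'a'}, 'b']))  # a\nb
```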