112 changes: 112 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,112 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What this project is

MaxKB (Max Knowledge Brain) is an enterprise agent platform combining a RAG pipeline, an agentic workflow engine, and MCP tool-use. The repo is a single deployable that ships a Django/DRF backend and a Vue 3 SPA frontend built into Django's `staticfiles`.

- **Backend**: Python 3.11, Django 5.2, DRF, LangChain/LangGraph, Celery + django-celery-beat + django-apscheduler, PostgreSQL (with pgvector), Redis. Dependency manager is **uv** (see `pyproject.toml`, `tool.uv` sections — torch is pinned to CPU index on Linux/Win and the default index on macOS).
- **Frontend**: Vue 3 + Vite + Element Plus + Pinia, in `ui/`. Two entry HTMLs: `admin.html` (full console) and `chat.html` (embed/widget). The `build` script type-checks and bundles both; `build-chat` builds only the chat entry (`--mode chat`).
- **Packaging**: `installer/` contains the Dockerfile(s) and the layered start scripts (`start-all.sh` → orchestrates `start-postgres.sh`, `start-redis.sh`, `start-maxkb.sh`). The published image is `1panel/maxkb`.

## Common commands

Backend (run from repo root; `uv sync` first to install deps):

```bash
# Dev server (Django runserver on 0.0.0.0:8080) — also runs collectstatic + migrate first
python main.py dev web

# Dev: Celery worker (named "celery") or the local model service
python main.py dev celery
python main.py dev local_model

# Production-style: start everything (web + task workers)
python main.py start all # add -d for daemon, -w N for worker count, -f to force
python main.py start web
python main.py start task

# DB / static only
python main.py upgrade_db
python main.py collect_static

# Standard Django/DRF tooling (manage.py lives in apps/, not repo root)
python apps/manage.py <command>
python apps/manage.py test <app_label> # e.g. application, knowledge, chat
python apps/manage.py test application.tests # single module/class/method
```

`main.py` is the canonical entrypoint; it inserts `apps/` onto `sys.path`, sets `DJANGO_SETTINGS_MODULE=maxkb.settings`, and dispatches to custom management commands in `apps/common/management/commands/` (`start`, `stop`, `restart`, `status`, `celery`). Don't bypass it for `start`/`stop` — those commands manage daemonization and worker pools.

Frontend (run from `ui/`):

```bash
npm install
npm run dev # admin app (vite default mode)
npm run chat # chat embed (vite --mode chat)
npm run build # type-check + build admin
npm run build-chat # type-check + build chat embed
npm run lint # eslint --fix
npm run format # prettier write src/
npm run type-check # vue-tsc --build, no emit
```

Lint (Python uses ruff, configured in `pyproject.toml` — `line-length = 120`):

```bash
uv run ruff check .
uv run ruff format .
```

Docker quickstart (what users actually run):

```bash
docker run -d --name=maxkb --restart=always -p 8080:8080 -v ~/.maxkb:/opt/maxkb 1panel/maxkb
```

## Architecture

### Process model & settings split

`apps/maxkb/` is the Django project. Settings and URL routing branch on `SERVER_NAME` (set in `main.py` from the requested service):

- `SERVER_NAME=web` (default) → loads `settings/base/web.py`, `urls/web.py`. Full app: REST API, RAG, workflow engine, chat, Celery integration.
- `SERVER_NAME=local_model` → loads `settings/base/model.py`, `urls/model.py`. A separate, much smaller Django process that hosts local embedding/rerank/STT/TTS models (so the heavy ML deps don't have to live in every web worker). Bind host/port come from `LOCAL_MODEL_HOST`/`LOCAL_MODEL_PORT`.

`settings/__init__.py` composes `base + logging + auth + lib + mem`. `lib.py` builds the Celery broker URL (with Redis Sentinel support via `MAXKB_REDIS_SENTINEL_SENTINELS`). Config is read by `apps/maxkb/conf.py:ConfigManager` from env vars and `/opt/maxkb/conf` (overridable via `MAXKB_CONFIG`); key env vars are `MAXKB_DB_*`, `MAXKB_REDIS_*`, `MAXKB_CORE_WORKER`.
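The lookup order can be sketched minimally. This is an illustrative helper, not the real `ConfigManager` API, and it assumes environment variables take precedence over values from the config file:

```python
import os

# Illustrative env-first config resolution (assumption: env vars override
# values read from /opt/maxkb/conf; not the real ConfigManager API)
def resolve(key, file_config, default=None):
    if key in os.environ:
        return os.environ[key]
    return file_config.get(key, default)

os.environ['MAXKB_DB_HOST'] = 'db.internal'
print(resolve('MAXKB_DB_HOST', {'MAXKB_DB_HOST': '127.0.0.1'}))  # db.internal
print(resolve('MAXKB_DB_PORT', {'MAXKB_DB_PORT': '5432'}))       # 5432
```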

`installer/start-all.sh` is what runs inside the official image: it conditionally launches embedded Postgres/Redis (only when their configured host is `127.0.0.1`), then `start-maxkb.sh` runs init-shell hooks from `MAXKB_INIT_SHELL_DIR` before calling `python main.py start`. Note the v1→v2 guard: if a `PG_VERSION` file is present in legacy data paths, startup aborts.

### Django apps (under `apps/`)

Each is a self-contained DRF app with the same shape: `api/` (serializers + permissions + view inputs), `views/` (DRF views and routers), `models/`, `serializers/`, `migrations/`, plus optional `sql/`, `template/`, `task/`.

- **`application`** — Agents/applications themselves. Two big engines live here:
- `chat_pipeline/` — the request-time chat orchestration pipeline (`pipeline_manage.py` + `step/`). This is the simpler "RAG chat" path.
- `flow/` — the agentic workflow engine. `workflow_manage.py` (plus `knowledge_workflow_manage.py`, `tool_workflow_manage.py`, `*_loop_workflow_manage.py`) executes graphs built from `step_node/` node types (`ai_chat_step_node`, `condition_node`, `intent_node`, `loop_node`/`loop_start_node`/`loop_break_node`/`loop_continue_node`, `mcp_node`, `parameter_extraction_node`, `search_knowledge_node`, `tool_lib_node`, multimodal `image_*`/`speech_*`/`text_to_*` nodes, etc.). Default workflows are JSON: `default_workflow{,_en,_zh,_zh_Hant}.json`. `i_step_node.py` is the node contract; new node types subclass it and register a folder under `step_node/`.
- `long_term_memory/` — agent-level memory beyond a single chat.
- **`chat`** — Runtime chat sessions, message persistence, MCP client integration (`mcp/`), and chat-page templates (`template/`).
- **`knowledge`** — Knowledge bases, documents, paragraphs, and the vector layer (`vector/`). Background indexing in `task/`.
- **`models_provider`** — LLM/embedding/rerank/STT/TTS provider abstraction. `base_model_provider.py` is the interface; the langchain-* deps (`openai`, `anthropic`, `deepseek`, `google-genai`, `community`, `huggingface`, `ollama`, `aws`, plus `qianfan`, `zhipuai`, `volcengine`, `dashscope`, `cohere`, `tencentcloud`, `xinference-client`) plug in here.
- **`tools`** — Function/tool library exposed to workflows and MCP.
- **`users`**, **`system_manage`**, **`folders`**, **`oss`**, **`trigger`** — auth/RBAC, platform settings, folder/tree organization (uses `django-mptt`), object storage, and scheduled/event triggers (built on `django-celery-beat` and `django-apscheduler`).
- **`local_model`** — Server-side model-serving views (only mounted when `SERVER_NAME=local_model`).
- **`common`** — Cross-cutting infrastructure used by every app. Notable subpackages: `auth/`, `cache/`, `chunk/` (text splitting), `db/`, `encoder/`, `event/`, `exception/`, `field/`, `handle/` (document parsers), `init/` (bootstrapping), `job/`, `lock/`, `log/`, `management/commands/` (the `start`/`stop`/`celery` commands), `middleware/`, `mixins/`, `result/` (standard API response envelope).

### Frontend (`ui/`)

`vite.config.ts` switches entry/output between admin and chat modes. Logic-flow editor for workflows uses `@logicflow/core` + `@logicflow/extension`. Markdown rendering uses `md-editor-v3` + `marked` + `highlight.js` + `katex` + `mermaid`. Audio recording (for STT nodes) via `recorder-core`. PDFs via `pdfjs-dist`. State is Pinia, routing is `vue-router`, i18n is `vue-i18n` with locale files under `src/` (also note backend locales at `apps/locales/`).

### Adding a workflow node

A new workflow step is a directory under `apps/application/flow/step_node/<name>_node/` implementing `i_step_node.INode` (look at `ai_chat_step_node` or `condition_node` as templates). The frontend counterpart lives under `ui/src/` in the workflow editor — both sides share the same node `type` string, and the node's JSON schema drives the editor form.
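In outline, a node pairs a `type` string with an execute hook. The sketch below uses a stand-in base class: the real `INode` contract in `i_step_node.py` is richer (serializers, context, streaming), and `UpperCaseNode` is purely hypothetical:

```python
from abc import ABC, abstractmethod

class INode(ABC):
    # Stand-in for i_step_node.INode; the real contract differs in detail
    type: str  # must match the frontend node type string

    @abstractmethod
    def execute(self, **kwargs) -> dict:
        ...

class UpperCaseNode(INode):
    # Hypothetical node: upper-cases an incoming text field
    type = 'upper-case-node'

    def execute(self, text: str = '', **kwargs) -> dict:
        return {'text': text.upper()}

print(UpperCaseNode().execute(text='hello'))  # {'text': 'HELLO'}
```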

## Conventions worth knowing

- All Python code uses `ruff` with 120-char lines — run `uv run ruff format` before committing.
- `main.py` rewrites `HF_HOME` to `/opt/maxkb-app/model/base` and `TMPDIR` to `/opt/maxkb-app/tmp`. Local dev outside Docker generally needs those directories to exist and be writable, or those env vars set before launch.
- `collectstatic` runs on every `start`/`dev`, and it expects the built Vue assets in `ui/dist`. The Docker build produces them; for local dev, run `npm run build` first (or use `npm run dev` with its proxy) if you need the bundled UI served by Django.
- The `migrate` step in `main.py:perform_db_migrate` retries up to 10×5s while Postgres is in crash-recovery startup — useful to remember when debugging container boot loops.
- Celery task serializer is `hmac_signed_serializer` (custom); broker URL is built from the same Redis env vars as the Django cache and supports Sentinel.
- License is GPLv3. Contributions are expected to be small, incremental PRs (see `CONTRIBUTING.md`).
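The `perform_db_migrate` retry noted above is a plain bounded retry loop; a generic sketch (the attempt count and delay mirror the 10×5s described, but the real implementation may differ):

```python
import time

def retry(fn, attempts=10, delay=5.0, sleep=time.sleep):
    # Bounded retry: swallow failures until attempts are exhausted,
    # then re-raise the last error
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_error = e
            sleep(delay)
    raise last_error

calls = []
def flaky_migrate():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError('postgres still in crash recovery')
    return 'migrated'

print(retry(flaky_migrate, sleep=lambda _: None))  # migrated
```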
9 changes: 9 additions & 0 deletions apps/common/handle/impl/ocr/__init__.py
@@ -0,0 +1,9 @@
# coding=utf-8
"""
@project: maxkb
@file: __init__.py
@desc: OCR provider package entry point.
"""
from .provider import OcrProvider, get_ocr_provider, OcrConfigError, OcrError

__all__ = ['OcrProvider', 'get_ocr_provider', 'OcrConfigError', 'OcrError']
56 changes: 56 additions & 0 deletions apps/common/handle/impl/ocr/local_provider.py
@@ -0,0 +1,56 @@
# coding=utf-8
"""
@project: maxkb
@file: local_provider.py
@desc: Local OCR provider, built on rapidocr-onnxruntime.

The dependency uses a **lazy import** strategy:
- rapidocr-onnxruntime is deliberately not listed in pyproject.toml (~150 MB even CPU-only)
- When "local OCR" mode is selected in system settings, the user must first run inside the container:
      pip install rapidocr-onnxruntime
- If it is not installed, LocalOcrProvider raises OcrConfigError at init time,
  so the frontend receives a clear message when testing the OCR configuration.
"""
from common.handle.impl.ocr.provider import OcrProvider, OcrError, OcrConfigError
from common.utils.logger import maxkb_logger

_INSTALL_HINT = (
    "本地 OCR 依赖未安装。请在 MaxKB 运行环境中执行:"
    "pip install rapidocr-onnxruntime onnxruntime"
)


class LocalOcrProvider(OcrProvider):
    def __init__(self, language: str = 'ch'):
        self.language = language
        try:
            from rapidocr_onnxruntime import RapidOCR  # type: ignore
        except ImportError as e:
            raise OcrConfigError(_INSTALL_HINT) from e
        # Keep a single engine instance so the model is not reloaded on every OCR call
        self._engine = RapidOCR()

    def recognize(self, image_bytes: bytes) -> str:
        if not image_bytes:
            return ''
        try:
            # RapidOCR accepts bytes / PIL.Image / np.ndarray / file paths,
            # so the raw bytes can be passed straight through
            result, _elapse = self._engine(image_bytes)
        except Exception as e:
            maxkb_logger.error(f"OCR (local): recognize failed: {e}")
            raise OcrError(f"本地 OCR 识别失败:{e}") from e

        if not result:
            return ''
        # result is [[bbox, text, score], ...]; concatenate top-to-bottom, left-to-right
        lines = []
        for item in result:
            if not item or len(item) < 2:
                continue
            text = item[1]
            if text:
                lines.append(str(text))
        return '\n'.join(lines).strip()
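The result-flattening step assumes RapidOCR's `[[bbox, text, score], ...]` shape; extracted as a standalone function it can be exercised with a stub result (the bbox coordinates below are made up):

```python
def flatten_ocr_result(result):
    # Mirrors the tail of LocalOcrProvider.recognize: keep item[1] (the text)
    # from each [bbox, text, score] triple, skipping malformed entries
    if not result:
        return ''
    lines = []
    for item in result:
        if not item or len(item) < 2:
            continue
        text = item[1]
        if text:
            lines.append(str(text))
    return '\n'.join(lines).strip()

stub = [
    [[[0, 0], [80, 0], [80, 20], [0, 20]], 'Hello', 0.99],
    [[[0, 24], [80, 24], [80, 44], [0, 44]], 'World', 0.97],
    [[[0, 48], [80, 48], [80, 68], [0, 68]], '', 0.10],  # empty text dropped
]
print(flatten_ocr_result(stub))  # Hello\nWorld (two lines)
```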
82 changes: 82 additions & 0 deletions apps/common/handle/impl/ocr/provider.py
@@ -0,0 +1,82 @@
# coding=utf-8
"""
@project: maxkb
@file: provider.py
@desc: OCR provider abstraction + factory.

Design:
- OcrProvider is a minimal ABC with a single method: recognize(image_bytes) -> str
- Two implementations:
  * VisionLlmOcrProvider — reuses vision LLMs already wired into models_provider
  * LocalOcrProvider — rapidocr-onnxruntime (lazy import, clear error when not installed)
- Dispatch reads mode + model_id + workspace_id from SystemSetting(type=OCR).meta
"""
from abc import ABC, abstractmethod
from typing import Optional


class OcrError(Exception):
    """Recoverable error raised while running OCR recognition."""


class OcrConfigError(Exception):
    """OCR configuration is missing or invalid; the frontend should send the
    user to system settings to finish configuration first."""


class OcrProvider(ABC):
    @abstractmethod
    def recognize(self, image_bytes: bytes) -> str:
        """Recognize a single image and return its text. Implementations may
        raise OcrError on failure, but should not leak raw SDK exception
        types, so callers can catch uniformly."""


# OCR mode constants (kept in sync with the frontend enum)
MODE_VISION_LLM = 'vision_model'
MODE_LOCAL = 'local'
ALL_MODES = (MODE_VISION_LLM, MODE_LOCAL)


# OCR prompt: steer the vision model toward faithful transcription, not summarization
DEFAULT_OCR_PROMPT = (
    "请把图片中的所有可见文字完整、忠实地识别并输出,保持原始的段落结构、列表项、表格行列关系;"
    "不要做总结、解释、翻译或添加任何额外说明;"
    "不能识别的部分用 [unclear] 占位。"
)


def get_ocr_provider(config: Optional[dict] = None) -> OcrProvider:
    """Return the provider instance matching the OCR system configuration.

    config comes straight from OcrSettingSerializer.one():
        {
            "mode": "vision_model" | "local",
            "model_id": "<uuid>",          # required when mode=vision_model
            "workspace_id": "default",     # required when mode=vision_model
            "prompt": "..."                # optional, defaults to DEFAULT_OCR_PROMPT
        }
    """
    if not config or not isinstance(config, dict):
        raise OcrConfigError("OCR 未配置:请先到「系统设置 → OCR 设置」选择视觉模型或启用本地 OCR")

    mode = config.get('mode')
    if mode not in ALL_MODES:
        raise OcrConfigError(f"OCR 配置无效:mode 必须是 {ALL_MODES} 之一,当前为 {mode!r}")

    if mode == MODE_VISION_LLM:
        model_id = config.get('model_id')
        workspace_id = config.get('workspace_id') or 'default'
        prompt = config.get('prompt') or DEFAULT_OCR_PROMPT
        if not model_id:
            raise OcrConfigError("OCR 配置不完整:选择了视觉大模型但未指定 model_id")
        # Import lazily to avoid a circular import (provider.py can be pulled
        # in via the settings import chain)
        from common.handle.impl.ocr.vision_llm_provider import VisionLlmOcrProvider
        return VisionLlmOcrProvider(model_id=model_id, workspace_id=workspace_id, prompt=prompt)

    if mode == MODE_LOCAL:
        from common.handle.impl.ocr.local_provider import LocalOcrProvider
        return LocalOcrProvider(language=config.get('language') or 'ch')

    # Unreachable (the ALL_MODES check above catches this); kept as a safeguard
    raise OcrConfigError(f"未知 OCR 模式:{mode!r}")
59 changes: 59 additions & 0 deletions apps/common/handle/impl/ocr/vision_llm_provider.py
@@ -0,0 +1,59 @@
# coding=utf-8
"""
@project: maxkb
@file: vision_llm_provider.py
@desc: Vision-LLM OCR provider. Reuses the multimodal models already wired
       into models_provider (OpenAI gpt-4o, Anthropic, Gemini, Qwen-VL, Zhipu glm-4v, etc.).
"""
import base64
# Note: imghdr is deprecated (removed in Python 3.13); it is still available
# on the Python 3.11 this project pins.
from imghdr import what

from langchain_core.messages import HumanMessage

from common.handle.impl.ocr.provider import OcrProvider, OcrError, DEFAULT_OCR_PROMPT
from common.utils.logger import maxkb_logger


class VisionLlmOcrProvider(OcrProvider):
    def __init__(self, model_id: str, workspace_id: str = 'default', prompt: str = DEFAULT_OCR_PROMPT):
        self.model_id = model_id
        self.workspace_id = workspace_id
        self.prompt = prompt

    def recognize(self, image_bytes: bytes) -> str:
        if not image_bytes:
            return ''
        # Detect the image format; imghdr returns lowercase names like 'png'/'jpeg'
        img_format = what(None, image_bytes) or 'png'
        b64 = base64.b64encode(image_bytes).decode('utf-8')
        data_url = f'data:image/{img_format};base64,{b64}'

        try:
            # Import lazily to avoid triggering the model-provider chain
            # during early Django startup
            from models_provider.tools import get_model_instance_by_model_workspace_id
            model = get_model_instance_by_model_workspace_id(self.model_id, self.workspace_id)
        except Exception as e:
            maxkb_logger.error(f"OCR: failed to load vision model {self.model_id}: {e}")
            raise OcrError(f"加载视觉模型失败:{e}") from e

        message = HumanMessage(content=[
            {'type': 'text', 'text': self.prompt},
            {'type': 'image_url', 'image_url': {'url': data_url}},
        ])
        try:
            response = model.invoke([message])
        except Exception as e:
            maxkb_logger.error(f"OCR: vision model invoke failed: {e}")
            raise OcrError(f"视觉模型识别失败:{e}") from e

        # langchain AIMessage.content may be a str or a list[dict]
        content = response.content if hasattr(response, 'content') else str(response)
        if isinstance(content, list):
            parts = []
            for chunk in content:
                if isinstance(chunk, dict):
                    parts.append(chunk.get('text', ''))
                else:
                    parts.append(str(chunk))
            content = '\n'.join(parts)
        return (content or '').strip()
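The content normalization at the end of `recognize` handles both shapes; pulled out as a standalone function (a sketch, not part of the diff) it behaves like this:

```python
def normalize_llm_content(content):
    # Mirrors the tail of VisionLlmOcrProvider.recognize: langchain message
    # content may be a plain string or a list of chunks ({'type', 'text'} dicts)
    if isinstance(content, list):
        parts = []
        for chunk in content:
            if isinstance(chunk, dict):
                parts.append(chunk.get('text', ''))
            else:
                parts.append(str(chunk))
        content = '\n'.join(parts)
    return (content or '').strip()

print(normalize_llm_content('  plain text  '))                      # plain text
print(normalize_llm_content([{'type': 'text', 'text': 'a'}, 'b']))  # a\nb
```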