
[B][SIZE=5]Perception Claude Proxy[/SIZE][/B] [I]An OpenAI-compatible bridge that lets Perception (and any other OpenAI-format client) talk to a Claude subscription via the Claude Agent SDK.[/I]

[HR][/HR]

[B][SIZE=4]Why this exists[/SIZE][/B]

Perception speaks the OpenAI chat-completions format. Claude doesn't — and the official Anthropic API charges per token, which adds up fast when you're running long RE sessions with big memory dumps in the context.

This proxy sits between Perception and the [URL=https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk]Claude Agent SDK[/URL] and translates in both directions. The SDK runs against your existing [B]Claude subscription[/B] (Pro / Max), so you pay a flat monthly rate instead of per-token. Tool calls, streaming, thinking blocks, and image input all get translated transparently — Perception thinks it's talking to OpenAI, Claude thinks it's talking to Claude Code.

[HR][/HR]

[B][SIZE=4]What it does[/SIZE][/B]

[LIST]
[*][B]OpenAI-compatible endpoints[/B]: [ICODE]/v1/chat/completions[/ICODE] (streaming + non-streaming) and [ICODE]/v1/models[/ICODE]. Drop-in replacement for an OpenAI base URL.
[*][B]Persistent SDK sessions[/B]: an LRU pool reuses warm Claude sessions across turns. Same conversation prefix + tool list = cache hit, dropping warm-turn latency by 50–70%.
[*][B]Per-project context[/B]: hijacks Perception's [ICODE]update_notes[/ICODE] tool and routes notes to [ICODE]/.proxy/context.md[/ICODE] instead of the IDE's single global pool. Notes for one game don't bleed into another. Workspace root is auto-detected from absolute paths the model sees.
[*][B]Auto-fallback[/B]: on overload ([ICODE]529[/ICODE] / [ICODE]rate_limit_error[/ICODE]), idle timeout, or the SDK's ~120K-token "Usage Policy" refusal, the proxy walks [ICODE]opus → sonnet → haiku[/ICODE] automatically. Perception sees a status note in the stream.
[*][B]Thinking visibility[/B]: extended-thinking deltas surface as [ICODE]...[/ICODE] blocks (default), [ICODE]reasoning_content[/ICODE] deltas, or both.
[*][B]Image input[/B]: data URLs and [ICODE]image_url[/ICODE] fields are decoded and passed through as native Claude image blocks.
[*][B]1M-context beta on by default[/B]: long RE conversations don't get clipped on plans where Opus isn't auto-upgraded.
[*][B]Input-size guards[/B]: per-tool-result, per-message, and total-prompt char caps clip oversized history (e.g. a 5 MB memory dump) before it reaches the SDK, preserving the tail so the active query stays intact.
[*][B]Two-tier timeouts[/B]: hard wall-clock cap plus an idle-token cap that resets on every chunk and catches silently stuck upstreams.
[/LIST]
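To make two of those mechanisms concrete, here is a rough sketch of the fallback decision and the tail-preserving clip. This is not the proxy's actual code: [ICODE]nextFallback[/ICODE], [ICODE]isRetryable[/ICODE], [ICODE]clipKeepTail[/ICODE], the error shape, and the clip marker text are all hypothetical names picked for illustration.

```javascript
// Hypothetical sketch only; none of these names come from the proxy's source.
const FALLBACK_CHAIN = ["opus", "sonnet", "haiku"];

// Assumed error shape: { status: 529 }, { type: "rate_limit_error" },
// or { type: "idle_timeout" } (the last type name is invented here).
function isRetryable(err) {
  return (
    err.status === 529 ||
    err.type === "rate_limit_error" ||
    err.type === "idle_timeout"
  );
}

// Returns the next model to try, or null when the error is not a
// fallback trigger or the chain is exhausted.
function nextFallback(model, err) {
  if (!isRetryable(err)) return null;
  const i = FALLBACK_CHAIN.indexOf(model);
  if (i < 0 || i === FALLBACK_CHAIN.length - 1) return null;
  return FALLBACK_CHAIN[i + 1];
}

// Clips oversized text to maxChars while keeping the END of it, so the
// most recent part of the history survives.
function clipKeepTail(text, maxChars) {
  if (text.length <= maxChars) return text;
  const marker = "[...clipped...]\n"; // invented marker text
  if (maxChars <= marker.length) return text.slice(text.length - maxChars);
  return marker + text.slice(text.length - (maxChars - marker.length));
}
```

With these assumptions, an overloaded [ICODE]opus[/ICODE] call would move to [ICODE]sonnet[/ICODE], while an unrelated error would be surfaced unchanged instead of triggering a downgrade.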

[HR][/HR]

[B][SIZE=4]Tools the model gets[/SIZE][/B]

On top of whatever tools Perception already exposes, the proxy bundles a curated set of [B]Claude Agent SDK tools[/B] so the model isn't capped at the IDE workspace:

[LIST]
[*][B]Read / Glob / Grep[/B]: executed [I]inside the proxy[/I] via an internal text-tool protocol. This is what gives the model access to [B]every drive root[/B] (C:, D:, …) instead of just your IDE workspace. On by default.
[*][B]Bash[/B]: shell exec. [I]Opt-in.[/I]
[*][B]Write / Edit[/B]: mutating filesystem ops. [I]Opt-in.[/I]
[*][B]WebSearch[/B]: the SDK's web-search tool. [I]Opt-in.[/I]
[*][B]EXTRA_SDK_TOOLS[/B]: comma-separated extra SDK tool names for advanced users.
[/LIST]

The dangerous ones are off by default for obvious reasons. Flip them on in [ICODE].env[/ICODE] when you trust the workflow.
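As a rough picture of how those toggles could turn into a tool list, here is a minimal sketch. [ICODE]ENABLE_BASH_TOOL[/ICODE], [ICODE]ENABLE_WRITE_TOOLS[/ICODE], and [ICODE]EXTRA_SDK_TOOLS[/ICODE] are the variable names used in this write-up; the [ICODE]resolveSdkTools[/ICODE] helper and the "only the string 1 means on" parsing are assumptions, not the proxy's real logic.

```javascript
// Hypothetical helper: derive the bundled SDK tool list from env toggles.
function resolveSdkTools(env) {
  const tools = ["Read", "Glob", "Grep"]; // on by default per the list above
  if (env.ENABLE_BASH_TOOL === "1") tools.push("Bash");
  if (env.ENABLE_WRITE_TOOLS === "1") tools.push("Write", "Edit");
  // WebSearch is opt-in as well, but its toggle name is not given in this
  // write-up, so it is left out rather than guessed.
  const extra = (env.EXTRA_SDK_TOOLS || "")
    .split(",")
    .map((name) => name.trim())
    .filter(Boolean);
  // De-duplicate while keeping first-seen order.
  return [...new Set([...tools, ...extra])];
}

console.log(resolveSdkTools({ ENABLE_BASH_TOOL: "1" }));
```

In a real process you would pass [ICODE]process.env[/ICODE] instead of a literal object.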

[HR][/HR]

[B][SIZE=4]Setup[/SIZE][/B]

[B]Requirements[/B]
[LIST]
[*]Node.js 18+ (20+ recommended for [ICODE]--env-file[/ICODE] support)
[*]A working Claude subscription (Pro or Max)
[*]Claude Code installed and signed in once on the machine (the Agent SDK reuses its credentials)
[/LIST]

[B]Install[/B]
[CODE]
git clone https://github.com/sinistercodes/claude-proxy
cd claude-proxy
npm install
cp .env.example .env   # optional; defaults are sane
node server.js
[/CODE]

The proxy listens on [ICODE]:4001[/ICODE] by default.

[B]Point Perception at it[/B]
[LIST]
[*]Base URL: [ICODE]http://localhost:4001/v1[/ICODE]
[*]API key: anything non-empty (it's not validated; the proxy uses your Claude subscription)
[*]Model: [ICODE]sonnet[/ICODE], [ICODE]opus[/ICODE], or [ICODE]haiku[/ICODE]
[/LIST]

That's it. Send a request and you should see it stream back through Perception.
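If you want to poke the proxy without Perception, any OpenAI-format client should work. Below is a small sketch that builds such a request by hand. [ICODE]buildChatRequest[/ICODE] is a hypothetical helper, and the response shape in the usage comment assumes the proxy returns the standard OpenAI chat-completions body.

```javascript
// Hypothetical helper: build an OpenAI-format chat-completions request
// aimed at the proxy's default address from the setup steps above.
function buildChatRequest(prompt, model = "sonnet", baseUrl = "http://localhost:4001/v1") {
  return {
    url: `${baseUrl}/chat/completions`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Any non-empty key; the write-up says it is not validated.
        Authorization: "Bearer anything",
      },
      body: JSON.stringify({
        model,
        stream: false,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage (Node 18+ ships a global fetch; the proxy must be running):
//   const { url, options } = buildChatRequest("Say hello");
//   const res = await fetch(url, options);
//   const data = await res.json();
//   console.log(data.choices[0].message.content); // assumed OpenAI shape
```

Keeping the request builder separate from the [ICODE]fetch[/ICODE] call makes it easy to inspect the exact body being sent when something looks off.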

[B]Useful flags[/B]
[LIST]
[*][ICODE]node server.js --verbose[/ICODE]: [ICODE][proxy][/ICODE] log lines
[*][ICODE]node server.js --live[/ICODE]: colored live request feed on stdout
[*][ICODE]LOG_REQUESTS=1[/ICODE]: dumps every request body to [ICODE]logs/requests/[/ICODE] (last 50, rotated)
[/LIST]

[B]Common tweaks (in [ICODE].env[/ICODE])[/B]
[LIST]
[*][ICODE]CLAUDE_MODEL=opus[/ICODE]: default to Opus instead of Sonnet
[*][ICODE]CLAUDE_THINKING=max[/ICODE]: crank thinking budget to 32K tokens
[*][ICODE]ENABLE_BASH_TOOL=1[/ICODE]: let the model run shell commands
[*][ICODE]ENABLE_WRITE_TOOLS=1[/ICODE]: let the model write/edit files
[*][ICODE]CLAUDE_FALLBACK=0[/ICODE]: pin the model, no auto-downgrade on overload
[/LIST]

See [ICODE].env.example[/ICODE] in the repo for the full annotated list (~30 vars covering sessions, timeouts, tool toggles, input clipping, betas, observability).

[HR][/HR]

[B][SIZE=4]Debug endpoints[/SIZE][/B]

Bound to loopback only — useful when something looks weird:

[CODE]
GET  /debug/status           runtime config + session count
GET  /debug/last-exchange    last full request/response
GET  /debug/last-request     last logged request body
GET  /debug/exchanges        recent exchange ring
GET  /debug/tools            tool catalog from the last request
GET  /debug/workspace        resolved per-project workspace cache
POST /debug/workspace/reset  clear the workspace cache
[/CODE]

[HR][/HR]

Feedback / bug reports welcome. Have fun.
