Structorium

Architecture. Automatically Enforced.

Structorium is the operating system for codebase quality — built from the ground up for AI coding agents.
It doesn't just find problems. It remembers them, ranks them, tracks how you fix them, and blocks you from making new ones.
Every scan builds on the last. Every fix is measured. Every regression is caught. This is not a linter — it's a quality runtime.

What This Document Covers

This is not a typical README. This is the complete operating manual — a 2,000+ line technical book that covers every concept, every formula, every detector, and every operational scenario in Structorium. Read it front-to-back to understand the entire system, or jump to any section as a reference. By the time you finish, you'll understand Structorium better than most people understand their own codebases.

If you want…	Go to…
Quick install and first scan	Installation → First Scan
Understand the core workflow	The Operating Loop
Score math and anti-gaming mechanics	Scoring Model → Anti-Gaming
Compare to SonarQube, CodeQL, etc.	Competitive Positioning
Full command reference	Command Atlas
All 28 languages and capabilities	Language Coverage Atlas
CI integration and enforcement	New-Code Gate → CI Integration
Review system and AI context	Review System → AI Context Layer
Production playbooks	Operator Scenarios

Estimated reading time: ~45 minutes for the full document. ~10 minutes for quick start only.

Storyboard

The 5-step operational flow, from codebase scan to CI enforcement.

Visual Architecture Pack

These diagrams are referenced throughout this document. Each one is explained in detail in its corresponding section.

Diagram	Section	What It Shows
Operating Loop	Operating Loop	The 7-stage scan → fix → gate cycle
Scan Pipeline	Scan Deep Dive	How scan works with parallel detectors
Scoring Model	Scoring Model	40/60 split, 4 score types, anti-gaming
Plugin Architecture	Plugin Architecture	28 languages in 3 layers
Review Pipeline	Review System	prepare → batch → validate → merge
AI Context Stack	AI Context Layer	7-layer AI enrichment stack
New-Code Gate	New-Code Gate	CI enforcement with 3 policy profiles
Fix/Resolve/Move	Fix, Resolve, Move & Anti-Gaming	Resolution paths + auto-fixers
Language Atlas	Language Atlas	6 full + 22 generic capabilities

PART 2 — ORIENTATION

🔍 What Structorium Is

Structorium is a CLI-native codebase quality operating system (Python 3.11+). It gives AI coding agents, and the humans who work alongside them, something no other tool gives them: a persistent, ranked, enforceable, gaming-resistant view of codebase quality that compounds over time.

Forget "run a linter, get a list, close the terminal, forget everything." Structorium does five things simultaneously that no other single tool on the planet does:

#	Capability	What It Means In Practice
1	Detects	30+ detectors rip through your codebase across 3 parallel lanes — mechanical (unused code, duplication, coupling, god files), security (bandit, semgrep, dependency audit), and subjective (12 AI-assessed quality dimensions). Nothing hides.
2	Persists	Every finding is written to `.structorium/state.json` and never silently dropped. Session 5 knows exactly what session 1 found, what you fixed, what you dismissed, and what regressed. This is memory, not noise.
3	Ranks	The `next` command always surfaces the single highest-impact item to fix right now — tier-weighted (T4 = 4× the weight of T1), confidence-adjusted, cluster-aware. You never waste time on low-impact items.
4	Reviews	AI-driven subjective review assesses 12 quality dimensions that no linter can see — elegance, contracts, type safety, abstraction fit, design coherence, AI-generated debt. This contributes 60% of your overall score. Architecture quality is not optional.
5	Enforces	A CI new-code gate blocks regressions on changed lines without requiring your entire legacy codebase to be perfect. Clean up old debt at your own pace. New code must be clean from day one.

What Structorium Is NOT

Let's kill misconceptions immediately:

It is NOT…	Why Not
A linter replacement	Structorium contains linters — it wraps ruff, bandit, knip, clippy, rubocop, and more. But a linter is a component inside Structorium, not the other way around. Structorium adds state, ranking, scoring, review, and enforcement on top of every linter it wraps.
A one-shot code scanner	One-shot scanners are amnesiac. They find 200 issues, you fix 10 and close the terminal, and the next run reports 190 with no record of which 10 you fixed or whether any of them come back. Structorium remembers. It knows which 10 you fixed, which 3 of those later regressed, and which 190 you never touched. This is the difference between noise and intelligence.
A formatting tool	Formatting is solved. Prettier, Black, gofmt — done. Structorium operates at the architecture level: module boundaries, dependency cycles, god files, coupling violations, design coherence. The problems that actually kill projects.
"Ask an LLM for vibes" code review	Structorium's AI review is structured (12 explicit dimensions with weights), fail-closed (invalid reviews are rejected entirely), and scored (contributes to a persistent numeric quality metric). This is systematic assessment engineering, not chatbot opinions.

💡 Why It Exists

The Drift Problem

Every codebase rots. Not dramatically — silently. Even with linters, formatters, and type checkers running in every CI pipeline, the decay is relentless:

Dead code accumulates like sediment — unused imports, orphaned files, deprecated symbols pile up one commit at a time. Nobody notices until the codebase is 30% dead weight.
Module boundaries dissolve — coupling creeps in through private import violations and layer crossings. By the time someone says "why does the API layer import from the database internals?", it's already load-bearing.
God files metastasize — files are easy to extend, hard to split. So core.py grows from 200 lines to 500, then 800, then 1,200. Every bug in that file is now a high-blast-radius event.
Test coverage lies to your face — 80% coverage sounds great until you realize the 20% that's missing is the state transition logic, the error paths, and the edge cases. The hard parts.
Naming drifts from reality — a helpers.ts that no longer holds helpers, a utils/ that contains business logic, a temp_fix.py that's been in production for 18 months.
Architecture patterns breed — three error handling patterns. Two state management approaches. Four different API response formats. Each one made sense when it was written. Together, they're chaos.
AI-generated code adds invisible debt — plausible-looking, compiles-fine code that violates every project convention, duplicates existing utilities, and introduces antipatterns that a senior engineer would catch in review but a linter never will.

Traditional tools catch individual rule violations — one file, one line, one warning. Nobody tracks the compound decay across sessions. Nobody ranks what actually matters most. Nobody blocks regressions on new code while allowing legacy cleanup to happen at its own pace.

Until Structorium.

The Thesis

Code quality should be continuously visible, continuously rankable, and continuously actionable — not a quarterly audit checkbox that everyone lies about.

Structorium turns "vibe coding" into vibe engineering: same velocity, same AI-augmented speed, but with persistent quality measurement underneath that compounds across every session. Every scan builds on the last. Every fix is tracked. Every regression is caught. Known score-gaming tactics are flagged.

This is not "run a linter and hope for the best." This is an operating system for quality.

🏗️ Design Principles

These aren't suggestions. These are the six non-negotiable commitments burned into every line of Structorium's architecture. Every design decision traces back to one of these:

#	Principle	What It Means — No Exceptions
1	Findings become state, not terminal noise	Every detected issue is written to `.structorium/state.json`. It persists across sessions, tracks resolution status, carries its full history, and is never silently dropped. If Structorium found it, it's tracked until you explicitly resolve it. The terminal is a view; the state file is the truth.
2	Work is ranked, not dumped	A flat list of 400 warnings is useless — it's a wall of noise that paralyzes instead of guiding. The `next` command always surfaces the single highest-impact item, ranked by tier weight (T1=1×, T2=2×, T3=3×, T4=4×), confidence-adjusted (high=1.0, medium=0.7, low=0.3), and cluster-aware. You always know exactly what to do next.
3	Subjective quality is first-class	Linters check rules. Rules don't catch "this abstraction is at the wrong level" or "this module has no clear boundary" or "this code was clearly AI-generated and follows none of our conventions." 12 dimensions assessed by AI review contribute 60% of the overall score. Architecture quality is not optional — it's the majority of what matters.
4	Trust boundaries are explicit and fail-closed	You can't sneak bad data into Structorium's state. Review imports are fail-closed — if any finding in the import is invalid, incomplete, inconsistent, or skipped, the entire import is aborted. No partial results. No "well, most of it was fine." Either the review passes every validation check, or nothing gets imported.
5	Score must resist gaming — aggressively	People game metrics. Always. Structorium is built assuming you're trying to cheat: `wontfix` still hurts strict score (dismissing debt ≠ fixing debt). Scores landing within ±0.05 of a round target are flagged as suspicious. Attestation is required for resolution (you must describe what you actually did). Suspect detector drops are held, not auto-resolved. A perfect score achieved dishonestly is worthless.
6	Enforcement must be operational, not aspirational	The new-code gate blocks regressions in CI on changed lines only. This is critical: it means you can enable enforcement on a legacy codebase on day one. Legacy debt is not gated — clean it up at your own pace. But every new PR must meet the standard. Quality enforcement has to work in the real world, not just in greenfield fantasies.

👥 Who This Is For

🤖 AI Agent Framework Developers

You're building the next generation of AI coding agents — Claude Code, Codex, Cursor, Copilot, Windsurf, Gemini — and you need them to produce architecturally sound code, not just code that compiles and passes tests. The problem: AI agents are phenomenally fast at generating code, and catastrophically indifferent to whether that code follows your project's conventions, respects module boundaries, or introduces coupling that will hurt you in 6 months.

Structorium gives your agent a persistent quality model to work against. The agent can scan, check its score, fix the highest-priority item, verify improvement, and repeat — autonomously. The agent skill system provides ready-made skill documents for 7 different agents that teach them the complete Structorium workflow out of the box.

Key features for you: Agent skill installation (update-skill), next command for autonomous quality loops, plan for work queue management, persistent state across sessions so the agent's progress compounds.

⚡ Solo Developers Using Vibe Coding Tools

You ship fast with AI assistance and it feels incredible — until one day you look at your codebase and realize you have three competing state management patterns, a utils/ directory that's become a junk drawer, and 400 lines of dead imports. You don't need a lecture. You need a measurable quality baseline that tracks across sessions and tells you exactly what to fix first.

Structorium is your architectural safety net. The next command tells you the single most impactful thing to fix right now — no architectural expertise required. Six auto-fixers handle the boring stuff (unused imports, dead variables, stale console.logs) automatically. Score progression tracking lets you see — in real numbers — that your codebase is getting better session over session.

Key features for you: Auto-fixers (6 available), ranked next queue, score progression tracking, status dashboard for instant health check.

👥 Teams with Growing Codebases

Your codebase crossed the "one person can understand it" threshold. Coupling is creeping in through module boundaries nobody agreed on. God files are forming because nobody wants to be the one to refactor a 1,200-line file. Test coverage is 78% but the missing 22% is all the tricky edge-case logic. Code reviews are inconsistent — Person A catches coupling violations, Person B misses them entirely.

Structorium provides the automated architectural visibility that manual code review cannot sustain at scale. 30+ detectors run consistently, across every scan, every time. Nobody has a bad day. Nobody misses the coupling violation because they were reviewing at 4 PM on a Friday. And the CI new-code gate ensures that whatever you catch, you enforce.

Key features for you: CI new-code gate (blocks regressions, not legacy), zone classifications, cluster-based planning for sprint work, team review workflows.

📊 Engineering Leads Who Want Measurable Quality

Your CEO asks: "Is our code getting better or worse?" You want to say "better" but you actually have no idea. Code review coverage is inconsistent. Linter warnings are either ignored or auto-suppressed. Technical debt conversations devolve into vibes and opinions.

Structorium gives you four score types that are precise, formula-driven, resistant to gaming, and tracked over time. The strict score is your north-star metric — it can't be inflated by mass wontfix dismissals, it can't be gamed by suppressing linter rules, and it tracks improvement with real numbers across sessions. When your strict score goes from 45 to 72 over a quarter, you have evidence — not opinions.

Key features for you: Strict score as north-star metric, anti-gaming controls (5 mechanisms), dimension health breakdown, score progression across sessions.

🌐 Open Source Maintainers

Every open source maintainer's nightmare: a well-meaning contributor opens a PR that introduces 3 coupling violations, a security finding, and a layer violation — all in code that passes CI because your existing CI only checks formatting and tests. You catch it in review (if you're lucky and have the energy to review thoroughly), or it gets merged and becomes your problem.

Structorium's new-code gate automatically blocks regressions on changed lines, so contributors fix their own issues before you spend review time on the PR. You set the policy (strict, standard, or AI-generated), and the gate does the rest. 28 languages supported out of the box.

Key features for you: New-code gate in CI, standard and strict policy profiles, 28-language support, zero-config auto-detection.

⚡ What Makes It Different

This is not incremental improvement on existing tools. This is a category difference.

Differentiator	What Every Other Tool Does	What Structorium Does Instead
Persistent state	Each run starts fresh. Zero memory. Your Tuesday scan has no idea what Monday's scan found. You fix 10 things, rescan, and see the same 400 warnings. It's Groundhog Day for code quality.	Findings survive across sessions as tracked state. Session 5 knows exactly what session 1 found, which items you fixed, which you dismissed, which regressed, and which are new. This is the foundation for everything else.
Ranked work queue	Flat list of 400 warnings. Same severity. You scroll, pick something that looks easy, fix it, feel good, ignore the 399 remaining items that include 3 critical coupling violations buried on page 7.	`next` always surfaces the single highest-impact item — tier-weighted (T4 refactors = 4× the weight of T1 auto-fixes), confidence-adjusted, cluster-aware. You never have to decide what matters. The math decides.
Subjective AI review	Doesn't exist. Linters check rules. Nobody checks "does this abstraction make sense?" or "is the API shape intuitive?" or "is this module boundary coherent?" Those questions require judgment, and linters don't have any.	12 quality dimensions (elegance, contracts, type safety, abstraction fit, design coherence, AI debt, and more) assessed by structured AI review with fail-closed validation. This is 60% of your overall score — because architecture quality is the majority of what matters.
Anti-gaming scoring	Pass/fail on rule counts. Add a `// nolint` comment, suppress the warning, score goes up. Disable 3 rules in the config, score goes up. Mark everything as "won't fix" in Jira, score goes up. None of the code actually improved.	`wontfix` still hurts strict score — the metric that matters. Scores landing suspiciously close to round targets are flagged. Attestation is required for resolution. Suspect detector drops are held for human review. Structorium assumes you're trying to cheat and demands proof otherwise.
Architecture enforcement	Block on any violation (impossible for legacy code) or block on nothing (useless). Binary choice that forces teams into either "ignore all warnings" or "spend 6 months cleaning up before you can turn on CI."	New-code gate blocks only regressions on changed lines. Turn it on day one on any legacy codebase. Legacy debt is not gated — clean it up at your own pace. But every new PR must be clean. This is how enforcement actually works in the real world.

PART 3 — QUICK START & ACTIVATION

📦 Installation

Core Install

python3 -m venv .venv && source .venv/bin/activate
pip install structorium

Optional Extras

Structorium has a modular extras system. Install only what you need. Quote the package spec so that shells such as zsh don't treat the brackets as a glob pattern:

Extra	What It Adds	Install Command
`treesitter`	Deeper AST analysis for 22 generic language plugins: function extraction, import parsing, complexity metrics, god class detection, AST smell detection	`pip install "structorium[treesitter]"`
`python-security`	Bandit security scanner for Python projects	`pip install "structorium[python-security]"`
`scorecard`	Badge and scorecard image generation for READMEs	`pip install "structorium[scorecard]"`
`ai`	Neo4j + Turbopuffer integration for AI-enriched review context (graph neighborhoods, vector similarity, temporal coupling)	`pip install "structorium[ai]"`
`full`	Everything above: all extras installed	`pip install "structorium[full]"`

Verify Installation

structorium --version
structorium langs          # list supported language plugins

🤖 Agent Skill Installation

Structorium ships with ready-made skill documents for 7 AI coding agents. The skill document teaches your agent how to use Structorium effectively — scan, interpret findings, fix, resolve, and review.

structorium update-skill <agent>

Agent	Command	What It Creates
Claude Code	`structorium update-skill claude`	`.claude/structorium.md`
Cursor	`structorium update-skill cursor`	`.cursor/rules/structorium.mdc`
GitHub Copilot	`structorium update-skill copilot`	`.github/copilot-instructions.md` (appended)
Windsurf (Codeium)	`structorium update-skill windsurf`	`.windsurf/rules/structorium.md`
Gemini Code Assist	`structorium update-skill gemini`	`.gemini/structorium.md`
OpenAI Codex CLI	`structorium update-skill codex`	`codex.md` or `AGENTS.md` (appended)
OpenCode	`structorium update-skill opencode`	`.opencode/structorium.md`

The skill is versioned and idempotent — running update-skill again updates to the latest version without duplicating content.
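A common way to get this behavior is a marker-delimited managed block that gets replaced in place on every run. This matters most for targets that are appended to shared files, such as `.github/copilot-instructions.md`. The sketch shows that general technique. The HTML-comment marker format is invented for this example and is not Structorium's documented format.

```python
import re
from pathlib import Path

# Invented marker format, used only for this illustration.
BEGIN = "<!-- structorium-skill:begin v{version} -->"
END = "<!-- structorium-skill:end -->"
BLOCK_RE = re.compile(
    r"<!-- structorium-skill:begin v[^>]*? -->.*?<!-- structorium-skill:end -->",
    re.DOTALL,
)


def update_skill_text(existing: str, version: str, body: str) -> str:
    """Insert or replace the managed block; content outside it is preserved."""
    block = f"{BEGIN.format(version=version)}\n{body.strip()}\n{END}"
    if BLOCK_RE.search(existing):
        # Re-run: swap the old block for the new one (a lambda avoids backslash escapes).
        return BLOCK_RE.sub(lambda _m: block, existing, count=1)
    # First run: append after any existing content, separated by a blank line.
    prefix = existing.rstrip() + "\n\n" if existing.strip() else ""
    return prefix + block + "\n"


def update_skill_file(path: Path, version: str, body: str) -> None:
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(update_skill_text(existing, version, body), encoding="utf-8")
```

Running it twice with the same version yields identical text. A newer version replaces the old block rather than appending a second one.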

🚀 First Scan & Fix Loop

This walkthrough takes you through the complete Structorium workflow in 6 commands. After this, you'll understand state, queue, score, and the operating loop.

Step 1: Scan Your Codebase

structorium scan --path .

This runs 30+ detectors across your codebase, auto-detects languages, merges findings into persistent state, and computes all four score types. Output shows:

Finding counts by detector
New vs unchanged vs resolved counts
Score change from previous scan (if any)
Dimension health breakdown

Step 2: Check Your Status

structorium status

The status dashboard shows your current scores, dimension health bars, finding counts by tier, and score progression trend.

Step 3: Get Your Highest-Priority Fix

structorium next

Structorium surfaces the single highest-impact item to fix right now — ranked by tier weight, confidence, and cluster priority. The output includes:

Finding ID, detector, file, and line
Tier and confidence
Guidance: what to do and why
Available fixers (if auto-fixable)

Step 4: Fix It

For auto-fixable issues (T1):

structorium fix unused-imports

For manual fixes: make the change yourself, then resolve:

structorium plan done "unused::src/api/routes.ts::React" \
  --note "removed unused React import" \
  --attest "I have actually removed this import and verified the file still compiles"

Why attestation? Structorium requires you to attest that the fix was actually applied. This prevents drive-by `plan done` calls that raise the score without any real work behind them.

Step 5: Get the Next Item

structorium next

The queue has advanced. A new highest-priority item is surfaced. Repeat steps 3-5 until you've addressed your target items.

Step 6: Rescan to Verify

structorium scan --path .

The rescan picks up your fixes, resolves findings that are genuinely gone, detects any regressions, and recomputes all scores. You should see your scores improve.

What You Now Understand

After this loop, you've experienced:

State: findings persist in .structorium/state.json
Queue: next always gives you the highest-impact item
Score: four score types that resist gaming
Loop: scan → next → fix → resolve → rescan → repeat

📋 Agent Prompt Block

Copy-paste this into your AI agent's prompt to give it full Structorium awareness:

You have access to the Structorium codebase quality tool. Use it to maintain
architectural quality as you work.

WORKFLOW:
1. Run `structorium scan --path .` to detect issues
2. Run `structorium next` to get the highest-priority item
3. Fix the issue
4. Run `structorium plan done "<id>" --note "<what>" --attest "I have actually <verified>"` to resolve
5. Run `structorium next` for the next item
6. After fixes, run `structorium scan --path .` to verify improvement

KEY COMMANDS:
- `structorium status` — score dashboard
- `structorium next --count 5` — top 5 priorities
- `structorium show <file>` — findings for a specific file
- `structorium tree` — annotated codebase tree
- `structorium fix <fixer>` — auto-fix (unused-imports, unused-vars, unused-params, debug-logs, dead-useeffect, empty-if-chain)
- `structorium plan` — full ranked plan
- `structorium plan cluster create <name>` — group related issues
- `structorium review --prepare` — prepare subjective review packet

SCORE TYPES (track all four):
- Overall: broad health (failures = open)
- Objective: mechanical only (failures = open)
- Strict ⭐: north-star metric (failures = open + wontfix)
- Verified strict: highest confidence (failures = open + wontfix + fixed + false_positive)

RULES:
- Always attest your fixes honestly
- wontfix still hurts strict score — don't dismiss issues casually
- Run scan after significant changes to track improvement
- Use `next` to stay focused — don't cherry-pick easy wins

PART 4 — THE OPERATING LOOP

🔄 The Operating Loop

Structorium operates as a continuous improvement loop with 7 stages. This isn't a linear pipeline — it's a flywheel. Each stage feeds the next, state persists across sessions, and progress compounds over time. The more you use it, the smarter and more valuable it becomes.

Think of it as the quality equivalent of a CI/CD pipeline: just as CI/CD made "ship and pray" obsolete for deployments, the operating loop makes "lint and forget" obsolete for architecture quality.

Stage	Command	What Happens	What Changes in State
1. SCAN	`structorium scan`	30+ detectors run across all files in 3 parallel lanes — mechanical, security, subjective. Languages auto-detected across 28 plugins. Every file is classified, analyzed, and scored.	New findings added. Resolved findings marked. All four score types recomputed.
2. STATE	(automatic)	Findings merged into `.structorium/state.json` using merge semantics — not replacement. New findings added, existing findings preserved with updated timestamps, genuinely-gone findings resolved. Suspect detector drops are held, not auto-resolved.	State file updated. History preserved. Nothing lost.
3. NEXT	`structorium next`	Priority queue does the thinking for you. Surfaces the single highest-impact item — ranked by tier weight × confidence × cluster focus. This is the command AI agents call in autonomous loops.	Nothing changes — `next` is read-only. Pure query.
4. FIX	`structorium fix` or manual	Auto-fixer runs (6 available for T1 items), or you make the change manually. This is where code actually changes.	Source code changed. State unchanged until rescan.
5. RESOLVE	`structorium plan done`	You attest the resolution: `fixed`, `wontfix`, or `false_positive`. Note and attestation required — you must describe what you actually did. No drive-by closures.	Finding status updated in state. Queue reranked. Scores reflect new resolution.
6. REVIEW	`structorium review`	AI subjective review assesses 12 quality dimensions with fail-closed import validation. If any finding is invalid, skipped, or inconsistent — the entire import is rejected. No partial results contaminate state.	Subjective scores updated. 60% of overall score affected.
7. GATE	CI integration	New-code gate evaluates changed lines against policy thresholds. Three profiles (strict/standard/ai_generated). Pass or fail — and fail is merge-blocking.	Gate status recorded. Regressions blocked before they enter the codebase.

The key insight: Remove any one stage and the system degrades. Without persistence, you lose continuity — every session starts from zero. Without ranking, you lose focus — you cherry-pick easy wins and ignore critical issues. Without subjective review, you lose depth — linters can't see architecture. Without strictness, you lose honesty — people game the metric. Without the gate, you lose enforcement — quality becomes optional. All seven stages are load-bearing.
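The GATE stage reduces to two questions. Which lines did this change touch? Does any finding on those lines meet the profile's blocking threshold? A minimal sketch follows. It reads changed lines from `git diff --unified=0` output (hunk headers of the form `@@ -a,b +c,d @@`, default `a/` and `b/` prefixes assumed). The per-profile blocking tiers are hypothetical, because the real strict/standard/ai_generated thresholds aren't given in this section.

```python
import re
from dataclasses import dataclass

# Hypothetical: lowest tier that blocks a merge under each profile.
BLOCKING_TIER = {"strict": 1, "ai_generated": 2, "standard": 3}
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    tier: int  # 1 (auto_fix) .. 4 (major_refactor)


def changed_lines_from_diff(diff_text: str) -> dict[str, set[int]]:
    """Map each file to the new-side line numbers added by a -U0 diff."""
    changed: dict[str, set[int]] = {}
    current: str | None = None
    prev = ""
    for line in diff_text.splitlines():
        if line.startswith("+++ ") and prev.startswith("--- "):
            path = line[4:]
            # "+++ /dev/null" means the file was deleted: nothing to gate.
            current = path[2:] if path.startswith("b/") else None
        elif current is not None and (m := HUNK_RE.match(line)):
            start, count = int(m.group(1)), int(m.group(2) or 1)
            changed.setdefault(current, set()).update(range(start, start + count))
        prev = line
    return changed


def evaluate_gate(
    findings: list[Finding], changed: dict[str, set[int]], profile: str
) -> tuple[bool, list[Finding]]:
    """Pass unless a finding on a changed line meets the profile's blocking tier."""
    min_tier = BLOCKING_TIER[profile]
    blocking = [
        f for f in findings
        if f.line in changed.get(f.file, set()) and f.tier >= min_tier
    ]
    return not blocking, blocking  # findings on untouched legacy lines are ignored
```

Legacy debt passes through untouched because only lines in the diff are ever considered. This is what lets the gate run on day one.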

🔬 Scan Deep Dive

What Scan Does (8 Steps)

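When you run `structorium scan --path .`, the following steps happen in order. The exception is steps 3 to 5: these are the three detector lanes (mechanical, security, subjective), and they run in parallel.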

Step	What Happens	Key Detail
1	Discover files	Walk the project tree, apply exclusions and zone classifications
2	Resolve languages	Auto-detect language for each file. 28 languages supported. Configurable via `--lang`
3	Run mechanical detectors	Parallel execution of 25+ mechanical detectors: unused, structural, coupling, dupes, cycles, naming, orphaned, patterns, etc.
4	Run security scanners	Language-specific security tools (bandit for Python, semgrep rules, etc.)
5	Run subjective review	If configured, assess 12 quality dimensions via AI review
6	Normalize findings	Deduplicate, classify by tier (T1-T4), assign confidence (high/medium/low)
7	Merge into state	Compare new findings with existing state. Add new, preserve existing, resolve genuinely-gone
8	Compute scores	Calculate all four score types across all dimensions

Scan Profiles

Profile	Flag	What Runs	Use Case
`objective`	`--profile objective`	Mechanical detectors only. No subjective review.	Fast quality snapshot
`full`	(default)	All detectors including subjective if configured	Complete analysis
`ci`	`--profile ci`	All detectors + new-code gate evaluation	CI/CD pipeline integration

Suspect Detector Protection

If a detector that previously reported 40 findings suddenly reports 0, Structorium does not silently mark them all as resolved. Instead:

The sudden-drop event is flagged as suspect
Previous findings are held in state, not auto-resolved
A warning is emitted in scan output
This prevents tool misconfiguration or environment issues from silently inflating scores
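Merge semantics and the suspect-drop guard fit naturally into one function. In this sketch the state is a plain dict keyed by finding ID. The following are illustrative assumptions, not Structorium's actual state schema:

- the record fields;
- the `SUSPECT_MIN_PREVIOUS` threshold;
- recording scan-resolved findings as `fixed` with a `resolved_by` tag.

```python
SUSPECT_MIN_PREVIOUS = 5  # hypothetical: only large drops to zero are treated as suspect


def merge_scan(state: dict[str, dict], current: dict[str, set[str]], scan_no: int) -> list[str]:
    """Merge one scan into `state` in place.

    `current` maps each detector to the finding IDs it reported this scan.
    Returns warning messages for suspect detector drops.
    """
    prev_open: dict[str, int] = {}
    for rec in state.values():
        if rec["status"] == "open":
            prev_open[rec["detector"]] = prev_open.get(rec["detector"], 0) + 1
    suspect = {
        det for det, n in prev_open.items()
        if n >= SUSPECT_MIN_PREVIOUS and not current.get(det)
    }
    warnings = [
        f"suspect drop: {det} went from {prev_open[det]} findings to 0; holding"
        for det in sorted(suspect)
    ]

    seen: set[str] = set()
    for det, ids in current.items():
        for fid in ids:
            seen.add(fid)
            rec = state.get(fid)
            if rec is None:  # new finding
                state[fid] = {"detector": det, "status": "open",
                              "first_seen": scan_no, "last_seen": scan_no}
                continue
            rec["last_seen"] = scan_no  # existing finding: preserve, refresh timestamp
            if rec["status"] == "fixed":
                rec["status"] = "open"  # regression: a fixed finding came back
                rec.pop("resolved_by", None)

    for fid, rec in state.items():
        # Genuinely gone: open, not reported, and its detector is not suspect.
        if rec["status"] == "open" and fid not in seen and rec["detector"] not in suspect:
            rec["status"] = "fixed"
            rec["resolved_by"] = "scan"
    return warnings
```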

Optional Tool Degradation

When an external tool (e.g., bandit, knip, rubocop) is not installed:

Structorium does not crash or skip the entire language
The affected detector runs with reduced confidence
Findings are still generated from available sources
A warning is emitted noting the missing tool

Key Scan Flags

Flag	What It Does
`--path <dir>`	Scan a specific directory (default: current directory)
`--lang <lang>`	Force a specific language (skip auto-detection)
`--profile <name>`	Scan profile: `objective`, `full`, `ci`
`--skip-slow`	Skip long-running detectors for faster iteration
`--exclude <pattern>`	Exclude path pattern (repeatable)

Source: app/commands/scan/scan_workflow.py, app/commands/scan/scan_reporting_dimensions.py

📊 Next & Ranking Deep Dive

How `next` Decides

The next command doesn't just return the first finding — it computes a priority score for every open finding and surfaces the highest:

Factor	How It Affects Ranking
Tier weight	T1 (auto_fix) = 1×, T2 (quick_fix) = 2×, T3 (judgment) = 3×, T4 (major_refactor) = 4×
Confidence	High = 1.0, Medium = 0.7, Low = 0.3
Detector type	Detectors with available auto-fixers are surfaced earlier for quick wins
Cluster focus	If a cluster is focused (`plan focus <cluster>`), only items in that cluster appear
Review weighting	Items with subjective review findings get boosted priority

Why Guidance Quality Matters

Every finding in next output includes guidance — a human-readable explanation of what to do and why. This is critical for AI agents that need actionable instructions, not just a finding name.

Example next output:

#1 [T3] coupling :: src/api/handler.ts → src/internal/auth.ts
   Guidance: fix boundary violations with `structorium move`
   Tool: move
   Confidence: high
   Cluster: api-cleanup

`next` Command Variants

Command	What It Shows
`structorium next`	Single highest-priority item with full detail
`structorium next --explain`	Extended reasoning for the priority decision
`structorium next --tier 3`	Only judgment-tier items (filter by tier)
`structorium next --cluster <name>`	Only items in a specific cluster
`structorium next --count 5`	Top 5 items in ranked order

Source: engine/_work_queue/ranking.py, engine/_work_queue/core.py

📋 Plan Deep Dive

The plan command gives you full control over the work queue. It's the workflow control surface for managing priorities, grouping related issues, and tracking progress.

All Plan Operations

Operation	Command	What It Does
View plan	`structorium plan`	Full prioritized markdown of all open findings
View queue	`structorium plan queue`	Compact table of all open items
Mark done	`structorium plan done "<id>" --note "..." --attest "..."`	Resolve a finding with attestation
Move to top	`structorium plan move "<pat>" top`	Reorder — push an item to the front
Create cluster	`structorium plan cluster create <name>`	Group related findings by name
Focus cluster	`structorium plan focus <cluster>`	`next` only returns items from this cluster
Unfocus	`structorium plan unfocus`	Remove cluster focus
Defer	`structorium plan defer "<pat>"`	Push item to the back of the queue
Skip	`structorium plan skip "<pat>"`	Hide from `next` (still in state)
Reopen	`structorium plan reopen "<pat>"`	Reopen a resolved finding

Work Queue as a Workflow Surface

The plan isn't just a list — it's a workflow management tool:

Clusters group related issues. For example, create an `api-cleanup` cluster for all API boundary violations, then run `plan focus api-cleanup` to work through them systematically.
Focus scopes `next` to a single cluster, which is useful for sprint planning or deep-dive sessions.
Defer and skip let you manage noise without dismissing issues. Deferred items move to the back of the queue, while skipped items stay in state but no longer appear in `next`.
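The reorder operations can be modeled as edits to an ordered list of finding IDs plus a skip set. This is a hypothetical model. The `WorkQueue` class is invented for illustration, and so is the assumption that `<pat>` is an `fnmatch`-style glob.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class WorkQueue:
    order: list[str] = field(default_factory=list)  # ranked finding IDs
    skipped: set[str] = field(default_factory=set)  # hidden from `next`, still in state

    def _matching(self, pattern: str) -> list[str]:
        return [fid for fid in self.order if fnmatchcase(fid, pattern)]

    def move_top(self, pattern: str) -> None:
        hits = self._matching(pattern)
        self.order = hits + [fid for fid in self.order if fid not in hits]

    def defer(self, pattern: str) -> None:
        hits = self._matching(pattern)
        self.order = [fid for fid in self.order if fid not in hits] + hits

    def skip(self, pattern: str) -> None:
        self.skipped.update(self._matching(pattern))

    def visible(self) -> list[str]:
        """What `next` would draw from: the order minus skipped items."""
        return [fid for fid in self.order if fid not in self.skipped]
```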

Source: engine/planning/, app/commands/plan/

🔧 Fix, Resolve, Move & Anti-Gaming

Auto-Fixers

Structorium ships with 6 auto-fixers that handle T1 (auto_fix tier) items automatically:

Fixer	What It Does	Target Detector
`unused-imports`	Removes dead import statements	`unused`
`unused-vars`	Removes unused variable declarations	`unused`
`unused-params`	Removes unused function parameters	`unused`
`debug-logs`	Removes `console.log`, `print()`, `debug()` statements	`logs`
`dead-useeffect`	Removes empty React `useEffect` hooks	`smells`
`empty-if-chain`	Removes empty `if`/`else` blocks	`smells`

Usage:

structorium fix unused-imports          # fix one type
structorium fix unused-imports --dry    # preview changes without applying
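The detection half of a fixer like unused-imports can be sketched with Python's standard ast module. This is an illustration under simplifying assumptions (it ignores __all__, string annotations, and dynamic access via getattr) and is not the fixer Structorium ships; the unused_imports function is hypothetical.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return names bound by import statements that are never referenced."""
    tree = ast.parse(source)
    imported: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds "os"; "import numpy as np" binds "np".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports cannot be checked this way
                    imported.add(alias.asname or alias.name)
    # Every reference to a bare name (including the root of os.path.join) is an ast.Name node.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)
```

A real fixer would then delete the reported names from their import statements, which is where the --dry preview becomes useful.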

Resolution Statuses

Every finding has a resolution status that determines how it affects each score type:

Status	How You Set It	What It Means
`open`	(default — set by scan)	Finding is active and unresolved
`fixed`	`plan done "<id>" --status fixed --attest "..."`	You fixed the issue and attested to it
`wontfix`	`plan done "<id>" --status wontfix --note "..."`	You're deliberately not fixing it — and you know it hurts strict score
`false_positive`	`plan done "<id>" --status false_positive`	The detector was wrong — this isn't actually an issue

How Resolution Status Affects Scores

This is the most important table in this document for understanding Structorium's scoring philosophy:

Status	Overall	Objective	Strict ⭐	Verified Strict
`open`	❌ Fails	❌ Fails	❌ Fails	❌ Fails
`fixed`	✅ Passes	✅ Passes	✅ Passes	❌ Fails
`wontfix`	✅ Passes	✅ Passes	❌ Fails	❌ Fails
`false_positive`	✅ Passes	✅ Passes	✅ Passes	❌ Fails

Key insight: wontfix passes overall/objective but still fails strict score. This is by design — dismissing debt is not the same as fixing it. The strict score is the north-star metric because it cannot be gamed by mass wontfix dismissals.

Move — Repository Surgery

The move command relocates files and automatically rewrites all import references across the codebase:

structorium move src/utils/helpers.ts src/lib/string-utils.ts

This:

Moves the file to the new location
Updates every import across the project that referenced the old path
Resolves paths in a language-aware way (relative and absolute imports)
Addresses findings from the orphaned, flat_dirs, naming, coupling, and facade detectors

Supported languages for move: Python, TypeScript, C#, Dart, GDScript, Go
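For Python, the import-rewriting step can be approximated with a regex over dotted module names. This is a simplified sketch, not the language-aware resolver in the plugins' move.py: it assumes repo-relative paths map directly to module names, it only rewrites the first module on each import line, and it ignores relative imports such as from . import x. Both helper names are hypothetical.

```python
import re


def module_name(path: str) -> str:
    """Convert a repo-relative .py path to a dotted module name (assumes no src/ root)."""
    return path.removesuffix(".py").replace("/", ".")


def rewrite_imports(source: str, old_path: str, new_path: str) -> str:
    """Point 'import old.mod' and 'from old.mod import x' at the new module."""
    old_mod, new_mod = module_name(old_path), module_name(new_path)
    # \b stops "utils.helpers" from also matching "utils.helpers_extra".
    pattern = re.compile(rf"^(\s*(?:from|import)\s+){re.escape(old_mod)}\b", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + new_mod, source)
```

Running it over every file after relocating utils/helpers.py to lib/string_utils.py turns from utils.helpers import slugify into from lib.string_utils import slugify, while leaving utils.helpers_extra untouched.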

Anti-Gaming Philosophy

Structorium is designed to be resistant to gaming by default. Five mechanisms prevent score inflation without real improvement:

Mechanism	How It Works
Wontfix penalty	`wontfix` passes overall but FAILS strict score. Mass dismissal shows up immediately.
Attestation requirement	`plan done` requires `--attest` for `fixed` status. You must describe what you actually did.
Target match detection	Subjective scores within ±0.05 of target are flagged as potential gaming. `SUBJECTIVE_TARGET_MATCH_TOLERANCE = 0.05`
Fail-closed review import	If any finding in a review import is invalid, skipped, or inconsistent, the entire import is aborted. No partial results.
Suspect detector drops	If a detector suddenly reports 0 findings (down from 40+), the drop is held for review rather than auto-resolving.

The philosophy: A perfect score achieved dishonestly is useless. Structorium's scoring is strict-first — the system assumes you're trying to cheat and requires proof otherwise. This makes genuinely good scores trustworthy.
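Two of these mechanisms reduce to simple predicates. The sketch below reuses the tolerance constant quoted in the table; the function names and the min_previous cutoff of 40 (taken from the "down from 40+" example) are assumptions, and the real logic lives in intelligence/integrity.py.

```python
SUBJECTIVE_TARGET_MATCH_TOLERANCE = 0.05  # value quoted in the table above


def is_target_match(score: float, target: float,
                    tolerance: float = SUBJECTIVE_TARGET_MATCH_TOLERANCE) -> bool:
    """Flag a subjective score that lands suspiciously close to its target."""
    return abs(score - target) <= tolerance


def is_suspect_drop(previous: int, current: int, min_previous: int = 40) -> bool:
    """Flag a detector whose finding count collapses to zero from a large baseline.

    min_previous=40 mirrors the example above; the real cutoff is an assumption.
    """
    return current == 0 and previous >= min_previous
```

A suspect drop would then be held for review instead of marking the detector's 40+ findings as resolved.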

Source: engine/_scoring/policy/core.py, intelligence/integrity.py

PART 5 — REVIEW SYSTEM

📝 Review System Deep Dive

Why Mechanical Evidence Is Not Enough

Here's the uncomfortable truth about linters: they can only see what can be expressed as rules. And the most important quality properties of a codebase — the ones that determine whether your project survives its second year or collapses under its own weight — cannot be expressed as rules.

Property	Can a Linter See It?	Can Structorium Review See It?
Is this abstraction at the right level — or is it over-engineered / under-engineered?	❌ Impossible	✅
Does the error handling pattern make architectural sense?	❌ Impossible	✅
Is this API shape intuitive to a new developer?	❌ Impossible	✅
Does this module have clear, documented boundaries?	❌ Impossible	✅
Is the naming consistent with the rest of the project conventions?	⚡ Partially	✅
Does the overall design cohere — or are there 3 competing patterns?	❌ Impossible	✅
Does this code look like it was generated by AI and pasted without review?	❌ Impossible	✅

This is why Structorium gives 60% of the overall score to subjective review — not as a nice-to-have, but as the majority signal. Architecture quality matters more than any individual rule violation. A codebase that passes every lint rule but has incoherent module boundaries, inconsistent patterns, and brittle abstractions is a ticking time bomb.

The Three-Step Review Workflow

# Step 1: Prepare the review packet
structorium review --prepare
# Creates: .structorium/reviews/query.json
# Contains: source code, existing findings, historical status per file

# Step 2: Run review batches with an AI runner
structorium review --run-batches --runner codex --parallel
# Splits files into 3-4 independent batches
# Each batch assessed by the runner in isolation
# Runners: codex, claude, external

# Step 3: Import results under integrity constraints
structorium review --import .structorium/reviews/latest.json
# Validates schema, consistency, completeness
# ANY failure → ENTIRE import aborted (fail-closed)
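The batching step splits the packet into independent groups that a runner can assess in isolation. A minimal sketch, assuming a simple round-robin split; make_batches is a hypothetical helper, and batches.py may group files differently (for example by module or size).

```python
def make_batches(files: list[str], n_batches: int = 4) -> list[list[str]]:
    """Split files into at most n_batches disjoint groups, round-robin."""
    if n_batches < 1:
        raise ValueError("n_batches must be >= 1")
    batches = [files[i::n_batches] for i in range(n_batches)]
    return [batch for batch in batches if batch]  # drop empty batches for tiny inputs
```

Because the groups are disjoint, each batch can be sent to a separate runner process with --parallel without any shared state between them.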

Fail-Closed Import Validation

When you import review results, Structorium validates every field before accepting:

Check	What It Validates	On Failure
Schema validation	JSON structure matches expected format	❌ Full import aborted
Score-feedback consistency	Scores align with written assessments	❌ Full import aborted
Completeness	All requested files were assessed	❌ Full import aborted
Dimension coverage	All 12 dimensions have scores	❌ Full import aborted
No skipped findings	Every existing finding was addressed	❌ Full import aborted

Why fail-closed? Partial review results would create inconsistent state — some files scored, others not. This would make the overall score meaningless. Better to reject and re-run than to accept incomplete data.
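The all-or-nothing rule can be sketched as: run every check, raise if anything failed, and only then touch state. The payload shape (scores, files, addressed_findings keys) and all names here are hypothetical, and the real import_cmd.py also checks score-feedback consistency, which this sketch omits.

```python
DIMENSIONS = frozenset({
    "high elegance", "mid elegance", "low elegance", "contracts", "type safety",
    "abstraction fit", "logic clarity", "structure navigation", "error consistency",
    "naming quality", "ai generated debt", "design coherence",
})


class ReviewImportError(Exception):
    """Raised when any check fails; nothing from the payload is applied."""


def validate_review(payload: dict, requested_files: set[str], open_finding_ids: set[str]) -> None:
    # Schema: bail out early, since the remaining checks need these keys.
    if not isinstance(payload.get("scores"), dict) or not isinstance(payload.get("files"), list):
        raise ReviewImportError(["schema: expected 'scores' (dict) and 'files' (list)"])
    errors = []
    missing_dims = DIMENSIONS - set(payload["scores"])
    if missing_dims:
        errors.append(f"dimension coverage: missing {sorted(missing_dims)}")
    missing_files = requested_files - set(payload["files"])
    if missing_files:
        errors.append(f"completeness: not assessed {sorted(missing_files)}")
    skipped = open_finding_ids - set(payload.get("addressed_findings", []))
    if skipped:
        errors.append(f"skipped findings: {sorted(skipped)}")
    if errors:
        raise ReviewImportError(errors)  # report every problem, apply nothing


def import_review(state: dict, payload: dict, requested_files: set[str],
                  open_finding_ids: set[str]) -> dict:
    validate_review(payload, requested_files, open_finding_ids)  # raises before any change
    return {**state, "subjective_scores": dict(payload["scores"])}
```

Returning a new state dict (rather than mutating in place) is one simple way to guarantee that a rejected import leaves the previous state untouched.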

External Review Sessions

For review runners that aren't directly integrated (e.g., Claude cloud):

structorium review --external-start --external-runner claude
# Generates the review packet for external processing
# You paste the packet into Claude, get assessments back
# Then import the results

Review with Retrospective Context

The --retrospective flag includes historical issue status in the review packet, so the reviewer sees what changed since the last assessment:

structorium review --prepare --retrospective

Source: app/commands/review/cmd.py, app/commands/review/prepare.py, app/commands/review/batches.py, app/commands/review/import_cmd.py

📐 Subjective Dimensions Reference

Structorium's subjective review assesses 12 quality dimensions, each with a specific weight that determines its contribution to the subjective score pool (60% of overall):

Dimension	Weight	What It Assesses
High elegance	22.0	Top-tier files: are they simple, clear, beautifully structured? Would a senior engineer admire this code?
Mid elegance	22.0	Average files: decent organization, readable, follows conventions — but not exceptional
Low elegance	12.0	Bottom-tier files: messy, confusing, friction-heavy. Pain to work in.
Contracts	12.0	Interface clarity — are API surfaces well-defined? Are module boundaries respected?
Type safety	12.0	Type discipline — are types precise and meaningful, or loose and permissive?
Abstraction fit	8.0	Is the abstraction level right for the problem? Over-abstracted? Under-abstracted?
Logic clarity	6.0	Control flow readability — are conditionals clear? Are state transitions understandable?
Structure navigation	5.0	File/module layout — can you find what you need? Is the project navigable?
Error consistency	3.0	Error handling patterns — are they consistent? Are edge cases covered?
Naming quality	2.0	Identifier naming — are names clear, consistent, and convention-following?
AI generated debt	1.0	AI-specific debt — patterns typical of AI-generated code (plausible but wrong, duplicated utilities, convention violations)
Design coherence	10.0	Architectural coherence — does the overall design make sense? Are patterns consistent across modules?

Why These Weights?

The weights emphasize elegance (44.0 combined for high + mid) and contracts/type safety (24.0 combined) because these are the most impactful quality properties:

Elegance determines whether code is a joy or a nightmare to modify
Contracts and types determine whether modules can be safely composed
Design coherence catches systemic problems that don't show up as individual findings

Lower-weight dimensions (naming, AI debt) are still assessed — they just don't dominate the score because a badly-named file with great structure is still better than a well-named file with terrible architecture.
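The effect of these weights is easiest to see numerically. This sketch assumes the subjective pool is a weighted arithmetic mean of 0-100 dimension scores, which the document implies but does not spell out; the dictionary copies the weights from the table above (total 115.0), and subjective_pool is a hypothetical helper.

```python
SUBJECTIVE_DIMENSION_WEIGHTS = {
    "high elegance": 22.0, "mid elegance": 22.0, "low elegance": 12.0,
    "contracts": 12.0, "type safety": 12.0, "abstraction fit": 8.0,
    "logic clarity": 6.0, "structure navigation": 5.0, "error consistency": 3.0,
    "naming quality": 2.0, "ai generated debt": 1.0, "design coherence": 10.0,
}


def subjective_pool(scores: dict[str, float]) -> float:
    """Weighted mean of 0-100 dimension scores (assumed aggregation)."""
    total = sum(SUBJECTIVE_DIMENSION_WEIGHTS.values())  # 115.0
    return sum(w * scores[d] for d, w in SUBJECTIVE_DIMENSION_WEIGHTS.items()) / total
```

Under this assumption, raising only high elegance from 0 to 100 moves the pool by about 19.1 points (22/115 of the range), while the same jump in ai generated debt moves it by less than 1 point.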

Source: engine/_scoring/policy/core.py (lines 163-178)

🧠 AI Context Layer

A human code reviewer doesn't just read the file in front of them. They bring years of context — they know which files tend to break together, which modules are frequently imported, what the codebase looked like 6 months ago. Structorium's AI context layer replicates this by enriching every review packet with 6 layers of intelligence that transform raw source code into deeply contextualized review input.

The Provider Stack

Layer	Provider	What It Does	Config Key
1	OpenAI	Generates dense vector embeddings of source files	`OPENAI_API_KEY` + `ai_embedding_model`
2	Turbopuffer	Stores vectors persistently. Retrieves semantically similar code segments.	`TURBOPUFFER_API_KEY`
3	Cohere	Reranks retrieved segments for relevance to the review context	`COHERE_API_KEY` + `ai_reranker_model`
4	Neo4j	Graph database storing import/dependency/call relationships. Finds architectural neighbors and ripple-risk zones.	`NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`
5	Review Memory	Historical review data persisted across sessions. Reviewer sees what changed since last assessment.	Automatic (`.structorium/`)
6	Temporal Coupling	Analyzes git commit history for co-change patterns. Files that always change together = latent coupling.	Automatic (`git log`)

What Each Layer Contributes to Review Context

Layer	Example Enrichment
Semantic neighbors	"This file is 92% similar to `engine/_scoring/compute.py` — review for consistency"
Graph neighbors	"This module is imported by 14 files — changes here have high ripple risk"
Review memory	"Last reviewed in session 4: elegance was 7/10, contracts 6/10. 14 lines changed since."
Temporal coupling	"This file co-changes with `state.py` 82% of the time — likely latent coupling"
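The "co-changes 82% of the time" figure can be read as a conditional frequency over commit history. A minimal sketch, assuming the ratio is (commits touching both files) / (commits touching the first file); the exact definition Structorium uses is not given here, and the commit file sets could come from something like git log --name-only.

```python
from collections import Counter
from itertools import combinations


def co_change_ratios(commits: list[set[str]]) -> dict[tuple[str, str], float]:
    """For each ordered pair (a, b): fraction of commits touching a that also touch b."""
    touches = Counter()  # file -> number of commits that touched it
    pairs = Counter()    # sorted (a, b) -> number of commits that touched both
    for files in commits:
        touches.update(files)
        for a, b in combinations(sorted(files), 2):
            pairs[(a, b)] += 1
    ratios = {}
    for (a, b), together in pairs.items():
        ratios[(a, b)] = together / touches[a]
        ratios[(b, a)] = together / touches[b]
    return ratios
```

The ratio is deliberately asymmetric: a small helper that changes only alongside state.py scores 1.0 toward it, while state.py itself may change with that helper only a fraction of the time.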

Graceful Degradation

Every layer is optional. If a provider isn't configured, Structorium skips that enrichment layer and continues with whatever context is available:

Configuration	What You Get
No API keys at all	Source code plus the automatic layers (review memory, temporal coupling); still functional
OpenAI only	Embeddings + similarity for semantic neighbors
OpenAI + Cohere	Similarity + reranking for more relevant context
OpenAI + Cohere + Turbopuffer	All above + persistent vector storage
Full stack (+ Neo4j)	All above + graph neighborhoods and ripple-risk zones
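The degradation logic can be pictured as a check of which prerequisites are present. A minimal sketch, assuming reranking and vector storage only make sense once embeddings exist and that NEO4J_URI alone signals a graph backend; enabled_layers and the layer labels are hypothetical, while the environment variable names come from the provider table.

```python
import os
from collections.abc import Mapping


def enabled_layers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the enrichment layers whose prerequisites are present, in stack order."""
    layers = []
    if env.get("OPENAI_API_KEY"):
        layers.append("semantic neighbors")
        if env.get("COHERE_API_KEY"):       # assumed: reranking needs retrieved segments
            layers.append("reranking")
        if env.get("TURBOPUFFER_API_KEY"):  # assumed: storage needs embeddings to store
            layers.append("persistent vector storage")
    if env.get("NEO4J_URI"):
        layers.append("graph neighbors")
    layers += ["review memory", "temporal coupling"]  # automatic, no keys needed
    return layers
```

Passing a plain dict instead of os.environ makes it easy to preview what a given CI or laptop configuration would enable.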

Setup

# API keys (set in environment or via config)
structorium config set OPENAI_API_KEY sk-...
structorium config set COHERE_API_KEY ...
structorium config set TURBOPUFFER_API_KEY ...

# Neo4j (via Docker Compose)
docker compose -f docker-compose.neo4j.yml up -d
structorium config set NEO4J_URI bolt://localhost:7687
structorium config set NEO4J_USERNAME neo4j
structorium config set NEO4J_PASSWORD <password>

Source: intelligence/ai/, docs/AI_STACK.md

PART 6 — ARCHITECTURE ENFORCEMENT

🚧 New-Code Gate

The Concept

The new-code gate solves the most common excuse in software engineering: "We can't turn on quality enforcement because our legacy code has too many issues."

Every other quality gate forces you into a binary choice:

Block on everything — which means your 500-finding legacy codebase can never pass CI, which means you never turn on enforcement, which means quality stays optional forever.
Block on nothing — which means the gate is decorative. It exists. It does nothing. Warnings pile up.

Structorium's new-code gate takes a different approach: it only evaluates findings on changed lines. Legacy debt is not gated. You clean it up at your own pace, when you choose, on your timeline. But every new PR — every piece of new code your team writes or your AI agent generates — must meet the quality standard. From day one. No exceptions.

How It Works (4 Steps)

Step	What Happens
1. Git diff	Compute changed file + line ranges from `git diff --unified=0 base...HEAD`
2. Classify	Match open findings against changed line ranges — only findings on new/changed code are "in scope"
3. Policy check	Compare in-scope findings against policy thresholds (max findings, max high, max critical, blocked detectors)
4. Result	Pass (PR can merge) or Fail (merge-blocking — findings must be resolved first)
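Steps 1 and 2 can be sketched by parsing hunk headers from git diff --unified=0 output. In a header of the form @@ -a,b +c,d @@, the new-side range starts at line c and spans d lines (d defaults to 1 when omitted and is 0 for a pure deletion). The Finding shape and the function names are hypothetical; this is not the code in intelligence/new_code_gate.py.

```python
import re
from dataclasses import dataclass

# New-side range of a unified-diff hunk header: "@@ -a,b +c,d @@"
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text: str) -> dict[str, set[int]]:
    """Step 1: map each changed file to the new-side line numbers touched by the diff."""
    changed: dict[str, set[int]] = {}
    current: str | None = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            target = line[4:]
            current = None if target == "/dev/null" else target.removeprefix("b/")
            if current is not None:
                changed.setdefault(current, set())
        elif current is not None and (match := HUNK_HEADER.match(line)):
            start = int(match.group(1))
            count = int(match.group(2)) if match.group(2) is not None else 1
            changed[current].update(range(start, start + count))  # count 0 adds nothing
    return changed


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    detector: str


def in_scope(findings: list[Finding], changed: dict[str, set[int]]) -> list[Finding]:
    """Step 2: keep only findings that sit on new or changed lines."""
    return [f for f in findings if f.line in changed.get(f.file, set())]
```

Everything outside the returned list is legacy debt and never reaches the policy check.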

Three Policy Profiles

Structorium ships with three built-in policy profiles. Each has different threshold strictness:

Policy	`max_new_findings`	Blocked Detectors	Use Case
`strict`	0	`security`, `layer_violation`, `private_imports`	Zero tolerance for new issues
`standard`	3	`security`	Reasonable for most teams
`ai_generated_code`	1	`security`, `layer_violation`, `private_imports`, `coupling`	Tighter control on AI-generated PRs

Blocked Detectors

Each profile also blocks certain detectors outright, regardless of the numeric threshold. If any new finding comes from a detector that the active profile blocks, the gate fails immediately:

security (all profiles): security findings on new code are never acceptable
layer_violation (strict, ai_generated_code): new architectural violations break the dependency structure
private_imports (strict, ai_generated_code): new private imports create hidden coupling
coupling (ai_generated_code only): AI agents tend to create coupling
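Step 3 of the gate combines the two rules: any blocked detector fails immediately, and otherwise the count of in-scope findings is compared to max_new_findings. A minimal sketch; the POLICIES dictionary copies the profile table, while gate_check and the one-field Finding are hypothetical, and the max_new_high and max_new_critical overrides are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    detector: str


# Thresholds and blocked detectors copied from the profile table above.
POLICIES = {
    "strict": {"max_new_findings": 0,
               "blocked": {"security", "layer_violation", "private_imports"}},
    "standard": {"max_new_findings": 3, "blocked": {"security"}},
    "ai_generated_code": {"max_new_findings": 1,
                          "blocked": {"security", "layer_violation", "private_imports", "coupling"}},
}


def gate_check(in_scope: list[Finding], policy: str) -> tuple[bool, list[str]]:
    """Return (passed, reasons) for the in-scope findings under one profile."""
    rules = POLICIES[policy]
    reasons = [f"blocked detector: {f.detector}" for f in in_scope if f.detector in rules["blocked"]]
    if len(in_scope) > rules["max_new_findings"]:
        reasons.append(f"{len(in_scope)} new findings > max {rules['max_new_findings']}")
    return (not reasons, reasons)
```

A single new coupling finding therefore passes under standard but fails under ai_generated_code, even though it is within that profile's numeric limit of 1.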

Configuration

# Enable the gate
structorium config set new_code_gate_enabled true

# Set policy profile
structorium config set new_code_gate_policy strict

# Override individual thresholds
structorium config set new_code_gate_max_new_findings 0
structorium config set new_code_gate_max_new_high 0
structorium config set new_code_gate_max_new_critical 0

# Set base ref for diff
# Default: origin/main
structorium config set new_code_gate_base_ref origin/develop

Source: intelligence/new_code_gate.py

🏷️ Zone Classifications

Zones determine which files are scored and how security findings are filtered. Not all code is production code — and not all code should affect your quality score.

Zone Types

Zone	Scored?	Security Findings?	Typical Paths
`production`	✅ Yes	✅ Yes	`src/`, `lib/`, `app/`
`script`	✅ Yes	✅ Yes	`scripts/`, `bin/`, `tools/`
`test`	❌ No	❌ Excluded	`tests/`, `__tests__/`, `spec/`
`config`	❌ No	❌ Excluded	`config/`, `.config.js`, `.toml`
`generated`	❌ No	❌ Excluded	`generated/`, `.gen.ts`, `.pb.go`
`vendor`	❌ No	❌ Excluded	`vendor/`, `node_modules/`, `third_party/`
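A path-based classifier along these lines can be sketched with pathlib. This sketch assumes directory-name and filename-suffix rules checked in a fixed order (first match wins) and a production default; the rule list, classify, and SCORED_ZONES are hypothetical, and the real patterns in engine/policy/zones_data.py (plus any zone set overrides) may differ.

```python
from pathlib import PurePosixPath

# (zone, directory names, filename suffixes), checked in order; first match wins.
ZONE_RULES = [
    ("vendor", {"vendor", "node_modules", "third_party"}, ()),
    ("generated", {"generated"}, (".gen.ts", ".pb.go")),
    ("test", {"tests", "__tests__", "spec"}, ()),
    ("config", {"config"}, (".config.js", ".toml")),
    ("script", {"scripts", "bin", "tools"}, ()),
]
SCORED_ZONES = {"production", "script"}


def classify(path: str) -> str:
    """Return the zone for a repo-relative POSIX path."""
    p = PurePosixPath(path)
    dirs = set(p.parts[:-1])
    for zone, dir_names, suffixes in ZONE_RULES:
        if dirs & dir_names or p.name.endswith(suffixes):
            return zone
    return "production"  # default, e.g. src/, lib/, app/
```

Checking vendor and generated first means a test file bundled inside node_modules/ is still excluded as vendor code rather than counted as a project test.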

Zone Commands

structorium zone show                    # show all zone classifications
structorium zone set src/scripts script  # classify path as script zone
structorium zone clear src/scripts       # remove zone override

Source: engine/policy/zones.py, engine/policy/zones_data.py

⚙️ CI Integration Playbook

GitHub Actions

name: Structorium Gate
on: [pull_request]

jobs:
  quality-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history for git diff
      
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install Structorium
        run: pip install "structorium[full]"

      - name: Run scan with CI profile
        run: structorium scan --path . --profile ci

      - name: Check new-code gate
        run: structorium status --gate-check
        # Exits non-zero if the gate fails; make this job a required status check so a failure blocks the merge

GitLab CI

structorium-gate:
  stage: test
  image: python:3.11
  script:
    - pip install "structorium[full]"
    - structorium scan --path . --profile ci
    - structorium status --gate-check
  allow_failure: false

Generic CI Script

#!/bin/bash
set -e
pip install "structorium[full]"
structorium scan --path . --profile ci
structorium status --gate-check
echo "Quality gate passed ✅"

PART 7 — SCORING MODEL

📊 Scoring Model Deep Dive

The 4 Score Types

Structorium doesn't give you one number and call it a day. It computes four progressively stricter score types, each designed to catch a different kind of dishonesty. They all look at the same findings — the difference is what counts as a "failure" in each mode:

Score	Counts as Failure	North Star?	Why It Exists
Overall	`open` only		The broadest, most forgiving view. Good for dashboards and high-level trending. But easy to game — just dismiss everything as `wontfix`.
Objective	`open` (mechanical dimensions only)		The mechanical-only lens. How clean is the code ignoring subjective assessment? Useful for comparing tooling quality, but misses 60% of what matters.
Strict ⭐	`open` + `wontfix`	⭐ Yes	The metric that matters. The only way to improve this score is to actually fix things or get better review scores. Mass `wontfix` dismissals — which inflate overall score — still hurt strict. This is the metric you track day-to-day, sprint-to-sprint, quarter-to-quarter.
Verified Strict	`open` + `wontfix` + `fixed` + `false_positive`		The maximum-paranoia metric. Even `fixed` and `false_positive` count as failures — only genuinely absent issues pass. Useful for audits and highest-confidence reporting. Too strict for daily tracking, but invaluable when you need absolute trust.

Why Strict Is the North Star

Let's be blunt about why the other scores are insufficient:

Overall can be trivially gamed by marking everything wontfix. Your score jumps from 45 to 85 overnight. Your code is exactly as terrible as it was yesterday. The number is a lie.
Objective ignores subjective quality entirely — which means it ignores 60% of what matters. A codebase that passes every lint rule but has incoherent architecture, brittle abstractions, and AI-generated spaghetti gets a high objective score. Useless.
Strict combines both halves AND penalizes wontfix. The only way to improve it: actually fix things, get better review scores, or prevent new regressions. There's no shortcut. There's no cheat code. That's why it's the north star.
Verified strict goes even further (penalizes fixed and false_positive too) — useful as an audit metric, but too strict for daily tracking because it requires zero uncertainty.

📐 Score Composition Formula

Top-Level Split

Overall Score = 40% × Mechanical Pool + 60% × Subjective Pool

This is defined in engine/_scoring/policy/core.py:

MECHANICAL_WEIGHT_FRACTION = 0.40
SUBJECTIVE_WEIGHT_FRACTION = 0.60
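To see how the 40/60 split combines with per-dimension weights, here is a worked sketch. It assumes each pool is a weighted arithmetic mean of 0-100 dimension scores, which this document does not state outright; MECHANICAL_DIMENSION_WEIGHTS is a hypothetical name for the weights in the Mechanical Pool table that follows, and weighted_mean and overall_score are illustrative helpers.

```python
MECHANICAL_WEIGHT_FRACTION = 0.40
SUBJECTIVE_WEIGHT_FRACTION = 0.60

# Hypothetical name; weights copied from the Mechanical Pool table (File health counts double).
MECHANICAL_DIMENSION_WEIGHTS = {
    "file health": 2.0, "code quality": 1.0, "duplication": 1.0,
    "test health": 1.0, "security": 1.0,
}


def weighted_mean(scores: dict[str, float], weights: dict[str, float]) -> float:
    total = sum(weights.values())
    return sum(weights[d] * scores[d] for d in weights) / total


def overall_score(mechanical_scores: dict[str, float], subjective_pool: float) -> float:
    """Overall = 40% x mechanical pool + 60% x subjective pool, rounded to 2 places."""
    mechanical_pool = weighted_mean(mechanical_scores, MECHANICAL_DIMENSION_WEIGHTS)
    return round(MECHANICAL_WEIGHT_FRACTION * mechanical_pool
                 + SUBJECTIVE_WEIGHT_FRACTION * subjective_pool, 2)
```

With every mechanical dimension at 90 and a subjective pool of 70, overall_score returns 78.0 (0.40 × 90 + 0.60 × 70 = 36 + 42). Dropping only File health to 60 pulls the mechanical pool to 80, whereas a weight-1 dimension at 60 would only pull it to 85.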

Mechanical Pool (40%)

The mechanical pool is computed from 5 dimensions, each with a weight:

Dimension	Weight	Detectors Assigned
File health	2.0	`structural`
Code quality	1.0	`unused`, `logs`, `exports`, `deprecated`, `props`, `smells`, `react`, `orphaned`, `naming`, `facade`, `patterns`, `single_use`, `coupling`, `dict_keys`, `flat_dirs`, `global_mutable_config`, `private_imports`, `layer_violation`, `stale_exclude`, `responsibility_cohesion`, `uncalled_functions`, `signature`
Duplication	1.0	`dupes`, `boilerplate_duplication`
Test health	1.0	`test_coverage`, `subjective_review`
Security	1.0	`security`, `cycles`

Why File Health gets 2× weight: Structural issues (god files, massive modules) are force-multipliers — they make every other problem worse. A 1000-line file with 3 issues is harder to fix than three 100-line files with 1 issue each.

Subjective Pool (60%)

The subjective pool is computed from 12 dimensions (see Subjective Dimensions Reference for full details), with weights from SUBJECTIVE_DIMENSION_WEIGHTS:

high elegance:     22.0    |    logic clarity:        6.0
mid elegance:      22.0    |    structure nav:         5.0
low elegance:      12.0    |    error consistency:     3.0
contracts:         12.0    |    naming quality:        2.0
type safety:       12.0    |    ai generated debt:     1.0
abstraction fit:    8.0    |    design coherence:     10.0

Tier Weighting

Within each dimension, findings are weighted by their tier:

Tier	Action Type	Weight	Meaning
T1	`auto_fix`	1×	Can be fixed automatically — lowest impact per finding
T2	`quick_fix`	2×	Quick manual fix — moderate impact
T3	`judgment`	3×	Requires design judgment — significant impact
T4	`major_refactor`	4×	Major refactoring needed — highest impact per finding

Confidence Weighting

Finding confidence affects how much weight it gets in the score:

Confidence	Weight	When
High	1.0	Full external tool support, deterministic detection
Medium	0.7	Heuristic detection or partial tool support
Low	0.3	Speculative or environment-dependent

MIN_SAMPLE Dampening

If a dimension has fewer than MIN_SAMPLE = 200 checks, its weight is proportionally reduced:

effective_weight = base_weight × min(actual_checks / MIN_SAMPLE, 1.0)

This prevents small-sample dimensions from swinging the overall score. A project with only 5 files shouldn't have its structural score dominate — the sample is too small for confidence.
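The tier, confidence, and sample-size adjustments can be written down directly. The dampening function follows the formula above exactly; multiplying the tier weight by the confidence weight is an assumption, since the document lists both factors but not how they combine. All names besides MIN_SAMPLE are hypothetical.

```python
TIER_WEIGHTS = {"T1": 1, "T2": 2, "T3": 3, "T4": 4}
CONFIDENCE_WEIGHTS = {"high": 1.0, "medium": 0.7, "low": 0.3}
MIN_SAMPLE = 200


def finding_weight(tier: str, confidence: str) -> float:
    """Assumed combination: tier multiplier scaled by detection confidence."""
    return TIER_WEIGHTS[tier] * CONFIDENCE_WEIGHTS[confidence]


def effective_weight(base_weight: float, actual_checks: int) -> float:
    """MIN_SAMPLE dampening: base_weight x min(actual_checks / MIN_SAMPLE, 1.0)."""
    return base_weight * min(actual_checks / MIN_SAMPLE, 1.0)
```

Under these assumptions a T4 high-confidence finding weighs 4.0, about 13 times a T1 low-confidence one (0.3), and a File health dimension backed by only 50 checks contributes a weight of 0.5 instead of 2.0.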

Source: engine/_scoring/policy/core.py (lines 127-159)

🛡️ Anti-Gaming Deep Dive

Score Gaming Detection

Structorium actively watches for gaming patterns:

Pattern	How It's Detected	What Happens
Mass wontfix	`wontfix` count increases while `fixed` stays flat	Strict score drops. Gap between overall and strict widens — visible in `status`.
Target matching	Subjective score lands within ±0.05 of a round target (e.g., 80.00)	Flagged as potential gaming. `SUBJECTIVE_TARGET_MATCH_TOLERANCE = 0.05`
Sudden detector drops	A detector goes from 40 findings to 0	Findings held in state (not auto-resolved). Warning emitted.
Partial review import	Some findings in review are invalid or skipped	Entire import aborted. No partial results accepted.
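The mass-wontfix pattern compares two consecutive scans. A minimal sketch of the rule as stated in the table; the Snapshot type and is_mass_wontfix are hypothetical, and a production check would likely add a minimum increase before flagging, which the document does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical per-scan status counts."""
    wontfix: int
    fixed: int


def is_mass_wontfix(before: Snapshot, after: Snapshot) -> bool:
    """Flag when wontfix grows while fixed stays flat between two scans."""
    return after.wontfix > before.wontfix and after.fixed == before.fixed
```

Whether or not the pattern is flagged, the strict score already absorbs the cost, because every new wontfix still counts as a failure there.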

Failure Statuses by Score Mode

The exact definitions from source code (engine/_scoring/policy/core.py line 183):

FAILURE_STATUSES_BY_MODE = {
    "lenient":         frozenset({"open"}),
    "strict":          frozenset({"open", "wontfix"}),
    "verified_strict": frozenset({"open", "wontfix", "fixed", "false_positive"}),
}
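Applying these sets to a list of finding statuses shows how the same findings produce different failure counts per mode. The dictionary is copied from above; failures_by_mode is a hypothetical helper, not a function from core.py.

```python
from collections import Counter

FAILURE_STATUSES_BY_MODE = {
    "lenient":         frozenset({"open"}),
    "strict":          frozenset({"open", "wontfix"}),
    "verified_strict": frozenset({"open", "wontfix", "fixed", "false_positive"}),
}


def failures_by_mode(statuses: list[str]) -> dict[str, int]:
    """Count how many findings fail under each score mode."""
    counts = Counter(statuses)
    return {mode: sum(counts[s] for s in failing)
            for mode, failing in FAILURE_STATUSES_BY_MODE.items()}
```

Marking 40 findings wontfix leaves the lenient failure count at 0 but the strict count at 40, which is exactly the divergence the suppression gap measures.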

The Suppression Gap

A useful diagnostic: compare your overall score (lenient mode) to your strict score:

Gap	What It Means
Overall ≈ Strict (< 2 points)	Healthy — very few wontfix dismissals
Overall > Strict by 5-10	Some debt is being dismissed — review wontfix decisions
Overall > Strict by 15+	Significant gaming — many issues marked wontfix instead of fixed

This gap is visible in structorium status and tracked across sessions.

Source: engine/_scoring/policy/core.py, intelligence/integrity.py

PART 8 — PLUGIN ARCHITECTURE

🔌 Plugin Architecture

Structorium supports 28 programming languages through a layered plugin architecture so clean that adding a new language can never break an existing one. This is not an accident — it's a hard architectural rule enforced by import direction constraints.

Architecture Layers

Layer	What It Contains	Key Rule
Core Engine (bottom)	Detectors, scoring, state management, work queue	Shared by all languages. Never imports from `languages/`.
Shared Framework (middle)	`languages/_framework/`: contracts, phase builders, tree-sitter integration, review data	Common infrastructure for all plugins. Handles generic analysis.
Language Plugins (top)	`languages/<name>/`: 6 full plugins + 22 generic plugins	Import from framework and engine. Plugins never import each other.

Import Direction Rule

✅ Plugin → Framework → Engine (allowed)
❌ Engine → Plugin (NEVER)
❌ Plugin → Plugin (NEVER)

This strict import direction ensures:

Adding a new language can never break existing ones
The engine is language-agnostic — it works with any plugin
Plugins are isolated — a Python bug can't crash the TypeScript scanner

📦 Full Plugin Contract

A full plugin (6 languages) provides deep, language-aware integration. The required package structure:

languages/<name>/
├── __init__.py          # @register_lang() — config, markers, extensions
├── commands.py          # Language-specific CLI commands
├── extractors.py        # Source code extractors (functions, imports, classes)
├── phases.py            # Detector phase definitions
├── move.py              # File relocation + import path rewriting
├── review.py            # Subjective review dimension definitions
├── test_coverage.py     # Test-to-source mapping logic
├── detectors/           # Custom language-specific detectors
├── fixers/              # Auto-fixer implementations
└── tests/               # Language-specific test suite

What Full Plugins Get (Beyond Generic)

Capability	Generic	Full
External linter wrapping	✅	✅
Security scanning	✅	✅
Subjective review	✅	✅ (custom dimensions)
Boilerplate detection	✅	✅
Zone classification	✅	✅
Scoring integration	✅	✅
Tree-sitter AST (if installed)	✅	✅
Custom smell detectors	❌	✅
Language-aware auto-fixers	❌	✅
Custom review dimensions	❌	✅
Framework-specific patterns	❌	✅ (e.g., React for TS, Flutter for Dart)
Move + import rewriting	❌	✅

🔧 Generic Plugin System

A generic plugin (22 languages) provides solid coverage with minimal code. Most generic plugins are a single __init__.py file calling the generic_lang() factory:

# languages/rust/__init__.py (simplified example)
from languages._framework.generic import generic_lang

config = generic_lang(
    name="rust",
    extensions=[".rs"],
    root_markers=["Cargo.toml"],
    tools=[
        {"name": "cargo clippy", "cmd": ["cargo", "clippy", "--message-format=json"]},
        {"name": "cargo check",  "cmd": ["cargo", "check",  "--message-format=json"]},
    ],
    treesitter_lang="rust",
)

What Tree-sitter Adds to Generic Plugins

When pip install structorium[treesitter] is installed, generic plugins gain AST-powered analysis:

Capability	Without Tree-sitter	With Tree-sitter
Function extraction	❌	✅ Names, ranges, complexity scores
Import parsing	❌	✅ Import statements with source resolution
Complexity metrics	❌	✅ Cyclomatic complexity per function
God class detection	❌	✅ Large classes with many methods
Unused import detection	❌	✅ Cross-reference imports vs usage
AST smell detection	❌	✅ Language-generic code smells
Cohesion analysis	❌	✅ Module cohesion metrics

Upgrade Path

Generic plugins can be upgraded incrementally:

Generic → Basic linter wrapping + optional tree-sitter
Extended-in-place → Add custom detectors or fixers to the generic plugin
Full plugin → Scaffold with structorium dev scaffold-lang <name> and implement the full contract

🧩 Shared Framework Deep Dive

The languages/_framework/ directory provides the shared infrastructure that powers all plugins:

languages/_framework/
├── generic.py           # generic_lang() factory for single-file plugins
├── base/                # Core contracts and shared phase builders
│   ├── contracts.py     # LangConfig, LangRun protocol definitions
│   ├── phase_builders.py # Shared detector phase implementations
│   ├── structural.py    # Structural analysis (file size, complexity)
│   └── shared_phases.py # Phases available to all plugins
├── treesitter/          # Optional tree-sitter integration
│   ├── specs.py         # Language spec definitions
│   ├── extractors.py    # AST-based code extraction
│   ├── imports.py       # Import statement parsing
│   ├── complexity.py    # Cyclomatic complexity computation
│   ├── smells.py        # AST-based smell detection
│   ├── cohesion.py      # Module cohesion analysis
│   └── unused_imports.py # Cross-reference import usage
├── runtime.py           # LangRun per-invocation mutable state
├── resolution.py        # Language detection and resolution
├── discovery.py         # Plugin auto-discovery via importlib
├── commands_base.py     # Shared detect-command factories
└── review_data/         # Shared review dimension JSON payloads

Key Contracts

LangConfig: Static configuration for a language plugin — extensions, markers, tools, capabilities. Set once at registration. Never mutated.
LangRun: Per-invocation mutable state — file lists, findings, scores. Created fresh for each scan. Discarded after.
Phase protocol: Each detector phase is a function (LangRun) → list[Finding]. Phases can be composed, filtered, and ordered.
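Because a phase is just a function from LangRun to list[Finding], phases compose by running them in order and accumulating results. The sketch uses deliberately minimal stand-ins: the LangRun fields, the Finding shape, and the large_files and debug_prints phases are hypothetical and much simpler than the real contracts in languages/_framework/base/contracts.py.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    file: str
    detector: str
    message: str


@dataclass
class LangRun:
    """Hypothetical per-invocation state: path -> source text, plus accumulated findings."""
    files: dict[str, str]
    findings: list[Finding] = field(default_factory=list)


Phase = Callable[[LangRun], list[Finding]]


def large_files(max_lines: int) -> Phase:
    """Phase factory: flag files longer than max_lines."""
    def phase(run: LangRun) -> list[Finding]:
        return [Finding(path, "structural", f"{len(src.splitlines())} lines")
                for path, src in run.files.items()
                if len(src.splitlines()) > max_lines]
    return phase


def debug_prints(run: LangRun) -> list[Finding]:
    """Phase: flag files that contain a print( call."""
    return [Finding(path, "logs", "print() call")
            for path, src in run.files.items() if "print(" in src]


def run_phases(run: LangRun, phases: list[Phase]) -> list[Finding]:
    """Run phases in order and accumulate their findings on the run."""
    for phase in phases:
        run.findings.extend(phase(run))
    return run.findings
```

Since each phase only reads the run and returns findings, filtering or reordering the phase list is enough to change what a scan does.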

🌍 Language Coverage Atlas

Full Plugins (6 Languages — Deep Integration)

Language	External Tools	Custom Detectors	Auto-Fixers	Move/Rewrite	Review Dimensions
🐍 Python	ruff, bandit, import-linter	AST smells, dict keys, security patterns	`unused-imports`, `unused-vars`, `unused-params`, `debug-logs`	✅ Import rewriting	✅ Custom
📘 TypeScript	knip, biome	React patterns, props, exports, concerns	`unused-imports`, `unused-vars`, `unused-params`, `debug-logs`, `dead-useeffect`, `empty-if-chain`	✅ Import rewriting	✅ Custom
🔷 C# / .NET	dotnet analyzers	Structural, coupling	—	✅ Using directives	✅ Custom
🎯 Dart	dart analyze, flutter test	Flutter patterns	—	✅ Package imports	✅ Custom
🎮 GDScript	gdtoolkit	Godot scene-aware	—	✅ Preload/load	✅ Custom
🐹 Go	golangci-lint, go vet	—	—	✅ Package imports	✅ Custom

Generic Plugins (22 Languages — Linter Wrappers + Tree-sitter)

Language	External Tools	Tree-sitter
🦀 Rust	cargo clippy, cargo check	✅
💎 Ruby	rubocop	✅
☕ Java	checkstyle, pmd	✅
🟣 Kotlin	ktlint, detekt	✅
🍎 Swift	swiftlint	✅
🟨 JavaScript	(via TypeScript plugin)	✅
🐘 PHP	phpstan, psalm	✅
🔴 Scala	scalafmt, scalafix	✅
💧 Elixir	credo	✅
λ Haskell	hlint	✅
🌙 Lua	luacheck	✅
🐪 Perl	perlcritic	✅
📊 R	lintr	✅
⚡ C/C++	clang-tidy, cppcheck	✅
🔷 F#	fantomas	✅
🐫 OCaml	—	✅
👑 Nim	—	✅
⚡ Zig	—	✅
🟢 Clojure	clj-kondo	✅
📡 Erlang	elvis	✅
🐚 Bash	shellcheck	✅
💠 PowerShell	PSScriptAnalyzer	✅

What Every Plugin Gets (Full or Generic)

Regardless of plugin tier, every language gets:

✅ Security scanning
✅ Subjective AI review (12 dimensions)
✅ Boilerplate duplication detection
✅ Zone classification
✅ Scoring integration (4 score types)
✅ Priority queue ranking
✅ State persistence

Adding a New Language

# Scaffold a new full plugin
structorium dev scaffold-lang <name> --extension .ext --marker <root-file>

# Or create a minimal generic plugin
# Create: languages/<name>/__init__.py
# Use the generic_lang() factory (see Generic Plugin System above)

Source: languages/_framework/, languages/__init__.py, languages/README.md

PART 9 — COMPETITIVE POSITIONING

🏆 Structorium vs Traditional Linters

Capability	Structorium	ESLint / Ruff / RuboCop
Purpose	Codebase quality operating system	Rule-based code linting
State	Persistent — findings survive across sessions	Stateless — each run starts fresh
Multi-language	28 languages via plugin system	1 language per tool
Ranking	Tier-weighted priority queue	Flat list (all warnings equal)
Review	12-dimension AI subjective review (60% of score)	None — rules only
Scoring	4 score types with anti-gaming	Pass/fail on rule count
CI gate	New-code gate (line-scoped, policy profiles)	Block on any violation
Architecture	Detects coupling, god files, layer violations, cycles	Syntax/style rules only
Fix tracking	Resolution status (fixed/wontfix/false_positive) with attestation	Not applicable
Agent support	Skill system for 7 AI agents	No agent integration

Summary: Linters are components of Structorium — it wraps ruff, bandit, knip, rubocop, etc. and adds state, ranking, scoring, and enforcement on top.

🏢 Structorium vs Enterprise SAST

Capability	Structorium	SonarQube	GitHub CodeQL
Deployment	CLI, zero infrastructure	Server (database, web server, Compute Engine, bundled Elasticsearch)	GitHub-hosted or self-hosted runner
Cost	Open source (MIT)	Community (free) / Enterprise ($$$)	Free for public repos / Advanced Security (paid)
Architecture quality	Coupling, god files, layer violations, cycles, facades, design coherence	Basic duplications, complexity	Security-focused queries
Subjective review	✅ 12 dimensions, 60% of score	❌	❌
Anti-gaming	✅ wontfix penalty, target detection, fail-closed import, suspect drops	⚡ Basic (quality gate thresholds)	❌
Persistent state	✅ File-based, no server needed	✅ Server-based database	❌ No state between runs
Ranked queue	✅ Tier-weighted, confidence-adjusted	❌ Issues sorted by severity	❌
Agent-first	✅ Skill system, CLI-native	❌ Web UI focused	❌
Language count	28 (6 full + 22 generic)	30+	~15
New-code gate	✅ Line-scoped with policy profiles	✅ Quality Gate (metric-based)	✅ PR check annotations
File move + import rewrite	✅ File relocation + import rewriting	❌	❌
Setup time	`pip install structorium` (30 seconds)	30-60 minutes (server setup)	10-30 minutes (workflow config)

Summary: SonarQube is a server — good for enterprises with ops capacity. CodeQL is security-focused. Structorium is a CLI-native operating system designed for solo developers, small teams, and AI agents.

🤖 Structorium vs AI Code Review Tools

Capability	Structorium	DeepSource	Snyk Code	CodeRabbit
Deployment	CLI (zero infra)	Cloud-hosted	Cloud-hosted	GitHub App
State	✅ Persistent file-based	✅ Cloud database	❌	❌
Subjective review	✅ 12 structured dimensions	❌ Rule-based	❌ Rule-based	✅ Unstructured LLM review
Anti-gaming	✅ 5 mechanisms	❌	❌	❌
Ranked queue	✅	❌	❌	❌
Structured scoring	✅ 4 types, formula-driven	⚡ Basic metrics	❌	❌
Review memory	✅ Incremental, session-aware	❌	❌	❌
Fail-closed import	✅ Invalid reviews rejected entirely	N/A	N/A	N/A
Privacy	✅ Local-first (AI context is opt-in)	❌ Cloud analysis	❌ Cloud analysis	❌ Cloud analysis
Agent skill system	✅ 7 agents supported	❌	❌	❌

Summary: DeepSource and Snyk are cloud analyzers — they see your code. CodeRabbit does unstructured LLM review. Structorium does structured, dimension-scoped, fail-closed review with full state persistence and anti-gaming. And it runs locally.

🧹 Structorium vs AI Slop Catchers

Capability	Structorium	Grain	Sloppylint	QodoAI
Scope	Full codebase quality OS	AI-generated code detection	AI slop detection	AI code review
Detectors	30+ (mechanical + subjective)	Focused on AI patterns	Focused on AI patterns	LLM-based review
State persistence	✅	❌	❌	❌
Multi-language	28	Limited	Limited	Limited
Scoring	4 types with anti-gaming	Basic	Basic	LLM-based
AI debt detection	✅ "ai generated debt" dimension (weight 1.0)	✅ Primary focus	✅ Primary focus	⚡
CI enforcement	✅ New-code gate	⚡	⚡	⚡
Non-AI quality	✅ Full architecture analysis	❌ AI-only	❌ AI-only	⚡

Summary: Slop catchers handle one problem (detecting AI-generated code). Structorium handles all quality problems — including AI debt as one of 12 subjective dimensions.

🧠 AI Model Comparison Matrix

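Structorium's review system works with multiple AI providers. The star ratings below (more stars is better) compare how models perform on selected subjective dimensions; treat them as indicative, since results vary by codebase: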

Dimension	GPT-4o	Claude 3.5 Sonnet	Codex	Gemini 2.5 Pro
High elegance	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
Contracts	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
Type safety	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
Design coherence	⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
AI debt detection	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐

Note: Model quality varies by codebase. Structorium's fail-closed import validation catches low-quality reviews regardless of model. Use --parallel with batch splitting to reduce variance.

PART 10 — INTERNALS & REFERENCE

📖 Detector Registry — Complete Reference

Every detector in Structorium is registered in core/registry.py. This is the canonical source:

Auto-Fix Detectors (T1)

Detector	Display	Dimension	Fixers	Guidance
`unused`	unused	Code quality	`unused-imports`, `unused-vars`, `unused-params`	Remove unused imports and variables
`logs`	logs	Code quality	`debug-logs`	Remove debug logs
`smells`	smells	Code quality	`dead-useeffect`, `empty-if-chain`	Fix code smells

Reorganize Detectors (T2 — use `move`)

Detector	Display	Dimension	Tool	Guidance
`orphaned`	orphaned	Code quality	`move`	Delete dead files or relocate
`flat_dirs`	flat dirs	Code quality	`move`	Create subdirectories
`naming`	naming	Code quality	`move`	Rename files to fix conventions
`single_use`	single_use	Code quality	`move`	Inline or relocate
`coupling`	coupling	Code quality	`move`	Fix boundary violations
`cycles`	cycles	Security	`move`	Break circular dependencies
`facade`	facade	Code quality	`move`	Flatten re-export facades

Refactor Detectors (T3)

Detector	Display	Dimension	Judgment?	Guidance
`structural`	structural	File health	✅	Decompose large files
`props`	props	Code quality	✅	Split bloated components
`react`	react	Code quality	✅	Refactor React antipatterns
`dupes`	dupes	Duplication	✅	Extract shared utility
`patterns`	patterns	Code quality	✅	Align to single pattern
`dict_keys`	dict keys	Code quality	✅	Fix dict key mismatches
`test_coverage`	test coverage	Test health		Add tests for untested modules
`signature`	signature	Code quality	✅	Consolidate inconsistent signatures
`responsibility_cohesion`	responsibility cohesion	Code quality	✅	Split modules with too many responsibilities
`boilerplate_duplication`	boilerplate duplication	Duplication	✅	Extract shared boilerplate
`uncalled_functions`	uncalled functions	Code quality	✅	Remove dead functions
`concerns`	design concerns	Design coherence		Address design concerns from review

Manual Fix Detectors (T4)

Detector	Display	Dimension	Guidance
`exports`	exports	Code quality	Run `knip --fix` to remove dead exports
`deprecated`	deprecated	Code quality	Remove deprecated symbols or migrate callers
`stale_exclude`	stale exclude	Code quality	Remove stale exclusion or verify it's still needed
`global_mutable_config`	global mutable config	Code quality	Refactor module-level mutable state
`private_imports`	private imports	Code quality	Stop importing private symbols across boundaries
`layer_violation`	layer violation	Code quality	Fix architectural layer violations
`security`	security	Security	Review and fix security findings
`stale_wontfix`	stale wontfix	Code quality	Re-evaluate old wontfix decisions

Review Detectors

Detector	Display	Dimension	Guidance
`review`	design review	Test health	Address design quality findings from AI review
`subjective_review`	subjective review	Test health	Run `structorium review --prepare`

Source: core/registry.py (lines 59-321)
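To make the tables concrete, here is a minimal sketch of how a registry entry could be modeled in Python. The DetectorMeta dataclass, its field names, and the Tier member names are assumptions chosen to mirror the table columns (core/enums.py does define a Tier enum, but its members are not documented here); only the four sample entries are copied from the tables above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical tier enum; the values match the T1-T4 labels above."""
    AUTO_FIX = 1
    REORGANIZE = 2
    REFACTOR = 3
    MANUAL_FIX = 4


@dataclass(frozen=True)
class DetectorMeta:
    """Hypothetical registry entry mirroring the table columns."""
    name: str
    display: str
    dimension: str
    tier: Tier
    guidance: str
    fixers: tuple[str, ...] = ()
    needs_judgment: bool = False


# One sample detector per tier, copied from the tables above.
REGISTRY: dict[str, DetectorMeta] = {
    d.name: d
    for d in (
        DetectorMeta("unused", "unused", "Code quality", Tier.AUTO_FIX,
                     "Remove unused imports and variables",
                     fixers=("unused-imports", "unused-vars", "unused-params")),
        DetectorMeta("cycles", "cycles", "Security", Tier.REORGANIZE,
                     "Break circular dependencies"),
        DetectorMeta("dupes", "dupes", "Duplication", Tier.REFACTOR,
                     "Extract shared utility", needs_judgment=True),
        DetectorMeta("security", "security", "Security", Tier.MANUAL_FIX,
                     "Review and fix security findings"),
    )
}


def detectors_for_dimension(dimension: str) -> list[str]:
    """Sorted names of registered detectors that feed one scoring dimension."""
    return sorted(d.name for d in REGISTRY.values() if d.dimension == dimension)
```

Deriving per-dimension or per-tier views from one dictionary, as detectors_for_dimension does, is the pattern the contributor rule against ad-hoc detector lists points toward.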

🗺️ Command Atlas

Structorium provides 17 commands. The tables below group the main ones by workflow mode:

Workflow Commands (Day-to-Day)

Command	What It Does	Key Flags
`structorium scan`	Run all detectors, merge findings, compute scores	`--path`, `--profile`, `--lang`, `--skip-slow`
`structorium status`	Score dashboard, dimension health, finding summary	`--gate-check`, `--json`
`structorium next`	Surface highest-priority finding	`--count`, `--tier`, `--cluster`, `--explain`
`structorium fix`	Run auto-fixers	`<fixer-name>`, `--dry`

Investigation Commands

Command	What It Does	Key Flags
`structorium show`	Show findings for a specific file	`<filepath>`
`structorium tree`	Annotated codebase tree with finding counts	`--depth`, `--zone`
`structorium diff`	Show score/finding changes between states	`--before`, `--after`
`structorium langs`	List all detected language plugins	—

Review Commands

Command	What It Does	Key Flags
`structorium review --prepare`	Generate review query packet	`--retrospective`, `--files`
`structorium review --run-batches`	Execute review with AI runner	`--runner`, `--parallel`
`structorium review --import`	Import review results (fail-closed)	`<filepath>`
`structorium review --external-start`	Start external review session	`--external-runner`

Planning Commands

Command	What It Does	Subcommands / Args
`structorium plan`	Full prioritized work queue	`queue`, `done`, `move`, `defer`, `skip`, `reopen`
`structorium plan cluster`	Cluster operations	`create`, `delete`, `list`
`structorium plan focus/unfocus`	Scope `next` to a cluster	`<cluster-name>`

Admin Commands

Command	What It Does	Key Args / Flags
`structorium config`	View/set configuration	`set`, `get`, `list`
`structorium update-skill`	Install/update agent skill document	`<agent-name>`
`structorium move`	Relocate file + rewrite imports	`<source>`, `<destination>`
`structorium zone`	Manage zone classifications	`show`, `set`, `clear`
`structorium scorecard`	Generate quality badge image	`--output`

Source: app/cli_support/parser.py

🗂️ Repository Structure

structorium/
├── app/                           # Application layer — CLI entry point
│   ├── cli_support/               # Parser, output formatting, terminal
│   │   ├── parser.py              # Argument parser (17 commands)
│   │   └── output.py              # Terminal output formatting
│   ├── commands/                   # Command implementations
│   │   ├── scan/                   # Scan workflow and reporting
│   │   ├── plan/                   # Plan operations (queue, clusters)
│   │   ├── review/                 # Review preparation, batching, import
│   │   └── ...                     # next, show, tree, fix, move, etc.
│   └── main.py                    # CLI entry point
├── core/                           # Shared core — enums, registry, types
│   ├── registry.py                 # Canonical detector registry (30+ detectors)
│   ├── enums.py                    # Tier, Confidence, ScoreMode enums
│   └── types.py                    # Shared type definitions
├── engine/                         # Engine — scoring, state, detection
│   ├── _scoring/                   # Score computation
│   │   ├── policy/                 # Scoring policies, weights, dimensions
│   │   │   └── core.py             # THE canonical scoring config
│   │   └── compute.py              # Score calculation logic
│   ├── _work_queue/                # Priority queue and ranking
│   ├── planning/                   # Plan operations and cluster management
│   ├── policy/                     # Zone policies
│   └── detection/                  # Detector execution and orchestration
├── intelligence/                   # AI layer — context, gate, integrity
│   ├── ai/                         # AI context enrichment providers
│   ├── new_code_gate.py            # CI new-code gate evaluation
│   └── integrity.py                # Anti-gaming integrity checks
├── languages/                      # Language plugins (28 languages)
│   ├── _framework/                 # Shared framework
│   │   ├── base/                   # Core contracts and phase builders
│   │   ├── treesitter/             # Optional tree-sitter integration
│   │   └── generic.py              # generic_lang() factory
│   ├── python/                     # Full plugin — ruff, bandit, 6 fixers
│   ├── typescript/                 # Full plugin — knip, biome, 7 fixers
│   ├── csharp/                     # Full plugin — dotnet analyzers
│   ├── dart/                       # Full plugin — dart analyze, flutter
│   ├── gdscript/                   # Full plugin — gdtoolkit
│   ├── go/                         # Full plugin — golangci-lint
│   ├── rust/                       # Generic plugin — cargo clippy
│   ├── ruby/                       # Generic plugin — rubocop
│   ├── java/                       # Generic plugin — checkstyle, pmd
│   └── ...                         # +19 more generic plugins
├── skills/                         # Agent skill documents (7 agents)
├── .structorium/                   # State directory (created on first scan)
│   ├── state.json                  # Persistent findings and scores
│   ├── reviews/                    # Review packets and results
│   └── config.toml                 # Project configuration
└── docs/                           # Documentation

⚙️ Configuration Reference

Configuration is stored in .structorium/config.toml and managed via structorium config:

Key	Default	What It Controls
`new_code_gate_enabled`	`false`	Enable/disable CI gate
`new_code_gate_policy`	`standard`	Policy profile (strict/standard/ai_generated_code)
`new_code_gate_base_ref`	`origin/main`	Git base ref for diff
`new_code_gate_max_new_findings`	(policy default)	Max new findings before gate fails
`new_code_gate_max_new_high`	`0`	Max new T3+ findings (T3 and T4)
`new_code_gate_max_new_critical`	`0`	Max new T4 findings
`ai_embedding_model`	`text-embedding-3-large`	OpenAI embedding model
`ai_reranker_model`	`rerank-v4.0`	Cohere reranker model
`exclude_patterns`	`[]`	Path exclusion patterns

💾 State Schema Deep Dive

The .structorium/state.json file is the persistent state file. The excerpt below shows its key fields. dimension_scores holds one entry per dimension (17 in total; two are shown), findings holds one entry per tracked finding (one is shown), and each finding's status is one of open, fixed, wontfix, or false_positive.

{
  "version": "1.0",
  "last_scan": "2025-03-25T12:00:00Z",
  "scores": {
    "overall": 72.3,
    "objective": 78.1,
    "strict": 64.8,
    "verified_strict": 61.2
  },
  "dimension_scores": {
    "file_health": { "score": 82.0, "checks": 450, "failures": 81 },
    "code_quality": { "score": 54.0, "checks": 1200, "failures": 552 }
  },
  "findings": [
    {
      "id": "unused::src/api/routes.ts::React",
      "detector": "unused",
      "file": "src/api/routes.ts",
      "line": 1,
      "detail": "unused import: React",
      "tier": 1,
      "confidence": "high",
      "status": "open",
      "first_seen": "2025-03-20T10:00:00Z",
      "last_seen": "2025-03-25T12:00:00Z",
      "note": null,
      "attestation": null
    }
  ]
}
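Because the state file is plain JSON, it can be inspected without Structorium. The sketch below (summarize_state is a hypothetical helper) counts findings by status and checks that each dimension score equals (1 - failures / checks) × 100, which holds for both dimensions in the excerpt above if failures is read as the weighted failure total described in the Formula Appendix.

```python
import json
from collections import Counter


def summarize_state(text: str) -> tuple[Counter, dict[str, bool]]:
    """Count findings by status and re-derive each dimension score."""
    state = json.loads(text)
    by_status = Counter(f["status"] for f in state.get("findings", []))
    consistent: dict[str, bool] = {}
    for name, dim in state.get("dimension_scores", {}).items():
        if not dim["checks"]:
            continue  # no checks ran, nothing to compare
        # score = (1 - failures / checks) * 100, compared at one decimal place
        expected = round((1 - dim["failures"] / dim["checks"]) * 100, 1)
        consistent[name] = expected == dim["score"]
    return by_status, consistent


EXCERPT = """{
  "dimension_scores": {
    "file_health": {"score": 82.0, "checks": 450, "failures": 81},
    "code_quality": {"score": 54.0, "checks": 1200, "failures": 552}
  },
  "findings": [{"id": "unused::src/api/routes.ts::React", "status": "open"}]
}"""

counts, checks_ok = summarize_state(EXCERPT)
```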

State Merge Semantics

On each scan, state merging follows these rules:

New findings (not in state) → added with status open
Existing findings (still detected) → last_seen updated, status preserved
Gone findings (not detected, status = open) → resolved automatically
Resolved findings (status ≠ open) → preserved regardless of scan results
Suspect drops (detector went from many findings to 0) → held, not auto-resolved
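The five rules can be expressed as a single merge function. This is a minimal sketch, not Structorium's state manager: the Finding fields are trimmed to what the rules need, and two details the list leaves open are assumptions, namely the status given to auto-resolved findings (auto_resolved here, since the documented statuses include none for scan-confirmed resolution) and what counts as "many" for a suspect drop (SUSPECT_MIN_PREVIOUS = 5).

```python
from dataclasses import dataclass, replace

SUSPECT_MIN_PREVIOUS = 5                 # hypothetical "many findings" threshold
AUTO_RESOLVED_STATUS = "auto_resolved"   # hypothetical status name


@dataclass
class Finding:
    id: str
    detector: str
    status: str = "open"
    last_seen: str = ""


def merge(state: dict[str, Finding], detected: list[Finding],
          scan_time: str) -> dict[str, Finding]:
    """Return a new state dict built from one scan; the input is not mutated."""
    merged = {fid: replace(f) for fid, f in state.items()}
    detected_ids = {f.id for f in detected}

    prev_open: dict[str, int] = {}
    for f in state.values():
        if f.status == "open":
            prev_open[f.detector] = prev_open.get(f.detector, 0) + 1
    now_detectors = {f.detector for f in detected}
    # Rule 5: a detector that went from many open findings to zero is suspect.
    suspect = {d for d, n in prev_open.items()
               if n >= SUSPECT_MIN_PREVIOUS and d not in now_detectors}

    for f in detected:
        if f.id in merged:
            merged[f.id].last_seen = scan_time  # Rule 2: refresh, keep status
        else:
            merged[f.id] = Finding(f.id, f.detector, "open", scan_time)  # Rule 1

    for fid, f in merged.items():
        if fid in detected_ids or f.status != "open":
            continue  # still detected, or Rule 4: resolved findings are kept
        if f.detector in suspect:
            continue  # Rule 5: held open, not auto-resolved
        f.status = AUTO_RESOLVED_STATUS  # Rule 3: gone while open
    return merged
```

Holding suspect drops open is the conservative choice: if a detector crashes or its external tool goes missing, the scan reports zero findings, and auto-resolving them would inflate the score.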

PART 11 — FORMULA APPENDIX

📊 Score Composition Formulas

Dimension Pass Rate

For each dimension, the pass rate is computed as:

pass_rate = 1 - (weighted_failures / potential)

weighted_failures = Σ (tier_weight × confidence_weight) for each failing finding
potential = total_checks in that dimension

For file-based detectors (smells, dict_keys, test_coverage, security, subjective_review):

weighted_failure_per_file = min(Σ weighted failures of the file's findings, 1.0)
# Capped at 1.0 per file to match file-based denominator
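A minimal Python sketch of the dimension pass rate, including the per-file cap for file-based detectors. The tier and confidence weights are placeholders (the real values live in engine/_scoring/policy/core.py and are not listed here), and clamping the result at 0 is this sketch's own choice.

```python
from collections import defaultdict

# Placeholder weights; the real values live in engine/_scoring/policy/core.py.
TIER_WEIGHT = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
CONFIDENCE_WEIGHT = {"high": 1.0, "medium": 0.7, "low": 0.3}


def finding_weight(f: dict) -> float:
    """tier_weight x confidence_weight for one failing finding."""
    return TIER_WEIGHT[f["tier"]] * CONFIDENCE_WEIGHT[f["confidence"]]


def pass_rate(failing: list[dict], potential: int, file_based: bool = False) -> float:
    """1 - weighted_failures / potential for one dimension."""
    if potential == 0:
        return 1.0  # no checks ran, so nothing failed
    if file_based:
        # Sum weights per file, then cap each file's contribution at 1.0.
        per_file: dict[str, float] = defaultdict(float)
        for f in failing:
            per_file[f["file"]] += finding_weight(f)
        weighted = sum(min(w, 1.0) for w in per_file.values())
    else:
        weighted = sum(finding_weight(f) for f in failing)
    return max(0.0, 1.0 - weighted / potential)  # clamp is this sketch's choice
```

With two findings in a.py (weights 1.0 and 2.0) and one low-confidence T1 finding in b.py (weight 0.3), a file-based dimension with 10 checks counts a.py as 1.0 rather than 3.0, giving 1 - 1.3 / 10 = 0.87 instead of 0.67.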

Pool Score

mechanical_pool = Σ (dimension_pass_rate × effective_weight) / Σ effective_weight
    where effective_weight = base_weight × min(checks / MIN_SAMPLE, 1.0)
    for each mechanical dimension

subjective_pool = Σ (dimension_score × weight) / Σ weight
    for each subjective dimension

Overall Score

overall = (MECHANICAL_WEIGHT_FRACTION × mechanical_pool
         + SUBJECTIVE_WEIGHT_FRACTION × subjective_pool) × 100
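The pool and overall formulas translate directly into code. In this sketch the 0.4/0.6 fractions follow the documented 40/60 mechanical/subjective split, subjective dimension scores are taken on a 0 to 1 scale so that the final × 100 yields a percentage, and MIN_SAMPLE = 200 is a hypothetical value.

```python
MECHANICAL_WEIGHT_FRACTION = 0.4   # 40/60 mechanical/subjective split
SUBJECTIVE_WEIGHT_FRACTION = 0.6
MIN_SAMPLE = 200                   # hypothetical minimum sample size


def mechanical_pool(dims: list[tuple[float, float, int]]) -> float:
    """dims: (pass_rate, base_weight, checks); thin samples get less weight."""
    num = den = 0.0
    for rate, base_weight, checks in dims:
        effective = base_weight * min(checks / MIN_SAMPLE, 1.0)
        num += rate * effective
        den += effective
    return num / den if den else 0.0


def subjective_pool(dims: list[tuple[float, float]]) -> float:
    """dims: (dimension_score in 0..1, weight)."""
    total = sum(w for _, w in dims)
    return sum(s * w for s, w in dims) / total if total else 0.0


def overall(mech: float, subj: float) -> float:
    return (MECHANICAL_WEIGHT_FRACTION * mech
            + SUBJECTIVE_WEIGHT_FRACTION * subj) * 100


mech = mechanical_pool([(0.82, 1.0, 450), (0.54, 1.0, 1200), (0.90, 1.0, 50)])
```

In this example the third dimension has only 50 checks, so it carries a quarter of its base weight (50 / 200) and cannot swing the pool as much as the two well-sampled dimensions.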

Score Mode Selection

# What counts as "failure" depends on the mode:
lenient:         {"open"}
strict:          {"open", "wontfix"}
verified_strict: {"open", "wontfix", "fixed", "false_positive"}
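Each mode's failure set contains the previous one, so for the same findings and weights the number of failures can only stay equal or grow as the mode gets stricter; that is why lenient ≥ strict ≥ verified_strict. A minimal sketch of the selection (the failing helper is hypothetical):

```python
FAILURE_STATUSES = {
    "lenient": frozenset({"open"}),
    "strict": frozenset({"open", "wontfix"}),
    "verified_strict": frozenset({"open", "wontfix", "fixed", "false_positive"}),
}


def failing(findings: list[dict], mode: str) -> list[dict]:
    """Findings that count as failures under the given score mode."""
    statuses = FAILURE_STATUSES[mode]
    return [f for f in findings if f["status"] in statuses]


# One finding per documented status.
findings = [{"status": s} for s in ("open", "wontfix", "fixed", "false_positive")]
counts = {mode: len(failing(findings, mode)) for mode in FAILURE_STATUSES}
```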

📈 Detection Pass Rate

LOC-Weighted Detectors

For test_coverage, findings are weighted by source lines of code:

loc_weight = file_loc / total_project_loc
weighted_failure = loc_weight × finding_weight

This ensures that a missing test for a 500-line module hurts the score more than a missing test for a 20-line module.
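Plugging the 500-line versus 20-line example into the formula shows the size of the effect: in a hypothetical 1,000-line project, the larger module's missing test weighs 25 times as much. The helper name is illustrative, and finding_weight defaults to 1.0 here as an assumption.

```python
def loc_weighted_failure(untested: list[str], file_loc: dict[str, int],
                         finding_weight: float = 1.0) -> float:
    """Sum of loc_weight * finding_weight over modules missing tests."""
    total_loc = sum(file_loc.values())
    if total_loc == 0:
        return 0.0
    return sum(file_loc[path] / total_loc * finding_weight for path in untested)


# A 1,000-line project containing one 500-line and one 20-line module.
loc = {"big.py": 500, "small.py": 20, "rest.py": 480}
big = loc_weighted_failure(["big.py"], loc)      # 500 / 1000
small = loc_weighted_failure(["small.py"], loc)  # 20 / 1000
```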

Zone Exclusions

Security findings in excluded zones (test, config, generated, vendor) are not counted in the score:

SECURITY_EXCLUDED_ZONES = frozenset({"test", "config", "generated", "vendor"})
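The exclusion acts as a filter before scoring: findings in those zones are presumably still reported, but they do not reduce the security pass rate. In this sketch the path-prefix rules and the "production" zone name are hypothetical stand-ins for Structorium's real zone classification, which is managed with structorium zone.

```python
SECURITY_EXCLUDED_ZONES = frozenset({"test", "config", "generated", "vendor"})

# Hypothetical path-prefix zone rules; the real classifier may differ.
ZONE_PREFIXES = {"tests/": "test", "vendor/": "vendor", "gen/": "generated"}


def zone_of(path: str) -> str:
    for prefix, zone in ZONE_PREFIXES.items():
        if path.startswith(prefix):
            return zone
    return "production"


def scored_security_findings(findings: list[dict]) -> list[dict]:
    """Security findings that still count toward the score."""
    return [f for f in findings if zone_of(f["file"]) not in SECURITY_EXCLUDED_ZONES]
```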

🚦 New-Code Gate Threshold Logic

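# Simplified sketch of the logic in intelligence/new_code_gate.py
# (GateResult is illustrative; the module's real return type may differ)
from enum import Enum


class GateResult(Enum):
    PASS = "pass"
    FAIL = "fail"


def evaluate_gate(findings, changed_ranges, policy):
    # changed_ranges: file path -> set of changed line numbers
    in_scope = [f for f in findings
                if f.line in changed_ranges.get(f.file, ())]

    new_count = len(in_scope)
    new_high = sum(1 for f in in_scope if f.tier >= 3)      # T3 and T4
    new_critical = sum(1 for f in in_scope if f.tier >= 4)  # T4 only

    # Any finding from a detector the policy blocks fails the gate outright
    blocked_hits = [f for f in in_scope
                    if f.detector in policy.blocked_detectors]

    if new_count > policy.max_new_findings:
        return GateResult.FAIL
    if new_high > policy.max_new_high:
        return GateResult.FAIL
    if new_critical > policy.max_new_critical:
        return GateResult.FAIL
    if blocked_hits:
        return GateResult.FAIL
    return GateResult.PASS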
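The gate needs changed_ranges, a map from each changed file to its changed line numbers. One way to build it, assuming the gate diffs against new_code_gate_base_ref (for example with git diff -U0 origin/main...HEAD), is to read the new-file line numbers from each hunk header. The changed_lines parser below is a hypothetical illustration, not code from new_code_gate.py.

```python
import re

# "@@ -old_start[,old_count] +new_start[,new_count] @@": capture the new side.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text: str) -> dict[str, set[int]]:
    """Map each file in a `git diff -U0` to the line numbers added or modified."""
    ranges: dict[str, set[int]] = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:]
            # Deleted files have no new-side lines to check.
            current = None if path == "/dev/null" else path.removeprefix("b/")
            if current is not None:
                ranges.setdefault(current, set())
        elif current is not None:
            match = HUNK_RE.match(line)
            if match:  # content lines (+/-) do not match and are skipped
                start = int(match.group(1))
                count = int(match.group(2)) if match.group(2) is not None else 1
                ranges[current].update(range(start, start + count))
    return ranges
```

With -U0, each hunk's +start,count pair covers exactly the added or modified lines, so pure deletions (count 0) add nothing. A production parser would also track hunk lengths so that an added line whose text begins with "++ " is not mistaken for a file header.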

🔗 Temporal Coupling

Temporal coupling is computed from git history:

coupling_score(file_a, file_b) = co_change_count / max(change_count_a, change_count_b)

A score of 0.8 means the two files changed together in 80% of the commits that touched the more frequently changed of the two
High temporal coupling suggests latent architectural coupling
Used to enrich review context: "This file has 82% temporal coupling with state.py"
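A minimal sketch of the coupling_score formula over a list of commits, where each commit is represented as the set of files it touched. Collecting those sets (for example from git log --name-only) is left out, and the function name is illustrative.

```python
from collections import Counter
from itertools import combinations


def temporal_coupling(commits: list[set[str]]) -> dict[tuple[str, str], float]:
    """coupling_score = co_change_count / max(change_count_a, change_count_b)."""
    changes: Counter = Counter()
    co_changes: Counter = Counter()
    for files in commits:
        changes.update(files)
        # Sorted pairs so (a, b) and (b, a) share one counter entry.
        for a, b in combinations(sorted(files), 2):
            co_changes[(a, b)] += 1
    return {
        (a, b): n / max(changes[a], changes[b])
        for (a, b), n in co_changes.items()
    }


commits = [
    {"state.py", "scan.py"},
    {"state.py", "scan.py"},
    {"state.py"},
    {"scan.py", "state.py", "cli.py"},
    {"state.py"},
]
scores = temporal_coupling(commits)
```

Here state.py changed 5 times and scan.py 3 times, 3 of them together, so the pair scores 3 / 5 = 0.6. Dividing by the larger count keeps the score symmetric and stops a rarely changed file (cli.py, one commit) from looking fully coupled to a busy neighbor.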

PART 12 — OPERATIONAL GUIDES & CLOSURE

🎯 Real Operator Scenarios

Scenario 1: Solo Developer Starting Cleanup

# Day 1: Baseline
pip install structorium
structorium scan --path .
structorium status              # See where you stand

# Day 1: Quick wins (auto-fix)
structorium fix unused-imports
structorium fix debug-logs
structorium scan --path .       # Score should improve
structorium status              # Verify improvement

# Day 2: Address highest priority
structorium next
# Fix the issue
structorium plan done "<id>" --note "..." --attest "I have actually..."
structorium next
# Repeat 3-5 times per session

# Day 3+: Systematic
structorium plan cluster create "api-boundary"
structorium plan focus "api-boundary"
structorium next                # Only api-boundary items
# Work through the cluster

Scenario 2: Team Enabling CI Gate

# Step 1: Run initial scan, see the landscape
structorium scan --path . --profile ci
structorium status

# Step 2: Start with standard policy (3 new findings allowed)
structorium config set new_code_gate_enabled true
structorium config set new_code_gate_policy standard

# Step 3: Add to CI (GitHub Actions example)
# See CI Integration Playbook above

# Step 4: After 2 weeks, tighten to strict (0 new findings)
structorium config set new_code_gate_policy strict

# Step 5: Monitor the gap (overall vs strict)
structorium status
# If the overall-to-strict gap grows beyond 5 points, review wontfix decisions

Scenario 3: AI Agent Autonomous Cleanup Loop

# Agent runs this loop automatically (pseudocode: score is read from
# structorium status --json; target is set by the operator)
structorium scan --path .
while score < target:
    structorium next --explain
    # Agent reads guidance, applies fix
    structorium plan done "<id>" --note "<what agent did>" \
        --attest "I have actually applied the fix and verified compilation"
    structorium scan --path .   # Verify improvement and refresh the queue

Scenario 4: Adding Structorium to an Existing Large Codebase

# Step 1: Initial full scan — expect a lot of findings
structorium scan --path .
# Example result: 400+ findings, score around 35. Don't panic.

# Step 2: Auto-fix everything auto-fixable
structorium fix unused-imports
structorium fix unused-vars
structorium fix debug-logs
structorium fix dead-useeffect
structorium fix empty-if-chain
structorium scan --path .
# Finding count drops. Score improves. Momentum.

# Step 3: Cluster by module
structorium plan cluster create "engine"
structorium plan cluster create "api"
structorium plan cluster create "utils"
# Assign findings to clusters based on file paths

# Step 4: Focus and work through one cluster at a time
structorium plan focus "engine"
structorium next --count 10     # See the work for this cluster
# Work through it

# Step 5: Enable gate once score is above 50
structorium config set new_code_gate_enabled true
structorium config set new_code_gate_policy standard
# New code must pass the gate; legacy code is cleaned up at your own pace

⚠️ Known Failure Modes & Edge Cases

Scenario	What Happens	Mitigation
External tool not installed	Detector runs with reduced confidence. Warning emitted. Scan completes.	Install the tool for full detection. `structorium langs` shows tool status.
Very large monorepo (10K+ files)	Scan may be slow. Score may be dominated by one language.	Use `--exclude` for vendor/generated. Use `--lang` to scope.
Review runner produces garbage	Fail-closed import validation rejects the import entirely. No state corruption.	Re-run the review with a better model or different batch split.
Git history too shallow	Temporal coupling analysis has no data.	Fetch full history in CI (e.g. `fetch-depth: 0` on `actions/checkout`), or run `git fetch --unshallow`.
New language not detected	Files skipped in scan.	Use `--lang <name>` to force a plugin, or create a generic plugin.
State file corrupted	Structorium writes a backup to `.structorium/state.json.bak` before each state write.	Restore from the backup. If both files are corrupted, delete them and rescan; findings are rebuilt fresh, but resolution history is lost.
Score plateaus	Remaining findings are all T3/T4 requiring judgment.	Use `review` to get subjective assessment. Focus clusters help.

🛠️ Development & Contributing

Local Development Setup

git clone https://github.com/your-org/structorium.git
cd structorium
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[full,dev]"
pytest

Running Tests

pytest                          # Full test suite
pytest tests/unit/              # Unit tests only
pytest tests/integration/       # Integration tests only
pytest -k "test_scoring"        # Filter by test name

Architecture Rules for Contributors

Never import from languages/ into core/ or engine/ — plugins depend on the engine, not vice versa
Never import between language plugins — languages/python/ cannot import from languages/typescript/
All detectors go through core/registry.py — never create ad-hoc detector lists
All scoring policies go through engine/_scoring/policy/core.py — never hardcode weights elsewhere
State mutations go through the state manager — never write to state.json directly
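The first two rules are mechanical enough to check in a test. The sketch below parses a module with Python's ast module and flags absolute imports that cross those boundaries. It assumes imports are written as languages.<plugin>... (optionally prefixed with structorium.), treats languages/_framework/ as shared and therefore importable by every plugin, and ignores relative imports; boundary_violations is a hypothetical helper, not an existing Structorium check.

```python
import ast

LOWER_LAYERS = {"core", "engine"}  # rule 1: must not import from languages/


def _parts(module: str) -> list[str]:
    """Split a dotted module name, dropping an assumed 'structorium.' prefix."""
    parts = module.split(".")
    return parts[1:] if parts[0] == "structorium" else parts


def boundary_violations(file_path: str, source: str) -> list[str]:
    """Absolute imports in `source` that break contributor rules 1 or 2."""
    here = file_path.replace("\\", "/").split("/")
    imported: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imported.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            imported.append(node.module)

    violations = []
    for module in imported:
        target = _parts(module)
        if not target or target[0] != "languages":
            continue
        if here[0] in LOWER_LAYERS:
            violations.append(module)  # rule 1: core/engine -> plugin
        elif (here[0] == "languages" and len(here) > 2 and len(target) > 1
              and target[1] not in ("_framework", here[1])):
            violations.append(module)  # rule 2: plugin -> another plugin
    return violations
```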

❓ FAQ

Does Structorium send my code to any cloud service?

No — by default, everything runs locally. The AI context layer (OpenAI, Cohere, Turbopuffer, Neo4j) is opt-in. If you don't configure API keys, no code leaves your machine.

How is this different from just running ruff/eslint/rubocop?

Linters find rule violations. Structorium wraps linters (it runs ruff, bandit, knip, etc. internally) and adds: persistent state, ranked priority queue, 4-type scoring with anti-gaming, subjective AI review (12 dimensions), and CI enforcement via new-code gate. It's the operating system that sits on top of linters.

Can I use Structorium without AI review?

Yes. Run structorium scan --profile objective for mechanical-only analysis. You still get state persistence, scoring, ranking, auto-fixers, and CI gating — just without the 60% subjective component.

How long does a scan take?

Depends on codebase size and enabled detectors:

Small project (50 files): 5-15 seconds
Medium project (500 files): 30-90 seconds
Large project (5000 files): 3-10 minutes
Use --skip-slow for faster iteration during development

Can I add my own custom detectors?

Yes. For full plugins, add a detector to languages/<name>/detectors/; for generic plugins, extend the generic factory. Then register the detector in core/registry.py and its scoring policy in engine/_scoring/policy/core.py.

What happens if I delete .structorium/state.json?

You lose all tracked state — findings, resolution history, and score progression. The next scan rebuilds state from scratch with all findings as open. It's like starting over.

Does the strict score ever go down?

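Yes: if you mark things wontfix (they still fail strict), if new findings appear on rescan, or if a review import adds new subjective concerns. A widening gap between overall and strict is a diagnostic signal: since strict additionally counts wontfix findings as failures, the gap tracks how much has been marked wontfix.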

Can I run Structorium on a monorepo?

Yes. Use --path to scope scans to specific directories, --exclude for vendor/generated code, and --lang to focus on specific languages.

🗺️ Roadmap

Phase	Focus	Status
v1.0	Core loop (scan → state → next → fix → score)	✅ Released
v1.1	CI gate, policy profiles, zone classifications	✅ Released
v1.2	Subjective review system (prepare → batch → import)	✅ Released
v1.3	AI context layer (embeddings, vectors, graph, temporal)	✅ Released
v1.4	Generic plugin system (22 languages), tree-sitter integration	✅ Released
v2.0	Team features: shared state, PR annotations, dashboard	🔄 In progress
v2.1	Custom detector SDK, review runner marketplace	📋 Planned
v2.2	IDE integration (VS Code, JetBrains)	📋 Planned

📜 License & Community

License: MIT

Structorium is open source and free for commercial use. Contributions welcome.

MIT License

Copyright (c) 2025 Structorium Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
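copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.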

Architecture. Automatically Enforced.
Scan once, track forever, improve measurably.
