From 1d1b5fbc5cf3cdbdd0f6affb7ece0a527eff65db Mon Sep 17 00:00:00 2001
From: "Carlos D. Escobar-Valbuena" <devteam@getstimulus.ai>
Date: Wed, 3 Jun 2026 22:28:13 -0500
Subject: [PATCH] docs: sync README to current contract + surface bench + P11
 cross-ref (0.23.2)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Post-ship audit of BRO-1205 (bench MVP) + BRO-1211 (Databricks live mode)
found the canonical surfaces current (SKILL.md, references/provider-standards.md,
CHANGELOG, spec, KG research entities) but README.md frozen at the P11 era —
a CLAUDE.md Self-Documenting Standards rule #3 violation (counts must match
SKILL.md, the authoritative source). bstack doctor does not lint README, so
CI did not catch this rot.

README.md:
- "Eleven irreducible primitives" → twenty; table extended P1-P11 → full
  P1-P20 (wording from SKILL.md's enforcement table)
- "28 curated skills" → 30 (matches SKILL.md) across intro + Stack-layers
  header + bootstrap description
- Commands: was six (bootstrap/doctor/repair/status/validate/revamp); now
  also documents bench, wave, crystallize, metrics, skills under an
  "Orchestration & observability" subsection; bench links to
  references/provider-standards.md
- Reasoning-enforced set corrected (P6, P9-P20) vs mechanism-enforced
  (P1, P2, P4, P5, P7, P8); closing narrative "eleven" → "twenty"

references/primitives.md:
- P11 Empirical Feedback Loop now lists `bstack bench` as the dedicated P11
  measurement substrate (table row + paragraph), cross-referencing
  provider-standards.md + the bench spec

No code, no behavior change. Companion KG artifact (workspace repo):
research/entities/pattern/openai-compatible-provider-abstraction.md.

Out of scope (separate audit): deeper skill-count reconciliation (SKILL.md:
30 curated; companion-skills.yaml: 65 in the full roster). README follows
SKILL.md per rule #1.

Ticket: BRO-1376

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md             | 16 ++++++++++++++++
 README.md                | 31 ++++++++++++++++++++++++-------
 VERSION                  |  2 +-
 references/primitives.md |  3 +++
 4 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 729ecc7..156f64f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,21 @@
 # Changelog
 
+## 0.23.2 — 2026-06-04
+
+### docs: README sync to current contract + bench surfaced + P11 cross-reference (BRO-1376)
+
+Closes a documentation-reconciliation gap surfaced by a post-ship audit of BRO-1205 (bench MVP) + BRO-1211 (Databricks live mode). The canonical surfaces (SKILL.md, `references/provider-standards.md`, CHANGELOG, spec, KG research entities) were already current; the public `README.md` was frozen at the P11 era and violated the CLAUDE.md Self-Documenting Standards rule #3 (counts must match SKILL.md as the authoritative source). `bstack doctor` does not lint the README, so CI did not catch the rot.
+
+- **CHANGED** `README.md` — synced to the current contract:
+  - "Eleven irreducible primitives" → **twenty**; primitive table extended P1–P11 → full **P1–P20** (wording from SKILL.md's enforcement table).
+  - "28 curated skills" → **30** (matches SKILL.md authoritative count) across intro, Stack-layers header, and bootstrap description.
+  - Commands section: was six (bootstrap/doctor/repair/status/validate/revamp); now also documents **`bench`**, `wave`, `crystallize`, `metrics`, `skills` under an "Orchestration & observability" subsection. `bench` links to `references/provider-standards.md`.
+  - Reasoning-enforced primitive set corrected (P6, P9–P20) vs mechanism-enforced (P1, P2, P4, P5, P7, P8); closing narrative "eleven" → "twenty".
+- **CHANGED** `references/primitives.md` — P11 Empirical Feedback Loop section now lists `bstack bench` as the dedicated P11 *measurement* substrate (table row + paragraph), cross-referencing `provider-standards.md` and the bench spec.
+- **NOTE** Out of scope (separate audit): the deeper skill-count reconciliation (SKILL.md says 30 curated; `companion-skills.yaml` lists a full roster of 65, including optional skills). README follows SKILL.md per rule #1.
+
+No code, no behavior change. Companion KG artifact (workspace repo, not this repo): `research/entities/pattern/openai-compatible-provider-abstraction.md` documents the provider-abstraction architecture that BRO-1211 introduced.
+
 ## 0.23.1 — 2026-06-01
 
 ### docs: P6 reflex tightening — "a reflex, not a request, **and never a question**" (BRO-1288)
diff --git a/README.md b/README.md
index 92b46d4..8acf031 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # bstack — The Broomva Stack
 
-**A portable harness metalayer for AI-native development.** Eleven irreducible primitives plus 28 curated agent skills that turn any agent-driven workspace into a self-operating system.
+**A portable harness metalayer for AI-native development.** Twenty irreducible primitives plus 30 curated agent skills that turn any agent-driven workspace into a self-operating system.
 
 ```bash
 npx skills add broomva/bstack
@@ -8,7 +8,7 @@ npx skills add broomva/bstack
 
 This installs the meta-skill that bootstraps the full stack — primitive contract, governance scaffolding, hooks, and skill roster — into your project. Works with Claude Code, Codex, Gemini CLI, OpenCode, and the [50+ agent CLIs the skills ecosystem supports](https://github.com/vercel-labs/skills).
 
-## The eleven primitives
+## The twenty primitives
 
 Each primitive closes one specific failure mode that drifts into entropy in unsupervised agent sessions.
 
@@ -25,12 +25,21 @@ Each primitive closes one specific failure mode that drifts into entropy in unsu
 | **P9** | Productive Wait (`broomva/p9` skill) | sleep-on-wait dead time (CI, deploys, builds — PR CI is the canonical case) |
 | **P10** | Worktree Hygiene Discipline | dirty trees and orphan worktrees compounding across sessions |
 | **P11** | Empirical Feedback Loop | shipping code that compiles but doesn't actually work when exercised |
+| **P12** | Persistent Loop Discipline (`broomva/persist` skill) | long-horizon work decaying as the context window rots |
+| **P13** | Dream Cycle Discipline | tier-crossing consolidation corrupting upper-tier rules without replay (the *shadow dream* failure mode) |
+| **P14** | Dependency-Chain Reasoning Discipline | "think deeply through chain of dependencies" becoming ritual without concrete upstream/downstream enumeration |
+| **P15** | State-Snapshot Before Action | plans built on stale state (uncommitted work, in-flight PRs, stale deploys) |
+| **P16** | Crystallization Discipline (the Bstack Engine) | recurring valuable patterns living only in the user's head, never promoted to infrastructure |
+| **P17** | Lens-Routed Request Articulation (`broomva/role-x` skill) | flat-dispatch fan-out failing to load the domain context that shapes the correct quality bar |
+| **P18** | Format-Follows-Audience Discipline | markdown-by-default regardless of audience; specs nobody reads; ASCII pseudo-diagrams where SVG-in-HTML belongs |
+| **P19** | Orchestration-Mechanism Selection Discipline | implicit between-reflex handoffs ("continue please"); wrong mechanism for the work shape |
+| **P20** | Cross-Model Adversarial Review Gate (`broomva/cross-review` skill) | same-model echo chamber; writer self-validates own work; AI slop merged with no independent evaluator |
 
 Full reference with reflexive trigger rules, invariants, and cohesion narrative: **[references/primitives.md](references/primitives.md)**.
 
-P6, P9, P10, and P11 are *reasoning-enforced* — they bind every agent through reflexive trigger rules in `AGENTS.md` rather than through hooks. The other primitives are mechanism-enforced through hooks, scripts, or CI gates.
+The majority of primitives (P6, P9–P20) are *reasoning-enforced* — they bind every agent through reflexive trigger rules in `AGENTS.md` rather than through hooks. The mechanism-enforced primitives (P1, P2, P4, P5, P7, P8) run through hooks, scripts, or CI gates.
 
-## Stack layers (28 skills)
+## Stack layers (30 skills)
 
 | Layer | Skills | Purpose |
 |-------|--------|---------|
@@ -44,15 +53,23 @@ P6, P9, P10, and P11 are *reasoning-enforced* — they bind every agent through
 
 ## Commands
 
-Once installed, the skill exposes six commands:
+Once installed, the skill exposes these commands:
 
-- **`bootstrap`** — install all 28 skills + scaffold governance (CLAUDE.md, AGENTS.md, `.control/policy.yaml`) + wire hooks + run doctor
+**Lifecycle**
+- **`bootstrap`** — install all 30 skills + scaffold governance (CLAUDE.md, AGENTS.md, `.control/policy.yaml`) + wire hooks + run doctor
 - **`doctor`** — verify primitive contract compliance (always exits 0 by default; `--strict` for CI)
 - **`repair`** — apply targeted fixes for gaps the doctor surfaces
 - **`status`** — show installed-vs-missing skills + harness health
 - **`validate`** — check skill SKILL.md frontmatter health
 - **`revamp`** — full reconfiguration: force-reinstall + rewire + re-doctor
 
+**Orchestration & observability**
+- **`wave`** — Orchestrate (P19) parallel sub-phase dispatch: one background agent + worktree per plan file
+- **`crystallize`** — Crystallize (P16) rule-of-three candidate detector over conversation logs
+- **`metrics`** — setpoint measurement pipeline (collect / observe)
+- **`skills`** — companion-skill roster manager (install / status / list)
+- **`bench`** — Empirical (P11) skill-evolution benchmark: two-phase cold→warm runs with pluggable LLM providers (OpenAI-compatible; Databricks Gateway built in). See [references/provider-standards.md](references/provider-standards.md).
+
 ## Governance & stability
 
 bstack's governance layer (`CLAUDE.md` + `AGENTS.md` + `.control/policy.yaml`) is the **Level 3 controller** in a [Recursive Controlled Systems hierarchy](https://broomva.tech/writing/recursive-controlled-systems) with formal stability proofs. The L3 stability margin is narrow on purpose — governance changes consume budget, so the contract evolves slowly and deliberately.
@@ -72,7 +89,7 @@ Interactive catalog with descriptions, install commands, and layer diagrams:
 
 **[broomva.tech/skills](https://broomva.tech/skills)**
 
-The narrative on what bstack is, why it exists, and what the eleven primitives buy you in measured throughput is at:
+The narrative on what bstack is, why it exists, and what the twenty primitives buy you in measured throughput is at:
 
 **[broomva.tech/writing/bstack-portable-harness-metalayer](https://broomva.tech/writing/bstack-portable-harness-metalayer)**
 
diff --git a/VERSION b/VERSION
index 610e287..fda96dc 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-0.23.1
+0.23.2
diff --git a/references/primitives.md b/references/primitives.md
index f3145ff..8033ef2 100644
--- a/references/primitives.md
+++ b/references/primitives.md
@@ -216,9 +216,12 @@ Mental checklist: *Did I decide on a worktree? Is `git status` clean? Are merged
 | Deploy verification | Vercel preview URL → screenshot via `gstack` | After CI green, before claiming "shipped" |
 | Audio diff | TTS comparison | When narration changes |
 | Multi-agent observation | Parallel `Agent` calls watching different surfaces | Long-running work |
+| Skill-evolution benchmark | `bstack bench` — two-phase cold→warm runs + rubric/LLM-judge over pluggable providers | Measuring whether a skill/primitive change actually cuts tokens or lifts quality |
 
 The agent picks the right subset, runs as parallel watchers via `run_in_background` where applicable, and **captures evidence** — not just exit codes, but actual screenshots, log snippets, response bodies, browser transcripts.
 
+`bstack bench` is the dedicated P11 *measurement* substrate: it turns "this primitive reduces token waste" from an assertion into a falsifiable number. It combines a two-phase protocol (Phase 1 cold skills → snapshot → Phase 2 warm), pluggable LLM providers via the OpenAI-compatible contract (Databricks Gateway built in), and a P20-enforced judge-model-isolation gate. See [provider-standards.md](provider-standards.md) and `specs/bench-skill-evolution.md`.
+
 **Invariant**: before claiming any work *complete*, the agent has interacted with the deployed/running version (or stated explicitly why interaction wasn't possible). The interaction is captured (screenshot, log snippet, video clip, terminal output, response body) and surfaced in the response. *Reasoning isn't validation; interaction is.*
 
 ### P11 Reflexive Trigger Rule (binding on every agent)