From 1d1b5fbc5cf3cdbdd0f6affb7ece0a527eff65db Mon Sep 17 00:00:00 2001
From: "Carlos D. Escobar-Valbuena"
Date: Wed, 3 Jun 2026 22:28:13 -0500
Subject: [PATCH] docs: sync README to current contract + surface bench + P11 cross-ref (0.23.2)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Post-ship audit of BRO-1205 (bench MVP) + BRO-1211 (Databricks live mode) found the canonical surfaces current (SKILL.md, references/provider-standards.md, CHANGELOG, spec, KG research entities) but README.md frozen at the P11 era — a CLAUDE.md Self-Documenting Standards rule #3 violation (counts must match SKILL.md, the authoritative source). bstack doctor does not lint README, so this rot was not CI-enforced.

README.md:
- "Eleven irreducible primitives" → twenty; table extended P1-P11 → full P1-P20 (wording from SKILL.md's enforcement table)
- "28 curated skills" → 30 (matches SKILL.md) across intro + Stack-layers header + bootstrap description
- Commands: was six (bootstrap/doctor/repair/status/validate/revamp); now also documents bench, wave, crystallize, metrics, skills under an "Orchestration & observability" subsection; bench links to references/provider-standards.md
- Reasoning-enforced set corrected (P6, P9-P20) vs mechanism-enforced (P1,P2,P4,P5,P7,P8); closing narrative "eleven" → "twenty"

references/primitives.md:
- P11 Empirical Feedback Loop now lists `bstack bench` as the dedicated P11 measurement substrate (table row + paragraph), cross-referencing provider-standards.md + the bench spec

No code, no behavior change. Companion KG artifact (workspace repo): research/entities/pattern/openai-compatible-provider-abstraction.md.

Out of scope (separate audit): deeper skill-count reconciliation (SKILL.md 30 curated vs companion-skills.yaml 65 full roster). README follows SKILL.md per rule #1.

Ticket: BRO-1376
Co-Authored-By: Claude Opus 4.8 (1M context)
---
 CHANGELOG.md             | 16 ++++++++++++++++
 README.md                | 31 ++++++++++++++++++++++++-------
 VERSION                  |  2 +-
 references/primitives.md |  3 +++
 4 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 729ecc7..156f64f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,21 @@
 # Changelog
 
+## 0.23.2 — 2026-06-04
+
+### docs: README sync to current contract + bench surfaced + P11 cross-reference (BRO-1376)
+
+Closes a documentation-reconciliation gap surfaced by a post-ship audit of BRO-1205 (bench MVP) + BRO-1211 (Databricks live mode). The canonical surfaces (SKILL.md, `references/provider-standards.md`, CHANGELOG, spec, KG research entities) were already current; the public `README.md` was frozen at the P11 era and violated the CLAUDE.md Self-Documenting Standards rule #3 (counts must match SKILL.md as the authoritative source). `bstack doctor` does not lint the README, so the rot was not CI-enforced.
+
+- **CHANGED** `README.md` — synced to the current contract:
+  - "Eleven irreducible primitives" → **twenty**; primitive table extended P1–P11 → full **P1–P20** (wording from SKILL.md's enforcement table).
+  - "28 curated skills" → **30** (matches SKILL.md authoritative count) across intro, Stack-layers header, and bootstrap description.
+  - Commands section: was six (bootstrap/doctor/repair/status/validate/revamp); now also documents **`bench`**, `wave`, `crystallize`, `metrics`, `skills` under an "Orchestration & observability" subsection. `bench` links to `references/provider-standards.md`.
+  - Reasoning-enforced primitive set corrected (P6, P9–P20) vs mechanism-enforced (P1, P2, P4, P5, P7, P8); closing narrative "eleven" → "twenty".
+- **CHANGED** `references/primitives.md` — P11 Empirical Feedback Loop section now lists `bstack bench` as the dedicated P11 *measurement* substrate (table row + paragraph), cross-referencing `provider-standards.md` and the bench spec.
+- **NOTE** Out of scope (separate audit): the deeper skill-count reconciliation (SKILL.md says 30 curated; `companion-skills.yaml` lists 65 full roster incl. optional). README follows SKILL.md per rule #1.
+
+No code, no behavior change. Companion KG artifact (workspace repo, not this repo): `research/entities/pattern/openai-compatible-provider-abstraction.md` documents the provider-abstraction architecture that BRO-1211 introduced.
+
 ## 0.23.1 — 2026-06-01
 
 ### docs: P6 reflex tightening — "a reflex, not a request, **and never a question**" (BRO-1288)
diff --git a/README.md b/README.md
index 92b46d4..8acf031 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # bstack — The Broomva Stack
 
-**A portable harness metalayer for AI-native development.** Eleven irreducible primitives plus 28 curated agent skills that turn any agent-driven workspace into a self-operating system.
+**A portable harness metalayer for AI-native development.** Twenty irreducible primitives plus 30 curated agent skills that turn any agent-driven workspace into a self-operating system.
 
 ```bash
 npx skills add broomva/bstack
@@ -8,7 +8,7 @@ npx skills add broomva/bstack
 
 This installs the meta-skill that bootstraps the full stack — primitive contract, governance scaffolding, hooks, and skill roster — into your project. Works with Claude Code, Codex, Gemini CLI, OpenCode, and the [50+ agent CLIs the skills ecosystem supports](https://github.com/vercel-labs/skills).
 
-## The eleven primitives
+## The twenty primitives
 
 Each primitive closes one specific failure mode that drifts into entropy in unsupervised agent sessions.
 
@@ -25,12 +25,21 @@ Each primitive closes one specific failure mode that drifts into entropy in unsu
 | **P9** | Productive Wait (`broomva/p9` skill) | sleep-on-wait dead time (CI, deploys, builds — PR CI is the canonical case) |
 | **P10** | Worktree Hygiene Discipline | dirty trees and orphan worktrees compounding across sessions |
 | **P11** | Empirical Feedback Loop | shipping code that compiles but doesn't actually work when exercised |
+| **P12** | Persistent Loop Discipline (`broomva/persist` skill) | long-horizon work decaying as the context window rots |
+| **P13** | Dream Cycle Discipline | tier-crossing consolidation corrupting upper-tier rules without replay (the *shadow dream* failure mode) |
+| **P14** | Dependency-Chain Reasoning Discipline | "think deeply through chain of dependencies" becoming ritual without concrete upstream/downstream enumeration |
+| **P15** | State-Snapshot Before Action | plans built on stale state (uncommitted work, in-flight PRs, stale deploys) |
+| **P16** | Crystallization Discipline (the Bstack Engine) | recurring valuable patterns living only in the user's head, never promoted to infrastructure |
+| **P17** | Lens-Routed Request Articulation (`broomva/role-x` skill) | flat-dispatch fan-out failing to load the domain context that shapes the correct quality bar |
+| **P18** | Format-Follows-Audience Discipline | markdown-by-default regardless of audience; specs nobody reads; ASCII pseudo-diagrams where SVG-in-HTML belongs |
+| **P19** | Orchestration-Mechanism Selection Discipline | implicit between-reflex handoffs ("continue please"); wrong mechanism for the work shape |
+| **P20** | Cross-Model Adversarial Review Gate (`broomva/cross-review` skill) | same-model echo chamber; writer self-validates own work; AI slop merged with no independent evaluator |
 
 Full reference with reflexive trigger rules, invariants, and cohesion narrative: **[references/primitives.md](references/primitives.md)**.
 
-P6, P9, P10, and P11 are *reasoning-enforced* — they bind every agent through reflexive trigger rules in `AGENTS.md` rather than through hooks. The other primitives are mechanism-enforced through hooks, scripts, or CI gates.
+The majority of primitives (P6, P9–P20) are *reasoning-enforced* — they bind every agent through reflexive trigger rules in `AGENTS.md` rather than through hooks. The mechanism-enforced primitives (P1, P2, P4, P5, P7, P8) run through hooks, scripts, or CI gates.
 
-## Stack layers (28 skills)
+## Stack layers (30 skills)
 
 | Layer | Skills | Purpose |
 |-------|--------|---------|
@@ -44,15 +53,23 @@ P6, P9, P10, and P11 are *reasoning-enforced* — they bind every agent through
 
 ## Commands
 
-Once installed, the skill exposes six commands:
+Once installed, the skill exposes these commands:
 
-- **`bootstrap`** — install all 28 skills + scaffold governance (CLAUDE.md, AGENTS.md, `.control/policy.yaml`) + wire hooks + run doctor
+**Lifecycle**
+- **`bootstrap`** — install all 30 skills + scaffold governance (CLAUDE.md, AGENTS.md, `.control/policy.yaml`) + wire hooks + run doctor
 - **`doctor`** — verify primitive contract compliance (always exits 0 by default; `--strict` for CI)
 - **`repair`** — apply targeted fixes for gaps the doctor surfaces
 - **`status`** — show installed-vs-missing skills + harness health
 - **`validate`** — check skill SKILL.md frontmatter health
 - **`revamp`** — full reconfiguration: force-reinstall + rewire + re-doctor
 
+**Orchestration & observability**
+- **`wave`** — Orchestrate (P19) parallel sub-phase dispatch: one background agent + worktree per plan file
+- **`crystallize`** — Crystallize (P16) rule-of-three candidate detector over conversation logs
+- **`metrics`** — setpoint measurement pipeline (collect / observe)
+- **`skills`** — companion-skill roster manager (install / status / list)
+- **`bench`** — Empirical (P11) skill-evolution benchmark: two-phase cold→warm runs with pluggable LLM providers (OpenAI-compatible; Databricks Gateway built in). See [references/provider-standards.md](references/provider-standards.md).
+
 ## Governance & stability
 
 bstack's governance layer (`CLAUDE.md` + `AGENTS.md` + `.control/policy.yaml`) is the **Level 3 controller** in a [Recursive Controlled Systems hierarchy](https://broomva.tech/writing/recursive-controlled-systems) with formal stability proofs. The L3 stability margin is narrow on purpose — governance changes consume budget, so the contract evolves slowly and deliberately.
@@ -72,7 +89,7 @@ Interactive catalog with descriptions, install commands, and layer diagrams:
 
 **[broomva.tech/skills](https://broomva.tech/skills)**
 
-The narrative on what bstack is, why it exists, and what the eleven primitives buy you in measured throughput is at:
+The narrative on what bstack is, why it exists, and what the twenty primitives buy you in measured throughput is at:
 
 **[broomva.tech/writing/bstack-portable-harness-metalayer](https://broomva.tech/writing/bstack-portable-harness-metalayer)**
 
diff --git a/VERSION b/VERSION
index 610e287..fda96dc 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-0.23.1
+0.23.2
diff --git a/references/primitives.md b/references/primitives.md
index f3145ff..8033ef2 100644
--- a/references/primitives.md
+++ b/references/primitives.md
@@ -216,9 +216,12 @@ Mental checklist: *Did I decide on a worktree? Is `git status` clean? Are merged
 | Deploy verification | Vercel preview URL → screenshot via `gstack` | After CI green, before claiming "shipped" |
 | Audio diff | TTS comparison | When narration changes |
 | Multi-agent observation | Parallel `Agent` calls watching different surfaces | Long-running work |
+| Skill-evolution benchmark | `bstack bench` — two-phase cold→warm runs + rubric/LLM-judge over pluggable providers | Measuring whether a skill/primitive change actually cuts tokens or lifts quality |
 
 The agent picks the right subset, runs as parallel watchers via `run_in_background` where applicable, and **captures evidence** — not just exit codes, but actual screenshots, log snippets, response bodies, browser transcripts.
 
+`bstack bench` is the dedicated P11 *measurement* substrate: it turns "this primitive reduces token waste" from an assertion into a falsifiable number. Two-phase protocol (Phase 1 cold skills → snapshot → Phase 2 warm), pluggable LLM providers via the OpenAI-compatible contract (Databricks Gateway built in), and a P20-enforced judge-model-isolation gate. See [provider-standards.md](provider-standards.md) and `specs/bench-skill-evolution.md`.
+
 **Invariant**: before claiming any work *complete*, the agent has interacted with the deployed/running version (or stated explicitly why interaction wasn't possible). The interaction is captured (screenshot, log snippet, video clip, terminal output, response body) and surfaced in the response. *Reasoning isn't validation; interaction is.*
 
 ### P11 Reflexive Trigger Rule (binding on every agent)