Merged
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,21 @@
# Changelog

## 0.23.2 — 2026-06-04

### docs: README sync to current contract + bench surfaced + P11 cross-reference (BRO-1376)

Closes a documentation-reconciliation gap surfaced by a post-ship audit of BRO-1205 (bench MVP) + BRO-1211 (Databricks live mode). The canonical surfaces (SKILL.md, `references/provider-standards.md`, CHANGELOG, spec, KG research entities) were already current; the public `README.md` was frozen at the P11 era and violated the CLAUDE.md Self-Documenting Standards rule #3 (counts must match SKILL.md as the authoritative source). `bstack doctor` does not lint the README, so the rot was not CI-enforced.

- **CHANGED** `README.md` — synced to the current contract:
  - "Eleven irreducible primitives" → **twenty**; primitive table extended P1–P11 → full **P1–P20** (wording from SKILL.md's enforcement table).
  - "28 curated skills" → **30** (matches SKILL.md authoritative count) across intro, Stack-layers header, and bootstrap description.
  - Commands section: was six (bootstrap/doctor/repair/status/validate/revamp); now also documents **`bench`**, `wave`, `crystallize`, `metrics`, `skills` under an "Orchestration & observability" subsection. `bench` links to `references/provider-standards.md`.
  - Reasoning-enforced primitive set corrected (P6, P9–P20) vs mechanism-enforced (P1, P2, P4, P5, P7, P8); closing narrative "eleven" → "twenty".
- **CHANGED** `references/primitives.md` — P11 Empirical Feedback Loop section now lists `bstack bench` as the dedicated P11 *measurement* substrate (table row + paragraph), cross-referencing `provider-standards.md` and the bench spec.
- **NOTE** Out of scope (separate audit): the deeper skill-count reconciliation (SKILL.md says 30 curated; `companion-skills.yaml` lists 65 full roster incl. optional). README follows SKILL.md per rule #1.
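
Since the entry notes that `bstack doctor` does not lint the README, the kind of count check that would have caught the drift can be sketched as below. This is a minimal illustration, not part of bstack: the function names, the regex, and the one-line stand-in for SKILL.md are all assumptions made for the example.

```python
import re


def curated_skill_counts(text: str) -> set[int]:
    """Return every distinct N found in phrases like '30 curated'."""
    return {int(n) for n in re.findall(r"\b(\d+) curated\b", text)}


def readme_matches_skill_md(readme: str, skill_md: str) -> bool:
    """True when the README states exactly one count and it equals SKILL.md's."""
    readme_counts = curated_skill_counts(readme)
    skill_counts = curated_skill_counts(skill_md)
    return len(readme_counts) == 1 and readme_counts == skill_counts


# README intro lines before and after this change (quoted from the diff).
stale_readme = "Eleven irreducible primitives plus 28 curated agent skills"
synced_readme = "Twenty irreducible primitives plus 30 curated agent skills"
# Hypothetical stand-in for the authoritative wording in SKILL.md.
skill_md = "30 curated skills"

print(readme_matches_skill_md(stale_readme, skill_md))   # False
print(readme_matches_skill_md(synced_readme, skill_md))  # True
```

A check of this shape only covers digit-form counts; spelled-out counts such as "Eleven" or "Twenty" would need a separate word-to-number mapping.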

No code, no behavior change. Companion KG artifact (workspace repo, not this repo): `research/entities/pattern/openai-compatible-provider-abstraction.md` documents the provider-abstraction architecture that BRO-1211 introduced.

## 0.23.1 — 2026-06-01

### docs: P6 reflex tightening — "a reflex, not a request, **and never a question**" (BRO-1288)
31 changes: 24 additions & 7 deletions README.md
@@ -1,14 +1,14 @@
# bstack — The Broomva Stack

**A portable harness metalayer for AI-native development.** Eleven irreducible primitives plus 28 curated agent skills that turn any agent-driven workspace into a self-operating system.
**A portable harness metalayer for AI-native development.** Twenty irreducible primitives plus 30 curated agent skills that turn any agent-driven workspace into a self-operating system.

```bash
npx skills add broomva/bstack
```

This installs the meta-skill that bootstraps the full stack — primitive contract, governance scaffolding, hooks, and skill roster — into your project. Works with Claude Code, Codex, Gemini CLI, OpenCode, and the [50+ agent CLIs the skills ecosystem supports](https://github.com/vercel-labs/skills).

## The eleven primitives
## The twenty primitives

Each primitive closes one specific failure mode that drifts into entropy in unsupervised agent sessions.

@@ -25,12 +25,21 @@ Each primitive closes one specific failure mode that drifts into entropy in unsu
| **P9** | Productive Wait (`broomva/p9` skill) | sleep-on-wait dead time (CI, deploys, builds — PR CI is the canonical case) |
| **P10** | Worktree Hygiene Discipline | dirty trees and orphan worktrees compounding across sessions |
| **P11** | Empirical Feedback Loop | shipping code that compiles but doesn't actually work when exercised |
| **P12** | Persistent Loop Discipline (`broomva/persist` skill) | long-horizon work decaying as the context window rots |
| **P13** | Dream Cycle Discipline | tier-crossing consolidation corrupting upper-tier rules without replay (the *shadow dream* failure mode) |
| **P14** | Dependency-Chain Reasoning Discipline | "think deeply through chain of dependencies" becoming ritual without concrete upstream/downstream enumeration |
| **P15** | State-Snapshot Before Action | plans built on stale state (uncommitted work, in-flight PRs, stale deploys) |
| **P16** | Crystallization Discipline (the Bstack Engine) | recurring valuable patterns living only in the user's head, never promoted to infrastructure |
| **P17** | Lens-Routed Request Articulation (`broomva/role-x` skill) | flat-dispatch fan-out failing to load the domain context that shapes the correct quality bar |
| **P18** | Format-Follows-Audience Discipline | markdown-by-default regardless of audience; specs nobody reads; ASCII pseudo-diagrams where SVG-in-HTML belongs |
| **P19** | Orchestration-Mechanism Selection Discipline | implicit between-reflex handoffs ("continue please"); wrong mechanism for the work shape |
| **P20** | Cross-Model Adversarial Review Gate (`broomva/cross-review` skill) | same-model echo chamber; writer self-validates own work; AI slop merged with no independent evaluator |

Full reference with reflexive trigger rules, invariants, and cohesion narrative: **[references/primitives.md](references/primitives.md)**.

P6, P9, P10, and P11 are *reasoning-enforced* — they bind every agent through reflexive trigger rules in `AGENTS.md` rather than through hooks. The other primitives are mechanism-enforced through hooks, scripts, or CI gates.
The majority of primitives (P6, P9–P20) are *reasoning-enforced* — they bind every agent through reflexive trigger rules in `AGENTS.md` rather than through hooks. The mechanism-enforced primitives (P1, P2, P4, P5, P7, P8) run through hooks, scripts, or CI gates.
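
The two enforcement sets named in the sentence above can be cross-checked with a few lines of set arithmetic. This is only a sketch: the sets are transcribed from that sentence, and the variable names are illustrative rather than anything bstack defines.

```python
# Primitive numbers transcribed from the README sentence.
reasoning_enforced = {6, *range(9, 21)}   # P6, P9-P20
mechanism_enforced = {1, 2, 4, 5, 7, 8}   # P1, P2, P4, P5, P7, P8
all_primitives = set(range(1, 21))        # P1-P20

overlap = reasoning_enforced & mechanism_enforced
unlisted = all_primitives - (reasoning_enforced | mechanism_enforced)

print(len(reasoning_enforced), len(mechanism_enforced))  # 13 6
print(sorted(overlap))                                   # []
print([f"P{n}" for n in sorted(unlisted)])               # ['P3']
```

The sets are disjoint and 13 of 20 is indeed a majority. P3 appears in neither list, so its enforcement class is not stated in this excerpt.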

## Stack layers (28 skills)
## Stack layers (30 skills)

| Layer | Skills | Purpose |
|-------|--------|---------|
@@ -44,15 +53,23 @@ P6, P9, P10, and P11 are *reasoning-enforced* — they bind every agent through

## Commands

Once installed, the skill exposes six commands:
Once installed, the skill exposes these commands:

- **`bootstrap`** — install all 28 skills + scaffold governance (CLAUDE.md, AGENTS.md, `.control/policy.yaml`) + wire hooks + run doctor
**Lifecycle**
- **`bootstrap`** — install all 30 skills + scaffold governance (CLAUDE.md, AGENTS.md, `.control/policy.yaml`) + wire hooks + run doctor
- **`doctor`** — verify primitive contract compliance (always exits 0 by default; `--strict` for CI)
- **`repair`** — apply targeted fixes for gaps the doctor surfaces
- **`status`** — show installed-vs-missing skills + harness health
- **`validate`** — check skill SKILL.md frontmatter health
- **`revamp`** — full reconfiguration: force-reinstall + rewire + re-doctor

**Orchestration & observability**
- **`wave`** — Orchestrate (P19) parallel sub-phase dispatch: one background agent + worktree per plan file
- **`crystallize`** — Crystallize (P16) rule-of-three candidate detector over conversation logs
- **`metrics`** — setpoint measurement pipeline (collect / observe)
- **`skills`** — companion-skill roster manager (install / status / list)
- **`bench`** — Empirical (P11) skill-evolution benchmark: two-phase cold→warm runs with pluggable LLM providers (OpenAI-compatible; Databricks Gateway built in). See [references/provider-standards.md](references/provider-standards.md).

## Governance & stability

bstack's governance layer (`CLAUDE.md` + `AGENTS.md` + `.control/policy.yaml`) is the **Level 3 controller** in a [Recursive Controlled Systems hierarchy](https://broomva.tech/writing/recursive-controlled-systems) with formal stability proofs. The L3 stability margin is narrow on purpose — governance changes consume budget, so the contract evolves slowly and deliberately.
@@ -72,7 +89,7 @@ Interactive catalog with descriptions, install commands, and layer diagrams:

**[broomva.tech/skills](https://broomva.tech/skills)**

The narrative on what bstack is, why it exists, and what the eleven primitives buy you in measured throughput is at:
The narrative on what bstack is, why it exists, and what the twenty primitives buy you in measured throughput is at:

**[broomva.tech/writing/bstack-portable-harness-metalayer](https://broomva.tech/writing/bstack-portable-harness-metalayer)**

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.23.1
0.23.2
3 changes: 3 additions & 0 deletions references/primitives.md
@@ -216,9 +216,12 @@ Mental checklist: *Did I decide on a worktree? Is `git status` clean? Are merged
| Deploy verification | Vercel preview URL → screenshot via `gstack` | After CI green, before claiming "shipped" |
| Audio diff | TTS comparison | When narration changes |
| Multi-agent observation | Parallel `Agent` calls watching different surfaces | Long-running work |
| Skill-evolution benchmark | `bstack bench` — two-phase cold→warm runs + rubric/LLM-judge over pluggable providers | Measuring whether a skill/primitive change actually cuts tokens or lifts quality |

The agent picks the right subset, runs as parallel watchers via `run_in_background` where applicable, and **captures evidence** — not just exit codes, but actual screenshots, log snippets, response bodies, browser transcripts.

`bstack bench` is the dedicated P11 *measurement* substrate: it turns "this primitive reduces token waste" from an assertion into a falsifiable number. Two-phase protocol (Phase 1 cold skills → snapshot → Phase 2 warm), pluggable LLM providers via the OpenAI-compatible contract (Databricks Gateway built in), and a P20-enforced judge-model-isolation gate. See [provider-standards.md](provider-standards.md) and `specs/bench-skill-evolution.md`.
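
To make the "falsifiable number" concrete, the cold-to-warm comparison might reduce to something like the following. This is a sketch under stated assumptions: `PhaseResult`, its fields, and the sample figures are hypothetical and do not reflect the actual output schema of `bstack bench`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseResult:
    """Hypothetical summary of one bench phase (not bstack's real schema)."""
    phase: str            # "cold" or "warm"
    total_tokens: int     # tokens consumed across the phase
    rubric_score: float   # quality score in [0, 1] from a rubric or judge


def token_reduction(cold: PhaseResult, warm: PhaseResult) -> float:
    """Fraction of cold-phase tokens saved in the warm phase."""
    if cold.total_tokens <= 0:
        raise ValueError("cold phase must consume at least one token")
    return (cold.total_tokens - warm.total_tokens) / cold.total_tokens


def quality_lift(cold: PhaseResult, warm: PhaseResult) -> float:
    """Absolute change in rubric score from cold to warm."""
    return warm.rubric_score - cold.rubric_score


# Made-up figures purely for illustration.
cold = PhaseResult("cold", total_tokens=12_000, rubric_score=0.70)
warm = PhaseResult("warm", total_tokens=9_000, rubric_score=0.75)

print(f"{token_reduction(cold, warm):.1%}")  # 25.0%
print(round(quality_lift(cold, warm), 2))    # 0.05
```

A negative `token_reduction` or `quality_lift` would falsify the claim that the change helped, which is the point of measuring rather than asserting.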

**Invariant**: before claiming any work *complete*, the agent has interacted with the deployed/running version (or stated explicitly why interaction wasn't possible). The interaction is captured (screenshot, log snippet, video clip, terminal output, response body) and surfaced in the response. *Reasoning isn't validation; interaction is.*

### P11 Reflexive Trigger Rule (binding on every agent)