runtime-obligation-testops is a TestOps control system for teams that want automated testing managed against real runtime behavior, not just test labels or line coverage.
The package exists for one governing rule:
automated testing is managed against the full set of runtime behavior units
That rule is enforced through four properties:
- event completeness
- outcome closure
- observability
- traceability
This project started from a real failure mode in a real product.
The original product had:
- a large automated test suite
- very high code coverage
- passing builds
- clear unit / integration / component labels
and still had missing runtime layers.
The core problem was not that the product had too few tests. The problem was that the product had no durable way to answer:
- what the real runtime denominator is
- which runtime layers are actually managed
- what observable outcomes are proven
- which tests own that proof
- whether the denominator has been silently narrowed by hand
In practice that caused exactly the kind of mistake this package is designed to prevent:
- a repo can look “100% covered”
- a declared control plane can look clean
- and a whole runtime layer can still be absent from the managed denominator
That is why this package exists.
It is not a prettier coverage tool. It is a control system for the runtime denominator itself.
flowchart LR
A[Runtime codebase] --> B[Discovery policy]
B --> C[Discovered runtime candidates]
C --> D[Reviewed runtime inventory]
D --> E[Runtime surfaces]
E --> F[Reviewed runtime behaviors]
F --> G[Implemented behavior units and owner tests]
G --> H[Validate / Review / Impact]
H --> I[CI gate]
J[Fidelity policy] --> F
K[Quality policy] --> D
K --> F
The package governs whether reviewed runtime behaviors are actually implemented as tests, not just whether a repo has many tests at the bottom of the stack.
This package is not designed around manual test bookkeeping. It is designed around a reviewed operating loop:
- the scanner proposes candidates
- repo-local policy shapes how those candidates should be interpreted
- AI agents do most of the inventory, surface, behavior, annotation, and owner-test maintenance
- reviewers approve semantic decisions when acceptance, suppression, fidelity, or granularity is non-obvious
- CI enforces the resulting behavior-completeness gates
That distinction matters. If you describe this package as just a validator, people will use it too late in the workflow.
Most repos can tell you:
- how many tests exist
- which folders contain tests
- which runner they use
- what line coverage says
Most repos cannot tell you:
- which runtime events define the real denominator
- which surfaces partition that denominator
- which reviewed runtime behaviors make up the denominator
- which behavior units implement those behaviors
- what evidence proves each implemented behavior
- which tests own that evidence
- whether discovery and the reviewed model have drifted apart
This package makes those questions explicit and operational.
It also handles a second failure mode that appears after teams adopt a reviewed model:
- the reviewed model exists
- validation looks clean
- but one inventory source or one behavior unit is so broad that real test gaps still hide inside it
runtime-quality-policy.json exists to make that reviewed-model smell explicit instead of letting it live behind a clean-looking control plane.
flowchart TD
A[Many tests] --> B[High code coverage]
C[unit / integration / component labels] --> B
B --> D[Looks healthy]
D --> E[Missing runtime layer can still hide]
F[runtime behavior completeness system] --> G[Discovered vs reviewed drift]
F --> H[Reviewed behavior closure]
F --> I[Evidence / owner-test traceability]
F --> J[Granularity quality gates]
G --> K[Managed runtime denominator]
H --> K
I --> K
J --> K
The package manages eight connected artifacts:
- runtime-discovery-policy.json: scanner rules, ignore patterns, reviewed suppressions
- runtime-inventory.json: the reviewed runtime denominator
- runtime-surfaces.json: the project-specific management partition over that denominator
- runtime-control-plane.json: implemented behavior units, evidence, fidelity, owner tests
- fidelity-policy.json: the minimum proof strength required by surface, source, or behavior unit
- runtime-quality-policy.json: reviewed-model quality gates that flag overly broad sources or behavior units
- runtime-self-check-policy.json: rules that challenge the reviewed model itself for implicit mappings, diffuse proof ownership, and risky weak-fidelity assumptions
- runtime-retrospective.json: escaped runtime misses and the hardening actions that must feed them back into the reviewed model
The first artifact manages discovery. The next five artifacts are the reviewed runtime model and its fidelity and quality policies. The self-check policy questions whether that reviewed model is explicit enough to trust. The retrospective records escaped misses so they harden future review instead of being forgotten after QA or incidents.
runtime-obligation-testops keeps its historical name, but the preferred proof unit is now the runtime behavior unit.
Legacy obligations remain supported for backward compatibility and are interpreted as behavior units by the validator.
This package is intentionally split into two layers:
- a universal control core
- repo-local operating policy
The universal core gives every project the same runtime-behavior model:
- sources
- surfaces
- reviewed behaviors
- implemented behavior units
- outcomes
- evidence
- fidelity
- owner tests
- traceability
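One way to picture how these concepts relate is as a small set of types plus a closure check. The sketch below is illustrative only: the type names, field names, and the `unclosedBehaviors` helper are assumptions made for this example and are not the package's actual artifact schema or API.

```typescript
// Hypothetical shapes for illustration; NOT the real artifact schema.
interface ReviewedBehavior {
  id: string;
  surface: string; // surface that partitions the denominator
  source: string;  // reviewed inventory source, e.g. a runtime file
}

interface BehaviorUnit {
  id: string;
  behaviors: string[];  // ids of the reviewed behaviors this unit implements
  outcomes: string[];   // observable outcomes the unit claims to prove
  evidence: string[];   // evidence that backs each outcome
  ownerTests: string[]; // tests that own the proof
}

// In this sketch a reviewed behavior counts as "closed" when at least one
// behavior unit implements it and that unit names at least one owner test.
function unclosedBehaviors(
  behaviors: ReviewedBehavior[],
  units: BehaviorUnit[],
): string[] {
  const closedIds = new Set<string>();
  for (const unit of units) {
    if (unit.ownerTests.length === 0) continue; // no owner test, no proof
    for (const id of unit.behaviors) closedIds.add(id);
  }
  return behaviors.filter((b) => !closedIds.has(b.id)).map((b) => b.id);
}

const reviewedBehaviors: ReviewedBehavior[] = [
  { id: "session.refresh", surface: "client-state", source: "src/session/refresh.ts" },
  { id: "session.expire", surface: "client-state", source: "src/session/expire.ts" },
];

const behaviorUnits: BehaviorUnit[] = [
  {
    id: "unit.session-refresh",
    behaviors: ["session.refresh"],
    outcomes: ["token rotated"],
    evidence: ["new token stored"],
    ownerTests: ["tests/session/refresh.test.ts"],
  },
];

const unclosed = unclosedBehaviors(reviewedBehaviors, behaviorUnits);
console.log(unclosed); // ["session.expire"]
```

The point of the sketch is that a behavior with no owning unit stays visible as a gap, regardless of how many tests exist elsewhere in the repo.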
Repo-local policy tells the core how this specific codebase should be interpreted:
- which files are even eligible for discovery
- which generated or vendor paths must be ignored
- which candidate matches are reviewed suppressions
- which runtime categories need explicit source overrides
- whether discovered-vs-reviewed drift should fail CI now or only warn
That split matters. It lets the package stay universal without pretending that every codebase emits the same scanner signals.
flowchart LR
A[Scanner output] -->|candidate set| B{Review}
B -->|accept| C[Reviewed inventory]
B -->|suppress| D[Reviewed suppression]
B -->|defer| E[Review backlog]
C --> F[Surfaces]
F --> G[Behavior units]
G --> H[Owner tests]
H --> I[Runtime proof]
The package keeps two layers of truth in tension:
- discovered runtime candidates
- reviewed runtime model
That distinction is the whole point.
If you manage only the reviewed model, teams can accidentally leave real runtime files out of scope. If you trust only discovery, you get noisy heuristics instead of an operable system.
rotops validate exists to stop those two layers from drifting apart silently.
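The idea of drift between the two layers can be sketched as a set comparison. This is a minimal illustration under stated assumptions: `DiscoveryState`, `DriftResult`, and `computeDrift` are hypothetical names, and the real validator may define and report drift differently.

```typescript
// Hypothetical shapes for illustration; not rotops internals.
interface DiscoveryState {
  discovered: string[]; // files the scanner proposes as runtime candidates
  reviewed: string[];   // files accepted into the reviewed inventory
  suppressed: string[]; // files suppressed by a reviewed policy decision
}

interface DriftResult {
  unreviewed: string[]; // discovered, but neither accepted nor suppressed
  stale: string[];      // reviewed, but no longer discovered
}

function computeDrift(state: DiscoveryState): DriftResult {
  const discovered = new Set(state.discovered);
  const accounted = new Set([...state.reviewed, ...state.suppressed]);
  return {
    unreviewed: state.discovered.filter((file) => !accounted.has(file)),
    stale: state.reviewed.filter((file) => !discovered.has(file)),
  };
}

const drift = computeDrift({
  discovered: ["src/api/login.ts", "src/jobs/sync.ts", "src/gen/client.ts"],
  reviewed: ["src/api/login.ts", "src/legacy/old.ts"],
  suppressed: ["src/gen/client.ts"],
});

console.log(drift.unreviewed); // ["src/jobs/sync.ts"]
console.log(drift.stale);      // ["src/legacy/old.ts"]
```

In this sketch a suppressed file does not count as drift, because suppression is itself a reviewed decision; a file that nobody has accepted or suppressed does.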
| Actor | Primary job | What it should not do |
|---|---|---|
| Discovery engine | Propose runtime candidates and drift | Declare truth by itself |
| Repo-local policy | Teach the scanner how this repo expresses runtime | Hide real runtime files just to get clean-looking output |
| AI agent | Perform most model, annotation, and owner-test updates | Stop at line coverage or raw test counts |
| Reviewer | Approve semantic decisions | Rebuild the whole model manually every time |
| CI | Enforce completeness gates | Replace semantic review |
The package is strongest at the control layer:
- validate
- impact
- the reviewed runtime model
- fidelity policy
- quality policy
- owner-test traceability
Discovery is intentionally a bootstrap engine, not an oracle.
That means:
- the package can be used in repos that do not look like the example repo
- teams can start with a hand-authored reviewed model
- discovery can begin in advisory mode
- repo-local policy can gradually tighten discovery quality over time
This is the intended operating model. The package is not promising that raw scanner output is universally correct on day one.
This package is for governing automated verification below and around the top black-box layer.
It helps teams manage:
- request boundaries
- client state transitions
- workflow orchestration
- persistence semantics
- background execution
- external contracts
- runtime invariants
It does not eliminate the need for:
- real-dependency integration tests
- full-system tests
- browser or manual black-box checks
Those layers still matter. This package exists so the rest of the automated stack is not managed blindly.
It is not:
- a replacement for your test runner
- a replacement for E2E or manual testing
- an oracle that invents the correct runtime model without review
- a promise that line coverage now means runtime completeness
It is a control system for keeping your runtime denominator, proof graph, and test ownership aligned.
Install the package in the target repo first:
npm install -D runtime-obligation-testops
Then run the CLI:
npx rotops init
npx rotops inventory scan
npx rotops surfaces derive
npx rotops doctor
npx rotops review
npx rotops self-check
npx rotops retro
npx rotops validate
npx rotops report
npx rotops impact --changed src/path/to/file.ts
npx rotops export agent-contract
npx rotops export vitest-workspace --out vitest.runtime.workspace.ts
If your repo wraps rotops behind project-local scripts, export the agent contract with those commands:
npx rotops export agent-contract \
--review-command "npm run test:review" \
--self-check-command "npm run test:self-check" \
--retro-command "npm run test:retro" \
--doctor-command "npm run test:doctor" \
--impact-command "npm run test:impact -- --changed <path>" \
--validate-command "npm run test:control"
If your repo uses non-default paths such as testing/ instead of testops/, keep the artifacts where they are and wrap the CLI with project-local scripts.
runtime-quality-policy.json is the package's guard against a clean-but-coarse reviewed model.
It can express rules such as:
- maximum files per reviewed inventory source
- maximum files per behavior unit
- maximum reviewed inventory sources per behavior unit
- maximum reviewed inventory behaviors per behavior unit
Fidelity policy governs proof strength. Quality policy governs proof granularity.
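A granularity gate of this kind could look roughly like the following. The limit names (`maxFilesPerBehaviorUnit`, `maxSourcesPerBehaviorUnit`) and the `findBroadUnits` helper are assumptions chosen for the example; they mirror the rules listed above but are not the actual keys of runtime-quality-policy.json.

```typescript
// Hypothetical limit names; the real policy file may use different keys.
interface QualityLimits {
  maxFilesPerBehaviorUnit: number;
  maxSourcesPerBehaviorUnit: number;
}

interface UnitFootprint {
  id: string;
  files: string[];   // runtime files the behavior unit claims to cover
  sources: string[]; // reviewed inventory sources the unit spans
}

interface GranularityFinding {
  unit: string;
  rule: keyof QualityLimits;
  actual: number;
  limit: number;
}

// Flags behavior units that are too broad to make gaps visible.
function findBroadUnits(
  units: UnitFootprint[],
  limits: QualityLimits,
): GranularityFinding[] {
  const findings: GranularityFinding[] = [];
  for (const unit of units) {
    const checks: Array<[keyof QualityLimits, number]> = [
      ["maxFilesPerBehaviorUnit", unit.files.length],
      ["maxSourcesPerBehaviorUnit", unit.sources.length],
    ];
    for (const [rule, actual] of checks) {
      if (actual > limits[rule]) {
        findings.push({ unit: unit.id, rule, actual, limit: limits[rule] });
      }
    }
  }
  return findings;
}

const granularityFindings = findBroadUnits(
  [
    {
      id: "unit.checkout",
      files: ["src/cart.ts", "src/pay.ts", "src/ship.ts"],
      sources: ["cart", "payment"],
    },
    { id: "unit.login", files: ["src/login.ts"], sources: ["auth"] },
  ],
  { maxFilesPerBehaviorUnit: 2, maxSourcesPerBehaviorUnit: 1 },
);

console.log(granularityFindings.map((f) => `${f.unit}: ${f.rule}`));
```

Here `unit.checkout` is flagged on both rules while `unit.login` passes, which is the clean-but-coarse smell made explicit.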
runtime-self-check-policy.json governs whether the reviewed model is still explicit enough to trust.
It can express rules such as:
- whether inventory behaviors must be explicit instead of synthesized
- whether behavior units must explicitly name reviewed inventory behaviors
- maximum runtime behaviors one owner test may claim
- maximum owner tests one behavior unit may diffuse across
- minimum fidelity for risky source kinds
- required outcome classes for risky source kinds
- required evidence for risky source kinds
- owner-test proof patterns for higher-fidelity claims
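As an illustration of one such rule, the sketch below counts how many distinct behaviors each owner test claims across all behavior units and flags tests that exceed a limit. `UnitClaims` and `overclaimingOwnerTests` are hypothetical names used only for this example, not part of the package.

```typescript
// Hypothetical shape for illustration; not the real control-plane schema.
interface UnitClaims {
  id: string;
  behaviors: string[];  // reviewed behaviors the unit implements
  ownerTests: string[]; // tests the unit names as proof owners
}

// Returns owner tests that claim more distinct behaviors than allowed.
function overclaimingOwnerTests(
  units: UnitClaims[],
  maxBehaviorsPerOwnerTest: number,
): string[] {
  const claimed = new Map<string, Set<string>>();
  for (const unit of units) {
    for (const ownerTest of unit.ownerTests) {
      const behaviors = claimed.get(ownerTest) ?? new Set<string>();
      for (const behavior of unit.behaviors) behaviors.add(behavior);
      claimed.set(ownerTest, behaviors);
    }
  }
  const flagged: string[] = [];
  claimed.forEach((behaviors, ownerTest) => {
    if (behaviors.size > maxBehaviorsPerOwnerTest) flagged.push(ownerTest);
  });
  return flagged.sort();
}

const overclaiming = overclaimingOwnerTests(
  [
    {
      id: "unit.session-refresh",
      behaviors: ["session.refresh", "session.retry"],
      ownerTests: ["tests/session.test.ts"],
    },
    {
      id: "unit.session-expire",
      behaviors: ["session.expire"],
      ownerTests: ["tests/session.test.ts"],
    },
    {
      id: "unit.login",
      behaviors: ["login.submit"],
      ownerTests: ["tests/login.test.ts"],
    },
  ],
  2,
);

console.log(overclaiming); // ["tests/session.test.ts"]
```

A single test file that claims three behaviors across two units is the kind of diffuse ownership this class of rule is meant to surface.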
Do not assume every repo should start in strict discovery mode.
The safe sequence is:
- start with a reviewed model for one important runtime slice
- keep discovery scoped to the slice you actually intend to manage first
- use rotops validate and rotops impact as the first CI gate
- add rotops doctor so stale installs or missing control artifacts do not fake a clean result
- use rotops review to keep discovered candidates visible while repo-local policy is still being shaped
- add rotops self-check once the denominator is explicit enough to question its own granularity
- add rotops retro once escaped runtime misses should harden the model automatically
- move discovery drift to error once the scanner is trustworthy for that repo
That rollout works better than pretending heuristics are already perfect.
sequenceDiagram
participant AI as AI agent
participant Repo as Repo-local policy
participant R as rotops
participant CI as CI gate
AI->>R: impact --changed <files>
AI->>R: inventory scan
R-->>AI: discovered candidates
AI->>Repo: accept / suppress / defer
AI->>Repo: update inventory / surfaces / behaviors / owner tests
AI->>R: review
AI->>R: self-check
AI->>R: retro
AI->>R: validate
AI->>R: runtime tests
AI->>R: code coverage (secondary)
R-->>CI: completeness result
CI-->>AI: pass or fail
- Run inventory scan to discover candidate runtime sources.
- Run review to turn scanner output into an explicit review backlog.
- Record suppressions, scope decisions, and scanner noise in runtime-discovery-policy.json.
- Accept the reviewed denominator in runtime-inventory.json.
- Derive or refine runtime surfaces in runtime-surfaces.json.
- Register reviewed behaviors, implemented behavior units, evidence, fidelity, and owner tests in runtime-control-plane.json.
- Add self-check and retrospective policy once the reviewed model is stable enough to challenge itself.
- Export the machine-readable agent contract for local tooling or CI.
- Run doctor, self-check, retro, and validate before the main test suite.
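The final step, running the gates before the main test suite, could be wrapped in a small project-local script. The sketch below is one possible shape, not something the package ships: `runGates`, `RunCommand`, and `GateOutcome` are hypothetical, and the runner is injected so the ordering logic can be exercised without invoking the CLI. Only the four rotops subcommands come from the command list above.

```typescript
// A runner takes a shell command and returns its exit code.
// In a real script this could wrap Node's child_process; here it is injected.
type RunCommand = (command: string) => number;

interface GateOutcome {
  ok: boolean;
  ran: string[];     // commands that were actually executed, in order
  failedAt?: string; // first command that returned a non-zero exit code
}

// Gate order as listed in the workflow: doctor, self-check, retro, validate.
const preflightGates: string[] = [
  "npx rotops doctor",
  "npx rotops self-check",
  "npx rotops retro",
  "npx rotops validate",
];

// Runs the gates in order and stops at the first failure, so the main
// test suite is never started on top of an untrusted control plane.
function runGates(
  run: RunCommand,
  commands: string[] = preflightGates,
): GateOutcome {
  const ran: string[] = [];
  for (const command of commands) {
    ran.push(command);
    if (run(command) !== 0) {
      return { ok: false, ran, failedAt: command };
    }
  }
  return { ok: true, ran };
}

// Example with a fake runner in which the retro gate fails.
const gateOutcome = runGates((command) => (command.endsWith("retro") ? 1 : 0));
console.log(gateOutcome.ok, gateOutcome.failedAt); // false "npx rotops retro"
```

Stopping at the first failing gate keeps a later gate from reporting a clean result on top of a stale install or an open retrospective entry.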
rotops validate checks that:
- artifact schemas are valid
- principles are consistent across artifacts
- reviewed inventory sources map to reviewed surfaces
- reviewed surfaces map to the control plane
- reviewed runtime files are closed by implemented behavior units
- owner tests exist and are referenced
- // runtime-behaviors: ... annotations do not drift
- fidelity does not regress below policy
- discovered runtime files are not missing from the reviewed denominator
rotops self-check:
- questions the reviewed model itself instead of only checking declared consistency
- catches implicit behavior mappings and overly broad owner-test claims
- flags weak proof assumptions for risky source kinds
- can require outcome/evidence closure for whole source kinds instead of trusting broad behavior wording
rotops retro:
- records escaped runtime misses so they cannot disappear after QA or incident response
- fails when open retrospective entries still exist
- warns when the same miss pattern keeps recurring and should become a stronger rule
rotops doctor:
- verifies that the installed package, lockfile, and local runtime artifacts actually match the reviewed setup
- catches stale installs before completeness results are trusted
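The retrospective behavior described above (fail on open entries, warn on recurring patterns) can be sketched as follows. The `RetroEntry` fields, the `checkRetrospective` helper, and the recurrence threshold are assumptions for illustration; the real runtime-retrospective.json format is not shown in this document.

```typescript
// Hypothetical entry shape; not the real runtime-retrospective.json schema.
interface RetroEntry {
  id: string;
  pattern: string;   // label for the kind of miss that escaped
  resolved: boolean; // true once the hardening action has landed
}

interface RetroResult {
  fail: boolean;         // true when any entry is still open
  openEntries: string[];
  warnings: string[];    // recurring patterns that may deserve a rule
}

function checkRetrospective(
  entries: RetroEntry[],
  recurrenceThreshold = 2, // assumed default, chosen for the example
): RetroResult {
  const openEntries = entries.filter((e) => !e.resolved).map((e) => e.id);

  // Count how often each miss pattern has been recorded.
  const counts = new Map<string, number>();
  for (const entry of entries) {
    counts.set(entry.pattern, (counts.get(entry.pattern) ?? 0) + 1);
  }

  const warnings: string[] = [];
  counts.forEach((count, pattern) => {
    if (count >= recurrenceThreshold) {
      warnings.push(`pattern "${pattern}" recorded ${count} times`);
    }
  });

  return { fail: openEntries.length > 0, openEntries, warnings };
}

const retroResult = checkRetrospective([
  { id: "miss-1", pattern: "background-job-not-in-inventory", resolved: true },
  { id: "miss-2", pattern: "weak-fidelity-on-external-contract", resolved: true },
  { id: "miss-3", pattern: "background-job-not-in-inventory", resolved: false },
]);

console.log(retroResult.fail);        // true
console.log(retroResult.openEntries); // ["miss-3"]
console.log(retroResult.warnings.length); // 1
```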
This package is designed for AI coding agents as much as for humans.
The expected loop is:
- detect changed runtime files
- run rotops doctor
- run rotops impact
- run rotops review when denominator drift may have changed
- compare discovered candidates to the reviewed model
- run rotops self-check so the reviewed model itself is challenged for hidden coarseness
- accept, suppress, or continue reviewing candidate drift through repo-local policy
- update reviewed behaviors and owner tests
- record escaped misses in runtime-retrospective.json and run rotops retro when a miss should harden the system
- export or refresh the machine-readable runtime agent contract when project paths or process changed
- rerun rotops validate
The control system exists so an agent cannot “solve” testing by only adding lines of test code. The agent has to maintain the runtime model too.
Use it when you want a repo to answer, concretely:
- what runtime behavior exists
- what part of it is managed
- what part is still unreviewed
- what proof exists
- what proof is too weak
- what changed files affect which implemented behavior units
Do not use it as a cosmetic wrapper around existing folder labels. If the runtime denominator is not reviewed, the system is being used incorrectly.
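The last question in that list, which changed files affect which behavior units, is essentially a lookup from files to units. The sketch below shows the idea under stated assumptions: `UnitFiles`, `ImpactResult`, and `analyzeImpact` are hypothetical names, and the real rotops impact command may resolve files and report results differently.

```typescript
// Hypothetical shape for illustration; not the real control-plane schema.
interface UnitFiles {
  id: string;
  files: string[];      // runtime files the behavior unit covers
  ownerTests: string[]; // tests that own the unit's proof
}

interface ImpactResult {
  units: string[];      // behavior units touched by the change
  ownerTests: string[]; // owner tests worth rerunning
  unmanaged: string[];  // changed files that no behavior unit covers
}

function analyzeImpact(changed: string[], units: UnitFiles[]): ImpactResult {
  const changedSet = new Set(changed);
  const managed = new Set<string>();
  const ownerTests = new Set<string>();
  const hitUnits: string[] = [];

  for (const unit of units) {
    for (const file of unit.files) managed.add(file);
    if (unit.files.some((file) => changedSet.has(file))) {
      hitUnits.push(unit.id);
      for (const ownerTest of unit.ownerTests) ownerTests.add(ownerTest);
    }
  }

  return {
    units: hitUnits,
    ownerTests: Array.from(ownerTests).sort(),
    unmanaged: changed.filter((file) => !managed.has(file)),
  };
}

const impact = analyzeImpact(
  ["src/api/login.ts", "src/jobs/cleanup.ts"],
  [
    { id: "unit.login", files: ["src/api/login.ts"], ownerTests: ["tests/api/login.test.ts"] },
    { id: "unit.sync", files: ["src/jobs/sync.ts"], ownerTests: ["tests/jobs/sync.test.ts"] },
  ],
);

console.log(impact.units);     // ["unit.login"]
console.log(impact.unmanaged); // ["src/jobs/cleanup.ts"]
```

The `unmanaged` list is the interesting part: a changed runtime file that maps to no behavior unit is a prompt for denominator review, not just a missing test.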
Different parts of the package mature at different speeds in different stacks.
- control plane validation is generally portable
- impact analysis is generally portable
- reviewed-model completeness checks are generally portable
- raw discovery quality depends on repo-local policy and codebase signals
That is why this package exposes policy files instead of hiding heuristics inside the binary.
A repo using this package is publishable when:
- new runtime entrypoints cannot land without denominator review
- discovered-vs-reviewed drift fails CI
- behavior units own tests and annotations
- proof strength is visible through fidelity policy
- humans and AI agents read the same artifacts first
An AI agent should not treat the control plane as documentation. It should treat it as the runtime source of truth for automated testing changes.
Start here:
- Why This Exists
- Principles
- Runtime Model
- Adoption Guide
- Repo-Local Policy
- Bootstrap Maturity
- Completeness Workflow
- AI Agent Integration
The package includes a concrete product example under examples/apppulse.
It also includes a smaller staged-adoption example for a non-JS runtime slice under examples/flutter-session.
The apppulse example matters because this package was not invented from a blank framework template. It was extracted from a real product that exposed the exact failure mode this system is meant to prevent.
MIT