runtime-obligation-testops

runtime-obligation-testops is a TestOps control system for teams that want automated testing managed against real runtime behavior, not just test labels or line coverage.

The package exists for one governing rule:

automated testing is managed against the full set of runtime behavior units

That rule is enforced through four properties:

event completeness
outcome closure
observability
traceability

Why this exists

This project started from a real failure mode in a real product.

The original product had:

a large automated test suite
very high code coverage
passing builds
clear unit / integration / component labels

and still had missing runtime layers.

The core problem was not that the product had too few tests. The problem was that the product had no durable way to answer:

what the real runtime denominator is
which runtime layers are actually managed
what observable outcomes are proven
which tests own that proof
whether the denominator has been silently narrowed by hand

In practice that caused exactly the kind of mistake this package is designed to prevent:

a repo can look “100% covered”
a declared control plane can look clean
and a whole runtime layer can still be absent from the managed denominator

That is why this package exists.

It is not a prettier coverage tool. It is a control system for the runtime denominator itself.

Runtime behavior completeness model

flowchart LR
  A[Runtime codebase] --> B[Discovery policy]
  B --> C[Discovered runtime candidates]
  C --> D[Reviewed runtime inventory]
  D --> E[Runtime surfaces]
  E --> F[Reviewed runtime behaviors]
  F --> G[Implemented behavior units and owner tests]
  G --> H[Validate / Review / Impact]
  H --> I[CI gate]

  J[Fidelity policy] --> F
  K[Quality policy] --> D
  K --> F

The package governs whether reviewed runtime behaviors are actually implemented as tests, not just whether a repo has many tests at the bottom of the stack.

Who decides what

This package is not designed around manual test bookkeeping. It is designed around a reviewed operating loop:

the scanner proposes candidates
repo-local policy shapes how those candidates should be interpreted
AI agents do most of the inventory, surface, behavior, annotation, and owner-test maintenance
reviewers approve semantic decisions when acceptance, suppression, fidelity, or granularity is non-obvious
CI enforces the resulting behavior-completeness gates

That distinction matters. If you describe this package as just a validator, people will use it too late in the workflow.

What problem it solves

Most repos can tell you:

how many tests exist
which folders contain tests
which runner they use
what line coverage says

Most repos cannot tell you:

which runtime events define the real denominator
which surfaces partition that denominator
which reviewed runtime behaviors make up the denominator
which behavior units implement those behaviors
what evidence proves each implemented behavior
which tests own that evidence
whether discovery and the reviewed model have drifted apart

This package makes those questions explicit and operational.

It also handles a second failure mode that appears after teams adopt a reviewed model:

the reviewed model exists
validation looks clean
but one inventory source or one behavior unit is so broad that real test gaps still hide inside it

runtime-quality-policy.json exists to make that reviewed-model smell explicit instead of letting it live behind a clean-looking control plane.

Why line coverage is not enough

flowchart TD
  A[Many tests] --> B[High code coverage]
  C[unit / integration / component labels] --> B
  B --> D[Looks healthy]
  D --> E[Missing runtime layer can still hide]

  F[runtime behavior completeness system] --> G[Discovered vs reviewed drift]
  F --> H[Reviewed behavior closure]
  F --> I[Evidence / owner-test traceability]
  F --> J[Granularity quality gates]
  G --> K[Managed runtime denominator]
  H --> K
  I --> K
  J --> K

What the package actually manages

The package manages eight connected artifacts:

runtime-discovery-policy.json
- scanner rules, ignore patterns, reviewed suppressions
runtime-inventory.json
- the reviewed runtime denominator
runtime-surfaces.json
- the project-specific management partition over that denominator
runtime-control-plane.json
- implemented behavior units, evidence, fidelity, owner tests
fidelity-policy.json
- the minimum proof strength required by surface, source, or behavior unit
runtime-quality-policy.json
- reviewed-model quality gates that flag overly broad sources or behavior units
runtime-self-check-policy.json
- rules that challenge the reviewed model itself for implicit mappings, diffuse proof ownership, and risky weak-fidelity assumptions
runtime-retrospective.json
- escaped runtime misses and the hardening actions that must feed them back into the reviewed model

The first artifact manages discovery. The next four artifacts are the reviewed runtime model. runtime-quality-policy.json gates the granularity of that model. runtime-self-check-policy.json questions whether the reviewed model is explicit enough to trust. The final artifact records escaped misses so they harden future review instead of being forgotten after QA or incidents.

runtime-obligation-testops keeps its historical name, but the preferred proof unit is now the runtime behavior unit. Legacy obligations remain supported for backward compatibility and are interpreted as behavior units by the validator.
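To make the chain from behavior unit back to runtime files concrete, here is a minimal TypeScript sketch. Every type and field name in it (InventorySource, RuntimeSurface, BehaviorUnit, filesBehindUnit) is a hypothetical illustration, not the package's published schema; consult the package's own schemas for the real shapes.

```typescript
// Hypothetical shapes for illustration only; not the package's real schema.
interface InventorySource {
  id: string;
  files: string[]; // runtime files accepted into the reviewed denominator
}

interface RuntimeSurface {
  id: string;
  sourceIds: string[]; // the slice of the denominator this surface manages
}

interface BehaviorUnit {
  id: string;
  surfaceId: string;
  evidence: string[];   // observable outcomes that prove the behavior
  fidelity: string;     // proof-strength label checked against fidelity policy
  ownerTests: string[]; // tests that own the proof
}

// Walk unit -> surface -> sources to list the runtime files a unit answers for.
function filesBehindUnit(
  unit: BehaviorUnit,
  surfaces: RuntimeSurface[],
  sources: InventorySource[],
): string[] {
  const surface = surfaces.filter((s) => s.id === unit.surfaceId)[0];
  if (!surface) return []; // broken trace: the unit points at no reviewed surface
  const files: string[] = [];
  for (const source of sources) {
    if (surface.sourceIds.indexOf(source.id) === -1) continue;
    for (const file of source.files) {
      if (files.indexOf(file) === -1) files.push(file);
    }
  }
  return files;
}
```

An empty result for a unit is the interesting case: it means the proof graph has a unit that no reviewed surface accounts for.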

Universal core, repo-local policy

This package is intentionally split into two layers:

a universal control core
repo-local operating policy

The universal core gives every project the same runtime-behavior model:

sources
surfaces
reviewed behaviors
implemented behavior units
outcomes
evidence
fidelity
owner tests
traceability

Repo-local policy tells the core how this specific codebase should be interpreted:

which files are even eligible for discovery
which generated or vendor paths must be ignored
which candidate matches are reviewed suppressions
which runtime categories need explicit source overrides
whether discovered-vs-reviewed drift should fail CI now or only warn

That split matters. It lets the package stay universal without pretending that every codebase emits the same scanner signals.
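As a rough illustration of how repo-local policy could shape raw scanner output, the sketch below filters candidates through ignore patterns and reviewed suppressions. The DiscoveryPolicy shape and the applyPolicy function are assumptions made for this example; the real runtime-discovery-policy.json format may differ.

```typescript
// Hypothetical policy shape; field names are assumptions for illustration.
interface Suppression {
  file: string;
  reason: string; // a reviewed suppression should say why it is safe
}

interface DiscoveryPolicy {
  ignore: RegExp[];            // generated or vendor paths
  suppressions: Suppression[]; // candidate matches a reviewer rejected
}

// Keep only the candidates that still need a review decision.
function applyPolicy(candidates: string[], policy: DiscoveryPolicy): string[] {
  return candidates.filter((file) => {
    const ignored = policy.ignore.some((pattern) => pattern.test(file));
    const suppressed = policy.suppressions.some((s) => s.file === file);
    return !ignored && !suppressed;
  });
}
```

Keeping the reason next to each suppression is a design choice in this sketch: it makes it harder to hide real runtime files just to get clean-looking output.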

Discovered vs reviewed

flowchart LR
  A[Scanner output] -->|candidate set| B{Review}
  B -->|accept| C[Reviewed inventory]
  B -->|suppress| D[Reviewed suppression]
  B -->|defer| E[Review backlog]
  C --> F[Surfaces]
  F --> G[Behavior units]
  G --> H[Owner tests]
  H --> I[Runtime proof]

The key design choice: discovered vs reviewed

The package keeps two layers of truth in tension:

discovered runtime candidates
reviewed runtime model

That distinction is the whole point.

If you manage only the reviewed model, teams can accidentally leave real runtime files out of scope. If you trust only discovery, you get noisy heuristics instead of an operable system.

rotops validate exists to stop those two layers from drifting apart silently.
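The drift idea can be sketched as a set difference plus a severity switch. This is a simplified model under stated assumptions, not how rotops validate is actually implemented; DriftSeverity, DriftResult, and checkDrift are hypothetical names.

```typescript
type DriftSeverity = "warn" | "error"; // hypothetical names for the two modes

interface DriftResult {
  unreviewed: string[]; // discovered candidates absent from the reviewed model
  exitCode: number;     // 1 fails CI, 0 lets the build continue
}

// Compare the two layers of truth and decide whether drift should fail the build.
function checkDrift(
  discovered: string[],
  reviewed: string[],
  severity: DriftSeverity,
): DriftResult {
  const unreviewed = discovered.filter((file) => reviewed.indexOf(file) === -1);
  const exitCode = unreviewed.length > 0 && severity === "error" ? 1 : 0;
  return { unreviewed, exitCode };
}
```

In "warn" mode the unreviewed list is still returned, so drift stays visible even while it is not yet a gate.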

Who does what in practice

| Actor | Primary job | What it should not do |
| --- | --- | --- |
| Discovery engine | Propose runtime candidates and drift | Declare truth by itself |
| Repo-local policy | Teach the scanner how this repo expresses runtime | Hide real runtime files just to get clean-looking output |
| AI agent | Perform most model, annotation, and owner-test updates | Stop at line coverage or raw test counts |
| Reviewer | Approve semantic decisions | Rebuild the whole model manually every time |
| CI | Enforce completeness gates | Replace semantic review |

What is universal and what is heuristic

The package is strongest at the control layer:

validate
impact
the reviewed runtime model
fidelity policy
quality policy
owner-test traceability

Discovery is intentionally a bootstrap engine, not an oracle.

That means:

the package can be used in repos that do not look like the example repo
teams can start with a hand-authored reviewed model
discovery can begin in advisory mode
repo-local policy can gradually tighten discovery quality over time

This is the intended operating model. The package is not promising that raw scanner output is universally correct on day one.

Where this fits in a real test strategy

This package is for governing automated verification below and around the top black-box layer.

It helps teams manage:

request boundaries
client state transitions
workflow orchestration
persistence semantics
background execution
external contracts
runtime invariants

It does not eliminate the need for:

real-dependency integration tests
full-system tests
browser or manual black-box checks

Those layers still matter. This package exists so the rest of the automated stack is not managed blindly.

What this is not

It is not:

a replacement for your test runner
a replacement for E2E or manual testing
an oracle that invents the correct runtime model without review
a promise that line coverage now means runtime completeness

It is a control system for keeping your runtime denominator, proof graph, and test ownership aligned.

Commands

Install the package in the target repo first:

npm install -D runtime-obligation-testops

Then run the CLI:

npx rotops init
npx rotops inventory scan
npx rotops surfaces derive
npx rotops doctor
npx rotops review
npx rotops self-check
npx rotops retro
npx rotops validate
npx rotops report
npx rotops impact --changed src/path/to/file.ts
npx rotops export agent-contract
npx rotops export vitest-workspace --out vitest.runtime.workspace.ts

If your repo wraps rotops behind project-local scripts, export the agent contract with those commands:

npx rotops export agent-contract \
  --review-command "npm run test:review" \
  --self-check-command "npm run test:self-check" \
  --retro-command "npm run test:retro" \
  --doctor-command "npm run test:doctor" \
  --impact-command "npm run test:impact -- --changed <path>" \
  --validate-command "npm run test:control"

If your repo uses non-default paths such as testing/ instead of testops/, keep the artifacts where they are and wrap the CLI with project-local scripts.

Reviewed-model quality

runtime-quality-policy.json is the package's guard against a clean-but-coarse reviewed model.

It can express rules such as:

maximum files per reviewed inventory source
maximum files per behavior unit
maximum reviewed inventory sources per behavior unit
maximum reviewed inventory behaviors per behavior unit

Fidelity policy governs proof strength. Quality policy governs proof granularity.
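A granularity gate of this kind might look like the sketch below, which covers two of the rules listed above. The QualityPolicy field names and the qualityFindings function are hypothetical; they illustrate the idea and do not reproduce the package's actual policy keys.

```typescript
// Hypothetical policy keys; the real runtime-quality-policy.json may name them differently.
interface QualityPolicy {
  maxFilesPerSource: number;
  maxSourcesPerBehaviorUnit: number;
}

interface ReviewedSource {
  id: string;
  files: string[];
}

interface ReviewedUnit {
  id: string;
  sourceIds: string[];
}

// Flag reviewed entries that are broad enough to hide real test gaps.
function qualityFindings(
  policy: QualityPolicy,
  sources: ReviewedSource[],
  units: ReviewedUnit[],
): string[] {
  const findings: string[] = [];
  for (const source of sources) {
    if (source.files.length > policy.maxFilesPerSource) {
      findings.push(`source ${source.id} spans ${source.files.length} files (max ${policy.maxFilesPerSource})`);
    }
  }
  for (const unit of units) {
    if (unit.sourceIds.length > policy.maxSourcesPerBehaviorUnit) {
      findings.push(`unit ${unit.id} spans ${unit.sourceIds.length} sources (max ${policy.maxSourcesPerBehaviorUnit})`);
    }
  }
  return findings;
}
```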

runtime-self-check-policy.json governs whether the reviewed model is still explicit enough to trust.

It can express rules such as:

whether inventory behaviors must be explicit instead of synthesized
whether behavior units must explicitly name reviewed inventory behaviors
maximum runtime behaviors one owner test may claim
maximum owner tests one behavior unit may diffuse across
minimum fidelity for risky source kinds
required outcome classes for risky source kinds
required evidence for risky source kinds
owner-test proof patterns for higher-fidelity claims
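One of these rules, the cap on how many runtime behaviors a single owner test may claim, can be sketched as follows. UnitClaim, behaviorsClaimedBy, and overClaimingTests are hypothetical names chosen for this example, and the data shape is an assumption rather than the package's schema.

```typescript
// Hypothetical view of a behavior unit: which behaviors it covers and which tests own them.
interface UnitClaim {
  id: string;
  behaviors: string[];
  ownerTests: string[];
}

// Collect the distinct behaviors that one owner test is credited with across all units.
function behaviorsClaimedBy(test: string, units: UnitClaim[]): string[] {
  const behaviors: string[] = [];
  for (const unit of units) {
    if (unit.ownerTests.indexOf(test) === -1) continue;
    for (const behavior of unit.behaviors) {
      if (behaviors.indexOf(behavior) === -1) behaviors.push(behavior);
    }
  }
  return behaviors;
}

// Return owner tests whose claimed behavior count exceeds the policy limit.
function overClaimingTests(units: UnitClaim[], maxBehaviorsPerOwnerTest: number): string[] {
  const tests: string[] = [];
  for (const unit of units) {
    for (const test of unit.ownerTests) {
      if (tests.indexOf(test) === -1) tests.push(test);
    }
  }
  return tests.filter((test) => behaviorsClaimedBy(test, units).length > maxBehaviorsPerOwnerTest);
}
```

A test that appears here is not necessarily wrong, but it is claiming enough proof that a reviewer should confirm it really asserts each behavior.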

Recommended bootstrap strategy

Do not assume every repo should start in strict discovery mode.

The safe sequence is:

start with a reviewed model for one important runtime slice
keep discovery scoped to the slice you actually intend to manage first
use rotops validate and rotops impact as the first CI gate
add rotops doctor so stale installs or missing control artifacts do not fake a clean result
use rotops review to keep discovered candidates visible while repo-local policy is still being shaped
add rotops self-check once the denominator is explicit enough to question its own granularity
add rotops retro once escaped runtime misses should harden the model automatically
move discovery drift to error once the scanner is trustworthy for that repo

That rollout works better than pretending heuristics are already perfect.

Agent operating loop

sequenceDiagram
  participant AI as AI agent
  participant Repo as Repo-local policy
  participant R as rotops
  participant CI as CI gate

  AI->>R: impact --changed <files>
  AI->>R: inventory scan
  R-->>AI: discovered candidates
  AI->>Repo: accept / suppress / defer
  AI->>Repo: update inventory / surfaces / behaviors / owner tests
  AI->>R: review
  AI->>R: self-check
  AI->>R: retro
  AI->>R: validate
  AI->>R: runtime tests
  AI->>R: code coverage (secondary)
  R-->>CI: completeness result
  CI-->>AI: pass or fail

Recommended operating loop

Run inventory scan to discover candidate runtime sources.
Run review to turn scanner output into an explicit review backlog.
Record suppressions, scope decisions, and scanner noise in runtime-discovery-policy.json.
Accept the reviewed denominator in runtime-inventory.json.
Derive or refine runtime surfaces in runtime-surfaces.json.
Register reviewed behaviors, implemented behavior units, evidence, fidelity, and owner tests in runtime-control-plane.json.
Add self-check and retrospective policy once the reviewed model is stable enough to challenge itself.
Export the machine-readable agent contract for local tooling or CI.
Run doctor, self-check, retro, and validate before the main test suite.

What `validate` checks

artifact schemas are valid
principles are consistent across artifacts
reviewed inventory sources map to reviewed surfaces
reviewed surfaces map to the control plane
reviewed runtime files are closed by implemented behavior units
owner tests exist and are referenced
`// runtime-behaviors: ...` annotations do not drift
fidelity does not regress below policy
discovered runtime files are not missing from the reviewed denominator
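Three of these checks (file closure, owner-test existence, and the fidelity floor) can be sketched in a few lines. This is an illustrative model only: ControlUnit, ValidateInput, and validateClosure are hypothetical, and the fidelity level names are supplied by the caller because the real levels are not specified here.

```typescript
// Hypothetical shapes; the real control-plane schema may differ.
interface ControlUnit {
  id: string;
  files: string[];      // reviewed runtime files this unit closes
  ownerTests: string[];
  fidelity: string;
}

interface ValidateInput {
  reviewedFiles: string[];
  units: ControlUnit[];
  existingTests: string[];  // test files actually present in the repo
  fidelityOrder: string[];  // weakest to strongest; level names are assumptions
  minimumFidelity: string;
}

function validateClosure(input: ValidateInput): string[] {
  const errors: string[] = [];
  // Closure: every reviewed runtime file must be claimed by at least one unit.
  for (const file of input.reviewedFiles) {
    const closed = input.units.some((u) => u.files.indexOf(file) !== -1);
    if (!closed) errors.push(`unclosed runtime file: ${file}`);
  }
  const floor = input.fidelityOrder.indexOf(input.minimumFidelity);
  for (const unit of input.units) {
    // Traceability: each unit needs owner tests that really exist.
    if (unit.ownerTests.length === 0) errors.push(`unit ${unit.id} has no owner test`);
    for (const test of unit.ownerTests) {
      if (input.existingTests.indexOf(test) === -1) errors.push(`missing owner test: ${test}`);
    }
    // Fidelity: an unknown label ranks as -1 and is treated as below the floor.
    if (input.fidelityOrder.indexOf(unit.fidelity) < floor) {
      errors.push(`unit ${unit.id} fidelity ${unit.fidelity} is below ${input.minimumFidelity}`);
    }
  }
  return errors;
}
```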

What `self-check` and `retro` add

rotops self-check
- questions the reviewed model itself instead of only checking declared consistency
- catches implicit behavior mappings and overly broad owner-test claims
- flags weak proof assumptions for risky source kinds
- can require outcome/evidence closure for whole source kinds instead of trusting broad behavior wording
rotops retro
- records escaped runtime misses so they cannot disappear after QA or incident response
- fails when open retrospective entries still exist
- warns when the same miss pattern keeps recurring and should become a stronger rule
rotops doctor
- verifies that the installed package, lockfile, and local runtime artifacts actually match the reviewed setup
- catches stale installs before completeness results are trusted
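The retro behavior described above (fail on open entries, warn on recurring miss patterns) can be sketched as below. RetroEntry, RetroResult, evaluateRetro, and the recurrence threshold are all assumptions for illustration; the actual runtime-retrospective.json format is not shown here.

```typescript
// Hypothetical retrospective entry; field names are assumptions.
interface RetroEntry {
  id: string;
  pattern: string;             // label for the kind of miss
  status: "open" | "hardened"; // hardened means the model was updated
}

interface RetroResult {
  failures: string[];
  warnings: string[];
}

function evaluateRetro(entries: RetroEntry[], recurrenceThreshold: number): RetroResult {
  // Any open entry is a failure: the escaped miss has not fed back into the model yet.
  const failures = entries
    .filter((entry) => entry.status === "open")
    .map((entry) => `open retrospective entry: ${entry.id}`);

  // A pattern seen at least `recurrenceThreshold` times is a candidate for a stronger rule.
  const patterns: string[] = [];
  for (const entry of entries) {
    if (patterns.indexOf(entry.pattern) === -1) patterns.push(entry.pattern);
  }
  const warnings = patterns
    .filter((p) => entries.filter((entry) => entry.pattern === p).length >= recurrenceThreshold)
    .map((p) => `recurring miss pattern: ${p}`);

  return { failures, warnings };
}
```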

AI operating model

This package is designed for AI coding agents as much as for humans.

The expected loop is:

detect changed runtime files
run rotops doctor
run rotops impact
run rotops review when denominator drift may have changed
compare discovered candidates to the reviewed model
run rotops self-check so the reviewed model itself is challenged for hidden coarseness
accept, suppress, or continue reviewing candidate drift through repo-local policy
update reviewed behaviors and owner tests
record escaped misses in runtime-retrospective.json and run rotops retro when a miss should harden the system
export or refresh the machine-readable runtime agent contract when project paths or process changed
rerun rotops validate

The control system exists so an agent cannot “solve” testing by only adding lines of test code. The agent has to maintain the runtime model too.

How to use this in practice

Use it when you want a repo to answer, concretely:

what runtime behavior exists
what part of it is managed
what part is still unreviewed
what proof exists
what proof is too weak
what changed files affect which implemented behavior units

Do not use it as a cosmetic wrapper around existing folder labels. If the runtime denominator is not reviewed, the system is being used incorrectly.
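The last question in that list, which changed files affect which behavior units, is essentially a lookup over the reviewed model. The sketch below shows one way to model it; ImpactUnit, ImpactResult, and impactOf are hypothetical names, and this is not a description of how rotops impact works internally.

```typescript
// Hypothetical shapes for illustration.
interface ImpactUnit {
  id: string;
  files: string[];
  ownerTests: string[];
}

interface ImpactResult {
  unitIds: string[];
  ownerTests: string[];
  unmanagedFiles: string[]; // changed files that no behavior unit claims
}

// Map changed files to the behavior units and owner tests that answer for them.
function impactOf(changed: string[], units: ImpactUnit[]): ImpactResult {
  const result: ImpactResult = { unitIds: [], ownerTests: [], unmanagedFiles: [] };
  for (const file of changed) {
    const hits = units.filter((u) => u.files.indexOf(file) !== -1);
    if (hits.length === 0) result.unmanagedFiles.push(file);
    for (const unit of hits) {
      if (result.unitIds.indexOf(unit.id) === -1) result.unitIds.push(unit.id);
      for (const test of unit.ownerTests) {
        if (result.ownerTests.indexOf(test) === -1) result.ownerTests.push(test);
      }
    }
  }
  return result;
}
```

A non-empty unmanagedFiles list is the signal that the change may have touched runtime code outside the managed denominator.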

Bootstrap maturity

Different parts of the package mature at different speeds in different stacks.

control plane validation is generally portable
impact analysis is generally portable
reviewed-model completeness checks are generally portable
raw discovery quality depends on repo-local policy and codebase signals

That is why this package exposes policy files instead of hiding heuristics inside the binary.

Public repo readiness

A repo using this package is publishable when:

new runtime entrypoints cannot land without denominator review
discovered-vs-reviewed drift fails CI
behavior units own tests and annotations
proof strength is visible through fidelity policy
humans and AI agents read the same artifacts first

AI agents

An AI agent should not treat the control plane as documentation. It should treat it as the runtime source of truth for automated testing changes.

Start here:

Example

The package includes a concrete product example under examples/apppulse.

It also includes a smaller staged-adoption example for a non-JS runtime slice under examples/flutter-session.

The apppulse example matters because this package was not invented from a blank framework template. It was extracted from a real product that exposed the exact failure mode this system is meant to prevent.

License

MIT
