
philo-kim/runtime-obligation-testops

runtime-obligation-testops

runtime-obligation-testops is a TestOps control system for teams that want automated testing managed against real runtime behavior, not just test labels or line coverage.

The package exists for one governing rule:

automated testing is managed against the full set of runtime behavior units

That rule is enforced through four properties:

  • event completeness
  • outcome closure
  • observability
  • traceability

Why this exists

This project started from a real failure mode in a real product.

The original product had:

  • a large automated test suite
  • very high code coverage
  • passing builds
  • clear unit / integration / component labels

and still had missing runtime layers.

The core problem was not that the product had too few tests. The problem was that the product had no durable way to answer:

  • what the real runtime denominator is
  • which runtime layers are actually managed
  • what observable outcomes are proven
  • which tests own that proof
  • whether the denominator has been silently narrowed by hand

In practice that caused exactly the kind of mistake this package is designed to prevent:

  • a repo can look “100% covered”
  • a declared control plane can look clean
  • and a whole runtime layer can still be absent from the managed denominator

That is why this package exists.

It is not a prettier coverage tool. It is a control system for the runtime denominator itself.

Runtime behavior completeness model

flowchart LR
  A[Runtime codebase] --> B[Discovery policy]
  B --> C[Discovered runtime candidates]
  C --> D[Reviewed runtime inventory]
  D --> E[Runtime surfaces]
  E --> F[Reviewed runtime behaviors]
  F --> G[Implemented behavior units and owner tests]
  G --> H[Validate / Review / Impact]
  H --> I[CI gate]

  J[Fidelity policy] --> F
  K[Quality policy] --> D
  K --> F

The package governs whether reviewed runtime behaviors are actually implemented as tests, not just whether a repo has many tests at the bottom of the stack.

Who decides what

This package is not designed around manual test bookkeeping. It is designed around a reviewed operating loop:

  • the scanner proposes candidates
  • repo-local policy shapes how those candidates should be interpreted
  • AI agents do most of the inventory, surface, behavior, annotation, and owner-test maintenance
  • reviewers approve semantic decisions when acceptance, suppression, fidelity, or granularity is non-obvious
  • CI enforces the resulting behavior-completeness gates

That distinction matters. If you describe this package as just a validator, people will use it too late in the workflow.

What problem it solves

Most repos can tell you:

  • how many tests exist
  • which folders contain tests
  • which runner they use
  • what line coverage says

Most repos cannot tell you:

  • which runtime events define the real denominator
  • which surfaces partition that denominator
  • which reviewed runtime behaviors make up the denominator
  • which behavior units implement those behaviors
  • what evidence proves each implemented behavior
  • which tests own that evidence
  • whether discovery and the reviewed model have drifted apart

This package makes those questions explicit and operational.

It also handles a second failure mode that appears after teams adopt a reviewed model:

  • the reviewed model exists
  • validation looks clean
  • but one inventory source or one behavior unit is so broad that real test gaps still hide inside it

runtime-quality-policy.json exists to make that reviewed-model smell explicit instead of letting it live behind a clean-looking control plane.

Why line coverage is not enough

flowchart TD
  A[Many tests] --> B[High code coverage]
  C[unit / integration / component labels] --> B
  B --> D[Looks healthy]
  D --> E[Missing runtime layer can still hide]

  F[runtime behavior completeness system] --> G[Discovered vs reviewed drift]
  F --> H[Reviewed behavior closure]
  F --> I[Evidence / owner-test traceability]
  F --> J[Granularity quality gates]
  G --> K[Managed runtime denominator]
  H --> K
  I --> K
  J --> K

What the package actually manages

The package manages eight connected artifacts:

  • runtime-discovery-policy.json
    • scanner rules, ignore patterns, reviewed suppressions
  • runtime-inventory.json
    • the reviewed runtime denominator
  • runtime-surfaces.json
    • the project-specific management partition over that denominator
  • runtime-control-plane.json
    • implemented behavior units, evidence, fidelity, owner tests
  • fidelity-policy.json
    • the minimum proof strength required by surface, source, or behavior unit
  • runtime-quality-policy.json
    • reviewed-model quality gates that flag overly broad sources or behavior units
  • runtime-self-check-policy.json
    • rules that challenge the reviewed model itself for implicit mappings, diffuse proof ownership, and risky weak-fidelity assumptions
  • runtime-retrospective.json
    • escaped runtime misses and the hardening actions that must feed them back into the reviewed model

The first artifact manages discovery. The next four artifacts are the reviewed runtime model. The quality policy gates the granularity of that model. The self-check policy questions whether the reviewed model is explicit enough to trust. The final artifact records escaped misses so they harden future review instead of being forgotten after QA or incidents.
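As a rough illustration of the artifact set, the sketch below lists the eight file names from the list above and reports which ones are absent from a directory listing. The `missingArtifacts` helper and the group labels are hypothetical and exist only for this example; this is not how `rotops doctor` is implemented.

```typescript
// File names come from the artifact list above.
// The group labels are this sketch's own shorthand, not package terminology.
const ARTIFACTS = {
  "runtime-discovery-policy.json": "discovery",
  "runtime-inventory.json": "reviewed-model",
  "runtime-surfaces.json": "reviewed-model",
  "runtime-control-plane.json": "reviewed-model",
  "fidelity-policy.json": "reviewed-model",
  "runtime-quality-policy.json": "quality",
  "runtime-self-check-policy.json": "self-check",
  "runtime-retrospective.json": "retrospective",
} as const;

type ArtifactName = keyof typeof ARTIFACTS;

// Hypothetical helper: given the file names present in the artifact
// directory, return the artifacts that are still missing.
function missingArtifacts(present: readonly string[]): ArtifactName[] {
  const seen = new Set(present);
  return (Object.keys(ARTIFACTS) as ArtifactName[]).filter(
    (name) => !seen.has(name),
  );
}

// A repo that has only hand-authored the inventory and surfaces so far:
console.log(missingArtifacts(["runtime-inventory.json", "runtime-surfaces.json"]));
```

A staged adoption could treat the missing list as a to-do list rather than a failure until the reviewed model is in place.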

runtime-obligation-testops keeps its historical name, but the preferred proof unit is now the runtime behavior unit. Legacy obligations remain supported for backward compatibility and are interpreted as behavior units by the validator.

Universal core, repo-local policy

This package is intentionally split into two layers:

  • a universal control core
  • repo-local operating policy

The universal core gives every project the same runtime-behavior model:

  • sources
  • surfaces
  • reviewed behaviors
  • implemented behavior units
  • outcomes
  • evidence
  • fidelity
  • owner tests
  • traceability

Repo-local policy tells the core how this specific codebase should be interpreted:

  • which files are even eligible for discovery
  • which generated or vendor paths must be ignored
  • which candidate matches are reviewed suppressions
  • which runtime categories need explicit source overrides
  • whether discovered-vs-reviewed drift should fail CI now or only warn

That split matters. It lets the package stay universal without pretending that every codebase emits the same scanner signals.
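To make the repo-local layer concrete, here is a minimal sketch of how such a policy could shape discovery. The `DiscoveryPolicy` shape and its field names are assumptions made for illustration; the real `runtime-discovery-policy.json` schema may look different.

```typescript
// Hypothetical policy shape; not the package's actual schema.
interface DiscoveryPolicy {
  includePrefixes: string[]; // which paths are eligible for discovery
  ignorePrefixes: string[]; // generated or vendor paths to skip
  driftMode: "warn" | "error"; // whether drift fails CI now or only warns
}

// Keep only files the policy considers eligible runtime candidates.
function eligibleCandidates(files: string[], policy: DiscoveryPolicy): string[] {
  return files.filter(
    (file) =>
      policy.includePrefixes.some((prefix) => file.startsWith(prefix)) &&
      !policy.ignorePrefixes.some((prefix) => file.startsWith(prefix)),
  );
}

// Exit code a CI wrapper might use: drift only fails the build in "error" mode.
function driftExitCode(driftCount: number, policy: DiscoveryPolicy): number {
  return driftCount > 0 && policy.driftMode === "error" ? 1 : 0;
}

const examplePolicy: DiscoveryPolicy = {
  includePrefixes: ["src/"],
  ignorePrefixes: ["src/generated/"],
  driftMode: "warn",
};

console.log(
  eligibleCandidates(
    ["src/api/login.ts", "src/generated/client.ts", "docs/guide.md"],
    examplePolicy,
  ),
); // ["src/api/login.ts"]
console.log(driftExitCode(3, examplePolicy)); // 0, because the mode is "warn"
```

Prefix matching is used here only to keep the sketch dependency-free; a real policy would more likely use glob patterns.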

Discovered vs reviewed

flowchart LR
  A[Scanner output] -->|candidate set| B{Review}
  B -->|accept| C[Reviewed inventory]
  B -->|suppress| D[Reviewed suppression]
  B -->|defer| E[Review backlog]
  C --> F[Surfaces]
  F --> G[Behavior units]
  G --> H[Owner tests]
  H --> I[Runtime proof]

The key design choice: discovered vs reviewed

The package keeps two layers of truth in tension:

  • discovered runtime candidates
  • reviewed runtime model

That distinction is the whole point.

If you manage only the reviewed model, teams can accidentally leave real runtime files out of scope. If you trust only discovery, you get noisy heuristics instead of an operable system.

rotops validate exists to stop those two layers from drifting apart silently.
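The drift idea can be sketched as a set comparison between the two layers. The types and the `computeDrift` function below are hypothetical, and the `stale` direction (reviewed entries that discovery no longer finds) is an assumption of this sketch rather than documented `rotops validate` behavior.

```typescript
// Hypothetical view of the reviewed layer: accepted files plus reviewed suppressions.
interface ReviewState {
  inventory: string[];
  suppressions: string[];
}

interface Drift {
  unreviewed: string[]; // discovered, but neither accepted nor suppressed
  stale: string[]; // accepted, but no longer discovered (assumed check)
}

function computeDrift(discovered: string[], reviewed: ReviewState): Drift {
  const known = new Set([...reviewed.inventory, ...reviewed.suppressions]);
  const found = new Set(discovered);
  return {
    unreviewed: discovered.filter((file) => !known.has(file)),
    stale: reviewed.inventory.filter((file) => !found.has(file)),
  };
}

const exampleDrift = computeDrift(
  ["src/api/login.ts", "src/jobs/sync.ts", "src/dev/seed.ts"],
  { inventory: ["src/api/login.ts", "src/api/removed.ts"], suppressions: ["src/dev/seed.ts"] },
);
console.log(exampleDrift.unreviewed); // ["src/jobs/sync.ts"]
console.log(exampleDrift.stale); // ["src/api/removed.ts"]
```

The point of keeping suppressions in the known set is that a suppressed file is a reviewed decision, not drift.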

Who does what in practice

| Actor | Primary job | What it should not do |
| --- | --- | --- |
| Discovery engine | Propose runtime candidates and drift | Declare truth by itself |
| Repo-local policy | Teach the scanner how this repo expresses runtime | Hide real runtime files just to get clean-looking output |
| AI agent | Perform most model, annotation, and owner-test updates | Stop at line coverage or raw test counts |
| Reviewer | Approve semantic decisions | Rebuild the whole model manually every time |
| CI | Enforce completeness gates | Replace semantic review |

What is universal and what is heuristic

The package is strongest at the control layer:

  • validate
  • impact
  • the reviewed runtime model
  • fidelity policy
  • quality policy
  • owner-test traceability

Discovery is intentionally a bootstrap engine, not an oracle.

That means:

  • the package can be used in repos that do not look like the example repo
  • teams can start with a hand-authored reviewed model
  • discovery can begin in advisory mode
  • repo-local policy can gradually tighten discovery quality over time

This is the intended operating model. The package is not promising that raw scanner output is universally correct on day one.

Where this fits in a real test strategy

This package is for governing automated verification below and around the top black-box layer.

It helps teams manage:

  • request boundaries
  • client state transitions
  • workflow orchestration
  • persistence semantics
  • background execution
  • external contracts
  • runtime invariants

It does not eliminate the need for:

  • real-dependency integration tests
  • full-system tests
  • browser or manual black-box checks

Those layers still matter. This package exists so the rest of the automated stack is not managed blindly.

What this is not

It is not:

  • a replacement for your test runner
  • a replacement for E2E or manual testing
  • an oracle that invents the correct runtime model without review
  • a promise that line coverage now means runtime completeness

It is a control system for keeping your runtime denominator, proof graph, and test ownership aligned.

Commands

Install the package in the target repo first:

npm install -D runtime-obligation-testops

Then run the CLI:

npx rotops init
npx rotops inventory scan
npx rotops surfaces derive
npx rotops doctor
npx rotops review
npx rotops self-check
npx rotops retro
npx rotops validate
npx rotops report
npx rotops impact --changed src/path/to/file.ts
npx rotops export agent-contract
npx rotops export vitest-workspace --out vitest.runtime.workspace.ts

If your repo wraps rotops behind project-local scripts, export the agent contract with those commands:

npx rotops export agent-contract \
  --review-command "npm run test:review" \
  --self-check-command "npm run test:self-check" \
  --retro-command "npm run test:retro" \
  --doctor-command "npm run test:doctor" \
  --impact-command "npm run test:impact -- --changed <path>" \
  --validate-command "npm run test:control"

If your repo uses non-default paths such as testing/ instead of testops/, keep the artifacts where they are and wrap the CLI with project-local scripts.

Reviewed-model quality

runtime-quality-policy.json is the package's guard against a clean-but-coarse reviewed model.

It can express rules such as:

  • maximum files per reviewed inventory source
  • maximum files per behavior unit
  • maximum reviewed inventory sources per behavior unit
  • maximum reviewed inventory behaviors per behavior unit

Fidelity policy governs proof strength. Quality policy governs proof granularity.
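A granularity gate of this kind might look like the sketch below. The `BehaviorUnit` and `QualityPolicy` shapes and their field names are assumptions for illustration only; the actual `runtime-quality-policy.json` keys are not shown in this README.

```typescript
// Hypothetical shapes; field names are illustrative, not the package schema.
interface BehaviorUnit {
  id: string;
  files: string[];
  inventorySources: string[];
}

interface QualityPolicy {
  maxFilesPerBehaviorUnit: number;
  maxSourcesPerBehaviorUnit: number;
}

// Report behavior units that are broad enough to hide test gaps.
function broadUnits(units: BehaviorUnit[], policy: QualityPolicy): string[] {
  const findings: string[] = [];
  for (const unit of units) {
    if (unit.files.length > policy.maxFilesPerBehaviorUnit) {
      findings.push(
        `${unit.id}: ${unit.files.length} files exceeds limit ${policy.maxFilesPerBehaviorUnit}`,
      );
    }
    if (unit.inventorySources.length > policy.maxSourcesPerBehaviorUnit) {
      findings.push(
        `${unit.id}: ${unit.inventorySources.length} sources exceeds limit ${policy.maxSourcesPerBehaviorUnit}`,
      );
    }
  }
  return findings;
}

console.log(
  broadUnits(
    [{ id: "checkout", files: ["a.ts", "b.ts", "c.ts"], inventorySources: ["checkout-api"] }],
    { maxFilesPerBehaviorUnit: 2, maxSourcesPerBehaviorUnit: 1 },
  ),
); // ["checkout: 3 files exceeds limit 2"]
```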

runtime-self-check-policy.json governs whether the reviewed model is still explicit enough to trust.

It can express rules such as:

  • whether inventory behaviors must be explicit instead of synthesized
  • whether behavior units must explicitly name reviewed inventory behaviors
  • maximum runtime behaviors one owner test may claim
  • maximum owner tests one behavior unit may diffuse across
  • minimum fidelity for risky source kinds
  • required outcome classes for risky source kinds
  • required evidence for risky source kinds
  • owner-test proof patterns for higher-fidelity claims
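One of these rules, minimum fidelity for risky source kinds, can be sketched as an ordered comparison. The fidelity level names (`low`, `medium`, `high`), the source kind strings, and the `belowMinimum` function are all placeholders invented for this example; the package's real fidelity vocabulary is not listed here.

```typescript
// Placeholder fidelity scale, ordered from weakest to strongest proof.
const FIDELITY_ORDER = ["low", "medium", "high"] as const;
type Fidelity = (typeof FIDELITY_ORDER)[number];

// Hypothetical claim: a behavior unit, the kind of source it covers,
// and the proof strength it declares.
interface UnitClaim {
  id: string;
  sourceKind: string;
  fidelity: Fidelity;
}

// Return the ids of units whose declared fidelity is below the minimum
// required for their source kind. Kinds without a rule are not checked.
function belowMinimum(
  units: UnitClaim[],
  minimumByKind: Record<string, Fidelity>,
): string[] {
  const rank = (level: Fidelity): number => FIDELITY_ORDER.indexOf(level);
  return units
    .filter((unit) => {
      const minimum = minimumByKind[unit.sourceKind];
      return minimum !== undefined && rank(unit.fidelity) < rank(minimum);
    })
    .map((unit) => unit.id);
}

console.log(
  belowMinimum(
    [
      { id: "payment-webhook", sourceKind: "external-contract", fidelity: "low" },
      { id: "theme-toggle", sourceKind: "client-state", fidelity: "low" },
    ],
    { "external-contract": "high" },
  ),
); // ["payment-webhook"]
```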

Recommended bootstrap strategy

Do not assume every repo should start in strict discovery mode.

The safe sequence is:

  1. start with a reviewed model for one important runtime slice
  2. keep discovery scoped to the slice you actually intend to manage first
  3. use rotops validate and rotops impact as the first CI gate
  4. add rotops doctor so stale installs or missing control artifacts do not fake a clean result
  5. use rotops review to keep discovered candidates visible while repo-local policy is still being shaped
  6. add rotops self-check once the denominator is explicit enough to question its own granularity
  7. add rotops retro once escaped runtime misses should harden the model automatically
  8. move discovery drift to error once the scanner is trustworthy for that repo

That rollout works better than pretending heuristics are already perfect.

Agent operating loop

sequenceDiagram
  participant AI as AI agent
  participant Repo as Repo-local policy
  participant R as rotops
  participant CI as CI gate

  AI->>R: impact --changed <files>
  AI->>R: inventory scan
  R-->>AI: discovered candidates
  AI->>Repo: accept / suppress / defer
  AI->>Repo: update inventory / surfaces / behaviors / owner tests
  AI->>R: review
  AI->>R: self-check
  AI->>R: retro
  AI->>R: validate
  AI->>R: runtime tests
  AI->>R: code coverage (secondary)
  R-->>CI: completeness result
  CI-->>AI: pass or fail

Recommended operating loop

  1. Run inventory scan to discover candidate runtime sources.
  2. Run review to turn scanner output into an explicit review backlog.
  3. Record suppressions, scope decisions, and scanner noise in runtime-discovery-policy.json.
  4. Accept the reviewed denominator in runtime-inventory.json.
  5. Derive or refine runtime surfaces in runtime-surfaces.json.
  6. Register reviewed behaviors, implemented behavior units, evidence, fidelity, and owner tests in runtime-control-plane.json.
  7. Add self-check and retrospective policy once the reviewed model is stable enough to challenge itself.
  8. Export the machine-readable agent contract for local tooling or CI.
  9. Run doctor, self-check, retro, and validate before the main test suite.
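Step 9 can be sketched as a small gate runner that stops at the first failing command. The four subcommands are the ones named in that step; the `runGates` function and the injected `Runner` type are hypothetical and only keep the sketch testable without invoking the real CLI.

```typescript
// Gate order from step 9 above.
const GATES = ["doctor", "self-check", "retro", "validate"] as const;

// A runner executes one shell command and returns its exit code.
type Runner = (command: string) => number;

interface GateResult {
  passed: boolean;
  failedAt?: string; // the first command that returned non-zero
}

// Run each gate in order and stop at the first failure, so the main
// test suite never starts on top of an untrusted control plane.
function runGates(run: Runner): GateResult {
  for (const gate of GATES) {
    const command = `npx rotops ${gate}`;
    if (run(command) !== 0) {
      return { passed: false, failedAt: command };
    }
  }
  return { passed: true };
}

// Stub runner for illustration: pretend that `retro` fails.
const stubResult = runGates((command) => (command.endsWith("retro") ? 1 : 0));
console.log(stubResult); // { passed: false, failedAt: "npx rotops retro" }
```

In a real script the runner could wrap Node's `child_process.spawnSync` and return its `status`; that wiring is left out here to keep the example free of side effects.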

What validate checks

  • artifact schemas are valid
  • principles are consistent across artifacts
  • reviewed inventory sources map to reviewed surfaces
  • reviewed surfaces map to the control plane
  • reviewed runtime files are closed by implemented behavior units
  • owner tests exist and are referenced
  • // runtime-behaviors: ... annotations do not drift
  • fidelity does not regress below policy
  • discovered runtime files are not missing from the reviewed denominator
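The annotation check can be illustrated with a small parser. This README shows the annotation only as `// runtime-behaviors: ...`, so the comma-separated list of behavior ids assumed below is a guess at the elided part, and both functions are hypothetical.

```typescript
// ASSUMPTION: the text after "runtime-behaviors:" is a comma-separated
// list of behavior ids. The real annotation grammar may differ.
const ANNOTATION = /^\s*\/\/\s*runtime-behaviors:\s*(.+)$/m;

// Extract the behavior ids named by the first annotation in a test file.
function annotatedBehaviors(testSource: string): string[] {
  const match = ANNOTATION.exec(testSource);
  if (!match) return [];
  return (match[1] ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
}

// Compare what a test file claims against what the control plane declares for it.
function annotationDrift(testSource: string, declared: string[]) {
  const annotated = annotatedBehaviors(testSource);
  return {
    undeclared: annotated.filter((id) => declared.indexOf(id) === -1),
    unannotated: declared.filter((id) => annotated.indexOf(id) === -1),
  };
}

const exampleTestFile = [
  "// runtime-behaviors: login.success, login.locked",
  'import { test } from "vitest";',
].join("\n");

console.log(annotationDrift(exampleTestFile, ["login.success", "login.expired"]));
// { undeclared: ["login.locked"], unannotated: ["login.expired"] }
```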

What self-check and retro add

  • rotops self-check
    • questions the reviewed model itself instead of only checking declared consistency
    • catches implicit behavior mappings and overly broad owner-test claims
    • flags weak proof assumptions for risky source kinds
    • can require outcome/evidence closure for whole source kinds instead of trusting broad behavior wording
  • rotops retro
    • records escaped runtime misses so they cannot disappear after QA or incident response
    • fails when open retrospective entries still exist
    • warns when the same miss pattern keeps recurring and should become a stronger rule
  • rotops doctor
    • verifies that the installed package, lockfile, and local runtime artifacts actually match the reviewed setup
    • catches stale installs before completeness results are trusted
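The two `rotops retro` outcomes described above (fail on open entries, warn on recurring patterns) can be sketched as follows. The `RetroEntry` shape, its status values, and the recurrence threshold of two are assumptions; the real `runtime-retrospective.json` schema is not documented in this README.

```typescript
// Hypothetical retrospective entry; not the package's actual schema.
interface RetroEntry {
  id: string;
  pattern: string; // a label for the kind of miss
  status: "open" | "hardened";
}

interface RetroOutcome {
  fail: string[]; // ids of entries that are still open
  warn: string[]; // patterns that have occurred more than once
}

function retroCheck(entries: RetroEntry[]): RetroOutcome {
  const fail = entries
    .filter((entry) => entry.status === "open")
    .map((entry) => entry.id);

  // Count how often each miss pattern appears across all entries.
  const counts = new Map<string, number>();
  for (const entry of entries) {
    counts.set(entry.pattern, (counts.get(entry.pattern) ?? 0) + 1);
  }
  const warn = Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([pattern]) => pattern);

  return { fail, warn };
}

console.log(
  retroCheck([
    { id: "R-1", pattern: "unreviewed-background-job", status: "hardened" },
    { id: "R-2", pattern: "unreviewed-background-job", status: "open" },
    { id: "R-3", pattern: "weak-webhook-proof", status: "hardened" },
  ]),
); // { fail: ["R-2"], warn: ["unreviewed-background-job"] }
```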

AI operating model

This package is designed for AI coding agents as much as for humans.

The expected loop is:

  1. detect changed runtime files
  2. run rotops doctor
  3. run rotops impact
  4. run rotops review when denominator drift may have changed
  5. compare discovered candidates to the reviewed model
  6. run rotops self-check so the reviewed model itself is challenged for hidden coarseness
  7. accept, suppress, or continue reviewing candidate drift through repo-local policy
  8. update reviewed behaviors and owner tests
  9. record escaped misses in runtime-retrospective.json and run rotops retro when a miss should harden the system
  10. export or refresh the machine-readable runtime agent contract when project paths or process changed
  11. rerun rotops validate

The control system exists so an agent cannot “solve” testing by only adding lines of test code. The agent has to maintain the runtime model too.

How to use this in practice

Use it when you want a repo to answer, concretely:

  • what runtime behavior exists
  • what part of it is managed
  • what part is still unreviewed
  • what proof exists
  • what proof is too weak
  • what changed files affect which implemented behavior units

Do not use it as a cosmetic wrapper around existing folder labels. If the runtime denominator is not reviewed, the system is being used incorrectly.
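The last question in the list above, which changed files affect which implemented behavior units, reduces to a lookup over the control plane. The sketch below assumes a simplified `BehaviorUnit` shape with `files` and `ownerTests` fields; these names are illustrative and are not taken from `runtime-control-plane.json`.

```typescript
// Simplified, hypothetical view of a behavior unit in the control plane.
interface BehaviorUnit {
  id: string;
  files: string[];
  ownerTests: string[];
}

// Behavior units that reference at least one changed file.
function impactedUnits(changed: string[], units: BehaviorUnit[]): BehaviorUnit[] {
  const changedSet = new Set(changed);
  return units.filter((unit) => unit.files.some((file) => changedSet.has(file)));
}

// Changed files that no behavior unit references: candidates for denominator review.
function unmanagedChanges(changed: string[], units: BehaviorUnit[]): string[] {
  const managed = new Set<string>();
  for (const unit of units) {
    for (const file of unit.files) managed.add(file);
  }
  return changed.filter((file) => !managed.has(file));
}

const exampleUnits: BehaviorUnit[] = [
  { id: "login", files: ["src/api/login.ts"], ownerTests: ["tests/login.test.ts"] },
  { id: "sync", files: ["src/jobs/sync.ts"], ownerTests: ["tests/sync.test.ts"] },
];
const exampleChanged = ["src/api/login.ts", "src/api/export.ts"];

console.log(impactedUnits(exampleChanged, exampleUnits).map((unit) => unit.id)); // ["login"]
console.log(unmanagedChanges(exampleChanged, exampleUnits)); // ["src/api/export.ts"]
```

The second function matters as much as the first: a changed file that maps to no behavior unit is exactly the unreviewed case the list above asks about.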

Bootstrap maturity

Different parts of the package mature at different speeds in different stacks.

  • control plane validation is generally portable
  • impact analysis is generally portable
  • reviewed-model completeness checks are generally portable
  • raw discovery quality depends on repo-local policy and codebase signals

That is why this package exposes policy files instead of hiding heuristics inside the binary.

Public repo readiness

A repo using this package is publishable when:

  • new runtime entrypoints cannot land without denominator review
  • discovered-vs-reviewed drift fails CI
  • behavior units own tests and annotations
  • proof strength is visible through fidelity policy
  • humans and AI agents read the same artifacts first

AI agents

An AI agent should not treat the control plane as documentation. It should treat it as the runtime source of truth for automated testing changes.

Start here:

Example

The package includes a concrete product example under examples/apppulse.

It also includes a smaller staged-adoption example for a non-JS runtime slice under examples/flutter-session.

The apppulse example matters because this package was not invented from a blank framework template. It was extracted from a real product that exposed the exact failure mode this system is meant to prevent.

License

MIT

About

Runtime-obligation-first TestOps control plane for any software project.
