
AI Harness Scorecard: devbcn-nextjs

Grade: F (31.8/100) | No meaningful harness. AI output is essentially unaudited.

  • Repository: /home/runner/work/devbcn-nextjs/devbcn-nextjs
  • Languages: javascript, typescript
  • Assessed: 2026-03-11 20:03 UTC
  • Checks: 10/31 passed

Summary

| Category | Weight | Score | Checks |
| --- | --- | --- | --- |
| Architectural Documentation | 20% | 0% `[----------]` | 0/5 |
| Mechanical Constraints | 25% | 59% `[######----]` | 4/7 |
| Testing & Stability | 25% | 48% `[#####-----]` | 4/8 |
| Review & Drift Prevention | 15% | 33% `[###-------]` | 2/6 |
| AI-Specific Safeguards | 15% | 0% `[----------]` | 0/5 |

Architectural Documentation (0%)

[FAIL] Architecture Documentation (0/5)

matklad ARCHITECTURE.md guide

Evidence: No architecture documentation found

Remediation: Create ARCHITECTURE.md at repo root following matklad's pattern: short, stable, focused on module boundaries and constraints.
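A minimal starting point following matklad's pattern (section contents and module names below are hypothetical, for illustration only):

```markdown
# Architecture

## Bird's-eye view
Next.js app serving the DevBcn conference site. Session data is fetched
at build time and rendered as static pages.

## Code map
- `src/components/` — presentational React components; no data fetching.
- `src/pages/` — routing and data loading only.

## Invariants
- Components never import from `pages/`; data flows one way, via props.
```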

[FAIL] Agent Instructions (0/5)

OpenAI Harness Engineering (2026)

Evidence: No AI agent instruction files found

Remediation: Create CLAUDE.md or AGENTS.md with project context, code style, and constraints so AI agents produce consistent output.
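A sketch of what such a file could contain (the npm script names are assumptions about this repo, not verified):

```markdown
# AGENTS.md

## Project
Next.js + TypeScript conference site. TypeScript strict mode is on; keep it on.

## Rules for agents
- Run `npm run lint` and `npm test` before proposing changes.
- Never weaken `tsconfig.json` strictness or add inline ESLint disables.
- Prefer small, single-purpose diffs; explain any deleted test.
```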

[FAIL] Architecture Decision Records (0/3)

DORA 2025 Report - AI-accessible documentation

Evidence: No Architecture Decision Records found

Remediation: Create docs/adr/ directory with numbered markdown decision records. Use adr-tools or a simple template.
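An illustrative record in the common ADR shape (the decision shown is hypothetical):

```markdown
# 0001 — Use static generation for session pages

- Status: accepted
- Date: 2026-03-11

## Context
Session data changes rarely; pages must be fast and cheap to host.

## Decision
Render session pages at build time; content edits trigger a redeploy.

## Consequences
No per-request data layer, but edits require a rebuild.
```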

[FAIL] Module Boundary Documentation (0/4)

matklad ARCHITECTURE.md - constraints as absences

Evidence: No module boundary constraints documented

Remediation: Document which modules must NOT depend on each other in ARCHITECTURE.md. Example: 'The fields crate never depends on any other workspace crate.'

[FAIL] API Documentation (0/3)

DORA 2025 - AI-accessible documentation

Evidence: No API documentation generation or spec files found

Remediation: Add doc generation to CI (cargo doc, typedoc, sphinx) or maintain OpenAPI/Swagger specs.

Mechanical Constraints (59%)

[PASS] CI Pipeline (3/3)

DORA 2025 Report

Evidence: CI detected: github, github, github
(Three GitHub workflow matches reported.)

[PASS] Linter Enforcement (4/4)

OpenAI Harness Engineering - mechanical constraints

Evidence: Blocking linter found in CI: eslint

[PASS] Formatter Enforcement (3/3)

OpenAI Harness Engineering - mechanical constraints

Evidence: Formatter check found in CI: prettier\s+--check

[PASS] Type Safety (3/3)

SlopCodeBench - preventing subtle type errors

Evidence: TypeScript strict mode enabled

[FAIL] Dependency Auditing (0/4)

Blog: security infrastructure reliability

Evidence: No dependency auditing found

Remediation: Add cargo deny/audit, npm audit, pip-audit, or Snyk to CI as a blocking check.
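For this JS/TS repo, a blocking `npm audit` job could look like this sketch (workflow name and trigger are illustrative):

```yaml
# .github/workflows/audit.yml — fails on high/critical advisories
name: audit
on: [push, pull_request]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm audit --audit-level=high
```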

[FAIL] Conventional Commits (0/2)

DORA 2025 - working in small batches

Evidence: No conventional commit enforcement found

Remediation: Add commitlint or equivalent to CI to enforce consistent commit message format.
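A sketch of a commitlint CI step, assuming a `commitlint.config.js` that extends `@commitlint/config-conventional` and a `main` base branch:

```yaml
# Added to an existing workflow; full history is needed for --from
- uses: actions/checkout@v4
  with:
    fetch-depth: 0
- run: npm install --no-save @commitlint/cli @commitlint/config-conventional
- run: npx commitlint --from=origin/main --to=HEAD
```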

[FAIL] Unsafe Code Policy (0/3)

Blog: 80% problem in AI-generated code

Evidence: No explicit policy against unsafe code patterns

Remediation: Add unsafe_code = forbid (Rust), security linting (semgrep/bandit), or ESLint rules against dangerous patterns.
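For this repo, ESLint rules against the classic dangerous patterns are the closest equivalent. The first three are built-in ESLint rules; `react/no-danger` (which flags `dangerouslySetInnerHTML`) assumes `eslint-plugin-react` is installed:

```json
{
  "rules": {
    "no-eval": "error",
    "no-implied-eval": "error",
    "no-new-func": "error",
    "react/no-danger": "error"
  }
}
```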

Testing & Stability (48%)

[PASS] Test Suite (3/3)

Kent Beck - tests define what correct means

Evidence: Tests present and executed in CI

[PASS] Feature Matrix Testing (3/3)

DORA 2025 - stability through comprehensive testing

Evidence: Matrix/parallel testing strategy found in CI

[PASS] Code Coverage (4/4)

DORA 2025 - stability feedback loops

Evidence: Coverage measurement in CI: coverage.py|pytest-cov|--cov

[FAIL] Mutation Testing (0/4)

SlopCodeBench - code that 'appears correct but is unreliable'

Evidence: No mutation testing found

Remediation: Add cargo-mutants (Rust), Stryker (JS/TS), mutmut (Python), or PIT (Java). Mutation testing catches tests that pass without verifying behavior.
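A minimal `stryker.conf.json` sketch for this repo, assuming Jest as the test runner (glob patterns are illustrative):

```json
{
  "testRunner": "jest",
  "mutate": ["src/**/*.ts", "!src/**/*.test.ts"],
  "thresholds": { "break": 50 }
}
```

With `thresholds.break` set, the run exits non-zero when the mutation score drops below 50%, which is what makes it a blocking CI check.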

[FAIL] Property-Based Testing (0/3)

Blog: catching edge cases in AI-generated code

Evidence: No property-based testing found

Remediation: Add proptest (Rust), hypothesis (Python), fast-check (JS/TS), or jqwik (Java) for testing invariants with random structured inputs.

[FAIL] Fuzz Testing (0/3)

Blog: 80% problem - catching what AI misses

Evidence: No fuzz testing found

Remediation: Add fuzz targets for parsing-heavy and input-handling code paths.

[FAIL] Contract / Compatibility Tests (0/3)

OpenAI Harness Engineering - mechanical constraints

Evidence: No contract or compatibility tests found

Remediation: Add contract tests that verify external interface stability (golden fixtures, snapshot tests, wire-format checks).

[PASS] Tests Block Merge (2/2)

DORA 2025 - stability metrics

Evidence: All test jobs are blocking: test

Review & Drift Prevention (33%)

[FAIL] Code Review Required (0/4)

OpenAI Harness Engineering - author/reviewer separation

Evidence: Cannot verify branch protection without API access. Run with --github-token or --gitlab-token for full assessment.

Remediation: Enable required reviews in branch protection settings and add CODEOWNERS.

[PASS] Scheduled CI Jobs (3/3)

OpenAI Harness Engineering - garbage collection agents

Evidence: Scheduled CI pipeline found

[FAIL] Stale Documentation Detection (0/2)

OpenAI Harness Engineering - quality drift

Evidence: No stale documentation detection found

Remediation: Add TODO/FIXME scanning, link checking (lychee), or prose linting (vale) to CI.
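A link-checking step could be as small as this sketch (action version pin and glob are illustrative):

```yaml
- uses: lycheeverse/lychee-action@v1
  with:
    args: --no-progress '**/*.md'
```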

[FAIL] PR/MR Template (0/2)

DORA 2025 - working in small batches

Evidence: No PR/MR template found

Remediation: Add .github/PULL_REQUEST_TEMPLATE.md or .gitlab/merge_request_templates/Default.md with sections for description, testing, and impact.
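A starting template; the "AI assistance" section is a suggestion in the spirit of this scorecard, not a standard convention:

```markdown
## Description
What changed and why.

## Testing
How this was verified (commands run, screenshots).

## Impact
User-facing changes, migrations, rollback plan.

## AI assistance
Was any of this AI-generated? If so, how was it reviewed?
```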

[PASS] Automated Code Review (2/2)

OpenAI Harness Engineering - separate authoring and reviewing agents

Evidence: Automated review tool configured: .github/dependabot.yml

[FAIL] Documentation Sync Check (0/2)

OpenAI Harness Engineering - curated knowledge base

Evidence: No documentation sync checks found in CI

Remediation: Add CI jobs that verify related docs stay in sync (e.g. diff AGENTS.md CLAUDE.md, golden fixture checks).
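The simplest possible version is a sketch like this, assuming both files exist at the repo root:

```yaml
# Fail CI if the two agent-instruction files drift apart.
- run: diff -u AGENTS.md CLAUDE.md
```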

AI-Specific Safeguards (0%)

[FAIL] AI Usage Norms (0/4)

DORA 2025 - clear organizational stance on AI use

Evidence: No AI usage norms documented

Remediation: Document AI usage policies: review expectations for AI-generated code, when manual implementation is required, testing-before-implementation norms.

[FAIL] Small Batch Enforcement (0/3)

DORA 2025 - working in small batches

Evidence: No small batch enforcement found

Remediation: Add PR size checks (Danger, pr-size-labeler) or document size guidelines in CONTRIBUTING.md. Large AI-generated PRs are harder to review.
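A dependency-free sketch of a size gate for pull requests (the 400-line threshold is arbitrary; tune it per team):

```yaml
# Fails when a PR changes more than 400 lines total (added + deleted).
- uses: actions/checkout@v4
  with:
    fetch-depth: 0
- run: |
    CHANGED=$(git diff --numstat origin/${{ github.base_ref }}...HEAD \
      | awk '{ s += $1 + $2 } END { print s + 0 }')
    echo "Changed lines: $CHANGED"
    test "$CHANGED" -le 400
```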

[FAIL] Design-Before-Code Culture (0/3)

Blog: cognitive offloading guardrails

Evidence: No design-before-code process found

Remediation: Create docs/rfcs/ or docs/designs/ directory. Document a process where significant changes start with a design doc or plan before implementation.

[FAIL] Error Handling Policy (0/3)

Blog: AI agents deleting tests, using expect()

Evidence: No error handling policy found

Remediation: Add clippy lints (unwrap_used, expect_used) for Rust, ESLint rules for JS/TS, or document error handling patterns in agent instructions.
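For this repo, ESLint rules are the equivalent. The first two are built-in; `@typescript-eslint/no-floating-promises` assumes `typescript-eslint` with type-aware linting enabled (`parserOptions.project`):

```json
{
  "rules": {
    "no-throw-literal": "error",
    "prefer-promise-reject-errors": "error",
    "@typescript-eslint/no-floating-promises": "error"
  }
}
```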

[FAIL] Security-Critical Path Marking (0/2)

Blog: 80% problem in security infrastructure

Evidence: No security-critical path marking found

Remediation: Add CODEOWNERS for sensitive directories, SECURITY.md for vuln reporting, or SAST scanning in CI.
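A sketch of a `.github/CODEOWNERS` file; the paths and team names below are hypothetical:

```
/src/auth/            @devbcn/security-reviewers
/.github/workflows/   @devbcn/infra
SECURITY.md           @devbcn/security-reviewers
```

Combined with required reviews in branch protection, this makes a human sign-off mechanical for the paths where AI-generated mistakes are most expensive.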

References

  • Blog: 80% problem - catching what AI misses
  • Blog: 80% problem in AI-generated code
  • Blog: 80% problem in security infrastructure
  • Blog: AI agents deleting tests, using expect()
  • Blog: catching edge cases in AI-generated code
  • Blog: cognitive offloading guardrails
  • Blog: security infrastructure reliability
  • DORA 2025 - AI-accessible documentation
  • DORA 2025 - clear organizational stance on AI use
  • DORA 2025 - stability feedback loops
  • DORA 2025 - stability metrics
  • DORA 2025 - stability through comprehensive testing
  • DORA 2025 - working in small batches
  • DORA 2025 Report
  • DORA 2025 Report - AI-accessible documentation
  • Kent Beck - tests define what correct means
  • OpenAI Harness Engineering (2026)
  • OpenAI Harness Engineering - author/reviewer separation
  • OpenAI Harness Engineering - curated knowledge base
  • OpenAI Harness Engineering - garbage collection agents
  • OpenAI Harness Engineering - mechanical constraints
  • OpenAI Harness Engineering - quality drift
  • OpenAI Harness Engineering - separate authoring and reviewing agents
  • SlopCodeBench - code that 'appears correct but is unreliable'
  • SlopCodeBench - preventing subtle type errors
  • matklad ARCHITECTURE.md - constraints as absences
  • matklad ARCHITECTURE.md guide

Generated by ai-harness-scorecard