
AI Harness Scorecard: devbcn-nextjs

Grade: F (31.8/100) | No meaningful harness. AI output is essentially unaudited.

  • Repository: /home/runner/work/devbcn-nextjs/devbcn-nextjs
  • Languages: javascript, typescript
  • Assessed: 2026-03-11 20:03 UTC
  • Checks: 10/31 passed

Summary

| Category | Weight | Score | Checks |
| --- | --- | --- | --- |
| Architectural Documentation | 20% | 0% `[----------]` | 0/5 |
| Mechanical Constraints | 25% | 59% `[######----]` | 4/7 |
| Testing & Stability | 25% | 48% `[#####-----]` | 4/8 |
| Review & Drift Prevention | 15% | 33% `[###-------]` | 2/6 |
| AI-Specific Safeguards | 15% | 0% `[----------]` | 0/5 |

Architectural Documentation (0%)

[FAIL] Architecture Documentation (0/5)

matklad ARCHITECTURE.md guide

Evidence: No architecture documentation found

Remediation: Create ARCHITECTURE.md at repo root following matklad's pattern: short, stable, focused on module boundaries and constraints.
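A minimal starting point following matklad's pattern (section contents and module names below are hypothetical, for illustration only):

```markdown
# Architecture

## Bird's-eye view
Next.js app serving the DevBcn conference site. Session data is fetched
at build time and rendered as static pages.

## Code map
- `src/components/` — presentational React components; no data fetching.
- `src/pages/` — routing and data loading only.

## Invariants
- Components never import from `pages/`; data flows one way, via props.
```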

[FAIL] Agent Instructions (0/5)

OpenAI Harness Engineering (2026)

Evidence: No AI agent instruction files found

Remediation: Create CLAUDE.md or AGENTS.md with project context, code style, and constraints so AI agents produce consistent output.
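A sketch of what such a file could contain (the npm script names are assumptions about this repo, not verified):

```markdown
# AGENTS.md

## Project
Next.js + TypeScript conference site. TypeScript strict mode is on; keep it on.

## Rules for agents
- Run `npm run lint` and `npm test` before proposing changes.
- Never weaken `tsconfig.json` strictness or add inline ESLint disables.
- Prefer small, single-purpose diffs; explain any deleted test.
```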

[FAIL] Architecture Decision Records (0/3)

DORA 2025 Report - AI-accessible documentation

Evidence: No Architecture Decision Records found

Remediation: Create docs/adr/ directory with numbered markdown decision records. Use adr-tools or a simple template.
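An illustrative record in the common ADR shape (the decision shown is hypothetical):

```markdown
# 0001 — Use static generation for session pages

- Status: accepted
- Date: 2026-03-11

## Context
Session data changes rarely; pages must be fast and cheap to host.

## Decision
Render session pages at build time; content edits trigger a redeploy.

## Consequences
No per-request data layer, but edits require a rebuild.
```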

[FAIL] Module Boundary Documentation (0/4)

matklad ARCHITECTURE.md - constraints as absences

Evidence: No module boundary constraints documented

Remediation: Document which modules must NOT depend on each other in ARCHITECTURE.md. Example: 'The fields crate never depends on any other workspace crate.'

[FAIL] API Documentation (0/3)

DORA 2025 - AI-accessible documentation

Evidence: No API documentation generation or spec files found

Remediation: Add doc generation to CI (cargo doc, typedoc, sphinx) or maintain OpenAPI/Swagger specs.

Mechanical Constraints (59%)

[PASS] CI Pipeline (3/3)

DORA 2025 Report

Evidence: CI detected: github, github, github
(Three GitHub workflow matches reported.)

[PASS] Linter Enforcement (4/4)

OpenAI Harness Engineering - mechanical constraints

Evidence: Blocking linter found in CI: eslint

[PASS] Formatter Enforcement (3/3)

OpenAI Harness Engineering - mechanical constraints

Evidence: Formatter check found in CI: prettier\s+--check

[PASS] Type Safety (3/3)

SlopCodeBench - preventing subtle type errors

Evidence: TypeScript strict mode enabled

[FAIL] Dependency Auditing (0/4)

Blog: security infrastructure reliability

Evidence: No dependency auditing found

Remediation: Add cargo deny/audit, npm audit, pip-audit, or Snyk to CI as a blocking check.
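For this JS/TS repo, a blocking `npm audit` job could look like this sketch (workflow name and trigger are illustrative):

```yaml
# .github/workflows/audit.yml — fails on high/critical advisories
name: audit
on: [push, pull_request]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm audit --audit-level=high
```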

[FAIL] Conventional Commits (0/2)

DORA 2025 - working in small batches

Evidence: No conventional commit enforcement found

Remediation: Add commitlint or equivalent to CI to enforce consistent commit message format.
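A sketch of a commitlint CI step, assuming a `commitlint.config.js` that extends `@commitlint/config-conventional` and a `main` base branch:

```yaml
# Added to an existing workflow; full history is needed for --from
- uses: actions/checkout@v4
  with:
    fetch-depth: 0
- run: npm install --no-save @commitlint/cli @commitlint/config-conventional
- run: npx commitlint --from=origin/main --to=HEAD
```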

[FAIL] Unsafe Code Policy (0/3)

Blog: 80% problem in AI-generated code

Evidence: No explicit policy against unsafe code patterns

Remediation: Add unsafe_code = forbid (Rust), security linting (semgrep/bandit), or ESLint rules against dangerous patterns.
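For this repo, ESLint rules against the classic dangerous patterns are the closest equivalent. The first three are built-in ESLint rules; `react/no-danger` (which flags `dangerouslySetInnerHTML`) assumes `eslint-plugin-react` is installed:

```json
{
  "rules": {
    "no-eval": "error",
    "no-implied-eval": "error",
    "no-new-func": "error",
    "react/no-danger": "error"
  }
}
```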

Testing & Stability (48%)

[PASS] Test Suite (3/3)

Kent Beck - tests define what correct means

Evidence: Tests present and executed in CI

[PASS] Feature Matrix Testing (3/3)

DORA 2025 - stability through comprehensive testing

Evidence: Matrix/parallel testing strategy found in CI

[PASS] Code Coverage (4/4)

DORA 2025 - stability feedback loops

Evidence: Coverage measurement in CI: coverage.py|pytest-cov|--cov

[FAIL] Mutation Testing (0/4)

SlopCodeBench - code that 'appears correct but is unreliable'

Evidence: No mutation testing found

Remediation: Add cargo-mutants (Rust), Stryker (JS/TS), mutmut (Python), or PIT (Java). Mutation testing catches tests that pass without verifying behavior.
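A minimal `stryker.conf.json` sketch for this repo, assuming Jest as the test runner (glob patterns are illustrative):

```json
{
  "testRunner": "jest",
  "mutate": ["src/**/*.ts", "!src/**/*.test.ts"],
  "thresholds": { "break": 50 }
}
```

With `thresholds.break` set, the run exits non-zero when the mutation score drops below 50%, which is what makes it a blocking CI check.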

[FAIL] Property-Based Testing (0/3)

Blog: catching edge cases in AI-generated code

Evidence: No property-based testing found

Remediation: Add proptest (Rust), hypothesis (Python), fast-check (JS/TS), or jqwik (Java) for testing invariants with random structured inputs.

[FAIL] Fuzz Testing (0/3)

Blog: 80% problem - catching what AI misses

Evidence: No fuzz testing found

Remediation: Add fuzz targets for parsing-heavy and input-handling code paths.

[FAIL] Contract / Compatibility Tests (0/3)

OpenAI Harness Engineering - mechanical constraints

Evidence: No contract or compatibility tests found

Remediation: Add contract tests that verify external interface stability (golden fixtures, snapshot tests, wire-format checks).

[PASS] Tests Block Merge (2/2)

DORA 2025 - stability metrics

Evidence: All test jobs are blocking: test

Review & Drift Prevention (33%)

[FAIL] Code Review Required (0/4)

OpenAI Harness Engineering - author/reviewer separation

Evidence: Cannot verify branch protection without API access. Run with --github-token or --gitlab-token for full assessment.

Remediation: Enable required reviews in branch protection settings and add CODEOWNERS.

[PASS] Scheduled CI Jobs (3/3)

OpenAI Harness Engineering - garbage collection agents

Evidence: Scheduled CI pipeline found

[FAIL] Stale Documentation Detection (0/2)

OpenAI Harness Engineering - quality drift

Evidence: No stale documentation detection found

Remediation: Add TODO/FIXME scanning, link checking (lychee), or prose linting (vale) to CI.
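A link-checking step could be as small as this sketch (action version pin and glob are illustrative):

```yaml
- uses: lycheeverse/lychee-action@v1
  with:
    args: --no-progress '**/*.md'
```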

[FAIL] PR/MR Template (0/2)

DORA 2025 - working in small batches

Evidence: No PR/MR template found

Remediation: Add .github/PULL_REQUEST_TEMPLATE.md or .gitlab/merge_request_templates/Default.md with sections for description, testing, and impact.
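A starting template; the "AI assistance" section is a suggestion in the spirit of this scorecard, not a standard convention:

```markdown
## Description
What changed and why.

## Testing
How this was verified (commands run, screenshots).

## Impact
User-facing changes, migrations, rollback plan.

## AI assistance
Was any of this AI-generated? If so, how was it reviewed?
```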

[PASS] Automated Code Review (2/2)

OpenAI Harness Engineering - separate authoring and reviewing agents

Evidence: Automated review tool configured: .github/dependabot.yml

[FAIL] Documentation Sync Check (0/2)

OpenAI Harness Engineering - curated knowledge base

Evidence: No documentation sync checks found in CI

Remediation: Add CI jobs that verify related docs stay in sync (e.g. diff AGENTS.md CLAUDE.md, golden fixture checks).
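The simplest possible version is a sketch like this, assuming both files exist at the repo root:

```yaml
# Fail CI if the two agent-instruction files drift apart.
- run: diff -u AGENTS.md CLAUDE.md
```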

AI-Specific Safeguards (0%)

[FAIL] AI Usage Norms (0/4)

DORA 2025 - clear organizational stance on AI use

Evidence: No AI usage norms documented

Remediation: Document AI usage policies: review expectations for AI-generated code, when manual implementation is required, testing-before-implementation norms.

[FAIL] Small Batch Enforcement (0/3)

DORA 2025 - working in small batches

Evidence: No small batch enforcement found

Remediation: Add PR size checks (Danger, pr-size-labeler) or document size guidelines in CONTRIBUTING.md. Large AI-generated PRs are harder to review.
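A dependency-free sketch of a size gate for pull requests (the 400-line threshold is arbitrary; tune it per team):

```yaml
# Fails when a PR changes more than 400 lines total (added + deleted).
- uses: actions/checkout@v4
  with:
    fetch-depth: 0
- run: |
    CHANGED=$(git diff --numstat origin/${{ github.base_ref }}...HEAD \
      | awk '{ s += $1 + $2 } END { print s + 0 }')
    echo "Changed lines: $CHANGED"
    test "$CHANGED" -le 400
```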

[FAIL] Design-Before-Code Culture (0/3)

Blog: cognitive offloading guardrails

Evidence: No design-before-code process found

Remediation: Create docs/rfcs/ or docs/designs/ directory. Document a process where significant changes start with a design doc or plan before implementation.

[FAIL] Error Handling Policy (0/3)

Blog: AI agents deleting tests, using expect()

Evidence: No error handling policy found

Remediation: Add clippy lints (unwrap_used, expect_used) for Rust, ESLint rules for JS/TS, or document error handling patterns in agent instructions.
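For this repo, ESLint rules are the equivalent. The first two are built-in; `@typescript-eslint/no-floating-promises` assumes `typescript-eslint` with type-aware linting enabled (`parserOptions.project`):

```json
{
  "rules": {
    "no-throw-literal": "error",
    "prefer-promise-reject-errors": "error",
    "@typescript-eslint/no-floating-promises": "error"
  }
}
```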

[FAIL] Security-Critical Path Marking (0/2)

Blog: 80% problem in security infrastructure

Evidence: No security-critical path marking found

Remediation: Add CODEOWNERS for sensitive directories, SECURITY.md for vuln reporting, or SAST scanning in CI.
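A sketch of a `.github/CODEOWNERS` file; the paths and team names below are hypothetical:

```
/src/auth/            @devbcn/security-reviewers
/.github/workflows/   @devbcn/infra
SECURITY.md           @devbcn/security-reviewers
```

Combined with required reviews in branch protection, this makes a human sign-off mechanical for the paths where AI-generated mistakes are most expensive.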

References

  • Blog: 80% problem - catching what AI misses
  • Blog: 80% problem in AI-generated code
  • Blog: 80% problem in security infrastructure
  • Blog: AI agents deleting tests, using expect()
  • Blog: catching edge cases in AI-generated code
  • Blog: cognitive offloading guardrails
  • Blog: security infrastructure reliability
  • DORA 2025 - AI-accessible documentation
  • DORA 2025 - clear organizational stance on AI use
  • DORA 2025 - stability feedback loops
  • DORA 2025 - stability metrics
  • DORA 2025 - stability through comprehensive testing
  • DORA 2025 - working in small batches
  • DORA 2025 Report
  • DORA 2025 Report - AI-accessible documentation
  • Kent Beck - tests define what correct means
  • OpenAI Harness Engineering (2026)
  • OpenAI Harness Engineering - author/reviewer separation
  • OpenAI Harness Engineering - curated knowledge base
  • OpenAI Harness Engineering - garbage collection agents
  • OpenAI Harness Engineering - mechanical constraints
  • OpenAI Harness Engineering - quality drift
  • OpenAI Harness Engineering - separate authoring and reviewing agents
  • SlopCodeBench - code that 'appears correct but is unreliable'
  • SlopCodeBench - preventing subtle type errors
  • matklad ARCHITECTURE.md - constraints as absences
  • matklad ARCHITECTURE.md guide

Generated by ai-harness-scorecard