Merged
20 changes: 17 additions & 3 deletions .github/workflows/ci.yml
@@ -29,8 +29,22 @@ jobs:
- name: Build
run: pnpm build

- name: Test
run: pnpm test

- name: Check doc freshness
run: pnpm check:docs

- name: Install Playwright browsers
run: pnpm exec playwright install --with-deps chromium

- name: Harness test
run: pnpm harness:test

- name: Upload Playwright report
if: always()
uses: actions/upload-artifact@v4
with:
name: playwright-report
path: |
playwright-report/
test-results/
.harness/
if-no-files-found: ignore
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,5 +1,8 @@
node_modules/
dist/
.harness/
playwright-report/
test-results/
.env
.env.local
*.generated.*
19 changes: 8 additions & 11 deletions AGENTS.md
@@ -11,36 +11,33 @@ This is a TypeScript monorepo using pnpm workspaces. The application follows a d
| What | Where |
|------|-------|
| Architecture & dependency rules | [ARCHITECTURE.md](./ARCHITECTURE.md) |
| Testing & harness procedures | [docs/testing.md](./docs/testing.md) |
| Design documents | [docs/design/](./docs/design/) |
| Core beliefs & principles | [docs/beliefs.md](./docs/beliefs.md) |
| Quality tracking | [docs/quality.md](./docs/quality.md) |
| Documentation catalog | [docs/catalog.md](./docs/catalog.md) |
| Active plans | [plans/active/](./plans/active/) |
| Completed plans | [plans/completed/](./plans/completed/) |
| Technical debt | [plans/debt.md](./plans/debt.md) |

## Stack

pnpm · TypeScript · Fastify + React/Vite · PostgreSQL + Drizzle · Zod · Vitest · Biome · GitHub Actions · OTel + Pino · Docker Compose
pnpm · TypeScript · Fastify + React/Vite · PostgreSQL + Drizzle · Zod · Vitest · Playwright · Biome · GitHub Actions · Pino · Docker Compose

## Key Rules

1. **Layered architecture is law.** Each domain follows: Types → Config → Repo → Service → Runtime → UI. Dependencies flow forward only. See [ARCHITECTURE.md](./ARCHITECTURE.md).
2. **Parse at the boundary.** All external data (API inputs, DB rows, env vars) must be validated with Zod schemas before entering the domain.
3. **Structured logging only.** Use the Pino logger from `src/providers/telemetry`. No `console.log`.
4. **Cross-cutting via Providers.** Auth, telemetry, feature flags enter through `src/providers/`. No direct imports of cross-cutting concerns in domain code.
5. **Tests are required.** Every module must have co-located tests. Run `pnpm test` before opening a PR.
6. **Plans live in the repo.** No external docs. If it's not in `plans/` or `docs/`, it doesn't exist.
4. **Cross-cutting via Providers.** Database, telemetry, auth, and feature flags enter through `src/providers/`. No direct imports of cross-cutting concerns in domain code.
5. **Tests are required.** Every module must have co-located tests. Use `pnpm harness:test` for full-stack changes.
6. **Docs live in the repo.** No external docs. If it's not in `docs/`, it doesn't exist.

## Before You Start a Task

1. Read this file (you're here)
2. Check [plans/active/](./plans/active/) for related work
3. Read the relevant domain's types layer first
4. Check [docs/quality.md](./docs/quality.md) for known gaps in the area you're touching
2. Read the relevant domain's types layer first
3. Check [docs/quality.md](./docs/quality.md) and [docs/testing.md](./docs/testing.md) for known gaps and validation procedures in the area you're touching

## When You're Done

1. Run `pnpm lint && pnpm test`
1. Run `pnpm lint && pnpm test`; run `pnpm harness:test` for API, database, UI, or e2e changes
2. Update [docs/quality.md](./docs/quality.md) if you improved coverage or fixed gaps
3. If you made architectural decisions, document them in [docs/design/](./docs/design/)
13 changes: 9 additions & 4 deletions ARCHITECTURE.md
@@ -21,12 +21,13 @@ Types → Config → Repo → Service → Runtime → UI

### Cross-Cutting Concerns (Providers)

Auth, telemetry, feature flags, and shared connectors (database, cache, queue) live in `src/providers/`. Any layer may import from providers — this is the **only** exception to the forward-only rule.
Database, telemetry, auth, feature flags, and shared connectors (cache, queue) live in `src/providers/`. Any layer may import from providers — this is the **only** exception to the forward-only rule.

```
src/providers/
├── database/ # Postgres client and lifecycle
├── telemetry/ # Structured Pino logging; metrics/traces are future work
├── auth/ # Authentication & authorization
├── telemetry/ # Logging (Pino), tracing (OTel), metrics
└── feature-flags/ # Feature flag evaluation
```

@@ -38,17 +39,21 @@ These rules are enforced by the custom linter at `lints/check-deps.ts`:
2. **No cross-domain imports at lower layers.** `domainA/repo` cannot import `domainB/repo`. Cross-domain communication happens at the `service` layer or above.
3. **No direct cross-cutting imports.** Use `src/providers/`, not raw `pino` or `@opentelemetry/*` imports in domain code.
4. **UI only imports types and client-safe config.** No server-side code in the UI layer.
5. **Co-located tests are required.** Source modules must have adjacent unit or integration tests unless they are approved entrypoints or barrel files.
6. **Structured logging only.** Application code must not use `console.*`; use providers so harness logs stay queryable.

### Adding a New Domain

1. Create `src/domains/<name>/` with all six layer directories
2. Add types and Zod schemas first (types layer is the foundation)
3. Register routes in the runtime layer
4. Update [docs/catalog.md](./docs/catalog.md)
4. Add co-located tests for every source module
5. Add browser e2e coverage when the domain exposes user-visible flows
6. Update [docs/catalog.md](./docs/catalog.md) or domain-specific docs when behavior changes

### File Conventions

- One export per file preferred (agents navigate better)
- Co-locate tests: `foo.ts` → `foo.test.ts`
- Co-locate tests: `foo.ts` → `foo.test.ts`; database tests use `foo.integration.test.ts`
- Max file size: 300 lines (enforced by linter)
- Schemas named `<Thing>Schema`, types inferred as `type Thing = z.infer<typeof ThingSchema>`
13 changes: 7 additions & 6 deletions README.md
@@ -8,10 +8,12 @@ Based on principles from [Harness Engineering](https://openai.com/index/harness-

```bash
pnpm install
pnpm dev # Start dev servers
pnpm harness:boot # Start Docker Compose Postgres, API, and web for this worktree
pnpm test # Run tests
pnpm lint # Biome + architectural linting
pnpm check:docs # Verify doc freshness
pnpm harness:test # Boot Docker Compose Postgres, run migrations, seed, test, e2e, and tear down
pnpm harness:down # Stop the local harness and Docker Compose resources
```

## Architecture
@@ -24,16 +26,15 @@ Each business domain follows a strict layered model:
Types → Config → Repo → Service → Runtime → UI
```

Dependencies flow forward only. Cross-cutting concerns (auth, logging, feature flags) go through `src/providers/`.
Dependencies flow forward only. Cross-cutting concerns (database, logging, auth, feature flags) go through `src/providers/`.

## For Agents

Start with [AGENTS.md](./AGENTS.md) — it's your map to the codebase.
Start with [AGENTS.md](./AGENTS.md) — it's your map to the codebase. Use [docs/testing.md](./docs/testing.md) for the full harness and testing procedure.

## For Humans

Your job is to:
1. Define intent (what should the system do?)
2. Write plans (in `plans/active/`)
3. Review agent output
4. Encode taste into linters and docs
2. Review agent output
3. Encode taste into linters and docs
9 changes: 8 additions & 1 deletion biome.json
@@ -11,6 +11,13 @@
"lineWidth": 100
},
"files": {
"ignore": ["node_modules", "dist", "*.generated.*"]
"ignore": [
"node_modules",
"dist",
".harness",
"test-results",
"playwright-report",
"*.generated.*"
]
}
}
7 changes: 6 additions & 1 deletion docker-compose.yml
@@ -6,7 +6,12 @@ services:
POSTGRES_USER: app
POSTGRES_PASSWORD: localdev
ports:
- "5432:5432"
- "${POSTGRES_PORT:-5432}:5432"
healthcheck:
test: ["CMD-SHELL", "pg_isready -U app -d app"]
interval: 2s
timeout: 2s
retries: 30
volumes:
- pgdata:/var/lib/postgresql/data

27 changes: 10 additions & 17 deletions docs/catalog.md
@@ -6,29 +6,22 @@ All documentation in this repository, indexed for discoverability.

| Document | Path | Status | Last Verified |
|----------|------|--------|---------------|
| Agent instructions | [AGENTS.md](../AGENTS.md) | ✅ Current | YYYY-MM-DD |
| Architecture overview | [ARCHITECTURE.md](../ARCHITECTURE.md) | ✅ Current | YYYY-MM-DD |
| Core beliefs | [docs/beliefs.md](./beliefs.md) | ✅ Current | YYYY-MM-DD |
| Quality tracking | [docs/quality.md](./quality.md) | ✅ Current | YYYY-MM-DD |
| Agent instructions | [AGENTS.md](../AGENTS.md) | Current | 2026-05-03 |
| Architecture overview | [ARCHITECTURE.md](../ARCHITECTURE.md) | Current | 2026-05-03 |
| Core beliefs | [docs/beliefs.md](./beliefs.md) | Current | 2026-05-03 |
| Quality tracking | [docs/quality.md](./quality.md) | Current | 2026-05-03 |
| Testing and harness procedures | [docs/testing.md](./testing.md) | Current | 2026-05-03 |

## Design Documents

| Document | Path | Status | Last Verified |
|----------|------|--------|---------------|
| *(none yet)* | | | |

## Plans

| Plan | Path | Status |
|------|------|--------|
| Technical debt | [plans/debt.md](../plans/debt.md) | Active |

---
| Harness engineering readiness | [docs/design/harness-engineering-readiness.md](./design/harness-engineering-readiness.md) | Current | 2026-05-03 |

### Freshness Rules

- **Current** — Verified to match actual code behavior
- **⚠️ Stale** — May not reflect current implementation
- **Obsolete** — Scheduled for removal or rewrite
- **Current** — Verified to match actual code behavior
- **Stale** — May not reflect current implementation
- **Obsolete** — Scheduled for removal or rewrite

Documents should be re-verified at least every 2 weeks. The doc-gardening CI job flags stale entries.
Documents should be re-verified at least every 2 weeks. The current CI check validates catalog links and `Last Verified` dates.
100 changes: 100 additions & 0 deletions docs/design/harness-engineering-readiness.md
@@ -0,0 +1,100 @@
# Harness Engineering Readiness

Last verified: 2026-05-03

This document tracks the repository against the agent-first harness model described in OpenAI's Harness Engineering article: https://openai.com/index/harness-engineering/

The target state is not only "agents can edit code." The target is a repository where agents can understand the product, boot isolated full-stack environments, inspect browser behavior, query runtime signals, run meaningful checks, and turn repeated review feedback into durable guardrails.

## Implemented Harness

| Capability | Evidence | Assessment |
|------------|----------|------------|
| Small agent map | `AGENTS.md` points to architecture, testing, docs, and quality | Good shape for progressive disclosure |
| Layered domain model | `ARCHITECTURE.md` defines Types -> Config -> Repo -> Service -> Runtime -> UI | Strong foundation |
| Mechanical guardrails | `lints/check-deps.ts` runs through `pnpm lint` | Enforces layer shape, dependency direction, co-located tests, structured logging, and no app `console.*` |
| Docker Compose database | `docker-compose.yml`, `src/providers/database/`, `migrations/`, `scripts/db-migrate.ts` | Postgres is the full-stack database layer |
| Deterministic harness | `scripts/harness/` and `pnpm harness:test` | Boots per-worktree DB/API/web, migrates, seeds, tests, runs e2e, tears down, and keeps artifacts |
| Browser e2e | `playwright.config.ts`, `tests/e2e/item-flow.spec.ts` | Validates item create/reload/delete and API failure UI behavior |
| Agent-queryable logs | `pnpm harness:logs` and `.harness/<worktree>/logs/` | API logs include request ID, method, URL, status, and duration |
| CI artifacts | `.github/workflows/ci.yml` | Runs harness tests and uploads Playwright and harness artifacts |
| Repository-local docs | `docs/catalog.md`, `docs/testing.md`, and `docs/quality.md` | Testing procedures and quality state are versioned in repo |

## Remaining Gaps

| Gap | Evidence | Why It Matters For Agents |
|-----|----------|---------------------------|
| No metrics/traces backend | `docs/quality.md` tracks this as a known gap | Logs and Playwright traces cover many failures, but performance analysis would benefit from metrics and service traces |
| No production deployment config | `docs/quality.md` tracks this as a known gap | The harness validates local full-stack behavior, not production release mechanics |

## Implemented Backlog

### 1. Deterministic App Harness

Implemented as `scripts/harness/` with commands for `boot`, `health`, `seed`, `test`, `logs`, and `down`.

Implemented behavior:

- Allocate stable per-worktree ports for API, web, and database.
- Start Postgres, API, and Vite in the background.
- Wait for `/healthz` and the web root before returning.
- Write process IDs, ports, and log paths under `.harness/<worktree>/`.
- Provide teardown that works even after failed runs.
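
The "stable per-worktree ports" idea above can be sketched as a deterministic mapping from the worktree path to a port slot. This is a minimal illustration only: the function names, base ports, and slot count are assumptions and are not taken from `scripts/harness/`.

```typescript
// Hypothetical sketch: derive stable ports from a worktree path.
// Base ports (3000, 5000, 15000) and the slot count are assumed values.
interface HarnessPorts {
  api: number;
  web: number;
  database: number;
}

// FNV-1a 32-bit hash. It is deterministic, so the same path always
// lands in the same slot across runs and machines.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

function allocatePorts(worktreePath: string, slots = 1000): HarnessPorts {
  const slot = fnv1a(worktreePath) % slots;
  return {
    api: 3000 + slot, // 3000..3999
    web: 5000 + slot, // 5000..5999
    database: 15000 + slot, // 15000..15999
  };
}
```

Two worktrees can still hash to the same slot, so a real implementation would likely also probe whether the ports are free before using them.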

### 2. Browser-Legible E2E Testing

Playwright tests exercise the current item flow from the browser:

- Empty state renders.
- Creating an item through the UI persists through API reload.
- Deleting an item removes it from the UI.
- Failed API responses produce visible error states.

CI uploads Playwright HTML reports, traces, screenshots, videos, and harness artifacts on every run.

### 3. Persistent Example Domain

Drizzle and Postgres are wired into the example domain:

- Schema and migration files are present.
- A database provider lives under `src/providers/database/`.
- The repo layer parses database rows with Zod before returning domain values.
- Integration tests run against an isolated Docker Compose test database when `DATABASE_URL` is present.

### 4. Agent-Queryable Runtime Signals

Local development now has an inspectable observability path:

- Structured Pino logs are written to per-harness log files.
- Request IDs and route timing are logged for HTTP requests.
- `pnpm harness:logs -- --query ...` provides lightweight local log filtering.
- Metrics and service traces remain future work behind providers.
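
Because Pino writes one JSON object per line, local log filtering can stay very small. The sketch below assumes a hypothetical `key=value` query form and hypothetical field names such as `reqId` and `statusCode`; the actual query syntax and log fields of `pnpm harness:logs` are not shown in this document.

```typescript
// Hypothetical sketch of filtering newline-delimited JSON logs.
// The key=value query form and the field names are assumptions.
type LogRecord = Record<string, unknown>;

function parseLogLines(text: string): LogRecord[] {
  const records: LogRecord[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        records.push(parsed as LogRecord);
      }
    } catch {
      // Ignore lines that are not JSON (for example, stray startup output).
    }
  }
  return records;
}

function filterLogs(records: LogRecord[], query: string): LogRecord[] {
  const eq = query.indexOf("=");
  if (eq === -1) throw new Error(`expected key=value, got: ${query}`);
  const key = query.slice(0, eq);
  const value = query.slice(eq + 1);
  // Compare as strings so numeric fields can be queried from the command line.
  return records.filter((record) => key in record && String(record[key]) === value);
}
```

Comparing stringified values keeps the command-line form simple, at the cost of not distinguishing the number `500` from the string `"500"`.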

### 5. Mechanical Guardrails

Custom checks now:

- Validate required domain layer directories.
- Enforce co-located tests for non-test modules.
- Enforce no `console.*` outside approved scripts.
- Enforce docs catalog `Last Verified` dates.
- Enforce that active and completed plan directories exist through tracked `.gitkeep` files.
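
As an illustration of the co-located test rule, the check can be expressed as a pure function over a list of file paths. This is a sketch under stated assumptions, not the logic in `lints/check-deps.ts`: the function name is hypothetical, and treating `index.ts` files as the exempt barrel files is an assumed default.

```typescript
// Hypothetical sketch: report source modules without an adjacent test file.
// Accepts foo.test.ts(x) or foo.integration.test.ts(x) next to foo.ts(x).
function findModulesMissingTests(
  files: string[],
  // Assumed exemption: barrel files named index.ts / index.tsx.
  isExempt: (path: string) => boolean = (path) => /(^|\/)index\.tsx?$/.test(path),
): string[] {
  const known = new Set(files);
  return files.filter((file) => {
    if (!/\.tsx?$/.test(file)) return false; // not a TypeScript module
    if (/\.test\.tsx?$/.test(file)) return false; // test files need no test
    if (isExempt(file)) return false;
    const stem = file.replace(/\.tsx?$/, "");
    const candidates = [
      `${stem}.test.ts`,
      `${stem}.test.tsx`,
      `${stem}.integration.test.ts`,
      `${stem}.integration.test.tsx`,
    ];
    return !candidates.some((candidate) => known.has(candidate));
  });
}
```

Keeping the check free of filesystem calls makes it easy to unit test; a caller would pass in the result of a directory walk.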

### 6. Agent Review And Garbage Collection Loops

Recurring cleanup has initial repo-local commands:

- `pnpm quality:audit` reports layer source and test counts from code evidence.
- `pnpm check:docs` checks stale docs and broken catalog links.
- Keep durable cleanup guidance in `docs/quality.md` and focused design docs instead of relying on memory.
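
A stale-doc check of the kind `pnpm check:docs` performs could look roughly like the following. It is a sketch, not the repository's script: it assumes four-column catalog rows with the date in the last column and uses the two-week window from `docs/catalog.md`; the function and type names are hypothetical.

```typescript
// Hypothetical sketch: flag catalog rows whose "Last Verified" date is too old.
interface CatalogEntry {
  title: string;
  status: string;
  lastVerified: string; // ISO date, e.g. 2026-05-03
}

function parseCatalogRows(markdown: string): CatalogEntry[] {
  const entries: CatalogEntry[] = [];
  for (const line of markdown.split("\n")) {
    // "| a | b | c | d |" splits into ["", "a", "b", "c", "d", ""].
    const cells = line.split("|").map((cell) => cell.trim());
    if (cells.length !== 6) continue;
    const title = cells[1] ?? "";
    const status = cells[3] ?? "";
    const lastVerified = cells[4] ?? "";
    // Header, separator, and placeholder rows have no ISO date and are skipped.
    if (!/^\d{4}-\d{2}-\d{2}$/.test(lastVerified)) continue;
    entries.push({ title, status, lastVerified });
  }
  return entries;
}

function findStaleEntries(entries: CatalogEntry[], today: Date, maxAgeDays = 14): CatalogEntry[] {
  const cutoff = today.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return entries.filter((entry) => Date.parse(`${entry.lastVerified}T00:00:00Z`) < cutoff);
}
```

Passing `today` in as a parameter, rather than reading the clock inside the function, keeps the check deterministic in tests. A stricter version might also fail on rows that still carry a placeholder date instead of skipping them.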

## Success Criteria

This repo is a full-stack agent harness because a fresh agent can run one documented command that:

1. Boots an isolated app stack for the current worktree.
2. Applies migrations and seeds known test data.
3. Runs unit, integration, and browser e2e tests.
4. Captures logs, screenshots, videos, and browser traces as local artifacts.
5. Tears down cleanly.
6. Leaves enough evidence for another agent to diagnose any failure without asking a human to reproduce it manually.
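
The criteria above imply a run loop in which teardown always executes and the failing step is recorded for later diagnosis. A minimal sketch of that control flow follows; `runHarness`, `HarnessStep`, and `HarnessResult` are hypothetical names, and the real `pnpm harness:test` implementation may be structured differently.

```typescript
// Hypothetical sketch: run harness steps in order and always tear down.
interface HarnessStep {
  name: string;
  run: () => Promise<void>;
}

interface HarnessResult {
  ok: boolean;
  completed: string[];
  failedStep?: string;
  error?: unknown;
}

async function runHarness(
  steps: HarnessStep[],
  teardown: () => Promise<void>,
): Promise<HarnessResult> {
  const completed: string[] = [];
  let current = "";
  try {
    for (const step of steps) {
      current = step.name;
      await step.run();
      completed.push(step.name);
    }
    return { ok: true, completed };
  } catch (error) {
    // Record which step failed so the evidence survives the run.
    return { ok: false, completed, failedStep: current, error };
  } finally {
    // Runs on both success and failure, before the result is delivered.
    await teardown();
  }
}
```

If `teardown` itself throws, the returned promise rejects with that error, so a caller that needs the step result regardless would wrap teardown in its own error handling.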
11 changes: 5 additions & 6 deletions docs/quality.md
@@ -14,23 +14,22 @@ Track the health of each domain and architectural layer. Update this when you im

| Domain | Types | Config | Repo | Service | Runtime | UI | Overall | Notes |
|--------|-------|--------|------|---------|---------|----|---------|----|
| example | B | B | C | C | C | D | C | Scaffold only, needs real implementation |
| example | B | B | B | B | B | B | B | Docker Compose-backed full-stack example with unit, integration, and e2e coverage |

## Cross-Cutting

| Provider | Grade | Notes |
|----------|-------|-------|
| auth | D | Placeholder |
| telemetry | B | Pino + OTel wired up |
| database | B | Postgres provider wired through Docker Compose harness |
| telemetry | B | Pino logger, request IDs, route timings, and per-harness queryable logs are wired; metrics/traces are future work |
| feature-flags | D | Placeholder |

## Known Gaps

- [ ] No integration tests yet
- [ ] Database migrations not wired up
- [ ] CI pipeline incomplete
- [ ] Telemetry does not yet include a metrics/traces backend beyond structured logs and Playwright traces
- [ ] No production deployment config

---

*Last updated: YYYY-MM-DD*
*Last updated: 2026-05-03*