docs: updated documents to match v0.6.0

radim · radim · commit 9d68fa0982c6 · 2026-05-03T22:42:03.000+02:00
It's a migration release for new features in v0.7.0
diff --git a/README.md b/README.md
@@ -8,7 +8,7 @@ The PostgreSQL MCP server that doesn't need connection to the production.
 
 ## The problem
 
-LLM/AI coding assistants are very good in writing code/SQL queries. But they are blind. THey don't know your schema, your indexes or your constraints. They might generate a migration that takes an `ACCESS EXCLUSIVE` lock on your busiest table and send your app down.
+LLM/AI coding assistants are very good at writing code/SQL queries. But they are blind. They don't know your schema, your indexes or your constraints. They might generate a migration that takes an `ACCESS EXCLUSIVE` lock on your busiest table and brings your app down.
 
 Some PostgreSQL MCP server ask you for the database connection. And to perform the administrative tasks you might need SUPERUSER permission. But that's like asking for problem.
 
@@ -133,6 +133,20 @@ dryrun lint
 
 All commands work offline from the schema file. Each project has its own `dryrun.toml` and `.dryrun/`, there is no global state. Add `.dryrun/` to your `.gitignore`.
 
+Snapshots live in `~/.dryrun/history.db`, keyed by `(project_id, database_id)`. The MCP server reads from the history db first and falls back to `.dryrun/schema.json` for first-run or shared snapshots. Once you run `dryrun snapshot take`, it reads from the history db.
+
+### Multi-node: capture activity from replicas
+
+`snapshot take` runs against the primary and writes schema + planner stats. Activity counters (`idx_scan`, `n_dead_tup`, last vacuum) live on each replica, so capture them separately:
+
+```sh
+dryrun --profile primary  snapshot take
+dryrun --profile replica1 snapshot activity --from "$REPLICA1_URL" --label replica1
+dryrun --profile replica2 snapshot activity --from "$REPLICA2_URL" --label replica2
+```
+
+The MCP `compare_nodes` tool then exposes per-node `idx_scan` so you can spot routing imbalances. See [docs/multi-node-stats.md](docs/multi-node-stats.md).
+
 ### Multiple databases per project
 
 `dryrun snapshot take` keys snapshots by `(project_id, database_id)`. The defaults work — `project_id` is your folder name, `database_id` is the actual database name from `current_database()`:
@@ -167,9 +181,9 @@ dryrun --profile billing snapshot diff --latest
 
 See [`docs/dryrun-toml.md`](docs/dryrun-toml.md) for all profile options.
 
-Every DB-related command (`init`, `import`, `probe`, `dump-schema`, `lint`, `drift`, `stats apply`, all `snapshot` subcommands) accepts `--profile` and falls back to the resolved profile's `db_url` and `schema_file` when the corresponding CLI flag is not provider.
+Every DB-related command (`init`, `import`, `probe`, `dump-schema`, `lint`, `drift`, `stats apply`, all `snapshot` subcommands) accepts `--profile` and falls back to the resolved profile's `db_url` and `schema_file` when the corresponding CLI flag is not provided.
 
-> **Note:** the MCP server is currently single-database. Using the default profile. Or the option is to run one `dryrun mcp-serve` process per database. Native multi-database support inside one MCP process is tracked in [#4](https://github.com/boringSQL/dryrun/issues/7).
+> **Note:** the MCP server is currently single-database and uses the default profile. To serve several databases, run one `dryrun mcp-serve` process per database. Native multi-database support inside one MCP process is tracked in [#7](https://github.com/boringSQL/dryrun/issues/7).
 
 ## MCP server
 
@@ -200,6 +214,12 @@ See the [Tutorial](TUTORIAL.md) for live database setup, SSE transport, and Clau
 - **[RegreSQL](https://github.com/boringsql/regresql)**, SQL regression testing and **`dryrun`**'s companion tool
 
 
+## Upgrading from 0.5.x
+
+- `dump-schema --stats-only` is removed. Use `dryrun snapshot take` (primary) and `dryrun snapshot activity` (replicas).
+- Snapshot JSON no longer embeds `Table.stats`, `Column.stats`, `Index.stats`, or `node_stats`. Stats are read per-kind from the history db via `HistoryStore::get_annotated`.
+- `check_drift` is now schema-only. It no longer flaps when `reltuples` or `idx_scan` change.
+
 ## License
 
 [BSD 2-Clause License](LICENSE)
diff --git a/TUTORIAL.md b/TUTORIAL.md
@@ -103,7 +103,7 @@ Privileges:
 dryrun init --db "$DATABASE_URL"
 ```
 
-Creates `.dryrun/schema.json` and `.dryrun/history.db`.
+Creates `dryrun.toml`, the `.dryrun/` directory, and `.dryrun/schema.json`. Snapshot history lives in `~/.dryrun/history.db` (shared across projects, keyed by `(project_id, database_id)`).
 
 ### 3. Snapshots and diffing
 
@@ -149,55 +149,64 @@ All tools available including EXPLAIN ANALYZE (runs in rolled-back transactions,
 
 ## Part C: Multi-node workflow
 
-For setups with one master and N replicas serving different query patterns. Stats (seq_scan, idx_scan, reltuples) differ per node. dryrun can aggregate them.
+For setups with one primary and N replicas serving different query patterns. Activity counters (`seq_scan`, `idx_scan`, `n_dead_tup`) differ per node and only live where the queries actually run, on the replicas. dryrun captures schema + planner stats from the primary and activity stats from each replica, then aggregates them.
 
-### 1. Full dump from master
+In v0.6.0 a snapshot is split into three rows in `~/.dryrun/history.db`: `schema`, `planner_stats`, `activity_stats`. `snapshot take` writes the first two from the primary; `snapshot activity` writes one `activity_stats` row per replica, tagged with `--label`.
+
+### 1. Schema + planner stats from the primary
 
 ```sh
-dryrun dump-schema --source "$MASTER_DB" --name "master" -o master.json
+dryrun --profile primary snapshot take
 ```
 
-### 2. Stats-only dumps from replicas
+`snapshot take` refuses to run on a standby. It writes `schema` (DDL) + `planner_stats` (`reltuples`, `relpages`, `pg_statistic`) to history.
 
-No structural schema, just pg_stat_user_tables and pg_stat_user_indexes data:
+### 2. Activity stats from each replica
 
 ```sh
-dryrun dump-schema --source "$REPLICA1_DB" --stats-only --name "replica-1" -o r1-stats.json
-dryrun dump-schema --source "$REPLICA2_DB" --stats-only --name "replica-2" -o r2-stats.json
-dryrun dump-schema --source "$REPLICA3_DB" --stats-only --name "replica-3" -o r3-stats.json
+dryrun --profile replica1 snapshot activity --from "$REPLICA1_URL" --label replica1
+dryrun --profile replica2 snapshot activity --from "$REPLICA2_URL" --label replica2
+dryrun --profile replica3 snapshot activity --from "$REPLICA3_URL" --label replica3
 ```
 
-These are lightweight, good for nightly cron. Example cron entry:
+`--label` is required and identifies the node in `compare_nodes` and `detect`. `snapshot activity` refuses to run on the primary. Activity rows attach to the most recent `schema` row by `schema_ref_hash`; pass `--allow-orphan` to capture before a schema exists.
 
-```sh
-# /etc/cron.d/dryrun-stats
-0 2 * * * app dryrun dump-schema --source "$REPLICA1_DB" --stats-only --name "replica-1" -o /data/dryrun/r1-stats.json
-```
+### 3. Define profiles for repeatable runs
 
-### 3. Import with merged stats
+```toml
+# dryrun.toml
+[project]
+id = "myapp"
 
-```sh
-dryrun import master.json --stats r1-stats.json r2-stats.json r3-stats.json
+[profiles.primary]
+db_url = "${PRIMARY_DATABASE_URL}"
+
+[profiles.replica1]
+db_url = "${REPLICA1_DATABASE_URL}"
+
+[profiles.replica2]
+db_url = "${REPLICA2_DATABASE_URL}"
 ```
 
-The resulting `.dryrun/schema.json` contains the full schema from master plus per-node stats from each replica. Consumers (suggest, validate, lint) automatically use aggregated values:
+### 4. Cron
 
-- **reltuples**: max across nodes
-- **seq_scan / idx_scan**: sum across nodes (reveals which replicas are doing seq scans)
-- **table_size**: max across nodes
+Schema changes rarely; activity counters shift daily. Capture each on its own schedule:
 
-### 4. Verify
+```sh
+# /etc/cron.d/dryrun-stats
+0  2 * * * app dryrun --profile primary  snapshot take
+15 2 * * * app dryrun --profile replica1 snapshot activity --from "$REPLICA1_URL" --label replica1
+15 2 * * * app dryrun --profile replica2 snapshot activity --from "$REPLICA2_URL" --label replica2
+```
+
+### 5. Verify
 
 ```sh
-cat .dryrun/schema.json | python3 -c "
-import sys, json
-d = json.load(sys.stdin)
-print(f'{len(d.get(\"node_stats\", []))} node stats attached')
-for ns in d.get('node_stats', []):
-    print(f'  {ns[\"source\"]}: {len(ns[\"table_stats\"])} tables, {len(ns[\"index_stats\"])} indexes')
-"
+dryrun snapshot list
 ```
 
+Each row prints its `kind` (`schema` / `planner_stats` / `activity_stats`), `node_label` for activity rows, and the `schema_ref_hash` linking activity to schema. The MCP `compare_nodes` tool then exposes per-node `idx_scan` for any table.
+
 ---
 
 ## Part D: MCP setup reference
@@ -285,6 +294,6 @@ GRANT pg_monitor TO your_readonly_user;
 
 **"invalid schema JSON"** - The file must be a valid SchemaSnapshot. If you renamed fields or edited by hand, re-dump from the database.
 
-**Multi-node stats not showing** - Verify `node_stats` array is present in `.dryrun/schema.json`. Each stats file must be a valid NodeStats JSON (from `--stats-only`).
+**Multi-node stats not showing** - Run `dryrun snapshot list` and confirm you see both `schema` rows (from `snapshot take` on the primary) and `activity_stats` rows (from `snapshot activity --label ...` on each replica) sharing the same `schema_ref_hash`. Activity captured before any schema exists needs `--allow-orphan` and won't reattach automatically.
 
 
diff --git a/docs/dryrun-toml.md b/docs/dryrun-toml.md
@@ -65,7 +65,7 @@ A profile is selected from:
 
 CLI flags `--db` and `--schema-file` override the resolved profile's matching fields for that invocation; they don't bypass the profile, so `database_id` and `project_id` are still taken from it. `--profile billing --db $OTHER` connects to `$OTHER` but keys snapshots under billing's `database_id`.
 
-Every DB command (`init`, `import`, `probe`, `dump-schema`, `lint`, `drift`, `stats apply`, all `snapshot` subcommands) accepts `--profile` and falls back to the resolved profile's `db_url` / `schema_file` when the corresponding CLI flag is omitted.
+Every DB command (`init`, `import`, `probe`, `dump-schema`, `lint`, `drift`, all `snapshot` subcommands) accepts `--profile` and falls back to the resolved profile's `db_url` / `schema_file` when the corresponding CLI flag is omitted.
 
 Relative paths in `schema_file` are resolved from the project root (the directory containing `dryrun.toml`). Absolute paths work too.
 
diff --git a/docs/multi-node-stats.md b/docs/multi-node-stats.md