From 4d0f5de5ba8a128cc4b2a811dabb78186aa629a8 Mon Sep 17 00:00:00 2001 From: bean1352 Date: Wed, 3 Jun 2026 10:19:02 +0200 Subject: [PATCH 1/8] add log reference page --- debugging/log-reference.mdx | 421 ++++++++++++++++++++ docs.json | 1 + maintenance-ops/monitoring-and-alerting.mdx | 2 + 3 files changed, 424 insertions(+) create mode 100644 debugging/log-reference.mdx diff --git a/debugging/log-reference.mdx b/debugging/log-reference.mdx new file mode 100644 index 00000000..951f36db --- /dev/null +++ b/debugging/log-reference.mdx @@ -0,0 +1,421 @@ +--- +title: "Log Reference" +description: "Reference for PowerSync Service log lines (Sync & API and Replicator), with field meanings, recovery behavior, and what each event indicates." +sidebarTitle: "Log Reference" +--- + +This reference explains the log lines you will see under **Sync & API logs** and **Replicator logs**. To view them, open the [PowerSync Dashboard](https://dashboard.powersync.com/), select your project and instance, and go to the **Logs** view (or use your own log aggregator if you are self-hosting). It covers what each event indicates, the structured fields it carries, and whether the service recovers on its own or operator action is needed. + +For error code lookups, see [Error Codes Reference](/debugging/error-codes). For how to view, filter, and export logs, see [Monitoring and Alerting](/maintenance-ops/monitoring-and-alerting#instance-logs). + +## How to read these logs + + +**Most error and warn lines describe transient events the service handles on its own.** PowerSync is designed so: + +- Per-session errors are isolated to the affected client. `Sync stream error` and `Streaming sync request failed` tear down that one stream; the client reconnects on its own, and other clients are unaffected. Service-wide failures (such as a pod restart) are the exception and do interrupt all connected clients until the service recovers. +- The replicator wraps source-database operations in retry loops. Transient connection drops, query timeouts, and keep-alive failures are caught, logged, and retried automatically. +- Some `[error]`-level lines (for example `Replication failed.`) trigger a pod restart, but the service supervisor brings the pod back up. Recovery is operator-action-required only when the underlying cause persists across restarts. + +The **Recovery** column on each table below states what happens after the line fires. The values are: + +- **Auto**: the service handles it; no client or operator action is required. +- **Client retries**: a single client is affected; its SDK reconnects. +- **Pod restart**: the pod restarts automatically. Persistent occurrences usually indicate a configuration or source-database problem. +- **Investigate**: the line on its own does not break anything, but a burst of them is signal worth looking into. +- **Operator action**: the service cannot recover without intervention. + + + +**MongoDB is the storage backend for all PowerSync Cloud instances.** Several log lines (for example `Successfully activated storage: mongodb.`, MongoDB-specific checksum errors like [PSYNC_S2403](/debugging/error-codes#psync-s23xx-sync-api-errors-mongodb-storage)) reflect that. If you are self-hosting, you may be on Postgres-backed storage instead, in which case those lines look different. + + + +**Specific log message wording is subject to change between versions.** The structured fields are more stable than the human-readable string. Filter on structured fields (`user_id`, `error_code`, `close_reason`, `replication_lag`) rather than substring-matching the message text where possible. + + + +**Each section below is illustrative, not exhaustive.** New log lines are added regularly, source-specific connectors (Postgres, MongoDB, MySQL, MSSQL, Convex) each emit their own variants, and the service produces additional lines at debug level that are not covered here. If you see a line that is not in this reference, it is most likely safe to read as additional context rather than as something this document forgot to flag. + + +--- + +## Sync & API logs + +Produced by the PowerSync Service API pods. These cover client sync sessions (WebSocket and HTTP), HTTP API requests, checkpoint calculations, and write checkpoint endpoints. + +### Common fields + +A few fields appear across many sync log entries. + +| Field | Meaning | +| -------------- | ------- | +| `user_id` | The authenticated user ID extracted from the JWT for the sync session. | +| `client_id` | The client device ID, assigned per-install by the SDK. Different from `user_id`: one user can have multiple clients. | +| `rid` | Request ID. The same `rid` is on both the start and complete event of a single session, which lets you correlate them. The prefix tells you the endpoint type: `s/` for the WebSocket sync endpoint, `h/` for HTTP API requests. | +| `user_agent` | The SDK and client platform reported by the client, for example `powersync-js/1.43.1 powersync-react-native react-native/0.77 ios/26.5`. | +| `app_metadata` | Custom metadata the client attached via `connect({ params })`. See [Custom Metadata in Sync Logs](/maintenance-ops/monitoring-and-alerting#custom-metadata-in-sync-logs). | +| `duration_ms` | On HTTP access log lines and on the `STREAM /sync/stream` line, the wall-clock duration of the request in milliseconds. For a sync session this is the time between the start and complete events. | +| `pod` | The Kubernetes pod that emitted the line. Useful for distinguishing per-pod behavior in multi-pod deployments. | +| `level` | Log severity: `info`, `warn`, `error`, `fatal`. | +| `timestamp` | ISO-8601 UTC timestamp when the line was emitted. | + +### Sync session lifecycle + +These events describe a client's sync session from start to finish. Searching for a `rid` returns both ends of one session. + +| Event | Level | Recovery | When it fires | +| -------------------------------- | ----- | --------------- | ------------- | +| `Sync stream started` | info | n/a | A client opened a sync connection and authenticated. Both the HTTP streaming endpoint and the WebSocket endpoint emit this. | +| `Sync stream complete` | info | n/a | A sync session ended. The fields on this line summarise what the session did. | +| `Sync stream error` | error | Client retries | The WebSocket sync stream threw and is being torn down. The SDK reconnects on its 5s retry. Other clients are unaffected. | +| `Streaming sync request failed` | error | Client retries | The HTTP streaming sync request threw and is being torn down. Same impact as `Sync stream error` but on the HTTP endpoint. | + +#### Fields on `Sync stream started` + +| Field | Meaning | +| --------------- | ------- | +| `user_id`, `client_id`, `rid`, `user_agent`, `app_metadata` | See common fields. | +| `client_params` | Parameter values the client passed to `connect()`. Subject to size limits, so very large values may be truncated. | + +#### Fields on `Sync stream complete` + +| Field | Meaning | +| --------------------- | ------- | +| `user_id`, `client_id`, `rid`, `user_agent`, `app_metadata` | See common fields. | +| `stream_ms` | Duration the session was open, in milliseconds. Same value as the `duration_ms` on the corresponding `STREAM /sync/stream` access line. | +| `operations_synced` | Total operations sent to this client in the session. | +| `operation_counts` | Per-type breakdown: `{ "put": N, "remove": N, "move": N, "clear": N }`. | +| `data_synced_bytes` | Plaintext (uncompressed) bytes synced to the client. | +| `data_sent_bytes` | Bytes actually sent on the wire. With compression enabled, this is smaller than `data_synced_bytes`. | +| `encoding` | Wire encoding used for the session, for example `permessage-deflate` (WebSocket compression). | +| `large_buckets` | Buckets that contributed an unusually large number of operations in this session, as an object mapping ` -> `. Useful for spotting outlier buckets that dominate a session's payload. | +| `close_reason` | Why the session ended. See [Close reasons](#close-reasons). | + +A short `stream_ms` paired with `operations_synced: 0` typically means the connection was opened and torn down before any sync line was sent. This is normal during reconnect storms or when the client cancels quickly. + +#### Fields on `Sync stream error` and `Streaming sync request failed` + +| Field | Meaning | +| ------------------ | ------- | +| `user_id`, `client_id`, `rid`, `user_agent` | See common fields. | +| `errorData` | Structured error: `{ code, name, description, details?, severity, status }`. The `code` is the PSYNC code, and `details` carries the underlying-driver message (for example, `"Connection to : interrupted due to server monitor timeout"`). | +| `is_service_error` | `true` when the error came from a recognised service code path; `false` for unexpected exceptions. | +| `stack` | Stack trace from the throw site. Useful for cross-referencing against the service source. | + +#### Close reasons + +The `close_reason` field on `Sync stream complete` takes one of these values: + +| Value | Recovery | Meaning | +| ------------------------ | --------------- | ------- | +| `client closing stream` | n/a | The client called `disconnect()` or its TCP connection closed cleanly. Normal for a healthy session that ends because the user closed the app. | +| `service closing stream` | n/a | The server closed the stream for non-error reasons, for example because the Sync Config changed and existing streams need to reconnect. | +| `stream error` | Client retries | An exception was thrown during the stream. See the preceding `Sync stream error` or `Streaming sync request failed` for details. | +| `process shutdown` | n/a | The API pod is shutting down (deploy, scale-down, restart). | +| `unknown` | Investigate | The session ended without a recognised cause. Treat as a bug signal if it appears frequently. | + +### Checkpoint events + +The PowerSync Service evaluates whether each connected client needs to be informed of new data by computing checkpoints. + +| Event | Level | Recovery | When it fires | +| ---------------------- | ----- | -------- | ------------- | +| `New checkpoint: ...` | info | n/a | The first checkpoint sent to a session, or after a parameter or sync-stream membership change forces a full checkpoint. | +| `Updated checkpoint: ...` | info | n/a | An incremental checkpoint (diff). | + +Example payloads: + +``` +New checkpoint: 1152792849 | write: 1585 | buckets: 2 | param_results: 2 ["1#global[]","1#user_bucket[81097]"] +``` + +``` +Updated checkpoint: 1152792918 | write: 376 | buckets: 2 | param_results: 0 | updated: ["1#user_bucket[72594]"] | removed: [] +``` + +| Component | Meaning | +| ---------------- | ------- | +| `1152792849` | The internal checkpoint op_id this session is at. | +| `write: 1585` | The latest write checkpoint applicable to this user, if any. `null` if there are no pending writes. See [Write checkpoints](#http-endpoint-and-write-checkpoint-events). | +| `buckets: 2` | The total number of buckets visible to this user given current parameters. The default cap is 1,000 (see [PSYNC_S2305](/debugging/error-codes#psync-s23xx-sync-api-errors)). | +| `param_results: 2` | The number of distinct parameter results evaluated for this user. Each parameter result corresponds to a row returned by a parameter query. | +| `[]` | The list of bucket descriptors, truncated to 20. Format is `#_[]`. | +| `updated: [...]` (Updated only) | Buckets whose data changed since the previous checkpoint. | +| `removed: [...]` (Updated only) | Buckets that left the user's set since the previous checkpoint. | + +Structured fields on the same line include `checkpoint` (the op_id), `buckets` (count), `parameter_query_results` (count), and on `Updated checkpoint` only, `updated` and `removed` counts as integers. These are easier to filter on than parsing the message body. + +For most steady-state sessions you will see a stream of `Updated checkpoint` lines with empty `updated`/`removed` arrays. Those represent rechecks where nothing changed for this user. + +### HTTP endpoint and write checkpoint events + +| Event | Level | Recovery | When it fires | +| ---------------------------------------------------- | ----- | --------------- | ------------- | +| `STREAM /sync/stream` | info | n/a | RSocket access log line for a WebSocket sync subscription. Emitted on stream close (so the `duration_ms` is the full session length). Carries `client_id`, `rid`, `duration_ms`, `user_id`, `user_agent`. | +| `GET /write-checkpoint2.json?client_id=` | info | n/a | A client requested a write checkpoint number to wait for. See [Write Checkpoints in the PowerSync protocol](/architecture/powersync-protocol#write-checkpoints). Carries `method`, `path`, `route`, `status`, `rid`, `duration_ms`. | +| `Write checkpoint for /: \| ` | info | n/a | Result of a write checkpoint computation. `` is the source database LSN this checkpoint corresponds to (Postgres LSN, MongoDB resume token, MSSQL LSN). | +| `Currently have N active WebSocket connection(s)` | info | n/a | Periodic count of open WebSocket connections on this pod. Per-pod, not per-instance. | +| ` ` with `status: 429, queue_overflow: true` | warn | Client retries | The request queue is full and the request is being rejected with HTTP 429. The client retries on its own. | + +#### Fields on HTTP access lines + +Every HTTP access log line carries these structured fields. They are not visible in the rendered message but are filterable. + +| Field | Meaning | +| ------------- | ------- | +| `method` | HTTP method, for example `GET`, `POST`, `STREAM`. | +| `path` | Full request path, including query string. | +| `route` | The matched route template (with placeholders), useful for grouping. | +| `status` | HTTP response status code. `200` for streaming sync (it streams as a 200 even if it later errors), `401` for auth failures, `429` for queue overflow. | +| `duration_ms` | Wall-clock duration in milliseconds. For streaming endpoints this is the full session length. | +| `rid` | Request ID. `h/` prefix for HTTP, `s/` for sync stream. | + +### Concurrency and queueing + +These appear when the API pod is near its configured concurrent-sync limit. + +| Event | Level | Recovery | When it fires | +| ------------------------------------------------------ | ----- | -------------- | ------------- | +| `Sync concurrency limit reached, waiting for lock` | info | Auto | A new sync session is queued behind currently-active sessions. It will start when a slot frees up. | +| `Got sync lock. Slots available: N` | info | n/a | The queued session acquired a slot. | +| `Releasing sync lock` | info | n/a | The session released its slot, either because it finished or because it errored. | + +Frequent `Sync concurrency limit reached` typically means traffic exceeds the pod's headroom, or the per-pod cap is set conservatively. See [Production Readiness Guide](/maintenance-ops/production-readiness-guide). + +### Replication propagation (observed from the API pod) + +| Event | Level | Recovery | When it fires | +| ------------------------------------------------ | ----- | -------- | ------------- | +| `Waiting for LSN checkpoint: ` | info | n/a | The write checkpoint endpoint is waiting for the replicator to catch up to ``. | +| `Got write checkpoint: : ` | info | n/a | The wait completed; the checkpoint number is returned to the client. | + +A long delay between these two lines means the replicator is behind. If sustained, see [Replication Lag](/maintenance-ops/replication-lag). + +### Common errors in Sync & API logs + +| Code | What it means | Recovery | +| ------------- | ------------- | -------- | +| `PSYNC_S2001` | Generic internal server error wrapper. The real cause is in the nested error. | Depends on cause | +| `PSYNC_S2103` | JWT has expired. The client should be refreshing credentials. | Client retries | +| `PSYNC_S2305` | Too many buckets for the user. Sync for this user is blocked until either the data or the Sync Config changes. See [Troubleshooting: Too Many Buckets](/debugging/troubleshooting#too-many-buckets-psync_s2305). | Operator action (per user) | +| `PSYNC_S2401` | Checksum mismatch. The client deletes the affected bucket and re-fetches. No data loss. | Auto | +| `PSYNC_S2403` | Query timed out while reading checksums (MongoDB storage). The Mongo aggregate that builds the checkpoint exceeded its timeout (50s). | Client retries | + +For the full code list, see [Error Codes Reference](/debugging/error-codes). + +--- + +## Replicator logs + +Produced by the replication pod that pulls changes from your source database (Postgres, MongoDB, MySQL, or MSSQL) and writes them into the PowerSync storage layer. + + +Most replicator log lines are prefixed with the Sync Config identifier in brackets, for example `[powersync_69b93ff7358aa0646ff0dbca_1_e6dc]`. The format is `[powersync___]`. The `group_id` increments when the Sync Config changes (a new version means a new group). Use this prefix to scope log queries to a particular Sync Config iteration. + + +### Startup and lifecycle + +Fired when a replication pod starts, stops, or restarts. Outside of a deploy, these usually mean the pod was killed and restarted by the supervisor. + +| Event | Level | Recovery | When it fires | +| --------------------------------------------------------- | ----- | -------- | ------------- | +| `Starting system` | info | n/a | First line after pod boot. | +| `Starting Storage Engine...` / `Successfully started Storage Engine.` | info | n/a | The bucket storage backend is being connected. The `Successfully activated storage: ` line in between names the backend (typically `mongodb` for Cloud). | +| `Starting Replication Engine...` / `Successfully started Replication Engine.` | info | n/a | The replication subsystem is initialising. | +| `Starting Replicator: ` | info | n/a | One per source database connection (for example `postgresql-postgres`). | +| `Starting Router Engine...` / `Running on port 8080` / `Successfully started Router Engine.` | info | n/a | The replicator's HTTP health and metrics endpoints are up. The replicator does not serve sync traffic. | +| `Loaded sync config` (or `Loaded sync rules`) | info | n/a | The current Sync Config was loaded from storage. Both wordings exist depending on service version. | +| `No sync streams or rules configured - configure via API` | info | n/a | No Sync Config is deployed yet. | +| `Locked replication stream for processing` | info | n/a | This pod has acquired the replication lock for a Sync Config iteration. Only one pod can replicate a given iteration at a time. | +| `Sync rules N has been locked for replication with lock ID ` | info | n/a | Companion log line emitted after `Locked replication stream for processing`. The lock ID is the 8-byte hex identifier this pod is holding. Useful for confirming which pod owns the active lock. | +| `[PSYNC_S1003] Sync rules N have been locked by another process for replication, expiring at .` | info | Auto | Pod tried to acquire the lock but another pod (or a prior incarnation of this pod) still holds it. The lock has a 60s TTL so this typically clears within a minute. Logged at `info` (not `warn`) precisely because it is expected behavior. | +| `Successfully registered Replicator with ReplicationEngine` | info | n/a | Per-connector registration event during startup. | +| `System started` | info | n/a | All subsystems are up. The pod is actively replicating. | + +### Sync Config changes + +| Event | Level | Recovery | When it fires | +| ------------------------------------------------------------------------------ | ----- | -------- | ------------- | +| `Failed to update sync config` | error | Pod restart | Sync Config update validation failed. The pod exits and supervisor restarts it. | +| `Stopping replication job for sync rule iteration: N` | info | n/a | An older Sync Config iteration is being stopped (usually because a new iteration replaced it). | + +### Shutdown sequence + +Triggered by SIGTERM (deploys, scale-down) or by a fatal error. A clean shutdown ends with `Exiting`. Lines you will see in order: + +``` +Stopping Replicator: +Shutting down Replication Engine... +Terminating gracefully ... +[] Stopping replication job for sync rule iteration: N +Shutting down Storage Engine... +Shutting down HTTP server... +HTTP server stopped +Closing N streams. +Running cleanup. +Shutting down Router Engine... +Successfully shut down Router Engine. +Successfully shut down Replication Engine. + connection Managers shutdown completed. +Shutting down connection Managers... +Exiting +Successfully shut down Storage Engine. +``` + +If you see `Replication failed.` or `Fatal replication error` before the shutdown lines, the pod is being restarted by the supervisor after a crash. + +### Initial snapshot + +The first time a Sync Config iteration is set up, every table covered by it is copied over. The exact wording differs by source database. + +| Event | Source DBs | Level | Recovery | When it fires | +| ------------------------------------------------------------------ | ---------- | ----- | -------- | ------------- | +| `To replicate: ./` | Postgres, MongoDB | info | n/a | Logged once per table before the snapshot starts. The fraction is replicated rows over estimated total. | +| `Replicating .
/ - resumable` | Postgres | info | n/a | Table snapshot is starting or resuming. `resumable` means the snapshot is chunked on the table's single-column primary key, so it can pick up after an interruption. | +| `Replicating .
/ - not resumable` | Postgres | info | n/a | The table has a composite primary key or no usable primary key, so an interrupted snapshot has to restart from the beginning. | +| `Replicating .
/ - resuming from > ` | Postgres | info | n/a | An interrupted resumable snapshot is picking up from a stored cursor. | +| `Replicating .
/` | Postgres, MongoDB | info | n/a | Progress update during a snapshot, logged per chunk flush. | +| `Snapshotting .
/` | MSSQL | info | n/a | MSSQL CDC equivalent of `Replicating`. | +| `Skipping
- snapshot already done` (or `Skipping table [
] - snapshot already done.`) | All connectors | info | n/a | Table has already been fully snapshotted for the current iteration. Logged on startup when resuming. | +| `Initial replication already done` | All connectors | info | n/a | All tables in the iteration are already snapshotted. The replicator goes straight to streaming. | + +### Snapshot completion and replication slot management + +| Event | Source DBs | Level | Recovery | When it fires | +| -------------------------------------------------- | ---------- | ----- | -------- | ------------- | +| `Created replication slot ` | Postgres | info | n/a | A new logical replication slot was created. Happens on first replication or after the previous slot was dropped (Sync Config changes, slot health check failure, `pg_drop_replication_slot`, `pg_upgrade`). See [Postgres Maintenance](/configuration/source-db/postgres-maintenance). | +| `Cleaning up Postgres replication slot: ...` | Postgres | info | n/a | A slot is being dropped, usually after Sync Config changes. The next replication run creates a fresh slot. | +| `WAL budget: no limit configured (max_slot_wal_keep_size is unlimited).` | Postgres | info | n/a | Source Postgres has no slot retention limit. Normal. | +| `WAL budget: ...` budget-low | Postgres | warn | Investigate | `max_slot_wal_keep_size` is set and the slot is approaching it. If the slot is invalidated, replication has to restart from snapshot. See [PSYNC_S1146](/debugging/error-codes#psync-s1xxx-replication-issues). | +| `Marking snapshot at ` | MongoDB, MSSQL | info | n/a | The snapshot transaction captured all tables at ``. Streaming will resume from this point. | +| `Resuming snapshot at ` | MongoDB, MSSQL | info | n/a | A previously interrupted snapshot is resuming from a stored LSN. | +| `Flushed snapshot at ` | MongoDB | info | n/a | The MongoDB snapshotter finished writing a collection up to the resume token ``. | + +### Streaming replication (steady state) + +Once the snapshot is done, the replicator switches to streaming. These are the bulk of replicator logs in normal operation. + +| Event | Source DBs | Level | Recovery | When it fires | +| ------------------------------------------------------------------------------ | ---------- | ----- | -------- | ------------- | +| `[] Replicating op N ` | Postgres | info | n/a | Periodic progress log during initial sync streaming, every 100 operations. | +| `[] Flushed N + M + K updates, Xkb in Yms. Last op_id: Z. Replication lag: Ls` | All connectors | info | n/a | A batch of changes was flushed to bucket storage. Most common steady-state line. See fields below. | +| `[] Processed batch of N changes / X bytes in Yms` | MongoDB | info | n/a | A batch of MongoDB change-stream events was consumed from the source. | +| `Resume streaming at / \| Token age: Ns` | MongoDB | info | Investigate if Token age is large | The change stream resumed from a stored token. `Token age` approaching MongoDB's oplog retention window means the slot is at risk. | +| `New collection: .` | MongoDB | info | n/a | A new matching collection was discovered after replication started. It will be snapshotted before streaming resumes. | +| `Enabled postImages on .` | MongoDB | info | n/a | The replicator enabled `changeStreamPreAndPostImages` on a collection that needs it for replica identity. Done once per collection. | + +#### Fields on `Flushed N + M + K updates, Xkb in Yms` + +This is the single most important steady-state line. Components: + +| Component | Meaning | +| -------------------------- | ------- | +| `N + M + K updates` | `N` = bucket data rows written. `M` = bucket parameter rows written. `K` = current_data records updated (source-row mirror used for parameter recalculation). | +| `Xkb` | Uncompressed size of the bucket data written in this flush. | +| `Yms` | Wall-clock time the flush took. Sustained values above ~500ms indicate storage-side pressure. | +| `Last op_id: Z` | The highest internal op_id assigned in this flush. Monotonically increasing across all flushes for the iteration. | +| `Replication lag: Ls` | Seconds between when the oldest uncommitted change in this flush was made on the source database, and when it was flushed to storage. Zero or one second is healthy. Sustained values above 60s warrant investigation. See [Replication Lag](/maintenance-ops/replication-lag). | + +If `Replication lag` is absent, the flush had no source-side change timestamps to derive it from (typically only true for snapshot flushes, never steady-state). + +The same numbers are also available as a single structured `flushed` object on the log line, easier to filter on than parsing the message: + +```json +"flushed": { + "bucket_data_count": 14, + "parameter_data_count": 0, + "current_data_count": 14, + "duration": 5, + "size": 7498, + "replication_lag_seconds": 0 +} +``` + +| Sub-field | Maps to message component | +| -------------------------- | ------------------------- | +| `bucket_data_count` | `N` | +| `parameter_data_count` | `M` | +| `current_data_count` | `K` | +| `duration` | `Yms` | +| `size` | Bytes (the `Xkb` value is `size / 1024`, rounded). | +| `replication_lag_seconds` | `Ls` | + +The `Processed batch of N changes / X bytes in Yms` line (MongoDB) has its own structured fields: `count`, `bytes`, `duration`. These are useful for filtering on slow batches. + +### Schema changes + +| Event | Source DBs | Level | Recovery | When it fires | +| ------------------------------------------------------------------------------------ | ---------- | ----- | -------- | ------------- | +| `Processing schema change for table [.
]` | MySQL | info | n/a | A DDL event from the source binlog is being applied to the replicator's schema view. | +| `Successfully processed N schema change(s).` | MySQL | info | n/a | All DDL events from one binlog chunk were applied. | +| `Table .
has been dropped. Handling schema change...` | MSSQL | info | n/a | The MSSQL CDC poller noticed a tracked table was dropped or its capture instance was removed. Handled by halting replication for that table or re-snapshotting, depending on config. | +| `Unable to retrieve schema changes for capture instance: [].` | MSSQL | warn | Auto | Schema delta query failed. Usually a transient permissions or connection issue. | + +### CDC polling (MSSQL only) + +The MSSQL connector polls CDC capture tables rather than streaming a logical replication slot. + +| Event | Level | Recovery | When it fires | +| -------------------------------------------------------------- | ----- | -------- | ------------- | +| `CDC polling started with interval of Nms...` | info | n/a | Replication is starting. `N` is the polling interval in ms (default 1000). | +| `Polling a maximum of N transactions per polling cycle.` | info | n/a | The per-cycle batch limit. | +| `Polling bounds are -> spanning N transaction(s).` | info | n/a | One polling cycle's range. | +| `CDC polling stopped...` | info | n/a | The polling loop is shutting down. | + +### Replication health + +These are the lines most likely to look alarming. **Almost all of them are non-fatal and the replicator handles them on its own.** + +| Event | Source DBs | Level | Recovery | What it indicates | +| -------------------------------------------------------------- | ---------- | ----- | -------- | ----------------- | +| `Failed to ping connection` | Postgres | warn | Auto | A pooled connection failed its health check. Dropped and replaced. | +| `KeepAlive failed, unable to post to WAL` | Postgres | warn | Auto | The replicator could not write a keep-alive heartbeat to the source WAL. Usually transient (source briefly read-only, brief network blip). Sustained occurrences indicate credentials or permissions. | +| `KeepAlive failed, unable to write an update to the
` | MSSQL | warn | Auto | Same as above for MSSQL. | +| `Could not check RLS access for .
` | Postgres | warn | Investigate | The replicator could not verify whether its user bypasses Row-Level Security. If RLS is in fact enforced, you will see missing data downstream. | +| `Could not query max_slot_wal_keep_size` | Postgres | warn | Investigate | The Postgres user lacks permission to read `pg_settings`. Cosmetic but indicates a permission gap. | +| `Query error, retrying..` | MSSQL | warn | Auto | A CDC query failed and is being retried. | +| `Replication error` | All | warn | Auto | A non-fatal error in the replication loop. The replicator retries. | +| `Replication failed.` | All | error | Pod restart | Top-level error from the replication job loop. The pod is being restarted by the supervisor. Persistent occurrences usually carry a [PSYNC code](/debugging/error-codes#psync-s1xxx-replication-issues) and indicate the underlying cause needs intervention. | +| `Fatal replication error` | All | error | Pod restart | Same as above. | + +### Common errors in Replicator logs + +| Code | What it means | Recovery | +| ------------- | ------------- | -------- | +| `PSYNC_S1003` | Replication stream is locked by another process. Normal for up to 1 minute after a deploy or after a process was killed; the lock has a 60s TTL. | Auto (within 1 minute) | +| `PSYNC_S1004` | JSON nested object depth exceeds 20. Affects replication of that row. | Operator action (data fix) | +| `PSYNC_S1146` | Postgres replication slot was invalidated (source dropped or exceeded `max_slot_wal_keep_size`). Replication restarts from snapshot. | Auto (re-snapshot) | +| `PSYNC_S1344` | MongoDB change stream invalidated. Replication restarts with a new change stream. | Auto | +| `PSYNC_S1345` | MongoDB change stream timed out reading from the source. | Auto (retry) | + +For the full code list and additional connector-specific codes, see [Error Codes Reference](/debugging/error-codes). + +--- + +## Filtering tips + +In the Dashboard log view, the search bar supports both free-text and `alias:value` filters. Full syntax in [Monitoring and Alerting: Search and Filter Logs](/maintenance-ops/monitoring-and-alerting#search-and-filter-logs). + +### Sync & API logs + +``` +user_id: # one user +rid: # both ends of one session +close_reason:"stream error" # errored sessions +error:PSYNC_S2403 # specific error code +user_id: close_reason:"stream error" # errored sessions for one user +"Sync stream complete" operations_synced:0 # sessions that synced nothing +``` + +### Replicator logs + +``` +lag:>=10 # flushes with significant lag +"Replicating" "public.orders" # snapshot progress for a specific table +"Replication error" # non-fatal retries +"Replication failed" # pod restarts +"Created replication slot" # Postgres slot recreation events +"Processing schema change" # MySQL DDL +"System started" # pod restart points +``` diff --git a/docs.json b/docs.json index c81b5787..49a347f0 100644 --- a/docs.json +++ b/docs.json @@ -413,6 +413,7 @@ "pages": [ "debugging/troubleshooting", "debugging/error-codes", + "debugging/log-reference", "tools/diagnostics-client", "maintenance-ops/client-database-diagnostics" ] diff --git a/maintenance-ops/monitoring-and-alerting.mdx b/maintenance-ops/monitoring-and-alerting.mdx index 515481f4..15bbae3a 100644 --- a/maintenance-ops/monitoring-and-alerting.mdx +++ b/maintenance-ops/monitoring-and-alerting.mdx @@ -78,6 +78,8 @@ You can review logs for your PowerSync instance(s) to troubleshoot replication o * **Replicator logs**: Reflect replication activity from your source database to the PowerSync Service. +See the [Log Reference](/debugging/log-reference) for what each log line means and which ones the service recovers from on its own. + **Availability** From 62b6ea613863aebbe57c830cd8c3103c4308ed0d Mon Sep 17 00:00:00 2001 From: Benita Volkmann Date: Thu, 4 Jun 2026 15:17:58 +0200 Subject: [PATCH 2/8] Restructure the page to make it more digestable --- .claude/CLAUDE.md | 2 + .claude/skills/doc-author/SKILL.md | 2 + .../config/vocabularies/PowerSync/accept.txt | 8 + debugging/log-reference.mdx | 619 ++++++++++-------- 4 files changed, 347 insertions(+), 284 deletions(-) diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 1f0888b0..7d3a7969 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -151,6 +151,8 @@ Always use the left column. Never use the right. | Sync Rules | sync rules | | Sync Streams | sync streams | +Capitalize **Service** whenever it refers to the PowerSync Service, including when it stands alone without the "PowerSync" prefix (for example, "the Service recovers on its own" or "depending on Service version"). Lowercase "service" alone is ambiguous and can refer to many things. Leave casing unchanged inside literal log output, code, field names, or identifiers (for example, the `service closing stream` log value or the `is_service_error` field). + ## Mintlify Components ### When to Use What diff --git a/.claude/skills/doc-author/SKILL.md b/.claude/skills/doc-author/SKILL.md index 9dfd5e7b..25cb4011 100644 --- a/.claude/skills/doc-author/SKILL.md +++ b/.claude/skills/doc-author/SKILL.md @@ -81,6 +81,8 @@ The `description` and opening paragraph must not duplicate each other — the op | Sync Rules | sync rules | | Sync Streams | sync streams | +Capitalize **Service** whenever it refers to the PowerSync Service, including when it stands alone without the "PowerSync" prefix (for example, "the Service recovers on its own"). Lowercase "service" alone is ambiguous. Leave casing unchanged inside literal log output, code, field names, or identifiers. + ### Tone - No promotional language: "powerful," "seamless," "breathtaking," "stands as a testament" diff --git a/.github/vale/config/vocabularies/PowerSync/accept.txt b/.github/vale/config/vocabularies/PowerSync/accept.txt index 47081d17..f40f95ee 100644 --- a/.github/vale/config/vocabularies/PowerSync/accept.txt +++ b/.github/vale/config/vocabularies/PowerSync/accept.txt @@ -43,6 +43,7 @@ deserializer deserializers dev dotnet +errored failover geo geospatial @@ -67,6 +68,7 @@ nameservers namespace nullable offline-first +oplog orgs pdf performant @@ -85,6 +87,7 @@ realtime repo repos replication +replicator requeue requeues resync @@ -106,6 +109,7 @@ sidecar signups slidable snapshotted +snapshotter subqueries subquery subnet @@ -157,6 +161,7 @@ DAO DAOs DAU DAUs +DBs ECDSA E2EE ECS @@ -209,7 +214,9 @@ YAML bytea cacert client_auth +close_reason clusterTime +current_data datetime DBPointer inet @@ -218,6 +225,7 @@ json_extract jwks_uri listitem macaddr +op_id plpgsql replicaset subsec diff --git a/debugging/log-reference.mdx b/debugging/log-reference.mdx index 951f36db..cf0f7874 100644 --- a/debugging/log-reference.mdx +++ b/debugging/log-reference.mdx @@ -1,128 +1,258 @@ --- title: "Log Reference" -description: "Reference for PowerSync Service log lines (Sync & API and Replicator), with field meanings, recovery behavior, and what each event indicates." +description: "Reference for PowerSync Service log lines: what the structured fields mean, what each event indicates, and whether the Service recovers on its own." sidebarTitle: "Log Reference" --- -This reference explains the log lines you will see under **Sync & API logs** and **Replicator logs**. To view them, open the [PowerSync Dashboard](https://dashboard.powersync.com/), select your project and instance, and go to the **Logs** view (or use your own log aggregator if you are self-hosting). It covers what each event indicates, the structured fields it carries, and whether the service recovers on its own or operator action is needed. +This reference explains the log lines under **Sync & API logs** and **Replicator logs**, starting with the most common structured fields. To view them, open the [PowerSync Dashboard](https://dashboard.powersync.com/), select your project and instance, and go to the **Logs** view. If you self-host, use your own log aggregator. For error code lookups, see [Error Codes Reference](/debugging/error-codes). For how to view, filter, and export logs, see [Monitoring and Alerting](/maintenance-ops/monitoring-and-alerting#instance-logs). -## How to read these logs +## Field Glossary + +Most log lines carry structured fields alongside their human-readable message. Because message wording can change between Service versions, filter on these fields rather than the message text. The table below covers the fields on most lines, but a line can carry many more (especially with metadata expanded), and each source connector adds its own. Treat fields not listed here as additional context, and see the [Detailed Event Reference](#detailed-event-reference) for fields specific to individual events. + +| Field | Meaning | +| ----- | ------- | +| `level` | Log severity: `info`, `warn`, `error`, or `fatal`. | +| `timestamp` | ISO-8601 UTC time the line was emitted. | +| `rid` | Request ID. The same `rid` appears on both the start and complete line of a session, so you can correlate them. The prefix marks the endpoint type: `s/` for the WebSocket sync stream, `h/` for HTTP API requests. | +| `user_id` | The authenticated user ID extracted from the client's JWT. | +| `client_id` | The client device ID, assigned per install by the SDK. One user can have many clients, so this differs from `user_id`. | +| `user_agent` | The SDK and platform the client reported, for example `powersync-js/1.43.1 powersync-react-native react-native/0.77 ios/26.5`. | +| `app_metadata` | Custom metadata the client attached via `connect({ params })`. See [Custom Metadata in Sync Logs](/maintenance-ops/monitoring-and-alerting#custom-metadata-in-sync-logs). | +| `client_params` | The parameter values the client passed to `connect()`. Large values may be truncated. | +| `pod` | The Kubernetes pod that emitted the line. Distinguishes pods in a multi-pod deployment. | +| `duration_ms` | Wall-clock duration of the request in milliseconds. On streaming access log lines this is the full session length. | +| `stream_ms` | How long a sync session stayed open, in milliseconds. Same value as the `duration_ms` on the session's streaming access log line. | +| `operations_synced` | Total number of operations sent to the client during the session. | +| `operation_counts` | Per-type breakdown of operations: `{ "put": N, "remove": N, "move": N, "clear": N }`. | +| `data_synced_bytes` | Uncompressed (plaintext) bytes synced to the client. | +| `data_sent_bytes` | Bytes actually sent over the wire. Smaller than `data_synced_bytes` when compression is enabled. | +| `close_reason` | Why a sync session ended. See [Close Reasons](#close-reasons). | +| `status` | HTTP response status code. `200` for streaming sync (it streams as `200` even if it later errors), `401` for auth failures, `429` for queue overflow. | +| `error_code` / `errorData` | The structured error on a failed line. `errorData` is `{ code, name, description, details?, severity, status }`, where `code` is the [PSYNC error code](/debugging/error-codes) and `details` carries the underlying-driver message. | +| `lsn` | The source database position a checkpoint or flush corresponds to: a Postgres LSN, a MongoDB resume token, or an MSSQL LSN. | +| `replication_lag` / `replication_lag_seconds` | Seconds between when a change was committed on the source database and when the replicator flushed it to storage. Zero or one second is healthy. See [Replication Lag](/maintenance-ops/replication-lag). | +| `checkpoint` | The internal checkpoint op_id a sync session has reached. | + +## Severity and Recovery + +A line's severity (`level`) does not necessarily tell you whether you need to act. Most `error` and `warn` lines describe transient events that the Service recovers from on its own: + +- Per-session errors stay isolated to the affected client. `Sync stream error` and `Streaming sync request failed` tear down a single stream, and that client reconnects on its own. Service-wide failures such as a pod restart are the exception. They interrupt all connected clients until the Service recovers. +- The replicator retries source-database operations automatically, so transient connection drops, query timeouts, and keep-alive failures are caught, logged, and retried. +- Some `error` lines, such as `Replication failed.`, trigger a pod restart that the supervisor handles automatically. Operator action is needed only when the cause persists across restarts. + +To judge whether a line needs action, each table below includes a **Recovery** column: + +| Recovery | What it means | +| -------- | ------------- | +| Auto | The Service handles it. No client or operator action required. | +| Client retries | A single client is affected. Its SDK reconnects. | +| Pod restart | The pod restarts automatically. Persistent occurrences usually point to a configuration or source-database problem. | +| Investigate | The line alone breaks nothing, but a burst of them is worth looking into. | +| Operator action | The Service cannot recover without intervention. | -**Most error and warn lines describe transient events the service handles on its own.** PowerSync is designed so: +MongoDB is the storage backend for all PowerSync Cloud instances, which some lines reflect (for example `Successfully activated storage: mongodb.` or checksum errors like [PSYNC_S2403](/debugging/error-codes#psync-s23xx-sync-api-errors-mongodb-storage)). If you self-host on Postgres-backed storage, those lines look different. + -- Per-session errors are isolated to the affected client. `Sync stream error` and `Streaming sync request failed` tear down that one stream; the client reconnects on its own, and other clients are unaffected. Service-wide failures (such as a pod restart) are the exception and do interrupt all connected clients until the service recovers. -- The replicator wraps source-database operations in retry loops. Transient connection drops, query timeouts, and keep-alive failures are caught, logged, and retried automatically. -- Some `[error]`-level lines (for example `Replication failed.`) trigger a pod restart, but the service supervisor brings the pod back up. Recovery is operator-action-required only when the underlying cause persists across restarts. +## Sync & API Logs -The **Recovery** column on each table below states what happens after the line fires. The values are: +Produced by the PowerSync Service API pods. These cover client sync sessions (WebSocket and HTTP), HTTP API requests, checkpoint calculations, and write checkpoint endpoints. -- **Auto**: the service handles it; no client or operator action is required. -- **Client retries**: a single client is affected; its SDK reconnects. -- **Pod restart**: the pod restarts automatically. Persistent occurrences usually indicate a configuration or source-database problem. -- **Investigate**: the line on its own does not break anything, but a burst of them is signal worth looking into. -- **Operator action**: the service cannot recover without intervention. - +### Sync Session Lifecycle - -**MongoDB is the storage backend for all PowerSync Cloud instances.** Several log lines (for example `Successfully activated storage: mongodb.`, MongoDB-specific checksum errors like [PSYNC_S2403](/debugging/error-codes#psync-s23xx-sync-api-errors-mongodb-storage)) reflect that. If you are self-hosting, you may be on Postgres-backed storage instead, in which case those lines look different. - +These events trace a client's sync session from start to finish. Searching for a `rid` returns both ends. - -**Specific log message wording is subject to change between versions.** The structured fields are more stable than the human-readable string. Filter on structured fields (`user_id`, `error_code`, `close_reason`, `replication_lag`) rather than substring-matching the message text where possible. - +| Event | Level | Recovery | When it fires | +| ----- | ----- | -------- | ------------- | +| `Sync stream started` | info | n/a | A client opened a sync connection and authenticated. Both the HTTP streaming endpoint and the WebSocket endpoint emit this. | +| `Sync stream complete` | info | n/a | A sync session ended. The fields on this line summarize what the session did (`operations_synced`, `data_synced_bytes`, `close_reason`). | +| `Sync stream error` | error | Client retries | The WebSocket sync stream threw an exception and is being torn down. The SDK reconnects on its 5s retry. Other clients are unaffected. | +| `Streaming sync request failed` | error | Client retries | The HTTP streaming sync request threw an exception and is being torn down. Same impact as `Sync stream error`, but on the HTTP endpoint. | + +A short `stream_ms` paired with `operations_synced: 0` usually means the connection was opened and torn down before any sync line was sent. This is normal when many clients reconnect at once, or when a client cancels quickly. + +For the full field breakdown of each line, see [Sync session field details](#detailed-event-reference) below. + +#### Close Reasons + +The `close_reason` field on `Sync stream complete` takes one of these values: + +| Value | Recovery | Meaning | +| ----- | -------- | ------- | +| `client closing stream` | n/a | The client called `disconnect()` or its TCP connection closed cleanly. Normal for a session that ends because the user closed the app. | +| `service closing stream` | n/a | The server closed the stream for non-error reasons, for example because the Sync Config changed and existing streams need to reconnect. | +| `stream error` | Client retries | An exception was thrown during the stream. See the preceding `Sync stream error` or `Streaming sync request failed` for details. | +| `process shutdown` | n/a | The API pod is shutting down (deploy, scale-down, restart). | +| `unknown` | Investigate | The session ended without a recognized cause. Treat as a likely bug if it appears frequently. | + +### Checkpoint Events + +The PowerSync Service computes checkpoints to decide whether each connected client needs new data. In steady state you mostly see `Updated checkpoint` lines with empty `updated`/`removed` arrays, which are rechecks where nothing changed for that user. + +| Event | Level | Recovery | When it fires | +| ----- | ----- | -------- | ------------- | +| `New checkpoint: ...` | info | n/a | The first checkpoint sent to a session, or after a parameter or sync-stream membership change forces a full checkpoint. | +| `Updated checkpoint: ...` | info | n/a | An incremental checkpoint (diff). | + +``` +New checkpoint: 1152792849 | write: 1585 | buckets: 2 | param_results: 2 ["1#global[]","1#user_bucket[81097]"] +``` + +For the full payload breakdown, see [Checkpoint payload anatomy](#detailed-event-reference) below. + +### Common Errors in Sync & API Logs + +| Code | What it means | Recovery | +| ---- | ------------- | -------- | +| `PSYNC_S2001` | Generic internal server error wrapper. The real cause is in the nested error. | Depends on cause | +| `PSYNC_S2103` | JWT has expired. The client should be refreshing credentials. | Client retries | +| `PSYNC_S2305` | Too many buckets for the user. Sync for this user is blocked until either the data or the Sync Config changes. See [Troubleshooting: Too Many Buckets](/debugging/troubleshooting#too-many-buckets-psync_s2305). | Operator action (per user) | +| `PSYNC_S2401` | Checksum mismatch. The client deletes the affected bucket and re-fetches. No data loss. | Auto | +| `PSYNC_S2403` | Query timed out while reading checksums (MongoDB storage). The Mongo aggregate that builds the checkpoint exceeded its timeout (50s). | Client retries | + +For the full code list, see [Error Codes Reference](/debugging/error-codes). + +## Replicator Logs + +Produced by the replication pod that pulls changes from your source database (Postgres, MongoDB, MySQL, or MSSQL) and writes them into the PowerSync storage layer. -**Each section below is illustrative, not exhaustive.** New log lines are added regularly, source-specific connectors (Postgres, MongoDB, MySQL, MSSQL, Convex) each emit their own variants, and the service produces additional lines at debug level that are not covered here. If you see a line that is not in this reference, it is most likely safe to read as additional context rather than as something this document forgot to flag. +Most replicator lines are prefixed with the Sync Config identifier in brackets, for example `[powersync_69b93ff7358aa0646ff0dbca_1_e6dc]`. The format is `[powersync___]`. The `group_id` increments when the Sync Config changes, so a new version means a new group. Use this prefix to scope log queries to a particular Sync Config iteration. ---- +### Streaming Replication (Steady State) -## Sync & API logs +Once the initial snapshot is done, the replicator streams changes. The `Flushed` line is the most important steady-state line and the bulk of normal replicator logs. -Produced by the PowerSync Service API pods. These cover client sync sessions (WebSocket and HTTP), HTTP API requests, checkpoint calculations, and write checkpoint endpoints. +``` +[] Flushed N + M + K updates, Xkb in Yms. Last op_id: Z. Replication lag: Ls +``` -### Common fields +| Component | Meaning | +| --------- | ------- | +| `N + M + K updates` | `N` = bucket data rows written. `M` = bucket parameter rows written. `K` = current_data records updated (the source-row mirror used for parameter recalculation). | +| `Xkb` | Uncompressed size of the bucket data written in this flush. | +| `Yms` | Wall-clock time the flush took. Sustained values above ~500ms indicate the storage layer is under load. | +| `Last op_id: Z` | The highest internal op_id assigned in this flush. Increases monotonically across all flushes for the iteration. | +| `Replication lag: Ls` | Seconds between when the oldest uncommitted change in this flush was made on the source and when it was flushed to storage. Zero or one second is healthy. Sustained values above 60s warrant investigation. See [Replication Lag](/maintenance-ops/replication-lag). | -A few fields appear across many sync log entries. +The same numbers appear as a structured `flushed` object on the line, easier to filter on than the message text: -| Field | Meaning | -| -------------- | ------- | -| `user_id` | The authenticated user ID extracted from the JWT for the sync session. | -| `client_id` | The client device ID, assigned per-install by the SDK. Different from `user_id`: one user can have multiple clients. | -| `rid` | Request ID. The same `rid` is on both the start and complete event of a single session, which lets you correlate them. The prefix tells you the endpoint type: `s/` for the WebSocket sync endpoint, `h/` for HTTP API requests. | -| `user_agent` | The SDK and client platform reported by the client, for example `powersync-js/1.43.1 powersync-react-native react-native/0.77 ios/26.5`. | -| `app_metadata` | Custom metadata the client attached via `connect({ params })`. See [Custom Metadata in Sync Logs](/maintenance-ops/monitoring-and-alerting#custom-metadata-in-sync-logs). | -| `duration_ms` | On HTTP access log lines and on the `STREAM /sync/stream` line, the wall-clock duration of the request in milliseconds. For a sync session this is the time between the start and complete events. | -| `pod` | The Kubernetes pod that emitted the line. Useful for distinguishing per-pod behavior in multi-pod deployments. | -| `level` | Log severity: `info`, `warn`, `error`, `fatal`. | -| `timestamp` | ISO-8601 UTC timestamp when the line was emitted. | +```json +"flushed": { + "bucket_data_count": 14, + "parameter_data_count": 0, + "current_data_count": 14, + "duration": 5, + "size": 7498, + "replication_lag_seconds": 0 +} +``` -### Sync session lifecycle +If `Replication lag` is absent, the flush had no source-side change timestamps to derive it from. This happens only on snapshot flushes, never steady-state. -These events describe a client's sync session from start to finish. Searching for a `rid` returns both ends of one session. +### Replication Health -| Event | Level | Recovery | When it fires | -| -------------------------------- | ----- | --------------- | ------------- | -| `Sync stream started` | info | n/a | A client opened a sync connection and authenticated. Both the HTTP streaming endpoint and the WebSocket endpoint emit this. | -| `Sync stream complete` | info | n/a | A sync session ended. The fields on this line summarise what the session did. | -| `Sync stream error` | error | Client retries | The WebSocket sync stream threw and is being torn down. The SDK reconnects on its 5s retry. Other clients are unaffected. | -| `Streaming sync request failed` | error | Client retries | The HTTP streaming sync request threw and is being torn down. Same impact as `Sync stream error` but on the HTTP endpoint. | +These lines report problems in the replication loop. Almost all are non-fatal, and the replicator handles them on its own. -#### Fields on `Sync stream started` +| Event | Source DBs | Level | Recovery | What it indicates | +| ----- | ---------- | ----- | -------- | ----------------- | +| `Failed to ping connection` | Postgres | warn | Auto | A pooled connection failed its health check. Dropped and replaced. | +| `KeepAlive failed, unable to post to WAL` | Postgres | warn | Auto | The replicator could not write a keep-alive heartbeat to the source WAL. Usually transient (source briefly read-only, brief network blip). Sustained occurrences indicate credentials or permissions. | +| `KeepAlive failed, unable to write an update to the
` | MSSQL | warn | Auto | Same as above, for MSSQL. | +| `Could not check RLS access for .
` | Postgres | warn | Investigate | The replicator could not verify whether its user bypasses Row-Level Security. If RLS is in fact enforced, you will see missing data downstream. | +| `Could not query max_slot_wal_keep_size` | Postgres | warn | Investigate | The Postgres user lacks permission to read `pg_settings`. Cosmetic, but indicates a permission gap. | +| `Query error, retrying..` | MSSQL | warn | Auto | A CDC query failed and is being retried. | +| `Replication error` | All | warn | Auto | A non-fatal error in the replication loop. The replicator retries. | +| `Replication failed.` | All | error | Pod restart | Top-level error from the replication job loop. The pod is being restarted by the supervisor. Persistent occurrences usually carry a [PSYNC code](/debugging/error-codes#psync-s1xxx-replication-issues) and indicate the underlying cause needs intervention. | +| `Fatal replication error` | All | error | Pod restart | Same as above. | -| Field | Meaning | -| --------------- | ------- | -| `user_id`, `client_id`, `rid`, `user_agent`, `app_metadata` | See common fields. | -| `client_params` | Parameter values the client passed to `connect()`. Subject to size limits, so very large values may be truncated. | +### Common Errors in Replicator Logs -#### Fields on `Sync stream complete` +| Code | What it means | Recovery | +| ---- | ------------- | -------- | +| `PSYNC_S1003` | Replication stream is locked by another process. Normal for up to 1 minute after a deploy or after a process was killed; the lock has a 60s TTL. | Auto (within 1 minute) | +| `PSYNC_S1004` | JSON nested object depth exceeds 20. Affects replication of that row. | Operator action (data fix) | +| `PSYNC_S1146` | Postgres replication slot was invalidated (source dropped it or exceeded `max_slot_wal_keep_size`). Replication restarts from snapshot. | Auto (re-snapshot) | +| `PSYNC_S1344` | MongoDB change stream invalidated. Replication restarts with a new change stream. | Auto | +| `PSYNC_S1345` | MongoDB change stream timed out reading from the source. | Auto (retry) | -| Field | Meaning | -| --------------------- | ------- | -| `user_id`, `client_id`, `rid`, `user_agent`, `app_metadata` | See common fields. | -| `stream_ms` | Duration the session was open, in milliseconds. Same value as the `duration_ms` on the corresponding `STREAM /sync/stream` access line. | -| `operations_synced` | Total operations sent to this client in the session. | -| `operation_counts` | Per-type breakdown: `{ "put": N, "remove": N, "move": N, "clear": N }`. | -| `data_synced_bytes` | Plaintext (uncompressed) bytes synced to the client. | -| `data_sent_bytes` | Bytes actually sent on the wire. With compression enabled, this is smaller than `data_synced_bytes`. | -| `encoding` | Wire encoding used for the session, for example `permessage-deflate` (WebSocket compression). | -| `large_buckets` | Buckets that contributed an unusually large number of operations in this session, as an object mapping ` -> `. Useful for spotting outlier buckets that dominate a session's payload. | -| `close_reason` | Why the session ended. See [Close reasons](#close-reasons). | +For the full code list and additional connector-specific codes, see [Error Codes Reference](/debugging/error-codes). -A short `stream_ms` paired with `operations_synced: 0` typically means the connection was opened and torn down before any sync line was sent. This is normal during reconnect storms or when the client cancels quickly. +## Filtering Tips -#### Fields on `Sync stream error` and `Streaming sync request failed` +In the Dashboard log view, the search bar supports both free-text and `alias:value` filters. Full syntax is in [Monitoring and Alerting: Search and Filter Logs](/maintenance-ops/monitoring-and-alerting#search-and-filter-logs). -| Field | Meaning | -| ------------------ | ------- | -| `user_id`, `client_id`, `rid`, `user_agent` | See common fields. | -| `errorData` | Structured error: `{ code, name, description, details?, severity, status }`. The `code` is the PSYNC code, and `details` carries the underlying-driver message (for example, `"Connection to : interrupted due to server monitor timeout"`). | -| `is_service_error` | `true` when the error came from a recognised service code path; `false` for unexpected exceptions. | -| `stack` | Stack trace from the throw site. Useful for cross-referencing against the service source. | +### Sync & API Logs -#### Close reasons +``` +user_id: # one user +rid: # both ends of one session +close_reason:"stream error" # errored sessions +error:PSYNC_S2403 # specific error code +user_id: close_reason:"stream error" # errored sessions for one user +operations_synced:>=10000 # high-volume sync sessions +``` -The `close_reason` field on `Sync stream complete` takes one of these values: +### Replicator Logs + +``` +lag:>=10 # flushes with significant lag +"Replicating" "public.orders" # snapshot progress for a specific table +"Replication error" # non-fatal retries +"Replication failed" # pod restarts +"Created replication slot" # Postgres slot recreation events +"Processing schema change" # MySQL DDL +"System started" # pod restart points +``` + +## Detailed Event Reference -| Value | Recovery | Meaning | -| ------------------------ | --------------- | ------- | -| `client closing stream` | n/a | The client called `disconnect()` or its TCP connection closed cleanly. Normal for a healthy session that ends because the user closed the app. | -| `service closing stream` | n/a | The server closed the stream for non-error reasons, for example because the Sync Config changed and existing streams need to reconnect. | -| `stream error` | Client retries | An exception was thrown during the stream. See the preceding `Sync stream error` or `Streaming sync request failed` for details. | -| `process shutdown` | n/a | The API pod is shutting down (deploy, scale-down, restart). | -| `unknown` | Investigate | The session ended without a recognised cause. Treat as a bug signal if it appears frequently. | +The sections above cover the most common lines. The events below are lower-frequency or verbose, collapsed here to keep the page scannable but documented in full. -### Checkpoint events +### Sync & API Logs -The PowerSync Service evaluates whether each connected client needs to be informed of new data by computing checkpoints. + -| Event | Level | Recovery | When it fires | -| ---------------------- | ----- | -------- | ------------- | -| `New checkpoint: ...` | info | n/a | The first checkpoint sent to a session, or after a parameter or sync-stream membership change forces a full checkpoint. | -| `Updated checkpoint: ...` | info | n/a | An incremental checkpoint (diff). | + + +#### Fields on `Sync stream started` + +| Field | Meaning | +| ----- | ------- | +| `user_id`, `client_id`, `rid`, `user_agent`, `app_metadata` | See [Field Glossary](#field-glossary). | +| `client_params` | Parameter values the client passed to `connect()`. Subject to size limits, so very large values may be truncated. | + +#### Fields on `Sync stream complete` + +| Field | Meaning | +| ----- | ------- | +| `user_id`, `client_id`, `rid`, `user_agent`, `app_metadata` | See [Field Glossary](#field-glossary). | +| `stream_ms` | Duration the session was open, in milliseconds. Same value as `duration_ms` on the corresponding `STREAM /sync/stream` access line. | +| `operations_synced` | Total operations sent to this client in the session. | +| `operation_counts` | Per-type breakdown: `{ "put": N, "remove": N, "move": N, "clear": N }`. | +| `data_synced_bytes` | Plaintext (uncompressed) bytes synced to the client. | +| `data_sent_bytes` | Bytes actually sent on the wire. With compression enabled, this is smaller than `data_synced_bytes`. | +| `encoding` | Wire encoding used for the session, for example `permessage-deflate` (WebSocket compression). | +| `large_buckets` | Buckets that contributed an unusually large number of operations in this session, as an object mapping ` -> `. Useful for spotting outlier buckets that dominate a session's payload. | +| `close_reason` | Why the session ended. See [Close Reasons](#close-reasons). | + +#### Fields on `Sync stream error` and `Streaming sync request failed` + +| Field | Meaning | +| ----- | ------- | +| `user_id`, `client_id`, `rid`, `user_agent` | See [Field Glossary](#field-glossary). | +| `errorData` | Structured error: `{ code, name, description, details?, severity, status }`. The `code` is the PSYNC code, and `details` carries the underlying-driver message (for example, `"Connection to : interrupted due to server monitor timeout"`). | +| `is_service_error` | `true` when the error came from a recognized Service code path; `false` for unexpected exceptions. | +| `stack` | Stack trace from the throw site. Useful for cross-referencing against the Service source. | + + + + Example payloads: @@ -134,11 +264,11 @@ New checkpoint: 1152792849 | write: 1585 | buckets: 2 | param_results: 2 ["1#glo Updated checkpoint: 1152792918 | write: 376 | buckets: 2 | param_results: 0 | updated: ["1#user_bucket[72594]"] | removed: [] ``` -| Component | Meaning | -| ---------------- | ------- | -| `1152792849` | The internal checkpoint op_id this session is at. | -| `write: 1585` | The latest write checkpoint applicable to this user, if any. `null` if there are no pending writes. See [Write checkpoints](#http-endpoint-and-write-checkpoint-events). | -| `buckets: 2` | The total number of buckets visible to this user given current parameters. The default cap is 1,000 (see [PSYNC_S2305](/debugging/error-codes#psync-s23xx-sync-api-errors)). | +| Component | Meaning | +| --------- | ------- | +| `1152792849` | The internal checkpoint op_id this session is at. | +| `write: 1585` | The latest write checkpoint applicable to this user, if any. `null` if there are no pending writes. See [Write Checkpoints in the PowerSync protocol](/architecture/powersync-protocol#write-checkpoints). | +| `buckets: 2` | The total number of buckets visible to this user given current parameters. The default cap is 1,000 (see [PSYNC_S2305](/debugging/error-codes#psync-s23xx-sync-api-errors)). | | `param_results: 2` | The number of distinct parameter results evaluated for this user. Each parameter result corresponds to a row returned by a parameter query. | | `[]` | The list of bucket descriptors, truncated to 20. Format is `#_[]`. | | `updated: [...]` (Updated only) | Buckets whose data changed since the previous checkpoint. | @@ -146,101 +276,93 @@ Updated checkpoint: 1152792918 | write: 376 | buckets: 2 | param_results: 0 | up Structured fields on the same line include `checkpoint` (the op_id), `buckets` (count), `parameter_query_results` (count), and on `Updated checkpoint` only, `updated` and `removed` counts as integers. These are easier to filter on than parsing the message body. -For most steady-state sessions you will see a stream of `Updated checkpoint` lines with empty `updated`/`removed` arrays. Those represent rechecks where nothing changed for this user. + -### HTTP endpoint and write checkpoint events + -| Event | Level | Recovery | When it fires | -| ---------------------------------------------------- | ----- | --------------- | ------------- | -| `STREAM /sync/stream` | info | n/a | RSocket access log line for a WebSocket sync subscription. Emitted on stream close (so the `duration_ms` is the full session length). Carries `client_id`, `rid`, `duration_ms`, `user_id`, `user_agent`. | -| `GET /write-checkpoint2.json?client_id=` | info | n/a | A client requested a write checkpoint number to wait for. See [Write Checkpoints in the PowerSync protocol](/architecture/powersync-protocol#write-checkpoints). Carries `method`, `path`, `route`, `status`, `rid`, `duration_ms`. | -| `Write checkpoint for /: \| ` | info | n/a | Result of a write checkpoint computation. `` is the source database LSN this checkpoint corresponds to (Postgres LSN, MongoDB resume token, MSSQL LSN). | -| `Currently have N active WebSocket connection(s)` | info | n/a | Periodic count of open WebSocket connections on this pod. Per-pod, not per-instance. | +| Event | Level | Recovery | When it fires | +| ----- | ----- | -------- | ------------- | +| `STREAM /sync/stream` | info | n/a | RSocket access log line for a WebSocket sync subscription. Emitted on stream close, so the `duration_ms` is the full session length. Carries `client_id`, `rid`, `duration_ms`, `user_id`, `user_agent`. | +| `GET /write-checkpoint2.json?client_id=` | info | n/a | A client requested a write checkpoint number to wait for. See [Write Checkpoints in the PowerSync protocol](/architecture/powersync-protocol#write-checkpoints). Carries `method`, `path`, `route`, `status`, `rid`, `duration_ms`. | +| `Write checkpoint for /: \| ` | info | n/a | Result of a write checkpoint computation. `` is the source database LSN this checkpoint corresponds to (Postgres LSN, MongoDB resume token, MSSQL LSN). | +| `Currently have N active WebSocket connection(s)` | info | n/a | Periodic count of open WebSocket connections on this pod. Per-pod, not per-instance. | | ` ` with `status: 429, queue_overflow: true` | warn | Client retries | The request queue is full and the request is being rejected with HTTP 429. The client retries on its own. | -#### Fields on HTTP access lines - Every HTTP access log line carries these structured fields. They are not visible in the rendered message but are filterable. -| Field | Meaning | -| ------------- | ------- | -| `method` | HTTP method, for example `GET`, `POST`, `STREAM`. | -| `path` | Full request path, including query string. | -| `route` | The matched route template (with placeholders), useful for grouping. | -| `status` | HTTP response status code. `200` for streaming sync (it streams as a 200 even if it later errors), `401` for auth failures, `429` for queue overflow. | +| Field | Meaning | +| ----- | ------- | +| `method` | HTTP method, for example `GET`, `POST`, `STREAM`. | +| `path` | Full request path, including query string. | +| `route` | The matched route template (with placeholders), useful for grouping. | +| `status` | HTTP response status code. `200` for streaming sync (it streams as a 200 even if it later errors), `401` for auth failures, `429` for queue overflow. | | `duration_ms` | Wall-clock duration in milliseconds. For streaming endpoints this is the full session length. | -| `rid` | Request ID. `h/` prefix for HTTP, `s/` for sync stream. | +| `rid` | Request ID. `h/` prefix for HTTP, `s/` for sync stream. | -### Concurrency and queueing + -These appear when the API pod is near its configured concurrent-sync limit. + -| Event | Level | Recovery | When it fires | -| ------------------------------------------------------ | ----- | -------------- | ------------- | -| `Sync concurrency limit reached, waiting for lock` | info | Auto | A new sync session is queued behind currently-active sessions. It will start when a slot frees up. | -| `Got sync lock. Slots available: N` | info | n/a | The queued session acquired a slot. | -| `Releasing sync lock` | info | n/a | The session released its slot, either because it finished or because it errored. | +These appear when the API pod is near its configured concurrent-sync limit. -Frequent `Sync concurrency limit reached` typically means traffic exceeds the pod's headroom, or the per-pod cap is set conservatively. See [Production Readiness Guide](/maintenance-ops/production-readiness-guide). +| Event | Level | Recovery | When it fires | +| ----- | ----- | -------- | ------------- | +| `Sync concurrency limit reached, waiting for lock` | info | Auto | A new sync session is queued behind currently-active sessions. It will start when a slot frees up. | +| `Got sync lock. Slots available: N` | info | n/a | The queued session acquired a slot. | +| `Releasing sync lock` | info | n/a | The session released its slot, either because it finished or because it errored. | -### Replication propagation (observed from the API pod) +Frequent `Sync concurrency limit reached` typically means traffic exceeds what the pod can handle, or the per-pod cap is set too low. See [Production Readiness Guide](/maintenance-ops/production-readiness-guide). -| Event | Level | Recovery | When it fires | -| ------------------------------------------------ | ----- | -------- | ------------- | -| `Waiting for LSN checkpoint: ` | info | n/a | The write checkpoint endpoint is waiting for the replicator to catch up to ``. | -| `Got write checkpoint: : ` | info | n/a | The wait completed; the checkpoint number is returned to the client. | + -A long delay between these two lines means the replicator is behind. If sustained, see [Replication Lag](/maintenance-ops/replication-lag). + -### Common errors in Sync & API logs +| Event | Level | Recovery | When it fires | +| ----- | ----- | -------- | ------------- | +| `Waiting for LSN checkpoint: ` | info | n/a | The write checkpoint endpoint is waiting for the replicator to catch up to ``. | +| `Got write checkpoint: : ` | info | n/a | The wait completed; the checkpoint number is returned to the client. | -| Code | What it means | Recovery | -| ------------- | ------------- | -------- | -| `PSYNC_S2001` | Generic internal server error wrapper. The real cause is in the nested error. | Depends on cause | -| `PSYNC_S2103` | JWT has expired. The client should be refreshing credentials. | Client retries | -| `PSYNC_S2305` | Too many buckets for the user. Sync for this user is blocked until either the data or the Sync Config changes. See [Troubleshooting: Too Many Buckets](/debugging/troubleshooting#too-many-buckets-psync_s2305). | Operator action (per user) | -| `PSYNC_S2401` | Checksum mismatch. The client deletes the affected bucket and re-fetches. No data loss. | Auto | -| `PSYNC_S2403` | Query timed out while reading checksums (MongoDB storage). The Mongo aggregate that builds the checkpoint exceeded its timeout (50s). | Client retries | - -For the full code list, see [Error Codes Reference](/debugging/error-codes). +A long delay between these two lines means the replicator is behind. If sustained, see [Replication Lag](/maintenance-ops/replication-lag). ---- + -## Replicator logs + -Produced by the replication pod that pulls changes from your source database (Postgres, MongoDB, MySQL, or MSSQL) and writes them into the PowerSync storage layer. +### Replicator Logs - -Most replicator log lines are prefixed with the Sync Config identifier in brackets, for example `[powersync_69b93ff7358aa0646ff0dbca_1_e6dc]`. The format is `[powersync___]`. The `group_id` increments when the Sync Config changes (a new version means a new group). Use this prefix to scope log queries to a particular Sync Config iteration. - + -### Startup and lifecycle + Fired when a replication pod starts, stops, or restarts. Outside of a deploy, these usually mean the pod was killed and restarted by the supervisor. -| Event | Level | Recovery | When it fires | -| --------------------------------------------------------- | ----- | -------- | ------------- | -| `Starting system` | info | n/a | First line after pod boot. | +| Event | Level | Recovery | When it fires | +| ----- | ----- | -------- | ------------- | +| `Starting system` | info | n/a | First line after the pod starts. | | `Starting Storage Engine...` / `Successfully started Storage Engine.` | info | n/a | The bucket storage backend is being connected. The `Successfully activated storage: ` line in between names the backend (typically `mongodb` for Cloud). | -| `Starting Replication Engine...` / `Successfully started Replication Engine.` | info | n/a | The replication subsystem is initialising. | -| `Starting Replicator: ` | info | n/a | One per source database connection (for example `postgresql-postgres`). | +| `Starting Replication Engine...` / `Successfully started Replication Engine.` | info | n/a | The replication subsystem is initializing. | +| `Starting Replicator: ` | info | n/a | One per source database connection (for example `postgresql-postgres`). | | `Starting Router Engine...` / `Running on port 8080` / `Successfully started Router Engine.` | info | n/a | The replicator's HTTP health and metrics endpoints are up. The replicator does not serve sync traffic. | -| `Loaded sync config` (or `Loaded sync rules`) | info | n/a | The current Sync Config was loaded from storage. Both wordings exist depending on service version. | -| `No sync streams or rules configured - configure via API` | info | n/a | No Sync Config is deployed yet. | -| `Locked replication stream for processing` | info | n/a | This pod has acquired the replication lock for a Sync Config iteration. Only one pod can replicate a given iteration at a time. | -| `Sync rules N has been locked for replication with lock ID ` | info | n/a | Companion log line emitted after `Locked replication stream for processing`. The lock ID is the 8-byte hex identifier this pod is holding. Useful for confirming which pod owns the active lock. | -| `[PSYNC_S1003] Sync rules N have been locked by another process for replication, expiring at .` | info | Auto | Pod tried to acquire the lock but another pod (or a prior incarnation of this pod) still holds it. The lock has a 60s TTL so this typically clears within a minute. Logged at `info` (not `warn`) precisely because it is expected behavior. | -| `Successfully registered Replicator with ReplicationEngine` | info | n/a | Per-connector registration event during startup. | -| `System started` | info | n/a | All subsystems are up. The pod is actively replicating. | +| `Loaded sync config` (or `Loaded sync rules`) | info | n/a | The current Sync Config was loaded from storage. Both wordings exist depending on Service version. | +| `No sync streams or rules configured - configure via API` | info | n/a | No Sync Config is deployed yet. | +| `Locked replication stream for processing` | info | n/a | This pod has acquired the replication lock for a Sync Config iteration. Only one pod can replicate a given iteration at a time. | +| `Sync rules N has been locked for replication with lock ID ` | info | n/a | Companion line emitted after `Locked replication stream for processing`. The lock ID is the 8-byte hex identifier this pod is holding. Useful for confirming which pod owns the active lock. | +| `[PSYNC_S1003] Sync rules N have been locked by another process for replication, expiring at .` | info | Auto | Pod tried to acquire the lock but another pod (or a prior incarnation of this pod) still holds it. The lock has a 60s TTL, so this typically clears within a minute. Logged at `info` (not `warn`) precisely because it is expected behavior. | +| `Successfully registered Replicator with ReplicationEngine` | info | n/a | Per-connector registration event during startup. | +| `System started` | info | n/a | All subsystems are up. The pod is actively replicating. | + + + + -### Sync Config changes +| Event | Level | Recovery | When it fires | +| ----- | ----- | -------- | ------------- | +| `Failed to update sync config` | error | Pod restart | Sync Config update validation failed. The pod exits and the supervisor restarts it. | +| `Stopping replication job for sync rule iteration: N` | info | n/a | An older Sync Config iteration is being stopped, usually because a new iteration replaced it. | -| Event | Level | Recovery | When it fires | -| ------------------------------------------------------------------------------ | ----- | -------- | ------------- | -| `Failed to update sync config` | error | Pod restart | Sync Config update validation failed. The pod exits and supervisor restarts it. | -| `Stopping replication job for sync rule iteration: N` | info | n/a | An older Sync Config iteration is being stopped (usually because a new iteration replaced it). | + -### Shutdown sequence + Triggered by SIGTERM (deploys, scale-down) or by a fatal error. A clean shutdown ends with `Exiting`. Lines you will see in order: @@ -265,157 +387,86 @@ Successfully shut down Storage Engine. If you see `Replication failed.` or `Fatal replication error` before the shutdown lines, the pod is being restarted by the supervisor after a crash. -### Initial snapshot + + + The first time a Sync Config iteration is set up, every table covered by it is copied over. The exact wording differs by source database. -| Event | Source DBs | Level | Recovery | When it fires | -| ------------------------------------------------------------------ | ---------- | ----- | -------- | ------------- | -| `To replicate: .
/` | Postgres, MongoDB | info | n/a | Logged once per table before the snapshot starts. The fraction is replicated rows over estimated total. | -| `Replicating .
/ - resumable` | Postgres | info | n/a | Table snapshot is starting or resuming. `resumable` means the snapshot is chunked on the table's single-column primary key, so it can pick up after an interruption. | -| `Replicating .
/ - not resumable` | Postgres | info | n/a | The table has a composite primary key or no usable primary key, so an interrupted snapshot has to restart from the beginning. | +| Event | Source DBs | Level | Recovery | When it fires | +| ----- | ---------- | ----- | -------- | ------------- | +| `To replicate: .
/` | Postgres, MongoDB | info | n/a | Logged once per table before the snapshot starts. The fraction is replicated rows over estimated total. | +| `Replicating .
/ - resumable` | Postgres | info | n/a | Table snapshot is starting or resuming. `resumable` means the snapshot is chunked on the table's single-column primary key, so it can pick up after an interruption. | +| `Replicating .
/ - not resumable` | Postgres | info | n/a | The table has a composite primary key or no usable primary key, so an interrupted snapshot has to restart from the beginning. | | `Replicating .
/ - resuming from > ` | Postgres | info | n/a | An interrupted resumable snapshot is picking up from a stored cursor. | -| `Replicating .
/` | Postgres, MongoDB | info | n/a | Progress update during a snapshot, logged per chunk flush. | -| `Snapshotting .
/` | MSSQL | info | n/a | MSSQL CDC equivalent of `Replicating`. | +| `Replicating .
/` | Postgres, MongoDB | info | n/a | Progress update during a snapshot, logged per chunk flush. | +| `Snapshotting .
/` | MSSQL | info | n/a | MSSQL CDC equivalent of `Replicating`. | | `Skipping
- snapshot already done` (or `Skipping table [
] - snapshot already done.`) | All connectors | info | n/a | Table has already been fully snapshotted for the current iteration. Logged on startup when resuming. | -| `Initial replication already done` | All connectors | info | n/a | All tables in the iteration are already snapshotted. The replicator goes straight to streaming. | +| `Initial replication already done` | All connectors | info | n/a | All tables in the iteration are already snapshotted. The replicator goes straight to streaming. | + + -### Snapshot completion and replication slot management + -| Event | Source DBs | Level | Recovery | When it fires | -| -------------------------------------------------- | ---------- | ----- | -------- | ------------- | -| `Created replication slot ` | Postgres | info | n/a | A new logical replication slot was created. Happens on first replication or after the previous slot was dropped (Sync Config changes, slot health check failure, `pg_drop_replication_slot`, `pg_upgrade`). See [Postgres Maintenance](/configuration/source-db/postgres-maintenance). | +| Event | Source DBs | Level | Recovery | When it fires | +| ----- | ---------- | ----- | -------- | ------------- | +| `Created replication slot ` | Postgres | info | n/a | A new logical replication slot was created. Happens on first replication or after the previous slot was dropped (Sync Config changes, slot health check failure, `pg_drop_replication_slot`, `pg_upgrade`). See [Postgres Maintenance](/configuration/source-db/postgres-maintenance). | | `Cleaning up Postgres replication slot: ...` | Postgres | info | n/a | A slot is being dropped, usually after Sync Config changes. The next replication run creates a fresh slot. | | `WAL budget: no limit configured (max_slot_wal_keep_size is unlimited).` | Postgres | info | n/a | Source Postgres has no slot retention limit. Normal. | -| `WAL budget: ...` budget-low | Postgres | warn | Investigate | `max_slot_wal_keep_size` is set and the slot is approaching it. If the slot is invalidated, replication has to restart from snapshot. See [PSYNC_S1146](/debugging/error-codes#psync-s1xxx-replication-issues). | -| `Marking snapshot at ` | MongoDB, MSSQL | info | n/a | The snapshot transaction captured all tables at ``. Streaming will resume from this point. | -| `Resuming snapshot at ` | MongoDB, MSSQL | info | n/a | A previously interrupted snapshot is resuming from a stored LSN. | -| `Flushed snapshot at ` | MongoDB | info | n/a | The MongoDB snapshotter finished writing a collection up to the resume token ``. | - -### Streaming replication (steady state) - -Once the snapshot is done, the replicator switches to streaming. These are the bulk of replicator logs in normal operation. - -| Event | Source DBs | Level | Recovery | When it fires | -| ------------------------------------------------------------------------------ | ---------- | ----- | -------- | ------------- | -| `[] Replicating op N ` | Postgres | info | n/a | Periodic progress log during initial sync streaming, every 100 operations. | -| `[] Flushed N + M + K updates, Xkb in Yms. Last op_id: Z. Replication lag: Ls` | All connectors | info | n/a | A batch of changes was flushed to bucket storage. Most common steady-state line. See fields below. | -| `[] Processed batch of N changes / X bytes in Yms` | MongoDB | info | n/a | A batch of MongoDB change-stream events was consumed from the source. | -| `Resume streaming at / \| Token age: Ns` | MongoDB | info | Investigate if Token age is large | The change stream resumed from a stored token. `Token age` approaching MongoDB's oplog retention window means the slot is at risk. | -| `New collection: .` | MongoDB | info | n/a | A new matching collection was discovered after replication started. It will be snapshotted before streaming resumes. | -| `Enabled postImages on .` | MongoDB | info | n/a | The replicator enabled `changeStreamPreAndPostImages` on a collection that needs it for replica identity. Done once per collection. | +| `WAL budget: ...` budget-low | Postgres | warn | Investigate | `max_slot_wal_keep_size` is set and the slot is approaching it. If the slot is invalidated, replication has to restart from snapshot. See [PSYNC_S1146](/debugging/error-codes#psync-s1xxx-replication-issues). | +| `Marking snapshot at ` | MongoDB, MSSQL | info | n/a | The snapshot transaction captured all tables at ``. Streaming will resume from this point. | +| `Resuming snapshot at ` | MongoDB, MSSQL | info | n/a | A previously interrupted snapshot is resuming from a stored LSN. | +| `Flushed snapshot at ` | MongoDB | info | n/a | The MongoDB snapshotter finished writing a collection up to the resume token ``. | -#### Fields on `Flushed N + M + K updates, Xkb in Yms` + -This is the single most important steady-state line. Components: + -| Component | Meaning | -| -------------------------- | ------- | -| `N + M + K updates` | `N` = bucket data rows written. `M` = bucket parameter rows written. `K` = current_data records updated (source-row mirror used for parameter recalculation). | -| `Xkb` | Uncompressed size of the bucket data written in this flush. | -| `Yms` | Wall-clock time the flush took. Sustained values above ~500ms indicate storage-side pressure. | -| `Last op_id: Z` | The highest internal op_id assigned in this flush. Monotonically increasing across all flushes for the iteration. | -| `Replication lag: Ls` | Seconds between when the oldest uncommitted change in this flush was made on the source database, and when it was flushed to storage. Zero or one second is healthy. Sustained values above 60s warrant investigation. See [Replication Lag](/maintenance-ops/replication-lag). | +Beyond the `Flushed` line, the streaming phase emits these connector-specific lines. -If `Replication lag` is absent, the flush had no source-side change timestamps to derive it from (typically only true for snapshot flushes, never steady-state). +| Event | Source DBs | Level | Recovery | When it fires | +| ----- | ---------- | ----- | -------- | ------------- | +| `[] Replicating op N ` | Postgres | info | n/a | Periodic progress log during initial sync streaming, every 100 operations. | +| `[] Processed batch of N changes / X bytes in Yms` | MongoDB | info | n/a | A batch of MongoDB change-stream events was consumed from the source. Carries structured fields `count`, `bytes`, `duration`, useful for filtering on slow batches. | +| `Resume streaming at / \| Token age: Ns` | MongoDB | info | Investigate if Token age is large | The change stream resumed from a stored token. `Token age` approaching MongoDB's oplog retention window means the slot is at risk. | +| `New collection: .` | MongoDB | info | n/a | A new matching collection was discovered after replication started. It will be snapshotted before streaming resumes. | +| `Enabled postImages on .` | MongoDB | info | n/a | The replicator enabled `changeStreamPreAndPostImages` on a collection that needs it for replica identity. Done once per collection. | -The same numbers are also available as a single structured `flushed` object on the log line, easier to filter on than parsing the message: +The `Flushed` line's structured `flushed` object maps to the message components as follows: -```json -"flushed": { - "bucket_data_count": 14, - "parameter_data_count": 0, - "current_data_count": 14, - "duration": 5, - "size": 7498, - "replication_lag_seconds": 0 -} -``` +| Sub-field | Maps to message component | +| --------- | ------------------------- | +| `bucket_data_count` | `N` | +| `parameter_data_count` | `M` | +| `current_data_count` | `K` | +| `duration` | `Yms` | +| `size` | Bytes (the `Xkb` value is `size / 1024`, rounded). | +| `replication_lag_seconds` | `Ls` | -| Sub-field | Maps to message component | -| -------------------------- | ------------------------- | -| `bucket_data_count` | `N` | -| `parameter_data_count` | `M` | -| `current_data_count` | `K` | -| `duration` | `Yms` | -| `size` | Bytes (the `Xkb` value is `size / 1024`, rounded). | -| `replication_lag_seconds` | `Ls` | + -The `Processed batch of N changes / X bytes in Yms` line (MongoDB) has its own structured fields: `count`, `bytes`, `duration`. These are useful for filtering on slow batches. + -### Schema changes +| Event | Source DBs | Level | Recovery | When it fires | +| ----- | ---------- | ----- | -------- | ------------- | +| `Processing schema change for table [.
]` | MySQL | info | n/a | A DDL event from the source binlog is being applied to the replicator's schema view. | +| `Successfully processed N schema change(s).` | MySQL | info | n/a | All DDL events from one binlog chunk were applied. | +| `Table .
has been dropped. Handling schema change...` | MSSQL | info | n/a | The MSSQL CDC poller noticed a tracked table was dropped or its capture instance was removed. Handled by halting replication for that table or re-snapshotting, depending on config. | +| `Unable to retrieve schema changes for capture instance: [].` | MSSQL | warn | Auto | Schema delta query failed. Usually a transient permissions or connection issue. | -| Event | Source DBs | Level | Recovery | When it fires | -| ------------------------------------------------------------------------------------ | ---------- | ----- | -------- | ------------- | -| `Processing schema change for table [.
]` | MySQL | info | n/a | A DDL event from the source binlog is being applied to the replicator's schema view. | -| `Successfully processed N schema change(s).` | MySQL | info | n/a | All DDL events from one binlog chunk were applied. | -| `Table .
has been dropped. Handling schema change...` | MSSQL | info | n/a | The MSSQL CDC poller noticed a tracked table was dropped or its capture instance was removed. Handled by halting replication for that table or re-snapshotting, depending on config. | -| `Unable to retrieve schema changes for capture instance: [].` | MSSQL | warn | Auto | Schema delta query failed. Usually a transient permissions or connection issue. | + -### CDC polling (MSSQL only) + The MSSQL connector polls CDC capture tables rather than streaming a logical replication slot. -| Event | Level | Recovery | When it fires | -| -------------------------------------------------------------- | ----- | -------- | ------------- | -| `CDC polling started with interval of Nms...` | info | n/a | Replication is starting. `N` is the polling interval in ms (default 1000). | -| `Polling a maximum of N transactions per polling cycle.` | info | n/a | The per-cycle batch limit. | +| Event | Level | Recovery | When it fires | +| ----- | ----- | -------- | ------------- | +| `CDC polling started with interval of Nms...` | info | n/a | Replication is starting. `N` is the polling interval in ms (default 1000). | +| `Polling a maximum of N transactions per polling cycle.` | info | n/a | The per-cycle batch limit. | | `Polling bounds are -> spanning N transaction(s).` | info | n/a | One polling cycle's range. | -| `CDC polling stopped...` | info | n/a | The polling loop is shutting down. | - -### Replication health - -These are the lines most likely to look alarming. **Almost all of them are non-fatal and the replicator handles them on its own.** - -| Event | Source DBs | Level | Recovery | What it indicates | -| -------------------------------------------------------------- | ---------- | ----- | -------- | ----------------- | -| `Failed to ping connection` | Postgres | warn | Auto | A pooled connection failed its health check. Dropped and replaced. | -| `KeepAlive failed, unable to post to WAL` | Postgres | warn | Auto | The replicator could not write a keep-alive heartbeat to the source WAL. Usually transient (source briefly read-only, brief network blip). Sustained occurrences indicate credentials or permissions. | -| `KeepAlive failed, unable to write an update to the
` | MSSQL | warn | Auto | Same as above for MSSQL. | -| `Could not check RLS access for .
` | Postgres | warn | Investigate | The replicator could not verify whether its user bypasses Row-Level Security. If RLS is in fact enforced, you will see missing data downstream. | -| `Could not query max_slot_wal_keep_size` | Postgres | warn | Investigate | The Postgres user lacks permission to read `pg_settings`. Cosmetic but indicates a permission gap. | -| `Query error, retrying..` | MSSQL | warn | Auto | A CDC query failed and is being retried. | -| `Replication error` | All | warn | Auto | A non-fatal error in the replication loop. The replicator retries. | -| `Replication failed.` | All | error | Pod restart | Top-level error from the replication job loop. The pod is being restarted by the supervisor. Persistent occurrences usually carry a [PSYNC code](/debugging/error-codes#psync-s1xxx-replication-issues) and indicate the underlying cause needs intervention. | -| `Fatal replication error` | All | error | Pod restart | Same as above. | - -### Common errors in Replicator logs - -| Code | What it means | Recovery | -| ------------- | ------------- | -------- | -| `PSYNC_S1003` | Replication stream is locked by another process. Normal for up to 1 minute after a deploy or after a process was killed; the lock has a 60s TTL. | Auto (within 1 minute) | -| `PSYNC_S1004` | JSON nested object depth exceeds 20. Affects replication of that row. | Operator action (data fix) | -| `PSYNC_S1146` | Postgres replication slot was invalidated (source dropped or exceeded `max_slot_wal_keep_size`). Replication restarts from snapshot. | Auto (re-snapshot) | -| `PSYNC_S1344` | MongoDB change stream invalidated. Replication restarts with a new change stream. | Auto | -| `PSYNC_S1345` | MongoDB change stream timed out reading from the source. | Auto (retry) | - -For the full code list and additional connector-specific codes, see [Error Codes Reference](/debugging/error-codes). - ---- - -## Filtering tips +| `CDC polling stopped...` | info | n/a | The polling loop is shutting down. | -In the Dashboard log view, the search bar supports both free-text and `alias:value` filters. Full syntax in [Monitoring and Alerting: Search and Filter Logs](/maintenance-ops/monitoring-and-alerting#search-and-filter-logs). + -### Sync & API logs - -``` -user_id: # one user -rid: # both ends of one session -close_reason:"stream error" # errored sessions -error:PSYNC_S2403 # specific error code -user_id: close_reason:"stream error" # errored sessions for one user -"Sync stream complete" operations_synced:0 # sessions that synced nothing -``` - -### Replicator logs - -``` -lag:>=10 # flushes with significant lag -"Replicating" "public.orders" # snapshot progress for a specific table -"Replication error" # non-fatal retries -"Replication failed" # pod restarts -"Created replication slot" # Postgres slot recreation events -"Processing schema change" # MySQL DDL -"System started" # pod restart points -``` + From 4c09f98e42aedb1e0bd5ab755074cdd8eeaf77f1 Mon Sep 17 00:00:00 2001 From: Benita Volkmann Date: Thu, 4 Jun 2026 15:25:37 +0200 Subject: [PATCH 3/8] Move example queries --- debugging/log-reference.mdx | 27 --------------------- maintenance-ops/monitoring-and-alerting.mdx | 27 +++++++++++++++++++++ 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/debugging/log-reference.mdx b/debugging/log-reference.mdx index cf0f7874..f33e6e7c 100644 --- a/debugging/log-reference.mdx +++ b/debugging/log-reference.mdx @@ -183,33 +183,6 @@ These lines report problems in the replication loop. Almost all are non-fatal, a For the full code list and additional connector-specific codes, see [Error Codes Reference](/debugging/error-codes). -## Filtering Tips - -In the Dashboard log view, the search bar supports both free-text and `alias:value` filters. Full syntax is in [Monitoring and Alerting: Search and Filter Logs](/maintenance-ops/monitoring-and-alerting#search-and-filter-logs). - -### Sync & API Logs - -``` -user_id: # one user -rid: # both ends of one session -close_reason:"stream error" # errored sessions -error:PSYNC_S2403 # specific error code -user_id: close_reason:"stream error" # errored sessions for one user -operations_synced:>=10000 # high-volume sync sessions -``` - -### Replicator Logs - -``` -lag:>=10 # flushes with significant lag -"Replicating" "public.orders" # snapshot progress for a specific table -"Replication error" # non-fatal retries -"Replication failed" # pod restarts -"Created replication slot" # Postgres slot recreation events -"Processing schema change" # MySQL DDL -"System started" # pod restart points -``` - ## Detailed Event Reference The sections above cover the most common lines. The events below are lower-frequency or verbose, collapsed here to keep the page scannable but documented in full. diff --git a/maintenance-ops/monitoring-and-alerting.mdx b/maintenance-ops/monitoring-and-alerting.mdx index 15bbae3a..8140ec13 100644 --- a/maintenance-ops/monitoring-and-alerting.mdx +++ b/maintenance-ops/monitoring-and-alerting.mdx @@ -201,6 +201,33 @@ checkpoint sync user_id:123 checkpoint:>=1224 This matches entries containing both `checkpoint` and `sync` in the message, with `user_id` equal to `123` and a `checkpoint` property greater than or equal to `1224`. +#### Example Queries + +Common queries for Sync & API logs: + +``` +user_id: # one user +rid: # both ends of one session +close_reason:"stream error" # errored sessions +error:PSYNC_S2403 # a specific error code +user_id: close_reason:"stream error" # errored sessions for one user +operations_synced:>=10000 # high-volume sync sessions +``` + +Common queries for Replication logs: + +``` +lag:>=10 # flushes with significant lag +"Replicating" "public.orders" # snapshot progress for a specific table +"Replication error" # non-fatal retries +"Replication failed" # pod restarts +"Created replication slot" # Postgres slot recreation events +"Processing schema change" # MySQL DDL +"System started" # pod restart points +``` + +For what each of these events and fields means, see the [Log Reference](/debugging/log-reference). + ### Export Logs Click the **Download** button on the right side of the toolbar to export the currently visible log entries to a CSV file. The export reflects the active log type, time range, log levels, and search query, so you can narrow the result set before downloading. From c6198e924f35fa51890291eaa512a9060f0ddf1e Mon Sep 17 00:00:00 2001 From: Benita Volkmann Date: Thu, 4 Jun 2026 15:35:48 +0200 Subject: [PATCH 4/8] Some more acceptable terms for vale --- .github/vale/config/vocabularies/PowerSync/accept.txt | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/.github/vale/config/vocabularies/PowerSync/accept.txt b/.github/vale/config/vocabularies/PowerSync/accept.txt index f40f95ee..4a30de3c 100644 --- a/.github/vale/config/vocabularies/PowerSync/accept.txt +++ b/.github/vale/config/vocabularies/PowerSync/accept.txt @@ -5,12 +5,15 @@ addons agnostically allowlisting autoscaling +backfill +backfills backpressure baz binlog bitwise bundler bundlers +bursty cardinalities cardinality changetables @@ -166,6 +169,8 @@ ECDSA E2EE ECS EdDSA +GTID +GTIDs HA HS256 IDE From 859980d3d6ab56be6592420defb916f2a8db5be7 Mon Sep 17 00:00:00 2001 From: Benita Volkmann Date: Thu, 4 Jun 2026 15:36:04 +0200 Subject: [PATCH 5/8] Polish --- debugging/log-reference.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/debugging/log-reference.mdx b/debugging/log-reference.mdx index f33e6e7c..1be08f5a 100644 --- a/debugging/log-reference.mdx +++ b/debugging/log-reference.mdx @@ -1,6 +1,6 @@ --- title: "Log Reference" -description: "Reference for PowerSync Service log lines: what the structured fields mean, what each event indicates, and whether the Service recovers on its own." +description: "Reference for reading PowerSync Service logs: what the structured fields mean, what common log lines indicate, and whether each one needs action." sidebarTitle: "Log Reference" --- @@ -126,7 +126,7 @@ Most replicator lines are prefixed with the Sync Config identifier in brackets, ### Streaming Replication (Steady State) -Once the initial snapshot is done, the replicator streams changes. The `Flushed` line is the most important steady-state line and the bulk of normal replicator logs. +Once initial replication is done, the replicator streams changes. The `Flushed` line is the most important steady-state line and the bulk of normal replicator logs. ``` [] Flushed N + M + K updates, Xkb in Yms. Last op_id: Z. Replication lag: Ls @@ -362,7 +362,7 @@ If you see `Replication failed.` or `Fatal replication error` before the shutdow - + The first time a Sync Config iteration is set up, every table covered by it is copied over. The exact wording differs by source database. From 2140cca97b1ff9772ac9bc8243193058f881b0f4 Mon Sep 17 00:00:00 2001 From: Benita Volkmann Date: Thu, 4 Jun 2026 15:36:30 +0200 Subject: [PATCH 6/8] Polish --- configuration/source-db/postgres-maintenance.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/configuration/source-db/postgres-maintenance.mdx b/configuration/source-db/postgres-maintenance.mdx index 8c90cea3..443f2049 100644 --- a/configuration/source-db/postgres-maintenance.mdx +++ b/configuration/source-db/postgres-maintenance.mdx @@ -56,7 +56,7 @@ Replace `powersync_1_xxxx` with the actual slot name from the error message. 3. Restart the PowerSync Service. It will create a new replication slot and begin replication from scratch. -If the slot was invalidated during the initial snapshot (before it completed), the PowerSync Service will not automatically retry. You must drop the invalidated slot manually before the service can recover. +If the slot was invalidated during initial replication (before it completed), the PowerSync Service will not automatically retry. You must drop the invalidated slot manually before the Service can recover. If the invalidation reason is `idle_timeout` (Postgres 18+), the slot was invalidated due to inactivity. In this case, increase `idle_replication_slot_timeout` on the source database instead. From aaf98d0221b043453cac92369282a85fe0766b66 Mon Sep 17 00:00:00 2001 From: Benita Volkmann Date: Thu, 4 Jun 2026 15:38:43 +0200 Subject: [PATCH 7/8] Fix typo --- maintenance-ops/implementing-schema-changes.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/maintenance-ops/implementing-schema-changes.mdx b/maintenance-ops/implementing-schema-changes.mdx index d9384c41..4a037f10 100644 --- a/maintenance-ops/implementing-schema-changes.mdx +++ b/maintenance-ops/implementing-schema-changes.mdx @@ -145,7 +145,7 @@ Due to a limitation in the replication process, dropping a collection does not i Renaming a synced collection to a name that _is not included_ in Sync Streams (or legacy Sync Rules) has the same effect as dropping the collection. -Renaming an unsynced collection to a name that is included in your Sync/Streams/Sync Rules triggers an initial snapshot replication. The time required for this process depends on the collection size. +Renaming an unsynced collection to a name that is included in your Sync Streams/Sync Rules triggers an initial snapshot replication. The time required for this process depends on the collection size. Circular renames (e.g., renaming `todos` → `todos_old` → `todos`) are not directly supported. To reprocess the database after such changes, a Sync Streams/Sync Rules update must be deployed. From f10076bac027e53cc6083b9ed483b5ec015db69f Mon Sep 17 00:00:00 2001 From: Benita Volkmann Date: Thu, 4 Jun 2026 15:39:15 +0200 Subject: [PATCH 8/8] Better wording around initial snapshot vs initial replication --- maintenance-ops/replication-lag.mdx | 2 +- maintenance-ops/self-hosting/diagnostics.mdx | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/maintenance-ops/replication-lag.mdx b/maintenance-ops/replication-lag.mdx index 5eff7218..3d3ff8b8 100644 --- a/maintenance-ops/replication-lag.mdx +++ b/maintenance-ops/replication-lag.mdx @@ -60,7 +60,7 @@ The causes below are grouped into ones that apply to any source, and ones that a When you first connect a source database, or when you deploy Sync Config changes that trigger reprocessing, the PowerSync Service replicates the full set of matching rows. During this period: -* Replication lag will be elevated until the initial snapshot completes. +* Replication lag will be elevated until initial replication completes. * The source-side replication buffer (WAL on Postgres, oplog on MongoDB, binlog on MySQL, CDC change tables on SQL Server) grows because the service has not yet acknowledged those changes. This is expected. Plan for it by sizing the relevant retention setting appropriately (see the source-specific sections below) and by coordinating large Sync Config changes during lower-traffic windows. diff --git a/maintenance-ops/self-hosting/diagnostics.mdx b/maintenance-ops/self-hosting/diagnostics.mdx index 02370aaa..25ddc74c 100644 --- a/maintenance-ops/self-hosting/diagnostics.mdx +++ b/maintenance-ops/self-hosting/diagnostics.mdx @@ -54,7 +54,7 @@ Each connection in `active_sync_rules.connections[]` includes: | Field | Description | | --- | --- | | `slot_name` | The name of the Postgres replication slot used by this sync rules version. | -| `initial_replication_done` | Whether the initial snapshot has completed. | +| `initial_replication_done` | Whether initial replication has completed. | | `replication_lag_bytes` | Replication lag in bytes. | | `wal_status` | The WAL status of the replication slot (`reserved`, `extended`, `unreserved`, or `lost`). | | `safe_wal_size` | Remaining WAL budget in bytes before the slot risks invalidation. |