Skip to content

POST /v1/support/db: PRD caller polls stale schema (asset, monitoring_balance.rootstockBalance) → 6 PG errors/min #192

@TaprootFreak

Description

@TaprootFreak

Symptom

On the production LDS API (lightningdotspacecom/lds-api:latest, deployed at lightning.space, dfxprd Postgres lds-api DB), two queries fail every ~60 s on the database side. They have been observed continuously since the Azure → dfxprd Stage 2.2 cutover (2026-05-26). DEV is silent — only PRD logs them.

Rates (3 h sample, Loki):

Container Errors / 20 min Cadence
lds-api-postgres-1 40 2 distinct queries × ~1/min

The lds-api container itself returns 201 Created to the HTTP caller; only Postgres surfaces the error in its log. Functional impact is limited to whatever the caller does with the empty result — no Customer-visible LDS endpoint is broken — but the noise drowns out real PG errors and indicates the publish/pull contract is out of sync.

Reproduction (verbatim from lds-api-postgres-1 log)

Query A — relation "asset" does not exist:

SELECT * FROM "asset" "asset"
WHERE asset.id >= $1 AND asset.updated >= $2
ORDER BY asset.id DESC LIMIT 10

Postgres response:

ERROR:  relation "asset" does not exist at character 15

Query B — column monitoring_balance.rootstockbalance does not exist:

SELECT
  "monitoring_balance"."id" AS "monitoring_balance_id",
  "monitoring_balance"."created" AS "monitoring_balance_created",
  "monitoring_balance"."updated" AS "monitoring_balance_updated",
  "monitoring_balance"."onchainBalance" AS "monitoring_balance_onchainBalance",
  "monitoring_balance"."lndOnchainBalance" AS "monitoring_balance_lndOnchainBalance",
  "monitoring_balance"."lightningBalance" AS "monitoring_balance_lightningBalance",
  "monitoring_balance"."citreaBalance" AS "monitoring_balance_citreaBalance",
  "monitoring_balance"."customerBalance" AS "monitoring_balance_customerBalance",
  "monitoring_balance"."assetPriceInCHF" AS "monitoring_balance_assetPriceInCHF",
  "monitoring_balance"."ldsBalance" AS "monitoring_balance_ldsBalance",
  "monitoring_balance"."ldsBalanceInCHF" AS "monitoring_balance_ldsBalanceInCHF",
  "monitoring_balance"."assetId",
  monitoring_balance.rootstockBalance               -- unquoted, hand-injected
FROM "monitoring_balance" "monitoring_balance"
WHERE "monitoring_balance"."id" >= $1 AND "monitoring_balance"."updated" >= $2
ORDER BY "monitoring_balance_id" DESC LIMIT 10

Postgres response:

ERROR:  column monitoring_balance.rootstockbalance does not exist at character 852
HINT:  Perhaps you meant to reference the column "monitoring_balance.rootstockBalance".

Root cause

Both queries match the exact shape produced by SupportService.getRawData() in src/subdomains/support/services/support.service.ts:90-105:

this.dataSource
  .createQueryBuilder()
  .from(query.table, query.table)
  .orderBy(`${query.table}.id`, query.sorting)
  .limit(query.maxLine)
  .where(`${query.table}.id >= :id`, { id: query.min })
  .andWhere(`${query.table}.updated >= :updated`, { updated: query.updatedSince });

if (query.select) request.select(query.select);

i.e. POST /v1/support/db (SupportController.getRawData, ADMIN role) accepts an arbitrary { table, select, where, join } from a Bearer-authenticated caller. The endpoint is the data-pull contract that an external consumer (almost certainly DFX-API PRD) polls every 60 s to mirror lds-api rows.

The PRD caller is hardcoded to two stale schema entries:

Caller-supplied Current schema state Removed in
table: "asset" renamed to asset_account migration 1715583133193-setupFrankencoinPay.js (ALTER TABLE "asset" RENAME TO "asset_account") — 2024-05-13
select: ["monitoring_balance.rootstockBalance"] (unquoted) column still exists in DB, but no longer in MonitoringBalanceEntity (src/subdomains/monitoring/entities/monitoring-balance.entity.ts); Rootstock client/service/module/enum deleted PR #124 ("Remove Rootstock integration and add generic EVM balance monitoring") — 2026-02-06

So the bug is dual-sided:

  1. Schema drift — the caller's column list has not been updated since two schema changes (one ~2 years old, one ~3 months old). This is a configuration-side bug, but lds-api is what publicly fails.
  2. Identifier quoting — even if rootstockBalance were still in the schema, the caller's payload is monitoring_balance.rootstockBalance (unquoted), which Postgres case-folds to rootstockbalance. The DDL column is "rootstockBalance" (mixed-case, was created as such in migration 1762853446614-changeMonitoringColumns.js). Postgres' own hint shows it knows the right name: Perhaps you meant to reference the column "monitoring_balance.rootstockBalance". This is the same class of bug PR fix: remove non-existent version column from chainSwaps query #184 fixed for chainSwaps.version.

DEV vs PRD asymmetry

{container_name="lds-api-postgres-1", server="dfxdev"} for the same 3 h: 0 errors. The DEV consumer either polls with the correct (updated) table list, or does not poll dfxdev at all. The mismatch is exclusively in the PRD caller config.

Impact

  • Severity: medium-low. No customer endpoint returns wrong data. The PRD consumer (probably DFX-API → DFX core DB) is silently failing to mirror two tables; downstream reports on asset and monitoring_balance.rootstockBalance go stale.
  • Noise: ~6 PG-side errors per minute, ~8 600 / day. Drowns out real errors in monitoring (60 % of all detected_level=error lines in lds-api-postgres on PRD).
  • Not a deploy-time regression of the Stage 2.2 cutover — the same queries failed on Azure MSSQL too (likely with a different error). The cutover just made the noise visible in Loki because the new dfxprd Postgres logs at ERROR severity.

Suggested fixes (any combination, not exclusive)

  1. Owner: caller side (DFX-API PRD). Update the polling config: drop asset (use asset_account if mirroring is still wanted), drop the monitoring_balance.rootstockBalance from the select list. This is the actual root-cause fix; no lds-api change strictly required.
  2. Owner: this repo — server-side hardening of /v1/support/db. Reject requests that reference a non-existent table or column with 400 before hitting Postgres, so the failure surfaces in the HTTP-response side and stops polluting PG logs. The current .catch(...) re-throws as BadRequestException but only after the SQL has already been parsed and rejected by PG — the PG log line still gets written. Pre-validation against information_schema.tables / information_schema.columns would prevent both the PG error AND give the caller a usable diagnostic.
  3. Owner: this repo — quote identifiers. Even when the user-supplied select list contains mixed-case names, wrap them in "…" so Postgres does not case-fold. Same class of fix as fix: remove non-existent version column from chainSwaps query #184 (chainSwaps version).

(1) alone fixes the immediate noise. (2) + (3) make the endpoint robust against future schema renames so a stale caller fails fast and audibly.

Verification once fixed

ssh dfx01-remote 'curl -s "http://localhost:3101/loki/api/v1/query" \
  --data-urlencode "query=sum by (container_name) (count_over_time({container_name=\"lds-api-postgres-1\", server=\"dfxprd\"} | detected_level=\"error\" [30m]))"'

Expect: 0 (or only sporadic real errors), down from current ~60/30 min.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions