Skip to content

Add query labels via sqlcommenter comment parsing#67

Open
iskakaushik wants to merge 11 commits intomainfrom
query-labels-real
Open

Add query labels via sqlcommenter comment parsing#67
iskakaushik wants to merge 11 commits intomainfrom
query-labels-real

Conversation

@iskakaushik
Copy link
Copy Markdown
Collaborator

Summary

  • Extract user-defined key='value' labels from SQL comments in the sqlcommenter format (/* controller='users',action='show' */) and export them to ClickHouse as a labels JSON column and to OTel as db.query.label.* attributes
  • Parsing runs entirely in the background worker (cold path) — zero changes to the hot path (hooks, events, ring buffer, DSA)
  • New GUC: pg_stat_ch.track_labels (bool, default true, PGC_SIGHUP)

Details

The bgworker scans the query text backward for the last /* */ comment block, parses key='value' pairs with URL decoding, and passes the structured ParseResult to each exporter directly.

New files:

  • src/export/sqlcommenter_parse.cc/.h — parser: ExtractLastComment + ParseSqlcommenter + SerializeLabelsJson
  • test/unit/sqlcommenter_parse_test.cc — 45 unit tests covering parsing, URL decoding, truncation, and edge cases
  • t/030_query_labels.pl — TAP integration test for end-to-end ClickHouse label export

Schema change:

  • New ClickHouse column: labels JSON(max_dynamic_paths=64) on events_raw

Test plan

  • Unit tests pass (sqlcommenter_parse_test — 45 cases)
  • TAP integration test (030_query_labels.pl) confirms labels round-trip to ClickHouse
  • Regression tests pass (GUC visibility)
  • Verify no hot-path performance impact (no hook changes)

🤖 Generated with Claude Code

Extract user-defined key='value' labels from SQL comments in the
sqlcommenter format (/* controller='users',action='show' */) and
export them to ClickHouse as a JSON column and to OTel as
db.query.label.* attributes.

Parsing happens entirely in the background worker (cold path) with
zero changes to the hot path — hooks, events, ring buffer, and DSA
are untouched. The bgworker scans the query text backward for the
last /* */ comment block, parses key='value' pairs with URL decoding,
and passes the structured ParseResult to each exporter directly.

- New GUC: pg_stat_ch.track_labels (bool, default true, PGC_SIGHUP)
- New ClickHouse column: labels JSON(max_dynamic_paths=64)
- Parser: ExtractLastComment + ParseSqlcommenter + SerializeLabelsJson
- 45 unit tests covering parsing, URL decoding, truncation, and edge cases
- TAP integration test for end-to-end ClickHouse label export
Copilot AI review requested due to automatic review settings April 13, 2026 14:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for extracting user-defined sqlcommenter key='value' labels from the last /* ... */ SQL comment and exporting them as structured labels to both ClickHouse (new labels JSON column) and OpenTelemetry (db.query.label.* attributes), gated by a new pg_stat_ch.track_labels GUC.

Changes:

  • Implement sqlcommenter comment extraction/parsing + JSON serialization, and invoke it from the bgworker export loop.
  • Extend exporters with a StatsExporter::AppendLabels() hook to emit labels to ClickHouse and OTel.
  • Add ClickHouse schema support + unit and TAP integration tests, plus regression coverage for the new GUC.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/unit/sqlcommenter_parse_test.cc Adds GTest unit coverage for comment extraction, parsing, decoding, truncation, and JSON serialization.
test/regression/sql/guc.sql Exercises visibility of new pg_stat_ch.track_labels GUC.
test/regression/expected/guc.out Updates expected regression output to include pg_stat_ch.track_labels.
t/030_query_labels.pl Adds end-to-end TAP test asserting labels round-trip to ClickHouse and respect track_labels.
src/export/stats_exporter.cc Parses last comment per event and forwards parsed labels to exporters in the bgworker export loop.
src/export/sqlcommenter_parse.h Declares parser/serializer APIs and ParseResult representation.
src/export/sqlcommenter_parse.cc Implements backward scan for last block comment, sqlcommenter parsing, URL-decoding, and JSON escaping/serialization.
src/export/otel_exporter.cc Implements AppendLabels() to publish db.query.label.* attributes (and tag map entries).
src/export/exporter_interface.h Extends exporter interface with AppendLabels(const ParseResult&).
src/export/clickhouse_exporter.cc Adds a labels column to exported ClickHouse blocks and appends serialized label JSON.
src/config/guc.cc Introduces pg_stat_ch.track_labels (PGC_SIGHUP, default on).
include/config/guc.h Exposes psch_track_labels for use in exporter code.
docker/init/00-schema.sql Adds labels JSON(max_dynamic_paths=64) column to the canonical ClickHouse schema.
CMakeLists.txt Adds a new sqlcommenter_parse_test unit test target.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread t/030_query_labels.pl Outdated
Comment on lines +62 to +67
$node->safe_psql('postgres', 'SELECT 42');
$node->safe_psql('postgres', 'SELECT pg_stat_ch_flush()');

my $labels = psch_wait_for_clickhouse_query(
"SELECT labels FROM pg_stat_ch.events_raw "
. "WHERE query LIKE '%42%' LIMIT 1",
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this new subtest the ClickHouse lookup filters on WHERE query LIKE '%42%', but queries are normalized in pg_stat_ch (numeric literals are replaced with $N placeholders; see existing normalization tests). This can make the row unfindable and the test flaky/failing. Prefer filtering on a stable identifier that survives normalization (e.g., an alias/table name) or by pid + ordering, instead of the literal value.

Suggested change
$node->safe_psql('postgres', 'SELECT 42');
$node->safe_psql('postgres', 'SELECT pg_stat_ch_flush()');
my $labels = psch_wait_for_clickhouse_query(
"SELECT labels FROM pg_stat_ch.events_raw "
. "WHERE query LIKE '%42%' LIMIT 1",
$node->safe_psql('postgres',
'SELECT 42 AS no_comment_query_labels');
$node->safe_psql('postgres', 'SELECT pg_stat_ch_flush()');
my $labels = psch_wait_for_clickhouse_query(
"SELECT labels FROM pg_stat_ch.events_raw "
. "WHERE query LIKE '%no_comment_query_labels%' LIMIT 1",

Copilot uses AI. Check for mistakes.
// Query text. CH: RecordString "query"; OTel semconv: "db.query.text".
virtual shared_ptr<Column<string_view>> DbQueryTextColumn() = 0;
// Query labels from sqlcommenter comments. Called inside the event loop.
// CH: serializes to JSON, appends to a String "labels" column;
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says ClickHouse appends labels to a String labels column, but the schema change in this PR defines labels as a ClickHouse JSON column. Updating this description to match the actual column type will avoid confusion for future maintainers/users.

Suggested change
// CH: serializes to JSON, appends to a String "labels" column;
// CH: serializes to JSON, appends to a JSON "labels" column;

Copilot uses AI. Check for mistakes.
Comment thread docker/init/00-schema.sql Outdated

query String COMMENT 'Full SQL query text (may be truncated). Used for debugging and query analysis.',

labels JSON(max_dynamic_paths=64) COMMENT 'Query labels from sqlcommenter comments (key=value pairs in /* */ blocks). Access subpaths directly: labels.controller, labels.action. Empty {} when no labels present. See: https://google.github.io/sqlcommenter/',
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The column comment describes sqlcommenter labels as "key=value" pairs, but the format handled/advertised elsewhere in this PR is key='value' (single-quoted values). Consider updating the comment text to reflect the actual expected syntax to avoid user confusion.

Suggested change
labels JSON(max_dynamic_paths=64) COMMENT 'Query labels from sqlcommenter comments (key=value pairs in /* */ blocks). Access subpaths directly: labels.controller, labels.action. Empty {} when no labels present. See: https://google.github.io/sqlcommenter/',
labels JSON(max_dynamic_paths=64) COMMENT 'Query labels from sqlcommenter comments (key=''value'' pairs in /* */ blocks). Access subpaths directly: labels.controller, labels.action. Empty {} when no labels present. See: https://google.github.io/sqlcommenter/',

Copilot uses AI. Check for mistakes.
- Parse \' as escaped single quote in values (spec meta char escaping)
- Combined MetaUnescapeAndUrlDecode replaces UrlDecode for spec compliance
- Add ClickHouse migration for existing installs (001_add_labels_column.sql)
- Document labels column in events-schema reference and migration in
  ClickHouse guide
- Fix CI to build all unit test targets (sqlcommenter_parse_test was missing)
- Add 9 spec compliance tests (escaped quotes, spec exhibit round-trip)
- Real-world Django ORM sample (6 labels with URL-encoded framework
  version, route, traceparent, tracestate)
- Multi-line comment extraction and parsing
- Multiple comment blocks (last block wins)
- Non-sqlcommenter last comment returns no labels
- -- style comments don't interfere with /* */ extraction
- Full end-to-end Django pipeline with JSON verification

61 tests total (up from 54).
Copilot AI review requested due to automatic review settings April 13, 2026 14:53
Extract Scanner struct with SkipWhitespace/Consume/ScanKey/ScanQuotedValue
methods. Add DecodeField helper to deduplicate key/value decode logic.
Replace inline character checks with IsWhitespace/IsKeyTerminator.
Use std::from_chars for hex decoding.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +135 to +139
attr_name.append(labels.labels[i].key.data(), labels.labels[i].key.size());
string val(labels.labels[i].value);
current_log_record->SetAttribute(attr_name, val);
current_row_tags[attr_name] = std::move(val);
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppendLabels() currently inserts parsed sqlcommenter labels into current_row_tags (line 138), which means they become dimensions for all emitted metrics (HistogramColumn::Crunch records metrics with current_row_tags). Since these labels are user-controlled and high-cardinality by nature, this can explode metric cardinality and dramatically increase OTel backend cost/ingestion load. Consider keeping labels as log-record attributes only (do not add them to current_row_tags), or gate them behind a separate config specifically for metrics tagging.

Also, the label key is appended directly into the attribute name; after URL-decoding it may contain spaces/control characters (tests include "my key"), which can violate common OTel attribute-key expectations and may break downstream collectors/backends. Please sanitize/normalize keys (e.g., allowlist [A-Za-z0-9_.-] and replace others / skip invalid keys) before constructing the attribute name.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoshDreamland you've been dealing with otel perf here, does this copilot perf advice seem applicable here?

Comment on lines +79 to +86
labels_col_->Append(SerializeLabelsJson(labels));
}

void BeginBatch() final {
block = std::make_unique<clickhouse::Block>();
columns.clear();
exported_count = 0;
labels_col_ = Wrap<clickhouse::ColumnString, string_view>("labels");
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClickHouseExporter::BeginBatch() unconditionally adds a "labels" column to every inserted block. On upgrades where the ClickHouse schema hasn’t yet been migrated to include labels, inserts will fail continuously — even when pg_stat_ch.track_labels = off — because the block still contains an unknown column. Consider only creating/appending the labels column when label tracking is enabled and the destination schema supports it (or otherwise detect the missing column and emit a clear actionable error that points to the migration script, while allowing operation with labels disabled).

Suggested change
labels_col_->Append(SerializeLabelsJson(labels));
}
void BeginBatch() final {
block = std::make_unique<clickhouse::Block>();
columns.clear();
exported_count = 0;
labels_col_ = Wrap<clickhouse::ColumnString, string_view>("labels");
if (labels_col_) {
labels_col_->Append(SerializeLabelsJson(labels));
}
}
void BeginBatch() final {
block = std::make_unique<clickhouse::Block>();
columns.clear();
exported_count = 0;
labels_col_.reset();
if (pg_stat_ch_track_labels) {
labels_col_ = Wrap<clickhouse::ColumnString, string_view>("labels");
}

Copilot uses AI. Check for mistakes.
Replaces hand-rolled AppendJsonEscaped + string-concat JSON builder
with nlohmann::ordered_json. Escaping is now handled by a well-tested
library instead of custom code.
- Change labels column from JSON(max_dynamic_paths=64) to String
  DEFAULT '{}' — clickhouse-cpp doesn't support the JSON column type
- Fix OTel export test: duration metric renamed to
  pg_stat_ch_db_client_operation_duration_seconds, label db → db_name
- Fix OTel precedence warning: !system(...) == 0 → system(...) != 0
- Fix query labels test: use JSONExtractString instead of subpath
  syntax, use DDL instead of normalized literals for no-comment test
- Rename 030_query_labels.pl → 032_query_labels.pl (avoid collisions
  with 030_nested_query_normalization and 031_standby_export)
Copilot AI review requested due to automatic review settings April 13, 2026 15:23
New CI jobs that run the integration TAP tests against PG 18:
- clickhouse-tap: starts ClickHouse via docker-compose, runs
  010_clickhouse_export and 011_clickhouse_reconnect
- otel-tap: starts OTel collector via docker-compose, runs
  024_otel_export and 025_otel_reconnect
- Revert redundant clickhouse-tap/otel-tap jobs from ci.yml (ci-tap.yml
  already runs all TAP tests)
- Switch ci-tap.yml from raw docker run to docker-compose so port
  mappings match what the tests expect (18123/19000)
- Start OTel collector alongside ClickHouse so OTel TAP tests run
  instead of being skipped
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


| Column | Type | Description |
|---|---|---|
| `labels` | `String DEFAULT '{}'` | Key-value labels extracted from [sqlcommenter](https://google.github.io/sqlcommenter/) comments appended to the query. For example, `/* controller='users',action='show' */` produces `{"controller":"users","action":"show"}`. Access subpaths directly in ClickHouse: `labels.controller`, `labels.action`. Empty `{}` when no labels are present. |
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs claim you can access JSON subpaths as labels.controller / labels.action, but the schema defines labels as String (see docker/init/00-schema.sql and migration). With a String column, ClickHouse won't support dot-subpath access; you need functions like JSONExtractString(labels, 'controller') (as used in TAP tests), or change the column type to ClickHouse JSON to match the documentation.

Suggested change
| `labels` | `String DEFAULT '{}'` | Key-value labels extracted from [sqlcommenter](https://google.github.io/sqlcommenter/) comments appended to the query. For example, `/* controller='users',action='show' */` produces `{"controller":"users","action":"show"}`. Access subpaths directly in ClickHouse: `labels.controller`, `labels.action`. Empty `{}` when no labels are present. |
| `labels` | `String DEFAULT '{}'` | Key-value labels extracted from [sqlcommenter](https://google.github.io/sqlcommenter/) comments appended to the query. For example, `/* controller='users',action='show' */` produces `{"controller":"users","action":"show"}`. Since this column stores JSON as a string, extract values in ClickHouse with functions such as `JSONExtractString(labels, 'controller')` and `JSONExtractString(labels, 'action')`. Empty `{}` when no labels are present. |

Copilot uses AI. Check for mistakes.

ALTER TABLE pg_stat_ch.events_raw
ADD COLUMN IF NOT EXISTS labels String DEFAULT '{}'
COMMENT 'Query labels from sqlcommenter comments (key=value pairs in /* */ blocks). Access subpaths directly: labels.controller, labels.action. Empty {} when no labels present. See: https://google.github.io/sqlcommenter/'
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration adds labels as a String, but the column COMMENT says you can access subpaths directly via labels.controller/labels.action. That dot-subpath syntax requires the ClickHouse JSON type; for a String column the documented access pattern should use JSONExtract* functions, or the column type should be switched to JSON to match the comment and docs.

Suggested change
COMMENT 'Query labels from sqlcommenter comments (key=value pairs in /* */ blocks). Access subpaths directly: labels.controller, labels.action. Empty {} when no labels present. See: https://google.github.io/sqlcommenter/'
COMMENT 'Query labels from sqlcommenter comments (key=value pairs in /* */ blocks). Extract fields with JSONExtractString(labels, ''controller'') and JSONExtractString(labels, ''action''). Empty {} when no labels present. See: https://google.github.io/sqlcommenter/'

Copilot uses AI. Check for mistakes.
Comment thread docs/guides/clickhouse.md

| Migration | Version | Description |
|---|---|---|
| `001_add_labels_column.sql` | 0.2+ | Adds `labels JSON` column for [sqlcommenter](https://google.github.io/sqlcommenter/) query label support |
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration table says 001_add_labels_column.sql adds a labels JSON column, but the actual schema/migration defines labels as String DEFAULT '{}'. Please align the docs with the actual ClickHouse type (or update the schema if JSON is the real intent).

Suggested change
| `001_add_labels_column.sql` | 0.2+ | Adds `labels JSON` column for [sqlcommenter](https://google.github.io/sqlcommenter/) query label support |
| `001_add_labels_column.sql` | 0.2+ | Adds `labels String DEFAULT '{}'` column for [sqlcommenter](https://google.github.io/sqlcommenter/) query label support |

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +52
// Meta-unescape (\' -> ') and URL-decode (%XX) in a single pass.
// Per sqlcommenter spec, meta unescaping happens before URL decoding,
// but the two transforms don't overlap so a combined pass is equivalent.
size_t MetaUnescapeAndUrlDecode(std::string_view src, char* dst, size_t max_len) {
size_t written = 0;
size_t i = 0;
while (i < src.size() && written < max_len) {
if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\'') {
dst[written++] = '\'';
i += 2;
} else if (src[i] == '%' && i + 2 < src.size()) {
int byte = DecodeHexByte(src[i + 1], src[i + 2]);
if (byte >= 0) {
dst[written++] = static_cast<char>(byte);
i += 3;
continue;
}
dst[written++] = src[i++];
} else {
dst[written++] = src[i++];
}
}
return written;
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetaUnescapeAndUrlDecode will percent-decode arbitrary bytes (including control characters and NUL via %00) into label keys/values. Those bytes can produce non-printable / non-UTF-8 strings in downstream outputs (OTel attribute names/values, ClickHouse JSON text), which can cause ingestion failures or hard-to-debug display issues. Consider validating/sanitizing decoded output (e.g., reject or replace control bytes and ensure UTF-8) before returning it from the parser.

Copilot uses AI. Check for mistakes.
The otel-collector-contrib image is distroless — no shell, wget, or
curl available for Docker healthchecks. Remove the in-container
healthcheck and poll from the host via curl in CI instead.
The OTel periodic metric reader defaults to 5s export interval.
Tests that scrape the Prometheus endpoint need to wait for at least
one export cycle. Set otel_metric_interval_ms=1000 in the test
helper and add sleep(3) before Prometheus scrapes.
Copilot AI review requested due to automatic review settings April 13, 2026 15:53
The test was using a SELECT with a comment (normalized away) and a
racy wait predicate. Use DDL (CREATE/DROP TABLE) whose text survives
normalization, wait for the row to arrive in ClickHouse, then assert.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +35 to +41
| Column | Type | Description |
|---|---|---|
| `labels` | `String DEFAULT '{}'` | Key-value labels extracted from [sqlcommenter](https://google.github.io/sqlcommenter/) comments appended to the query. For example, `/* controller='users',action='show' */` produces `{"controller":"users","action":"show"}`. Access subpaths directly in ClickHouse: `labels.controller`, `labels.action`. Empty `{}` when no labels are present. |

<Note>
Labels are parsed from the **last** `/* */` comment block in the query text. The parser supports URL-encoded values and escaped single quotes per the sqlcommenter specification. Controlled by the [`pg_stat_ch.track_labels`](/reference/configuration) GUC (default: `true`).
</Note>
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The labels column is documented as String, but the description claims you can access subpaths via labels.controller / labels.action. That dot-notation requires a ClickHouse JSON-typed column; with String you need functions like JSONExtractString(labels, 'controller'). Please either change the column type to JSON(...) (and keep docs as-is) or update the docs/examples to match a String column.

Copilot uses AI. Check for mistakes.
| `labels` | `String DEFAULT '{}'` | Key-value labels extracted from [sqlcommenter](https://google.github.io/sqlcommenter/) comments appended to the query. For example, `/* controller='users',action='show' */` produces `{"controller":"users","action":"show"}`. Access subpaths directly in ClickHouse: `labels.controller`, `labels.action`. Empty `{}` when no labels are present. |

<Note>
Labels are parsed from the **last** `/* */` comment block in the query text. The parser supports URL-encoded values and escaped single quotes per the sqlcommenter specification. Controlled by the [`pg_stat_ch.track_labels`](/reference/configuration) GUC (default: `true`).
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section introduces pg_stat_ch.track_labels and links to /reference/configuration, but that page currently doesn’t mention track_labels (so readers can’t discover details like context/reload semantics). Please add pg_stat_ch.track_labels to the configuration reference and consider adding a short warning that label values are user-supplied and may contain sensitive data if applications put PII/tokens in sqlcommenter comments.

Suggested change
Labels are parsed from the **last** `/* */` comment block in the query text. The parser supports URL-encoded values and escaped single quotes per the sqlcommenter specification. Controlled by the [`pg_stat_ch.track_labels`](/reference/configuration) GUC (default: `true`).
Labels are parsed from the **last** `/* */` comment block in the query text. The parser supports URL-encoded values and escaped single quotes per the sqlcommenter specification. Collection is controlled by the [`pg_stat_ch.track_labels`](/reference/configuration) GUC, which defaults to `true` and can be changed with a configuration reload (no server restart required).
Label values are user-supplied application metadata. If applications include PII, tokens, session identifiers, or other secrets in sqlcommenter comments, those values may be exported in `labels`. Only enable and use labels with trusted, non-sensitive metadata.

Copilot uses AI. Check for mistakes.
Comment on lines +78 to 88
docker compose -f docker/docker-compose.test.yml up -d --wait
docker compose -f docker/docker-compose.otel.yml up -d
# Poll OTel health endpoint (distroless image has no shell for healthcheck)
for i in $(seq 1 30); do
if curl -sf http://localhost:13133/ >/dev/null 2>&1; then
echo "OTel collector ready"
break
fi
echo "Waiting for ClickHouse... ($i/30)"
echo "Waiting for OTel collector... ($i/30)"
sleep 1
done
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readiness polling loop for the OTel collector never fails the step if the collector doesn’t become ready within 30s (it just falls through). Please add a post-loop check (or track a flag) and exit 1 if the health endpoint never responds, so CI fails fast with a clear error when the collector can’t start.

Copilot uses AI. Check for mistakes.
}

bool IsWhitespace(char c) {
return c == ' ' || c == '\t' || c == '\n' || c == '\r';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use isspace which is used in postgresql

// Scan a single-quoted value, handling \' escapes per sqlcommenter spec.
// Assumes the opening quote was already consumed.
// Returns the raw content (before decoding). Sets *ok = false on unterminated quote.
std::string_view ScanQuotedValue(bool* ok) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::string_view ScanQuotedValue(bool* ok) {
std::string_view ScanQuotedValue() {

caller can go by result.data() == nullptr

Comment on lines +214 to +216
} else {
exporter->AppendLabels(ParseResult{});
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else {
exporter->AppendLabels(ParseResult{});
}
}

if GUC disabled why do we still push empty object?

serprex
serprex previously approved these changes Apr 13, 2026
@serprex
Copy link
Copy Markdown
Member

serprex commented Apr 13, 2026

There's a lot of logic that tries to be antifragile by ignoring bad inputs (skips misformed labels, tries to keep parsing after, ignores missing commas) which feels like it'll result in unnecessary parsing over invalid input (read: last comment of any normal query), potentially grabbing random labels (probably from commented SQL)

the rfind to find last comment would be better off only ignoring trailing whitespace, if comment is inside query we shouldn't be parsing it

@serprex serprex self-requested a review April 13, 2026 20:53
@serprex serprex dismissed their stale review April 13, 2026 20:53

rfind concern

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants