Skip to content

[pinot-segment-spi] Preserve user new-format FieldConfig.indexes JsonNode through migration#18495

Open
anshul98ks123 wants to merge 5 commits into
apache:masterfrom
anshul98ks123:migration-preserve-slim-new-format
Open

[pinot-segment-spi] Preserve user new-format FieldConfig.indexes JsonNode through migration#18495
anshul98ks123 wants to merge 5 commits into
apache:masterfrom
anshul98ks123:migration-preserve-slim-new-format

Conversation

@anshul98ks123
Copy link
Copy Markdown
Contributor

@anshul98ks123 anshul98ks123 commented May 14, 2026

Issue(s)

AbstractIndexType#convertToNewFormat (called via TableConfigUtils.createTableConfigFromOldFormat) unconditionally overwrites every FieldConfig.indexes[prettyName] with the verbose typed-POJO unwrap — even on columns the user already supplied in new-format slim shape.

This is load-bearing for downstream surfaces (e.g. StarTree Preview / InferIndex APIs) that round-trip user-supplied FieldConfig.indexes JsonNodes through migration and back to the client / ZK — the slim shape never survives.


Problem

getConfig(...) deserializes the user's slim JSON → typed POJO (Jackson fills missing primitive @JsonCreator params with 0 / false). configValue.toJsonNode() then re-serializes via the bean serializer, which emits every key. currentIndexes.set(prettyName, …) clobbers the user's slim shape with the verbose blob:

  FieldConfig.indexes.forward (user posted: {"compressionCodec":"SNAPPY"}):
- {"disabled":false,"encodingType":"DICTIONARY","compressionCodec":"SNAPPY",
-  "deriveNumDocsPerChunk":false,"rawIndexWriterVersion":4,
-  "targetMaxChunkSize":"1MB","targetDocsPerChunk":1000}     // 7 keys, today

Root cause: the overwrite at AbstractIndexType.java was unconditional — it fired even when the column already carried a new-format JsonNode for that index type.


Solution

Make the write gap-filling: when the column already carries a JsonNode at FieldConfig.indexes[prettyName], preserve it verbatim. Pure-legacy inputs still translate to new format unchanged.

-      ObjectNode currentIndexes = fieldConfig.getIndexes().isNull()
-          ? new ObjectMapper().createObjectNode()
-          : new ObjectMapper().valueToTree(fieldConfig.getIndexes());
-      currentIndexes.set(getPrettyName(), configValue.toJsonNode());
+      ObjectNode currentIndexes = fieldConfig.getIndexes().isNull()
+          ? MAPPER.createObjectNode()
+          : MAPPER.valueToTree(fieldConfig.getIndexes());
+      JsonNode existing = currentIndexes.get(getPrettyName());
+      if (existing != null && !existing.isNull()) {
+        // Column already carries a new-format JsonNode for this index type — preserve the
+        // user's shape verbatim. Legacy-only inputs fall through to the set() branch below.
+        continue;
+      }
+      currentIndexes.set(getPrettyName(), configValue.toJsonNode());

ObjectMapper is also hoisted to private static final (was allocated per-column).

After:

  FieldConfig.indexes.forward (user posted: {"compressionCodec":"SNAPPY"}):
+ {"compressionCodec":"SNAPPY"}                              // 1 key, preserved

Backward compatibility

Case Before After
Pure legacy (e.g. bloomFilterColumns=[c1]) Translated to new format (verbose). Same — falls through to the set() branch.
Pure new-format slim (e.g. {"forward":{"compressionCodec":"SNAPPY"}}) Overwritten with verbose typed-POJO unwrap. Preserved verbatim.
Both formats, different index types (e.g. slim bloom + legacy noDictionaryColumns) Both translate; legacy fattens its own entry. Both translate independently; user's slim shape stays slim.
Both formats, same column + same index type ConfigDeclaredTwiceException (merged deserializer). Same — gap-fill runs after the deserializer, so the throw still fires.
Cross-type inference (e.g. inverted → dictionary auto-create) Verbose written. Verbose written (key was absent). Same.

No SPI signature changes. No config-key renames. Strictly more conservative than today.

Rolling upgrade. Mixed-version controllers may produce different (but semantically equivalent) FieldConfig.indexes JSON shapes for the same TableConfig during a rolling upgrade — an older controller will still re-fatten a slim entry, a newer controller preserves it. Both shapes are accepted by all readers, so no downtime or data migration is required.


Testing

New test class ConvertToNewFormatPreservesSlimTest (21 tests) pins the full contract:

  • Pure new-format preservation: forward / bloom slim shapes preserved; explicit value equal to today's default preserved; explicit false preserved; empty {} object preserved.
  • Pure-legacy translation (back-compat): bloomFilterColumns, noDictionaryColumns, invertedIndexColumns all still translate.
  • Both-formats coexistence: different index types coexist with each shape independent; same column + same type still raises ConfigDeclaredTwiceException.
  • Idempotency: migration twice == migration once, for both new-format and legacy inputs.
  • Multi-column independence + multiple index types on one column + legacy cleanup + default-config skip.
  • Cross-type coverage: range, text, json slim shapes preserved (each subclass shares the base codepath).
  • End-to-end Jackson round-trip: serialize → deserialize the migrated TableConfig and confirm the slim shape survives.

Adjacent suites still green: TableConfigUtilsTest (56), DictionaryIndexConfigTest (10), IndexServiceTest (1), plus the 120-test index-type sweep across VectorIndexTest, H3IndexTest, JsonIndexTest, RangeIndexTest, TextIndexTest, FstIndexTest, ForwardIndexTypeTest, DictionaryIndexTypeTest, BloomFilterTypeTest, InvertedIndexTypeTest, NullValueIndexTest.

./mvnw spotless:apply checkstyle:check license:format license:check -pl pinot-segment-spi,pinot-segment-local clean.

…rmat

The migration step unconditionally overwrote every FieldConfig.indexes
JsonNode for which the typed POJO was non-default — even on columns the
user had already supplied in new-format slim shape. After the typed-POJO
round-trip (Jackson fills missing primitive @JsonCreator params with
0/false; the bean serializer emits every key), the user's slim
{"forward":{"compressionCodec":"SNAPPY"}} got fattened into a verbose
7-key blob on every migration pass.

This change makes the write gap-filling: when a column already carries a
JsonNode at FieldConfig.indexes[prettyName], the migration preserves it
verbatim. Pure-legacy inputs still translate to new format unchanged.
Same-column / same-index-type declared in both legacy and new format
continues to raise ConfigDeclaredTwiceException via the existing merged
deserializer (no semantic change there).

The new test class pins the full contract: pure-new-format preservation
(including explicit-default and explicit-false), legacy-only translation,
both-formats coexistence across different index types, idempotency on
repeated migration, multi-column independence, multiple index types on
one column, legacy-field cleanup, default-config skip path, cross-type
coverage (range/text/json), and an end-to-end Jackson round-trip.
@anshul98ks123 anshul98ks123 changed the title Preserve user new-format FieldConfig.indexes JsonNode through table-config migration [pinot-segment-spi] Preserve user new-format FieldConfig.indexes JsonNode through migration May 14, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 14, 2026

Codecov Report

❌ Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.69%. Comparing base (21c88f7) to head (0ee367a).

Files with missing lines Patch % Lines
...che/pinot/segment/spi/index/AbstractIndexType.java 0.00% 9 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18495      +/-   ##
============================================
- Coverage     63.74%   63.69%   -0.06%     
  Complexity     1684     1684              
============================================
  Files          3266     3266              
  Lines        199836   199840       +4     
  Branches      31023    31024       +1     
============================================
- Hits         127388   127288     -100     
- Misses        62282    62411     +129     
+ Partials      10166    10141      -25     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 63.69% <0.00%> (-0.06%) ⬇️
temurin 63.69% <0.00%> (-0.06%) ⬇️
unittests 63.69% <0.00%> (-0.06%) ⬇️
unittests1 55.84% <0.00%> (+<0.01%) ⬆️
unittests2 34.94% <0.00%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Pins the gap-fill rule against silent re-enable: user sets
{"forward":{"disabled":true}}, migration must preserve that exact key,
not overwrite with the typed-POJO default (disabled=false).
…ion test + MAPPER comment

Clarify convertToNewFormat javadoc to accurately describe the merged
deserializer's role: same-column-same-type conflicts raise
ConfigDeclaredTwiceException BEFORE the gap-fill loop runs, so the loop
is only entered for non-conflicting inputs. Drop the misleading
"new format wins" prose.

Add newFormatExplicitNullValueRejectedByIndexTypeValidator test pinning
that user-supplied {"forward": null} is rejected by per-index-type
validators (ForwardIndexType) before reaching the gap-fill guard. Users
must use {} (empty object) for "enabled with defaults", not null.

Add inline comment on MAPPER documenting thread-safe sharing.
Pinot convention prefers imports over fully-qualified class names.
Add imports for NullNode and static Assert.fail; use unqualified
references in newFormatExplicitNullValueRejectedByIndexTypeValidator.
…rsarial-input tests

- Production javadoc: drop ForwardIndexType-specific claim; describe both
  layers (per-type Preconditions check AND gap-fill !isNull() fallback).
- Test javadoc: drop misleading "new format wins" prose; replace with
  ConfigDeclaredTwiceException reference matching production.
- NullNode test assertion: match on "column: c1" instead of the index-type
  prose to decouple from the exact validator message.
- Add newFormatPrimitiveAtPrettyNameRejected and newFormatArrayAtPrettyNameRejected
  for adversarial non-object JsonNode shapes; pin the loud-failure path.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants