[pinot-segment-spi] Preserve user new-format FieldConfig.indexes JsonNode through migration #18495
Open
anshul98ks123 wants to merge 5 commits into
…rmat
The migration step unconditionally overwrote every FieldConfig.indexes
JsonNode for which the typed POJO was non-default — even on columns the
user had already supplied in new-format slim shape. After the typed-POJO
round-trip (Jackson fills missing primitive @JsonCreator params with
0/false; the bean serializer emits every key), the user's slim
{"forward":{"compressionCodec":"SNAPPY"}} got fattened into a verbose
7-key blob on every migration pass.
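
The fattening can be reproduced without Pinot or Jackson. The sketch below is only a stand-in: `ForwardConfigSketch` is a hypothetical class whose field names and defaults are copied from the 7-key blob quoted in the PR description, plain `Map`s stand in for `JsonNode`, and the default handling is simplified to "missing key means default". It is not the Pinot implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FatteningSketch {
  // Hypothetical typed config; names and defaults mirror the blob quoted in this PR.
  static final class ForwardConfigSketch {
    final boolean disabled;
    final String encodingType;
    final String compressionCodec;
    final boolean deriveNumDocsPerChunk;
    final int rawIndexWriterVersion;
    final String targetMaxChunkSize;
    final int targetDocsPerChunk;

    // "Deserialize": every key the user omitted falls back to a default.
    ForwardConfigSketch(Map<String, Object> json) {
      disabled = (Boolean) json.getOrDefault("disabled", false);
      encodingType = (String) json.getOrDefault("encodingType", "DICTIONARY");
      compressionCodec = (String) json.get("compressionCodec");
      deriveNumDocsPerChunk = (Boolean) json.getOrDefault("deriveNumDocsPerChunk", false);
      rawIndexWriterVersion = (Integer) json.getOrDefault("rawIndexWriterVersion", 4);
      targetMaxChunkSize = (String) json.getOrDefault("targetMaxChunkSize", "1MB");
      targetDocsPerChunk = (Integer) json.getOrDefault("targetDocsPerChunk", 1000);
    }

    // "Serialize": like a bean serializer, emit every field whether or not the user set it.
    Map<String, Object> toJson() {
      Map<String, Object> out = new LinkedHashMap<>();
      out.put("disabled", disabled);
      out.put("encodingType", encodingType);
      out.put("compressionCodec", compressionCodec);
      out.put("deriveNumDocsPerChunk", deriveNumDocsPerChunk);
      out.put("rawIndexWriterVersion", rawIndexWriterVersion);
      out.put("targetMaxChunkSize", targetMaxChunkSize);
      out.put("targetDocsPerChunk", targetDocsPerChunk);
      return out;
    }
  }

  // The round-trip that turns a slim user shape into the verbose one.
  static Map<String, Object> roundTrip(Map<String, Object> userJson) {
    return new ForwardConfigSketch(userJson).toJson();
  }

  public static void main(String[] args) {
    Map<String, Object> slim = new LinkedHashMap<>();
    slim.put("compressionCodec", "SNAPPY");
    Map<String, Object> fat = roundTrip(slim);
    System.out.println(slim.size() + " key in, " + fat.size() + " keys out");
    System.out.println(fat);
  }
}
```

The user's single key survives the round-trip, but six defaults the user never wrote now sit next to it, which is the shape change this PR avoids.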
This change makes the write gap-filling: when a column already carries a
JsonNode at FieldConfig.indexes[prettyName], the migration preserves it
verbatim. Pure-legacy inputs still translate to new format unchanged.
Same-column / same-index-type declared in both legacy and new format
continues to raise ConfigDeclaredTwiceException via the existing merged
deserializer (no semantic change there).
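
The gap-filling rule can be sketched in a few lines. This is a simplified model under stated assumptions, not the code in `AbstractIndexType`: `gapFill` is a hypothetical helper, plain `Map`s stand in for Jackson's `ObjectNode`, and `translated` stands for whatever the migration derived from the typed configs. The conflict case (`ConfigDeclaredTwiceException`) is assumed to have been raised earlier and is not modelled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GapFillSketch {
  // indexes: index pretty name (e.g. "forward") -> JSON the column already carries.
  // translated: index pretty name -> JSON produced by the migration.
  static Map<String, Object> gapFill(Map<String, Object> indexes, Map<String, Object> translated) {
    Map<String, Object> result = new LinkedHashMap<>(indexes);
    for (Map.Entry<String, Object> e : translated.entrySet()) {
      // Write only where the user left a gap; an existing non-null entry is kept verbatim.
      if (result.get(e.getKey()) == null) {
        result.put(e.getKey(), e.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // The user supplied a slim new-format forward entry.
    Map<String, Object> user = new LinkedHashMap<>();
    user.put("forward", Map.of("compressionCodec", "SNAPPY"));

    // The migration produced a verbose forward entry plus a bloom entry from a legacy field.
    Map<String, Object> translated = new LinkedHashMap<>();
    translated.put("forward", Map.of("compressionCodec", "SNAPPY", "disabled", false));
    translated.put("bloom", Map.of("fpp", 0.05)); // illustrative value only

    Map<String, Object> once = gapFill(user, translated);
    Map<String, Object> twice = gapFill(once, translated);
    System.out.println(once);
    System.out.println("idempotent: " + once.equals(twice));
  }
}
```

The same guard is what keeps an explicit `{"disabled": true}` from being silently replaced by a default, since the entry is present and therefore never rewritten.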
The new test class pins the full contract: pure-new-format preservation
(including explicit-default and explicit-false), legacy-only translation,
both-formats coexistence across different index types, idempotency on
repeated migration, multi-column independence, multiple index types on
one column, legacy-field cleanup, default-config skip path, cross-type
coverage (range/text/json), and an end-to-end Jackson round-trip.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18495 +/- ##
============================================
- Coverage 63.74% 63.69% -0.06%
Complexity 1684 1684
============================================
Files 3266 3266
Lines 199836 199840 +4
Branches 31023 31024 +1
============================================
- Hits 127388 127288 -100
- Misses 62282 62411 +129
+ Partials 10166 10141 -25
Pins the gap-fill rule against silent re-enable: user sets
{"forward":{"disabled":true}}, migration must preserve that exact key,
not overwrite with the typed-POJO default (disabled=false).
…ion test + MAPPER comment
Clarify convertToNewFormat javadoc to accurately describe the merged
deserializer's role: same-column-same-type conflicts raise
ConfigDeclaredTwiceException BEFORE the gap-fill loop runs, so the loop
is only entered for non-conflicting inputs. Drop the misleading
"new format wins" prose.
Add newFormatExplicitNullValueRejectedByIndexTypeValidator test pinning
that user-supplied {"forward": null} is rejected by per-index-type
validators (ForwardIndexType) before reaching the gap-fill guard. Users
must use {} (empty object) for "enabled with defaults", not null.
Add inline comment on MAPPER documenting thread-safe sharing.
Pinot convention prefers imports over fully-qualified class names. Add imports for NullNode and static Assert.fail; use unqualified references in newFormatExplicitNullValueRejectedByIndexTypeValidator.
…rsarial-input tests
- Production javadoc: drop ForwardIndexType-specific claim; describe both layers (per-type Preconditions check AND gap-fill !isNull() fallback).
- Test javadoc: drop misleading "new format wins" prose; replace with ConfigDeclaredTwiceException reference matching production.
- NullNode test assertion: match on "column: c1" instead of the index-type prose to decouple from the exact validator message.
- Add newFormatPrimitiveAtPrettyNameRejected and newFormatArrayAtPrettyNameRejected for adversarial non-object JsonNode shapes; pin the loud-failure path.
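
The loud-failure path for non-object shapes can be illustrated with a small model. Everything here is an assumption made for illustration: `requireObject` is a hypothetical check, the exception type and message text are invented (only the `column: c1` fragment is taken from the commit message above), and Java values stand in for `JsonNode` shapes (`null` for `NullNode`, a `String` for a primitive, a `List` for an array, a `Map` for an object).

```java
import java.util.List;
import java.util.Map;

class ShapeCheckSketch {
  // Hypothetical per-type check: only object-shaped values are accepted.
  static void requireObject(String column, String prettyName, Object value) {
    if (!(value instanceof Map)) {
      throw new IllegalStateException(
          "Invalid config for index '" + prettyName + "', column: " + column
              + " (expected a JSON object)");
    }
  }

  // Returns true when the value is rejected with a message that names the column.
  static boolean rejected(Object value) {
    try {
      requireObject("c1", "forward", value);
      return false;
    } catch (IllegalStateException e) {
      // Match on the column fragment only, so the check does not depend on the rest of the message.
      return e.getMessage().contains("column: c1");
    }
  }

  public static void main(String[] args) {
    System.out.println("null rejected:      " + rejected(null));
    System.out.println("primitive rejected: " + rejected("SNAPPY"));
    System.out.println("array rejected:     " + rejected(List.of(1, 2)));
    System.out.println("empty {} rejected:  " + rejected(Map.of()));
  }
}
```

In this model the empty object passes, matching the guidance that `{}` rather than `null` is the way to ask for "enabled with defaults".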
Issue(s)
AbstractIndexType#convertToNewFormat (called via TableConfigUtils.createTableConfigFromOldFormat) unconditionally overwrites every FieldConfig.indexes[prettyName] with the verbose typed-POJO unwrap, even on columns the user already supplied in new-format slim shape.
This is load-bearing for downstream surfaces (e.g. StarTree Preview / InferIndex APIs) that round-trip user-supplied FieldConfig.indexes JsonNodes through migration and back to the client / ZK: the slim shape never survives.
Problem
1. getConfig(...) deserializes the user's slim JSON → typed POJO (Jackson fills missing primitive @JsonCreator params with 0/false).
2. configValue.toJsonNode() then re-serializes via the bean serializer, which emits every key.
3. currentIndexes.set(prettyName, …) clobbers the user's slim shape with the verbose blob:
    FieldConfig.indexes.forward (user posted: {"compressionCodec":"SNAPPY"}):
    - {"disabled":false,"encodingType":"DICTIONARY","compressionCodec":"SNAPPY",
    -  "deriveNumDocsPerChunk":false,"rawIndexWriterVersion":4,
    -  "targetMaxChunkSize":"1MB","targetDocsPerChunk":1000}   // 7 keys, today
Root cause: the overwrite at AbstractIndexType.java was unconditional; it fired even when the column already carried a new-format JsonNode for that index type.
Solution
Make the write gap-filling: when the column already carries a JsonNode at FieldConfig.indexes[prettyName], preserve it verbatim. Pure-legacy inputs still translate to new format unchanged.
ObjectMapper is also hoisted to private static final (was allocated per-column).
After:
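    FieldConfig.indexes.forward (user posted: {"compressionCodec":"SNAPPY"}):
    + {"compressionCodec":"SNAPPY"}   // 1 key, preserved
Backward compatibility
- Pure legacy input (e.g. bloomFilterColumns=[c1]): still translated to new format through the existing set() branch.
- Pure new-format input (e.g. {"forward":{"compressionCodec":"SNAPPY"}}): preserved verbatim.
- Both formats on different index types (e.g. new-format bloom + legacy noDictionaryColumns): the new-format entry is preserved and the legacy field is translated.
- Both formats on the same column and index type: ConfigDeclaredTwiceException (merged deserializer).
No SPI signature changes. No config-key renames. Strictly more conservative than today.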
Rolling upgrade. Mixed-version controllers may produce different (but semantically equivalent) FieldConfig.indexes JSON shapes for the same TableConfig during a rolling upgrade: an older controller will still re-fatten a slim entry, while a newer controller preserves it. Both shapes are accepted by all readers, so no downtime or data migration is required.
Testing
New test class ConvertToNewFormatPreservesSlimTest (21 tests) pins the full contract:
- Pure new-format preservation: forward/bloom slim shapes preserved; explicit value equal to today's default preserved; explicit false preserved; empty {} object preserved.
- Legacy-only translation: bloomFilterColumns, noDictionaryColumns, invertedIndexColumns all still translate.
- Same column and index type declared in both formats: raises ConfigDeclaredTwiceException.
- Cross-type coverage: range, text, json slim shapes preserved (each subclass shares the base codepath).
- End-to-end Jackson round-trip: round-trip a TableConfig and confirm the slim shape survives.
Adjacent suites still green: TableConfigUtilsTest (56), DictionaryIndexConfigTest (10), IndexServiceTest (1), plus the 120-test index-type sweep across VectorIndexTest, H3IndexTest, JsonIndexTest, RangeIndexTest, TextIndexTest, FstIndexTest, ForwardIndexTypeTest, DictionaryIndexTypeTest, BloomFilterTypeTest, InvertedIndexTypeTest, NullValueIndexTest.
The following command runs clean: ./mvnw spotless:apply checkstyle:check license:format license:check -pl pinot-segment-spi,pinot-segment-local