Pluggable ingestion serializers with json_ingestion #1079

Open
jwils wants to merge 1 commit into main from joshauw/pluggable-ingestion-serializers

@jwils (Collaborator) commented Mar 21, 2026

Summary

  • Introduces pluggable ingestion serializer extension modules for schema definition.
  • Extracts JSON ingestion behavior into the new elasticgraph-json_ingestion gem while keeping it loaded by default for backward compatibility.
  • Moves JSON-specific schema definition implementation into extension modules, including JSON schema DSL hooks, field/type JSON schema generation, schema artifact generation, version bump checks, pruning, and metadata merge support.
  • Keeps existing schema-definition specs in place and disables coverage enforcement for the newly extracted module until the tests are moved in a follow-up.
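
As a rough illustration of the extension-module idea in the summary above, here is a minimal Ruby sketch. Every name in it (`SketchCore::API`, `SketchJSONIngestion::APIExtension`, the `state` hash) is hypothetical and chosen for illustration; it is not the code in this PR. It only shows the general pattern of mixing serializer-specific DSL methods into a core API object that otherwise knows nothing about JSON.

```ruby
# Hypothetical sketch: none of these names are ElasticGraph's real classes.
module SketchCore
  # Stand-in for a core schema-definition API with no JSON knowledge.
  class API
    attr_reader :state

    def initialize(extension_modules: [])
      @state = {}
      # Mix each extension into this one instance, adding its DSL methods.
      extension_modules.each { |mod| extend(mod) }
    end
  end
end

module SketchJSONIngestion
  # Stand-in for a serializer extension contributing JSON-specific DSL hooks.
  module APIExtension
    def json_schema_version(version)
      unless version.is_a?(Integer) && version > 0
        raise ArgumentError, "version must be a positive integer"
      end

      state[:json_schema_version] = version
    end
  end
end

# With the extension, the JSON DSL method is available.
api = SketchCore::API.new(extension_modules: [SketchJSONIngestion::APIExtension])
api.json_schema_version(3)

# Without it, the core object has no JSON-related methods at all.
bare_api = SketchCore::API.new
```

Under this pattern, a future protobuf extension could be passed in the same way without the core class changing. Default loading for backward compatibility is not shown here.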

Motivation

Per discussion #1059, protobuf ingestion should be able to plug into the same schema-definition pipeline without keeping JSON ingestion hardcoded in core. This PR narrows that first step to extraction and extension wiring for the existing JSON ingestion behavior.

What moved where

| From | To |
| --- | --- |
| API#json_schema_version, API#json_schema_strictness, built-in JSON schema setup | ElasticGraph::JSONIngestion::SchemaDefinition::APIExtension |
| JSON field/type schema DSL behavior | ElasticGraph::JSONIngestion::SchemaDefinition::SchemaElements::*Extension |
| Indexing::Field, FieldReference, FieldType::*, and Index JSON schema behavior | ElasticGraph::JSONIngestion::SchemaDefinition::Indexing::*Extension |
| JSON schema artifact generation, version bump checking, versioned schema building, and merge error reporting | ElasticGraph::JSONIngestion::SchemaDefinition::SchemaArtifactManagerExtension |
| EventEnvelope, JSONSchemaWithMetadata, JSONSchemaFieldMetadata, JSONSchemaPruner | elasticgraph-json_ingestion |

What stays in core

  • Serializer-extension plumbing and default loading for backward compatibility.
  • Public rake/factory plumbing for enforce_json_schema_version, which the JSON ingestion extension still uses.
  • Runtime JSON schema consumers such as schema artifacts, indexer record preparation, and support JSON schema validation.
  • Existing schema-definition tests for the moved behavior; the new module is coverage-filtered until those tests are moved in a follow-up.

Test plan

  • script/lint
  • script/type_check
  • script/spellcheck
  • env -u COVERAGE BUNDLE_GEMFILE=elasticgraph-schema_definition/Gemfile bundle exec rspec elasticgraph-schema_definition/spec/unit --fail-fast --format progress
  • env -u COVERAGE BUNDLE_GEMFILE=elasticgraph-apollo/Gemfile bundle exec rspec elasticgraph-apollo/spec/unit --fail-fast --format progress
  • script/run_gem_specs elasticgraph-json_ingestion (skips because the extracted gem intentionally has no spec directory yet)

@jwils force-pushed the joshauw/pluggable-ingestion-serializers branch 3 times, most recently from d29bd5d to a3cfa5c (March 22, 2026 16:47)
@jwils force-pushed the joshauw/pluggable-ingestion-serializers branch 2 times, most recently from 9e6f39a to c408a50 (March 23, 2026 04:10)

@myronmarston (Collaborator) left a comment

This is a great start but I think there's a lot more to do here. If the JSON schema ingestion support is no longer part of elasticgraph-schema_definition (now that it lives in elasticgraph-json_schema) then elasticgraph-json_schema needs to be solely responsible for all aspects of the JSON schema artifact.

Here's a non-exhaustive list of things that are still in elasticgraph-schema_definition that should be moved:

I'm sure there's a lot more than that but that'll get you started. The way I'm thinking about this: when someone uses ElasticGraph with only protobuf ingestion in the future, and doesn't even install the elasticgraph-json_schema gem, their collection of ElasticGraph gems should have no JSON-schema related logic, in a similar fashion to how we're not going to add protobuf directly to elasticgraph-schema_definition--it'll be provided by an extension.

A good way to approach this is to grep for json_schema in elasticgraph-schema_definition. Any mention of json_schema in elasticgraph-schema_definition should probably be moved. (Maybe some mentions of json without -schema, too, although there are some legit mentions of JSON that will remain like JSONSafeLong).
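
For anyone who prefers a script to a one-off grep, the audit described above could be sketched like this. The helper name `find_references` and the choice to scan only `*.rb` files are assumptions made for illustration; nothing like this exists in the repository as far as this thread shows.

```ruby
# Hypothetical audit helper: lists every line under `root` that mentions json_schema.
# Returns [path, line_number, stripped_line] triples, sorted by path.
def find_references(root, pattern: /json_schema/)
  Dir.glob(File.join(root, "**", "*.rb")).sort.flat_map do |path|
    File.readlines(path, chomp: true).each_with_index.filter_map do |line, index|
      [path, index + 1, line.strip] if line.match?(pattern)
    end
  end
end

# Assumes the script is run from the repository root; prints nothing if the
# directory is missing or has no matches.
find_references("elasticgraph-schema_definition").each do |path, line_number, line|
  puts "#{path}:#{line_number}: #{line}"
end
```

Because the pattern is case-sensitive and requires the underscore, identifiers such as JSONSafeLong are not reported. Catching broader mentions of json would need a looser pattern and more manual triage.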

Comment thread elasticgraph-json_schema/lib/elastic_graph/json_schema.rb Outdated
@jwils force-pushed the joshauw/pluggable-ingestion-serializers branch from c408a50 to 5717334 (March 26, 2026 14:32)
@jwils force-pushed the joshauw/pluggable-ingestion-serializers branch 11 times, most recently from 23c3ead to 0607029 (April 4, 2026 21:34)
@jwils changed the title from "Prototype pluggable ingestion serializers" to "Pluggable ingestion serializers with json_ingestion" (Apr 4, 2026)
@jwils force-pushed the joshauw/pluggable-ingestion-serializers branch from 0607029 to 40815e1 (April 4, 2026 21:53)

@myronmarston (Collaborator) left a comment

Sorry for the delayed review--this is massive! (But also exciting to see...). On top of my inline feedback below, I've got some meta feedback:

  • The fact that this reworks both the tests and the implementation makes it hard to have confidence that it preserves JSON schema behavior (and also makes it a very large PR!). I'd like to see this split:
    • First, extract the implementation into elasticgraph-json_ingestion while leaving elasticgraph-schema_definition/spec changed as little as possible. Leaving it unchanged gives us confidence that the extraction produces equivalent behavior. You'll probably have to update elasticgraph-schema_definition/spec/spec_helper.rb to have it (for now) automatically use elasticgraph-json_ingestion in all its tests so that the existing tests can continue to pass.
    • Secondly, move the tests from elasticgraph-schema_definition into elasticgraph-json_ingestion.
    • Even with that split into 2 steps, the PRs will still be massive. If you see some further opportunity to split into smaller PRs, please take the opportunity to do that--I'm happy to review a whole stack of PRs.
  • There's still a ton of json_schema references in elasticgraph-schema_definition. Can you audit these? It would be good to confirm that every reference to json_schema needs to remain. I'll plan to do an audit in a future review as well (once you've let me know that you've finished your review).
    • ...but obviously the audit can be done at the end on top of the final PR in your (eventual) stack.
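
One way the first step above might look, sketched in plain Ruby rather than real RSpec configuration: a spec support helper that applies the extracted extension by default, so existing specs keep passing without edits. All names here (`SketchAPI`, `SketchJSONExtension`, `SchemaDefinitionSpecSupport`, `define_schema`) are hypothetical; the actual helpers in elasticgraph-schema_definition/spec/spec_helper.rb may look quite different.

```ruby
# Hypothetical stand-in for the core API under test.
class SketchAPI
  def results
    @results ||= {}
  end
end

# Hypothetical stand-in for the extracted JSON ingestion extension.
module SketchJSONExtension
  def json_schema_version(version)
    results[:json_schema_version] = version
  end
end

# Helper that a spec_helper.rb could mix into every example group.
module SchemaDefinitionSpecSupport
  DEFAULT_EXTENSION_MODULES = [SketchJSONExtension].freeze

  # Existing specs call this without `extension_modules:`, so they keep
  # exercising the JSON behavior even though it now lives elsewhere.
  def define_schema(extension_modules: DEFAULT_EXTENSION_MODULES)
    api = SketchAPI.new
    extension_modules.each { |mod| api.extend(mod) }
    yield api if block_given?
    api.results
  end
end

# Simulates an example group that has the helper mixed in.
example_group = Object.new.extend(SchemaDefinitionSpecSupport)
results = example_group.define_schema { |api| api.json_schema_version 1 }
```

The second step (moving the tests) would then drop the default and have the relocated specs pass the extension explicitly.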

Comment thread elasticgraph-json_ingestion/spec/spec_helper.rb Outdated
Comment thread elasticgraph-json_ingestion/elasticgraph-json_ingestion.gemspec Outdated
Comment thread elasticgraph-local/lib/elastic_graph/local/rake_tasks.rb Outdated
Comment thread elasticgraph-local/lib/elastic_graph/local/rake_tasks.rb Outdated
@jwils force-pushed the joshauw/pluggable-ingestion-serializers branch 2 times, most recently from 0cd8153 to c2bcd2b (May 6, 2026 12:47)
@jwils force-pushed the joshauw/pluggable-ingestion-serializers branch 8 times, most recently from 2d3764a to 63fe627 (May 6, 2026 20:01)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jwils force-pushed the joshauw/pluggable-ingestion-serializers branch from 63fe627 to 709e591 (May 6, 2026 21:57)