Spark 4.1: Add validation for required fields in add_files procedure by ebyhr · Pull Request #16977 · apache/iceberg

ebyhr · 2026-06-26T23:33:15Z

add_files bypasses NOT NULL column constraints - importing a Parquet/ORC/Avro
file with null values in a required column silently succeeds, violating the
table's schema integrity.

Validation is skipped when null-value metrics are not collected (e.g., metrics
mode set to none), since there is no data to validate against.
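
The rule described above can be sketched with plain collections. This is only an illustration under stated assumptions, not the code from this PR: `RequiredFieldCheck` and `violations` are hypothetical names, the map stands in for a data file's per-field null-value counts, and a `null` map models a file whose null-value metrics were not collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; it does not use Iceberg classes.
final class RequiredFieldCheck {
  private RequiredFieldCheck() {}

  /**
   * Returns the ids of required fields that have a positive null count, in ascending order.
   * A null map models missing metrics, in which case nothing can be validated.
   */
  static List<Integer> violations(Set<Integer> requiredIds, Map<Integer, Long> nullValueCounts) {
    List<Integer> result = new ArrayList<>();
    if (nullValueCounts == null) {
      return result; // metrics were not collected, so the check is skipped
    }
    for (Integer id : new TreeSet<>(requiredIds)) {
      Long nulls = nullValueCounts.get(id);
      // A missing entry means no count exists for that field, so it is not flagged.
      if (nulls != null && nulls > 0) {
        result.add(id);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Set<Integer> required = Set.of(1, 2);
    // Field 2 is required but has three nulls in the imported file.
    System.out.println(violations(required, Map.of(1, 0L, 2, 3L)));
    // No metrics available: the check reports nothing.
    System.out.println(violations(required, null));
  }
}
```

An import would be rejected when the returned list is non-empty. Skipping on missing metrics trades strictness for not having to read the data files themselves.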

Relates to #10742 (add_files procedure allows importing NULL on NOT NULL columns)

nssalian · 2026-06-30T04:55:35Z

+  }
+
+  private static Set<Integer> requiredFieldIds(Schema schema) {
+    return TypeUtil.indexById(schema.asStruct()).entrySet().stream()


TypeUtil.indexById is recursive, so this set includes nested required ids (struct fields, list elements which are required by default, map keys). nullValueCounts is keyed by leaf primitive ids, and for a required leaf under an optional parent struct Parquet counts the leaf as null whenever the parent is absent, which could reject a valid file. Is the intent to validate only top-level required columns? If nested handling is intended, could we add a nested-schema test to check the behavior?
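
To make the concern concrete, here is a small self-contained model of the two ways to collect required ids. It is a sketch under assumptions: `Field` is a hypothetical stand-in for a schema field rather than Iceberg's own type, and `alwaysPresentRequiredIds` is one possible narrowing (descend only through required parents), not something this PR is known to implement.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model for illustration only; it does not use Iceberg classes.
final class NestedRequiredIds {
  private NestedRequiredIds() {}

  /** Stand-in for a schema field: an id, a name, a required flag, and child fields. */
  record Field(int id, String name, boolean required, List<Field> children) {
    static Field leaf(int id, String name, boolean required) {
      return new Field(id, name, required, List.of());
    }
  }

  /** Collects required ids at every nesting depth, as a recursive index would. */
  static Set<Integer> allRequiredIds(List<Field> fields) {
    Set<Integer> ids = new TreeSet<>();
    for (Field field : fields) {
      if (field.required()) {
        ids.add(field.id());
      }
      ids.addAll(allRequiredIds(field.children()));
    }
    return ids;
  }

  /** Collects required ids only where every ancestor is required as well. */
  static Set<Integer> alwaysPresentRequiredIds(List<Field> fields) {
    Set<Integer> ids = new TreeSet<>();
    for (Field field : fields) {
      if (field.required()) {
        ids.add(field.id());
        // An optional parent may be absent, so its children are not descended into.
        ids.addAll(alwaysPresentRequiredIds(field.children()));
      }
    }
    return ids;
  }

  /** Schema: 1 id (required), 2 address (optional struct) containing 3 zip (required). */
  static List<Field> sampleSchema() {
    Field zip = Field.leaf(3, "zip", true);
    return List.of(Field.leaf(1, "id", true), new Field(2, "address", false, List.of(zip)));
  }

  public static void main(String[] args) {
    System.out.println(allRequiredIds(sampleSchema()));
    System.out.println(alwaysPresentRequiredIds(sampleSchema()));
  }
}
```

With the sample schema, the recursive collection includes `zip` (id 3) even though a row with no `address` is valid, which is the false rejection described in the comment. The narrower collection keeps only `id`.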

nssalian · 2026-06-30T04:56:53Z

+  }
+
+  @TestTemplate
+  public void violateNotNullConstraintFromFileTable() {


Both new tests cover the unpartitioned paths (catalog -> driver, file-table -> task). Could we add a partitioned case (a null in a required non-partition column) so the buildManifest validation is pinned for partitioned imports, and a positive test confirming a required column with valid data still imports successfully?

github-actions Bot added the spark label Jun 26, 2026

anuragmantri reviewed Jun 29, 2026

Comment thread spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java Outdated

Spark 4.1: Add validation for required fields in add_files procedure

17c3f97

ebyhr force-pushed the ebi/spark-add-files-not-null branch from b7ddc3a to 17c3f97 Compare June 29, 2026 05:37

nssalian reviewed Jun 30, 2026

Spark 4.1: Add validation for required fields in add_files procedure #16977
ebyhr wants to merge 1 commit into apache:main from ebyhr:ebi/spark-add-files-not-null

3 participants
