GCP: Add gcs.write.object-context.* catalog property support (parity with S3 write.tags) #16996

Open
revanthgss wants to merge 2 commits into apache:main from revanthgss:gcsfileio-write-context-feature

@revanthgss

GCS: add gcs.write.object-context.* catalog property support (parity with S3 write.tags)

Closes #16995

Overview

Adds write-time GCS object context annotation support to GCSFileIO, providing parity with
s3.write.tags.* in the AWS module. Object contexts are GCS's equivalent of S3 object tags —
key-value metadata serialized into the upload request and committed atomically with the object,
with no extra API call. Object Contexts are GA as of April 2026.

Changes

GCPProperties

  • New constant: GCS_WRITE_OBJECT_CONTEXT_PREFIX = "gcs.write.object-context."
  • Constructor parses all properties matching the prefix into an immutable
    Map<String, String> writeObjectContexts, stripping the prefix to yield plain context keys
  • New accessor: writeObjectContexts()
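
The prefix handling described above could look roughly like the following. This is a minimal sketch, not the PR's code: the class name `ObjectContextProps` and the choice to skip a key that equals the bare prefix are assumptions made for illustration. Only the constant name and the accessor name come from the description.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; the class name is hypothetical, not taken from the PR.
final class ObjectContextProps {
  static final String GCS_WRITE_OBJECT_CONTEXT_PREFIX = "gcs.write.object-context.";

  private ObjectContextProps() {}

  // Collects every property whose key starts with the prefix and strips the
  // prefix so that only the plain context key remains.
  static Map<String, String> writeObjectContexts(Map<String, String> properties) {
    Map<String, String> contexts = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      String key = entry.getKey();
      // Assumption: a key that is exactly the prefix has no context name and is skipped.
      if (key.startsWith(GCS_WRITE_OBJECT_CONTEXT_PREFIX)
          && key.length() > GCS_WRITE_OBJECT_CONTEXT_PREFIX.length()) {
        String contextKey = key.substring(GCS_WRITE_OBJECT_CONTEXT_PREFIX.length());
        contexts.put(contextKey, entry.getValue());
      }
    }
    // Wrap the result so callers cannot mutate the parsed contexts.
    return Collections.unmodifiableMap(contexts);
  }
}
```

Given catalog properties `gcs.write.object-context.env=prod` and `gcs.project-id=my-project`, this sketch would return a one-entry map keyed by `env`. Properties without the prefix are ignored.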

GCSOutputStream

  • New private method buildBlobInfo(BlobId, GCPProperties) — when writeObjectContexts() is
    non-empty, builds ObjectCustomContextPayload entries and sets them via
    BlobInfo.Builder.setContexts(ObjectContexts) before passing BlobInfo to storage.writer()
  • When no contexts are configured, setContexts is never called — zero overhead on the existing path
  • All stream mechanics (write, close, getPos) are unchanged
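
The "only touch the builder when contexts are configured" branch can be sketched without the GCS client on the classpath. In the snippet below, `SketchBlobBuilder` and `BlobInfoSketch` are hypothetical stand-ins written for this illustration. They are not the google-cloud-storage `BlobInfo.Builder`, and the real change builds `ObjectCustomContextPayload` entries as described above rather than passing a plain map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a blob-metadata builder; it only records whether
// contexts were ever set. It is NOT the google-cloud-storage BlobInfo.Builder.
final class SketchBlobBuilder {
  private final String bucket;
  private final String name;
  private Map<String, String> contexts; // stays null until setContexts is called

  SketchBlobBuilder(String bucket, String name) {
    this.bucket = bucket;
    this.name = name;
  }

  SketchBlobBuilder setContexts(Map<String, String> newContexts) {
    // Defensive copy so later changes to the caller's map are not visible here.
    this.contexts = Collections.unmodifiableMap(new HashMap<>(newContexts));
    return this;
  }

  boolean contextsWereSet() {
    return contexts != null;
  }

  Map<String, String> contexts() {
    return contexts == null ? Collections.<String, String>emptyMap() : contexts;
  }

  String bucket() {
    return bucket;
  }

  String name() {
    return name;
  }
}

final class BlobInfoSketch {
  private BlobInfoSketch() {}

  // Mirrors the described branch: the builder is left untouched when no
  // contexts are configured, so the existing write path does not change.
  static SketchBlobBuilder buildBlobInfo(
      String bucket, String name, Map<String, String> writeObjectContexts) {
    SketchBlobBuilder builder = new SketchBlobBuilder(bucket, name);
    if (!writeObjectContexts.isEmpty()) {
      builder.setContexts(writeObjectContexts);
    }
    return builder;
  }
}
```

Calling `BlobInfoSketch.buildBlobInfo` with an empty map leaves `contextsWereSet()` false, which is the property the "zero overhead" bullet relies on.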

Example usage

spark.sql.catalog.my_catalog.gcs.write.object-context.env=prod
spark.sql.catalog.my_catalog.gcs.write.object-context.team=data-platform
spark.sql.catalog.my_catalog.gcs.write.object-context.iceberg-table=orders

Related

Checklist

  • Unit tests added and passing (./gradlew :iceberg-gcp:test)
  • Spotless check passing (./gradlew :iceberg-gcp:spotlessCheck)
  • google-cloud-storage resolved version confirmed ≥ 2.57
    (./gradlew :iceberg-gcp:dependencyInsight --dependency google-cloud-storage)
  • Test File IO changes against a real GCS bucket

AI Disclosure

An LLM was used to research the existing S3 implementation, draft the GCS FileIO changes, and draft the PR description.

@github-actions bot added the GCP label Jun 28, 2026
@revanthgss force-pushed the gcsfileio-write-context-feature branch from 3a1ca12 to ace52f4 June 28, 2026 16:12
@revanthgss force-pushed the gcsfileio-write-context-feature branch from ace52f4 to fc581fa June 28, 2026 16:19
@revanthgss

Author

@danielcweeks Could you please review this PR?

Thanks.

Development

Successfully merging this pull request may close these issues.

GCP: Support write-time object context annotations in GCS FileIO (equivalent of S3 write.tags)
