
Add pluggable metadata store as opt-in source of truth #1503

Open
ricardo-devis-agullo wants to merge 10 commits into master from feat/metadata-scaling-option-b

@ricardo-devis-agullo


What

Makes a database the authoritative metadata index for which components/versions exist, while statics stay in object storage and the in-memory hot read path is unchanged. Storage-only mode remains the default and is fully non-breaking: no metadata block = today's behavior, byte-for-byte.

Design and status live in metadata-scaling-option-b.md and metadata-scaling-option-b-status.md at the repo root.

Why

Today the metadata index is a derived blob (components.json) that is rebuilt by a full O(registry) directory scan on startup and after every publish, with last-writer-wins concurrency across nodes. Under unbounded, AI-accelerated publishing, this creates CPU/GC pressure and puts a ceiling on correctness (concurrent publishes on different nodes can overwrite each other's index updates). A queryable store turns publish into an O(1) atomic append and cross-node correctness into a UNIQUE constraint, and its cost stays flat as the registry grows.

Scope

Core (packages/oc)

  • ComponentRow/MetadataStore/MetadataConfig types; optional metadata on Config (presence-based enablement).
  • Shared metadata-index: one getAllComponents() hydrates both ComponentsList and ComponentsDetails; MetadataIndex.add() updates the snapshot right after publish.
  • components-cache and components-details route through the metadata index when present; storage path untouched when absent; no second DB polling loop in details.
  • Repository: initialise the store before caches; optional startup reconcileFromStorage and exportLegacyFiles; publish commits the metadata row after statics upload (insert = commit point); VERSION_ALREADY_EXISTS → existing already_exists publish error.
  • Optional MetadataStore.close() wired into registry.close() (server first, then pool) and the oc registry migrate-metadata CLI facade (finally block).
  • Metadata adapter config validation in registry-configuration.
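A minimal TypeScript sketch of how these core pieces could fit together, under simplified assumptions. Only the names `ComponentRow`, `MetadataStore`, `initialise()`, `getAllComponents()`, `addVersion()`, `close()` and `MetadataIndex.add()` come from this PR; the row fields, the `Config.metadata` shape, `isMetadataMode`, `MetadataIndex.load()`, and the `list()`/`details()` views are hypothetical. What it shows: a single `getAllComponents()` call feeds both the list and the details view, and `add()` patches that shared snapshot after a publish instead of triggering a rescan.

```typescript
// Hypothetical row shape: the real ComponentRow may carry more fields.
interface ComponentRow {
  name: string;
  version: string;
  publishDate: number; // epoch millis (assumed)
}

// Simplified contract: method names follow the PR, signatures are assumed.
interface MetadataStore {
  initialise(): Promise<void>;
  getAllComponents(): Promise<ComponentRow[]>;
  addVersion(row: ComponentRow): Promise<void>;
  close?(): Promise<void>;
}

// Hypothetical config: metadata mode is enabled purely by the block's presence.
interface Config {
  metadata?: { adapter: MetadataStore };
}

const isMetadataMode = (config: Config): boolean => config.metadata !== undefined;

// One snapshot backs both the components list and the details view.
class MetadataIndex {
  private rows = new Map<string, ComponentRow>();

  // A single getAllComponents() call hydrates the whole snapshot.
  static async load(store: MetadataStore): Promise<MetadataIndex> {
    const index = new MetadataIndex();
    for (const row of await store.getAllComponents()) index.add(row);
    return index;
  }

  // Called right after a successful publish: no directory rescan needed.
  add(row: ComponentRow): void {
    this.rows.set(`${row.name}@${row.version}`, row);
  }

  // ComponentsList-style view: name -> versions, in insertion order.
  list(): Record<string, string[]> {
    const out: Record<string, string[]> = {};
    for (const { name, version } of this.rows.values()) {
      (out[name] ??= []).push(version);
    }
    return out;
  }

  // ComponentsDetails-style view: name -> version -> publishDate.
  details(): Record<string, Record<string, { publishDate: number }>> {
    const out: Record<string, Record<string, { publishDate: number }>> = {};
    for (const { name, version, publishDate } of this.rows.values()) {
      (out[name] ??= {})[version] = { publishDate };
    }
    return out;
  }
}
```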

Adapters

  • oc-metadata-adapters-utils: shared ComponentRow/MetadataStore contract + VERSION_ALREADY_EXISTS code. Core gains zero runtime DB deps.
  • oc-azure-sql-metadata-adapter: first official adapter (mssql). manageSchema DDL, getAllComponents, addVersion, close() pool lifecycle, SQL Server unique-violation (2627/2601) mapping, schemaName/tableName customisation.
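SQL Server reports a duplicate key either as error 2627 (PRIMARY KEY or UNIQUE constraint) or as error 2601 (unique index), which is why both numbers are mapped. A minimal sketch of that mapping, assuming the driver error exposes the SQL Server error number as a numeric `number` property (as `mssql`'s `RequestError` does) and that the shared code's string value is `"VERSION_ALREADY_EXISTS"`; `mapAddVersionError` and `MetadataStoreError` are hypothetical names.

```typescript
// Error code shared through oc-metadata-adapters-utils (string value assumed).
const VERSION_ALREADY_EXISTS = "VERSION_ALREADY_EXISTS";

// SQL Server duplicate-key error numbers.
const UNIQUE_CONSTRAINT_VIOLATION = 2627; // PRIMARY KEY / UNIQUE constraint
const UNIQUE_INDEX_VIOLATION = 2601; // unique index

// Hypothetical error type carrying the shared contract code.
class MetadataStoreError extends Error {
  constructor(message: string, readonly code: string) {
    super(message);
    this.name = "MetadataStoreError";
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on older targets
  }
}

// Translate a driver error into the shared contract; rethrow anything else untouched.
function mapAddVersionError(err: unknown, name: string, version: string): never {
  const num = (err as { number?: unknown } | null)?.number;
  if (num === UNIQUE_CONSTRAINT_VIOLATION || num === UNIQUE_INDEX_VIOLATION) {
    throw new MetadataStoreError(`${name}@${version} already exists`, VERSION_ALREADY_EXISTS);
  }
  throw err;
}
```

In an adapter, this would sit in the `catch` around the INSERT issued by `addVersion`, so core only ever sees the adapter-neutral code.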

Migration

  • oc registry migrate-metadata <configPath> backfills from components-details.json with a storage directory scan fallback; idempotent (existing rows skipped).
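The idempotency falls out of the unique constraint: on a re-run, every duplicate insert is simply counted as a skip. A sketch under assumed shapes: the input is a flat list of rows (reading them from components-details.json or the directory scan is left out), `migrateRows` and `MigrationResult` are hypothetical names, and duplicates are recognised by an error `code` of `"VERSION_ALREADY_EXISTS"`.

```typescript
// Hypothetical row and store shapes, reduced to what the backfill needs.
interface ComponentRow { name: string; version: string; publishDate: number; }
interface MetadataStore { addVersion(row: ComponentRow): Promise<void>; }

interface MigrationResult { inserted: number; skipped: number; }

const isDuplicate = (err: unknown): boolean =>
  (err as { code?: unknown } | null)?.code === "VERSION_ALREADY_EXISTS";

// Insert every row; rows that already exist count as skipped, so re-runs are safe.
async function migrateRows(store: MetadataStore, rows: ComponentRow[]): Promise<MigrationResult> {
  const result: MigrationResult = { inserted: 0, skipped: 0 };
  for (const row of rows) {
    try {
      await store.addVersion(row);
      result.inserted++;
    } catch (err) {
      if (!isDuplicate(err)) throw err; // genuine failures abort the migration
      result.skipped++;
    }
  }
  return result;
}
```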

Tests

  • Metadata-mode cache/details hydration, shared snapshot reuse, repository init/publish/duplicate/concurrency/failure injection, close() wiring (repository + registry + migrate facade), migration backfill, config validation, Azure SQL adapter mocked unit tests (19 passing).
  • Azure SQL integration tests are env-var-gated on OC_METADATA_SQL_CONNECTION_STRING and skip otherwise; they have not yet been executed and still need a run against a live SQL Server (Docker in CI or an external instance).

Non-breaking

Full OC suite green with metadata absent (storage mode unchanged): 894 passing. Root build: 4 successful, 4 total.

Out of scope (deferred by decision)

Scheduled/background reconcileFromStorage and exportLegacyFiles; degraded DB-down cold start from legacy files; explicit readiness signaling; richer migration reporting; docs-site publishing. The S3/GS listSubDirectories pagination prerequisite is tracked separately in the opencomponents/storage-adapters repo (already merged there).

Introduce a pluggable metadata adapter (sibling to storage.adapter) that makes
a database the authoritative index for which components/versions exist, while
statics stay in object storage and the in-memory hot read path is unchanged.
Storage-only mode remains the default and is fully non-breaking (no metadata
block = today's behavior, byte-for-byte).

Core
- Add ComponentRow/MetadataStore/MetadataConfig types and optional metadata
  config on Config; presence-based enablement.
- Add shared metadata-index: one getAllComponents() hydrates both
  ComponentsList and ComponentsDetails; MetadataIndex.add() updates the
  snapshot immediately after publish.
- Route components-cache and components-details through the metadata index
  when present; storage path untouched when absent. Prevent a second DB
  polling loop in details during metadata mode.
- Repository: initialise the store before caches; optional startup
  reconcileFromStorage and exportLegacyFiles; publish commits the metadata
  row after statics upload (insert is the commit point); map
  VERSION_ALREADY_EXISTS to the existing already_exists publish error.
- Add optional MetadataStore.close() and wire it into registry.close() and
  the oc registry migrate-metadata CLI facade (finally block).
- Validate metadata adapter config in registry-configuration.
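The publish ordering in the Repository bullet is what makes the insert the commit point: statics are uploaded first, so a row is only written once its files are in place, and a failed or duplicate insert leaves the version unregistered. A hypothetical sketch of that flow; `publish`, `PublishDeps`, `uploadStatics`, `onCommitted`, `PublishError`, and the `"already_exists"` code string are illustrative stand-ins for the registry's real publish plumbing.

```typescript
interface ComponentRow { name: string; version: string; publishDate: number; }
interface MetadataStore { addVersion(row: ComponentRow): Promise<void>; }

// Hypothetical publish error; "already_exists" stands in for the existing error code.
class PublishError extends Error {
  constructor(readonly code: string, message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface PublishDeps {
  uploadStatics(name: string, version: string): Promise<void>; // object storage
  store: MetadataStore; // authoritative index
  onCommitted(row: ComponentRow): void; // e.g. MetadataIndex.add
}

async function publish(deps: PublishDeps, name: string, version: string): Promise<void> {
  // 1. Statics first: nothing in the index references them until the row exists.
  await deps.uploadStatics(name, version);

  const row: ComponentRow = { name, version, publishDate: Date.now() };
  try {
    // 2. Commit point: the insert either succeeds or hits the UNIQUE constraint.
    await deps.store.addVersion(row);
  } catch (err) {
    if ((err as { code?: unknown } | null)?.code === "VERSION_ALREADY_EXISTS") {
      throw new PublishError("already_exists", `${name}@${version} already exists`);
    }
    throw err;
  }

  // 3. Only after the commit does the local snapshot learn about the version.
  deps.onCommitted(row);
}
```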

Adapters
- oc-metadata-adapters-utils: shared ComponentRow/MetadataStore contract and
  VERSION_ALREADY_EXISTS code.
- oc-azure-sql-metadata-adapter: first official adapter (mssql). manageSchema
  DDL, getAllComponents, addVersion, close() pool lifecycle, SQL Server
  unique-violation (2627/2601) mapping. Env-var-gated integration tests
  (OC_METADATA_SQL_CONNECTION_STRING), skipped otherwise.

Migration
- oc registry migrate-metadata <configPath> backfills from
  components-details.json with a storage directory scan fallback; idempotent
  (existing rows skipped).

Tests
- Metadata-mode cache/details hydration, shared snapshot reuse, repository
  init/publish/duplicate/concurrency/failure injection, close() wiring
  (repository + registry + migrate facade), migration backfill, config
  validation, Azure SQL adapter mocked unit tests.
ricardo-devis-agullo changed the title from "Add pluggable metadata store as opt-in source of truth (Option B)" to "Add pluggable metadata store as opt-in source of truth" on Jun 22, 2026.
Azure Table Storage metadata adapter that uses PartitionKey=component_name,
RowKey=version as the unique constraint — the exact concurrency model Option B
needs. Schemaless (no migrations framework), HTTP-based (no connection pool),
and works with the same Azure storage account already used for blob statics.

Implements the shared MetadataStore interface:
- isValid: connection string or endpoint + credentials + table name rules
- initialise: createTable (idempotent) or verify table accessibility
- getAllComponents: listEntities via SDK paged async iterator (auto-paginates)
- addVersion: createEntity, 409 Conflict -> VERSION_ALREADY_EXISTS
- close: clears client reference (no pool to close)
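Because a Table Storage entity is uniquely identified by its PartitionKey/RowKey pair, a second insert for the same component name and version fails with HTTP 409, and that status alone is enough to signal a duplicate. A minimal sketch of `addVersion` and `getAllComponents` against `TableLike`, a hypothetical structural stand-in that mirrors `createEntity` and the paged `listEntities` iterator of `TableClient` from `@azure/data-tables`; the `publishDate` column and the `TableMetadataStore` name are assumptions, and a real adapter would also deal with the SDK's typed errors and entity metadata.

```typescript
interface ComponentRow { name: string; version: string; publishDate: number; }

// Entity as stored: partitionKey = component name, rowKey = version.
interface ComponentEntity {
  partitionKey: string;
  rowKey: string;
  publishDate: number; // assumed extra column
}

// Hypothetical structural subset of the SDK's TableClient used here.
interface TableLike {
  createEntity(entity: ComponentEntity): Promise<unknown>;
  listEntities(): AsyncIterable<ComponentEntity>;
}

class TableMetadataStore {
  constructor(private readonly table: TableLike) {}

  async addVersion(row: ComponentRow): Promise<void> {
    try {
      await this.table.createEntity({ partitionKey: row.name, rowKey: row.version, publishDate: row.publishDate });
    } catch (err) {
      // Same PartitionKey + RowKey already present: Table Storage answers 409 Conflict.
      if ((err as { statusCode?: unknown } | null)?.statusCode === 409) {
        throw Object.assign(new Error(`${row.name}@${row.version} already exists`), {
          code: "VERSION_ALREADY_EXISTS",
        });
      }
      throw err;
    }
  }

  async getAllComponents(): Promise<ComponentRow[]> {
    const rows: ComponentRow[] = [];
    // With the real SDK, this iterator follows continuation tokens page by page.
    for await (const e of this.table.listEntities()) {
      rows.push({ name: e.partitionKey, version: e.rowKey, publishDate: e.publishDate });
    }
    return rows;
  }
}
```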

Supports connectionString, endpoint+accountName/accountKey, endpoint+sasToken,
allowInsecureConnection (Azurite), and custom tableName.
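The supported option combinations can be checked by a small validation function. A sketch under assumptions: the `TableAdapterOptions` field names mirror the options listed above, but the exact shape, the boolean return, and the rule that a plain `http://` endpoint needs `allowInsecureConnection` are hypothetical. The table-name check follows Azure Table Storage naming rules: 3 to 63 alphanumeric characters, starting with a letter, with `tables` reserved.

```typescript
// Hypothetical options shape mirroring the listed settings.
interface TableAdapterOptions {
  connectionString?: string;
  endpoint?: string;
  accountName?: string;
  accountKey?: string;
  sasToken?: string;
  allowInsecureConnection?: boolean; // e.g. Azurite over http
  tableName?: string;
}

// Azure Table naming: 3-63 alphanumerics, must start with a letter.
const TABLE_NAME = /^[A-Za-z][A-Za-z0-9]{2,62}$/;

function isValid(options: TableAdapterOptions): boolean {
  // Only validate a custom table name when one is provided.
  if (options.tableName !== undefined) {
    if (!TABLE_NAME.test(options.tableName) || options.tableName.toLowerCase() === "tables") return false;
  }

  // A connection string carries both the endpoint and the credentials.
  if (options.connectionString) return true;

  if (!options.endpoint) return false;
  // Assumed rule: plain http (local emulator) must be opted into explicitly.
  if (options.endpoint.startsWith("http://") && !options.allowInsecureConnection) return false;

  const sharedKey = Boolean(options.accountName && options.accountKey);
  const sas = Boolean(options.sasToken);
  return sharedKey || sas;
}
```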

18 mocked unit tests passing; 7 integration tests env-var-gated on
OC_METADATA_TABLE_CONNECTION_STRING (skipped without it).