Skip to content

fix(grpc): allow credential rotation when legacy provider.type exceeds current limit#1350

Open
latenighthackathon wants to merge 1 commit into
NVIDIA:mainfrom
latenighthackathon:fix-provider-update-rejects-legacy-oversized-type
Open

fix(grpc): allow credential rotation when legacy provider.type exceeds current limit#1350
latenighthackathon wants to merge 1 commit into
NVIDIA:mainfrom
latenighthackathon:fix-provider-update-rejects-legacy-oversized-type

Conversation

@latenighthackathon
Copy link
Copy Markdown
Contributor

@latenighthackathon latenighthackathon commented May 13, 2026

Summary

openshell provider update <name> --credential ... re-validates the immutable type field on every call, stranding any legacy record whose stored type exceeds the current MAX_PROVIDER_TYPE_LEN (64). Split provider validation into create-time and update-time variants so the update path only validates fields the caller is mutating (credentials, config); immutable name/type are carried forward from the existing record without re-checking length.

Related Issue

Closes #1347.

Reported by @KodeDaemon in the NVIDIA Discord community after upgrading NemoClaw 0.0.38 to 0.0.39:

Error:   × status: InvalidArgument, message: "provider.type exceeds maximum length (79 > 64)"

openshell provider list confirmed an existing inference provider had a 79-character type value. NemoClaw's onboard re-runs openshell provider update inference --credential ... on every upgrade, and that path failed because update_provider_record rebuilt the merged Provider from existing.r#type and ran the full validate_provider_fields over it, even though the caller only touches credentials. The CLI's provider update sends r#type: "", so the existing value was preserved without modification, but the validator still measured it.

Two ways legacy data ends up oversized: (1) records that predate validate_provider_fields (added by #145, later split out in #777); (2) write paths that bypass normalize_provider_type, e.g. the TUI form at crates/openshell-tui/src/lib.rs:1563 which forwards r#type to the request directly. Either way, the only escape was provider delete + recreate, which loses any provider.config entries the caller never sees.

Changes

  • crates/openshell-server/src/grpc/validation.rs: validate_provider_fields keeps the full create-time check (name + type + credentials + config), now delegating the map checks to a new helper. validate_provider_mutable_fields validates only credentials and config; used by the update path.
  • crates/openshell-server/src/grpc/provider.rs: update_provider_record calls validate_provider_mutable_fields instead of validate_provider_fields. Immutable name/type carried forward from existing are trusted because they passed validation when stored.

Testing

  • cargo test -p openshell-server --lib grpc::provider (32 passed including new regression)
  • cargo test -p openshell-server --lib grpc::validation (76 passed)
  • cargo clippy -p openshell-server --all-targets -- -D warnings (clean)
  • cargo fmt --all -- --check (clean)
  • Unit tests added/updated
  • E2E tests added/updated (if applicable) — pure server-side validation change, covered by unit tests against the same store the gRPC handler uses

New regression update_provider_allows_credential_rotation_on_legacy_oversized_type inserts a Provider with a 79-character type directly via store.put_message (bypassing gRPC validation, simulating a legacy record), then calls update_provider_record to rotate a credential and asserts both that the update succeeds and the original 79-character type is preserved on the stored record. Existing update_provider_validates_merged_result continues to assert that an oversized incoming credential key is still rejected, so incoming-mutation validation is unchanged.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable) — no architectural change; behavior change is a one-line swap in the update handler with the rationale captured in the new helper's doc comment

Signed-off-by: latenighthackathon latenighthackathon@users.noreply.github.com

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@latenighthackathon latenighthackathon force-pushed the fix-provider-update-rejects-legacy-oversized-type branch from 063f82b to d2fe817 Compare May 13, 2026 01:41
@latenighthackathon latenighthackathon force-pushed the fix-provider-update-rejects-legacy-oversized-type branch from d2fe817 to 13d9fcd Compare May 18, 2026 02:52
@johntmyers johntmyers added the good first issue Good for newcomers label May 22, 2026
@latenighthackathon latenighthackathon force-pushed the fix-provider-update-rejects-legacy-oversized-type branch from 13d9fcd to 775a3e7 Compare May 26, 2026 02:31
@johntmyers johntmyers added test:e2e Requires end-to-end coverage gator:blocked Gator is blocked by process or repository gates labels Jun 3, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

Label test:e2e applied, but pull-request/1350 is at {"messa while the PR head is 775a3e7. A maintainer needs to comment /ok to test 775a3e7a3778957f12d3c95000c414a2d6887de8 to refresh the mirror. Once the mirror catches up, re-run Branch E2E Checks from the Actions tab.

@johntmyers
Copy link
Copy Markdown
Collaborator

johntmyers commented Jun 3, 2026

gator-agent

Gate Status: Watching Pipeline

I validated the linked bug report and PR scope: #1347 has a concrete reproduction path, and this PR is a focused server-side fix for provider credential rotation on legacy records with oversized stored provider.type values. Issue #1347 is marked gator:validated; this PR is now marked gator:watch-pipeline.

Code review of the REST-fetched diff found no blocking implementation defects. After the 2026-06-03 rebase to head 311b12deb4400628d3fb1d8b62c9aef826ffe794, I re-reviewed the updated diff; it remains the same two-file provider validation change, adjusted onto current main.

Docs check: no Fern docs update is required for this PR because it restores the existing provider update behavior for legacy records and does not add a new command, flag, output shape, setup workflow, or documented API surface.

Test label decision: test:e2e is applied because the change affects provider credential update flow.

Current status as of 2026-06-04 00:00 UTC:

  • Merge conflict blocker is resolved. GitHub reports mergeable: true, with mergeable_state: blocked while required checks are pending.
  • mise Lockfile now passes on the rebased head.
  • OpenShell / Branch Checks and OpenShell / E2E are queued/running. OpenShell / Helm Lint has passed. OpenShell / GPU E2E is not required because test:e2e-gpu is not applied.

I will keep watching this PR every 15 minutes until it closes, merges, or the sandbox session is stopped.

@johntmyers
Copy link
Copy Markdown
Collaborator

gator-agent

/ok to test 775a3e7

@johntmyers
Copy link
Copy Markdown
Collaborator

/ok to test 775a3e7

@johntmyers
Copy link
Copy Markdown
Collaborator

@latenighthackathon this def needs to be re-based to get tests passing based on some recent changes

…s current limit

Closes NVIDIA#1347. Reported by @KodeDaemon in the NVIDIA Developer Discord after
upgrading NemoClaw 0.0.38 -> 0.0.39 left the inference provider stranded
with `provider.type exceeds maximum length (79 > 64)` on every credential
update.

`update_provider_record` rebuilt the merged Provider from `existing.r#type`
and ran `validate_provider_fields` over it. The CLI's `provider update`
sends `r#type: ""`, so the existing value is preserved without modification,
but the validator still measured it. Any record whose stored type predates
current MAX_PROVIDER_TYPE_LEN (or was written by a path that bypassed
validation, e.g. the TUI form which forwards r#type to the request without
going through normalize_provider_type) became unupdateable: the only escape
was `provider delete` + recreate, which loses any provider.config entries
the caller never sees.

Split the validator into create-time (validate_provider_fields, full check
including immutable name/type) and update-time (validate_provider_mutable_fields,
checks only credentials/config maps). The update path validates only what
the caller is mutating; immutable fields carried forward from existing are
trusted because they passed validation when stored.

Tests: new regression in `update_provider_allows_credential_rotation_on_legacy_oversized_type`
inserts a Provider with a 79-char type directly via `store.put_message`
(bypassing gRPC validation), updates a credential, asserts the update
succeeds and the original type is preserved. Existing
`update_provider_validates_merged_result` still covers the case where
incoming credential entries push the merged map past limits.

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
@latenighthackathon latenighthackathon force-pushed the fix-provider-update-rejects-legacy-oversized-type branch from 775a3e7 to 311b12d Compare June 3, 2026 23:57
@johntmyers johntmyers added gator:watch-pipeline Gator is monitoring PR CI/CD status and removed gator:blocked Gator is blocked by process or repository gates labels Jun 4, 2026
@latenighthackathon
Copy link
Copy Markdown
Contributor Author

@johntmyers thanks for taking a look at this! Re-based + fixed - cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gator:watch-pipeline Gator is monitoring PR CI/CD status good first issue Good for newcomers test:e2e Requires end-to-end coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

provider update fails for legacy records with type > 64 chars

2 participants