Make secure token storage the default storage mode #5272

Open
simonfaltum wants to merge 6 commits into main from simonfaltum/cli-ga-ms4-secure-default

@simonfaltum
Member

@simonfaltum simonfaltum commented May 19, 2026

Why

Part of CLI GA. Storing long-lived U2M refresh tokens in a plain JSON file in the user's home directory is a security weakness: any process with home-directory access can read them. Now that the CLI is being positioned as a building block for local agent workflows, we want tokens in the OS-native secure store by default.

Changes

Before: databricks-cli auth tokens were written to ~/.databricks/token-cache.json. Setting DATABRICKS_AUTH_STORAGE=secure opted in to the OS keyring.

Now: tokens are written to the OS keyring (macOS Keychain, Windows Credential Manager, Linux Secret Service) by default. Setting DATABRICKS_AUTH_STORAGE=plaintext (or auth_storage = plaintext under [__settings__] in ~/.databrickscfg) opts back in to the file cache; the env var takes precedence over the config file. Old tokens in token-cache.json are not migrated, so users must re-run databricks auth login once after upgrading.
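The precedence described above (env var, then config, then the new secure default) can be sketched roughly as follows. This is an illustrative stand-in, not the resolver in libs/auth/storage/mode.go: the names Mode, Source, and resolveMode are hypothetical, and returning an error for an unrecognized value is an assumption.

```go
package main

import "fmt"

// Mode and Source are hypothetical stand-ins for the CLI's real types.
type Mode string
type Source string

const (
	ModeSecure    Mode = "secure"
	ModePlaintext Mode = "plaintext"

	SourceEnv     Source = "env"
	SourceConfig  Source = "config"
	SourceDefault Source = "default"
)

// resolveMode picks the storage mode: the env var value wins over the config
// value, and the default is secure when neither is set. An empty string means
// "not set". Rejecting unknown values is an assumption of this sketch.
func resolveMode(envValue, configValue string) (Mode, Source, error) {
	candidates := []struct {
		value  string
		source Source
	}{
		{envValue, SourceEnv},
		{configValue, SourceConfig},
	}
	for _, c := range candidates {
		switch c.value {
		case "":
			continue // not set at this level; try the next one
		case string(ModeSecure), string(ModePlaintext):
			return Mode(c.value), c.source, nil
		default:
			return "", c.source, fmt.Errorf("invalid auth storage mode %q from %s", c.value, c.source)
		}
	}
	return ModeSecure, SourceDefault, nil
}

func main() {
	// Env says plaintext, config says secure: env wins.
	mode, source, _ := resolveMode("plaintext", "secure")
	fmt.Println(mode, source) // plaintext env

	// Nothing set: the new default applies.
	mode, source, _ = resolveMode("", "")
	fmt.Println(mode, source) // secure default
}
```

Returning the source alongside the mode matters for the rest of this change, since the login fallback behaves differently for a defaulted mode than for an explicit one.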

The login-time fallback is now active: when the keyring is unreachable and the user did not explicitly ask for secure, login silently downgrades to plaintext. This logic was already on main as dormant infrastructure.

After a successful keyring write, the resolved mode is now pinned (auth_storage = secure is written to [__settings__]). From then on, the resolver sees source=Config (explicit), so a transient keyring probe failure on a later login surfaces as an error instead of silently demoting a working user to plaintext. The pin is best-effort: persistence errors are logged at debug and never block login.

Implementation:

  • libs/auth/storage/mode.go: resolver default flips from StorageModePlaintext to StorageModeSecure. Comments on the constants and the resolver doc updated.
  • libs/auth/storage/cache.go: removed "dormant today" comments that no longer apply; added PinSecureMode(ctx, mode).
  • cmd/auth/login.go + cmd/auth/token.go: call storage.PinSecureMode(ctx, mode) after each persistentAuth.Challenge() succeeds (main login, discoveryLogin, runInlineLogin).
  • Unit tests in libs/auth/storage/ and cmd/auth/describe_test.go updated for the new default. New TestPinSecureMode table-driven cases plus idempotence and persist-failure swallowing.
  • acceptance/script.prepare: forces DATABRICKS_AUTH_STORAGE=plaintext at the root so existing auth acceptance tests keep exercising the file-backed path. Tests that want the resolver default override it.
  • acceptance/cmd/auth/describe/u2m-plaintext-default renamed to u2m-secure-default; output updated to assert secure mode is reported as the default. A [[Repls]] regex in its test.toml normalizes the platform-dependent keyring lookup error.
  • acceptance/cmd/auth/describe/u2m-json-output: regenerated JSON output reflects the new default. The jq filter on .token_storage keeps output deterministic.
  • NEXT_CHANGELOG.md: breaking-change entry under Notable Changes documenting the flip, the re-login requirement, and both opt-out paths.

Test plan

  • ./task checks clean
  • ./task lint-q clean
  • go test ./libs/auth/... ./cmd/auth/... ./libs/databrickscfg/... passes
  • go test ./acceptance -run 'TestAccept/cmd/auth' passes on macOS
  • go test ./acceptance -run 'TestAccept/cmd/configure' passes (covers a databricks-cli auth path outside cmd/auth)
  • Verify on Linux CI: the u2m-secure-default acceptance test relies on a [[Repls]] regex to canonicalize the keyring lookup error. If Linux output diverges in an unexpected way (e.g. error appears on a different line), the regex needs tightening.
  • Manual: with DATABRICKS_AUTH_STORAGE unset, databricks auth login --profile X writes to the keyring and persists auth_storage = secure to [__settings__].
  • Manual: DATABRICKS_AUTH_STORAGE=plaintext databricks auth login --profile X continues to write to ~/.databricks/token-cache.json with the host-key dual-write entry; [__settings__] is not modified.

This pull request and its description were written by Isaac.

U2M tokens for the databricks-cli auth type now write to the OS-native
keyring by default. Users who need the previous file-backed cache can
opt back via DATABRICKS_AUTH_STORAGE=plaintext or auth_storage =
plaintext under [__settings__] in .databrickscfg; the env var takes
precedence. The login-time keyring probe and fallback (already on main)
activate with this change.

Co-authored-by: Isaac
@github-actions
Contributor

github-actions Bot commented May 19, 2026

Approval status: pending

/cmd/auth/ - needs approval

4 files changed
Suggested: @tanmay-db
Also eligible: @mihaimitrea-db, @tejaskochar-db, @hectorcast-db, @renaudhartert-db, @parthban-db, @Divyansh-db, @chrisst, @rauchy

/libs/auth/ - needs approval

4 files changed
Suggested: @tanmay-db
Also eligible: @mihaimitrea-db, @tejaskochar-db, @hectorcast-db, @renaudhartert-db, @parthban-db, @Divyansh-db, @chrisst, @rauchy

General files (require maintainer)

8 files changed
Based on git history:

  • @pietern -- recent work in cmd/auth/, ./, acceptance/

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

When the resolver returns secure from the default (no env, no config),
login now writes auth_storage = secure to [__settings__] after the
keyring Store succeeds. Subsequent invocations see source=Config, so the
explicit-secure branch of applyLoginFallback surfaces a transient keyring
probe failure as an error instead of silently demoting the user to
plaintext. Without this pin, a working secure-storage user could get
stranded on the file cache after a single flaky probe.

No-op when mode is plaintext (silent fallback already happened) or when
the user already chose a mode explicitly. Persistence failures are
logged at debug and never block login.

Co-authored-by: Isaac
- login.go: move PinSecureMode call out of the existing comment block so
  the "At this point... / The rest of the command focuses on" narration
  stays together
- cache.go: trim PinSecureMode doc comment and acknowledge that
  concurrent logins racing the write is benign because both write the
  same value
- cache_test.go: drop the unused wantSkipMsg struct field; strengthen
  TestPinSecureMode_PersistFailureIsSwallowed to assert no file was
  written (and that the underlying os.OpenFile failure is the real
  trigger)
- u2m-secure-default test.toml: rephrase the fixture comment to keep
  internal Go API names out of test config

Co-authored-by: Isaac
After the default flipped to secure, any test that runs the login
command on Linux (no D-Bus) hits applyLoginFallback, which silently
persists auth_storage = plaintext to whatever DATABRICKS_CONFIG_FILE
points at. TestProfileHostCompatibleViaCobra points it at the checked-
in cmd/auth/testdata/.databrickscfg fixture, so the test run leaves a
dirty working tree and CI's `git diff --exit-code` step fails.

Two changes:

1. Move ResolveCacheForLogin in login.go to run after input validation
   (cluster/serverless mutex + positional-arg check) rather than before.
   Trivially-invalid commands now fail without probing the keyring, so
   TestLoginRejectsPositionalArgWithHostFlag / WithProfileFlag no
   longer hit applyLoginFallback. The "resolve before browser step"
   property the original comment cared about is preserved: cache
   resolution still happens before NewPersistentAuth and Challenge.

2. Force plaintext via DATABRICKS_AUTH_STORAGE in
   TestProfileHostCompatibleViaCobra, which legitimately passes all
   input validation and reaches the resolver. The test is about flag
   compatibility, not storage mode; pinning it to plaintext keeps it
   hermetic.

Co-authored-by: Isaac