Skip to content

lakebox: bugfixes — start help text, create default-clobber, keyHash trailing-newline, plus UX#5460

Merged
akshaysingla-db merged 1 commit into
databricks:demo-lakeboxfrom
akshaysingla-db:akshay/lakebox-bugfix-bundle
Jun 8, 2026
Merged

lakebox: bugfixes — start help text, create default-clobber, keyHash trailing-newline, plus UX#5460
akshaysingla-db merged 1 commit into
databricks:demo-lakeboxfrom
akshaysingla-db:akshay/lakebox-bugfix-bundle

Conversation

@akshaysingla-db
Copy link
Copy Markdown
Collaborator

Summary

Three correctness fixes and several small UX cleanups, surfaced by a final ship-readiness audit. Bundled because each is small and they all sit on the same register → ssh path that gets exercised together.

BLOCKERs (real bugs)

1. start.go documented timeout was half the actual timeout

Long: said "blocks until the sandbox reaches Running (or up to 5 minutes)" while startWaitTimeout = 10 * time.Minute. Fix the text.

2. create.go clobbered the saved default on any transient API error

Previous code:

if _, err := api.get(currentDefault); err != nil {
    shouldSetDefault = true   // any error
}

A 500, network blip, or rate-limit silently overwrote the user's chosen default. Every sibling (delete, default, ssh) already uses errors.Is(err, apierr.ErrNotFound)create now matches.

3. keyHash included a trailing newline in its sha256 input

.pub files end with \n. Locally-generated keys are fine because ssh-keygen -C "lakebox" always adds a comment that gets stripped before the \n is reached. But a user-supplied key without -C would produce a hash that doesn't match the server's, making verifyKeyRegistered falsely report "key not registered". Fix: strings.TrimSpace the input first. Two new test cases pin the behavior for \n and surrounding whitespace.

UX fixes

  • status.go exposed fqdnapi.go documents FQDN as the manager's internal routing host, not user-actionable. Dropped from non-JSON output (still in -o json for completeness).
  • status.go 404 message — wrapped as "failed to get lakebox X: ..." while delete/default/ssh returned the friendlier "no lakebox named X — databricks lakebox list shows available IDs". Now consistent.
  • ui.go warn() switched to yellow — was cyan, indistinguishable from the success ✓ marker.
  • delete.go clean cancel — returning errors.New("aborted") surfaced as Error: aborted + non-zero exit. Now prints Cancelled. and returns nil.
  • Consistent sentence-case across all warn() messages.

Test plan

  • go test ./cmd/lakebox/... passes (including 2 new keyHash cases for trailing-newline and leading/trailing whitespace)
  • go build ./... clean
  • gofmt -l cmd/lakebox/ clean

Out of scope

Acceptance test scaffold for cmd/lakebox/ (the bigger gap surfaced by the same audit) is its own PR — these are zero today and adding them is a separate larger change.

This pull request and its description were written by Isaac.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 7, 2026

Waiting for approval

Based on git history, these people are best suited to review:

  • @pietern -- recent work in cmd/lakebox/

Eligible reviewers: @andrewnester, @anton-107, @denik, @renaudhartert-db, @shreyas-goenka, @simonfaltum

Suggestions based on git history. See OWNERS for ownership rules.

…sh trailing-newline, plus UX

Three correctness fixes (BLOCKERs from a final ship-readiness audit) and
several smaller UX cleanups, bundled because each is small and they all
sit on the same `register → ssh` path that gets exercised together.

BLOCKERs:

1. `start.go` `Long:` said "blocks until the sandbox reaches Running
   (or up to 5 minutes)" while the actual `startWaitTimeout` constant
   is 10 minutes. Documented bound mismatched the enforced bound — fix
   the text.

2. `create.go` was using `if _, err := api.get(currentDefault); err != nil`
   to decide whether to overwrite the saved default. Any non-404 error
   (transient 5xx, network blip, rate limit) silently clobbered the
   user's default. Every sibling (`delete`, `default`, `ssh`) already
   uses `errors.Is(err, apierr.ErrNotFound)` for this — `create` now
   matches.

3. `keyHash` included a trailing newline in its sha256 input for `.pub`
   files without an OpenSSH comment. Locally-generated keys are fine
   (`ssh-keygen -C "lakebox"` always adds a comment), but a user-supplied
   key without `-C` would produce a hash that doesn't match the server's,
   making `verifyKeyRegistered` falsely report "key not registered". Fix:
   `strings.TrimSpace` the input first. New test cases pin the behavior
   for `\n` and surrounding whitespace.

UX fixes:

- `status.go` displayed an `fqdn` field even though `api.go` documents
  FQDN as the manager's internal routing host — drop from non-JSON
  output (still present in `-o json` for completeness).
- `status.go` wrapped 404 errors as "failed to get lakebox X: ..." while
  `delete`, `default`, and `ssh` all return the friendlier "no lakebox
  named X — `databricks lakebox list` shows available IDs". Add the
  same branch.
- `ui.go` `warn()` used cyan, the same color as `ok()` and the spinner
  ✓ marker — so a warning was visually indistinguishable from success.
  Switch to yellow (already in the palette via `cmdio.Yellow`).
- `delete.go` returned `errors.New("aborted")` on user-cancel, which
  surfaced as `Error: aborted` with a non-zero exit. Print "Cancelled."
  and return nil instead.
- Inconsistent capitalization across `warn()` messages (some sentence
  case, some lowercase) — standardize on sentence case to match the
  rest of the repo.

Co-authored-by: Isaac
@akshaysingla-db akshaysingla-db force-pushed the akshay/lakebox-bugfix-bundle branch from 6dfcba4 to 654c0d2 Compare June 8, 2026 15:09
@akshaysingla-db
Copy link
Copy Markdown
Collaborator Author

/merge

@akshaysingla-db akshaysingla-db merged commit 93691a9 into databricks:demo-lakebox Jun 8, 2026
12 of 20 checks passed
akshaysingla-db added a commit that referenced this pull request Jun 8, 2026
## Summary

Bootstraps `acceptance/cmd/lakebox/` — lakebox had zero acceptance tests
today. Six leaf scenarios cover the rendered output of every subcommand
surface that's exercised by user-visible flows.

| Test | What it locks down |
|---|---|
| `list/empty` | Empty result → `No lakeboxes found.` hint |
| `list/with-entries` | Multi-row table: NAME column always present,
mixed `running`/`stopped`/`creating…` statuses, autostop label including
the `noAutostop=true` "never" case |
| `status/not-found` | 404 → friendly `no lakebox named "X" —
`databricks lakebox list` shows available IDs` (the message added in
#5460) |
| `ssh-key/list/empty` | Empty result → "Run `databricks lakebox
register` to add one" hint |
| `config/no-flags` | No flags → `nothing to update` error |
| `config/idle-timeout-bounds` | Out-of-range values (`30s`, `48h`) →
client-side bounds error before any API call, with the
duration-formatted bounds from #5372 |

Each test stubs the relevant API endpoint via the testserver framework's
`[[Server]]` entries. The two `config` tests need no stub because
validation errors fire before any API call. Parent
`acceptance/cmd/lakebox/test.toml` adds `.databricks` to `Ignore` so the
CLI's local state file (`lakebox.json`) doesn't bleed into the test
directory's diff.

All six pass under both `DATABRICKS_BUNDLE_ENGINE=terraform` and
`=direct` with identical output, which is correct — lakebox is
engine-independent.

These would have caught the FQDN-displayed-in-status inconsistency and
the `start.go` 5-vs-10-minute help-text mismatch that the bugfix-bundle
PR (#5460) caught manually.

## Test plan

- [x] `go test ./acceptance -run TestAccept/cmd/lakebox` passes (12
subtests, both engine variants)
- [x] `-update` produces stable golden files (re-run without `-update`
is clean)

## Out of scope (potential follow-ups)

- Acceptance coverage for write-path commands (`create`, `delete`,
`start`, `stop`, `default`, `register`) — these are interactive or have
stateful side effects worth a separate, more involved test design.
- Coverage for the `ssh` command — that path execs `ssh` directly so it
isn't testable through this framework without an SSH server stub.

This pull request and its description were written by Isaac.
akshaysingla-db added a commit that referenced this pull request Jun 8, 2026
)

## Summary

Second batch of lakebox acceptance tests, following #5474. Covers 14
leaf scenarios on the write/connection-path commands and JSON output
shapes that weren't in the first batch.

| Test | What it locks down |
|---|---|
| `list/json` | `-o json` shape (no FQDN field, gatewayHost present,
idleTimeout serialized as Duration string) |
| `status/running` | Happy-path text output with all rendered fields |
| `status/json` | JSON shape — verifies the FQDN-removal in #5460 didn't
get reverted |
| `ssh-key/list/with-entries` | Multi-row table including unnamed key
falling back to `(unset)` |
| `ssh-key/delete/success` | DELETE wire route + confirmation message |
| `delete/success` | Happy path with `--auto-approve` to bypass the
prompt |
| `delete/not-found` | 404 → friendly `no lakebox named X` (the
consistency fix from #5460) |
| `delete/no-tty-no-auto-approve` | Non-interactive context fast-fails
pointing at the `--auto-approve` flag |
| `default/set` | Set default → confirmation message |
| `default/not-found` | 404 → friendly error (same message as delete) |
| `start/already-running` | API returns Running → CLI short-circuits
with "Already running X" (skips waitForRunning) |
| `stop/success` | Happy path with refreshed status |
| `config/update-name` | PATCH wire route + post-update output rendering
|
| `create/with-name` | POST wire route + stdout ID + default-set side
effect |

## Two test-infrastructure additions

1. **`acceptance/cmd/lakebox/script.prepare`** sets `HOME=$TEST_TMP_DIR`
so each test's `~/.databricks/lakebox.json` is isolated. Without this,
parallel lakebox tests race each other writing to the shared HOME-rooted
state file and one read sees the other's half-written content. Auth is
unaffected — the framework passes `DATABRICKS_HOST` / `DATABRICKS_TOKEN`
explicitly via env, so the CLI doesn't fall back to `~/.databrickscfg`.

2. **`EnvMatrix.DATABRICKS_BUNDLE_ENGINE = []`** in the parent
`test.toml` overrides the root matrix to skip the per-engine variants.
Lakebox is engine-independent and running both variants in parallel was
the original symptom of the HOME race.

Two of the ssh-key tests also pin the test key hash via a high-priority
`[[Repls]]` entry so the framework's default 3+-digit-run → `[NUMID]`
regex doesn't shred it mid-hash.

## Test plan

- [x] `go test ./acceptance -run TestAccept/cmd/lakebox` — all 20 tests
pass (6 from #5474 + 14 new)
- [x] `./task lint` clean
- [x] Re-running with `-update` and then without is idempotent (no
drift)

## Still out of scope

- `register` — generates and reads real SSH keys via `ssh-keygen`; needs
a heavier fixture / process stub
- `ssh` — `execv`s into ssh; not testable through this framework

This pull request and its description were written by Isaac.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant