test-stress(disk-guest): box rootfs is a bounded, isolated disk that survives a fill#623

Open
G4614 wants to merge 7 commits into
boxlite-ai:mainfrom
G4614:test/box-rootfs-disk-bounded

@G4614 G4614 commented May 29, 2026

Add tests that confirm the stability of a box's guest volume (rootfs):

  1. Each box has its own independent, bounded volume.
  2. When the volume fills up, the micro-VM keeps running, and files can still be removed with rm to recover space.

Test plan

make test:integration:cli FILTER=stress_disk — 11 integration tests, each starting an alpine box and stressing its rootfs:

| Test | Property |
| --- | --- |
| `box_rootfs_is_bounded_isolated_and_survives_fill` | rootfs is its own image-sized ext4 (50 MiB..2 GiB, not the host), dd hits ENOSPC, VM stays Running + exec works, old files readable, tmpfs /tmp writable |
| `concurrent_writers_all_hit_enospc_and_vm_survives` | 6 same-box dd writers race; every one hits ENOSPC cleanly, agent serves exec after |
| `two_boxes_rootfs_disks_are_isolated` | fill victim → idle bystander's free space, writability, liveness all unchanged |
| `bystander_writes_keep_progressing_while_peer_fills_its_disk` | an active bystander's background appender keeps progressing through the peer's fill |
| `rootfs_inode_exhaustion_keeps_vm_alive_and_old_files_readable` | mass-touch exhausts ext4's inode table (separate ENOSPC path from block fill); VM survives, old reads OK |
| `rootfs_inode_exhaustion_does_not_block_appending_to_existing_files` | inode-full but blocks free → appending to a pre-existing file (no new inode) still works |
| `box_restarts_cleanly_with_full_rootfs` | stop/start with full rootfs succeeds, /fill persists, new writes still ENOSPC |
| `rm_after_fill_recovers_in_box_free_space_and_writes_resume` | rm on a 100%-full ext4 succeeds (metadata-only op), in-box df recovers, writes resume |
| `three_fill_delete_cycles_leave_the_box_serving_normally` | 3 fill→rm cycles → agent/reads/writes/tmpfs all healthy afterwards (no per-cycle leak) |
| `rm_after_fill_releases_host_disk_space` | boxlite rm -f releases ≥ 90% of host qcow2 footprint (host du + 30 s poll for async cleanup) |
| `rootfs_fill_delete_fill_does_not_double_qcow2_footprint` | discard / unmap plumbed end-to-end (ext4 → virtio-blk → qcow2): fill→delete→fill doesn't roughly double the qcow2 |

@G4614 G4614 changed the title from "test(disk): box rootfs is a bounded, isolated disk that survives a fill" to "test-stress(disk): box rootfs is a bounded, isolated disk that survives a fill" May 29, 2026
@G4614 G4614 force-pushed the test/box-rootfs-disk-bounded branch from 681a52b to de89600 Compare June 1, 2026 03:18
Ubuntu and others added 4 commits June 1, 2026 04:20
A box must not see or exhaust the host filesystem. This integration test starts
an alpine box and checks that its `/` is its own small ext4 (a few hundred MB,
sized from the image — not the host's tens of millions of 1K-blocks), then fills
it with dd and asserts the write hits ENOSPC rather than wandering onto the host
disk, and that the VM stays Running and serving exec afterward. Covers the
box-internal disk quadrant (the per-box blast radius), complementing the
host-disk admission guard.
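
The size check described above could be sketched roughly as follows. This is a minimal Python illustration, not the PR's test code: `parse_df_total_kib` and `rootfs_is_bounded` are hypothetical helper names, the 50 MiB..2 GiB window is the one listed in the test plan, and the sketch assumes `df -k` style output whose data row is not wrapped onto a second line.

```python
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024

# Window from the test plan (50 MiB..2 GiB), expressed in 1K-blocks.
MIN_ROOTFS_KIB = 50 * KIB_PER_MIB
MAX_ROOTFS_KIB = 2 * KIB_PER_GIB


def parse_df_total_kib(df_output: str) -> int:
    """Return the '1K-blocks' column from the last data row of `df -k` output.

    Assumes one header line followed by unwrapped data rows.
    """
    lines = df_output.strip().splitlines()[1:]  # drop the header line
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        raise ValueError("df output has no data row")
    return int(rows[-1][1])


def rootfs_is_bounded(total_kib: int) -> bool:
    """True when the filesystem size falls inside the expected window."""
    return MIN_ROOTFS_KIB <= total_kib <= MAX_ROOTFS_KIB


# Illustrative (made-up) output for a small in-box ext4.
SAMPLE_BOX_DF = (
    "Filesystem           1K-blocks      Used Available Use% Mounted on\n"
    "/dev/vda                253871      9012    227655   4% /\n"
)

if __name__ == "__main__":
    print(rootfs_is_bounded(parse_df_total_kib(SAMPLE_BOX_DF)))
```

A host-sized filesystem (tens of millions of 1K-blocks) falls far above the upper bound, so the same predicate rejects a box that is accidentally seeing the host disk.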

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lf-bounded

The existing test proved one box's rootfs is bounded and survives a fill, but
not that boxes don't share a disk pool. Add a two-box test: fill the victim's
rootfs to ENOSPC and assert the bystander keeps its free space, still accepts
writes, and both VMs stay alive — the per-box disk boundary the title claims.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spawn 6 dd writers racing to fill the same rootfs and assert every one sees
"No space left on device" (no hangs, no silent partial success, RC!=0 for
all), and the guest agent still accepts exec afterward. Closes the gap that
the single-writer fill test leaves open: a regression where one ENOSPC could
wedge the rootfs for the others (stuck journal commit, EXT4 lock pile-up,
agent dying on the I/O storm) would have passed the single-writer test
silently.
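
As a rough sketch of the assertion shape described above (every writer must fail, and must fail with ENOSPC), the check could look like this in Python. The `run_writer` callable is a hypothetical stand-in for whatever launches one `dd` inside the box and returns its exit code and stderr; it is expected to enforce its own timeout so that a hung writer surfaces as a failure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

ENOSPC_TEXT = "No space left on device"
WriterResult = Tuple[int, str]  # (exit code, stderr text)


def run_writers(run_writer: Callable[[int], WriterResult],
                count: int = 6) -> List[WriterResult]:
    """Start `count` writers concurrently and collect each (rc, stderr)."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(run_writer, i) for i in range(count)]
        return [f.result() for f in futures]


def writers_that_violate_enospc(results: List[WriterResult]) -> List[int]:
    """Indexes of writers that did not fail cleanly with ENOSPC.

    A writer violates the expectation if it exited 0 (silent partial
    success) or failed with some other error.
    """
    return [
        i for i, (rc, stderr) in enumerate(results)
        if rc == 0 or ENOSPC_TEXT not in stderr
    ]
```

An empty list from `writers_that_violate_enospc` corresponds to "every one hits ENOSPC cleanly"; the follow-up exec against the guest agent is a separate check not shown here.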

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ive bystander

Five gaps the existing single-writer/idle-bystander tests left open:

- inode exhaustion is a separate resource axis from block fill (different
  ext4 code path) — mass-touch must terminate via ENOSPC, VM survives, old
  files stay readable.
- `boxlite rm` after fill must release the box's qcow2 overlay growth on
  the host (≥90% of grown bytes); use `du` over the home dir (polled) so
  the assertion isn't fooled by `df` lag on async file removal.
- A full rootfs must not be a startup-blocking condition: stop → start
  succeeds, /fill persists, new writes still hit ENOSPC, exec still works.
- An *active* bystander box must keep its background appender progressing
  while a peer box fills its disk (stronger isolation than the idle-bystander
  check).
- Augment the original survival test with degraded-mode probes: a
  pre-existing file (/etc/alpine-release) stays readable, and tmpfs (/tmp,
  a separate resource pool) still accepts writes after the rootfs fills.
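
The host-side release check in the second bullet can be illustrated with a small polling helper. This is a sketch under assumptions: `measure` is a hypothetical callable returning the current `du`-style byte count of the home directory, the 90% threshold and 30 s budget come from the PR text, and the one-second interval is made up for the example.

```python
import time
from typing import Callable


def released_fraction(baseline: int, peak: int, current: int) -> float:
    """Fraction of the bytes grown during the fill that have been freed."""
    grown = peak - baseline
    if grown <= 0:
        raise ValueError("fill did not grow the host footprint")
    return (peak - current) / grown


def wait_for_release(measure: Callable[[], int], baseline: int, peak: int,
                     threshold: float = 0.90, timeout_s: float = 30.0,
                     interval_s: float = 1.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `measure` until enough space is released or the budget runs out."""
    deadline = clock() + timeout_s
    while True:
        if released_fraction(baseline, peak, measure()) >= threshold:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Polling a measured size rather than asserting once is what absorbs asynchronous file removal: the check passes as soon as the footprint drops far enough and only fails after the full budget has elapsed.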

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 force-pushed the test/box-rootfs-disk-bounded branch from 8fb081c to 5bffb21 Compare June 1, 2026 04:20
Ubuntu and others added 3 commits June 1, 2026 04:47
…e + append-after-inode-exhaustion

Four supplements addressing the gaps the prior batch left:

- assert_alive now polls for up to 10 s instead of one-shot, so a transient
  post-fill agent stall (I/O queue still draining) doesn't flake the test.
- bound the rootfs size to 50 MiB..2 GiB (was 1 KiB..4 GiB): still rejects
  the host fs cleanly (a 124 GiB host is ~130 M blocks) but flags a
  regression that lets the image-derived sizing run away.
- fill → delete → fill in one box must not roughly double the host-side
  qcow2 footprint. qcow2 grows monotonically unless discard / unmap is
  plumbed end-to-end (ext4 → virtio-blk → qcow2); the assertion catches a
  regression that would silently leak host disk on cache churn.
- inode exhaustion and block exhaustion are independent ENOSPC paths.
  After mass-touch fills the inode table, appending to a pre-existing file
  (which needs blocks, not a new inode) must still succeed — a regression
  that conflated the two would silently break log-append workloads.
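
The last bullet's point, that inodes and blocks are separate budgets, can be seen directly from `statvfs`. The sketch below is a simplified model for illustration only: `FsHeadroom` and the two predicates are hypothetical names, and the rule "creating needs an inode, appending needs a block" ignores details such as reserved blocks and directory growth.

```python
import os
from typing import NamedTuple


class FsHeadroom(NamedTuple):
    free_blocks: int  # blocks available to unprivileged users (f_bavail)
    free_inodes: int  # inodes available to unprivileged users (f_favail)


def read_headroom(path: str = "/") -> FsHeadroom:
    """Read both counters for the filesystem that contains `path`."""
    st = os.statvfs(path)
    return FsHeadroom(free_blocks=st.f_bavail, free_inodes=st.f_favail)


def can_create_file(h: FsHeadroom) -> bool:
    # Simplified: a new file needs a fresh inode.
    return h.free_inodes > 0


def can_append_to_existing(h: FsHeadroom) -> bool:
    # Simplified: growing an existing file needs data blocks, not an inode.
    return h.free_blocks > 0


if __name__ == "__main__":
    print(read_headroom("/"))
```

After a mass-touch, the expected state is `free_inodes == 0` with `free_blocks > 0`: creating a file fails with ENOSPC while appending to a pre-existing file still succeeds.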

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`fill_delete_fill_does_not_double_qcow2_footprint` proves the host-side
qcow2 doesn't bloat across churn; this test pins the complementary in-box
user-visible flow: rootfs hits ENOSPC → df Available = 0 → `rm /fill`
succeeds on a 100%-full ext4 (pure metadata op, no new block needed) →
in-box df Available recovers to ≈ pre-fill → new writes succeed again.
A regression that wedged metadata ops at ENOSPC, or broke ext4 block
reclamation, would now fail loudly.
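
One way to picture the df-based part of this flow is a check over three `Available` readings taken before the fill, at ENOSPC, and after `rm /fill`. This is a hedged sketch: `recovery_problems` is a hypothetical helper, and the 5% tolerance used for "≈ pre-fill" is an assumption, not a value from the PR.

```python
from typing import List


def recovery_problems(pre_kib: int, full_kib: int, post_rm_kib: int,
                      tolerance: float = 0.05) -> List[str]:
    """Return human-readable problems; an empty list means the flow held.

    pre_kib:     df Available before the fill
    full_kib:    df Available once the write has hit ENOSPC
    post_rm_kib: df Available after removing the fill file
    """
    problems = []
    if full_kib != 0:
        problems.append("Available was %d KiB at ENOSPC, expected 0" % full_kib)
    floor = pre_kib * (1 - tolerance)
    if post_rm_kib < floor:
        problems.append(
            "Available after rm is %d KiB, below %.0f KiB" % (post_rm_kib, floor)
        )
    return problems
```

The final step, a new write succeeding after the `rm`, would be asserted separately from the writer's exit code.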

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The single-cycle recovery test pins the first round trip; this one
catches regressions that only surface after repeated fill/delete churn —
agent fd / handle leaks, ext4 journal exhaustion, qcow2 metadata growth,
or any state the cgroup / mount layer accumulates per fill. Three back-to-
back cycles, then a four-point health check: agent still serving exec,
pre-existing files readable, new rootfs writes succeed, tmpfs unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 marked this pull request as ready for review June 1, 2026 06:14
@G4614 G4614 changed the title from "test-stress(disk): box rootfs is a bounded, isolated disk that survives a fill" to "test-stress(disk-guest): box rootfs is a bounded, isolated disk that survives a fill" Jun 1, 2026