test-stress(disk-guest): box rootfs is a bounded, isolated disk that survives a fill #623
Open
G4614 wants to merge 7 commits into
681a52b to de89600
A box must not see or exhaust the host filesystem. This integration test starts an alpine box and checks that its `/` is its own small ext4 (a few hundred MB, sized from the image, not the host's tens of millions of 1K-blocks). It then fills `/` with `dd` and asserts that the write hits ENOSPC rather than spilling onto the host disk, and that the VM stays Running and keeps serving exec afterward. This covers the box-internal disk quadrant (the per-box blast radius), complementing the host-disk admission guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
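The PR does not show the test code, so here is a minimal Python sketch of the two checks this commit describes. The helper names (`rootfs_size_kib`, `fill_hit_enospc`) are hypothetical, and the sketch assumes the box's `df -Pk /` output and the `dd` exit code and stderr have already been captured; how the real tests run commands inside the box is not shown here.

```python
ENOSPC_MSG = "No space left on device"


def rootfs_size_kib(df_output: str) -> int:
    """Total size of the filesystem mounted on / from `df -Pk /`-style output, in KiB."""
    lines = df_output.strip().splitlines()
    for line in lines[1:]:  # first line is the column header
        fields = line.split()
        # POSIX columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
        if len(fields) >= 6 and fields[-1] == "/":
            return int(fields[1])
    raise ValueError("no filesystem mounted on / in df output")


def fill_hit_enospc(returncode: int, stderr: str) -> bool:
    """A fill must fail with a non-zero exit *and* report ENOSPC, not some other I/O error."""
    return returncode != 0 and ENOSPC_MSG in stderr


# Made-up sample: a box-sized disk is a few hundred thousand 1K-blocks,
# while a host fs is tens of millions. The 10 M cut-off is illustrative only.
box_df = """Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/vda 487652 12345 449283 3% /"""
assert rootfs_size_kib(box_df) < 10_000_000, "looks like the host fs, not the box's disk"
```

Requiring both a non-zero exit and the ENOSPC message keeps an unrelated failure (for example an EIO from a broken block device) from passing as "the disk filled up".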
…lf-bounded

The existing test proved that one box's rootfs is bounded and survives a fill, but not that boxes don't share a disk pool. Add a two-box test: fill the victim's rootfs to ENOSPC and assert that the bystander keeps its free space, still accepts writes, and that both VMs stay alive. This is the per-box disk boundary the title claims.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
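The two-box assertion can be read as a list of failure reasons over what was observed before and after the victim's fill. This is a hypothetical Python sketch (`BystanderObservation` and `isolation_failures` are illustrative names); the 1 MiB slack for the bystander's own probe writes is an assumption, not a value from the PR.

```python
from dataclasses import dataclass


@dataclass
class BystanderObservation:
    avail_before_kib: int  # bystander's df "Available" for / before the victim's fill
    avail_after_kib: int   # same column, measured after the victim hit ENOSPC
    probe_write_ok: bool   # a small write into the bystander's rootfs succeeded
    bystander_alive: bool  # bystander still Running and answering exec
    victim_alive: bool     # victim still Running and answering exec


def isolation_failures(obs: BystanderObservation, slack_kib: int = 1024) -> list:
    """Return why the per-box disk boundary looks broken; an empty list means it held."""
    failures = []
    # If both boxes drew from one pool, the victim's fill would drain the bystander too.
    if obs.avail_after_kib < obs.avail_before_kib - slack_kib:
        failures.append("bystander lost free space")
    if not obs.probe_write_ok:
        failures.append("bystander rejected a write")
    if not (obs.bystander_alive and obs.victim_alive):
        failures.append("a VM died")
    return failures
```

Collecting every reason instead of failing on the first one makes a shared-pool regression easy to recognize: it shows up as lost free space *and* a rejected write together.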
Spawn 6 `dd` writers racing to fill the same rootfs and assert that every one sees "No space left on device": no hangs, no silent partial success, and a non-zero exit code for all. Also assert that the guest agent still accepts exec afterward. This closes a gap the single-writer fill test leaves open: a regression where one ENOSPC wedges the rootfs for the other writers (a stuck journal commit, an ext4 lock pile-up, or the agent dying under the I/O storm) would have passed the single-writer test silently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
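The acceptance rule amounts to putting each writer's outcome into exactly one bucket and requiring all six to be ENOSPC. A hypothetical Python sketch, assuming each writer's exit code and stderr were collected and that a writer which exceeded its timeout is recorded with exit code `None`:

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

ENOSPC_MSG = "No space left on device"


def classify_writer(returncode: Optional[int], stderr: str) -> str:
    """Bucket one dd writer's outcome; returncode None means it timed out (hung)."""
    if returncode is None:
        return "hang"
    if returncode == 0:
        return "silent_success"  # dd exited 0 although the disk should be full
    if ENOSPC_MSG in stderr:
        return "enospc"
    return "other_error"


def outcome_summary(results: Iterable[Tuple[Optional[int], str]]) -> Counter:
    """Count writers per bucket, for a readable failure message."""
    return Counter(classify_writer(rc, err) for rc, err in results)


def all_writers_hit_enospc(results) -> bool:
    results = list(results)
    return bool(results) and outcome_summary(results) == Counter(enospc=len(results))
```

Reporting the full `Counter` (for example `{'enospc': 5, 'hang': 1}`) distinguishes a wedged rootfs from an agent crash, which a single boolean would not.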
…ive bystander

Five gaps the existing single-writer/idle-bystander tests left open:
- Inode exhaustion is a separate resource axis from block fill (a different ext4 code path): a mass `touch` must terminate via ENOSPC, the VM must survive, and old files must stay readable.
- `boxlite rm` after a fill must release the box's qcow2 overlay growth on the host (≥90% of the grown bytes). Use `du` over the home dir (polled) so the assertion isn't fooled by `df` lag on asynchronous file removal.
- A full rootfs must not block startup: stop → start succeeds, /fill persists, new writes still hit ENOSPC, and exec still works.
- An *active* bystander box must keep its background appender progressing while a peer box fills its disk (a stronger isolation check than the idle bystander).
- Augment the original survival test with degraded-mode probes: a pre-existing file (/etc/alpine-release) stays readable, and tmpfs (/tmp, a separate resource pool) still accepts writes after the rootfs fills.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
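The host-side release check can be sketched as a `du`-style walk plus a ratio over the fill-induced growth. This Python sketch is illustrative only: `du_bytes` and `released_fraction` are hypothetical names, it assumes a Unix host (`st_blocks` does not exist on Windows), and the real test's home-dir path and polling loop are not shown in the PR.

```python
import os


def du_bytes(root: str) -> int:
    """Rough `du -s` equivalent: allocated bytes of all files under root.

    Counts st_blocks * 512 (Linux reports st_blocks in 512-byte units), so a
    sparse or partly discarded qcow2 is measured by what it really occupies,
    not by its apparent size. Unlike du, a hard-linked file is counted once per link.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except FileNotFoundError:
                continue  # removed mid-walk, e.g. by the async cleanup being measured
            total += st.st_blocks * 512
    return total


def released_fraction(before: int, peak: int, after: int) -> float:
    """Share of the fill-induced growth (peak - before) given back once the box is removed."""
    grown = peak - before
    if grown <= 0:
        raise ValueError("host footprint never grew; the fill did not reach the overlay")
    return (peak - after) / grown
```

A test would take `before` and `peak` around the fill, then repeatedly measure `after` until `released_fraction(before, peak, after) >= 0.90` or its poll deadline passes. Measuring the directory directly avoids the `df` lag the commit mentions.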
8fb081c to 5bffb21
…e + append-after-inode-exhaustion

Four supplements addressing the gaps the prior batch left:
- assert_alive now polls for up to 10 s instead of checking once, so a transient post-fill agent stall (I/O queue still draining) doesn't make the test flaky.
- Bound the rootfs size to 50 MiB..2 GiB (was 1 KiB..4 GiB). This still cleanly rejects the host fs (a 124 GiB host is ~130 M blocks) and also flags a regression that lets the image-derived sizing run away.
- fill → delete → fill in one box must not roughly double the host-side qcow2 footprint. qcow2 grows monotonically unless discard/unmap is plumbed end to end (ext4 → virtio-blk → qcow2); the assertion catches a regression that would silently leak host disk under cache churn.
- Inode exhaustion and block exhaustion are independent ENOSPC paths. After a mass `touch` fills the inode table, appending to a pre-existing file (which needs blocks, not a new inode) must still succeed; a regression that conflated the two would silently break log-append workloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
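Three of these supplements reduce to small predicates. This Python sketch is illustrative: the 10 s timeout and the 50 MiB..2 GiB bound come from the commit, while `poll_until`, `rootfs_size_plausible`, `churn_did_not_double`, the 0.5 s poll interval and the 1.5× growth cut-off are assumptions for the example.

```python
import time
from typing import Callable

MIB = 1024 * 1024
GIB = 1024 * MIB


def poll_until(check: Callable[[], bool], timeout_s: float = 10.0, interval_s: float = 0.5) -> bool:
    """Retry check() until it returns True or timeout_s elapses.

    A one-shot probe fails if the agent is briefly stalled while the I/O
    queue drains; polling only fails if the stall outlasts the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def rootfs_size_plausible(size_bytes: int) -> bool:
    """50 MiB..2 GiB: far below a host fs, tight enough to catch runaway image-derived sizing."""
    return 50 * MIB <= size_bytes <= 2 * GIB


def churn_did_not_double(baseline: int, after_first_fill: int, after_second_fill: int,
                         max_ratio: float = 1.5) -> bool:
    """Compare host qcow2 growth after fill -> delete -> fill with growth after the first fill.

    Without end-to-end discard, the deleted data's clusters stay allocated on the
    host, so a second fill that lands on different guest blocks grows the file again.
    max_ratio=1.5 is an illustrative cut-off, not the PR's threshold.
    """
    first = after_first_fill - baseline
    second = after_second_fill - baseline
    if first <= 0:
        raise ValueError("first fill did not grow the qcow2")
    return second <= max_ratio * first
```

For scale, a 124 GiB host filesystem is 124 × 1024 × 1024 = 130,023,424 1K-blocks, roughly 127 GB, which `rootfs_size_plausible` rejects by two orders of magnitude.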
`fill_delete_fill_does_not_double_qcow2_footprint` proves that the host-side qcow2 doesn't bloat across churn; this test pins the complementary in-box, user-visible flow: rootfs hits ENOSPC → df Available = 0 → `rm /fill` succeeds on a 100%-full ext4 (a pure metadata op that needs no new block) → in-box df Available recovers to ≈ pre-fill → new writes succeed again. A regression that wedged metadata ops at ENOSPC, or broke ext4 block reclamation, would now fail loudly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
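The recovery flow above is a chain of observations, each of which must hold. A hypothetical Python sketch: `RecoveryTrace` and `recovery_failures` are illustrative names, and the 5% tolerance used for "≈ pre-fill" is an assumption, not the PR's threshold.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTrace:
    avail_pre_fill_kib: int   # in-box df Available before the fill
    avail_at_enospc_kib: int  # df Available once dd hit ENOSPC (expected 0)
    rm_ok: bool               # `rm /fill` succeeded on the full ext4
    avail_after_rm_kib: int   # df Available after the rm
    write_after_rm_ok: bool   # a fresh rootfs write succeeded


def recovery_failures(t: RecoveryTrace, tolerance: float = 0.05) -> list:
    """Return every step of the fill -> rm -> write flow that did not hold."""
    failures = []
    if t.avail_at_enospc_kib != 0:
        failures.append("rootfs was not actually full")
    if not t.rm_ok:
        failures.append("rm failed on a full ext4 (metadata op wedged)")
    # Allow a small shortfall for journal and metadata blocks still in use.
    if t.avail_after_rm_kib < t.avail_pre_fill_kib * (1 - tolerance):
        failures.append("free space did not come back")
    if not t.write_after_rm_ok:
        failures.append("writes did not resume")
    return failures
```

Checking `avail_at_enospc_kib == 0` first guards against a vacuous pass: if the fill never reached ENOSPC, the later "recovery" proves nothing.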
The single-cycle recovery test pins the first round trip; this one catches regressions that only surface after repeated fill/delete churn: agent fd/handle leaks, ext4 journal exhaustion, qcow2 metadata growth, or any state the cgroup/mount layer accumulates per fill. Three back-to-back cycles, then a four-point health check: agent still serving exec, pre-existing files readable, new rootfs writes succeed, tmpfs unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
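One way to structure the churn test, sketched in Python with hypothetical callables standing in for the in-box commands (the PR's actual harness is not shown): run the cycles, then evaluate every health probe instead of stopping at the first failure, so the report names the result of all four checks.

```python
from typing import Callable, Dict, List


def run_cycles_then_check(
    fill: Callable[[], None],
    delete: Callable[[], None],
    probes: Dict[str, Callable[[], bool]],
    cycles: int = 3,
) -> Dict[str, bool]:
    """Repeat fill -> delete `cycles` times, then run every health probe once."""
    for _ in range(cycles):
        fill()    # e.g. dd until ENOSPC
        delete()  # e.g. rm the fill file
    # The dict comprehension runs every probe; all() would stop at the first
    # failure and hide which of the other checks also broke.
    return {name: probe() for name, probe in probes.items()}


def failed_probes(results: Dict[str, bool]) -> List[str]:
    return [name for name, ok in results.items() if not ok]
```

The four probe names would map onto the commit's checks: exec, pre-existing files, new rootfs writes, and tmpfs.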
Add tests to confirm the stability of the box volume (guest).
Test plan
`make test:integration:cli FILTER=stress_disk`: 11 integration tests, each starting an alpine box and stressing its rootfs:
- `box_rootfs_is_bounded_isolated_and_survives_fill`: `dd` hits ENOSPC, VM stays `Running` + exec works, old files readable, tmpfs `/tmp` writable
- `concurrent_writers_all_hit_enospc_and_vm_survives`: `dd` writers race; every one hits ENOSPC cleanly, agent serves exec afterward
- `two_boxes_rootfs_disks_are_isolated`
- `bystander_writes_keep_progressing_while_peer_fills_its_disk`
- `rootfs_inode_exhaustion_keeps_vm_alive_and_old_files_readable`
- `rootfs_inode_exhaustion_does_not_block_appending_to_existing_files`
- `box_restarts_cleanly_with_full_rootfs`: `stop`/`start` with a full rootfs succeeds, `/fill` persists, new writes still hit ENOSPC
- `rm_after_fill_recovers_in_box_free_space_and_writes_resume`: `rm` on a 100%-full ext4 succeeds (metadata-only op), in-box `df` recovers, writes resume
- `three_fill_delete_cycles_leave_the_box_serving_normally`
- `rm_after_fill_releases_host_disk_space`: `boxlite rm -f` releases ≥ 90% of the host qcow2 footprint (host `du` + 30 s poll for async cleanup)
- `rootfs_fill_delete_fill_does_not_double_qcow2_footprint`