Skip to content

nasbackup.sh: add timeout, cleanup trap, space check, quiesce support#12843

Open
jmsperu wants to merge 1 commit intoapache:4.20from
jmsperu:fix/nasbackup-timeout-spacecheck-trap
Open

nasbackup.sh: add timeout, cleanup trap, space check, quiesce support#12843
jmsperu wants to merge 1 commit intoapache:4.20from
jmsperu:fix/nasbackup-timeout-spacecheck-trap

Conversation

@jmsperu
Copy link

@jmsperu jmsperu commented Mar 17, 2026

Summary

  • Add BACKUP_TIMEOUT (default 6 hours) to prevent indefinitely stuck backup jobs; aborts via domjobabort when exceeded
  • Add EXIT trap with cleanup() that resumes paused VMs, removes temp dirs, and unmounts NFS on any exit (error, signal, or normal) — prevents orphan mounts from accumulating
  • Add check_free_space() pre-flight check (default 1 GB minimum) before writing backup data
  • Add -q/--quiesce flag for optional fsfreeze/thaw via qemu-guest-agent for application-consistent backups
  • Use set -eo pipefail for stricter error handling
  • Fix mount_operation(): proper if mount instead of broken $? check after pipe
  • Quote all variable expansions to prevent word splitting
  • Remove manual umount/rmdir from functions (now handled by trap)

Motivation

The current nasbackup.sh has several reliability issues observed in production:

  1. No timeout — backup polling loop (until domjobinfo --completed) runs forever if backup stalls, blocking the agent
  2. No cleanup on failure — failed backups leave orphan NFS mounts (/tmp/csbackup.XXXXX) and paused VMs that never resume (related: KVM NAS backup: VM remains paused indefinitely when backup job fails #12821)
  3. No space check — backups silently fail or produce corrupt output when the NAS is full
  4. No quiesce — running VM backups are crash-consistent only, even when qemu-guest-agent is available
  5. Broken error handling$? after a pipe always returns the pipe's exit status, not the command's

Test plan

  • Backup a running VM with sufficient space — verify backup completes normally
  • Backup a running VM with BACKUP_TIMEOUT=30 — verify timeout and domjobabort
  • Backup with NAS at <1GB free — verify pre-flight rejection
  • Kill backup mid-run — verify cleanup resumes VM and unmounts NFS
  • Backup with -q flag and qemu-guest-agent installed — verify fsfreeze/thaw in logs
  • Backup with -q flag without qemu-guest-agent — verify graceful fallback
  • Delete backup — verify cleanup trap handles unmount
  • Backup a stopped VM — verify space check and error handling on disk convert

…ror handling

- Add BACKUP_TIMEOUT (default 6h) to prevent indefinitely stuck backup jobs;
  aborts via domjobabort when exceeded
- Add EXIT trap with cleanup() that resumes paused VMs, removes temp dirs,
  and unmounts NFS on any exit (error, signal, or normal)
- Add check_free_space() pre-flight check (default 1 GB minimum)
- Add -q/--quiesce flag for optional fsfreeze/thaw via qemu-guest-agent
- Use set -eo pipefail for stricter error handling
- Fix mount_operation: proper if/then instead of broken $? after pipe
- Quote all variable expansions to prevent word splitting
- Remove manual umount/rmdir from functions (handled by trap)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jmsperu jmsperu force-pushed the fix/nasbackup-timeout-spacecheck-trap branch from 1728663 to 937e646 Compare March 17, 2026 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant