Open
21 commits
2d11185
fix: upgrade linux dockerfile to debian 12
todthomson May 7, 2026
c0540dd
fix: upgrade apt deps libicu72 and libssl3
todthomson May 7, 2026
1f08e49
fix: add apt-utils to deps to avoid warning => debconf: delaying pack…
todthomson May 7, 2026
d39a5b1
tidy: remove explicit install of things already installed by default
todthomson May 7, 2026
0436965
fix: suppress stderr warning of dpkg interactive config
todthomson May 9, 2026
9313b9f
revert: doesn't help
todthomson May 11, 2026
3b4c696
test: add E2E smoke test for Linux Tentacle Docker image
todthomson May 12, 2026
f732c6d
+semver: major
todthomson May 12, 2026
61cf3f6
include `-y --no-install-recommends` even though it likely doesn't ma…
todthomson May 12, 2026
4754e74
fix: drop undeclared python3 dep from smoke test
todthomson May 12, 2026
0ac4f49
fix: use mktemp for transient compose override
todthomson May 12, 2026
1573bfe
fix: use mktemp for .env backup path
todthomson May 12, 2026
a60a983
fix: put X's at end of mktemp templates for BSD portability
todthomson May 12, 2026
f01217f
fix: use grep -qF for literal "Configuration successful." match
todthomson May 20, 2026
15a08f5
docs: clarify that API_KEY is the sibling repo's dev-only sentinel
todthomson May 20, 2026
2895556
fix: hard-fail when Debian 12 is missing from AdHocScript output
todthomson May 20, 2026
a9ac166
fix: use mktemp for upsert_env_var temp file
todthomson May 20, 2026
6000ef2
feat: allow OCTOPUS_LICENSE_BASE64 to bypass 1Password lookup
todthomson May 20, 2026
44922e9
fix: look up worker by per-run TargetName instead of "highest Workers-N"
todthomson May 20, 2026
3e392f0
fix: deregister worker in teardown to keep test idempotent
todthomson May 20, 2026
dca6d6a
cleanup: combine two apt-get steps into one
todthomson May 20, 2026
27 changes: 7 additions & 20 deletions docker/linux/Dockerfile
@@ -1,31 +1,18 @@
FROM debian:11-slim
FROM debian:12-slim
ENV ASPNETCORE_URLS=http://+:80 DOTNET_RUNNING_IN_CONTAINER=true
RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates \
libc6 \
libgcc1 \
libgssapi-krb5-2 \
libicu67 \
libssl1.1 \
libstdc++6 \
zlib1g && \
libicu72 \
libssl3 \
xxd && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

ARG BUILD_NUMBER
ARG BUILD_DATE

RUN apt-get update && \
apt-get install -y \
curl \
dos2unix \
jq \
sudo \
xxd \
&& \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

EXPOSE 10933

WORKDIR /tmp
@@ -43,11 +30,11 @@ RUN /install-scripts/install-docker.sh
# Install Tentacle
COPY _artifacts/deb/tentacle_${BUILD_NUMBER}_amd64.deb /tmp/
RUN apt-get update && \
apt install ./tentacle_${BUILD_NUMBER}_amd64.deb && \
apt-get install -y --no-install-recommends ./tentacle_${BUILD_NUMBER}_amd64.deb && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
ln -s /opt/octopus/tentacle/Tentacle /usr/bin/tentacle

WORKDIR /

# We know this won't reduce the image size at all. It's just to make the filesystem a little tidier.
287 changes: 287 additions & 0 deletions scripts/smoke-test-linux-tentacle.sh
@@ -0,0 +1,287 @@
#!/usr/bin/env bash
#
# End-to-end smoke test for the Linux Tentacle Docker image (EFT-3311).
#
# Builds the image from the .deb in _artifacts/deb, brings up a local Octopus
# Server in the sibling OctopusDeploy repo, registers the Tentacle as a worker,
# runs a hello-world AdHocScript on it, and asserts success.
#
# Required tools: docker, curl, jq.
# Required state: a built .deb in ../_artifacts/deb/tentacle_*_amd64.deb and the
# OctopusDeploy repo checked out alongside OctopusTentacle.
#
# License source: set $OCTOPUS_LICENSE_BASE64 to a base64-encoded Octopus license
# to skip the 1Password lookup (this is the path CI runners should use). When
# the env var is unset, the script falls back to `op read` against 1Password
# for local-dev use, in which case `op` must be installed and signed in.
#
# Note on $API_KEY below: "API-APIKEY01" is the well-known dev sentinel API key
# provisioned by the sibling OctopusDeploy repo's docker-compose stack for its
# local-only Server instance. It is not a real secret and is safe to commit.

set -euo pipefail

TENTACLE_REPO="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
SERVER_REPO="${SERVER_REPO:-$(cd "$TENTACLE_REPO/../OctopusDeploy" && pwd)}"
ENV_FILE="$SERVER_REPO/.env"
# .env backup path is assigned via mktemp in Step 2 (after we know $ENV_FILE
# exists). Using a unique per-run path avoids clobbering a stale backup from
# a previously-crashed run.
ENV_BACKUP=""
# Transient compose override: disables Docker-in-Docker on the Tentacle. The
# default tentacle entrypoint launches a dockerd daemon, which requires the
# container to run with `--privileged`; without that the daemon fails and its
# wrapper script kills the Tentacle agent. Setting DISABLE_DIND=Y skips it.
# Created via mktemp in Step 4 so we never clobber an unrelated file the user
# may already have in the sibling repo.
OVERRIDE_COMPOSE=""

API="http://localhost:8065/api"
API_KEY="API-APIKEY01"
H="X-Octopus-ApiKey: $API_KEY"
IMAGE_TAG="smoke-debian12"
ONEPASSWORD_LICENSE_REF="op://software licencing/octopus deploy ultimate license key base64/value"

# Per-run worker name. Tagging the worker with a unique name (rather than
# relying on the container hostname / a "highest Workers-N" heuristic) keeps
# the test idempotent across reused Server DB volumes and lets teardown find
# the exact worker this run registered.
WORKER_TARGET_NAME="smoke-tentacle-$(date +%Y%m%d-%H%M%S)-$$"
# Populated in Step 5 once the Server confirms registration; used by teardown
# to deregister the worker via DELETE so the workers list doesn't grow
# monotonically across runs that share a Server DB volume.
WORKER_ID=""

log() { printf '\033[1;34m[smoke]\033[0m %s\n' "$*"; }
warn() { printf '\033[1;33m[smoke]\033[0m %s\n' "$*" >&2; }
die() { printf '\033[1;31m[smoke]\033[0m %s\n' "$*" >&2; exit 1; }

require() { command -v "$1" >/dev/null || die "Missing required tool: $1"; }
require docker
require curl
require jq
# `op` is only required when OCTOPUS_LICENSE_BASE64 is not pre-set (local-dev path).
[[ -n "${OCTOPUS_LICENSE_BASE64:-}" ]] || require op

teardown() {
local exit_code=$?
log "--- teardown ---"
# Deregister the worker first, while the Server is still up. Best-effort:
# if the Server is already dead or the worker never registered, we just
# move on — the goal is to keep the workers list clean across runs.
if [[ -n "$WORKER_ID" ]]; then
log "Deregistering worker $WORKER_ID"
curl -fsS -X DELETE -H "$H" "$API/workers/$WORKER_ID" >/dev/null 2>&1 || true
fi
if [[ -n "$OVERRIDE_COMPOSE" && -f "$OVERRIDE_COMPOSE" ]]; then
(cd "$SERVER_REPO" && docker compose -f docker-compose.yml -f "$OVERRIDE_COMPOSE" --profile tentacle down 2>/dev/null) || true
rm -f "$OVERRIDE_COMPOSE"
fi
(cd "$SERVER_REPO" && docker compose down 2>/dev/null) || true
if [[ -n "$ENV_BACKUP" && -f "$ENV_BACKUP" ]]; then
mv "$ENV_BACKUP" "$ENV_FILE"
log "Restored $ENV_FILE"
fi
exit "$exit_code"
}
trap teardown EXIT

###############################################################################
# Step 1: Build the Linux Tentacle image from the local .deb
###############################################################################
log "--- Step 1: build Tentacle image ---"
cd "$TENTACLE_REPO"

shopt -s nullglob
DEBS=(_artifacts/deb/tentacle_*_amd64.deb)
shopt -u nullglob
[[ ${#DEBS[@]} -ge 1 ]] || die "No .deb found in _artifacts/deb/. Build it first."
[[ ${#DEBS[@]} -eq 1 ]] || die "Multiple .debs in _artifacts/deb/; expected one: ${DEBS[*]}"
DEB_FILE="${DEBS[0]}"
DEB_BASENAME="$(basename "$DEB_FILE")"
BUILD_NUMBER="${DEB_BASENAME#tentacle_}"
BUILD_NUMBER="${BUILD_NUMBER%_amd64.deb}"
export BUILD_NUMBER
export BUILD_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

log "BUILD_NUMBER=$BUILD_NUMBER"
# Use `docker build` directly rather than `docker compose -f docker-compose.build.yml`
# because that compose file also defines kubernetes/windows tentacle services which
# require extra env vars (BUILD_ARCH, BUILD_VARIANT) we don't care about here.
DST_IMAGE="octopusdeploy/tentacle:${IMAGE_TAG}"
docker build \
--platform linux/amd64 \
--build-arg BUILD_NUMBER="$BUILD_NUMBER" \
--build-arg BUILD_DATE="$BUILD_DATE" \
-f docker/linux/Dockerfile \
-t "$DST_IMAGE" \
.
log "Built $DST_IMAGE"

###############################################################################
# Step 2: Resolve license & patch .env
###############################################################################
log "--- Step 2: resolve license and patch .env ---"
[[ -f "$ENV_FILE" ]] || die "Expected $ENV_FILE to exist."

if [[ -n "${OCTOPUS_LICENSE_BASE64:-}" ]]; then
LICENSE_BASE64="$OCTOPUS_LICENSE_BASE64"
log "Using license from \$OCTOPUS_LICENSE_BASE64 (${#LICENSE_BASE64} bytes)"
else
if ! op account list >/dev/null 2>&1; then
die "1Password CLI is not signed in. Run: eval \$(op signin) — or pre-set \$OCTOPUS_LICENSE_BASE64."
fi
LICENSE_BASE64="$(op read "$ONEPASSWORD_LICENSE_REF" 2>/dev/null || true)"
[[ -n "$LICENSE_BASE64" ]] || die "Could not read license from 1Password at: $ONEPASSWORD_LICENSE_REF"
log "Fetched license from 1Password (${#LICENSE_BASE64} bytes)"
fi

ENV_BACKUP="$(mktemp "${TMPDIR:-/tmp}/octopus-server-env-smoke-tentacle-XXXXXX")"
cp "$ENV_FILE" "$ENV_BACKUP"
log "Backed up .env to $ENV_BACKUP (will be restored on exit)"
Comment on lines +139 to +141

🟡 The .env backup pattern at scripts/smoke-test-linux-tentacle.sh:108-109 has a small race window: mktemp creates an empty file and assigns ENV_BACKUP before cp populates it. If the script is interrupted (Ctrl+C, SIGTERM) between those two statements, or if cp itself fails under set -e, the EXIT trap fires with ENV_BACKUP non-empty and the empty file present — its guard [[ -n "$ENV_BACKUP" && -f "$ENV_BACKUP" ]] passes and mv "$ENV_BACKUP" "$ENV_FILE" clobbers the real .env with an empty file. This is the same data-loss scenario commit 439be89 was meant to prevent. Trivial fix: only assign ENV_BACKUP after cp succeeds (copy to a local temp path first, then assign).

Extended reasoning...

The race window

Lines 108-109 of scripts/smoke-test-linux-tentacle.sh:

ENV_BACKUP="$(mktemp "${TMPDIR:-/tmp}/octopus-server-env-smoke-tentacle-XXXXXX")"
cp "$ENV_FILE" "$ENV_BACKUP"

mktemp does two things in one shot: it creates an empty file on disk and prints its path. After bash finishes the command substitution, ENV_BACKUP is set and the file exists — but it is still empty. The cp that actually populates it is a separate statement.

The EXIT trap installed at line 60 runs teardown on any exit path, including signals. Its guard at lines 54-55 is:

if [[ -n "$ENV_BACKUP" && -f "$ENV_BACKUP" ]]; then
  mv "$ENV_BACKUP" "$ENV_FILE"

That guard cannot distinguish "backup we already wrote" from "empty file mktemp just made".

Step-by-step proof

  1. Bash evaluates line 108. The subshell runs mktemp, which creates /tmp/octopus-server-env-smoke-tentacle-abc123 (empty, 0 bytes) and prints the path.
  2. Bash assigns that path to ENV_BACKUP. Variable is now non-empty, file exists, file is empty.
  3. Trigger A (signal): the user hits Ctrl+C before bash starts line 109. Bash runs the EXIT trap when the signal terminates it; set -e is not involved on this path.
  4. Trigger B — cp failure under set -e: Disk full, source unreadable mid-read, or any other cp error causes set -e to exit. EXIT trap fires.
  5. teardown runs. Guard passes (path non-empty, file exists). mv "$ENV_BACKUP" "$ENV_FILE" moves the empty file over the user's real .env. Data destroyed.
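The two-statement window can be reproduced without touching the sibling repo. The sketch below is illustrative only: `reproduce_clobber`, the throwaway directory, and the missing source file used to force `cp` to fail are assumptions made for the demo, and an explicit `|| exit 1` stands in for `set -e` so the behavior does not depend on the caller's errexit context.

```bash
#!/usr/bin/env bash
# Reproduces "Trigger B": mktemp has created the (empty) backup, the copy
# fails, and the EXIT trap moves the empty file over .env.
# reproduce_clobber and the throwaway directory are illustrative only.

reproduce_clobber() {
  local workdir="$1"
  # Subshell so the exit and the EXIT trap stay contained in the demo.
  (
    ENV_FILE="$workdir/.env"
    ENV_BACKUP=""
    teardown() {
      # Same guard as the script: it checks existence, not completeness.
      if [[ -n "$ENV_BACKUP" && -f "$ENV_BACKUP" ]]; then
        mv "$ENV_BACKUP" "$ENV_FILE"
      fi
    }
    trap teardown EXIT
    ENV_BACKUP="$(mktemp "$workdir/env-backup-XXXXXX")"  # 0-byte file exists now
    # Copying a missing file stands in for a cp failure such as ENOSPC;
    # `|| exit 1` does explicitly what set -e does in the real script.
    cp "$workdir/missing-source" "$ENV_BACKUP" 2>/dev/null || exit 1
  )
}

demo_dir="$(mktemp -d)"
printf 'TENTACLE_TAG=latest\n' > "$demo_dir/.env"
reproduce_clobber "$demo_dir" || true

if [[ -s "$demo_dir/.env" ]]; then
  echo ".env survived"
else
  echo ".env was replaced by the empty mktemp file"
fi
```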

Why existing code doesn't prevent this

ENV_BACKUP is initialized empty at line 21 specifically so the trap can tell "backup not yet made" from "backup made". That intent is correct but the protection is defeated by mktemp because the assignment and the population happen in two statements. The guard checks the wrong thing — file existence rather than file validity.

This is exactly the failure mode commit 439be89 (and the resolved Copilot comment 3223737974) set out to eliminate. The mktemp pattern protected against stale backups from previous runs, but it introduced a smaller race against the current run's own interrupt or cp failure.

Impact

The target file .env contains the Ultimate license fetched from 1Password. Loss is recoverable (op read again), so this is not catastrophic, and the timing window for a pure signal race is microseconds. The more realistic trigger is a cp failure under set -e, which looks unlikely on paper but is achievable (e.g. ENOSPC on $TMPDIR).

Suggested fix

Assign ENV_BACKUP only after cp succeeds, so the trap's "is the path set?" check is meaningful:

_tmp="$(mktemp "${TMPDIR:-/tmp}/octopus-server-env-smoke-tentacle-XXXXXX")"
cp "$ENV_FILE" "$_tmp"
ENV_BACKUP="$_tmp"
log "Backed up .env to $ENV_BACKUP (will be restored on exit)"

Alternatively, gate the restore in teardown on a BACKUP_READY=1 flag set after cp, or check that the backup is non-empty ([[ -s "$ENV_BACKUP" ]]) before restoring.
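A minimal sketch of the BACKUP_READY alternative, assuming the backup and restore logic is pulled into helper functions. `make_backup`, `restore_backup` and `BACKUP_READY` are hypothetical names, not code from this PR; the demo forces a failed copy by backing up a file that does not exist, then exercises the normal backup, patch and restore path.

```bash
#!/usr/bin/env bash
# Sketch: teardown restores the backup only after cp has fully succeeded.
# BACKUP_READY, make_backup and restore_backup are hypothetical names.

ENV_BACKUP=""
BACKUP_READY=0

make_backup() {
  local env_file="$1"
  ENV_BACKUP="$(mktemp "${TMPDIR:-/tmp}/octopus-server-env-smoke-tentacle-XXXXXX")"
  # Explicit `|| return 1` so the flag is never set after a failed copy,
  # whether or not set -e is in effect for the caller.
  cp "$env_file" "$ENV_BACKUP" || return 1
  BACKUP_READY=1
}

restore_backup() {
  local env_file="$1"
  if [[ "$BACKUP_READY" == 1 && -f "$ENV_BACKUP" ]]; then
    mv "$ENV_BACKUP" "$env_file"
  elif [[ -n "$ENV_BACKUP" ]]; then
    # Half-made backup: discard it and leave the real file untouched.
    rm -f "$ENV_BACKUP"
  fi
}

demo_dir="$(mktemp -d)"
printf 'TENTACLE_TAG=latest\n' > "$demo_dir/.env"

# Failure path: backing up a missing file makes cp fail, so .env is left alone.
make_backup "$demo_dir/missing.env" 2>/dev/null || echo "backup failed (expected in this demo)"
restore_backup "$demo_dir/.env"
echo "after failed backup: $(cat "$demo_dir/.env")"

# Success path: back up, patch .env, then restore the original contents.
ENV_BACKUP="" BACKUP_READY=0
make_backup "$demo_dir/.env"
printf 'TENTACLE_TAG=smoke-debian12\n' > "$demo_dir/.env"
restore_backup "$demo_dir/.env"
echo "after restore: $(cat "$demo_dir/.env")"
```

A flag is used here rather than `[[ -s "$ENV_BACKUP" ]]` because a size check cannot distinguish a truncated copy (cp failing partway through) from a complete one, and it would refuse to restore a .env that was legitimately empty; the flag only records that cp returned 0.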

Severity

Nit — developer-only smoke test, window is small, license is recoverable from 1Password. But the bug is real and undermines the safety guarantee the change it lives in was meant to add, so it's worth a line.

🔬 also observed by copilot-pull-request-reviewer


upsert_env_var() {
# Pure-bash: avoids sed/awk escape headaches with a base64 value (which
# contains '/' and '=' but not '\' or '&'). Matches the line by literal
# "KEY=" prefix, not regex, so unusual keys won't bite us.
local key="$1" value="$2"
local tmp line found=
tmp="$(mktemp "${TMPDIR:-/tmp}/octopus-server-env-smoke-upsert-XXXXXX")"
while IFS= read -r line || [[ -n "$line" ]]; do
if [[ "$line" == "${key}="* ]]; then
printf '%s=%s\n' "$key" "$value" >> "$tmp"
found=1
else
printf '%s\n' "$line" >> "$tmp"
fi
done < "$ENV_FILE"
[[ -z "$found" ]] && printf '%s=%s\n' "$key" "$value" >> "$tmp"
mv "$tmp" "$ENV_FILE"
}

upsert_env_var TENTACLE_TAG "$IMAGE_TAG"
upsert_env_var OCTOPUS_SERVER_BASE64_LICENSE "$LICENSE_BASE64"

###############################################################################
# Step 3: Bring up Octopus Server and wait for /api to respond
###############################################################################
log "--- Step 3: start octopus-server ---"
cd "$SERVER_REPO"
docker compose up -d octopus-server

log "Waiting for $API/octopusservernodes/ping ..."
for i in {1..120}; do
if curl -fsS -H "$H" "$API/octopusservernodes/ping" >/dev/null 2>&1; then
log "Server is up after ${i}s"
break
fi
[[ $i -eq 120 ]] && die "Server did not become ready in 120s"
sleep 1
done

###############################################################################
# Step 4: Bring up the Tentacle (Worker, polling mode, DIND disabled)
###############################################################################
log "--- Step 4: start tentacle ---"
OVERRIDE_COMPOSE="$(mktemp "${TMPDIR:-/tmp}/docker-compose-smoke-tentacle-XXXXXX")"
cat > "$OVERRIDE_COMPOSE" <<YAML
services:
  tentacle:
    environment:
      DISABLE_DIND: "Y"
      TargetName: "$WORKER_TARGET_NAME"
YAML

COMPOSE=(docker compose -f docker-compose.yml -f "$OVERRIDE_COMPOSE" --profile tentacle)

# --no-deps because octopus-server may lack a healthcheck; we already polled
# its API ping above and know it's ready.
"${COMPOSE[@]}" up -d --no-deps tentacle

log "Waiting for Tentacle 'Configuration successful.' in logs ..."
for i in {1..60}; do
if "${COMPOSE[@]}" logs --no-color tentacle 2>/dev/null | grep -qF "Configuration successful."; then
log "Tentacle registered after ${i}s"
break
fi
[[ $i -eq 60 ]] && die "Tentacle did not register in 60s. Logs:
$("${COMPOSE[@]}" logs --no-color --tail=80 tentacle)"
sleep 1
done

# Make sure the agent is still running (the wrapper script can exit shortly
# after registration if a sidecar like dockerd dies).
if ! "${COMPOSE[@]}" ps --status running --services 2>/dev/null | grep -qx tentacle; then
die "Tentacle container exited shortly after registration. Logs:
$("${COMPOSE[@]}" logs --no-color --tail=80 tentacle)"
fi

###############################################################################
# Step 5: Verify worker is registered & run hello-world AdHocScript
###############################################################################
log "--- Step 5: verify registration and run hello-world ---"

# Find the worker we just registered by its per-run TargetName. This is
# robust against reused Server DB volumes (where workers list grows across
# runs) and avoids the previous "highest Workers-N" heuristic.
for i in {1..60}; do
WORKERS_JSON="$(curl -fsS -H "$H" --data-urlencode "name=$WORKER_TARGET_NAME" -G "$API/workers" 2>/dev/null || echo '{"Items":[]}')"
WORKER_ID="$(echo "$WORKERS_JSON" \
| jq -r --arg name "$WORKER_TARGET_NAME" '.Items[] | select(.Name == $name) | .Id' \
| head -n1)"
[[ -n "$WORKER_ID" ]] && break
sleep 1
done
if [[ -z "$WORKER_ID" ]]; then
warn "No worker named '$WORKER_TARGET_NAME' appeared. Diagnostic dump of $API/workers:"
curl -fsS -H "$H" "$API/workers" || true
warn "Tentacle container logs (tail 80):"
docker compose --profile tentacle logs --no-color --tail=80 tentacle || true
die "Worker '$WORKER_TARGET_NAME' did not appear after 60s"
fi
log "Registered worker: $WORKER_ID (name='$WORKER_TARGET_NAME')"

ADHOC_BODY="$(jq -nc \
--arg id "$WORKER_ID" \
'{
Name: "AdHocScript",
Description: "EFT-3311 Debian 12 smoke test",
Arguments: {
ScriptBody: "echo Hello from $(hostname); cat /etc/os-release | head -2",
Syntax: "Bash",
WorkerIds: [$id]
}
}')"

TASK_RESP="$(curl -fsS -X POST -H "$H" -H "Content-Type: application/json" \
"$API/tasks" -d "$ADHOC_BODY")"
TASK_ID="$(echo "$TASK_RESP" | jq -r '.Id')"
[[ -n "$TASK_ID" && "$TASK_ID" != "null" ]] || die "Could not submit AdHocScript task. Response: $TASK_RESP"
log "Submitted task: $TASK_ID"

STATE=""
for i in {1..120}; do
STATE="$(curl -fsS -H "$H" "$API/tasks/$TASK_ID" | jq -r '.State')"
echo " task=$TASK_ID state=$STATE"
case "$STATE" in
Success|Failed|Canceled|TimedOut) break ;;
esac
sleep 2
done
Comment on lines +263 to +270

🟡 The task polling loop at line 234 (STATE="$(curl -fsS … | jq -r '.State')") runs under set -euo pipefail, so a single transient curl failure inside the 120-iteration loop will exit the script mid-poll without printing the structured task=X state=Y diagnostic for the failing iteration. Consider mirroring the defensive pattern already used in the worker-discovery loop at line 198 (e.g. … || echo Unknown) so polling is resilient to brief hiccups and produces consistent per-iteration output. Nit — smoke-test tooling against a local server, and curl -fsS will still emit its own stderr message, so this isn't a correctness issue.

Extended reasoning...

What

In scripts/smoke-test-linux-tentacle.sh, the task-state polling loop at lines 233–240:

for i in {1..120}; do
  STATE="$(curl -fsS -H "$H" "$API/tasks/$TASK_ID" | jq -r '.State')"
  echo "  task=$TASK_ID state=$STATE"
  case "$STATE" in
    Success|Failed|Canceled|TimedOut) break ;;
  esac
  sleep 2
done

runs under set -euo pipefail (line 13). The command substitution wraps a curl-piped-to-jq pipeline. If curl exits non-zero (HTTP 4xx/5xx with -f, transient connection error, container restart, brief 5xx during task execution), pipefail propagates the non-zero status through the pipeline, the command substitution carries it to the assignment, and set -e exits the script immediately — before the echo " task=… state=…" line for that iteration runs.
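The abort path can be observed without a server. In this sketch a hypothetical `fetch_task` function that returns 22 (curl's exit code for an HTTP error under `-f`) stands in for curl, and `cat` stands in for jq. The snippet is written to a temp file and run with a fresh `bash`, so its errexit behavior is not affected by the context of the caller.

```bash
#!/usr/bin/env bash
# Shows that, under `set -euo pipefail`, a failing stage in a pipeline inside
# a command substitution fails the assignment, and set -e exits before the
# next line runs. fetch_task is a hypothetical stand-in for curl.

demo="$(mktemp)"
cat > "$demo" <<'EOF'
set -euo pipefail
fetch_task() { echo "curl: (22) simulated HTTP 502" >&2; return 22; }
STATE="$(fetch_task | cat)"             # pipefail: pipeline status is 22
echo "task=ServerTasks-1 state=$STATE"  # never printed
EOF

status=0
output="$(bash "$demo" 2>/dev/null)" || status=$?
rm -f "$demo"

echo "exit status: $status"
echo "stdout: '$output'"
```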

Step-by-step proof

  1. Iteration 47 begins; curl is invoked against $API/tasks/$TASK_ID.
  2. Octopus Server briefly returns HTTP 502 (or the TCP connection is reset, or the container is mid-restart).
  3. curl -fsS exits with non-zero (e.g. 22 for HTTP errors, 7 for connection refused).
  4. jq receives empty input, prints nothing and exits 0; with pipefail, the pipeline's exit status is that of the last command in it to exit non-zero, here curl's.
  5. The command substitution propagates that status to the STATE=… assignment.
  6. set -e fires; the script exits.
  7. The EXIT trap runs teardown() — which prints "--- teardown ---" and tears down compose.
  8. The user sees the curl stderr line (e.g. curl: (22) The requested URL returned error: 502) and the teardown trace, but no task=$TASK_ID state=… line for iteration 47, and no indication that the script was specifically in the task-polling loop when it died. They have to scroll up and reason about where in the script the abort happened.

Why existing code doesn't prevent it

The worker-discovery polling loop a few lines above (line 198) handles exactly this case defensively:

WORKERS_JSON="$(curl -fsS -H "$H" "$API/workers?take=1000" 2>/dev/null || echo '{"Items":[]}')"

The || echo '{"Items":[]}' absorbs transient curl failures and feeds valid JSON to jq, so the loop keeps polling. The task-polling loop has no such guard.

The same pattern (curl inside a pipeline whose failure can abort) also appears at line 251 (if ! curl … | grep -q …). Strictly that one won't abort the script — set -e is suppressed inside an if condition — but a curl failure there gets mis-attributed as "log doesn't mention Debian 12" via the warn, which is a separate UX paper-cut.
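One way to keep the two failure modes apart is to fetch the log into a variable first and only then match it. This is a sketch, not the PR's code: `check_log` and the `fetch_*` functions are hypothetical stand-ins for `curl -fsS "$API/tasks/$TASK_ID/raw"`, and the 0/1/2 return convention is an assumption. Fetching first also sidesteps a known `pipefail` interaction: once `grep -q` exits on its first match, a writer that is still producing output can exit non-zero with a broken pipe and fail the pipeline even though the match succeeded.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for `curl -fsS "$API/tasks/$TASK_ID/raw"`.
fetch_ok()     { printf 'Hello from 3f2a9c\nPRETTY_NAME="Debian GNU/Linux 12 (bookworm)"\n'; }
fetch_old()    { printf 'Hello from 3f2a9c\nPRETTY_NAME="Debian GNU/Linux 11 (bullseye)"\n'; }
fetch_broken() { echo "curl: (7) Failed to connect" >&2; return 7; }

# check_log FETCHER NEEDLE (hypothetical helper)
# Returns 0 if the fetched log contains NEEDLE, 1 if it does not,
# and 2 if the log could not be fetched at all.
check_log() {
  local fetcher="$1" needle="$2" log
  if ! log="$("$fetcher")"; then
    return 2
  fi
  grep -qF -- "$needle" <<<"$log"
}

for fetcher in fetch_ok fetch_old fetch_broken; do
  rc=0
  check_log "$fetcher" 'Debian GNU/Linux 12' 2>/dev/null || rc=$?
  case "$rc" in
    0) echo "$fetcher: log mentions Debian 12" ;;
    1) echo "$fetcher: log fetched but does not mention Debian 12" ;;
    *) echo "$fetcher: could not fetch the task log" ;;
  esac
done
```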

Addressing the counter-argument

A reasonable counter-argument is that curl -fsS uses the capital -S flag (show errors), so the user does see a curl stderr line like curl: (7) Failed to connect to … — i.e. the failure isn't truly silent. That's correct, and it's the main reason this is a nit, not a normal bug. The remaining gap is diagnostic quality: an operator running the smoke test sees a curl error plus an abrupt teardown trace, but loses the structured task=$TASK_ID state=… line that the loop is designed to emit each iteration, and there's no die message tying the failure back to "task polling" specifically. For a 2-minute polling loop against a server that may legitimately be doing other work, one transient 5xx terminating the whole smoke run is also a slightly steep response.

It's also reasonable to argue that the two polling loops have different semantics (the worker may legitimately not exist yet; the task definitely does), so failing fast on line 234 is intentional. That is a defensible design choice, but if a hiccup is fatal, it would be nicer to die "Task polling failed at iteration $i: curl returned $?" than to let set -e exit from a command substitution with no context.
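If fail-fast is the intended behavior, the `die` variant could look like the sketch below. `fetch_state` is a hypothetical stand-in that fails with curl's "failed to connect" code (7) on the third poll. It is a single function rather than a `curl | jq` pipeline on purpose: with `pipefail`, `$?` would hold the status of the last failing stage, which is not necessarily curl's.

```bash
#!/usr/bin/env bash
set -euo pipefail

die() { printf '[smoke] %s\n' "$*" >&2; exit 1; }

# Hypothetical stand-in for the curl | jq call: reports "Executing" but
# fails with curl's "failed to connect" code (7) on the third poll.
fetch_state() {
  if (( $1 == 3 )); then
    return 7
  fi
  echo Executing
}

poll_until_done() {
  local i STATE
  for i in 1 2 3 4 5; do
    # `|| die` replaces the implicit set -e exit with a message naming the
    # polling step, the iteration and the exit status ($? here is the status
    # of the failed assignment, i.e. of fetch_state).
    STATE="$(fetch_state "$i")" || die "Task polling failed at iteration $i (exit status $?)"
    echo "  state=$STATE"
  done
}

# Run in a subshell so die's `exit 1` ends only the demo; capture both streams.
output="$( (poll_until_done) 2>&1 )" || true
printf '%s\n' "$output"
```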

Suggested fix

One-line change matching line 198's pattern:

STATE="$(curl -fsS -H "$H" "$API/tasks/$TASK_ID" 2>/dev/null | jq -r '.State' 2>/dev/null || echo Unknown)"

This lets the loop keep polling through transient failures; the echo " task=… state=Unknown" line still prints, so the operator can see which iteration hiccuped; and a persistent failure still runs out the 120-iteration window and ends at the existing "Task finished in state 'Unknown' (expected Success)" die.
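The suggested one-liner can be exercised with a fake fetcher. This is a sketch under stated assumptions: `fake_task_state` fails on the first two polls and then reports `Executing` and `Success`, the `sleep 2` between polls is dropped to keep the demo fast, and the task id `ServerTasks-1` is made up.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for `curl -fsS "$API/tasks/$TASK_ID" | jq -r '.State'`:
# fails on polls 1-2 (like a transient 502), then reports Executing, then Success.
fake_task_state() {
  local attempt="$1"
  if (( attempt <= 2 )); then
    return 22
  elif (( attempt == 3 )); then
    echo Executing
  else
    echo Success
  fi
}

# Mirrors the script's loop with the suggested `|| echo Unknown` guard.
# Per-iteration lines go to stderr so stdout carries only the final state.
poll_task() {
  local task_id="$1" max_polls="$2" state="" i
  for (( i = 1; i <= max_polls; i++ )); do
    state="$(fake_task_state "$i" 2>/dev/null || echo Unknown)"
    echo "  task=$task_id state=$state" >&2
    case "$state" in
      Success|Failed|Canceled|TimedOut) break ;;
    esac
  done
  printf '%s\n' "$state"
}

FINAL_STATE="$(poll_task ServerTasks-1 10)"
echo "final state: $FINAL_STATE"
```

If every poll fails, the loop simply exhausts its budget and returns `Unknown`, which the script's existing post-loop check then reports as a non-Success state.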

Severity rationale

Nit. Smoke-test tooling, not production code; the server is local so transient hiccups are uncommon; curl -fsS still emits stderr; and exit code is preserved so test integrity is intact. Purely a defensive-resilience and diagnostic-quality improvement, worth doing because the fix is one line and mirrors a pattern already in the same file.


log "--- Task log ---"
curl -fsS -H "$H" "$API/tasks/$TASK_ID/raw" || true
log "--- end task log ---"

if [[ "$STATE" != "Success" ]]; then
die "Task finished in state '$STATE' (expected Success)"
fi

# Load-bearing assertion: the whole point of this smoke test is to prove the
# Debian 12 base image is what's actually running on the Tentacle, so a missing
# os-release line is a hard failure, not a warning.
if ! curl -fsS -H "$H" "$API/tasks/$TASK_ID/raw" | grep -qF 'Debian GNU/Linux 12'; then
die "Task succeeded but the log does NOT mention 'Debian GNU/Linux 12'. Inspect output above."
fi

log "PASS — Tentacle (Debian 12) registered and executed hello-world."