Skip to content

feat(curtailment): Stop, staggered restore, max-duration enforcement#232

Merged
rongxin-liu merged 33 commits into
mainfrom
feat/issue-231-curtailment-stop-restore
May 21, 2026
Merged

feat(curtailment): Stop, staggered restore, max-duration enforcement#232
rongxin-liu merged 33 commits into
mainfrom
feat/issue-231-curtailment-stop-restore

Conversation

@rongxin-liu
Copy link
Copy Markdown
Contributor

@rongxin-liu rongxin-liu commented May 14, 2026

Background

Curtailment ships in stages. The contract foundation, persistence + preview, and the operator-facing Start + dispatch + reconciler all landed via #192. What's still missing is the path back: an operator-facing way to end an event, and the system's ability to bring miners back online safely. This PR closes that gap and removes the production gate on StartCurtailment.

Summary

StopCurtailment

The operator-facing end-of-event RPC. Validates min_curtailed_duration_sec has elapsed (admin caller can supply force=true to bypass the hysteresis gate), transitions the event to restoring, and in the same transaction flips every non-terminal target to desired_state='active', resets state to pending, and clears the per-phase metadata (retry_count, last_dispatched_at, last_batch_uuid, confirmed_at, last_error). Terminal targets are untouched. The reset gives the restorer a DB-backed, unambiguous work queue that survives a fleetd restart.

min_curtailed_duration_sec only gates active events — pending events haven't curtailed anything yet, so the elapsed-since-started_at check doesn't apply. Idempotent re-Stop: a second call against an already-restoring event echoes the persisted row without touching targets. Terminal events return FailedPrecondition; cross-org event UUIDs return NotFound. These semantics are documented on the StopCurtailment proto RPC declaration so SDK consumers see the idempotent-restoring vs. non-retryable-terminal split without reading the handler.

StopCurtailmentRequest carries event_uuid (tag 1) and an admin-gated force boolean (tag 3). Tag 2 is reserved — both the number and the name restore_batch_size_override — so an old client's varint payload on the deleted slot can never silently alias into force=true. The handler enforces the admin role on force=true before reaching the service, so non-admin callers see PermissionDenied and never trip the bypass.

Restorer

A DB-backed step inside the reconciler tick rather than a separate goroutine, so a restart resumes restore from persisted state without rehydrating in-memory workers. Each tick scans restoring events and:

  1. Confirms any dispatched restore targets via telemetry (baseline_power_w × 0.5 threshold, falling back to a positive-hash signal when baseline is null). A target whose candidate row has vanished (device unpaired or deleted) burns retry budget via recordDispatchFailure instead of pinning the event indefinitely.
  2. Transitions the event terminal — COMPLETED or COMPLETED_WITH_FAILURES based on RestoreFailed count — once every target is terminal.
  3. Claims the next batch when both gates pass:
    • No prior batch in flight: zero restore-phase targets in Dispatched or Drifted.
    • Inter-batch interval elapsed: newest restore-phase last_dispatched_at is null or older than restore_batch_interval_sec.

The claim selects up to effective_batch_size pending targets ordered by device_identifier, dispatches a single batched Uncurtail call covering all of them (sharing one batch_uuid), and per-device commits state transitions based on the kept/skipped split returned by the command service. Bulk dispatch errors and per-device skips route through the same recordDispatchFailure retry path the Curtail dispatch uses, and the per-device audit string comes from a single skippedDeviceReason helper shared with single-device dispatch (Reason > FilterName > generic preflight fallback). The reconciler self-bypass on CurtailmentActiveFilter covers Uncurtail traffic too, so restore batches aren't blocked by the very event they're restoring.

Each tick body is bounded by a 2 × TickInterval deadline. Per-event contexts derive from the same tick budget, so the loop-entry gate and per-event deadline together cap total tick wall-time. A stuck DB during a large restore (ResetCurtailmentTargetsForRestore over thousands of targets, slow ListCandidates on a large fleet) returns an error to the tick instead of hanging the goroutine; the heartbeat upsert uses its own background-derived 5s deadline so liveness reporting stays honest even when the tick body errors out. Telemetry-timeout aging in the restore phase logs a dedicated slog line before recordDispatchFailure so triage can distinguish timed-out-and-wrote from timed-out-DB-error.

Adaptive batch sizing — computed at Start

effective_batch_size = max(restore_batch_size, ceil(0.01 × selected_target_count)) clamped to [10, 100], stamped on the event row at Start time from the selected-target count and read back by Stop, the reconciler's max-duration arm, and the restorer's batch claim. The column is non-null from event creation. StartCurtailmentResponse.event.effective_batch_size echoes the computed value, so SDK consumers see the resolved batch size in the same response as the new event UUID rather than needing a follow-up GetActiveCurtailment poll.

A 5,000-miner event therefore restores in batches of 50; a 10K+ event caps at 100. Operators tune the formula's floor via the restore_batch_size field on StartCurtailmentRequest.

One non-terminal event per org

Migration 000051 adds a partial unique index uq_curtailment_event_one_non_terminal_per_org ON curtailment_event (org_id) WHERE state IN ('pending','active','restoring'). InsertEventWithTargets maps the unique-violation to a typed interfaces.ErrCurtailmentNonTerminalEventExists sentinel, and Service.Start catches it: a concurrent Start that races the selector's per-device exclusion check now fails at insert time with AlreadyExists, surfaced with the existing event's UUID + state. The AlreadyExists DebugMessage embeds that identity as the stable substring (event_uuid=<uuid>, state="<state>") — documented on the StartCurtailment RPC declaration so SDK consumers can recover the existing event without parsing an undocumented format.

This closes the cross-event scope-overlap gap at the DB level. The rule is org-wide rather than device-set-aware (two non-overlapping device sets in the same org are still mutually exclusive), which is sufficient for v1's manual-trigger use case.

Migration 000051 also bounds the operational controls at the DB layer with CHECK constraints on max_duration_seconds (NULL OR 0 < x <= 604800) and restore_batch_interval_sec (0 <= x <= 3600). The migration is safe without CONCURRENTLY / NOT VALID because curtailment_event has zero rows at deploy (the production StartCurtailment gate is removed by the same PR); the file documents this assumption inline for any future re-application against a populated table.

max_duration_seconds enforcement

Reconciler-side enforcement transitions an active event to restoring when the cap elapses since started_at. This closes the prior gap where a target sitting in Dispatched with no positive telemetry stays there indefinitely under drift detection alone — the cap now drags the event into restore regardless. allow_unbounded events and events with nil started_at short-circuit out.

When max_duration_seconds is unset and !allow_unbounded, Service.Start fills the value from orgConfig.MaxDurationDefaultSec and re-checks the absolute upper bound (maxFiniteDurationSeconds = 604800) on the normalized value. A misconfigured org default surfaces as InvalidArgument (org's max_duration_default_sec must be <= 604800) instead of tunnelling past validateStartRequest and tripping the DB CHECK constraint as Internal.

StartCurtailment gate flipped

With Stop, the restorer, and reconciler-side max-duration enforcement all in place, an active event now has both an operator-facing exit (Stop) and a wall-clock safety net. The Handler.startEnabled field, in-handler gate check, and the coupling test that pinned the gate are removed. main.go constructs the handler with the lean NewHandler(curtailmentSvc).

Restore semantics worth flagging

  • Restore does not re-check device_status. The operator who set force_include_maintenance=true owns restore timing. Operators expecting a maintenance window to outlast a curtailment event must size max_duration_seconds, not rely on schedule-coupled timing.
  • Per-device FIFO awareness. Stop does not cancel any already-enqueued Curtail; the Uncurtail queues behind earlier pending or processing work. The restorer accounts for this via the in-flight gate rather than assuming Uncurtail supersedes.
  • BeginCurtailmentRestoration concurrency model. The load-bearing guarantee is the WHERE state IN ('pending','active') row-lock guard combined with the per-org partial unique index added in migration 000051. The store's ErrNoRows re-read path distinguishes "already restoring" (idempotent) from "already terminal" (FailedPrecondition).

Wiring

cmd/fleetd/main.go constructs the handler with the single-argument NewHandler and the wiring comment documents the current surface: Preview, Start, and Stop are wired; the reconciler drives non-terminal events terminal (including max-duration-based forced restore); UpdateCurtailmentEvent, GetActiveCurtailment, and ListCurtailmentEvents return Unimplemented pending the read APIs ticket.

What is intentionally not in this PR

  • UpdateCurtailmentEvent for operator-safe mutation of reason and restore controls — lands with the read/audit work.
  • GetActiveCurtailment and ListCurtailmentEvents — same.
  • Metrics pipeline wiring and the heartbeat staleness alert — blocked on the unresolved metrics-pipeline shape decision.
  • Webhook ingestion and signal adapters.

Test plan

go build ./... and go vet ./... clean. golangci-lint run ./internal/domain/curtailment/... ./internal/handlers/curtailment/... ./cmd/fleetd/... reports 0 issues. go test on the curtailment unit surface is green (DB-dependent integration tests fail on local connection-refused as usual).

Coverage by area:

  • Service.Stop — happy path, idempotent already-restoring, terminal rejection, min_curtailed_duration_sec gate with force=true bypass, EMERGENCY priority does not bypass the gate at Stop time (regression), pending-event short-circuit, store-error propagation.
  • Service.StartTestService_Start_StampsEffectiveBatchSize pins the adaptive formula at Start time (floor 10, restore_batch_size floor, 5K → 50, 10K ceiling at 100). TestService_Start_NonTerminalOverlapReturnsAlreadyExists pins the per-org partial unique index path; TestService_Start_NonTerminalOverlapWithGetActiveErrorReturnsBareAlreadyExists pins the identity-lookup-fallback sub-branch. TestService_Start_RejectsOrgDefaultAboveAbsoluteCeiling pins the post-normalization upper-bound guard on the org-default path. TestService_Start_RejectsAdminRestoreBatchIntervalAboveAbsoluteCeiling pins the absolute ceiling for admin callers (above the non-admin cap but still bounded).
  • ComputeEffectiveBatchSize — small-fleet floor, 5K-fleet → 50, 10K+ ceiling at 100, restore_batch_size floors the formula, negative restore_batch_size handling.
  • StopCurtailment handler — happy path through the full handler → service → store flow, missing session Unauthenticated, malformed UUID InvalidArgument, force admin role gate (PermissionDenied for non-Admin), admin force=true flowing through and bypassing the min-duration gate, ListTargetsByEvent error propagation after a committed transition.
  • StartCurtailment handler — happy path asserts EffectiveBatchSize on the wire response (proves toStartResponse populates the computed value rather than always-zero).
  • Reconciler enforceMaxDuration — elapsed transitions to restoring (no drift dispatch), not-elapsed no-ops, allow_unbounded skips cap, nil started_at no-ops, BeginRestoreTransition error path skips drift this tick.
  • Reconciler observeRestoring — claim dispatches one Uncurtail per batch with shared batch_uuid, in-flight gate blocks claim, interval gate blocks claim, telemetry-confirmed dispatched promotes to Resolved (and triggers terminal transition same tick), all-resolved → COMPLETED, mixed-with-failures → COMPLETED_WITH_FAILURES. TestReconciler_Restoring_MissingCandidateDuringConfirmConsumesRetryBudget pins the nil-candidate guard in the restore phase, mirroring the curtail-phase analog.
  • Reconciler tick budgetTestReconciler_RunTickStopsWhenTickBudgetExpires pins the per-event context bound; later events cancel when the shared tick budget expires.
  • isRestored — power above/at/below baseline × 0.5 thresholds, baseline-nil positive-hash fallback, baseline-nil zero-hash not-restored, no-telemetry not-restored, baseline-zero hash fallback.
  • skippedDeviceReason — priority-order contract (Reason wins, then FilterName, then generic preflight fallback) pinned across all three branches.

Closes #231
Refs #192

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 14, 2026

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (032712636d79888dce558d32ebe7ecc2391b0617...9fded5863cf56eb8513c5443b9a285a25c5d3613, exact PR three-dot diff)
  • Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.


Review Summary

Overall Risk: HIGH

Findings

[HIGH] Pending curtailment dispatches can stall forever and bypass max_duration_seconds

  • Category: Reliability
  • Location: server/internal/domain/curtailment/reconciler/reconciler.go:519
  • Description: After a Curtail command is successfully enqueued, confirmOneDispatched simply returns when telemetry does not prove the miner is curtailed. There is no dispatch-confirmation timeout or retry burn for this path. Since max_duration_seconds is only enforced for active events, an event with a dispatched target that never confirms remains pending indefinitely.
  • Impact: A single stale or broken telemetry stream can leave an org with a permanent non-terminal curtailment event. The new per-org non-terminal unique index then blocks future curtailment events, and already-curtailed miners may remain curtailed beyond the configured max duration.
  • Recommendation: Add a pending-phase dispatch confirmation timeout, similar to restore telemetry timeout, that either retries or marks the target failed. Also consider enforcing max_duration_seconds from event creation/first dispatch so pending events cannot outlive the cap.

[MEDIUM] Non-admin Stop can bypass minimum curtailed duration during pending rollout

  • Category: Authorization | Reliability
  • Location: server/internal/domain/curtailment/service.go:840
  • Description: checkMinCurtailedDurationGate skips the gate unless the event state is active. But pending events can already have dispatched or confirmed targets while waiting for the remaining targets to confirm. A normal non-admin caller can call Stop during that pending window and restore already-curtailed devices without waiting for min_curtailed_duration_sec.
  • Impact: The admin-only force bypass is weakened because operators can avoid the min-duration hysteresis by stopping before the event flips to active. This can cause rapid curtail/restore flapping and defeats the operational safety control.
  • Recommendation: Do not exempt all pending events. Either reject non-force Stop once any target has been dispatched/confirmed, or enforce the gate transactionally using target timestamps/first confirmation. Re-check the state inside BeginRestoreTransition so pending-to-active races cannot bypass the gate.

Notes

No SQL injection, command injection, pool/wallet hijacking, or protobuf wire-compatibility issues stood out in the scoped diff. The main risk is the new curtailment lifecycle becoming stuck or bypassing its own duration controls.


Generated by Codex Security Review |
Triggered by: @rongxin-liu |
Review workflow run

Base automatically changed from feat/issue-191-curtailment-start-dispatch-reconciler to main May 19, 2026 15:34
Adds the operator-facing StopCurtailment write path:

- New sqlc queries BeginCurtailmentRestoration (event-side flip to
  'restoring' + effective_batch_size stamp, state-guarded so already-restoring
  or terminal events fall through to a no-op row count) and
  ResetCurtailmentTargetsForRestore (non-terminal targets flip to
  desired_state='active', state='pending', cleared phase-local cursors).
  Composed inside the SQL store as a single tx via WithTransaction.
- Store.BeginRestoreTransition method on the curtailment interface +
  SQL implementation. Idempotent against re-Stop (returns the current row
  unchanged); FailedPrecondition on a terminal event; NotFound for
  cross-org event UUIDs. A concurrent state-change between the pre-read
  and the UPDATE is handled by a re-read so the caller observes the
  latest persisted state.
- Service.Stop with adaptive batch sizing — max(restore_batch_size,
  ceil(0.01 × non_terminal_target_count)) clamped to [10, 100] — plus
  the min_curtailed_duration_sec gate that applies only to active events
  (pending events haven't curtailed anything yet) with EMERGENCY priority
  bypass. The per-request restore_batch_size_override (admin-gated
  upstream) skips the formula entirely and is bounded at the service
  layer to [1, 200], the BE-4 fleet-class ceiling that the proto bound
  will be tightened to match.
- Handler.StopCurtailment wired: replaces the Unimplemented stub with the
  service call, preserving the existing requireAdminFromContext gate on
  the override field. Session-derived OrgID, no actor change.
- New translate helpers (toStopRequest, toStopResponse, toEventProto, plus
  enum mappers for event_state/mode/strategy/level/priority) that build a
  CurtailmentEvent wire message from the persisted row. Scope and
  mode_params are intentionally left empty for now — reconstructing those
  from JSONB lands with the read APIs.
- Shared models.DesiredStateCurtailed / models.DesiredStateActive
  constants so the SQL literal values have a single Go-side source of
  truth, and fake-store stubs (preview/start service tests + handler
  start test + reconciler test) implementing the new interface method.
Replaces the BE-3 'restoring is a no-op' stub with an end-to-end
restorer arm and a time-based escape hatch on active events:

- observeRestoring runs three phases per tick: (1) confirm any
  dispatched restore targets via telemetry (baseline-relative
  power_w > baseline × 0.5, with a positive-hash fallback when
  baseline is null); (2) transition the event terminal once every
  target is Resolved/Released/RestoreFailed (chooses COMPLETED vs
  COMPLETED_WITH_FAILURES based on failure count, stamps ended_at);
  (3) claim the next batch when both gates pass.

- Restore-batch gating mirrors the BE-4 plan: no prior batch in flight
  (no restore-phase targets in Dispatched or Drifted) AND inter-batch
  interval elapsed (newest restore-phase last_dispatched_at older than
  restore_batch_interval_sec). Both gates fail closed; the in-flight
  gate is the primary inrush protection when interval is zero.

- Batched Uncurtail dispatch: one command call covers up to
  effective_batch_size devices, sharing a single batch_uuid. Per-device
  outcomes (kept vs filtered-skip) drive individual state updates;
  bulk dispatch errors route every claim target through retry budget
  the same way BE-3 Curtail does. Missing candidate row burns budget
  rather than stalling so vanished devices reach RestoreFailed instead
  of hanging the event.

- enforceMaxDuration runs before drift detection on active events:
  when (now - started_at) >= max_duration_seconds, the reconciler
  calls BeginRestoreTransition with the adaptive-formula batch size
  (no override available without an operator request). This closes
  the BE-3 stuck-Dispatched gap at the event boundary — a target with
  no positive telemetry is dragged into restore by the cap, not left
  sitting in Dispatched forever. AllowUnbounded and pending events
  short-circuit.

- isRestored predicate is the symmetric counterpart of isCurtailed
  for the restore-confirm path; it requires positive evidence so a
  flaky sensor cannot trigger a premature Resolved transition.
@rongxin-liu rongxin-liu force-pushed the feat/issue-231-curtailment-stop-restore branch from 3ed88b5 to 77e78da Compare May 19, 2026 15:48
@github-actions github-actions Bot added javascript Pull requests that update javascript code client labels May 19, 2026
…he override ceiling

With Stop, the restorer, and reconciler-side max_duration_seconds
enforcement all in place, an Active event now has both an operator-facing
exit (StopCurtailment) and a wall-clock safety net (max-duration forced
restore). The 'gated until BE-4 ships' rationale no longer applies, so
the Handler.startEnabled field, the in-handler gate check, and the
TestHandler_StartCurtailment_DisabledFlagReturnsUnimplemented coupling
test are all removed. main.go constructs the handler with the lean
single-argument NewHandler(curtailmentSvc) and the wiring comment now
documents what is and isn't reachable in the current surface.

Separately, the proto bound on restore_batch_size_override drops from
the ungrounded [1, 10000] sanity ceiling to [1, 200] — 2x the adaptive
formula's upper band (100). 200 short-circuits inrush staggering for a
single-rack restore but stays narrow enough that an admin typo cannot
silently dump 10K miners at once. Service.Stop already enforces the
same 200 ceiling at the service layer, so the two checks agree.
…ounds

Stop service tests (service_stop_test.go) pin the happy path, the
idempotent already-restoring branch, terminal-event rejection, the
min_curtailed_duration gate and its EMERGENCY bypass, override
precedence over the adaptive formula, and override bound rejection at
both ends of the [1, 200] window. ComputeEffectiveBatchSize is exercised
across the realistic fleet sizes (small → floor, 5000 → 50, 10K+ →
ceiling) so a regression in the inrush envelope shows up immediately.

Stop handler tests (handler_stop_test.go) cover the happy path through
the full handler→service→store flow, missing-session
Unauthenticated, malformed-UUID InvalidArgument, the admin role gate
on restore_batch_size_override, and the admin override flowing through
to the persistence boundary so an admin-driven number doesn't get
silently dropped on the floor.

Reconciler restorer arm tests (restore_test.go) cover the four
enforceMaxDuration branches (elapsed/not-elapsed/allow_unbounded/nil
started_at), the observeRestoring three-phase loop (claim+dispatch,
in-flight gate, interval gate, telemetry confirmation), the
completion paths (all-resolved→COMPLETED, mixed→COMPLETED_WITH_FAILURES),
and the isRestored predicate including the baseline-null hash fallback
that protects the operator from a spurious "restore complete" call when
the power-side telemetry is missing.

A small fix lands alongside: the gosec int→int32 saturation around the
non-terminal-count → ComputeEffectiveBatchSize call site, exhaustive
switch statements in the target-state count loops, and a cap variable
renamed to maxDur to stop shadowing the builtin.
A target Dispatched in the restore phase that never returns positive
telemetry evidence would stay there indefinitely. confirmOneRestore
only consumed retry budget when the candidate row was nil (vanished
device); a present device with nil or missing power and hash samples
never advanced. The in-flight gate on the next batch claim then
blocked all subsequent batches, and enforceMaxDuration does not
close this gap (it only transitions active to restoring).

Add RestoreDispatchTimeoutSec (default 300s = 10x the tick interval).
confirmOneRestore now consumes retry budget when a Dispatched target
has been there longer than the timeout, draining to RestoreFailed at
exhaustion so the event can still progress.

Also picked up during the review pass:

- Move countNonTerminalTargets to models.CountNonTerminalTargets so
  service.Stop and reconciler.enforceMaxDuration share one helper.
- Document BeginCurtailmentRestoration's atomicity model inline
  (UPDATE WHERE state-guard + partial-unique-index; ErrNoRows re-read
  terminal-state risk).
- Document that service.Stop's pre-read terminal/restoring check is a
  fast-path; BeginRestoreTransition is the authoritative atomic guard.
- Document the strict > vs <= asymmetry between isRestored and
  isCurtailed, and the no-progress band at exactly baseline*factor.
- Forward-compat comment on maybeClaimRestoreBatch's unreachable
  Drifted arm; comment on the interval gate loop documenting that
  terminal targets retain LastDispatchedAt as the spacing reference.
- Tighten translate.toStopRequest overflow-check comment.
- Inline single-use desiredStateCurtailed alias.
- Strip internal ticket prefixes from source comments.
- Trim multi-paragraph rationale on adaptive batch-sizing constants
  and ComputeEffectiveBatchSize godoc.
- Fix stale cross-reference comment on
  StartCurtailmentRequest.restore_batch_size; regenerate bindings.
Fills coverage gaps surfaced during the review pass and pins the new
restore-dispatch age-out behavior.

- Restore age-out exhaustion drains the target to RestoreFailed and
  takes the event to COMPLETED_WITH_FAILURES; happy path within the
  timeout stays in Dispatched.
- enforceMaxDuration with a BeginRestoreTransition error leaves the
  event unchanged and still runs drift dispatch in the same tick.
- dispatchRestoreBatch under an Uncurtail error and under an empty
  BatchIdentifier keeps the batch Pending with LastError set and
  RetryCount incremented.
- dispatchRestoreBatch per-device filter-skip in the restore phase
  mirrors the curtail-phase coverage: kept target -> Dispatched,
  skipped target -> Pending with retry consumed.
- Service.Stop returns NotFound for an unknown event UUID, propagates
  ListTargetsByEvent store errors, and the MinDurationGate test now
  pins GRPCCode = FailedPrecondition.
- Restorer-shared batch_uuid is asserted by value (both batched
  targets reference the same LastBatchUUID), not just by comment.
- isRestored table picks up the power-present, baseline-nil, hash-nil
  case so the line-1035 hash-nil guard has direct coverage.
@rongxin-liu rongxin-liu force-pushed the feat/issue-231-curtailment-stop-restore branch from 3db4d3a to ab215a0 Compare May 19, 2026 20:15
Post-review simplification pass surfaced four small wins:

- service_test fakeStore: remove three fields that were declared but
  never read or assigned in any test (beginRestoreResult,
  getEventByUUIDLastOrg, listTargetsLastOrg). The dead capture-state
  was a misleading escape hatch.
- service.Stop: drop the nine-line int->int32 MaxInt32 saturation in
  favour of the same //nolint:gosec one-liner the reconciler's
  enforceMaxDuration already uses. The count is bounded by the
  per-event target rows; the saturation block was dead defensive code.
- maybeCompleteRestoring: add a default arm to the per-target switch
  so an unknown state from a future migration falls through as
  non-terminal rather than silently completing the event prematurely.
- observeActive: drop the block comment narrating the
  enforceMaxDuration call site; the function name and its godoc carry
  the rationale.
…Stop failure

Two related Codex-flagged issues on the BE-4 surface plus one defensive
test, all small and self-contained.

P1: enforceMaxDuration's BeginRestoreTransition error path returned
false, falling through to drift detection. Past the max_duration cap,
that meant new Curtail commands could be issued to miners the event
was supposed to be releasing — extending curtailment past the
contracted ceiling on every transient DB failure. Now returns true on
error so observeActive skips drift; next tick retries the transition.
The corresponding test (was pinning the wrong behavior) is updated to
assert curtailCalls=0 on the failure tick.

P2: BeginRestoreTransition's ErrNoRows re-read returned the latest
event unconditionally, including terminal states. A concurrent
AdminTerminateEvent between the pre-read and the UPDATE caused Stop
to echo a 200 with a terminal event masquerading as restoring. The
re-read now routes by latest.State: terminal → FailedPrecondition,
restoring → idempotent success, anything else → InternalError.

P3: New table-style test for maybeCompleteRestoring's default arm.
The arm is dead code today (all current TargetState values are
explicitly covered), but the test pins the defensive "unknown state
keeps the event non-terminal" contract against future schema
additions.
…ment

The prior query comment claimed concurrency safety relied on "the
partial-unique-index on (org_id, state) for active events." No such
unique index exists; the actual index `idx_curtailment_event_active`
is a plain partial btree. Concurrency safety rests on the
`WHERE state IN ('pending','active')` row-lock guard alone.

Replace the comment with the actual model so a future maintainer
does not trust a fictitious uniqueness guarantee and remove the
state guard.
Adds the server-resolved restore batch size as a field on the wire
CurtailmentEvent and populates it from the persisted event row in
toEventProto. Restore-time clients (Stop response, future read APIs)
can now display the value the server actually drained at — important
when the adaptive [10, 100] envelope or an operator
`restore_batch_size_override` resolves to something other than the
caller's `restore_batch_size` preference.

Also tightens the StopCurtailment RPC docstring to make the
terminal-race contract explicit: idempotent on an already-restoring
event; FailedPrecondition when the event is terminal or a concurrent
transition raced past restoring; treat as non-retryable.
Derive a `2 * TickInterval` per-tick deadline in runTick and pass it
to ListNonTerminalEvents and processEvent. A stuck DB during a large
restore tick (e.g. ResetCurtailmentTargetsForRestore over thousands
of targets) now returns an error to the tick body instead of hanging
the goroutine indefinitely. The heartbeat keeps using its own
background-derived 5s timeout so liveness reporting is preserved
even when the tick errors.

Extend dispatchRestoreBatch's doc to call out the audit-lineage
consequence of the existing restart-safety design: a per-target
UpdateTargetState failure after Uncurtail returns success leaves
the target Pending; next tick redispatches a second Uncurtail.
Uncurtail is device-idempotent via the per-device FIFO, but audit
records may show duplicate batches.
@rongxin-liu rongxin-liu marked this pull request as ready for review May 20, 2026 14:36
Copilot AI review requested due to automatic review settings May 20, 2026 14:36
@rongxin-liu rongxin-liu requested a review from a team as a code owner May 20, 2026 14:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the missing “exit path” for curtailment events by implementing operator Stop, a DB-backed staggered restore loop in the reconciler, adaptive restore batch sizing, and reconciler-side max-duration enforcement; this also removes the production gate that previously prevented StartCurtailment from being enabled.

Changes:

  • Implement StopCurtailment end-to-end (proto docs, handler/service/store, SQL queries) and reset non-terminal targets into a restore work queue.
  • Extend the reconciler to (a) force restore on max_duration_seconds and (b) drive restoring events to terminal via staggered batched Uncurtail.
  • Add adaptive restore batch sizing and surface the server-resolved effective_batch_size on CurtailmentEvent; update generated artifacts and wiring.

Reviewed changes

Copilot reviewed 17 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
server/sqlc/queries/curtailment.sql Adds restoration transition + target reset queries.
server/internal/handlers/curtailment/translate.go Adds Stop request/response translation and event → proto mapping (incl. timestamps and effective batch size).
server/internal/handlers/curtailment/handler.go Removes Start gate; implements StopCurtailment handler with auth + admin override gate.
server/internal/handlers/curtailment/handler_test.go Updates handler constructor usage after gate removal.
server/internal/handlers/curtailment/handler_stop_test.go New Stop handler tests (auth, UUID validation, admin override gating).
server/internal/handlers/curtailment/handler_start_test.go Removes Start gate test; updates handler constructor and stubs.
server/internal/domain/stores/sqlstores/curtailment.go Implements BeginRestoreTransition in SQL store with transactional semantics.
server/internal/domain/stores/interfaces/curtailment.go Extends CurtailmentStore interface with BeginRestoreTransition.
server/internal/domain/curtailment/service.go Implements Service.Stop, min-duration gating, and ComputeEffectiveBatchSize.
server/internal/domain/curtailment/service_test.go Extends fake store to support Stop-path testing.
server/internal/domain/curtailment/service_stop_test.go New unit tests for Stop + batch sizing helper.
server/internal/domain/curtailment/reconciler/restore_test.go New reconciler tests for restoring workflow + max-duration enforcement.
server/internal/domain/curtailment/reconciler/reconciler.go Adds per-tick deadline, max-duration enforcement, restoring observer, and batched Uncurtail dispatch.
server/internal/domain/curtailment/reconciler/reconciler_test.go Updates fakes to support restore/max-duration and Uncurtail result overrides.
server/internal/domain/curtailment/models/models.go Adds desired_state constants and CountNonTerminalTargets.
server/generated/sqlc/db.go Regenerates sqlc prepared statements for new queries.
server/generated/sqlc/curtailment.sql.go Regenerates sqlc bindings for new restoration queries.
server/generated/grpc/curtailment/v1/curtailmentv1connect/curtailment.connect.go Updates generated connect client/server docs for Stop semantics.
server/cmd/fleetd/main.go Removes Start gate wiring; wires handler with new constructor and updates comment.
proto/curtailment/v1/curtailment.proto Documents Stop semantics; adds effective_batch_size; tightens Stop override validation to lte:200.
client/src/protoFleet/api/generated/curtailment/v1/curtailment_pb.ts Regenerates TS proto bindings to reflect proto updates.

Comment thread server/internal/domain/curtailment/models/models.go Outdated
Comment thread server/internal/domain/curtailment/reconciler/reconciler.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 46df1b316a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/internal/domain/curtailment/reconciler/reconciler.go Outdated
Comment thread server/internal/domain/curtailment/reconciler/reconciler.go
Make restore reconciliation recoverable under slow ticks, clear stale target errors correctly, and expose the active event recovery path needed after Start responses are lost.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c9379e3662

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/sqlc/queries/curtailment.sql Outdated
Ensure GetActiveCurtailment returns active or restoring events before newer pending events so clients recover the event currently affecting miners.
Comment thread server/internal/domain/curtailment/service.go Outdated
Comment thread server/sqlc/queries/curtailment.sql
Comment thread server/internal/domain/curtailment/service.go Outdated
Comment thread server/internal/handlers/curtailment/translate.go
Copy link
Copy Markdown
Contributor

@ankitgoswami ankitgoswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments but not 100% sure about the requirements so don't know if they are blocking.

… max_duration check

Two small simplifications from review:

- CountNonTerminalTargets: the explicit non-terminal case arm did the same
  thing as default. Collapse to terminal-skip + default-count. Comment notes
  unknown future states count as non-terminal — the conservative direction.

- Service.Start: the upper-bound check on max_duration_seconds duplicates
  the check already in validateStartRequest. Drop the duplicate. The
  admin-gate against the org default stays — it is intrinsically
  post-normalization (req.MaxDurationSeconds may have been resolved from
  orgConfig.MaxDurationDefaultSec just above).
Adds a partial unique index on curtailment_event(org_id) WHERE state IN
('pending','active','restoring') so a concurrent Start that races the
selector's per-device exclusion check fails at insert time instead of
silently producing two overlapping events.

The unique-violation surfaces as a typed sentinel
ErrCurtailmentNonTerminalEventExists from the sqlstore boundary. Service.Start
catches the sentinel, reads the existing event via GetActiveEvent, and
returns AlreadyExists with the existing event_uuid + state so the caller
can act on the conflict (e.g. join the existing event, surface it in the
UI).

Migration 000051 appends the index alongside the CHECK constraints
already in the file. Index name and constraint name pinned in code:
nonTerminalEventPerOrgUniqueIndex matches the migration's identifier so
only this exact violation is mapped to the sentinel.

Adds NewAlreadyExistsError(f) and IsAlreadyExistsError helpers to
fleeterror to match the existing per-code patterns.
… the stop override

The Stop-time restore_batch_size_override forced a per-Stop admin gate plus a
race window (concurrent Stops with conflicting overrides silently dropped all
but the first). It was also doubly redundant: the per-event restore_batch_size
the operator sets at Start already lets admins tune the adaptive formula.

This commit removes the override entirely and moves effective_batch_size
computation to Start. The column is stamped from the selected target count
when the event row is inserted, so it is non-null from creation. Stop just
flips state; the reconciler's max-duration arm and the restorer read the
already-stamped value.

Proto change: optional uint32 restore_batch_size_override removed from
StopCurtailmentRequest (pre-release per BE-1 policy; deleted, not reserved).
Tag 2 frees up for the admin-gated `force` field in the next commit.

Surface changes:
- proto/curtailment.proto: field removed; bindings regenerated.
- service: StopRequest loses RestoreBatchSizeOverride; Service.Stop drops
  ListTargetsByEvent + ComputeEffectiveBatchSize; validateStopRequest loses
  the override bound check. ComputeEffectiveBatchSize loses its override
  param. Service.Start (via buildInsertParams) stamps effective_batch_size
  on the InsertEventParams from the selected target count.
- sqlc InsertCurtailmentEvent: effective_batch_size joins the INSERT
  column list. BeginCurtailmentRestoration drops the SET clause for it.
- sqlstores: InsertEventWithTargets forwards EffectiveBatchSize; the
  BeginRestoreTransition signature loses the int32 parameter.
- interfaces: BeginRestoreTransition signature updated.
- reconciler.enforceMaxDuration: drops the ComputeEffectiveBatchSize
  recomputation; the event row already carries the value.
- handler.StopCurtailment: drops the override admin-gate block.
- translate.toStopRequest: drops the override translation +
  uint32-overflow safety check.

Tests:
- service_start_test.go: TestService_Start_StampsEffectiveBatchSize pins
  the formula at Start (floor 10, restore_batch_size floor, 5K → 50, 10K
  ceiling at 100). Uses tolerated-undershoot so the FIXED_KW selector
  takes every candidate.
- service_stop_test.go: deleted override-precedence + override-bounds +
  list-targets-error tests (Stop no longer reads targets or accepts an
  override). TestComputeEffectiveBatchSize loses its override case +
  column.
- handler_stop_test.go: deleted OverrideRequiresAdmin +
  PassesAdminOverride.
- handler_test.go: removed stopWithBatchOverride from the role-gate
  matrix (4 cases).
- service_test.go / reconciler/reconciler_test.go: BeginRestoreTransition
  signatures updated; beginRestoreLastBatch capture removed.
…ity bypass

Decouples Start-time priority from Stop-time hysteresis bypass. Priority
is set at event creation and reflects the curtailment urgency at that
moment; reusing it to short-circuit the min_curtailed_duration_sec gate
on Stop conflated two different operator intents. The bypass now lives
where the decision actually happens.

StopCurtailmentRequest gains `bool force = 2` (admin-gated handler-side
via requireAdminFromContext, same pattern as the existing override
fields on Start/Preview). checkMinCurtailedDurationGate takes a force
parameter and bypasses on force=true; the Priority==Emergency shortcut
is gone. Error message updated to direct admins to force=true rather
than EMERGENCY-priority retry.

EMERGENCY priority still skips post_event_cooldown_sec in the selector
at Start time — that bypass is at the correct layer and stays. Only the
Stop-time bypass moves.

Surface changes:
- proto/curtailment.proto: bool force = 2 on StopCurtailmentRequest.
- handler.StopCurtailment: gates force=true via requireAdminFromContext.
- translate.toStopRequest: carries Force from proto to service.
- service.StopRequest: adds Force bool.
- service.checkMinCurtailedDurationGate: signature change.

Tests:
- service_stop_test.go: TestService_Stop_ForceBypassesMinDuration
  replaces the old emergency-priority bypass test. New
  TestService_Stop_EmergencyPriorityNoLongerBypasses pins the regression:
  an EMERGENCY-priority event with unelapsed min_curtailed_duration_sec
  is now blocked unless force=true.
- handler_stop_test.go: TestHandler_StopCurtailment_ForceRequiresAdmin
  (non-admin + Force=true → PermissionDenied) and
  TestHandler_StopCurtailment_AdminForcePassesThrough.
- handler_test.go: Stop force role-gate cases added to the matrix
  (viewer/admin × session/API-key).
Two tail-end fixups missed from 1e5cf13 (the commit that dropped the
int32 parameter from BeginRestoreTransition):

- reconciler/restore_test.go: removed the stale beginRestoreLastBatch
  assertion in TestReconciler_EnforceMaxDuration_ElapsedTransitionsToRestoring;
  effective_batch_size is now stamped at Start and not touched by the
  transition.
- handlers/curtailment/handler_start_test.go: updated the startStubStore
  BeginRestoreTransition panic-stub signature to match the new interface.
- Proto + connect/TS regen: rewrite the effective_batch_size docstring to
  match the actual stamp-at-Start behavior (the column is non-null from
  event creation, no operator override exists). Same revision touches the
  models.InsertEventParams godoc.
- StopCurtailmentRequest: add `reserved "restore_batch_size_override"` so
  the deleted name can never be re-introduced by mistake. The field number
  itself is now `bool force`; reserving the old name documents the
  history without resurrecting the slot.
- StartCurtailment RPC: document the AlreadyExists DebugMessage format as
  a stable substring so SDK consumers can recover the existing event's
  identity without parsing an undocumented contract.
- Migration 000051: note that CREATE INDEX / ADD CONSTRAINT run without
  CONCURRENTLY / NOT VALID is safe only because the table is empty at
  deploy; future re-applications against populated tables should use the
  online forms.
toStartResponse built CurtailmentEvent from request + plan and never set
EffectiveBatchSize, so clients reading the value from Start saw 0 while
the persisted row carried the computed batch size. The mismatch forced
callers into a follow-up GetActiveCurtailment poll.

Stamp the adaptive batch size on the plan in Service.Start (single source
of truth for buildInsertParams and the response builder), thread it
through toStartResponse, and update the stale toEventProto inline comment
that still described the field as nil-until-restoring.
runTick derived each per-event timeout from tickCtx, which already had
its own 2×TickInterval deadline. A slow ListNonTerminalEvents or a slow
first event would silently shrink the budget available to later events.
Derive the per-event context from the parent ctx so each restoring event
gets the full 2×TickInterval window; tickCtx.Err() still gates the loop,
so total tick duration stays bounded.

Also tag the restore telemetry-timeout aging path with a dedicated slog
line so triage can distinguish a timed-out-and-wrote target from one
where the follow-up UpdateTargetState DB write itself failed.
The ORDER BY CASE preferred active/restoring over pending when both
existed, but the partial unique index added in 000051 makes "both
exist" impossible: at most one row per org matches the WHERE state IN
clause. The CASE expression also defeated idx_curtailment_event_active,
forcing a sort over the filtered rows.

Drop the ORDER BY entirely; LIMIT 1 now lets the planner satisfy the
lookup directly via the partial unique index.
…g, and Stop error paths

- Restorer: pin the restore-phase analog of the curtail-phase nil-candidate
  guard. A Dispatched+Active target whose candidate row has vanished
  burns retry budget via recordDispatchFailure instead of pinning the
  event indefinitely.
- Service.Start: pin the absolute restore_batch_interval_sec ceiling
  (3600s) for admin callers. The non-admin cap test already exists; this
  pins what happens when an admin caller exceeds the absolute upper
  bound, preventing a future refactor that drops the validateStartRequest
  check from silently passing through to the DB CHECK constraint.
- Service.Start: pin the AlreadyExists fallback when the identity lookup
  itself fails (insertEvent returns ErrCurtailmentNonTerminalEventExists
  and the subsequent GetActiveEvent errors out). Caller still gets
  AlreadyExists; the identity substring is omitted as documented.
- StopCurtailment handler: pin the ListTargetsByEvent error propagation
  path. The transition has already committed when the targets read
  fails, so the handler returns the error without leaking a partial
  response.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d721bfef7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/internal/domain/curtailment/service.go Outdated
79750ab derived each per-event context from the long-lived parent
ctx (workCtx) so that a slow first event would not eat into later
events' budgets. That objective is real, but it also reintroduced
exactly the failure mode the Codex P1 thread (and the explicit ack at
7c6359f) called out: one tick can run for N × 2×TickInterval when
multiple events are present, violating the documented bounded-tick
contract. TestReconciler_RunTickStopsWhenTickBudgetExpires fails on
HEAD with the parent-derived eventCtx because the second event no
longer cancels when tickCtx expires.

Restore the tickCtx-derived form. The aging-log addition from 79750ab
stays in place.
…re and tighten plan comment

The Start handler's translate.go wire path that populates
EffectiveBatchSize had no end-to-end assertion; a regression that
zeroed plan.EffectiveBatchSize before the translate call would not be
caught. Add the assertion to the happy-path handler test (two
candidates, no caller preference clamp to the floor of 10).

Plan.EffectiveBatchSize's godoc claimed Stop and the reconciler read
the value from Plan. Stop never touches Plan, and the reconciler reads
from models.Event after persistence. Rewrite the comment to say
"echoed in the Start response; Stop and the reconciler read the
persisted event row" so future readers don't chase a nonexistent
data-flow.

Add a clarifying comment to the missing-candidate restore test noting
that the disp.uncurtailCalls==0 assertion is enforced by the interval
gate, not by the missing-candidate path it primarily pins.
… persist

When the caller leaves max_duration_seconds unset and !allow_unbounded,
Service.Start fills it from orgConfig.MaxDurationDefaultSec. The upper
bound (maxFiniteDurationSeconds = 604800) is checked by
validateStartRequest, which ran before the fill and saw nil. A
misconfigured org default above the cap would tunnel into the DB and
trip ck_curtailment_event_max_duration_bounds as an Internal error
instead of a clean InvalidArgument.

Mirror the existing <= 0 org-default guard with a > maxFiniteDurationSeconds
guard at the same site. Pin with TestService_Start_RejectsOrgDefaultAboveAbsoluteCeiling.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 35cc67c417

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread proto/curtailment/v1/curtailment.proto Outdated
- Extract candidatesByDeviceID: confirmDispatched, observeActive, and
  confirmDispatchedRestores each built the same map inline. One helper
  replaces three identical 4-line loops.
- Consolidate skip-reason rendering. dispatchOneCurtail used
  skipReasonForDevice (linear scan with priority logic), while
  dispatchRestoreBatch inlined the same priority logic into a map
  builder. Extract skippedDeviceReason as the single source of the
  reason-priority rule (Reason > FilterName > generic). skipReasonForDevice
  becomes a thin wrapper for the single-device path; dispatchRestoreBatch
  calls it from the map-builder loop.
- Trim multi-paragraph rationale comments on Stop, dispatchOneCurtail,
  observeRestoring, and dispatchRestoreBatch. The WHY (restart-safety
  gap, gate semantics, audit-double-batch caveat) survives; the WHAT
  that the code already shows is gone.
CountNonTerminalTargets was exported from the models package with no
production caller (only its own test referenced it). Drafted earlier
when batch sizing was target-count-based; the current design stamps
effective_batch_size from len(plan.Selected) at Start time, so the
count helper is unused. Delete the function and its test; the models
package still ships the canonical state-set predicates next to it.

toStopResponse was a one-line wrapper with a single caller. Inline
into the StopCurtailment handler so the response shape is visible at
the call site, matching how toStartResponse / toEventProtoWithTargets
are already used elsewhere.
Trim multi-paragraph rationale on Service.Start's org-default
upper-bound guard, Service.runSelector's pipeline overview,
Service.Stop's invariant, and toStartRequest's max_duration_seconds=0
sentinel. The WHY (mismatch surfaces as InvalidArgument vs DB CHECK
trip, mutual-exclusion check intent) survives; restated parameter
descriptions and pipeline phase names are gone.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4549263ffa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread proto/curtailment/v1/curtailment.proto
Reserving only the old name left the tag number free, so an older
client that still encodes restore_batch_size_override on tag 2 would
have its varint payload decoded as force on the new shape — silently
flipping force=true on any non-zero value and bypassing the
min_curtailed_duration_sec gate.

Reserve both number and name on StopCurtailmentRequest, and put force
on tag 3. An old client now decodes its tag-2 payload into a reserved
slot (ignored), so the Stop hits the min-duration gate and fails
loudly with FailedPrecondition rather than aliasing into a silent
bypass.
skippedDeviceReason is the shared rendering helper that both the
single-device and batch restore dispatch paths route audit strings
through. Only the Reason!="" branch was exercised before; the
FilterName-only and both-empty fallback branches had no test coverage.
Add a table-driven unit test pinning all three cases.
@rongxin-liu rongxin-liu merged commit 2bf18b5 into main May 21, 2026
69 checks passed
@rongxin-liu rongxin-liu deleted the feat/issue-231-curtailment-stop-restore branch May 21, 2026 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client javascript Pull requests that update javascript code server shared

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Curtailment: Stop, staggered restore, and max-duration enforcement

3 participants