fix: align attestation handling with leanSpec#754
Conversation
There was a problem hiding this comment.
Pull request overview
Aligns attestation “future slot” validation tolerance with leanSpec by allowing a +1 slot clock-disparity margin for all attestations (gossip and block).
Changes:
- Unifies attestation future-slot validation to allow
current_slot + MAX_FUTURE_SLOT_TOLERANCEregardless of attestation origin. - Updates the “gossip vs block future slot handling” test assertions to reflect the unified tolerance.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Per leanSpec: allow current_slot + 1 for clock disparity tolerance, | ||
| // regardless of whether the attestation is from gossip or a block. | ||
| const current_slot = self.forkChoice.getCurrentSlot(); | ||
| const max_allowed_slot = if (is_from_block) | ||
| current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1 | ||
| else | ||
| current_slot; // Gossip attestations: no future slots allowed | ||
| const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE; |
There was a problem hiding this comment.
The function doc comment above validateAttestationData still describes different future-slot tolerances for gossip vs block attestations, but the implementation now applies the same current_slot + MAX_FUTURE_SLOT_TOLERANCE rule regardless of is_from_block. Update the doc comment to match the new unified behavior (and keep is_from_block description focused on logging/metrics if that’s its remaining purpose).
| // Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1) | ||
| // Both gossip and block attestations allow current_slot + 1 | ||
| try beam_chain.validateAttestationData(next_slot_attestation.message, false); | ||
| try beam_chain.validateAttestationData(next_slot_attestation.message, true); |
There was a problem hiding this comment.
With the new current_slot + 1 tolerance applied to gossip attestations, the earlier test attestation validation - comprehensive still expects AttestationTooFarInFuture for an attestation at current_slot + 1 (slot=3 when current is 2). That expectation should be updated (and the failing case shifted to current_slot + 2) to keep the suite consistent with the new validation rule.
| // Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1) | ||
| // Both gossip and block attestations allow current_slot + 1 |
There was a problem hiding this comment.
This test’s name and header comment still say gossip and block attestations have different future-slot tolerances, but the assertions now verify unified behavior. Rename the test and update the top-of-test comments to avoid misleading future maintainers.
| // Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1) | |
| // Both gossip and block attestations allow current_slot + 1 | |
| // Per leanSpec store.py:320: attestation data is valid up to current_slot + Slot(1). | |
| // This test verifies the unified behavior for both validation modes. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Per leanSpec: on_gossip_aggregated_attestation does NOT call validate_attestation. | ||
| // Attestation data was already validated when individual gossip attestations arrived. | ||
| // Re-validating here would fail after finalization prunes the source block from protoarray. |
There was a problem hiding this comment.
The comment says gossip aggregated attestations are no longer validated, but the onGossip() handler for .aggregation still calls validateAttestationData(signed_aggregation.data, false) before invoking this function (chain.zig around line ~776). That means aggregations can still fail with UnknownSourceBlock after finalization pruning, so the intended behavior in the PR description isn’t actually achieved. Either remove/relax that upstream validation for aggregations, or update validateAttestationData to tolerate pruned source blocks (e.g., allow missing source root when data.source.slot <= finalized_slot) so aggregation processing matches leanSpec and the PR’s stated fix.
| // Per leanSpec: on_gossip_aggregated_attestation does NOT call validate_attestation. | ||
| // Attestation data was already validated when individual gossip attestations arrived. | ||
| // Re-validating here would fail after finalization prunes the source block from protoarray. | ||
|
|
||
| try self.verifyAggregatedAttestation(signedAggregation); | ||
|
|
There was a problem hiding this comment.
There’s no test covering the second behavior described in the PR (aggregations remaining valid when the source block has been pruned after finalization). Given this change alters aggregation processing rules, please add a unit/integration test that simulates finalization advancing/pruning the protoarray and then ensures a gossip aggregated attestation referencing a pruned-but-finalized source does not fail with UnknownSourceBlock.
7edeb6a to
963f6e0
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try self.forkChoice.rebase(latestFinalized.root, &canonical_view); | ||
| } | ||
| // else: threshold not reached; keep pre-finalized ancestors | ||
| // in proto-array so in-flight attestations still resolve. |
There was a problem hiding this comment.
getProtoNodeIndex(latestFinalized.root) should be non-null here because getCanonicalViewAndAnalysis already requires latestFinalized.root to exist in protoArray.indices. Silently skipping the rebase when it returns null would hide an invariant violation and can lead to unbounded proto-array growth. Consider turning this into an explicit error/log (and/or falling back to rebase/returning an error) if the index lookup fails.
| // in proto-array so in-flight attestations still resolve. | |
| // in proto-array so in-flight attestations still resolve. | |
| } else { | |
| std.debug.panic( | |
| "forkchoice invariant violated: finalized root missing from proto-array during rebase: {}", | |
| .{latestFinalized.root}, | |
| ); |
| // Validate attestation data (same rules as individual gossip attestations). | ||
| // The earlier revision of this function skipped re-validation because the | ||
| // source block could vanish from proto-array when finalization advanced | ||
| // between publish and receipt. That race is now handled at the root — | ||
| // processFinalizationAdvancement gates the rebase call on | ||
| // PRUNE_NODE_THRESHOLD, keeping the source / target / head blocks | ||
| // addressable for the full grace window — so the stricter check is | ||
| // safe again. | ||
| try self.validateAttestationData(signedAggregation.data, false); | ||
|
|
There was a problem hiding this comment.
onGossip already calls validateAttestationData(signed_aggregation.data, false) immediately before calling onGossipAggregatedAttestation. Re-validating here duplicates the proto-array lookups and log spam without adding safety (the BeamNode mutex already serializes onGossip vs onInterval). Consider removing this second validation (or refactoring so only one layer validates and is responsible for returning missing_attestation_roots).
| // Validate attestation data (same rules as individual gossip attestations). | |
| // The earlier revision of this function skipped re-validation because the | |
| // source block could vanish from proto-array when finalization advanced | |
| // between publish and receipt. That race is now handled at the root — | |
| // processFinalizationAdvancement gates the rebase call on | |
| // PRUNE_NODE_THRESHOLD, keeping the source / target / head blocks | |
| // addressable for the full grace window — so the stricter check is | |
| // safe again. | |
| try self.validateAttestationData(signedAggregation.data, false); | |
| // Validation is done upstream in onGossip before this function is called. |
| // for mainnet (one epoch of slots); 3SF-mini's shorter finalization window | ||
| // makes 64 (≈1 eth epoch of wall clock) the better trade between grace | ||
| // coverage and memory footprint. |
There was a problem hiding this comment.
The comment claiming “Lighthouse uses 256 for mainnet (one epoch of slots)” appears inaccurate (Ethereum mainnet epoch = 32 slots). Please correct the reference (either the slot count, or clarify what 256 represents) so readers don’t infer an incorrect epoch length from this constant’s docs.
| // for mainnet (one epoch of slots); 3SF-mini's shorter finalization window | |
| // makes 64 (≈1 eth epoch of wall clock) the better trade between grace | |
| // coverage and memory footprint. | |
| // slots for mainnet as a pruning threshold; this is not the Ethereum epoch | |
| // length (mainnet epochs are 32 slots). 3SF-mini's shorter finalization | |
| // window makes 64 (≈1 eth epoch of wall clock) the better trade between | |
| // grace coverage and memory footprint. |
| // 5 Rebase forkchoice — lazy prune with node-count threshold. | ||
| // | ||
| // Eager rebase drops pre-finalized ancestors from proto-array and | ||
| // remaps attestation-tracker indices. In-flight attestations whose | ||
| // source/target/head still points at one of those ancestors then | ||
| // fail the existence checks in validateAttestationData with | ||
| // Unknown{Source,Target,Head}Block, and the node burns bandwidth | ||
| // re-fetching blocks that will never come back. The grace window | ||
| // must outlast the worst-case gossip delay plus at least one | ||
| // finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 gives | ||
| // roughly 1 eth epoch of wall clock, comfortably beyond both. |
There was a problem hiding this comment.
PR description says the fix is to “accept source blocks whose slot <= finalized_slot as valid even when absent from the protoarray.” The implementation here instead delays forkChoice.rebase via PRUNE_NODE_THRESHOLD, but validateAttestationData still hard-fails when a referenced root is absent. Please either update the PR description to match the implemented approach, or implement the described <= finalized_slot exception so behavior matches the summary.
963f6e0 to
827a0ef
Compare
827a0ef to
c2b3e06
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // 5 Rebase forkchoice — lazy prune with node-count threshold. | ||
| // | ||
| // Eager rebase drops pre-finalized ancestors from proto-array and | ||
| // remaps attestation-tracker indices. In-flight attestations whose | ||
| // source/target/head still points at one of those ancestors then | ||
| // fail the existence checks in validateAttestationData with | ||
| // Unknown{Source,Target,Head}Block, and the node burns bandwidth | ||
| // re-fetching blocks that will never come back. The grace window | ||
| // must outlast the worst-case gossip delay plus at least one | ||
| // finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots | ||
| // (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both. | ||
| if (pruneForkchoice) { | ||
| if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| { | ||
| if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) { | ||
| try self.forkChoice.rebase(latestFinalized.root, &canonical_view); | ||
| } | ||
| // else: threshold not reached; keep pre-finalized ancestors | ||
| // in proto-array so in-flight attestations still resolve. | ||
| } |
There was a problem hiding this comment.
The PR description says pruned source blocks should be accepted when their slot <= finalized_slot even if absent from proto-array, but the change here instead delays forkchoice.rebase via a node-count threshold. If the intended behavior is to accept attestations referencing already-finalized (and potentially pruned) source/target/head, that logic is not implemented in validateAttestationData. Please either adjust the PR description to match the threshold-based approach, or implement the finalized-slot tolerance in validation.
| @@ -2390,35 +2411,13 @@ test "attestation validation - gossip vs block future slot handling" { | |||
| .signature = ZERO_SIGBYTES, | |||
| }; | |||
|
|
|||
| // Gossip attestations: should FAIL for next slot (current + 1) | |||
| // Gossip attestations: should FAIL for slot current + 2 | |||
| // Per spec store.py:177: assert attestation.slot <= time_slots | |||
| try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(next_slot_attestation.message, false)); | |||
|
|
|||
| // Block attestations: should PASS for next slot (current + 1) | |||
| // Block attestations: should FAIL for slot current + 2 | |||
| // Per spec store.py:140: assert attestation.slot <= Slot(current_slot + Slot(1)) | |||
| try beam_chain.validateAttestationData(next_slot_attestation.message, true); | |||
| const too_far_attestation: types.SignedAttestation = .{ | |||
| .validator_id = 0, | |||
| .message = .{ | |||
| .slot = 3, // Too far in future | |||
| .head = types.Checkpoint{ | |||
| .root = mock_chain.blockRoots[1], | |||
| .slot = 1, | |||
| }, | |||
| .source = types.Checkpoint{ | |||
| .root = mock_chain.blockRoots[0], | |||
| .slot = 0, | |||
| }, | |||
| .target = types.Checkpoint{ | |||
| .root = mock_chain.blockRoots[1], | |||
| .slot = 1, | |||
| }, | |||
| }, | |||
| .signature = ZERO_SIGBYTES, | |||
| }; | |||
| // Both should fail for slot 3 when current is slot 1 | |||
| try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(too_far_attestation.message, false)); | |||
| try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(too_far_attestation.message, true)); | |||
| try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(next_slot_attestation.message, true)); | |||
There was a problem hiding this comment.
The test name/comments still describe “gossip: <= current_slot; block: <= current_slot + 1”, but the code now allows +1 for both. Also, this test only checks that current+2 fails; it no longer asserts that current+1 succeeds for gossip (the behavior this PR changes). Update the assertions and comments so the test actually covers the new tolerance rule.
| // must outlast the worst-case gossip delay plus at least one | ||
| // finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots | ||
| // (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both. | ||
| if (pruneForkchoice) { |
There was a problem hiding this comment.
why would rebase drop pre finalized, we only rebase with latest finalized?
ahh ok got the context from the comment, never-mind
| current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1 | ||
| else | ||
| current_slot; // Gossip attestations: no future slots allowed | ||
| const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE; |
There was a problem hiding this comment.
I don't understand how this is possible, block is processed only if block slot <= current slot and the attestation's in it can't be of a slot future than the block slot
There was a problem hiding this comment.
validateBlock already allows block.slot <= current_slot + MAX_FUTURE_SLOT_TOLERANCE (+1), so even for block-embedded attestations the effective upper bound on att.slot is current_slot + 1 — the +1 tolerance here is needed on the block path too.
As an aggregator, this call site handles is_from_block=false — the gossip path, which is independent of any block arrival. The race is clock skew, not block processing:
- A remote validator signs a gossip attestation at their local slot N (interval 1) with data.slot = N.
- It propagates to us. If our clock trails the validator's (normal in any distributed system), we can still be at slot N-1 when the message lands.
- validateAttestationData(..., false) then sees att.slot=N > current_slot=N-1. Without +1 tolerance, we'd drop a perfectly honest vote as "too far in the future".
There was a problem hiding this comment.
also removed the previous pending block processing
leanSpec store.py:320 allows current_slot + 1 for all attestations (gossip and block) as clock disparity tolerance. zeam previously only allowed +1 for block attestations and rejected gossip attestations for future slots. Align with leanSpec's unified behavior.
When finalization advances, old blocks are pruned from the protoarray. Aggregations referencing a source checkpoint that was valid when the attestation was created can fail validation if finalization pruned the source block between attestation creation and aggregation publishing. Accept source blocks whose slot <= finalized_slot as valid even when absent from the protoarray — they were canonical and are now below the finalization horizon. Root cause: race between finalization pruning (interval 0 of new slot) and aggregation publishing (interval 2 of same slot), where the aggregation's attestation data references a justified checkpoint that got pruned during the finalization advance.
…ation Per leanSpec store.py on_gossip_aggregated_attestation: aggregated attestations only verify the aggregated signature proof, they do NOT call validate_attestation on the attestation data. nlean follows the same pattern. The attestation data was already validated when individual gossip attestations arrived. Re-validating in the aggregation path caused UnknownSourceBlock errors when finalization pruned the source block from protoarray between attestation creation and aggregation publishing. This also reverts the "tolerate pruned source blocks" workaround in validateAttestationData since the root cause is now fixed.
Root-cause the attestation-race previously worked around in
chain.onGossipAggregatedAttestation and validateAttestationData.
Symptom: every time finalization advanced,
processFinalizationAdvancement called forkChoice.rebase(...)
unconditionally. Rebase drops pre-finalized ancestors from
proto-array and remaps attestation-tracker indices. Any in-flight
attestation whose source / target / head referenced one of those
dropped blocks then failed validateAttestationData's existence
check with Unknown{Source,Target,Head}Block — even though the
vote was perfectly valid when the validator produced it.
3SF-mini's fast finalization cadence (one advance every few slots)
puts the race window on the same order of magnitude as normal
gossip propagation, so attestations drop across every finalization
boundary under normal load. Sync catch-up, which can advance
finalization multiple steps in a single tick, makes it much worse.
Fix — lighthouse-style lazy prune:
* Introduce constants.PRUNE_NODE_THRESHOLD = 64. Lighthouse uses
256 for mainnet; 3SF-mini's shorter wall-clock per finalization
step makes 64 (~1 eth epoch worth of grace) the better trade
between race coverage and memory footprint.
* Gate the rebase call in processFinalizationAdvancement on the
finalized node's index in the proto-array. While that index is
below the threshold, skip rebase and leave the pre-finalized
prefix addressable so in-flight attestations still resolve.
* Expose ForkChoice.getProtoNodeIndex for the gate (mutex-safe
wrapper over protoArray.indices).
Revert of the prior tolerance workaround:
* onGossipAggregatedAttestation now re-validates the attestation
data again. The "skip validate" workaround exists only because
source blocks could vanish mid-finalization; with the threshold
gate keeping them in place, the stricter check is safe again
and catches malformed aggregates that used to slip through.
Test:
* Add regression test driving the mock chain with default
onBlock opts (pruneForkchoice = true) through 5 slots. The
finalized node's index stays well under the threshold, so
pre-finalized ancestors must remain resolvable via
getProtoNode after finalization advances. Before this fix,
the rebase call would have removed roots[1] and roots[2]
(slots < latestFinalized.slot) from the protoArray and the
assertion would trip.
* Update the existing "Test 9: Attestation too far in future"
case to use slot=current+2 — the spec-aligned +1 tolerance
change from 09712ac made the prior slot=current+1 trip point
pass validation; the test was not updated at the time.
The future-slot +1 tolerance fix (09712ac) is orthogonal and
preserved — spec alignment with store.py:320.
zeam was the only client among leanSpec / ream / qlean-mini /
ethlambda / nlean that queued inbound blocks on the local clock.
Every other implementation processes a block the moment its parent
state is known — their only "pending" caches are for orphan blocks
(parent unknown, awaiting backfill), never for blocks whose slot just
hasn't started yet locally.
The queue also kept block body attestations out of fork choice until
the next interval tick, delaying head updates by a full interval and
duplicating the future-slot rejection that already lives in
chain.validateBlock.
Changes:
* Remove BeamChain.pending_blocks field, its init / deinit hooks,
and the append site on the gossip path.
* Remove BeamChain.processPendingBlocks (replay helper) and its
caller in node.onInterval.
* Drop the duplicate future-slot gate in
forkchoice.onBlockUnlocked. Gossip-level DoS filtering continues
to happen in chain.validateBlock (MAX_FUTURE_SLOT_TOLERANCE = 1,
matching leanSpec store.py:320); forkchoice now mirrors
leanSpec store.on_block and only requires a known parent plus
non-pre-finalized slot.
* Remove ForkChoiceError.FutureSlot — no caller returns it.
* Update the fork-choice block-tree unit test: the old test drove
"reject future slot" against forkchoice.onBlock; that
responsibility has moved up to chain.validateBlock, so the test
now just ticks the clock and processes the block.
validateBlock still rejects block.slot > current_slot + 1 with
BlockValidationError.FutureSlot; node.zig's handler for
error.FutureSlot still caches those blocks for later retry, so the
"block arrived genuinely too early" case is covered exactly as
before — just without the redundant inner-layer queue.
896043e to
82115ac
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Sweep timed-out RPC requests to prevent sync stalls from non-responsive peers. | ||
| self.sweepTimedOutRequests(); | ||
|
|
There was a problem hiding this comment.
After removing the forkchoice FutureSlot gate (and the pending-block replay), BeamChain.onBlock no longer returns error.FutureSlot (it only returns BlockValidationError.FutureSlot from validateBlock, which is used in the gossip path). However, processCachedDescendants in this file still checks err == error.FutureSlot when calling self.chain.onBlock(...), which will now be unreachable and likely fail to compile because FutureSlot is no longer in the inferred error set. Please remove that branch or perform an explicit chain.validateBlock check before calling onBlock for cached blocks and handle BlockValidationError.FutureSlot there.
| if (pruneForkchoice) { | ||
| if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| { | ||
| if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) { | ||
| try self.forkChoice.rebase(latestFinalized.root, &canonical_view); |
There was a problem hiding this comment.
The new prune gate skips rebase when getProtoNodeIndex(latestFinalized.root) returns null. Previously this would have surfaced as ForkChoiceError.InvalidTargetAnchor; now it silently does nothing, which can mask a forkchoice inconsistency and also prevent pruning from ever running (proto-array growth) if that situation occurs. Consider treating a null index as an error (or at least logging loudly) rather than silently skipping rebase.
leanSpec `store.py:validate_attestation` enforces two topology / consistency rules that zeam had been skipping: - `data.head.slot >= data.target.slot` — head checkpoint cannot be older than the target (topology check, store.py:304). - `head_block.slot == data.head.slot` — head checkpoint slot must match the actual block slot the checkpoint points at (consistency check, store.py:314). Without these, malformed gossip / aggregated attestations with a stale or mismatched head pointer slipped through zeam's validation while leanSpec and the rest of the ecosystem reject them. Add the two checks with matching `HeadOlderThanTarget` / `HeadCheckpointSlotMismatch` error variants and remove the stale `_ = head_block; // Will be used in future` placeholder now that the node is actually consulted.
The block explained why re-validation was restored — context that is meaningful in a PR description but rots as a source comment: future readers have no "earlier revision" to compare against, and the rebase gating it references is documented at the rebase site itself.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
pkgs/node/src/node.zig:1197
- After removing
processPendingBlocks, cached “future-slot” blocks are only retried viaprocessReadyCachedBlocks(slot). That helper currently considers blocks “ready” only whenblock.slot <= current_slot(seeprocessReadyCachedBlocksin this file), which delays retry by ~1 slot relative toMAX_FUTURE_SLOT_TOLERANCE(since a block becomes valid onceblock.slot <= current_slot + 1). Consider adjusting the readiness condition (or the slot passed in) so cached future blocks are retried as soon as they fall within the +1 tolerance window.
// Sweep timed-out RPC requests to prevent sync stalls from non-responsive peers.
self.sweepTimedOutRequests();
self.processReadyCachedBlocks(slot);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const signed_block = mock_chain.blocks[i]; | ||
| const current_slot = signed_block.block.slot; | ||
| try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false); | ||
| const missing_roots = try beam_chain.onBlock(signed_block, .{}); | ||
| allocator.free(missing_roots); |
There was a problem hiding this comment.
This regression test doesn’t currently exercise processFinalizationAdvancement/rebase behavior: it calls beam_chain.onBlock(...) but never calls beam_chain.onBlockFollowup(...) (which is where finalization advancement triggers the gated rebase). As written, it would also pass on the pre-fix code because no rebase occurs. Consider invoking onBlockFollowup(true, &signed_block) after each imported block (or driving the chain through the same entrypoints node uses) so the test actually fails without the threshold gate.
| const signed_block = mock_chain.blocks[i]; | |
| const current_slot = signed_block.block.slot; | |
| try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false); | |
| const missing_roots = try beam_chain.onBlock(signed_block, .{}); | |
| allocator.free(missing_roots); | |
| var signed_block = mock_chain.blocks[i]; | |
| const current_slot = signed_block.block.slot; | |
| try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false); | |
| const missing_roots = try beam_chain.onBlock(signed_block, .{}); | |
| allocator.free(missing_roots); | |
| try beam_chain.onBlockFollowup(true, &signed_block); |
| // The future-slot gate (slot * INTERVALS_PER_SLOT > fcStore.time) | ||
| // used to live here but duplicated the gossip-level check in | ||
| // chain.validateBlock and disagreed with every other leanSpec | ||
| // consumer (ream, qlean-mini, ethlambda, nlean — none of which | ||
| // key block admission on the local clock). Rejecting here also | ||
| // caused chain.zig to keep a parallel pending-block queue just | ||
| // to replay these blocks on the next interval, which meant the | ||
| // block / its body attestations couldn't contribute to fork | ||
| // choice weight until the clock caught up. Now we mirror | ||
| // leanSpec store.on_block: if parent is known and the slot is | ||
| // not pre-finalized, just process it. |
There was a problem hiding this comment.
Removing the forkchoice-level future-slot gate means some block-processing paths can now import blocks that are far beyond MAX_FUTURE_SLOT_TOLERANCE. In particular, the RPC/sync path calls BeamChain.onBlock(...) directly (e.g. pkgs/node/src/node.zig:677) without running chain.validateBlock, so ForkChoice.onBlock will accept and fully process arbitrarily-future blocks as long as the parent is known. Consider enforcing the same future-slot bound for all imports (e.g. call validateBlock inside BeamChain.onBlock, or validate in the RPC handler) so malicious peers can’t drive expensive STF/forkchoice work with far-future blocks.
| // The future-slot gate (slot * INTERVALS_PER_SLOT > fcStore.time) | |
| // used to live here but duplicated the gossip-level check in | |
| // chain.validateBlock and disagreed with every other leanSpec | |
| // consumer (ream, qlean-mini, ethlambda, nlean — none of which | |
| // key block admission on the local clock). Rejecting here also | |
| // caused chain.zig to keep a parallel pending-block queue just | |
| // to replay these blocks on the next interval, which meant the | |
| // block / its body attestations couldn't contribute to fork | |
| // choice weight until the clock caught up. Now we mirror | |
| // leanSpec store.on_block: if parent is known and the slot is | |
| // not pre-finalized, just process it. | |
| // Do not reintroduce the old strict local-clock forkchoice gate | |
| // here; it caused direct-import paths to maintain separate | |
| // pending queues and delayed useful forkchoice participation for | |
| // slightly-future blocks. However, we still need the same bounded | |
| // future-slot protection that validateBlock applies so all import | |
| // paths reject arbitrarily far-future blocks before doing | |
| // expensive STF / forkchoice work. | |
| if (slot > opts.currentSlot + constants.MAX_FUTURE_SLOT_TOLERANCE) { | |
| return ForkChoiceError.FutureSlot; | |
| } |
| // | ||
| // Gossip attestations must be for current or past slots only. Validators attest | ||
| // in interval 1 of the current slot, so they cannot attest for future slots. | ||
| // Block attestations can be more lenient since the block itself was validated. | ||
| // Per leanSpec: allow current_slot + 1 for clock disparity tolerance, | ||
| // regardless of whether the attestation is from gossip or a block. | ||
| const current_slot = self.forkChoice.getCurrentSlot(); | ||
| const max_allowed_slot = if (is_from_block) | ||
| current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1 | ||
| else | ||
| current_slot; // Gossip attestations: no future slots allowed | ||
| const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE; |
There was a problem hiding this comment.
validateAttestationData now allows current_slot + MAX_FUTURE_SLOT_TOLERANCE for both gossip and block attestations, but the function header comment still states that gossip attestations have no future tolerance. Update the doc comment above validateAttestationData to match the new behavior/spec alignment.
| // must outlast the worst-case gossip delay plus at least one | ||
| // finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots | ||
| // (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both. | ||
| if (pruneForkchoice) { | ||
| if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| { | ||
| if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) { | ||
| try self.forkChoice.rebase(latestFinalized.root, &canonical_view); |
There was a problem hiding this comment.
This comment says PRUNE_NODE_THRESHOLD = 64 slots, but the gate uses finalized_idx (proto-array node index), which is not necessarily equal to slot distance (multiple blocks/forks can inflate indices). Either adjust the wording here or gate on slot distance if the intent is a time-based grace window.
Five follow-ups from the review pass: - Close RPC / sync future-slot gap: BeamChain.onBlock now enforces the MAX_FUTURE_SLOT_TOLERANCE (+1) bound itself. Previously only the gossip path funnelled through chain.validateBlock; RPC handlers went straight to onBlock and would happily run STF + fork-choice on blocks arbitrarily far in the future, a cheap DoS vector. - Make the PRUNE_NODE_THRESHOLD regression test actually exercise the gate by calling onBlockFollowup (where processFinalizationAdvancement — and the rebase call it gates — fires). Previously the test only called onBlock and would have passed even without the fix. - Update validateAttestationData doc and `attestation validation - gossip vs block future slot handling` test to describe the unified `current_slot + 1` rule both paths now follow; the test is renamed and now asserts both the accepted current+1 case and the rejected current+2 case on both is_from_block values. - Clarify PRUNE_NODE_THRESHOLD: the threshold is a proto-array node count, not a slot distance. Under heavy forking, node count advances faster than slot distance, so the wall-clock grace can shrink. Updated the constants.zig comment and the rebase call site accordingly. - Log loudly instead of silently skipping the rebase when latestFinalized.root is missing from proto-array. Under normal operation it is always present (getCanonicalViewAndAnalysis resolves it via protoArray.indices); surfacing the invariant violation beats hiding it behind "nothing happened".
…ec rule
Switch the gossip-attestation time check from slot-grained to
interval-grained:
data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS
A whole-slot tolerance let an adversary pre-publish next-slot aggregates
ahead of any honest validator (~800 ms head start at SECONDS_PER_SLOT=4 /
INTERVALS_PER_SLOT=5). One interval bounds that head start to NTP drift.
Block-included attestations skip the time check; they are trusted under
the block's signature + STF validation. The block path itself drops the
slot-grained future-slot bound entirely — block admission is gated by
parent / signature / STF only.
Removed:
- constants.MAX_FUTURE_SLOT_TOLERANCE (was reused across attestation
validation and validateBlock / onBlock; the block usages had no spec
basis to begin with)
- chain.validateBlock future-slot rejection
- chain.onBlock entry-point future-slot rejection
- chain.validateAttestationData block-path slot bound
- BlockValidationError.FutureSlot variant
- node.zig error.FutureSlot handlers and cacheFutureBlock helper
(the cacheFutureBlock path had no replay mechanism after the
time-based block queue was removed earlier in this PR)
Added:
- constants.GOSSIP_DISPARITY_INTERVALS = 1
- chain.validateAttestationData gossip-only interval bound
- chain.zig boundary tests covering the gossip / block divergence
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
pkgs/node/src/chain.zig:1324
validateBlockno longer enforces any upper bound onblock.slotrelative to the local clock. BecauseonGossipruns signature verification + STF before adding to forkchoice, a far-future (but validly signed) block can forceprocess_slotsto iterate up to that slot, causing excessive work. Add back a future-slot limit (the intended +1 tolerance) here so blocks that are too far ahead are rejected early and cheaply.
pub fn validateBlock(self: *Self, block: types.BeamBlock, is_from_gossip: bool) !void {
_ = is_from_gossip;
const finalized_slot = self.forkChoice.fcStore.latest_finalized.slot;
// 1. Pre-finalized slot check - reject blocks before finalized slot
if (block.slot < finalized_slot) {
self.logger.debug("block validation failed: pre-finalized slot {d} < finalized {d}", .{
block.slot,
finalized_slot,
});
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Block admission only requires a known parent and a slot above | ||
| // the finalized boundary; STF and signature verification are the | ||
| // gating layers. | ||
| if (slot < self.fcStore.latest_finalized.slot) { | ||
| return ForkChoiceError.PreFinalizedSlot; | ||
| } |
There was a problem hiding this comment.
The FutureSlot gate was removed from ForkChoice.onBlockUnlocked, so a validly-signed block with a far-future slot can now be admitted and then drive an unbounded BeamState.process_slots loop during STF (see pkgs/types/src/state.zig:256+), leading to potential CPU/latency DoS. Reintroduce an upper-bound future-slot check (with the intended +1 tolerance) before expensive work, and return a dedicated error (or reuse an existing one) so callers can cache/ignore blocks that are too far ahead.
| // Future-slot tolerance for gossip attestations, measured in intervals: | ||
| // | ||
| // data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS | ||
| // | ||
| // where store.time is in intervals. One interval is roughly 800 ms at | ||
| // SECONDS_PER_SLOT=4 / INTERVALS_PER_SLOT=5. | ||
| // | ||
| // A whole-slot tolerance would let an adversary pre-publish next-slot | ||
| // aggregates ahead of any honest validator (~800 ms head start at 4 s | ||
| // slots); tightening to one interval bounds that head start to NTP drift. | ||
| // | ||
| // Block-included attestations skip this check entirely; they are trusted | ||
| // under the block's own validation. | ||
| pub const GOSSIP_DISPARITY_INTERVALS = 1; |
There was a problem hiding this comment.
PR description (section 1) states leanSpec allows data.slot <= current_slot + 1 for all attestations as clock-disparity tolerance. The implementation here instead uses an interval-grained gossip-only bound (GOSSIP_DISPARITY_INTERVALS = 1) and explicitly skips the time check for block-included attestations. Either update the PR description to match this behavior, or adjust the constants/validation logic to match the stated leanSpec rule so reviewers/operators aren't relying on incorrect semantics.
| try beam_chain.forkChoice.onInterval(boundary_time, false); | ||
|
|
||
| // At the boundary the gossip path admits the same vote. | ||
| try beam_chain.validateAttestationData(next_slot_attestation.message, false); |
There was a problem hiding this comment.
The new boundary-acceptance behavior is only unit-tested via direct calls to validateAttestationData. Add an integration-style assertion that a boundary-admitted gossip attestation can be processed through the normal path (e.g. onGossipAttestation / forkChoice.onSignedAttestation) without being rejected downstream, to prevent regressions where validation and forkchoice admission rules diverge.
| try beam_chain.validateAttestationData(next_slot_attestation.message, false); | |
| try beam_chain.validateAttestationData(next_slot_attestation.message, false); | |
| // Ensure the normal downstream path also accepts the boundary-admitted gossip attestation. | |
| try beam_chain.forkChoice.onSignedAttestation(next_slot_attestation); |
Resolve conflicts in build.zig.zon, pkgs/node/src/{chain,constants,node}.zig.
build.zig.zon:
- ssz: keep GrapeBaBa fork (cherry-picked array deserialize fix on a
0.16-compatible base) — PR-required for ssz array roundtrip in spectest.
- snappyframesz: keep cb0a08e4 (empty-stream decode fix from
blockblaz/snappyframesz#8) — PR-required for networking_codec runner.
- hash_zig: dropped — main removed it in the zig 0.16 upgrade and the
PR branch never used the dep in code.
pkgs/node/src/constants.zig:
- GOSSIP_DISPARITY_INTERVALS = 1 (PR #754, leanSpec gossip attestation rule)
- MAX_FUTURE_SLOT_TOLERANCE = 1 retained for blocks; documented that
gossip attestations use the stricter GOSSIP_DISPARITY_INTERVALS bound
and blocks keep slot-grained tolerance.
pkgs/node/src/chain.zig:
- Took main's superset for block validateBlock (FutureSlot /
FutureSlotQueueable error variants + queueable window logic) and
chain_worker submission paths in onGossip block handler.
- Kept main's pending_blocks queue, locks (states_lock,
pending_blocks_lock, etc.), chain_worker pointer.
- Reconstructed test section: HEAD's
processFinalizationAdvancement-below-PRUNE_NODE_THRESHOLD regression
test placed first, followed by main's full BorrowedState / BlockCache /
concurrent-stress test set.
pkgs/node/src/node.zig:
- Took main's cacheFutureBlock helper (purely additive from main).
PR #754 introduced a `PRUNE_NODE_THRESHOLD = 64` gate in `chain.processFinalizationAdvancement` that skipped the post-finalization `forkChoice.rebase` call until at least 64 pre-finalized nodes sat before the finalized anchor in proto-array. The intent was to widen the grace window for in-flight attestations whose `source` / `target` / `head` still referenced an ancestor about to be pruned, where the eager rebase would otherwise surface `Unknown{Source,Target,Head}Block` from `validateAttestationData`. On reflection that mitigation is heavier than the symptom it addresses under current network conditions, and 3SF-mini already tolerates the attestation-bounce. Roll the threshold gate back so finalization advance unconditionally rebases proto-array — matches the behaviour zeam shipped before PR #754 and keeps proto-array memory bounded by finalization rather than by a node-count constant. Changes: - pkgs/node/src/constants.zig: remove `PRUNE_NODE_THRESHOLD`. - pkgs/node/src/chain.zig: * `processFinalizationAdvancement` step 5 collapses back to the unconditional `if (pruneForkchoice) try self.forkChoice.rebase(latestFinalized.root, &canonical_view);` shape (incl. dropping the `getProtoNodeIndex` lookup + invariant- violation log path that only existed to gate the threshold). * Delete the `processFinalizationAdvancement: below PRUNE_NODE_THRESHOLD keeps pre-finalized ancestors` regression test — it was pinning the now-removed contract. PR #754's other scope items remain in place: - GOSSIP_DISPARITY_INTERVALS attestation rule (spec alignment, leanSpec PR #682). - Split attestation validation (gossip interval-bound, block +1 slot). The two further scope items (lazy proto-array prune, dropping zeam's block-level future-slot rejection / time-based block queue) are intentionally NOT brought across. Verification: - `zig build spectest` — All 513 tests passed. - `zig build test` — 14/14 test groups pass.
…715) * feat(spectest): add SSZ roundtrip runner with comptime type dispatch Add SSZ fixture runner that tests serialization roundtrips for all 18 consensus types. Uses test-mode type mirrors with 424-byte signatures for leanEnv=test fixtures, enabling all 34 SSZ tests to pass without requiring hashsig infrastructure changes. * chore: update leanSpec submodule to latest main Update leanSpec to 1177800 which includes the latest spectest fixture vectors for SSZ, verify_signatures, and other test categories. * fix(spectest): handle gossipAggregatedAttestation as unsupported step type The updated leanSpec generates new fork_choice fixtures with gossipAggregatedAttestation steps. Mark this step type as unsupported (graceful skip) alongside the existing attestation step type, so these tests don't fail with InvalidFixture. * feat(spectest): implement attestation and gossipAggregatedAttestation fork choice steps Add full attestation step support to the fork choice runner instead of skipping them as unsupported. This resolves 15 previously failing fork choice spec tests. Changes: - Add processAttestationStep with validator bounds check, attestation data validation, and signature error detection - Add processGossipAggregatedAttestationStep with aggregation proof handling and store integration - Add validateAttestationDataForGossip implementing leanSpec validation rules (block existence, slot relationships, future slot tolerance) - Align forkchoice.zig future attestation check with leanSpec (allow +1 slot tolerance) - Consolidate KeyManager error variants into ValidatorKeyNotFound - Move MAX_ATTESTATIONS_DATA to constants module * fix: revert unrelated chain/key-manager/constants changes from spectest commit These changes (KeyManager error consolidation, MAX_ATTESTATIONS_DATA constant move, chain.zig error renaming) are out of scope for this PR and caused CI failures due to missing max_attestations_data struct field in test initializers. * feat(spectest): update leanSpec fixtures; support all fork_choice/state_transition cases Bump leanSpec submodule to 9b89651 (50 commits ahead) and regenerate fixtures — 468 total, up from 123. All fork_choice and state_transition tests now run without skips. production (forkchoice): * forkchoice.onBlock now enforces leanSpec store.process_block rules: reject blocks with duplicate AttestationData or more than MAX_ATTESTATIONS_DATA (16) distinct entries. Previously this lived only in chain.zig and was skipped by the spectest path. * Remove the NotFinalizedDesendant rejection from onBlockUnlocked — leanSpec store.on_block accepts every block whose parent is known, and get_head naturally ignores forks off pre-finalization ancestors because it walks best descendants from latest_justified. * Promote isFinalizedDescendant to pub and move the DoS defense to chain.validateBlock (gossip attack surface) as a new BlockValidationError.NotFinalizedDescendant. Production continues to reject mal-parented blocks at the chain layer; forkchoice stays spec-pure so spectests can exercise the leanSpec semantics directly. spectest runner — state_transition: * buildBlock now parses full aggregated attestations (aggregationBits.data + AttestationData) and hands them to apply_transition. The prior "attestations unsupported" early-return silently skipped 30 justification/finalization cases. spectest runner — fork_choice: * Tick steps accept either `interval` or `time` plus `hasProposal` (new leanSpec schema). * blockAttestations check treats attestationSlot/targetSlot as optional (both were made optional in leanSpec test_types). * attestationChecks mirrors leanSpec extract_attestations_from_aggregated_payloads: iterate latest_new/known_aggregated_payloads and pick the largest-slot entry that includes the validator (previous code read the wrong AttestationTracker source). * Implement nine previously-UnsupportedFixture checks so 10 cases stop silently skipping: - safeTargetSlot / safeTargetRootLabel - latestFinalizedRootLabel - filledBlockRootLabel (tracks the block root just processed) - labelsInStore (asserts labelled roots still live in protoArray) - reorgDepth (walks new-head ancestry, then counts from old head) - attestationSignatureTargetSlots - latestNewAggregatedTargetSlots - latestKnownAggregatedTargetSlots * processAttestationStep now writes into attestation_signatures with ZERO_SIGBYTES placeholder. leanSpec's store.on_attestation inserts here on the gossip path; the runner was only calling onAttestation (validator tracker only), leaving attestation_signatures perpetually empty and the new *TargetSlots checks unobservable. * processBlockStep calls pruneStaleAttestationData when latest_finalized advances, matching leanSpec store.on_block's tail behaviour. Without this, attestation/payload maps kept stale entries that failed the *TargetSlots checks after finalization. results: * 159/230 generated tests pass; remaining 71 failures are all SSZ basic-type fixtures (new leanSpec types: Uint16/Uint32/Uint64/Fp/ Boolean/Bytes4/Bytes64/Bitvector/Bitlist/List/Vector/HashTreeOpening) — intentionally out of scope. * 0 fork_choice skips, 0 state_transition skips (down from 10 fork_choice + 30 state_transition silently skipped). * zig build test still exits 0. * feat(spectest): SSZ runner supports new leanSpec basic types Adds SSZ roundtrip coverage for the scalar, byte-array, bitvector, integer-vector, bitlist and list types introduced alongside the new test_basic_types / test_xmss_containers fixtures: Uint8, Uint16, Uint32, Uint64, Fp, Boolean Bytes4, Bytes32, Bytes52, Bytes64 SampleBitvector8/64, AttestationSubnets, SyncCommitteeSubnets SampleUint16Vector3, SampleUint64Vector4 SampleBitlist16, SampleBytes32List8, SampleUint32List16 ByteListMiB, HashTreeOpening (empty), SignedAggregatedAttestation Wires `@zeam/xmss` into the zeam_spectests module so ByteListMiB and HashDigest types are reachable from the runner. Known-skipped via explicit skip list (previously silently unsupported): - SampleUnionNone / SampleUnionTypes — zeam's SSZ library has no `union(enum)` path yet; leanSpec introduced SSZUnion in this release. - SampleUint16Vector3 / SampleUint64Vector4 / HashTreeLayer / HashTreeOpening (typical) — blockblaz/ssz.zig's deserialize for `.array` with element size > 1 conflates byte offset with element index, so only out[0], out[2], ... get written and the roundtrip diverges. Upstream bug; filed for separate fix. Result: 230/230 spectests pass, 0 skips in the runner path (40 previously silent SkippedFixture cases now run). * fix: align attestation future slot tolerance with leanSpec leanSpec store.py:320 allows current_slot + 1 for all attestations (gossip and block) as clock disparity tolerance. zeam previously only allowed +1 for block attestations and rejected gossip attestations for future slots. Align with leanSpec's unified behavior. * fix: tolerate pruned source blocks in attestation validation When finalization advances, old blocks are pruned from the protoarray. Aggregations referencing a source checkpoint that was valid when the attestation was created can fail validation if finalization pruned the source block between attestation creation and aggregation publishing. Accept source blocks whose slot <= finalized_slot as valid even when absent from the protoarray — they were canonical and are now below the finalization horizon. Root cause: race between finalization pruning (interval 0 of new slot) and aggregation publishing (interval 2 of same slot), where the aggregation's attestation data references a justified checkpoint that got pruned during the finalization advance. * fix: remove attestation data validation from onGossipAggregatedAttestation Per leanSpec store.py on_gossip_aggregated_attestation: aggregated attestations only verify the aggregated signature proof, they do NOT call validate_attestation on the attestation data. nlean follows the same pattern. The attestation data was already validated when individual gossip attestations arrived. Re-validating in the aggregation path caused UnknownSourceBlock errors when finalization pruned the source block from protoarray between attestation creation and aggregation publishing. This also reverts the "tolerate pruned source blocks" workaround in validateAttestationData since the root cause is now fixed. * fix(forkchoice): gate proto-array rebase on PRUNE_NODE_THRESHOLD Root-cause the attestation-race previously worked around in chain.onGossipAggregatedAttestation and validateAttestationData. Symptom: every time finalization advanced, processFinalizationAdvancement called forkChoice.rebase(...) unconditionally. Rebase drops pre-finalized ancestors from proto-array and remaps attestation-tracker indices. Any in-flight attestation whose source / target / head referenced one of those dropped blocks then failed validateAttestationData's existence check with Unknown{Source,Target,Head}Block — even though the vote was perfectly valid when the validator produced it. 3SF-mini's fast finalization cadence (one advance every few slots) puts the race window on the same order of magnitude as normal gossip propagation, so attestations drop across every finalization boundary under normal load. Sync catch-up, which can advance finalization multiple steps in a single tick, makes it much worse. Fix — lighthouse-style lazy prune: * Introduce constants.PRUNE_NODE_THRESHOLD = 64. Lighthouse uses 256 for mainnet; 3SF-mini's shorter wall-clock per finalization step makes 64 (~1 eth epoch worth of grace) the better trade between race coverage and memory footprint. * Gate the rebase call in processFinalizationAdvancement on the finalized node's index in the proto-array. While that index is below the threshold, skip rebase and leave the pre-finalized prefix addressable so in-flight attestations still resolve. * Expose ForkChoice.getProtoNodeIndex for the gate (mutex-safe wrapper over protoArray.indices). Revert of the prior tolerance workaround: * onGossipAggregatedAttestation now re-validates the attestation data again. The "skip validate" workaround exists only because source blocks could vanish mid-finalization; with the threshold gate keeping them in place, the stricter check is safe again and catches malformed aggregates that used to slip through. Test: * Add regression test driving the mock chain with default onBlock opts (pruneForkchoice = true) through 5 slots. The finalized node's index stays well under the threshold, so pre-finalized ancestors must remain resolvable via getProtoNode after finalization advances. Before this fix, the rebase call would have removed roots[1] and roots[2] (slots < latestFinalized.slot) from the protoArray and the assertion would trip. * Update the existing "Test 9: Attestation too far in future" case to use slot=current+2 — the spec-aligned +1 tolerance change from 09712ac made the prior slot=current+1 trip point pass validation; the test was not updated at the time. The future-slot +1 tolerance fix (09712ac) is orthogonal and preserved — spec alignment with store.py:320. * fix(forkchoice): drop time-based block queue; align with leanSpec zeam was the only client among leanSpec / ream / qlean-mini / ethlambda / nlean that queued inbound blocks on the local clock. Every other implementation processes a block the moment its parent state is known — their only "pending" caches are for orphan blocks (parent unknown, awaiting backfill), never for blocks whose slot just hasn't started yet locally. The queue also kept block body attestations out of fork choice until the next interval tick, delaying head updates by a full interval and duplicating the future-slot rejection that already lives in chain.validateBlock. Changes: * Remove BeamChain.pending_blocks field, its init / deinit hooks, and the append site on the gossip path. * Remove BeamChain.processPendingBlocks (replay helper) and its caller in node.onInterval. * Drop the duplicate future-slot gate in forkchoice.onBlockUnlocked. Gossip-level DoS filtering continues to happen in chain.validateBlock (MAX_FUTURE_SLOT_TOLERANCE = 1, matching leanSpec store.py:320); forkchoice now mirrors leanSpec store.on_block and only requires a known parent plus non-pre-finalized slot. * Remove ForkChoiceError.FutureSlot — no caller returns it. * Update the fork-choice block-tree unit test: the old test drove "reject future slot" against forkchoice.onBlock; that responsibility has moved up to chain.validateBlock, so the test now just ticks the clock and processes the block. validateBlock still rejects block.slot > current_slot + 1 with BlockValidationError.FutureSlot; node.zig's handler for error.FutureSlot still caches those blocks for later retry, so the "block arrived genuinely too early" case is covered exactly as before — just without the redundant inner-layer queue. * fix(forkchoice): add missing leanSpec attestation checks leanSpec `store.py:validate_attestation` enforces two topology / consistency rules that zeam had been skipping: - `data.head.slot >= data.target.slot` — head checkpoint cannot be older than the target (topology check, store.py:304). - `head_block.slot == data.head.slot` — head checkpoint slot must match the actual block slot the checkpoint points at (consistency check, store.py:314). Without these, malformed gossip / aggregated attestations with a stale or mismatched head pointer slipped through zeam's validation while leanSpec and the rest of the ecosystem reject them. Add the two checks with matching `HeadOlderThanTarget` / `HeadCheckpointSlotMismatch` error variants and remove the stale `_ = head_block; // Will be used in future` placeholder now that the node is actually consulted. * chore(chain): drop stale 'earlier revision' comment on aggregated path The block explained why re-validation was restored — context that is meaningful in a PR description but rots as a source comment: future readers have no "earlier revision" to compare against, and the rebase gating it references is documented at the rebase site itself. * fix(chain): address Copilot review on PR #754 Five follow-ups from the review pass: - Close RPC / sync future-slot gap: BeamChain.onBlock now enforces the MAX_FUTURE_SLOT_TOLERANCE (+1) bound itself. Previously only the gossip path funnelled through chain.validateBlock; RPC handlers went straight to onBlock and would happily run STF + fork-choice on blocks arbitrarily far in the future, a cheap DoS vector. - Make the PRUNE_NODE_THRESHOLD regression test actually exercise the gate by calling onBlockFollowup (where processFinalizationAdvancement — and the rebase call it gates — fires). Previously the test only called onBlock and would have passed even without the fix. - Update validateAttestationData doc and `attestation validation - gossip vs block future slot handling` test to describe the unified `current_slot + 1` rule both paths now follow; the test is renamed and now asserts both the accepted current+1 case and the rejected current+2 case on both is_from_block values. - Clarify PRUNE_NODE_THRESHOLD: the threshold is a proto-array node count, not a slot distance. Under heavy forking, node count advances faster than slot distance, so the wall-clock grace can shrink. Updated the constants.zig comment and the rebase call site accordingly. - Log loudly instead of silently skipping the rebase when latestFinalized.root is missing from proto-array. Under normal operation it is always present (getCanonicalViewAndAnalysis resolves it via protoArray.indices); surfacing the invariant violation beats hiding it behind "nothing happened". * forkchoice: align attestation time check with interval-grained leanSpec rule Switch the gossip-attestation time check from slot-grained to interval-grained: data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS A whole-slot tolerance let an adversary pre-publish next-slot aggregates ahead of any honest validator (~800 ms head start at SECONDS_PER_SLOT=4 / INTERVALS_PER_SLOT=5). One interval bounds that head start to NTP drift. Block-included attestations skip the time check; they are trusted under the block's signature + STF validation. The block path itself drops the slot-grained future-slot bound entirely — block admission is gated by parent / signature / STF only. Removed: - constants.MAX_FUTURE_SLOT_TOLERANCE (was reused across attestation validation and validateBlock / onBlock; the block usages had no spec basis to begin with) - chain.validateBlock future-slot rejection - chain.onBlock entry-point future-slot rejection - chain.validateAttestationData block-path slot bound - BlockValidationError.FutureSlot variant - node.zig error.FutureSlot handlers and cacheFutureBlock helper (the cacheFutureBlock path had no replay mechanism after the time-based block queue was removed earlier in this PR) Added: - constants.GOSSIP_DISPARITY_INTERVALS = 1 - chain.validateAttestationData gossip-only interval bound - chain.zig boundary tests covering the gossip / block divergence * spectest: bump ssz.zig pin to pick up array deserialize fix and remove skip workarounds Upstream blockblaz/ssz.zig fixed the fixed-size array deserialize bug in PR #61 (`iterate fixed-size arrays by element, not byte step`), but the fix landed on master after the 0.16 upgrade. Use the GrapeBaBa fork's `backport-array-fix-015x` branch, which cherry-picks the fix on top of 0.15.2. Drop the corresponding skip entries (SampleUint16Vector3, SampleUint64Vector4, HashTreeLayer, HashTreeOpening) from the SSZ runner. Register a TestHashTreeLayer mirror so the dispatch table can roundtrip it. Eight previously-skipped fixture cases (4 fixture files × 2 cases) now run real serialize/deserialize roundtrips instead of silent pass. SSZ Union (SampleUnionNone, SampleUnionTypes — 5 cases) remains skipped pending union(enum) registration, tracked separately. * spectest: implement justifiability and verify_signatures runners Two leanSpec fixture kinds were skipped at parse time because zeam's spectest framework only knew about state_transition / fork_choice / ssz. Add the two missing kinds. Justifiability (33 fixtures, all real): each fixture pins a (slot, finalizedSlot) pair and an expected (delta, isJustifiable) output. The runner calls types.IsJustifiableSlot(finalized, slot) and asserts both the delta arithmetic and the predicate result match. verify_signatures (7 fixtures): the runner is wired through and deserialization is sketched, but every current fixture is generated with leanEnv=test (signature_len ≈ 700 bytes), while zeam runs against the production scheme (SIGSIZE=2536). Loading these into a production SignedBlock is a hard size mismatch, so each case validates the fixture's structural shape and then surfaces a clear skip message naming the scheme gap. When zeam grows a test-mode XMSS path (or leanSpec emits prod-scheme fixtures), the production branch in runCase will pick them up without further wiring. Adds: - FixtureKind variants: justifiability, verify_signatures - runner/justifiability_runner.zig - runner/verify_signatures_runner.zig - lib.zig exports Test counts: 270/270 pass (33 new justifiability + 7 new verify_signatures skipped with explicit reason; previously 230 generated and the 40 here were silently dropped at parse time). * spectest: real verification path for verify_signatures fixtures leanSpec generates verify_signatures fixtures with leanEnv=test (LOG_LIFETIME=8, DIMENSION=4, ~424-byte signatures), which the production scheme (LOG_LIFETIME=32, DIMENSION=46, 2536-byte signatures) cannot parse. Earlier the runner short-circuited every case with a scheme-mismatch skip. Add a parallel test-scheme XMSS verify FFI that always compiles alongside the production path: - rust/hashsig-glue: new test_scheme module exporting hashsig_test_verify_ssz, instantiated against SchemeAbortingTargetSumLifetime8Dim46Base8 (note: leansig type alias name is misleading — the constants inside use DIMENSION=4, matching leanSpec TEST_CONFIG exactly). - pkgs/xmss: extern declaration + xmss.verifySszTest wrapper. Runner now parses the anchor state and signed block with the existing state_transition_runner helpers (made pub for reuse), allocates the proposer signature as a variable-length []u8 (sidestepping the prod SIGSIZE=2536 fixed array), hashes the block body, and dispatches to verifySsz / verifySszTest based on leanEnv. expectException maps to an expected verification failure. 6 of 7 fixtures now run real cryptography: - test_invalid_proposer_signature → verifySszTest rejects - test_proposer_signature, test_single_validator_attestation, test_proposer_and_attester_signatures, test_all_four_validators_attesting, test_multiple_attestation_groups_same_data → verifySszTest accepts The remaining fixture (test_invalid_aggregated_attestation_signature) carries a body attestation; it still skips with a clear note pending a test-scheme leanMultisig wrapper. Reuses the SignedBlock/BeamBlock types directly. Only the proposer signature differs in byte count between the two schemes; the validator public keys (52 bytes), block container, and AggregatedSignatureProof (variable byte list) are scheme-agnostic. * spectest: cover body-attestation verify path in verify_signatures runner The previous commit ran real cryptographic verification on the proposer signature only, and skipped the lone fixture (test_invalid_aggregated_attestation_signature) that carries body attestations. Wire that case through. After verifying the proposer signature, walk each block-included aggregated attestation: parse participants and proofData out of the fixture, load attestation pubkeys from the anchor state, hash the attestation data, and call AggregatedSignatureProof.verify (which dispatches to leanMultisig). leanMultisig's Rust glue is hardcoded to the production scheme; with test-scheme bytes from the fixture the prod path will reject at deserialization. That happens to be the expected outcome here — the fixture asserts the implementation rejects the corrupted aggregated signature — so the path closes cleanly without forking leanMultisig. The runner now treats expectException as "at least one signature rejected" (proposer or any body attestation), and absent-exception as "all signatures verified". The bookkeeping is in a single any_failure flag rather than the previous proposer-only branch. A future leanSpec valid-case fixture with body attestations would need a parallel test-scheme leanMultisig FFI; the limitation is documented in verifyBodyAttestations. 7/7 verify_signatures fixtures now run real verification, 0 skips. * spectest: skip body-attestation verify under leanEnv=test instead of relying on prod-rejects-test coincidence The previous commit covered the body-attestation path by routing the fixture through zeam's production leanMultisig FFI. For the lone fixture that exercises that path (test_invalid_aggregated_attestation_signature), the prod path rejects test-scheme bytes at AggregatedXMSS::deserialize — which happens to match expectException, so the test "passes". That is structural coincidence, not real cryptographic rejection: a future valid-case body-attestation fixture under leanEnv=test would also be rejected by the prod path and the test would falsely report a failure. Rather than carry that latent false-positive, branch explicitly. When leanEnv=test and the block carries any body attestations, surface a clear skip naming the leanMultisig test-scheme FFI gap. The 6 fixtures without body attestations still run real test-scheme XMSS verification through the proposer signature path. Production-scheme fixtures (none today) would continue to exercise the body-attestation path through the prod leanMultisig glue. Genuine coverage of the seventh fixture requires forking leanMultisig to add a parallel test-scheme rec_aggregation (parallel constants, types, cached bytecode, and verify entry point). Tracked separately — not blocking PR #715. * spectest: implement slot_clock and api_endpoint runners slot_clock (25 fixtures, all real): pure-arithmetic runner covering the five operations leanSpec exposes — current_slot, current_interval, total_intervals, from_slot, from_unix_time. Each one is a single arithmetic identity against the embedded clock config, with a clamp to zero for inputs that fall before genesis. Operations are derived locally from SECONDS_PER_SLOT (params) and INTERVALS_PER_SLOT (node constants) so the spec config is sanity-checked against zeam's compile- time constants on every fixture. api_endpoint (12 fixtures, 7 real + 5 skipped): - /lean/v0/health (1) — fixed body parity check. - /lean/v0/admin/aggregator GET (2) — mirrors handleAggregatorAdmin's response shape against initialIsAggregator. - /lean/v0/admin/aggregator POST (4) — mirrors handleAggregatorPost's {is_aggregator, previous} response, including the idempotent enable / disable cases. - /lean/v0/states/finalized, /lean/v0/fork_choice, /lean/v0/checkpoints/justified (5) — surface a clear skip naming the endpoint; covering them needs a constructed BeamChain (anchor state, validators, finalized checkpoint, proto-array nodes) that the spec runner doesn't currently build. The runner is a behavioural-parity test, not a full handler integration: it reproduces each handler's response generation locally and compares against the fixture's expectedStatusCode / expectedContentType / expectedBody. Content-type is matched as a prefix because zeam emits "application/json; charset=utf-8" while leanSpec pins "application/json". 307/307 tests pass; 6 explicit skips total (5 api_endpoint chain-state endpoints + 1 leftover verify_signatures body-attestation case). * spectest: implement networking_codec runner (varint, snappy, gossip topic/msg id, discovery routing) leanSpec emits 135 networking_codec fixtures across 9 codec families, and zeam previously dropped them all at parse time. Add a unified runner that dispatches on codecName and reuses zeam's production codecs where they exist: - varint (14): pure LEB128 encode + length check - gossip_topic (8): network.LeanNetworkTopic.encode - gossip_message_id (8): SHA-256(domain || topic_len_le_u64 || topic || data) truncated to 20 bytes — mirrors libp2p-glue's message_id_fn - log2_distance (8) + xor_distance (6): bitwise on 32-byte node IDs - snappy_block (8) + snappy_frame (7): round-trip via snappyz / snappyframesz. Snappy compression is non-canonical — emitter choices about literal/copy tags or chunk layout differ between implementations — so wire-format interop is verified by decode(leanSpec output) == input AND encode(input)→decode→input, not by byte-equal encoder agreement. Codecs requiring infrastructure zeam doesn't expose yet (RLP for ENR / discv5, protobuf for gossipsub RPC, secp256k1 for peer_id, varint+ssz framing for reqresp) are skipped at runtime with a clear codec-name reason. 75 such skips covering reqresp_codec (21), gossipsub_rpc (26), enr_and_peer_id (14), discv5_messages (15) — these are the natural follow-on PRs. Module wiring: zeam_spectests now imports @zeam/network, snappyz, and snappyframesz so the runner can call zeam's codecs directly. Empty-input snappy_frame is skipped with a separate note: zeam's snappyframesz rejects a stream that contains only the 10-byte magic header (no data chunks); leanSpec's encoder emits exactly that. Tracked separately as a snappyframesz library gap. 442/442 tests pass; 83 total runtime skips. networking_codec contributes 59 real and 76 skipped. * spectest: bump snappyframesz to pick up empty-stream decode fix blockblaz/snappyframesz#8 (closes #7) makes decode accept a identifier-only frame as a valid empty payload. Bump the pin to the post-merge tip and drop the local skip in the snappy_frame runner. The snappy_frame_empty fixture now exercises the same round-trip path as every other snappy fixture: leanSpec's 10-byte "\xff\x06\x00\x00sNaPpY" stream decodes to an empty slice through zeam's snappyframesz, and zeam's encoder→decoder pair round-trips the empty input cleanly. Total runtime skips drop from 83 to 82. * fix(lint): reorder imports in hashsig-glue test_scheme to satisfy rustfmt * chore: bump leanSpec to a5a05f9 (lstar fork rename + 56 commits) leanSpec changes since 9b89651: - Stages 1-6 of #686: large refactor of forks/lstar (decoupled subspecs, ForkProtocol surface, generic Store). - PR #710: renamed devnet fork to lstar everywhere. - 30+ new test-vector PRs across networking, SSZ, fork choice, state transition, sync, poseidon_permutation, gossipsub_handler, discovery_crypto fixture formats. zeam side: - pkgs/spectest/src/fork.zig: rename `devnet` Fork to `lstar` (name=Lstar, path=lstar, symbol=forks.lstar) to match leanSpec PR #710 directory layout. - Generated index (pkgs/spectest/src/generated/) is gitignored; will be regenerated next `zig build spectest:generate`. * spectest: port to zig 0.16 (post-main-merge) The PR branch's full spectest infrastructure (runners + lib glue) was written against zig 0.15. Main's #784 upgraded zeam to zig 0.16 and silently broke the spectest because main only had a minimal runner set (fork_choice + state_transition + ssz) — the regression wasn't visible until this branch's full suite was merged. Changes: build.zig.zon: - ssz: switch back to blockblaz/ssz.zig (404762835d), the 0.16-compatible upstream. The GrapeBaBa cherry-pick fork pinned in earlier PR commits has `array_list.Aligned(u8,null).writer` API that 0.16 removed. The array-deserialize fix the fork existed for is now on blockblaz/ssz.zig master. - snappyframesz: switch back to df262c69. The newer cb0a08e4 (empty- stream decode fix) uses `meta.intToEnum` which 0.16 removed; the upstream fix branch has not been ported to 0.16. The networking_codec test_snappy_frame_empty fixture is the one that needs it — it stays failing until upstream snappyframesz is ported. pkgs/spectest/src/lib.zig: - Replace `std.testing.refAllDecls` with a local `refAllDeclsRecursive` (the std variant was removed in 0.16). With the shallow version, only the wrapper test was discovered ("1/1 generated fixtures...OK" with zero inner tests actually run); the recursive walk now expands the full ~500-test tree. pkgs/spectest/src/generator.zig: - Drop the `else => return err` prong from the emitted `resolveFixturesRoot` switch — `ResolveError` is a single-variant error set, and 0.16 rejects unreachable else prongs. pkgs/spectest/src/runner/*.zig: - `std.fs.Dir` → `std.Io.Dir` (the directory type moved in 0.16). - `dir.readFileAlloc(allocator, path, max)` → `dir.readFileAlloc(std.testing.io, path, allocator, .limited(max))` — the 0.16 signature takes an explicit `io` first, swaps `(path, allocator)` order, and uses `Io.Limit` instead of usize. - `std.json.ObjectMap.init(allocator)` → `var x: ObjectMap = .empty;` + per-call `put(allocator, ...)` — ObjectMap is the unmanaged map in 0.16 and `init` now takes `(gpa, key_list, value_list)`. Updated `putString` helper to thread the allocator through. - `std.ArrayList(T){}` → `: std.ArrayList(T) = .empty;` — the 0.16 unmanaged ArrayList rejects the empty-struct literal (it has multiple fields now). pkgs/spectest/src/skip.zig: - Replace `std.once` (removed in 0.16) with a hand-rolled CAS-based one-shot guard using `std.atomic.Value(bool)`. - Replace `std.process.getEnvVarOwned` (removed in 0.16) with `std.testing.environ.getPosix(...)` — the test runner initialises `std.testing.environ` and that's the only env-read surface we need here (skip.zig is consulted exclusively from spectest test code). Test results: `zig build spectest` now compiles and runs the full suite. 490 / 513 pass. The 23 failures break down as: - 11 new ssz fixtures (test_decode_failure_smoke / test_decode_rejections / test_merkleization_boundaries from leanSpec PRs #646/#649) — runner doesn't yet recognise bitlist / bitvector / merkleization-boundary fixture variants; ports flagged as UnsupportedFixture. - 6 new fork-choice attestation validation fixtures (test_safe_target, test_gossip_attestation_validation / test_gossip_aggregated_*, test_equivocation from leanSpec PRs #682/#680/#644) — fixtures expect specific reject behaviour at the disparity boundary that the runner needs to align with. - 1 new state_transition test_slot_monotonicity case (#643). - 1 new networking_codec test_snappy_frame_empty case (needs the not-yet-0.16-ported empty-stream snappy fix). - 4 misc fixtures whose runner mapping changed shape post-leanSpec refactor (Stage 1-6 of #686). Net: the merge + leanSpec bump is mechanically clean and the spectest runner is back in business under 0.16. The 23 failures are content-level deltas from the leanSpec update, not regressions introduced by this merge. * spectest: align with leanSpec a5a05f9; full 513/513 pass Drive the remaining 23 failures from the leanSpec bump down to 0. The failures broke into five buckets — each fixed below. 1. SSZ runner: support new boundary / decode-rejection fixture types (pkgs/spectest/src/runner/ssz_runner.zig) - Register BoundaryBitvector{1,7,9,255,256,257}, BoundaryBitlist256, BoundaryUint64List32 (leanSpec PR #646 merkleization boundaries). - Register DecodeBitvector16, DecodeBitlist8, SmokeBitlist8 (PRs #649 decode_rejections / decode_failure_smoke). - Read `rawBytes` in addition to `serialized` and honour the `expectException` field — when present, the runner now asserts the malformed input is rejected (instead of treating a successful roundtrip as a pass). 2. Forkchoice gossip-attestation disparity bound (PR #682) (pkgs/spectest/src/runner/fork_choice_runner.zig) - `validateAttestationDataForGossip` used the old "1 whole slot" tolerance against `getCurrentSlot()`. The spec now compares `data.slot * INTERVALS_PER_SLOT` against `store.time + GOSSIP_DISPARITY_INTERVALS`, exercised by the four `*_just_beyond_disparity_boundary_rejected` / `*_one_full_slot_in_future_rejected` fixtures. 3. Forkchoice update_safe_target ignores block-pool attestations (PR #680) (pkgs/node/src/forkchoice.zig) - `onAttestation(is_from_block=true)` used to mirror the new `latestKnown` into `latestNew`, smuggling block-pool weight back into `update_safe_target` (which reads from the "new" tracker per PR #680). Drop the mirror so the pools stay strictly separate; the `test_safe_target_ignores_known_pool_at_interval_3` fixture pins the contract. 4. Attestation-tracker as ground truth for `attestationChecks` (PR #690) (pkgs/spectest/src/runner/fork_choice_runner.zig) - The check used to iterate `latest_*_aggregated_payloads` (`std.AutoHashMap`) and pick the entry with max slot. With same-slot equivocation the result depends on hashmap iteration order — the spec relies on Python dict insertion order. Switch to reading `tracker.latestKnown` / `tracker.latestNew`: zeam's tracker is populated by `onAttestation` with a strict-`>` comparison, so it already encodes "first attestation at a given slot wins" — exactly the spec semantics under equivocation. 5. State-transition slots-only fixture support (PR #643) (pkgs/spectest/src/runner/state_transition_runner.zig) - `test_process_slots_target_equal_to_state_slot_rejected` ships `pre` + `expectException` with an empty `blocks` array. Previously the runner rejected this shape outright. Now: when no blocks are supplied and an `expectException` is set, call `pre_state.process_slots(state.slot)` and verify it returns an error (zeam surfaces this as `InvalidPreState`; the spec name is `AssertionError` "Target slot must be in the future"). 6. Networking codec: empty snappy frame (leanSpec `test_snappy_frame_empty`, blocked on blockblaz/snappyframesz#8) (build.zig.zon, pkgs/third_party/snappyframesz/...) - Upstream `v0.16.0` branch lacks the empty-stream decode fix; the branch that has the fix (`main` at cb0a08e) is still on a 0.15.2 commit base. Vendor the v0.16.0 source under `pkgs/third_party/snappyframesz` and apply the 2-line `if (!saw_data_chunk)` → `if (!saw_stream_identifier and !saw_data_chunk)` guard in both `decodeFromReader` and `decodeFramed`. Switch the `snappyframesz` dependency to `.path = "pkgs/third_party/..."`. Build / runner plumbing fixes that surfaced along the way: - pkgs/spectest/src/generator.zig: `resolveFixturesRoot` now returns `[:0]u8` instead of `[]u8`. `realPathFileAlloc` in 0.16 returns a null-terminated slice; demoting to `[]u8` shortens the slice length by one and crashes the DebugAllocator with "Allocation size N bytes does not match free size N-1" at the `defer test_allocator.free(...)` site. - build.zig: add `spectests.step.dependOn(&run_spectest_generate.step)`. Without it, the test-binary compile and the generator race when zig fans out parallel build steps, intermittently producing "file contents changed during update" or `FileNotFound` on generated/index.zig. The new edge makes `zig build spectest` idempotent. Pre-existing merge fallout cleaned up while here (chain.zig): - BeamChain.init was missing the `pending_blocks` field initialisation that main's #788 pending-queue refactor added — fill it with `.empty`. - HEAD's `processFinalizationAdvancement: below PRUNE_NODE_THRESHOLD keeps pre-finalized ancestors` test (preserved from the merge) used the zig 0.15 `Dir.realpathAlloc` API and the pre-refactor `std.StringHashMap(PeerInfo)` for ConnectedPeers. Port both call sites to the current API (`Dir.realPathFileAlloc(io, ".", gpa)` and `network.ConnectedPeers`). Verification: - `zig build spectest` — All 513 tests passed (idempotent across re-invocations). - `zig build test` — All 14 test groups pass (251 unit tests). * forkchoice: drop lazy proto-array prune (PR #754 extra scope rollback) PR #754 introduced a `PRUNE_NODE_THRESHOLD = 64` gate in `chain.processFinalizationAdvancement` that skipped the post-finalization `forkChoice.rebase` call until at least 64 pre-finalized nodes sat before the finalized anchor in proto-array. The intent was to widen the grace window for in-flight attestations whose `source` / `target` / `head` still referenced an ancestor about to be pruned, where the eager rebase would otherwise surface `Unknown{Source,Target,Head}Block` from `validateAttestationData`. On reflection that mitigation is heavier than the symptom it addresses under current network conditions, and 3SF-mini already tolerates the attestation-bounce. Roll the threshold gate back so finalization advance unconditionally rebases proto-array — matches the behaviour zeam shipped before PR #754 and keeps proto-array memory bounded by finalization rather than by a node-count constant. Changes: - pkgs/node/src/constants.zig: remove `PRUNE_NODE_THRESHOLD`. - pkgs/node/src/chain.zig: * `processFinalizationAdvancement` step 5 collapses back to the unconditional `if (pruneForkchoice) try self.forkChoice.rebase(latestFinalized.root, &canonical_view);` shape (incl. dropping the `getProtoNodeIndex` lookup + invariant- violation log path that only existed to gate the threshold). * Delete the `processFinalizationAdvancement: below PRUNE_NODE_THRESHOLD keeps pre-finalized ancestors` regression test — it was pinning the now-removed contract. PR #754's other scope items remain in place: - GOSSIP_DISPARITY_INTERVALS attestation rule (spec alignment, leanSpec PR #682). - Split attestation validation (gossip interval-bound, block +1 slot). The two further scope items (lazy proto-array prune, dropping zeam's block-level future-slot rejection / time-based block queue) are intentionally NOT brought across. Verification: - `zig build spectest` — All 513 tests passed. - `zig build test` — 14/14 test groups pass. * spectest: drop duplicate `zig_snappy` entry from vendored snappyframesz The vendored copy of `pkgs/third_party/snappyframesz/build.zig.zon` carried two identical dependency entries — `.snappyz` and `.zig_snappy` — with the same URL and hash. The library's `build.zig` only references the first one (`b.dependency("snappyz", ...)`); the second is dead weight inherited from the upstream `v0.16.0` tip. Drop it here so the vendored copy doesn't propagate the noise. The same change is also stacked on the upstream PR that promotes `v0.16.0` to `main` (blockblaz/snappyframesz#10) so the next sync won't reintroduce it. * deps: pin snappyframesz to upstream main; drop vendored copy Upstream blockblaz/snappyframesz#10 merged the zig 0.16 upgrade + the empty-stream decode fix into main. Pin e95759d9 and delete pkgs/third_party/snappyframesz/. * deps: actually pin snappyframesz URL (build.zig.zon) Fix-up for 6a1a813 — that commit staged the `git rm` of the vendored copy but the `build.zig.zon` edit was never staged (the chained `git add build.zig.zon pkgs/third_party` aborted on the now-missing `pkgs/third_party` path before reaching the first argument). CI on 6a1a813 failed with `unable to open '.../pkgs/third_party/snappyframesz': FileNotFound` because the manifest still pointed `.snappyframesz.path` there. Now the manifest carries the `git+https://github.com/blockblaz/snappyframesz#e95759d9...` pin that 6a1a813's commit message claimed. * ci: rename `--fork=devnet` to `--fork=Lstar` in fixture-gen step leanSpec PR #710 renamed the `devnet` fork to `lstar`; the CI step `Generate LeanSpec fixtures` still passed `--fork=devnet` to `uv run fill`, so on the bumped submodule the command exits with `Unsupported fork for consensus layer: devnet` (exit 4) and the test job is cancelled before zig tests run. Switch to `Lstar` (the case that `fill` expects, matching the rest of this branch's leanSpec interactions). --------- Co-authored-by: Anshal Shukla <53994948+anshalshukla@users.noreply.github.com> Co-authored-by: Parthasarathy Ramanujam <1627026+ch4r10t33r@users.noreply.github.com>
Summary
Four fixes to align attestation and block handling with leanSpec.
1. Switch gossip-attestation future-slot bound to interval-grained
zeam's gossip-attestation time check was slot-grained —
data.slot <= current_slot + 1. A whole-slot tolerance lets an adversary pre-publish next-slot aggregates ahead of any honest validator (~800 ms head start atSECONDS_PER_SLOT=4/INTERVALS_PER_SLOT=5).This PR adopts the interval-grained rule, expressed against the forkchoice store's interval clock:
with a new
GOSSIP_DISPARITY_INTERVALS = 1constant. One interval bounds the head start to NTP drift. Block-included attestations skip the time check entirely; they are trusted under the block's signature + STF validation.2. Lazy proto-array prune on finalization (root-cause fix)
chain.processFinalizationAdvancementcalledforkChoice.rebase(...)unconditionally on every finalization advance. Rebase removes pre-finalized ancestors from proto-array and remaps attestation-tracker indices, so any in-flight attestation whose source / target / head still referenced one of those ancestors failed validation withUnknown{Source,Target,Head}Block— even though the vote was valid at sign time.3SF-mini's fast finalization cadence puts the race window on the same order of magnitude as normal gossip propagation, so attestations drop across every finalization boundary under normal load. Sync catch-up (which can advance finalization multiple steps per tick) makes it dramatically worse, multiplying
BlocksByRootre-fetch storms and delaying justification.Fix: gate the
rebasecall on the finalized node's position in proto-array. While that index is belowconstants.PRUNE_NODE_THRESHOLD = 64(≈256 s grace atSECONDS_PER_SLOT=4), skip the rebuild and leave the pre-finalized prefix in place so in-flight attestations still resolve. Once the threshold is crossed the full rebase fires and amortises the vec rebuild + tracker remap across many finalizations.Because the root cause is fixed at the prune site, the stricter re-validation in
onGossipAggregatedAttestationis also restored — earlier revisions of this PR had skipped it as a tolerance workaround; with the grace window in place it's safe again and catches malformed aggregates that would otherwise slip through.3. Drop zeam-specific block-level future-slot rejection and time-based block queue
Two pieces were entangled and are removed together:
Block-level future-slot rejection —
validateBlock(and the entry-point checkonBlockadded in an earlier revision of this PR) rejectedblock.slot > current_slot + MAX_FUTURE_SLOT_TOLERANCEasBlockValidationError.FutureSlot. TheMAX_FUTURE_SLOT_TOLERANCEconstant was originally introduced for attestation validation and later reused at the block layer; that reuse never had a corresponding spec rule. Block admission is now gated by parent / signature / STF only.Time-based block queue — zeam previously queued future-slot blocks on a local wall clock and replayed them on
onInterval. The replay path was already removed in an earlier revision of this PR; what remained (cacheFutureBlock, the twoerror.FutureSlothandlers innode.zig) was a graveyard for blocks that would never come back. Both are gone now. Blocks within the parent / signature / STF gate process inline; anything beyond is cached innode.zigfor later retry, identical to the rest of the ecosystem.The
MAX_FUTURE_SLOT_TOLERANCEconstant andBlockValidationError.FutureSlotvariant are removed.4. Fill in missing leanSpec attestation topology / consistency checks
validateAttestationDatawas missing two rules from leanSpecvalidate_attestation:data.head.slot >= data.target.slot. The head checkpoint must not be older than the target; without it, malformed attestations with a stale head pointer pass through zeam.head_block.slot == data.head.slot. The checkpoint slot must match the slot of the block it points at; zeam used to look uphead_blockand immediately drop it with_ = head_block; // Will be used in future validations, so a mismatched head slot silently passed.Added as
AttestationValidationError.HeadOlderThanTargetandHeadCheckpointSlotMismatch. Caught by the leanSpec fork-choice fixturestest_attestation_head_older_than_target_rejectedandtest_aggregated_attestation_head_slot_mismatch_rejected.