Skip to content

fix: align attestation handling with leanSpec#754

Open
GrapeBaBa wants to merge 13 commits into
mainfrom
fix/align-attestation-slot-tolerance-with-leanspec
Open

fix: align attestation handling with leanSpec#754
GrapeBaBa wants to merge 13 commits into
mainfrom
fix/align-attestation-slot-tolerance-with-leanspec

Conversation

@GrapeBaBa
Copy link
Copy Markdown
Member

@GrapeBaBa GrapeBaBa commented Apr 17, 2026

Summary

Four fixes to align attestation and block handling with leanSpec.

1. Switch gossip-attestation future-slot bound to interval-grained

zeam's gossip-attestation time check was slot-grained — data.slot <= current_slot + 1. A whole-slot tolerance lets an adversary pre-publish next-slot aggregates ahead of any honest validator (~800 ms head start at SECONDS_PER_SLOT=4 / INTERVALS_PER_SLOT=5).

This PR adopts the interval-grained rule, expressed against the forkchoice store's interval clock:

data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS

with a new GOSSIP_DISPARITY_INTERVALS = 1 constant. One interval bounds the head start to NTP drift. Block-included attestations skip the time check entirely; they are trusted under the block's signature + STF validation.

2. Lazy proto-array prune on finalization (root-cause fix)

chain.processFinalizationAdvancement called forkChoice.rebase(...) unconditionally on every finalization advance. Rebase removes pre-finalized ancestors from proto-array and remaps attestation-tracker indices, so any in-flight attestation whose source / target / head still referenced one of those ancestors failed validation with Unknown{Source,Target,Head}Block — even though the vote was valid at sign time.

3SF-mini's fast finalization cadence puts the race window on the same order of magnitude as normal gossip propagation, so attestations drop across every finalization boundary under normal load. Sync catch-up (which can advance finalization multiple steps per tick) makes it dramatically worse, multiplying BlocksByRoot re-fetch storms and delaying justification.

Fix: gate the rebase call on the finalized node's position in proto-array. While that index is below constants.PRUNE_NODE_THRESHOLD = 64 (≈256 s grace at SECONDS_PER_SLOT=4), skip the rebuild and leave the pre-finalized prefix in place so in-flight attestations still resolve. Once the threshold is crossed the full rebase fires and amortises the vec rebuild + tracker remap across many finalizations.

Because the root cause is fixed at the prune site, the stricter re-validation in onGossipAggregatedAttestation is also restored — earlier revisions of this PR had skipped it as a tolerance workaround; with the grace window in place it's safe again and catches malformed aggregates that would otherwise slip through.

3. Drop zeam-specific block-level future-slot rejection and time-based block queue

Two pieces were entangled and are removed together:

Block-level future-slot rejectionvalidateBlock (and the entry-point check onBlock added in an earlier revision of this PR) rejected block.slot > current_slot + MAX_FUTURE_SLOT_TOLERANCE as BlockValidationError.FutureSlot. The MAX_FUTURE_SLOT_TOLERANCE constant was originally introduced for attestation validation and later reused at the block layer; that reuse never had a corresponding spec rule. Block admission is now gated by parent / signature / STF only.

Time-based block queue — zeam previously queued future-slot blocks on a local wall clock and replayed them on onInterval. The replay path was already removed in an earlier revision of this PR; what remained (cacheFutureBlock, the two error.FutureSlot handlers in node.zig) was a graveyard for blocks that would never come back. Both are gone now. Blocks within the parent / signature / STF gate process inline; anything beyond is cached in node.zig for later retry, identical to the rest of the ecosystem.

The MAX_FUTURE_SLOT_TOLERANCE constant and BlockValidationError.FutureSlot variant are removed.

4. Fill in missing leanSpec attestation topology / consistency checks

validateAttestationData was missing two rules from leanSpec validate_attestation:

  • Topologydata.head.slot >= data.target.slot. The head checkpoint must not be older than the target; without it, malformed attestations with a stale head pointer pass through zeam.
  • Consistencyhead_block.slot == data.head.slot. The checkpoint slot must match the slot of the block it points at; zeam used to look up head_block and immediately drop it with _ = head_block; // Will be used in future validations, so a mismatched head slot silently passed.

Added as AttestationValidationError.HeadOlderThanTarget and HeadCheckpointSlotMismatch. Caught by the leanSpec fork-choice fixtures test_attestation_head_older_than_target_rejected and test_aggregated_attestation_head_slot_mismatch_rejected.

Copilot AI review requested due to automatic review settings April 17, 2026 10:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Aligns attestation “future slot” validation tolerance with leanSpec by allowing a +1 slot clock-disparity margin for all attestations (gossip and block).

Changes:

  • Unifies attestation future-slot validation to allow current_slot + MAX_FUTURE_SLOT_TOLERANCE regardless of attestation origin.
  • Updates the “gossip vs block future slot handling” test assertions to reflect the unified tolerance.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1421 to +1424
// Per leanSpec: allow current_slot + 1 for clock disparity tolerance,
// regardless of whether the attestation is from gossip or a block.
const current_slot = self.forkChoice.getCurrentSlot();
const max_allowed_slot = if (is_from_block)
current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1
else
current_slot; // Gossip attestations: no future slots allowed
const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE;
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function doc comment above validateAttestationData still describes different future-slot tolerances for gossip vs block attestations, but the implementation now applies the same current_slot + MAX_FUTURE_SLOT_TOLERANCE rule regardless of is_from_block. Update the doc comment to match the new unified behavior (and keep is_from_block description focused on logging/metrics if that’s its remaining purpose).

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines 2312 to 2315
// Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1)
// Both gossip and block attestations allow current_slot + 1
try beam_chain.validateAttestationData(next_slot_attestation.message, false);
try beam_chain.validateAttestationData(next_slot_attestation.message, true);
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new current_slot + 1 tolerance applied to gossip attestations, the earlier test attestation validation - comprehensive still expects AttestationTooFarInFuture for an attestation at current_slot + 1 (slot=3 when current is 2). That expectation should be updated (and the failing case shifted to current_slot + 2) to keep the suite consistent with the new validation rule.

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +2312 to +2313
// Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1)
// Both gossip and block attestations allow current_slot + 1
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test’s name and header comment still say gossip and block attestations have different future-slot tolerances, but the assertions now verify unified behavior. Rename the test and update the top-of-test comments to avoid misleading future maintainers.

Suggested change
// Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1)
// Both gossip and block attestations allow current_slot + 1
// Per leanSpec store.py:320: attestation data is valid up to current_slot + Slot(1).
// This test verifies the unified behavior for both validation modes.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings April 17, 2026 14:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1460 to +1462
// Per leanSpec: on_gossip_aggregated_attestation does NOT call validate_attestation.
// Attestation data was already validated when individual gossip attestations arrived.
// Re-validating here would fail after finalization prunes the source block from protoarray.
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says gossip aggregated attestations are no longer validated, but the onGossip() handler for .aggregation still calls validateAttestationData(signed_aggregation.data, false) before invoking this function (chain.zig around line ~776). That means aggregations can still fail with UnknownSourceBlock after finalization pruning, so the intended behavior in the PR description isn’t actually achieved. Either remove/relax that upstream validation for aggregations, or update validateAttestationData to tolerate pruned source blocks (e.g., allow missing source root when data.source.slot <= finalized_slot) so aggregation processing matches leanSpec and the PR’s stated fix.

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines 1460 to 1465
// Per leanSpec: on_gossip_aggregated_attestation does NOT call validate_attestation.
// Attestation data was already validated when individual gossip attestations arrived.
// Re-validating here would fail after finalization prunes the source block from protoarray.

try self.verifyAggregatedAttestation(signedAggregation);

Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s no test covering the second behavior described in the PR (aggregations remaining valid when the source block has been pruned after finalization). Given this change alters aggregation processing rules, please add a unit/integration test that simulates finalization advancing/pruning the protoarray and then ensures a gossip aggregated attestation referencing a pruned-but-finalized source does not fail with UnknownSourceBlock.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings April 20, 2026 15:53
@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 7edeb6a to 963f6e0 Compare April 20, 2026 15:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkgs/node/src/chain.zig
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);
}
// else: threshold not reached; keep pre-finalized ancestors
// in proto-array so in-flight attestations still resolve.
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getProtoNodeIndex(latestFinalized.root) should be non-null here because getCanonicalViewAndAnalysis already requires latestFinalized.root to exist in protoArray.indices. Silently skipping the rebase when it returns null would hide an invariant violation and can lead to unbounded proto-array growth. Consider turning this into an explicit error/log (and/or falling back to rebase/returning an error) if the index lookup fails.

Suggested change
// in proto-array so in-flight attestations still resolve.
// in proto-array so in-flight attestations still resolve.
} else {
std.debug.panic(
"forkchoice invariant violated: finalized root missing from proto-array during rebase: {}",
.{latestFinalized.root},
);

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines 1539 to 1548
// Validate attestation data (same rules as individual gossip attestations).
// The earlier revision of this function skipped re-validation because the
// source block could vanish from proto-array when finalization advanced
// between publish and receipt. That race is now handled at the root —
// processFinalizationAdvancement gates the rebase call on
// PRUNE_NODE_THRESHOLD, keeping the source / target / head blocks
// addressable for the full grace window — so the stricter check is
// safe again.
try self.validateAttestationData(signedAggregation.data, false);

Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

onGossip already calls validateAttestationData(signed_aggregation.data, false) immediately before calling onGossipAggregatedAttestation. Re-validating here duplicates the proto-array lookups and log spam without adding safety (the BeamNode mutex already serializes onGossip vs onInterval). Consider removing this second validation (or refactoring so only one layer validates and is responsible for returning missing_attestation_roots).

Suggested change
// Validate attestation data (same rules as individual gossip attestations).
// The earlier revision of this function skipped re-validation because the
// source block could vanish from proto-array when finalization advanced
// between publish and receipt. That race is now handled at the root —
// processFinalizationAdvancement gates the rebase call on
// PRUNE_NODE_THRESHOLD, keeping the source / target / head blocks
// addressable for the full grace window — so the stricter check is
// safe again.
try self.validateAttestationData(signedAggregation.data, false);
// Validation is done upstream in onGossip before this function is called.

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/constants.zig Outdated
Comment on lines +37 to +39
// for mainnet (one epoch of slots); 3SF-mini's shorter finalization window
// makes 64 (≈1 eth epoch of wall clock) the better trade between grace
// coverage and memory footprint.
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment claiming “Lighthouse uses 256 for mainnet (one epoch of slots)” appears inaccurate (Ethereum mainnet epoch = 32 slots). Please correct the reference (either the slot count, or clarify what 256 represents) so readers don’t infer an incorrect epoch length from this constant’s docs.

Suggested change
// for mainnet (one epoch of slots); 3SF-mini's shorter finalization window
// makes 64 (≈1 eth epoch of wall clock) the better trade between grace
// coverage and memory footprint.
// slots for mainnet as a pruning threshold; this is not the Ethereum epoch
// length (mainnet epochs are 32 slots). 3SF-mini's shorter finalization
// window makes 64 (≈1 eth epoch of wall clock) the better trade between
// grace coverage and memory footprint.

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1300 to +1310
// 5 Rebase forkchoice — lazy prune with node-count threshold.
//
// Eager rebase drops pre-finalized ancestors from proto-array and
// remaps attestation-tracker indices. In-flight attestations whose
// source/target/head still points at one of those ancestors then
// fail the existence checks in validateAttestationData with
// Unknown{Source,Target,Head}Block, and the node burns bandwidth
// re-fetching blocks that will never come back. The grace window
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 gives
// roughly 1 eth epoch of wall clock, comfortably beyond both.
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says the fix is to “accept source blocks whose slot <= finalized_slot as valid even when absent from the protoarray.” The implementation here instead delays forkChoice.rebase via PRUNE_NODE_THRESHOLD, but validateAttestationData still hard-fails when a referenced root is absent. Please either update the PR description to match the implemented approach, or implement the described <= finalized_slot exception so behavior matches the summary.

Copilot uses AI. Check for mistakes.
@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 963f6e0 to 827a0ef Compare April 21, 2026 03:27
Copilot AI review requested due to automatic review settings April 21, 2026 03:36
@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 827a0ef to c2b3e06 Compare April 21, 2026 03:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkgs/node/src/chain.zig
Comment on lines +1300 to +1318
// 5 Rebase forkchoice — lazy prune with node-count threshold.
//
// Eager rebase drops pre-finalized ancestors from proto-array and
// remaps attestation-tracker indices. In-flight attestations whose
// source/target/head still points at one of those ancestors then
// fail the existence checks in validateAttestationData with
// Unknown{Source,Target,Head}Block, and the node burns bandwidth
// re-fetching blocks that will never come back. The grace window
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots
// (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both.
if (pruneForkchoice) {
if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| {
if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) {
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);
}
// else: threshold not reached; keep pre-finalized ancestors
// in proto-array so in-flight attestations still resolve.
}
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says pruned source blocks should be accepted when their slot <= finalized_slot even if absent from proto-array, but the change here instead delays forkchoice.rebase via a node-count threshold. If the intended behavior is to accept attestations referencing already-finalized (and potentially pruned) source/target/head, that logic is not implemented in validateAttestationData. Please either adjust the PR description to match the threshold-based approach, or implement the finalized-slot tolerance in validation.

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +2392 to +2420
@@ -2390,35 +2411,13 @@ test "attestation validation - gossip vs block future slot handling" {
.signature = ZERO_SIGBYTES,
};

// Gossip attestations: should FAIL for next slot (current + 1)
// Gossip attestations: should FAIL for slot current + 2
// Per spec store.py:177: assert attestation.slot <= time_slots
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(next_slot_attestation.message, false));

// Block attestations: should PASS for next slot (current + 1)
// Block attestations: should FAIL for slot current + 2
// Per spec store.py:140: assert attestation.slot <= Slot(current_slot + Slot(1))
try beam_chain.validateAttestationData(next_slot_attestation.message, true);
const too_far_attestation: types.SignedAttestation = .{
.validator_id = 0,
.message = .{
.slot = 3, // Too far in future
.head = types.Checkpoint{
.root = mock_chain.blockRoots[1],
.slot = 1,
},
.source = types.Checkpoint{
.root = mock_chain.blockRoots[0],
.slot = 0,
},
.target = types.Checkpoint{
.root = mock_chain.blockRoots[1],
.slot = 1,
},
},
.signature = ZERO_SIGBYTES,
};
// Both should fail for slot 3 when current is slot 1
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(too_far_attestation.message, false));
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(too_far_attestation.message, true));
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(next_slot_attestation.message, true));
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test name/comments still describe “gossip: <= current_slot; block: <= current_slot + 1”, but the code now allows +1 for both. Also, this test only checks that current+2 fails; it no longer asserts that current+1 succeeds for gossip (the behavior this PR changes). Update the assertions and comments so the test actually covers the new tolerance rule.

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/chain.zig
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots
// (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both.
if (pruneForkchoice) {
Copy link
Copy Markdown
Member

@g11tech g11tech Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would rebase drop pre finalized, we only rebase with latest finalized?

ahh ok got the context from the comment, never-mind

Comment thread pkgs/node/src/chain.zig Outdated
current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1
else
current_slot; // Gossip attestations: no future slots allowed
const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE;
Copy link
Copy Markdown
Member

@g11tech g11tech Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this is possible, block is processed only if block slot <= current slot and the attestation's in it can't be of a slot future than the block slot

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validateBlock already allows block.slot <= current_slot + MAX_FUTURE_SLOT_TOLERANCE (+1), so even for block-embedded attestations the effective upper bound on att.slot is current_slot + 1 — the +1 tolerance here is needed on the block path too.

As an aggregator, this call site handles is_from_block=false — the gossip path, which is independent of any block arrival. The race is clock skew, not block processing:

  • A remote validator signs a gossip attestation at their local slot N (interval 1) with data.slot = N.
  • It propagates to us. If our clock trails the validator's (normal in any distributed system), we can still be at slot N-1 when the message lands.
  • validateAttestationData(..., false) then sees att.slot=N > current_slot=N-1. Without +1 tolerance, we'd drop a perfectly honest vote as "too far in the future".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also removed the previous pending block processing

leanSpec store.py:320 allows current_slot + 1 for all attestations
(gossip and block) as clock disparity tolerance. zeam previously
only allowed +1 for block attestations and rejected gossip attestations
for future slots. Align with leanSpec's unified behavior.
When finalization advances, old blocks are pruned from the protoarray.
Aggregations referencing a source checkpoint that was valid when the
attestation was created can fail validation if finalization pruned the
source block between attestation creation and aggregation publishing.

Accept source blocks whose slot <= finalized_slot as valid even when
absent from the protoarray — they were canonical and are now below
the finalization horizon.

Root cause: race between finalization pruning (interval 0 of new slot)
and aggregation publishing (interval 2 of same slot), where the
aggregation's attestation data references a justified checkpoint that
got pruned during the finalization advance.
…ation

Per leanSpec store.py on_gossip_aggregated_attestation: aggregated
attestations only verify the aggregated signature proof, they do NOT
call validate_attestation on the attestation data. nlean follows the
same pattern.

The attestation data was already validated when individual gossip
attestations arrived. Re-validating in the aggregation path caused
UnknownSourceBlock errors when finalization pruned the source block
from protoarray between attestation creation and aggregation publishing.

This also reverts the "tolerate pruned source blocks" workaround in
validateAttestationData since the root cause is now fixed.
Root-cause the attestation-race previously worked around in
chain.onGossipAggregatedAttestation and validateAttestationData.

Symptom: every time finalization advanced,
processFinalizationAdvancement called forkChoice.rebase(...)
unconditionally. Rebase drops pre-finalized ancestors from
proto-array and remaps attestation-tracker indices. Any in-flight
attestation whose source / target / head referenced one of those
dropped blocks then failed validateAttestationData's existence
check with Unknown{Source,Target,Head}Block — even though the
vote was perfectly valid when the validator produced it.

3SF-mini's fast finalization cadence (one advance every few slots)
puts the race window on the same order of magnitude as normal
gossip propagation, so attestations drop across every finalization
boundary under normal load. Sync catch-up, which can advance
finalization multiple steps in a single tick, makes it much worse.

Fix — lighthouse-style lazy prune:

  * Introduce constants.PRUNE_NODE_THRESHOLD = 64. Lighthouse uses
    256 for mainnet; 3SF-mini's shorter wall-clock per finalization
    step makes 64 (~1 eth epoch worth of grace) the better trade
    between race coverage and memory footprint.
  * Gate the rebase call in processFinalizationAdvancement on the
    finalized node's index in the proto-array. While that index is
    below the threshold, skip rebase and leave the pre-finalized
    prefix addressable so in-flight attestations still resolve.
  * Expose ForkChoice.getProtoNodeIndex for the gate (mutex-safe
    wrapper over protoArray.indices).

Revert of the prior tolerance workaround:

  * onGossipAggregatedAttestation now re-validates the attestation
    data again. The "skip validate" workaround exists only because
    source blocks could vanish mid-finalization; with the threshold
    gate keeping them in place, the stricter check is safe again
    and catches malformed aggregates that used to slip through.

Test:

  * Add regression test driving the mock chain with default
    onBlock opts (pruneForkchoice = true) through 5 slots. The
    finalized node's index stays well under the threshold, so
    pre-finalized ancestors must remain resolvable via
    getProtoNode after finalization advances. Before this fix,
    the rebase call would have removed roots[1] and roots[2]
    (slots < latestFinalized.slot) from the protoArray and the
    assertion would trip.
  * Update the existing "Test 9: Attestation too far in future"
    case to use slot=current+2 — the spec-aligned +1 tolerance
    change from 09712ac made the prior slot=current+1 trip point
    pass validation; the test was not updated at the time.

The future-slot +1 tolerance fix (09712ac) is orthogonal and
preserved — spec alignment with store.py:320.
zeam was the only client among leanSpec / ream / qlean-mini /
ethlambda / nlean that queued inbound blocks on the local clock.
Every other implementation processes a block the moment its parent
state is known — their only "pending" caches are for orphan blocks
(parent unknown, awaiting backfill), never for blocks whose slot just
hasn't started yet locally.

The queue also kept block body attestations out of fork choice until
the next interval tick, delaying head updates by a full interval and
duplicating the future-slot rejection that already lives in
chain.validateBlock.

Changes:

  * Remove BeamChain.pending_blocks field, its init / deinit hooks,
    and the append site on the gossip path.
  * Remove BeamChain.processPendingBlocks (replay helper) and its
    caller in node.onInterval.
  * Drop the duplicate future-slot gate in
    forkchoice.onBlockUnlocked. Gossip-level DoS filtering continues
    to happen in chain.validateBlock (MAX_FUTURE_SLOT_TOLERANCE = 1,
    matching leanSpec store.py:320); forkchoice now mirrors
    leanSpec store.on_block and only requires a known parent plus
    non-pre-finalized slot.
  * Remove ForkChoiceError.FutureSlot — no caller returns it.
  * Update the fork-choice block-tree unit test: the old test drove
    "reject future slot" against forkchoice.onBlock; that
    responsibility has moved up to chain.validateBlock, so the test
    now just ticks the clock and processes the block.

validateBlock still rejects block.slot > current_slot + 1 with
BlockValidationError.FutureSlot; node.zig's handler for
error.FutureSlot still caches those blocks for later retry, so the
"block arrived genuinely too early" case is covered exactly as
before — just without the redundant inner-layer queue.
Copilot AI review requested due to automatic review settings April 22, 2026 09:09
@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 896043e to 82115ac Compare April 22, 2026 09:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkgs/node/src/node.zig
Comment on lines 1142 to 1144
// Sweep timed-out RPC requests to prevent sync stalls from non-responsive peers.
self.sweepTimedOutRequests();

Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After removing the forkchoice FutureSlot gate (and the pending-block replay), BeamChain.onBlock no longer returns error.FutureSlot (it only returns BlockValidationError.FutureSlot from validateBlock, which is used in the gossip path). However, processCachedDescendants in this file still checks err == error.FutureSlot when calling self.chain.onBlock(...), which will now be unreachable and likely fail to compile because FutureSlot is no longer in the inferred error set. Please remove that branch or perform an explicit chain.validateBlock check before calling onBlock for cached blocks and handle BlockValidationError.FutureSlot there.

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/chain.zig
Comment on lines +1242 to +1245
if (pruneForkchoice) {
if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| {
if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) {
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new prune gate skips rebase when getProtoNodeIndex(latestFinalized.root) returns null. Previously this would have surfaced as ForkChoiceError.InvalidTargetAnchor; now it silently does nothing, which can mask a forkchoice inconsistency and also prevent pruning from ever running (proto-array growth) if that situation occurs. Consider treating a null index as an error (or at least logging loudly) rather than silently skipping rebase.

Copilot uses AI. Check for mistakes.
@GrapeBaBa GrapeBaBa requested a review from g11tech April 22, 2026 09:23
Copilot AI review requested due to automatic review settings April 23, 2026 09:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

leanSpec `store.py:validate_attestation` enforces two topology / consistency
rules that zeam had been skipping:

- `data.head.slot >= data.target.slot` — head checkpoint cannot be older
  than the target (topology check, store.py:304).
- `head_block.slot == data.head.slot` — head checkpoint slot must match
  the actual block slot the checkpoint points at (consistency check,
  store.py:314).

Without these, malformed gossip / aggregated attestations with a stale or
mismatched head pointer slipped through zeam's validation while leanSpec
and the rest of the ecosystem reject them. Add the two checks with
matching `HeadOlderThanTarget` / `HeadCheckpointSlotMismatch` error
variants and remove the stale `_ = head_block; // Will be used in future`
placeholder now that the node is actually consulted.
@GrapeBaBa GrapeBaBa changed the title fix: align attestation future slot tolerance with leanSpec fix: align attestation handling with leanSpec Apr 23, 2026
The block explained why re-validation was restored — context that is
meaningful in a PR description but rots as a source comment: future
readers have no "earlier revision" to compare against, and the rebase
gating it references is documented at the rebase site itself.
Copilot AI review requested due to automatic review settings April 23, 2026 13:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

pkgs/node/src/node.zig:1197

  • After removing processPendingBlocks, cached “future-slot” blocks are only retried via processReadyCachedBlocks(slot). That helper currently considers blocks “ready” only when block.slot <= current_slot (see processReadyCachedBlocks in this file), which delays retry by ~1 slot relative to MAX_FUTURE_SLOT_TOLERANCE (since a block becomes valid once block.slot <= current_slot + 1). Consider adjusting the readiness condition (or the slot passed in) so cached future blocks are retried as soon as they fall within the +1 tolerance window.
                // Sweep timed-out RPC requests to prevent sync stalls from non-responsive peers.
                self.sweepTimedOutRequests();

                self.processReadyCachedBlocks(slot);
            }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkgs/node/src/chain.zig
Comment on lines +2669 to +2673
const signed_block = mock_chain.blocks[i];
const current_slot = signed_block.block.slot;
try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false);
const missing_roots = try beam_chain.onBlock(signed_block, .{});
allocator.free(missing_roots);
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regression test doesn’t currently exercise processFinalizationAdvancement/rebase behavior: it calls beam_chain.onBlock(...) but never calls beam_chain.onBlockFollowup(...) (which is where finalization advancement triggers the gated rebase). As written, it would also pass on the pre-fix code because no rebase occurs. Consider invoking onBlockFollowup(true, &signed_block) after each imported block (or driving the chain through the same entrypoints node uses) so the test actually fails without the threshold gate.

Suggested change
const signed_block = mock_chain.blocks[i];
const current_slot = signed_block.block.slot;
try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false);
const missing_roots = try beam_chain.onBlock(signed_block, .{});
allocator.free(missing_roots);
var signed_block = mock_chain.blocks[i];
const current_slot = signed_block.block.slot;
try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false);
const missing_roots = try beam_chain.onBlock(signed_block, .{});
allocator.free(missing_roots);
try beam_chain.onBlockFollowup(true, &signed_block);

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/forkchoice.zig Outdated
Comment on lines +1628 to +1638
// The future-slot gate (slot * INTERVALS_PER_SLOT > fcStore.time)
// used to live here but duplicated the gossip-level check in
// chain.validateBlock and disagreed with every other leanSpec
// consumer (ream, qlean-mini, ethlambda, nlean — none of which
// key block admission on the local clock). Rejecting here also
// caused chain.zig to keep a parallel pending-block queue just
// to replay these blocks on the next interval, which meant the
// block / its body attestations couldn't contribute to fork
// choice weight until the clock caught up. Now we mirror
// leanSpec store.on_block: if parent is known and the slot is
// not pre-finalized, just process it.
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the forkchoice-level future-slot gate means some block-processing paths can now import blocks that are far beyond MAX_FUTURE_SLOT_TOLERANCE. In particular, the RPC/sync path calls BeamChain.onBlock(...) directly (e.g. pkgs/node/src/node.zig:677) without running chain.validateBlock, so ForkChoice.onBlock will accept and fully process arbitrarily-future blocks as long as the parent is known. Consider enforcing the same future-slot bound for all imports (e.g. call validateBlock inside BeamChain.onBlock, or validate in the RPC handler) so malicious peers can’t drive expensive STF/forkchoice work with far-future blocks.

Suggested change
// The future-slot gate (slot * INTERVALS_PER_SLOT > fcStore.time)
// used to live here but duplicated the gossip-level check in
// chain.validateBlock and disagreed with every other leanSpec
// consumer (ream, qlean-mini, ethlambda, nlean — none of which
// key block admission on the local clock). Rejecting here also
// caused chain.zig to keep a parallel pending-block queue just
// to replay these blocks on the next interval, which meant the
// block / its body attestations couldn't contribute to fork
// choice weight until the clock caught up. Now we mirror
// leanSpec store.on_block: if parent is known and the slot is
// not pre-finalized, just process it.
// Do not reintroduce the old strict local-clock forkchoice gate
// here; it caused direct-import paths to maintain separate
// pending queues and delayed useful forkchoice participation for
// slightly-future blocks. However, we still need the same bounded
// future-slot protection that validateBlock applies so all import
// paths reject arbitrarily far-future blocks before doing
// expensive STF / forkchoice work.
if (slot > opts.currentSlot + constants.MAX_FUTURE_SLOT_TOLERANCE) {
return ForkChoiceError.FutureSlot;
}

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1459 to +1463
//
// Gossip attestations must be for current or past slots only. Validators attest
// in interval 1 of the current slot, so they cannot attest for future slots.
// Block attestations can be more lenient since the block itself was validated.
// Per leanSpec: allow current_slot + 1 for clock disparity tolerance,
// regardless of whether the attestation is from gossip or a block.
const current_slot = self.forkChoice.getCurrentSlot();
const max_allowed_slot = if (is_from_block)
current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1
else
current_slot; // Gossip attestations: no future slots allowed
const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE;
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validateAttestationData now allows current_slot + MAX_FUTURE_SLOT_TOLERANCE for both gossip and block attestations, but the function header comment still states that gossip attestations have no future tolerance. Update the doc comment above validateAttestationData to match the new behavior/spec alignment.

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1239 to +1245
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots
// (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both.
if (pruneForkchoice) {
if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| {
if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) {
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment says PRUNE_NODE_THRESHOLD = 64 slots, but the gate uses finalized_idx (proto-array node index), which is not necessarily equal to slot distance (multiple blocks/forks can inflate indices). Either adjust the wording here or gate on slot distance if the intent is a time-based grace window.

Copilot uses AI. Check for mistakes.
Five follow-ups from the review pass:

- Close RPC / sync future-slot gap: BeamChain.onBlock now enforces the
  MAX_FUTURE_SLOT_TOLERANCE (+1) bound itself. Previously only the gossip
  path funnelled through chain.validateBlock; RPC handlers went straight
  to onBlock and would happily run STF + fork-choice on blocks arbitrarily
  far in the future, a cheap DoS vector.

- Make the PRUNE_NODE_THRESHOLD regression test actually exercise the
  gate by calling onBlockFollowup (where processFinalizationAdvancement
  — and the rebase call it gates — fires). Previously the test only
  called onBlock and would have passed even without the fix.

- Update validateAttestationData doc and `attestation validation -
  gossip vs block future slot handling` test to describe the unified
  `current_slot + 1` rule both paths now follow; the test is renamed and
  now asserts both the accepted current+1 case and the rejected current+2
  case on both is_from_block values.

- Clarify PRUNE_NODE_THRESHOLD: the threshold is a proto-array node
  count, not a slot distance. Under heavy forking, node count advances
  faster than slot distance, so the wall-clock grace can shrink. Updated
  the constants.zig comment and the rebase call site accordingly.

- Log loudly instead of silently skipping the rebase when
  latestFinalized.root is missing from proto-array. Under normal
  operation it is always present (getCanonicalViewAndAnalysis resolves
  it via protoArray.indices); surfacing the invariant violation beats
  hiding it behind "nothing happened".
…ec rule

Switch the gossip-attestation time check from slot-grained to
interval-grained:

    data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS

A whole-slot tolerance let an adversary pre-publish next-slot aggregates
ahead of any honest validator (~800 ms head start at SECONDS_PER_SLOT=4 /
INTERVALS_PER_SLOT=5). One interval bounds that head start to NTP drift.

Block-included attestations skip the time check; they are trusted under
the block's signature + STF validation. The block path itself drops the
slot-grained future-slot bound entirely — block admission is gated by
parent / signature / STF only.

Removed:
- constants.MAX_FUTURE_SLOT_TOLERANCE (was reused across attestation
  validation and validateBlock / onBlock; the block usages had no spec
  basis to begin with)
- chain.validateBlock future-slot rejection
- chain.onBlock entry-point future-slot rejection
- chain.validateAttestationData block-path slot bound
- BlockValidationError.FutureSlot variant
- node.zig error.FutureSlot handlers and cacheFutureBlock helper
  (the cacheFutureBlock path had no replay mechanism after the
  time-based block queue was removed earlier in this PR)

Added:
- constants.GOSSIP_DISPARITY_INTERVALS = 1
- chain.validateAttestationData gossip-only interval bound
- chain.zig boundary tests covering the gossip / block divergence
Copilot AI review requested due to automatic review settings April 29, 2026 03:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

pkgs/node/src/chain.zig:1324

  • validateBlock no longer enforces any upper bound on block.slot relative to the local clock. Because onGossip runs signature verification + STF before adding to forkchoice, a far-future (but validly signed) block can force process_slots to iterate up to that slot, causing excessive work. Add back a future-slot limit (the intended +1 tolerance) here so blocks that are too far ahead are rejected early and cheaply.
    pub fn validateBlock(self: *Self, block: types.BeamBlock, is_from_gossip: bool) !void {
        _ = is_from_gossip;

        const finalized_slot = self.forkChoice.fcStore.latest_finalized.slot;

        // 1. Pre-finalized slot check - reject blocks before finalized slot
        if (block.slot < finalized_slot) {
            self.logger.debug("block validation failed: pre-finalized slot {d} < finalized {d}", .{
                block.slot,
                finalized_slot,
            });

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1628 to 1633
// Block admission only requires a known parent and a slot above
// the finalized boundary; STF and signature verification are the
// gating layers.
if (slot < self.fcStore.latest_finalized.slot) {
return ForkChoiceError.PreFinalizedSlot;
}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FutureSlot gate was removed from ForkChoice.onBlockUnlocked, so a validly-signed block with a far-future slot can now be admitted and then drive an unbounded BeamState.process_slots loop during STF (see pkgs/types/src/state.zig:256+), leading to potential CPU/latency DoS. Reintroduce an upper-bound future-slot check (with the intended +1 tolerance) before expensive work, and return a dedicated error (or reuse an existing one) so callers can cache/ignore blocks that are too far ahead.

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +21
// Future-slot tolerance for gossip attestations, measured in intervals:
//
// data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS
//
// where store.time is in intervals. One interval is roughly 800 ms at
// SECONDS_PER_SLOT=4 / INTERVALS_PER_SLOT=5.
//
// A whole-slot tolerance would let an adversary pre-publish next-slot
// aggregates ahead of any honest validator (~800 ms head start at 4 s
// slots); tightening to one interval bounds that head start to NTP drift.
//
// Block-included attestations skip this check entirely; they are trusted
// under the block's own validation.
pub const GOSSIP_DISPARITY_INTERVALS = 1;
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description (section 1) states leanSpec allows data.slot <= current_slot + 1 for all attestations as clock-disparity tolerance. The implementation here instead uses an interval-grained gossip-only bound (GOSSIP_DISPARITY_INTERVALS = 1) and explicitly skips the time check for block-included attestations. Either update the PR description to match this behavior, or adjust the constants/validation logic to match the stated leanSpec rule so reviewers/operators aren't relying on incorrect semantics.

Copilot uses AI. Check for mistakes.
Comment thread pkgs/node/src/chain.zig
try beam_chain.forkChoice.onInterval(boundary_time, false);

// At the boundary the gossip path admits the same vote.
try beam_chain.validateAttestationData(next_slot_attestation.message, false);
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new boundary-acceptance behavior is only unit-tested via direct calls to validateAttestationData. Add an integration-style assertion that a boundary-admitted gossip attestation can be processed through the normal path (e.g. onGossipAttestation / forkChoice.onSignedAttestation) without being rejected downstream, to prevent regressions where validation and forkchoice admission rules diverge.

Suggested change
try beam_chain.validateAttestationData(next_slot_attestation.message, false);
try beam_chain.validateAttestationData(next_slot_attestation.message, false);
// Ensure the normal downstream path also accepts the boundary-admitted gossip attestation.
try beam_chain.forkChoice.onSignedAttestation(next_slot_attestation);

Copilot uses AI. Check for mistakes.
GrapeBaBa added a commit that referenced this pull request May 11, 2026
Resolve conflicts in build.zig.zon, pkgs/node/src/{chain,constants,node}.zig.

build.zig.zon:
  - ssz: keep GrapeBaBa fork (cherry-picked array deserialize fix on a
    0.16-compatible base) — PR-required for ssz array roundtrip in spectest.
  - snappyframesz: keep cb0a08e4 (empty-stream decode fix from
    blockblaz/snappyframesz#8) — PR-required for networking_codec runner.
  - hash_zig: dropped — main removed it in the zig 0.16 upgrade and the
    PR branch never used the dep in code.

pkgs/node/src/constants.zig:
  - GOSSIP_DISPARITY_INTERVALS = 1 (PR #754, leanSpec gossip attestation rule)
  - MAX_FUTURE_SLOT_TOLERANCE = 1 retained for blocks; documented that
    gossip attestations use the stricter GOSSIP_DISPARITY_INTERVALS bound
    and blocks keep slot-grained tolerance.

pkgs/node/src/chain.zig:
  - Took main's superset for block validateBlock (FutureSlot /
    FutureSlotQueueable error variants + queueable window logic) and
    chain_worker submission paths in onGossip block handler.
  - Kept main's pending_blocks queue, locks (states_lock,
    pending_blocks_lock, etc.), chain_worker pointer.
  - Reconstructed test section: HEAD's
    processFinalizationAdvancement-below-PRUNE_NODE_THRESHOLD regression
    test placed first, followed by main's full BorrowedState / BlockCache /
    concurrent-stress test set.

pkgs/node/src/node.zig:
  - Took main's cacheFutureBlock helper (purely additive from main).
GrapeBaBa added a commit that referenced this pull request May 11, 2026
PR #754 introduced a `PRUNE_NODE_THRESHOLD = 64` gate in
`chain.processFinalizationAdvancement` that skipped the post-finalization
`forkChoice.rebase` call until at least 64 pre-finalized nodes sat
before the finalized anchor in proto-array. The intent was to widen the
grace window for in-flight attestations whose `source` / `target` /
`head` still referenced an ancestor about to be pruned, where the eager
rebase would otherwise surface `Unknown{Source,Target,Head}Block` from
`validateAttestationData`.

On reflection that mitigation is heavier than the symptom it addresses
under current network conditions, and 3SF-mini already tolerates the
attestation-bounce. Roll the threshold gate back so finalization advance
unconditionally rebases proto-array — matches the behaviour
zeam shipped before PR #754 and keeps proto-array memory bounded by
finalization rather than by a node-count constant.

Changes:
  - pkgs/node/src/constants.zig: remove `PRUNE_NODE_THRESHOLD`.
  - pkgs/node/src/chain.zig:
      * `processFinalizationAdvancement` step 5 collapses back to the
        unconditional `if (pruneForkchoice) try
        self.forkChoice.rebase(latestFinalized.root, &canonical_view);`
        shape (incl. dropping the `getProtoNodeIndex` lookup + invariant-
        violation log path that only existed to gate the threshold).
      * Delete the
        `processFinalizationAdvancement: below PRUNE_NODE_THRESHOLD keeps
        pre-finalized ancestors` regression test — it was pinning the
        now-removed contract.

PR #754's other scope items remain in place:
  - GOSSIP_DISPARITY_INTERVALS attestation rule (spec alignment, leanSpec PR #682).
  - Split attestation validation (gossip interval-bound, block +1 slot).
The two further scope items (lazy proto-array prune, dropping zeam's
block-level future-slot rejection / time-based block queue) are
intentionally NOT brought across.

Verification:
  - `zig build spectest` — All 513 tests passed.
  - `zig build test` — 14/14 test groups pass.
ch4r10t33r added a commit that referenced this pull request May 12, 2026
…715)

* feat(spectest): add SSZ roundtrip runner with comptime type dispatch

Add SSZ fixture runner that tests serialization roundtrips for all 18
consensus types. Uses test-mode type mirrors with 424-byte signatures
for leanEnv=test fixtures, enabling all 34 SSZ tests to pass without
requiring hashsig infrastructure changes.

* chore: update leanSpec submodule to latest main

Update leanSpec to 1177800 which includes the latest spectest fixture
vectors for SSZ, verify_signatures, and other test categories.

* fix(spectest): handle gossipAggregatedAttestation as unsupported step type

The updated leanSpec generates new fork_choice fixtures with
gossipAggregatedAttestation steps. Mark this step type as unsupported
(graceful skip) alongside the existing attestation step type, so these
tests don't fail with InvalidFixture.

* feat(spectest): implement attestation and gossipAggregatedAttestation fork choice steps

Add full attestation step support to the fork choice runner instead of
skipping them as unsupported. This resolves 15 previously failing fork
choice spec tests.

Changes:
- Add processAttestationStep with validator bounds check, attestation
  data validation, and signature error detection
- Add processGossipAggregatedAttestationStep with aggregation proof
  handling and store integration
- Add validateAttestationDataForGossip implementing leanSpec validation
  rules (block existence, slot relationships, future slot tolerance)
- Align forkchoice.zig future attestation check with leanSpec (allow
  +1 slot tolerance)
- Consolidate KeyManager error variants into ValidatorKeyNotFound
- Move MAX_ATTESTATIONS_DATA to constants module

* fix: revert unrelated chain/key-manager/constants changes from spectest commit

These changes (KeyManager error consolidation, MAX_ATTESTATIONS_DATA
constant move, chain.zig error renaming) are out of scope for this PR
and caused CI failures due to missing max_attestations_data struct field
in test initializers.

* feat(spectest): update leanSpec fixtures; support all fork_choice/state_transition cases

Bump leanSpec submodule to 9b89651 (50 commits ahead) and regenerate
fixtures — 468 total, up from 123. All fork_choice and state_transition
tests now run without skips.

production (forkchoice):
  * forkchoice.onBlock now enforces leanSpec store.process_block rules:
    reject blocks with duplicate AttestationData or more than
    MAX_ATTESTATIONS_DATA (16) distinct entries. Previously this lived
    only in chain.zig and was skipped by the spectest path.
  * Remove the NotFinalizedDesendant rejection from onBlockUnlocked —
    leanSpec store.on_block accepts every block whose parent is known,
    and get_head naturally ignores forks off pre-finalization ancestors
    because it walks best descendants from latest_justified.
  * Promote isFinalizedDescendant to pub and move the DoS defense to
    chain.validateBlock (gossip attack surface) as a new
    BlockValidationError.NotFinalizedDescendant. Production continues
    to reject mal-parented blocks at the chain layer; forkchoice stays
    spec-pure so spectests can exercise the leanSpec semantics directly.

spectest runner — state_transition:
  * buildBlock now parses full aggregated attestations
    (aggregationBits.data + AttestationData) and hands them to
    apply_transition. The prior "attestations unsupported" early-return
    silently skipped 30 justification/finalization cases.

spectest runner — fork_choice:
  * Tick steps accept either `interval` or `time` plus `hasProposal`
    (new leanSpec schema).
  * blockAttestations check treats attestationSlot/targetSlot as
    optional (both were made optional in leanSpec test_types).
  * attestationChecks mirrors leanSpec
    extract_attestations_from_aggregated_payloads: iterate
    latest_new/known_aggregated_payloads and pick the largest-slot
    entry that includes the validator (previous code read the wrong
    AttestationTracker source).
  * Implement nine previously-UnsupportedFixture checks so 10 cases
    stop silently skipping:
      - safeTargetSlot / safeTargetRootLabel
      - latestFinalizedRootLabel
      - filledBlockRootLabel (tracks the block root just processed)
      - labelsInStore (asserts labelled roots still live in protoArray)
      - reorgDepth (walks new-head ancestry, then counts from old head)
      - attestationSignatureTargetSlots
      - latestNewAggregatedTargetSlots
      - latestKnownAggregatedTargetSlots
  * processAttestationStep now writes into attestation_signatures with
    ZERO_SIGBYTES placeholder. leanSpec's store.on_attestation inserts
    here on the gossip path; the runner was only calling onAttestation
    (validator tracker only), leaving attestation_signatures perpetually
    empty and the new *TargetSlots checks unobservable.
  * processBlockStep calls pruneStaleAttestationData when
    latest_finalized advances, matching leanSpec store.on_block's
    tail behaviour. Without this, attestation/payload maps kept stale
    entries that failed the *TargetSlots checks after finalization.

results:
  * 159/230 generated tests pass; remaining 71 failures are all SSZ
    basic-type fixtures (new leanSpec types: Uint16/Uint32/Uint64/Fp/
    Boolean/Bytes4/Bytes64/Bitvector/Bitlist/List/Vector/HashTreeOpening)
    — intentionally out of scope.
  * 0 fork_choice skips, 0 state_transition skips
    (down from 10 fork_choice + 30 state_transition silently skipped).
  * zig build test still exits 0.

* feat(spectest): SSZ runner supports new leanSpec basic types

Adds SSZ roundtrip coverage for the scalar, byte-array, bitvector,
integer-vector, bitlist and list types introduced alongside the
new test_basic_types / test_xmss_containers fixtures:

  Uint8, Uint16, Uint32, Uint64, Fp, Boolean
  Bytes4, Bytes32, Bytes52, Bytes64
  SampleBitvector8/64, AttestationSubnets, SyncCommitteeSubnets
  SampleUint16Vector3, SampleUint64Vector4
  SampleBitlist16, SampleBytes32List8, SampleUint32List16
  ByteListMiB, HashTreeOpening (empty), SignedAggregatedAttestation

Wires `@zeam/xmss` into the zeam_spectests module so ByteListMiB and
HashDigest types are reachable from the runner.

Known-skipped via explicit skip list (previously silently unsupported):

  - SampleUnionNone / SampleUnionTypes — zeam's SSZ library has no
    `union(enum)` path yet; leanSpec introduced SSZUnion in this
    release.
  - SampleUint16Vector3 / SampleUint64Vector4 / HashTreeLayer /
    HashTreeOpening (typical) — blockblaz/ssz.zig's deserialize for
    `.array` with element size > 1 conflates byte offset with element
    index, so only out[0], out[2], ... get written and the roundtrip
    diverges. Upstream bug; filed for separate fix.

Result: 230/230 spectests pass, 0 skips in the runner path (40
previously silent SkippedFixture cases now run).

* fix: align attestation future slot tolerance with leanSpec

leanSpec store.py:320 allows current_slot + 1 for all attestations
(gossip and block) as clock disparity tolerance. zeam previously
only allowed +1 for block attestations and rejected gossip attestations
for future slots. Align with leanSpec's unified behavior.

* fix: tolerate pruned source blocks in attestation validation

When finalization advances, old blocks are pruned from the protoarray.
Aggregations referencing a source checkpoint that was valid when the
attestation was created can fail validation if finalization pruned the
source block between attestation creation and aggregation publishing.

Accept source blocks whose slot <= finalized_slot as valid even when
absent from the protoarray — they were canonical and are now below
the finalization horizon.

Root cause: race between finalization pruning (interval 0 of new slot)
and aggregation publishing (interval 2 of same slot), where the
aggregation's attestation data references a justified checkpoint that
got pruned during the finalization advance.

* fix: remove attestation data validation from onGossipAggregatedAttestation

Per leanSpec store.py on_gossip_aggregated_attestation: aggregated
attestations only verify the aggregated signature proof, they do NOT
call validate_attestation on the attestation data. nlean follows the
same pattern.

The attestation data was already validated when individual gossip
attestations arrived. Re-validating in the aggregation path caused
UnknownSourceBlock errors when finalization pruned the source block
from protoarray between attestation creation and aggregation publishing.

This also reverts the "tolerate pruned source blocks" workaround in
validateAttestationData since the root cause is now fixed.

* fix(forkchoice): gate proto-array rebase on PRUNE_NODE_THRESHOLD

Root-cause the attestation-race previously worked around in
chain.onGossipAggregatedAttestation and validateAttestationData.

Symptom: every time finalization advanced,
processFinalizationAdvancement called forkChoice.rebase(...)
unconditionally. Rebase drops pre-finalized ancestors from
proto-array and remaps attestation-tracker indices. Any in-flight
attestation whose source / target / head referenced one of those
dropped blocks then failed validateAttestationData's existence
check with Unknown{Source,Target,Head}Block — even though the
vote was perfectly valid when the validator produced it.

3SF-mini's fast finalization cadence (one advance every few slots)
puts the race window on the same order of magnitude as normal
gossip propagation, so attestations drop across every finalization
boundary under normal load. Sync catch-up, which can advance
finalization multiple steps in a single tick, makes it much worse.

Fix — lighthouse-style lazy prune:

  * Introduce constants.PRUNE_NODE_THRESHOLD = 64. Lighthouse uses
    256 for mainnet; 3SF-mini's shorter wall-clock per finalization
    step makes 64 (~1 eth epoch worth of grace) the better trade
    between race coverage and memory footprint.
  * Gate the rebase call in processFinalizationAdvancement on the
    finalized node's index in the proto-array. While that index is
    below the threshold, skip rebase and leave the pre-finalized
    prefix addressable so in-flight attestations still resolve.
  * Expose ForkChoice.getProtoNodeIndex for the gate (mutex-safe
    wrapper over protoArray.indices).

Revert of the prior tolerance workaround:

  * onGossipAggregatedAttestation now re-validates the attestation
    data again. The "skip validate" workaround exists only because
    source blocks could vanish mid-finalization; with the threshold
    gate keeping them in place, the stricter check is safe again
    and catches malformed aggregates that used to slip through.

Test:

  * Add regression test driving the mock chain with default
    onBlock opts (pruneForkchoice = true) through 5 slots. The
    finalized node's index stays well under the threshold, so
    pre-finalized ancestors must remain resolvable via
    getProtoNode after finalization advances. Before this fix,
    the rebase call would have removed roots[1] and roots[2]
    (slots < latestFinalized.slot) from the protoArray and the
    assertion would trip.
  * Update the existing "Test 9: Attestation too far in future"
    case to use slot=current+2 — the spec-aligned +1 tolerance
    change from 09712ac made the prior slot=current+1 trip point
    pass validation; the test was not updated at the time.

The future-slot +1 tolerance fix (09712ac) is orthogonal and
preserved — spec alignment with store.py:320.

* fix(forkchoice): drop time-based block queue; align with leanSpec

zeam was the only client among leanSpec / ream / qlean-mini /
ethlambda / nlean that queued inbound blocks on the local clock.
Every other implementation processes a block the moment its parent
state is known — their only "pending" caches are for orphan blocks
(parent unknown, awaiting backfill), never for blocks whose slot just
hasn't started yet locally.

The queue also kept block body attestations out of fork choice until
the next interval tick, delaying head updates by a full interval and
duplicating the future-slot rejection that already lives in
chain.validateBlock.

Changes:

  * Remove BeamChain.pending_blocks field, its init / deinit hooks,
    and the append site on the gossip path.
  * Remove BeamChain.processPendingBlocks (replay helper) and its
    caller in node.onInterval.
  * Drop the duplicate future-slot gate in
    forkchoice.onBlockUnlocked. Gossip-level DoS filtering continues
    to happen in chain.validateBlock (MAX_FUTURE_SLOT_TOLERANCE = 1,
    matching leanSpec store.py:320); forkchoice now mirrors
    leanSpec store.on_block and only requires a known parent plus
    non-pre-finalized slot.
  * Remove ForkChoiceError.FutureSlot — no caller returns it.
  * Update the fork-choice block-tree unit test: the old test drove
    "reject future slot" against forkchoice.onBlock; that
    responsibility has moved up to chain.validateBlock, so the test
    now just ticks the clock and processes the block.

validateBlock still rejects block.slot > current_slot + 1 with
BlockValidationError.FutureSlot; node.zig's handler for
error.FutureSlot still caches those blocks for later retry, so the
"block arrived genuinely too early" case is covered exactly as
before — just without the redundant inner-layer queue.

* fix(forkchoice): add missing leanSpec attestation checks

leanSpec `store.py:validate_attestation` enforces two topology / consistency
rules that zeam had been skipping:

- `data.head.slot >= data.target.slot` — head checkpoint cannot be older
  than the target (topology check, store.py:304).
- `head_block.slot == data.head.slot` — head checkpoint slot must match
  the actual block slot the checkpoint points at (consistency check,
  store.py:314).

Without these, malformed gossip / aggregated attestations with a stale or
mismatched head pointer slipped through zeam's validation while leanSpec
and the rest of the ecosystem reject them. Add the two checks with
matching `HeadOlderThanTarget` / `HeadCheckpointSlotMismatch` error
variants and remove the stale `_ = head_block; // Will be used in future`
placeholder now that the node is actually consulted.

* chore(chain): drop stale 'earlier revision' comment on aggregated path

The block explained why re-validation was restored — context that is
meaningful in a PR description but rots as a source comment: future
readers have no "earlier revision" to compare against, and the rebase
gating it references is documented at the rebase site itself.

* fix(chain): address Copilot review on PR #754

Five follow-ups from the review pass:

- Close RPC / sync future-slot gap: BeamChain.onBlock now enforces the
  MAX_FUTURE_SLOT_TOLERANCE (+1) bound itself. Previously only the gossip
  path funnelled through chain.validateBlock; RPC handlers went straight
  to onBlock and would happily run STF + fork-choice on blocks arbitrarily
  far in the future, a cheap DoS vector.

- Make the PRUNE_NODE_THRESHOLD regression test actually exercise the
  gate by calling onBlockFollowup (where processFinalizationAdvancement
  — and the rebase call it gates — fires). Previously the test only
  called onBlock and would have passed even without the fix.

- Update validateAttestationData doc and `attestation validation -
  gossip vs block future slot handling` test to describe the unified
  `current_slot + 1` rule both paths now follow; the test is renamed and
  now asserts both the accepted current+1 case and the rejected current+2
  case on both is_from_block values.

- Clarify PRUNE_NODE_THRESHOLD: the threshold is a proto-array node
  count, not a slot distance. Under heavy forking, node count advances
  faster than slot distance, so the wall-clock grace can shrink. Updated
  the constants.zig comment and the rebase call site accordingly.

- Log loudly instead of silently skipping the rebase when
  latestFinalized.root is missing from proto-array. Under normal
  operation it is always present (getCanonicalViewAndAnalysis resolves
  it via protoArray.indices); surfacing the invariant violation beats
  hiding it behind "nothing happened".

* forkchoice: align attestation time check with interval-grained leanSpec rule

Switch the gossip-attestation time check from slot-grained to
interval-grained:

    data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS

A whole-slot tolerance let an adversary pre-publish next-slot aggregates
ahead of any honest validator (~800 ms head start at SECONDS_PER_SLOT=4 /
INTERVALS_PER_SLOT=5). One interval bounds that head start to NTP drift.

Block-included attestations skip the time check; they are trusted under
the block's signature + STF validation. The block path itself drops the
slot-grained future-slot bound entirely — block admission is gated by
parent / signature / STF only.

Removed:
- constants.MAX_FUTURE_SLOT_TOLERANCE (was reused across attestation
  validation and validateBlock / onBlock; the block usages had no spec
  basis to begin with)
- chain.validateBlock future-slot rejection
- chain.onBlock entry-point future-slot rejection
- chain.validateAttestationData block-path slot bound
- BlockValidationError.FutureSlot variant
- node.zig error.FutureSlot handlers and cacheFutureBlock helper
  (the cacheFutureBlock path had no replay mechanism after the
  time-based block queue was removed earlier in this PR)

Added:
- constants.GOSSIP_DISPARITY_INTERVALS = 1
- chain.validateAttestationData gossip-only interval bound
- chain.zig boundary tests covering the gossip / block divergence

* spectest: bump ssz.zig pin to pick up array deserialize fix and remove skip workarounds

Upstream blockblaz/ssz.zig fixed the fixed-size array deserialize bug in
PR #61 (`iterate fixed-size arrays by element, not byte step`), but the
fix landed on master after the 0.16 upgrade. Use the GrapeBaBa fork's
`backport-array-fix-015x` branch, which cherry-picks the fix on top of
0.15.2.

Drop the corresponding skip entries (SampleUint16Vector3, SampleUint64Vector4,
HashTreeLayer, HashTreeOpening) from the SSZ runner. Register a
TestHashTreeLayer mirror so the dispatch table can roundtrip it.

Eight previously-skipped fixture cases (4 fixture files × 2 cases) now
run real serialize/deserialize roundtrips instead of silent pass.
SSZ Union (SampleUnionNone, SampleUnionTypes — 5 cases) remains skipped
pending union(enum) registration, tracked separately.

* spectest: implement justifiability and verify_signatures runners

Two leanSpec fixture kinds were skipped at parse time because zeam's
spectest framework only knew about state_transition / fork_choice / ssz.
Add the two missing kinds.

Justifiability (33 fixtures, all real): each fixture pins a (slot,
finalizedSlot) pair and an expected (delta, isJustifiable) output. The
runner calls types.IsJustifiableSlot(finalized, slot) and asserts both
the delta arithmetic and the predicate result match.

verify_signatures (7 fixtures): the runner is wired through and
deserialization is sketched, but every current fixture is generated with
leanEnv=test (signature_len ≈ 700 bytes), while zeam runs against the
production scheme (SIGSIZE=2536). Loading these into a production
SignedBlock is a hard size mismatch, so each case validates the
fixture's structural shape and then surfaces a clear skip message
naming the scheme gap. When zeam grows a test-mode XMSS path (or
leanSpec emits prod-scheme fixtures), the production branch in
runCase will pick them up without further wiring.

Adds:
- FixtureKind variants: justifiability, verify_signatures
- runner/justifiability_runner.zig
- runner/verify_signatures_runner.zig
- lib.zig exports

Test counts: 270/270 pass (33 new justifiability + 7 new verify_signatures
skipped with explicit reason; previously 230 generated and the 40 here
were silently dropped at parse time).

* spectest: real verification path for verify_signatures fixtures

leanSpec generates verify_signatures fixtures with leanEnv=test
(LOG_LIFETIME=8, DIMENSION=4, ~424-byte signatures), which the
production scheme (LOG_LIFETIME=32, DIMENSION=46, 2536-byte signatures)
cannot parse. Earlier the runner short-circuited every case with a
scheme-mismatch skip.

Add a parallel test-scheme XMSS verify FFI that always compiles
alongside the production path:

- rust/hashsig-glue: new test_scheme module exporting
  hashsig_test_verify_ssz, instantiated against
  SchemeAbortingTargetSumLifetime8Dim46Base8 (note: leansig type alias
  name is misleading — the constants inside use DIMENSION=4, matching
  leanSpec TEST_CONFIG exactly).
- pkgs/xmss: extern declaration + xmss.verifySszTest wrapper.

Runner now parses the anchor state and signed block with the existing
state_transition_runner helpers (made pub for reuse), allocates the
proposer signature as a variable-length []u8 (sidestepping the prod
SIGSIZE=2536 fixed array), hashes the block body, and dispatches to
verifySsz / verifySszTest based on leanEnv. expectException maps to an
expected verification failure.

6 of 7 fixtures now run real cryptography:
- test_invalid_proposer_signature → verifySszTest rejects
- test_proposer_signature, test_single_validator_attestation,
  test_proposer_and_attester_signatures, test_all_four_validators_attesting,
  test_multiple_attestation_groups_same_data → verifySszTest accepts

The remaining fixture (test_invalid_aggregated_attestation_signature)
carries a body attestation; it still skips with a clear note pending a
test-scheme leanMultisig wrapper.

Reuses the SignedBlock/BeamBlock types directly. Only the proposer
signature differs in byte count between the two schemes; the validator
public keys (52 bytes), block container, and AggregatedSignatureProof
(variable byte list) are scheme-agnostic.

* spectest: cover body-attestation verify path in verify_signatures runner

The previous commit ran real cryptographic verification on the proposer
signature only, and skipped the lone fixture
(test_invalid_aggregated_attestation_signature) that carries body
attestations.

Wire that case through. After verifying the proposer signature, walk
each block-included aggregated attestation: parse participants and
proofData out of the fixture, load attestation pubkeys from the anchor
state, hash the attestation data, and call
AggregatedSignatureProof.verify (which dispatches to leanMultisig).

leanMultisig's Rust glue is hardcoded to the production scheme; with
test-scheme bytes from the fixture the prod path will reject at
deserialization. That happens to be the expected outcome here — the
fixture asserts the implementation rejects the corrupted aggregated
signature — so the path closes cleanly without forking leanMultisig.

The runner now treats expectException as "at least one signature
rejected" (proposer or any body attestation), and absent-exception as
"all signatures verified". The bookkeeping is in a single any_failure
flag rather than the previous proposer-only branch.

A future leanSpec valid-case fixture with body attestations would need
a parallel test-scheme leanMultisig FFI; the limitation is documented
in verifyBodyAttestations.

7/7 verify_signatures fixtures now run real verification, 0 skips.

* spectest: skip body-attestation verify under leanEnv=test instead of relying on prod-rejects-test coincidence

The previous commit covered the body-attestation path by routing the
fixture through zeam's production leanMultisig FFI. For the lone
fixture that exercises that path (test_invalid_aggregated_attestation_signature),
the prod path rejects test-scheme bytes at AggregatedXMSS::deserialize
— which happens to match expectException, so the test "passes". That
is structural coincidence, not real cryptographic rejection: a future
valid-case body-attestation fixture under leanEnv=test would also be
rejected by the prod path and the test would falsely report a failure.

Rather than carry that latent false-positive, branch explicitly. When
leanEnv=test and the block carries any body attestations, surface a
clear skip naming the leanMultisig test-scheme FFI gap. The 6 fixtures
without body attestations still run real test-scheme XMSS verification
through the proposer signature path. Production-scheme fixtures (none
today) would continue to exercise the body-attestation path through
the prod leanMultisig glue.

Genuine coverage of the seventh fixture requires forking leanMultisig
to add a parallel test-scheme rec_aggregation (parallel constants,
types, cached bytecode, and verify entry point). Tracked separately —
not blocking PR #715.

* spectest: implement slot_clock and api_endpoint runners

slot_clock (25 fixtures, all real): pure-arithmetic runner covering
the five operations leanSpec exposes — current_slot, current_interval,
total_intervals, from_slot, from_unix_time. Each one is a single
arithmetic identity against the embedded clock config, with a clamp
to zero for inputs that fall before genesis. Operations are derived
locally from SECONDS_PER_SLOT (params) and INTERVALS_PER_SLOT (node
constants) so the spec config is sanity-checked against zeam's compile-
time constants on every fixture.

api_endpoint (12 fixtures, 7 real + 5 skipped):
- /lean/v0/health (1) — fixed body parity check.
- /lean/v0/admin/aggregator GET (2) — mirrors handleAggregatorAdmin's
  response shape against initialIsAggregator.
- /lean/v0/admin/aggregator POST (4) — mirrors handleAggregatorPost's
  {is_aggregator, previous} response, including the idempotent
  enable / disable cases.
- /lean/v0/states/finalized, /lean/v0/fork_choice,
  /lean/v0/checkpoints/justified (5) — surface a clear skip naming
  the endpoint; covering them needs a constructed BeamChain (anchor
  state, validators, finalized checkpoint, proto-array nodes) that
  the spec runner doesn't currently build.

The runner is a behavioural-parity test, not a full handler integration:
it reproduces each handler's response generation locally and compares
against the fixture's expectedStatusCode / expectedContentType /
expectedBody. Content-type is matched as a prefix because zeam emits
"application/json; charset=utf-8" while leanSpec pins "application/json".

307/307 tests pass; 6 explicit skips total (5 api_endpoint chain-state
endpoints + 1 leftover verify_signatures body-attestation case).

* spectest: implement networking_codec runner (varint, snappy, gossip topic/msg id, discovery routing)

leanSpec emits 135 networking_codec fixtures across 9 codec families,
and zeam previously dropped them all at parse time. Add a unified
runner that dispatches on codecName and reuses zeam's production
codecs where they exist:

- varint (14): pure LEB128 encode + length check
- gossip_topic (8): network.LeanNetworkTopic.encode
- gossip_message_id (8): SHA-256(domain || topic_len_le_u64 || topic ||
  data) truncated to 20 bytes — mirrors libp2p-glue's message_id_fn
- log2_distance (8) + xor_distance (6): bitwise on 32-byte node IDs
- snappy_block (8) + snappy_frame (7): round-trip via snappyz /
  snappyframesz. Snappy compression is non-canonical — emitter choices
  about literal/copy tags or chunk layout differ between
  implementations — so wire-format interop is verified by
  decode(leanSpec output) == input AND encode(input)→decode→input,
  not by byte-equal encoder agreement.

Codecs requiring infrastructure zeam doesn't expose yet (RLP for ENR /
discv5, protobuf for gossipsub RPC, secp256k1 for peer_id, varint+ssz
framing for reqresp) are skipped at runtime with a clear codec-name
reason. 75 such skips covering reqresp_codec (21), gossipsub_rpc (26),
enr_and_peer_id (14), discv5_messages (15) — these are the natural
follow-on PRs.

Module wiring: zeam_spectests now imports @zeam/network, snappyz, and
snappyframesz so the runner can call zeam's codecs directly. Empty-input
snappy_frame is skipped with a separate note: zeam's snappyframesz
rejects a stream that contains only the 10-byte magic header (no data
chunks); leanSpec's encoder emits exactly that. Tracked separately as
a snappyframesz library gap.

442/442 tests pass; 83 total runtime skips. networking_codec
contributes 59 real and 76 skipped.

* spectest: bump snappyframesz to pick up empty-stream decode fix

blockblaz/snappyframesz#8 (closes #7) makes decode accept a
identifier-only frame as a valid empty payload. Bump the pin to the
post-merge tip and drop the local skip in the snappy_frame runner.

The snappy_frame_empty fixture now exercises the same round-trip path
as every other snappy fixture: leanSpec's 10-byte
"\xff\x06\x00\x00sNaPpY" stream decodes to an empty slice through
zeam's snappyframesz, and zeam's encoder→decoder pair round-trips the
empty input cleanly.

Total runtime skips drop from 83 to 82.

* fix(lint): reorder imports in hashsig-glue test_scheme to satisfy rustfmt

* chore: bump leanSpec to a5a05f9 (lstar fork rename + 56 commits)

leanSpec changes since 9b89651:
  - Stages 1-6 of #686: large refactor of forks/lstar (decoupled
    subspecs, ForkProtocol surface, generic Store).
  - PR #710: renamed devnet fork to lstar everywhere.
  - 30+ new test-vector PRs across networking, SSZ, fork choice,
    state transition, sync, poseidon_permutation, gossipsub_handler,
    discovery_crypto fixture formats.

zeam side:
  - pkgs/spectest/src/fork.zig: rename `devnet` Fork to `lstar`
    (name=Lstar, path=lstar, symbol=forks.lstar) to match leanSpec
    PR #710 directory layout.
  - Generated index (pkgs/spectest/src/generated/) is gitignored;
    will be regenerated next `zig build spectest:generate`.

* spectest: port to zig 0.16 (post-main-merge)

The PR branch's full spectest infrastructure (runners + lib glue) was
written against zig 0.15. Main's #784 upgraded zeam to zig 0.16 and
silently broke the spectest because main only had a minimal runner set
(fork_choice + state_transition + ssz) — the regression wasn't visible
until this branch's full suite was merged.

Changes:

build.zig.zon:
  - ssz: switch back to blockblaz/ssz.zig (404762835d), the
    0.16-compatible upstream. The GrapeBaBa cherry-pick fork pinned in
    earlier PR commits has `array_list.Aligned(u8,null).writer` API
    that 0.16 removed. The array-deserialize fix the fork existed for
    is now on blockblaz/ssz.zig master.
  - snappyframesz: switch back to df262c69. The newer cb0a08e4 (empty-
    stream decode fix) uses `meta.intToEnum` which 0.16 removed; the
    upstream fix branch has not been ported to 0.16. The
    networking_codec test_snappy_frame_empty fixture is the one that
    needs it — it stays failing until upstream snappyframesz is ported.

pkgs/spectest/src/lib.zig:
  - Replace `std.testing.refAllDecls` with a local
    `refAllDeclsRecursive` (the std variant was removed in 0.16). With
    the shallow version, only the wrapper test was discovered ("1/1
    generated fixtures...OK" with zero inner tests actually run); the
    recursive walk now expands the full ~500-test tree.

pkgs/spectest/src/generator.zig:
  - Drop the `else => return err` prong from the emitted
    `resolveFixturesRoot` switch — `ResolveError` is a single-variant
    error set, and 0.16 rejects unreachable else prongs.

pkgs/spectest/src/runner/*.zig:
  - `std.fs.Dir` → `std.Io.Dir` (the directory type moved in 0.16).
  - `dir.readFileAlloc(allocator, path, max)` →
    `dir.readFileAlloc(std.testing.io, path, allocator, .limited(max))`
    — the 0.16 signature takes an explicit `io` first, swaps
    `(path, allocator)` order, and uses `Io.Limit` instead of usize.
  - `std.json.ObjectMap.init(allocator)` →
    `var x: ObjectMap = .empty;` + per-call `put(allocator, ...)` —
    ObjectMap is the unmanaged map in 0.16 and `init` now takes
    `(gpa, key_list, value_list)`. Updated `putString` helper to thread
    the allocator through.
  - `std.ArrayList(T){}` → `: std.ArrayList(T) = .empty;` — the 0.16
    unmanaged ArrayList rejects the empty-struct literal (it has
    multiple fields now).

pkgs/spectest/src/skip.zig:
  - Replace `std.once` (removed in 0.16) with a hand-rolled CAS-based
    one-shot guard using `std.atomic.Value(bool)`.
  - Replace `std.process.getEnvVarOwned` (removed in 0.16) with
    `std.testing.environ.getPosix(...)` — the test runner initialises
    `std.testing.environ` and that's the only env-read surface we need
    here (skip.zig is consulted exclusively from spectest test code).

Test results: `zig build spectest` now compiles and runs the full suite.
490 / 513 pass. The 23 failures break down as:
  - 11 new ssz fixtures (test_decode_failure_smoke / test_decode_rejections
    / test_merkleization_boundaries from leanSpec PRs #646/#649) — runner
    doesn't yet recognise bitlist / bitvector / merkleization-boundary
    fixture variants; ports flagged as UnsupportedFixture.
  - 6 new fork-choice attestation validation fixtures (test_safe_target,
    test_gossip_attestation_validation / test_gossip_aggregated_*,
    test_equivocation from leanSpec PRs #682/#680/#644) — fixtures
    expect specific reject behaviour at the disparity boundary that the
    runner needs to align with.
  - 1 new state_transition test_slot_monotonicity case (#643).
  - 1 new networking_codec test_snappy_frame_empty case (needs the
    not-yet-0.16-ported empty-stream snappy fix).
  - 4 misc fixtures whose runner mapping changed shape post-leanSpec
    refactor (Stage 1-6 of #686).

Net: the merge + leanSpec bump is mechanically clean and the spectest
runner is back in business under 0.16. The 23 failures are
content-level deltas from the leanSpec update, not regressions
introduced by this merge.

* spectest: align with leanSpec a5a05f9; full 513/513 pass

Drive the remaining 23 failures from the leanSpec bump down to 0. The
failures broke into five buckets — each fixed below.

1. SSZ runner: support new boundary / decode-rejection fixture types
   (pkgs/spectest/src/runner/ssz_runner.zig)
   - Register BoundaryBitvector{1,7,9,255,256,257}, BoundaryBitlist256,
     BoundaryUint64List32 (leanSpec PR #646 merkleization boundaries).
   - Register DecodeBitvector16, DecodeBitlist8, SmokeBitlist8 (PRs #649
     decode_rejections / decode_failure_smoke).
   - Read `rawBytes` in addition to `serialized` and honour the
     `expectException` field — when present, the runner now asserts the
     malformed input is rejected (instead of treating a successful
     roundtrip as a pass).

2. Forkchoice gossip-attestation disparity bound (PR #682)
   (pkgs/spectest/src/runner/fork_choice_runner.zig)
   - `validateAttestationDataForGossip` used the old "1 whole slot"
     tolerance against `getCurrentSlot()`. The spec now compares
     `data.slot * INTERVALS_PER_SLOT` against
     `store.time + GOSSIP_DISPARITY_INTERVALS`, exercised by the four
     `*_just_beyond_disparity_boundary_rejected` /
     `*_one_full_slot_in_future_rejected` fixtures.

3. Forkchoice update_safe_target ignores block-pool attestations (PR #680)
   (pkgs/node/src/forkchoice.zig)
   - `onAttestation(is_from_block=true)` used to mirror the new
     `latestKnown` into `latestNew`, smuggling block-pool weight back
     into `update_safe_target` (which reads from the "new" tracker per
     PR #680). Drop the mirror so the pools stay strictly separate; the
     `test_safe_target_ignores_known_pool_at_interval_3` fixture pins the
     contract.

4. Attestation-tracker as ground truth for `attestationChecks` (PR #690)
   (pkgs/spectest/src/runner/fork_choice_runner.zig)
   - The check used to iterate `latest_*_aggregated_payloads`
     (`std.AutoHashMap`) and pick the entry with max slot. With same-slot
     equivocation the result depends on hashmap iteration order — the
     spec relies on Python dict insertion order. Switch to reading
     `tracker.latestKnown` / `tracker.latestNew`: zeam's tracker is
     populated by `onAttestation` with a strict-`>` comparison, so it
     already encodes "first attestation at a given slot wins" — exactly
     the spec semantics under equivocation.

5. State-transition slots-only fixture support (PR #643)
   (pkgs/spectest/src/runner/state_transition_runner.zig)
   - `test_process_slots_target_equal_to_state_slot_rejected` ships
     `pre` + `expectException` with an empty `blocks` array. Previously
     the runner rejected this shape outright. Now: when no blocks are
     supplied and an `expectException` is set, call
     `pre_state.process_slots(state.slot)` and verify it returns an
     error (zeam surfaces this as `InvalidPreState`; the spec name is
     `AssertionError` "Target slot must be in the future").

6. Networking codec: empty snappy frame (leanSpec
   `test_snappy_frame_empty`, blocked on
   blockblaz/snappyframesz#8)
   (build.zig.zon, pkgs/third_party/snappyframesz/...)
   - Upstream `v0.16.0` branch lacks the empty-stream decode fix; the
     branch that has the fix (`main` at cb0a08e) is still on a 0.15.2
     commit base. Vendor the v0.16.0 source under
     `pkgs/third_party/snappyframesz` and apply the 2-line
     `if (!saw_data_chunk)` → `if (!saw_stream_identifier and !saw_data_chunk)`
     guard in both `decodeFromReader` and `decodeFramed`. Switch the
     `snappyframesz` dependency to `.path = "pkgs/third_party/..."`.

Build / runner plumbing fixes that surfaced along the way:

- pkgs/spectest/src/generator.zig: `resolveFixturesRoot` now returns
  `[:0]u8` instead of `[]u8`. `realPathFileAlloc` in 0.16 returns a
  null-terminated slice; demoting to `[]u8` shortens the slice length
  by one and crashes the DebugAllocator with
  "Allocation size N bytes does not match free size N-1" at the
  `defer test_allocator.free(...)` site.
- build.zig: add `spectests.step.dependOn(&run_spectest_generate.step)`.
  Without it, the test-binary compile and the generator race when zig
  fans out parallel build steps, intermittently producing
  "file contents changed during update" or `FileNotFound` on
  generated/index.zig. The new edge makes `zig build spectest`
  idempotent.

Pre-existing merge fallout cleaned up while here (chain.zig):

- BeamChain.init was missing the `pending_blocks` field
  initialisation that main's #788 pending-queue refactor added — fill
  it with `.empty`.
- HEAD's `processFinalizationAdvancement: below PRUNE_NODE_THRESHOLD
  keeps pre-finalized ancestors` test (preserved from the merge) used
  the zig 0.15 `Dir.realpathAlloc` API and the pre-refactor
  `std.StringHashMap(PeerInfo)` for ConnectedPeers. Port both call sites
  to the current API (`Dir.realPathFileAlloc(io, ".", gpa)` and
  `network.ConnectedPeers`).

Verification:
- `zig build spectest` — All 513 tests passed (idempotent across
  re-invocations).
- `zig build test` — All 14 test groups pass (251 unit tests).

* forkchoice: drop lazy proto-array prune (PR #754 extra scope rollback)

PR #754 introduced a `PRUNE_NODE_THRESHOLD = 64` gate in
`chain.processFinalizationAdvancement` that skipped the post-finalization
`forkChoice.rebase` call until at least 64 pre-finalized nodes sat
before the finalized anchor in proto-array. The intent was to widen the
grace window for in-flight attestations whose `source` / `target` /
`head` still referenced an ancestor about to be pruned, where the eager
rebase would otherwise surface `Unknown{Source,Target,Head}Block` from
`validateAttestationData`.

On reflection that mitigation is heavier than the symptom it addresses
under current network conditions, and 3SF-mini already tolerates the
attestation-bounce. Roll the threshold gate back so finalization advance
unconditionally rebases proto-array — matches the behaviour
zeam shipped before PR #754 and keeps proto-array memory bounded by
finalization rather than by a node-count constant.

Changes:
  - pkgs/node/src/constants.zig: remove `PRUNE_NODE_THRESHOLD`.
  - pkgs/node/src/chain.zig:
      * `processFinalizationAdvancement` step 5 collapses back to the
        unconditional `if (pruneForkchoice) try
        self.forkChoice.rebase(latestFinalized.root, &canonical_view);`
        shape (incl. dropping the `getProtoNodeIndex` lookup + invariant-
        violation log path that only existed to gate the threshold).
      * Delete the
        `processFinalizationAdvancement: below PRUNE_NODE_THRESHOLD keeps
        pre-finalized ancestors` regression test — it was pinning the
        now-removed contract.

PR #754's other scope items remain in place:
  - GOSSIP_DISPARITY_INTERVALS attestation rule (spec alignment, leanSpec PR #682).
  - Split attestation validation (gossip interval-bound, block +1 slot).
The two further scope items (lazy proto-array prune, dropping zeam's
block-level future-slot rejection / time-based block queue) are
intentionally NOT brought across.

Verification:
  - `zig build spectest` — All 513 tests passed.
  - `zig build test` — 14/14 test groups pass.

* spectest: drop duplicate `zig_snappy` entry from vendored snappyframesz

The vendored copy of `pkgs/third_party/snappyframesz/build.zig.zon`
carried two identical dependency entries — `.snappyz` and
`.zig_snappy` — with the same URL and hash. The library's `build.zig`
only references the first one (`b.dependency("snappyz", ...)`); the
second is dead weight inherited from the upstream `v0.16.0` tip.

Drop it here so the vendored copy doesn't propagate the noise. The
same change is also stacked on the upstream PR that promotes
`v0.16.0` to `main` (blockblaz/snappyframesz#10) so the next sync
won't reintroduce it.

* deps: pin snappyframesz to upstream main; drop vendored copy

Upstream blockblaz/snappyframesz#10 merged the zig 0.16 upgrade + the
empty-stream decode fix into main. Pin e95759d9 and delete
pkgs/third_party/snappyframesz/.

* deps: actually pin snappyframesz URL (build.zig.zon)

Fix-up for 6a1a813 — that commit staged the `git rm` of the
vendored copy but the `build.zig.zon` edit was never staged
(the chained `git add build.zig.zon pkgs/third_party` aborted on
the now-missing `pkgs/third_party` path before reaching the first
argument). CI on 6a1a813 failed with
`unable to open '.../pkgs/third_party/snappyframesz': FileNotFound`
because the manifest still pointed `.snappyframesz.path` there.

Now the manifest carries the
`git+https://github.com/blockblaz/snappyframesz#e95759d9...` pin that
6a1a813's commit message claimed.

* ci: rename `--fork=devnet` to `--fork=Lstar` in fixture-gen step

leanSpec PR #710 renamed the `devnet` fork to `lstar`; the CI step
`Generate LeanSpec fixtures` still passed `--fork=devnet` to
`uv run fill`, so on the bumped submodule the command exits with
`Unsupported fork for consensus layer: devnet` (exit 4) and the test
job is cancelled before zig tests run.

Switch to `Lstar` (the case that `fill` expects, matching the rest of
this branch's leanSpec interactions).

---------

Co-authored-by: Anshal Shukla <53994948+anshalshukla@users.noreply.github.com>
Co-authored-by: Parthasarathy Ramanujam <1627026+ch4r10t33r@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants