Skip to content

Harden block validation and snapshot load paths#299

Open
heifner wants to merge 5 commits into
masterfrom
feature/block-validation-hardening
Open

Harden block validation and snapshot load paths#299
heifner wants to merge 5 commits into
masterfrom
feature/block-validation-hardening

Conversation

@heifner
Copy link
Copy Markdown
Contributor

@heifner heifner commented Apr 17, 2026

Summary

  • Close block-2 validation gap: remove early-return in verify_basic_proper_block_invariants, use core.latest_qc_claim() as authoritative QC-claim reference
  • Factor finalizer_policy::validate() / proposer_policy::validate() as shared structural-validation methods reused by intrinsics and snapshot loading
  • Extract validate_s_root_extensions_match() helper from inline 39-line nested-lambda loop in apply_block
  • Add duplicate-detection in check_protocol_features for within-block activation list
  • Reorder qc_t::verify_basic so pending verify_vote_format runs before verify_dual_finalizers_votes
  • Snapshot null-pointer guards run before member-init-list dereferences
  • 40+ new unit tests (finalizer/producer intrinsic negatives, merkle async-determinism, s_root_extension matching, signature-recovery dedup, snapshot invariant checks)

- Remove the early-return gate in verify_basic_proper_block_invariants
  that skipped all QC-claim and block_extensions checks for block 2;
  switch the QC-claim reference from prev.header.qc_claim to the
  authoritative prev.core.latest_qc_claim(), with a new snapshot
  invariant asserting their equality for non-genesis blocks.

- Factor finalizer_policy::validate() and proposer_policy::validate()
  as shared structural-validation methods (uniqueness, bounds, BFT
  threshold) reused by both the set_finalizers / set_proposed_producers
  intrinsics and snapshot loading. Closes drift risk between the two
  paths. Snapshot null-pointer guards now run before member-init-list
  code that dereferences active_finalizer_policy.

- Extract validate_s_root_extensions_match() from the inline
  nested-lambda loop in apply_block into a named helper with 10
  dedicated unit tests.

- Add duplicate-detection in check_protocol_features for the
  within-block activation list.

- Reorder qc_t::verify_basic so pending verify_vote_format runs
  before verify_dual_finalizers_votes.

- Swap += to |= in block_header::calculate_id for clarity.

- Switch std::accumulate init to uint64_t{0} in
  max_weak_sum_before_weak_final.

- 40+ new unit tests across finalizer intrinsic negatives, producer
  schedule negatives, merkle async-determinism, signature-recovery
  dedup regression, s_root_extension matching, and snapshot invariant
  checks.
Base automatically changed from feature/kv-contract-migration to master April 20, 2026 19:48
}

// Now safe to compute state that depends on active_finalizer_policy.
strong_digest = compute_finality_digest();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This computes strong_digest from snapshot state before validating core at block_state.cpp (line 224), and before checking activated_protocol_features. compute_finality_digest() can reach core.latest_qc_claim() and compute_base_digest(), which asserts then dereferences activated_protocol_features at block_header_state.cpp (line 32). A malformed snapshot with a null activated_protocol_features pointer or invalid/empty finality core can still assert/segfault before the intended snapshot_exception. Move core.validate_snapshot() and an explicit activated_protocol_features null check before digest computation, and add regression coverage for both cases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: da3daa7

heifner added 2 commits May 21, 2026 12:32
…atch_rejected

controller::head() returns block_handle by value; binding the get_bsp() reference
directly to the temporary leaves live_bsp pointing at a destroyed shared_ptr.
gcc happened to leave the stack intact; clang/ASan/UBSan/asserton reused it and
the BOOST_REQUIRE on core.latest_qc_claim() == header.qc_claim read garbage.

Bind the head to a named local first (same pattern as finality_proof.hpp:315).
…ing digests

compute_finality_digest() reaches core.latest_qc_claim() and core.get_block_reference()
(both assert non-empty / in-bounds), then calls compute_base_digest() which dereferences
activated_protocol_features. A tampered snapshot planting nulls or a malformed core
would assert/segfault before the ctor could throw snapshot_exception.

Add an explicit activated_protocol_features null-check to the null-guard group, and move
core.validate_snapshot() ahead of the digest computation. Cover both with regression tests.
@heifner heifner requested a review from huangminghuang May 21, 2026 17:59
Comment thread libraries/chain/block_state.cpp Outdated
validate_prop_pol(*latest_pending_proposer_policy, "latest_pending");
}

// Snapshot hardening: validate valid_t
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only validates valid_t when valid is present, but it never rejects valid == std::nullopt for a non-genesis snapshot. The new comment in block_state.cpp (line 365) says null valid is legitimate only for the Savanna genesis core, yet the snapshot constructor accepts a tampered non-genesis snapshot with sbs.valid.reset(). In release builds, get_validation_mroot() then returns an empty digest after an assert(core.is_genesis_core()), so the malformed snapshot is detected too late or only indirectly. Add a constructor check like SYS_ASSERT(valid || core.is_genesis_core(), snapshot_exception, ...), plus a regression test.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: b8c9cbe

The snapshot block_state ctor validated valid_t only when it was present, so a tampered or truncated snapshot carrying a null valid passed construction. get_validation_mroot() would then return an empty digest after a debug-only assert(core.is_genesis_core()), so in release builds the corruption surfaced late and indirectly as a finality_mroot mismatch in apply_block.

A snapshot is never taken at genesis, so every snapshot block_state must carry a valid_t finality structure. Promote the check to an unconditional SYS_ASSERT that throws snapshot_exception, and add a regression test.
@heifner heifner requested a review from huangminghuang May 22, 2026 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants