Skip to content

PQ-TLS for miner connections#572

Open
illuzen wants to merge 4 commits into
mainfrom
illuzen/miner-protocol-names
Open

PQ-TLS for miner connections#572
illuzen wants to merge 4 commits into
mainfrom
illuzen/miner-protocol-names

Conversation

@illuzen
Copy link
Copy Markdown
Contributor

@illuzen illuzen commented May 28, 2026

Post-Quantum TLS for Miner Connections

This PR upgrades the QUIC/TLS stack for miner connections to use post-quantum cryptography.

Changes

Dependencies (node/Cargo.toml):

  • quinn: 0.10 → 0.11
  • rustls: 0.21 → 0.23 with aws_lc_rs backend (required for ML-DSA support)
  • rcgen: 0.11 → 0.14 with aws_lc_rs_unstable feature
  • Added rustls-post-quantum for X25519MLKEM768 key exchange
  • Added sha2 for certificate fingerprint computation

Miner Server (node/src/miner_server.rs):

  • Certificate now uses ML-DSA-87 signatures (NIST FIPS 204)
  • Key exchange uses X25519MLKEM768 (hybrid classical + post-quantum)
  • Logs certificate fingerprint on startup for miners to use with --node-cert-fingerprint
  • TLS key persisted to miner_tls_cert.der / miner_tls_key.der alongside node key

Note: Due to a limitation in rcgen 0.14 (cannot reload ML-DSA keys from DER), the certificate fingerprint changes on each node restart. This will be fixed upstream.

Security Model

  • Miners connect over QUIC with TLS 1.3
  • Post-quantum key exchange protects against "harvest now, decrypt later" attacks
  • Certificate pinning (--node-cert-fingerprint) prevents MITM attacks for remote miners
  • Local miners (same machine) can skip verification safely

Example

Node startup logs:

Miner connection:

# Remote (with pinning)
quantus-miner serve --node-addr 1.2.3.4:9833 --node-cert-fingerprint sha256:a1b2c3...

# Local / trusted network
quantus-miner serve --node-addr 127.0.0.1:9833 --no-tls-verify

Note

High Risk
Changes TLS/crypto for miner connectivity and bumps quinn/rustls/rcgen; certificate fingerprints are unstable across restarts until rcgen supports ML-DSA key reload, affecting remote pinning workflows.

Overview
This PR upgrades QUIC/TLS for external miner connections to post-quantum algorithms and refreshes the surrounding dependency stack.

The node moves to quinn 0.11, rustls 0.23 (aws-lc-rs), rcgen 0.14, and rustls-post-quantum. The miner server now issues ML-DSA-87 certificates and negotiates X25519MLKEM768 via the PQ rustls provider, logs a sha256: cert fingerprint for pinning, and optionally writes TLS material next to the node key (miner_tls_cert.der / miner_tls_key.der). MinerServer::start takes an optional node_key_path derived from the network config in service.rs.

MiningRequest renames distance_threshold to difficulty (still a decimal U512 string); job broadcast and API docs align with that name. A small QPoW runtime API comment is updated to say “mining difficulty” instead of distance-threshold wording.

Operational caveat: rcgen cannot reload ML-DSA private keys from DER yet, so the logged fingerprint changes on each restart even when cert files exist—documented in logs and module comments.

Reviewed by Cursor Bugbot for commit b0a543c. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit e691633. Configure here.

Comment thread node/src/miner_server.rs Outdated
.map_err(|e| format!("Failed to set protocol versions: {}", e))?
.with_no_client_auth()
.with_single_cert(cert_chain, key)
.map_err(|e| format!("Failed to create server config: {}", e))?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALPN protocol setting dropped during TLS refactor

High Severity

The old code set server_config.alpn_protocols = vec![b"quantus-miner".to_vec()] before creating the QUIC config, but the new rustls::ServerConfig is created and immediately consumed by QuicServerConfig::try_from() without setting ALPN. The QUIC specification (RFC 9001) requires ALPN, and the project documentation (MINING.md) explicitly specifies the ALPN identifier as quantus-miner. Miners connecting with ALPN set will fail the TLS handshake since the server advertises no application protocols.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit e691633. Configure here.

Comment thread node/src/miner_server.rs
@n13
Copy link
Copy Markdown
Collaborator

n13 commented May 28, 2026

I have a complete picture. Here is my review of PR #572.

Review: PR #572 — PQ-TLS for miner connections

A really cool direction — moving the miner control plane to ML‑DSA‑87 + X25519MLKEM768 is a great forward‑looking move and the rationale (harvest‑now‑decrypt‑later) is sound. That said, there are a few blockers and several rough edges that should be cleaned up before this goes in.

Verdict: Request changes

The PR has one definite crash‑on‑startup‑style bug for any miner that respects the documented protocol, a failing CI check, and a chunk of dead code that needs cleanup. The cryptographic direction is right; the implementation needs another pass.


Blockers

1. ALPN identifier quantus-miner is dropped — miner handshakes will fail

This is the same issue Bugbot flagged, and it’s real. The previous code set ALPN before wrapping into Quinn:

let cert_chain = vec![rustls::Certificate(cert_der)];
let key = rustls::PrivateKey(key_der);

let mut server_config = rustls::ServerConfig::builder()
    .with_safe_defaults()
    .with_no_client_auth()
    .with_single_cert(cert_chain, key)
    .map_err(|e| format!("Failed to create server config: {}", e))?;

// Set ALPN protocol
server_config.alpn_protocols = vec![b"quantus-miner".to_vec()];

let mut quinn_config = quinn::ServerConfig::with_crypto(Arc::new(server_config));

In the new code, the rustls::ServerConfig is let (not let mut) and gets immediately fed into QuicServerConfig::try_from(server_config) without ever touching alpn_protocols. RFC 9001 requires ALPN, and MINING.md line 719 still documents quantus-miner as the protocol identifier. Any compliant miner — including the one the same PR description tells operators to run — will fail the TLS handshake.

Fix:

let mut server_config =
    rustls::ServerConfig::builder_with_provider(Arc::new(rustls_post_quantum::provider()))
        .with_safe_default_protocol_versions()
        .map_err(|e| format!("Failed to set protocol versions: {}", e))?
        .with_no_client_auth()
        .with_single_cert(cert_chain, key)
        .map_err(|e| format!("Failed to create server config: {}", e))?;

server_config.alpn_protocols = vec![b"quantus-miner".to_vec()];

2. CI is red — node/Cargo.toml is not taplo‑formatted

🏁 Fast Checks (Format) is failing. I reproduced it locally and the diff is small:

-rustls = { version = "0.23", default-features = false, features = ["logging", "aws_lc_rs", "std", "tls12"] }
+rustls = { version = "0.23", default-features = false, features = ["aws_lc_rs", "logging", "std", "tls12"] }
 rustls-pki-types = "1"
 rustls-post-quantum = { version = "0.2", features = ["aws-lc-rs-unstable"] }
-sha2 = { workspace = true, default-features = true }
 sc-basic-authorship.default-features = true
 ...
 serde_json = { workspace = true, features = ["alloc"] }
+sha2 = { workspace = true, default-features = true }

Two issues: feature array not alphabetised on rustls, and sha2 is in the wrong position (it sorts after serde_json, not in the middle of the rustls-* block). cargo +nightly fmt --all -- --check may or may not also have something to say once this passes — the build matrix and clippy are currently skipping because format failed.


High‑severity issues

3. load_or_generate_mldsa_key is largely dead code and the docstring lies

Bugbot caught one half of this. Looking at the implementation:

fn load_or_generate_mldsa_key(
    node_key_path: &PathBuf,
) -> Result<(rcgen::KeyPair, Option<Vec<u8>>), String> {
    let cert_path = node_key_path.with_file_name("miner_tls_cert.der");
    let key_path = node_key_path.with_file_name("miner_tls_key.der");

    // Always generate a fresh keypair (rcgen bug: can't reload ML-DSA keys)
    let key_pair = rcgen::KeyPair::generate_for(&rcgen::PKCS_ML_DSA_87)...;

    if cert_path.exists() && key_path.exists() {
        // ... "This WON'T work" ... "Actually, we need to regenerate" ...
    }
    ...
    Ok((key_pair, None))
}

Problems compounded here:

  • The return type is (KeyPair, Option<Vec<u8>>) but the second element is always None, and the caller binds it to _stored_cert and immediately discards it. Delete the Option<Vec<u8>> from the signature.
  • The doc comment claims it returns Some when loaded from storage. It doesn’t.
  • The block under if cert_path.exists() && key_path.exists() is purely stream‑of‑consciousness narration ("This WON'T work", "Actually, we need to regenerate") that belongs in a commit message, not the source tree.
  • The function is named load_or_generate_… but it never loads — it always generates. Rename to generate_mldsa_key (or actually wire up the load path; see New consensus algo #5).
  • Similarly, MinerServer::start’s new docstring claims node_key_path causes "the TLS certificate key [to be] derived from it for deterministic fingerprints" — also untrue, the path is only used as the directory prefix for the (non‑reloaded) cert/key files. Either fix the doc or implement what it says.

4. Silent error on save_cert_der violates the project’s fail‑early rule

// Save certificate for reference
if let Some(ref key_path) = node_key_path {
    let _ = save_cert_der(key_path, cert_der.as_ref());
}

Throwing the result away means a permission problem, full disk, or path collision is invisible to the operator. Either propagate the error (the rest of this function already uses ?) or log::warn! at minimum. This is one of the user rules in the repo ("never have a silent failure"), so this should be a hard rule.

5. Fingerprint pinning is largely defeated by the rotation‑on‑restart behaviour

The PR description acknowledges this but I want to flag how bad the operational story is:

  • Every node restart prints a new sha256:... fingerprint.
  • Every remote miner using --node-cert-fingerprint (the secure path the PR is selling) must be reconfigured every time the node restarts.
  • Otherwise miners silently fall back to --no-tls-verify, which then makes the whole rustls/aws‑lc‑rs/ML‑DSA stack a very expensive way to spell "trust on first use with no continuity".

A few options worth considering before merging:

  • Pin to an upstream rcgen issue/PR in the source comment (and ideally the PR body) so future maintainers know what to track.
  • Until rcgen 0.14 can reload, consider keeping the ed25519 / classical TLS path on disk for cert persistence and only upgrading the kex (rustls-post-quantum::provider() gives you X25519MLKEM768 with any cert algorithm). That preserves stable fingerprints today, gives you HNDL protection for the data, and you upgrade the cert algorithm in a follow‑up when rcgen catches up.
  • Or document the impact prominently in MINING.md and provide a script that re‑reads the fingerprint and SIGHUPs miners.

The PR is silently shipping a regression in usability of --node-cert-fingerprint that is worth being explicit about with operators.


Medium‑severity

6. Wire‑format break: MiningRequest.distance_thresholddifficulty

The rename is semantically correctqpow_math::is_valid_nonce already takes a difficulty: U512 and the runtime API returns get_difficulty(), so the old name was a historical misnomer. But this is still a backward‑incompatible JSON field rename on the QUIC protocol:

  • quantus-miner-api is at version = "0.1.0" (miner-api/Cargo.toml). It needs a bump (0.2.0 minimum given 0.x semver) and a CHANGELOG note.
  • MINING.md (line 574) still shows distance_threshold in the data‑type table. Update it (and line 485’s parenthetical) as part of this PR.
  • The companion repo Quantus-Network/quantus-miner needs a coordinated release, otherwise serde will reject incoming NewJob payloads (field is required, not #[serde(default)]). Confirm the corresponding miner PR is queued.
  • Consider serde‑level back‑compat for one release: #[serde(alias = "distance_threshold")] on difficulty lets old miners keep working until they upgrade.

7. &PathBuf should be &Path (clippy::ptr_arg)

fn load_or_generate_mldsa_key(
    node_key_path: &PathBuf,

Same for save_cert_der. Clippy will flag this once it runs (it’s currently skipping because of #2). Cheap fix.

8. Unused _stored_cert and dead disk I/O

miner_tls_key.der is written every startup but never read. miner_tls_cert.der is also written but never read. Given #3, all of this disk I/O is purely cosmetic and increases the attack surface (private key sitting on disk for no reason). Either:

  • Implement actual reload once rcgen supports it, or
  • Don’t persist the key at all until then — log the fingerprint, that’s enough for operators to copy.

The PR currently has the worst of both worlds: persistent state with no persistence semantics.

9. Native build dependencies (aws‑lc‑sys → cc + cmake)

aws-lc-rs pulls in aws-lc-sys which requires cc and cmake at build time. Worth checking:

  • Docker build images include cmake (the ghcr.io/quantus-network/quantus-node Dockerfile).
  • Cross‑compilation targets (Windows MSVC, ARM Linux, macOS arm64 — all mentioned in MINING.md) all build cleanly.
  • The release pipeline build times don’t regress badly — aws‑lc‑sys is a significant compile.

This is operational risk, not a correctness issue, but worth verifying before merge.


Low‑severity / polish

  • tls12 feature on rustls: QUIC is TLS 1.3 only (RFC 9001). The tls12 feature is just dead weight in this binary. Drop it from node/Cargo.toml, or use .with_protocol_versions(&[&rustls::version::TLS13]) to be explicit.
  • rustls-pki-types = "1": bare‑major version pin is fine here since the workspace doesn’t use this elsewhere, but version = "1" with default-features = false would be a touch tighter.
  • rustls-post-quantum = { version = "0.2", features = ["aws-lc-rs-unstable"] }: note the underscore‑vs‑hyphen inconsistency vs. rcgen’s aws_lc_rs_unstable. Both are correct (they’re feature names on different crates), just an FYI when grepping.
  • No tests added: zero coverage for compute_cert_fingerprint, save_cert_der, or any of the new TLS plumbing. A trivial unit test for compute_cert_fingerprint (golden fingerprint for a known DER input) would catch any future "we changed the hex encoding" type breakage.
  • Pre‑existing .unwrap() on address parsing in create_server_endpoint survives the refactor unchanged. Not your bug, but if you’re in the file anyway…

Suggested commit plan

If I were merging this incrementally I would split into three small commits on top of what’s there:

  1. Fix ALPN + format (unblock CI and miners): re‑add server_config.alpn_protocols, run taplo, drop tls12, change &PathBuf&Path.
  2. Clean up the ML‑DSA load/generate function: drop the Option<Vec<u8>> return, drop the dead if exists branch, rename to generate_mldsa_key, replace the silent let _ = save_cert_der(...) with proper error logging, fix the misleading docstrings on both start and the helper.
  3. Coordinate miner-api bump and update MINING.md to use difficulty everywhere. Optionally add #[serde(alias = "distance_threshold")] for a one‑release deprecation window.

Then the cert‑fingerprint stability story is the only "do it later but track it" item.

Nice work going post‑quantum on the data plane — once the ALPN regression and the cleanup are sorted this is a solid upgrade.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants