Skip to content

feat: pluto create cluster#305

Open
varex83 wants to merge 3 commits intomainfrom
bohdan/cmd-create-cluster
Open

feat: pluto create cluster#305
varex83 wants to merge 3 commits intomainfrom
bohdan/cmd-create-cluster

Conversation

@varex83
Copy link
Copy Markdown
Collaborator

@varex83 varex83 commented Mar 31, 2026

Summary

Implements the initial pluto create cluster command and wires it into the CLI.

This adds local cluster artifact generation, including cluster definitions/locks, validator key material, deposit data, and node-specific setup, along with supporting validation and error handling for network, threshold, keymanager, and withdrawal address configuration.

Testing

To keep PR scope smaller, tests will come in a separate PR.

@iamquang95
Copy link
Copy Markdown
Collaborator

@claude

@claude
Copy link
Copy Markdown

claude bot commented Apr 2, 2026

Claude finished @iamquang95's task in 3m 16s —— View job


Code Review: feat: pluto create cluster

  • Read changed files
  • Review create_cluster.rs
  • Review error.rs
  • Review create_dkg.rs and CLI wiring
  • Review cluster helpers changes
  • Post review feedback

Overall this is a solid port of the Go create cluster command. The structure is clean, error types are well-organized, and the happy path looks correct. A number of issues are worth addressing before merge, ranging from a memory leak to logic bugs.


Bugs / Correctness Issues

1. Double signature + hash verification when using --definition-file

load_definition already calls verify_signatures + verify_hashes (lines 1216-1217), but validate_definition repeats both (lines 1045-1047). Beyond wasted work, this means two RPC calls to the execution client for every definition-file run.

Fix this →


2. detect_node_dirs is skipped when using --definition-file

validate_create_config (line 1080) calls detect_node_dirs(args.nodes.unwrap_or(0)). When only --definition-file is supplied, args.nodes is None, so 0 is passed and no directories are checked. The real node count is only set at line 426 from def.operators.len() — after the config validation has already run. This means the "node directory already exists" pre-flight check is silently skipped for definition-file usage.


3. validate_threshold uses args.nodes.unwrap_or(0) as node count

If a user passes --threshold 5 --definition-file ./def.json (but no --nodes), number_of_nodes evaluates to 0, so threshold (5) > number_of_nodes (0) is true and the command fails with ThresholdTooHigh. The threshold check should either be deferred until after the definition is loaded, or be guarded when no node count is available.

Fix this →


4. Misleading error variant for u64 → usize cast failure

validate_addresses (line 1237) uses CreateClusterError::ValueExceedsU8 when num_validators doesn't fit in usize. The value being checked has nothing to do with u8; the error name and message will confuse users on 32-bit targets (the only realistic trigger). Consider a dedicated ValueExceedsUsize variant or using TryFrom with a custom message.


Memory Issues

5. Box::leak in From<TestnetConfig> for network::Network (create_cluster.rs:336–354)

Box::leak causes permanent memory allocation that cannot be freed. If this conversion is ever called more than once (e.g., in tests or future interactive usage), memory accumulates. Consider using Arc<str>, Cow<'static, str>, or changing network::Network to own String fields.

Fix this →


Code Quality / Design

6. Duplicate agg_sign function

create_cluster.rs:1349 defines a private fn agg_sign(share_sets, message) -> Result<Vec<u8>>.
cluster/helpers.rs:173 defines pub fn agg_sign(secrets, message) -> Result<Signature> with the identical algorithm.
The CLI version wraps the result in Vec<u8> but otherwise does the same thing. The cluster helper should be reused directly (calling .to_vec() on the result), eliminating the duplication.

Fix this →


7. Redundant double check in create_deposit_datas / sign_deposit_datas

create_deposit_datas checks secrets.len() != withdrawal_addresses.len() at line 808, then immediately calls sign_deposit_datas which repeats the identical check at line 831. The inner check is unreachable and should be removed.


8. Broken keymanager URL check — message says "https", check requires "http"

create_cluster.rs:1100:

if keymanager_url.scheme() != HTTP_SCHEME {
    return Err(CreateClusterError::InvalidKeymanagerUrlScheme { addr: addr.clone() });
}

The error message reads "Keymanager URL does not use https protocol" (error.rs:173), but the check enforces plain HTTP — HTTPS is rejected. The code comment at line 169 already acknowledges this as a possible Go-parity bug. This should be clarified and the error message fixed regardless. Fix this →


9. No rollback on partial keymanager import failure

write_keys_to_keymanager verifies all connections first (lines 748-756) and then imports per-node. If import succeeds for node 0 but fails for node 1, the cluster ends up in an inconsistent state (some nodes have keys, others don't) with no cleanup or error recovery. At minimum this should be documented as a known limitation, and ideally a warning should be emitted identifying which nodes were successfully imported.


10. Module coupling — validate_withdrawal_addrs belongs in create_dkg.rs

create_cluster.rs:52 imports validate_withdrawal_addrs from create_dkg. This creates an awkward coupling where create_cluster depends on create_dkg. Since both commands use the same validation, it should live in a shared location (e.g., a validation.rs module, or moved into create_cluster with create_dkg importing from there if ordering allows).


Minor / Nits

11. Stale "charon" branding in user-facing strings

Several strings still say "charon" where "pluto" is appropriate:

  • create_cluster.rs:1409: "Created charon cluster:""Created pluto cluster:"
  • create_cluster.rs:1421: "charon-enr-private-key\tCharon networking..." → use "pluto"
  • create_cluster.rs:1464–1468: split-keys warning refers to "charon cluster"
  • cli.rs:142–143: description strings reference "local charon cluster" and "charon p2p keys"

Fix this →


12. write_lock_to_api passes lock.clone() unnecessarily

publish_lock(lock.clone()) at line 615 clones the entire lock struct. If publish_lock accepts a &Lock or Lock by value, a borrow would be sufficient. Minor allocation concern.


Summary table

# Severity Area
1 Medium Double verify in definition-file path
2 Medium Node dir check skipped for definition-file
3 Medium validate_threshold wrong when --nodes absent
4 Low Misleading ValueExceedsU8 error variant
5 Medium Box::leak memory leak in From<TestnetConfig>
6 Low Duplicate agg_sign
7 Low Redundant double check
8 Medium HTTP/HTTPS check vs error message mismatch
9 Low No rollback on partial keymanager import
10 Low Module coupling
11 Low Stale "charon" branding
12 Trivial Unnecessary clone

@emlautarom1 emlautarom1 linked an issue Apr 3, 2026 that may be closed by this pull request
Copy link
Copy Markdown
Collaborator

@emlautarom1 emlautarom1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall OK, but there are a couple of problems:

  • Missing tracing initialization
  • The validation of arguments is done in a different order, potentially causing issues
  • There is a missing address checksum
  • A check in Charon that results in a warning is a hard error here.

The rests are nits or QoL changes that don't fundamentally affect the code.


/// Errors that can occur in the Pluto CLI.
#[derive(Error, Debug)]
#[allow(dead_code)]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary attribute.

process::{ExitCode, Termination},
};

use pluto_eth2util as eth2util;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for these kind of aliases, makes it a bit harder to follow the code.

Comment on lines +332 to +356
impl From<TestnetConfig> for network::Network {
fn from(config: TestnetConfig) -> Self {
network::Network {
chain_id: config.chain_id.unwrap_or(0),
name: Box::leak(
config
.testnet_name
.as_ref()
.unwrap_or(&String::new())
.clone()
.into_boxed_str(),
),
genesis_fork_version_hex: Box::leak(
config
.fork_version
.as_ref()
.unwrap_or(&String::new())
.clone()
.into_boxed_str(),
),
genesis_timestamp: config.genesis_timestamp.unwrap_or(0),
capella_hard_fork: "",
}
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the Box::leak is fine here since we only need it once, but I'd rather make changes in pluto_eth2util::network::Network to support non static strings.

Comment on lines +905 to +908
let files = load_files_unordered(&split_keys_dir).await?;
Ok(files.sequenced_keys()?)
} else {
let files = load_files_recursively(&split_keys_dir).await?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary refs here

Suggested change
let files = load_files_unordered(&split_keys_dir).await?;
Ok(files.sequenced_keys()?)
} else {
let files = load_files_recursively(&split_keys_dir).await?;
let files = load_files_unordered(split_keys_dir).await?;
Ok(files.sequenced_keys()?)
} else {
let files = load_files_recursively(split_keys_dir).await?;

Comment on lines +1344 to +1349
/// Aggregates BLS signatures over `message` produced by every private share
/// across all validators, mirroring Go's `aggSign`.
///
/// `share_sets` — outer dimension is per-validator, inner is per-node private
/// key share.
fn agg_sign(share_sets: &[Vec<PrivateKey>], message: &[u8]) -> Result<Vec<u8>> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment and parameter names are a bit confusing, they deviate from Charon unnecessarily.

Comment on lines +567 to +568
let node_sig =
pluto_k1util::sign(op_key, &lock.lock_hash).map_err(CreateClusterError::K1UtilError)?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary map_err

Comment on lines +1383 to +1391
tokio::fs::write(&lock_path, &bytes)
.await
.map_err(CreateClusterError::IoError)?;

let perms = std::fs::Permissions::from_mode(0o400);
tokio::fs::set_permissions(&lock_path, perms)
.await
.map_err(CreateClusterError::IoError)?;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary map_err

.await
.map_err(CreateClusterError::IoError)?;

let perms = std::fs::Permissions::from_mode(0o400);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could use set_readonly here but it's fine either way. We should try be consistent throughout the codebase though.

Ok(())
}

fn write_split_keys_warning(w: &mut dyn Write) -> Result<()> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary map_err in multiple places.

@emlautarom1
Copy link
Copy Markdown
Collaborator

I'd suggest also taking a look at Claude's findings, some of them are relevant.

fork_version,
pluto_cluster::definition::Creator::default(),
operators,
args.deposit_amounts.clone(),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eth2util::deposit::eths_to_gweis(&args.deposit_amounts)

args.deposit_amounts is ETH, not gwei

pluto-core.workspace = true
pluto-p2p.workspace = true
pluto-eth1wrap.workspace = true
pluto-eth2api.workspace = true
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no usage

reqwest.workspace = true
url.workspace = true
chrono.workspace = true
uuid.workspace = true
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no usage

url.workspace = true
chrono.workspace = true
uuid.workspace = true
flate2.workspace = true
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no usage

chrono.workspace = true
uuid.workspace = true
flate2.workspace = true
tar.workspace = true
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no usage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement charon create cluster

3 participants