switch envs, resources are shared on multiple envs and are exclusive #1303

Open
dnsi0 wants to merge 35 commits into main from feat/adapt-node-envs

Conversation

@dnsi0 dnsi0 commented Mar 27, 2026

Fixes #1280.

Changes proposed in this PR:

  • Multiple environments per engine: Each Docker engine now supports multiple compute environments, each with its own full resource definitions (cpu, ram, disk, GPUs), fees, access controls, and free-tier config.
  • Per-environment exclusive resources with global validation: CPU, RAM, and disk are exclusive per environment, inUse is tracked only within the environment where the job runs. A dual validation ensures both the target environment has capacity and the global aggregate across all environments doesn't exceed physical limits. GPUs are shared-exclusive (if gpu0 is used in envA, it shows in-use on envB too). CPU cores are hard-partitioned per environment via explicit cpuCores arrays with overlap validation at config time.
  • Removed engine-level resource config: All resource definitions moved from C2DDockerConfig to per-environment C2DEnvironmentConfig. No more engine-wide resources, fees, access, storageExpiry, maxJobDuration, minJobDuration, maxJobs, or free fields on the docker config — these are now per-environment.
  • New C2DEnvironmentConfig interface: Added with resources: ComputeResource[], cpuCores: number[], and all environment-specific settings.
  • Zod schema updates: Per-environment validation (fees/free required, storageExpiry >= maxJobDuration), CPU core overlap validation across environments within a cluster.
  • Simplified engine initialization: Removed cpuOffset cascading between engines, removed buildEnvResources()/resolveEnvResources() merging logic — environments directly own their resources. CPU and RAM defaults are still auto-detected from Docker sysinfo when not configured.
  • Auto-create benchmark environment: Added an enableBenchmark flag that, when enabled, automatically generates a benchmark compute environment at startup using the system's physical resources (CPU, RAM, disk, GPUs).
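
The bullets above describe a nested per-environment shape. A minimal TypeScript sketch, with field names inferred from this description rather than taken from the actual schema, might look like:

```typescript
// Hypothetical sketch of the per-environment config shape described in this PR.
// Field names are inferred from the description above, not from the real schema.
interface ComputeResource {
  id: string // e.g. 'cpu', 'ram', 'disk', or a GPU identifier
  type: 'cpu' | 'ram' | 'disk' | 'gpu'
  total: number // total units owned by this environment
  max?: number // maximum a single job may claim
  min?: number
}

interface C2DEnvironmentConfig {
  id?: string
  description?: string
  resources: ComputeResource[]
  cpuCores: number[] // hard-partitioned cores, checked for overlap at config time
  storageExpiry: number // seconds; validated to be >= maxJobDuration
  minJobDuration: number
  maxJobDuration: number
  maxJobs?: number
  fees?: Record<string, unknown>
  free?: { maxJobDuration?: number; minJobDuration?: number }
}

const exampleEnv: C2DEnvironmentConfig = {
  id: 'env-basic',
  resources: [
    { id: 'cpu', type: 'cpu', total: 4, max: 2 },
    { id: 'ram', type: 'ram', total: 8_000_000_000 },
    { id: 'disk', type: 'disk', total: 50_000_000_000 }
  ],
  cpuCores: [0, 1, 2, 3],
  storageExpiry: 7200,
  minJobDuration: 60,
  maxJobDuration: 3600,
  free: { maxJobDuration: 600 }
}
```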

@dnsi0 dnsi0 self-assigned this Mar 27, 2026
@dnsi0 dnsi0 force-pushed the feat/adapt-node-envs branch from f67b833 to f6422f2 on March 30, 2026 at 08:35
@dnsi0 dnsi0 marked this pull request as ready for review March 30, 2026 10:38
@dnsi0 dnsi0 marked this pull request as draft March 30, 2026 10:39

dnsi0 commented Mar 30, 2026

/run-security-scan

@alexcos20 alexcos20 left a comment

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces a significant architectural change to the Compute-to-Data (C2D) component, allowing for multiple, distinct compute environments within a single Docker cluster. The configuration for compute environments is now hierarchical, enabling granular control over resources, fees, access, and job durations for each environment. It also implements global resource availability checks to prevent over-allocation across environments and refactors CPU core allocation to be environment-specific. This is a substantial improvement in flexibility and resource management, but it introduces a breaking change in the configuration structure.

Comments:
• [ERROR][other] This PR introduces a breaking change to the dockerComputeEnvironments configuration structure. Previously, parameters like storageExpiry, maxJobDuration, fees, and resources were directly under the C2DDockerConfig object. Now, they must be nested within an environments array. While the .env.example and config.json files are updated, existing deployments will require manual configuration migration. Please ensure this breaking change is clearly documented in the release notes with migration instructions.
• [INFO][style] Good catch correcting the typo from 1 hours to 1 hour in the paymentClaimInterval comment.
• [INFO][performance] The addition of envResourceMap for efficient resource lookup is a good optimization, especially with the introduction of multiple environments. This avoids repeated array iteration.
• [WARNING][bug] The physicalLimits map correctly initializes for 'cpu' and 'ram' based on sysinfo. However, there's no corresponding initialization for 'disk'. While the checkGlobalResourceAvailability method for disk will sum total values from configured environments, it won't necessarily reflect the actual physical disk capacity of the host unless manually configured to match. Is there a plan to automatically detect physical disk capacity or enforce a configuration check to ensure the sum of disk.total across environments does not exceed the physical limit?
• [INFO][style] The ComputeEnvironmentFreeOptionsSchema now includes minJobDuration. This is consistent with the paid environment options. Good to ensure consistency here.
• [INFO][other] The validation rule environments: z.array(C2DEnvironmentConfigSchema).min(1) ensures that at least one compute environment is configured within a Docker cluster. This is a sensible default and prevents misconfigurations.
• [INFO][other] The environment ID fallback to String(envIdx) if not explicitly provided is functional for ensuring uniqueness. For user-friendliness and traceability in logs/monitoring, it might be beneficial to encourage explicit id definitions in the configuration, or generate a more descriptive ID based on other environment properties if id is missing.
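
The dual validation this review refers to (per-environment capacity plus a global aggregate against physical limits) can be sketched roughly as follows; `canAllocate` and the `Usage` shape are illustrative names, not the PR's actual API:

```typescript
// Illustrative sketch of the dual check described in this review: a request must
// fit the target environment AND keep the global aggregate within physical limits.
type Usage = { total: number; inUse: number }

function canAllocate(
  requested: number,
  targetEnv: Usage,
  allEnvs: Usage[],
  physicalLimit: number
): boolean {
  // 1) per-environment check: the target environment must have spare capacity
  if (targetEnv.inUse + requested > targetEnv.total) return false
  // 2) global check: usage summed across all environments, plus the request,
  //    must not exceed the host's physical capacity
  const globalInUse = allEnvs.reduce((sum, e) => sum + e.inUse, 0)
  return globalInUse + requested <= physicalLimit
}
```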

dnsi0 commented Mar 30, 2026

/run-security-scan

@alexcos20 alexcos20 left a comment

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces a major architectural refactoring of the compute environment configuration, allowing a single compute node to host multiple distinct compute environments with individual resource definitions, fee structures, and access controls. It moves core environment settings into a nested environments array within the Docker compute configuration. Key improvements include granular per-environment resource management (CPU, RAM, disk), a new global resource availability check to prevent over-allocation, and robust Zod schema validation for the new configuration structure. This is a significant enhancement for compute providers offering varied services.

Comments:
• [INFO][other] The updated example for DOCKER_COMPUTE_ENVIRONMENTS clearly demonstrates the new nested environments array structure. This is a breaking change for existing deployments, so clear migration notes will be essential.
• [INFO][other] The introduction of C2DEnvironmentConfig and the modification of C2DDockerConfig correctly reflect the new multi-environment architecture. This provides much-needed flexibility.
• [INFO][other] The fix from '1 hours' to '1 hour' for paymentClaimInterval comment is a minor but welcome detail correction.
• [INFO][performance] The envResourceMap creation is a good optimization for resource lookups within an environment, improving readability and potentially performance in resource-intensive loops.
• [INFO][other] The new logic distinguishing between shared-exclusive (GPU) and per-env exclusive (CPU, RAM, disk) resource tracking is crucial for accurate multi-environment resource management. This addresses a common challenge in container orchestration.
• [INFO][security] The checkGlobalResourceAvailability method is a critical addition for preventing over-allocation of physical resources across multiple logical compute environments. This enhances the stability and security of the compute node.
• [INFO][other] The processFeesForEnvironment method extracts and centralizes fee processing logic, which improves code organization and maintainability.
• [INFO][other] The start() method's refactoring to iterate over envConfig.environments and dynamically create ComputeEnvironment objects is the core of the multi-environment support. This is a robust implementation of the new design.
• [WARNING][performance] The cpuOffset was removed from the constructor and related logic for CPU allocation is now per-environment. While the new per-environment CPU allocation logic (using envCpuCoresMap) is correct, it's essential to confirm that the cpuOffset functionality for multiple physical Docker clusters (if that was ever a use case) hasn't been implicitly removed without replacement. Currently, C2DEngines iterates c2dClusters but C2DEngineDocker creates only for a single cluster. If there's a use case for multiple DOCKER clusters on one node and they need different CPU affinities, that might need re-evaluation.
• [INFO][performance] Passing envId to allocateCpus and using envCpuCoresMap for environment-specific CPU pinning is a good approach to ensure fair and isolated CPU resource allocation for jobs running in different configured environments.
• [INFO][other] The C2DEnvironmentConfigSchema with its refine clauses provides excellent validation for the new environment configurations. The mandatory disk resource and storageExpiry vs maxJobDuration checks are important for operational stability. The requirement for either fees or free configuration per environment is also sensible.
• [INFO][other] Adding minJobDuration to ComputeEnvironmentFreeOptionsSchema with a default of 60 seconds is a good consistency improvement. This ensures free jobs also have a minimum duration constraint.
• [WARNING][test] In some integration tests, paymentClaimInterval is now set on the DOCKER_COMPUTE_ENVIRONMENTS string at the top level of the Docker config, not within the nested environments array. This is consistent with the C2DDockerConfig definition, but it's important to ensure that this global paymentClaimInterval is respected and applies correctly to all nested environments, or if it should be an environment-specific setting.
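
The shared-exclusive (GPU) versus per-environment exclusive (CPU, RAM, disk) tracking rule discussed above reduces to a simple filter; the names here are hypothetical, not the PR's code:

```typescript
// Illustrative sketch of the tracking rule described above: GPU usage is
// shared-exclusive (visible in every environment), while cpu/ram/disk usage
// counts only in the environment where the job runs.
type ResourceType = 'cpu' | 'ram' | 'disk' | 'gpu'
interface ResourceUse {
  envId: string
  type: ResourceType
  amount: number
}

function inUseForEnv(envId: string, uses: ResourceUse[], type: ResourceType): number {
  const sharedExclusive = type === 'gpu'
  return uses
    .filter((u) => u.type === type && (sharedExclusive || u.envId === envId))
    .reduce((sum, u) => sum + u.amount, 0)
}
```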

@dnsi0 dnsi0 force-pushed the feat/adapt-node-envs branch from d4cb256 to a34a4f3 on March 31, 2026 at 07:16

dnsi0 commented Mar 31, 2026

/run-security-scan

@alexcos20 alexcos20 left a comment

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces a significant architectural change by enabling support for multiple, distinct compute environments within a single C2D Docker cluster. Each environment can now define its own resources (CPU, RAM, disk, GPU), fee structures, access controls, and free tier configurations. Key changes include a revised configuration structure, updated TypeScript types, granular resource management logic distinguishing between shared (GPU) and exclusive (CPU, RAM, disk) resources, and the introduction of global physical resource limits. CPU affinity is now managed per environment.

This is a major feature enhancement that improves the flexibility and scalability of compute offerings.

Comments:
• [INFO][other] The C2DDockerConfig interface has been significantly refactored. Previously, it contained properties like storageExpiry, maxJobDuration, fees, and resources directly. These are now moved into the new C2DEnvironmentConfig interface, and C2DDockerConfig now primarily holds connection details and an array of environments.

This is a breaking change to the configuration structure, which needs to be clearly communicated to users and administrators during upgrade.
• [INFO][other] The resource usage tracking logic has been enhanced to differentiate between gpu resources (shared-exclusive, tracked globally) and other resource types (cpu, ram, disk) which are per-environment exclusive. This is a crucial improvement for supporting heterogeneous environments and ensuring accurate resource allocation across multiple defined compute environments.
• [INFO][other] Introduction of physicalLimits and checkGlobalResourceAvailability is a critical addition. This ensures that even with multiple compute environments, the aggregated resource demands do not exceed the actual physical capacity of the host machine. This adds a robust layer of safety and prevents over-provisioning at a global level.
• [WARNING][performance] In the start() method, if statfsSync(this.getC2DConfig().tempFolder) fails to detect the physical disk size, physicalDiskGB defaults to 0. This could lead to diskResources.total being 0 for all environments, effectively making disk resources unavailable or severely constrained if the detection fails. While the CORE_LOGGER.warn helps, consider if a default fallback (e.g., a minimum reasonable disk size) or a more explicit failure during startup is preferable if disk detection is critical and might fail in some environments.
• [INFO][other] The CPU affinity logic now uses envCpuCoresMap to manage core allocations per environment, rather than a single envCpuCores array. This correctly isolates and assigns CPU cores based on the configuration of each specific compute environment, which is vital for multi-environment support.
• [INFO][style] The C2DDockerConfigSchema now requires environments to be an array with a minimum length of 1 (.min(1)). This implies that a Docker compute cluster must define at least one environment. This is a sensible design choice for clarity and preventing misconfigurations where a cluster might be defined but offers no compute environments.
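
The two schema constraints discussed in this review (storageExpiry >= maxJobDuration per environment, and at least one environment per cluster) reduce to logic along these lines; the PR itself enforces them with Zod, and this plain-TypeScript sketch with illustrative names just mirrors the rules:

```typescript
// Plain-TypeScript mirror of the two validation rules discussed above.
// The PR enforces these via Zod refine()/.min(1); names here are illustrative.
interface EnvRules {
  storageExpiry: number
  maxJobDuration: number
}

function validateDockerConfig(environments: EnvRules[]): string[] {
  const errors: string[] = []
  if (environments.length < 1) {
    errors.push('at least one compute environment must be configured')
  }
  environments.forEach((env, i) => {
    if (env.storageExpiry < env.maxJobDuration) {
      errors.push(`environment ${i}: storageExpiry must be >= maxJobDuration`)
    }
  })
  return errors
}
```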

@dnsi0 dnsi0 marked this pull request as ready for review March 31, 2026 07:27

dnsi0 commented Apr 1, 2026

/run-security-scan

@alexcos20 alexcos20 left a comment

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces a significant refactoring of the compute-to-data (C2D) environment configuration, allowing for multiple compute environments per Docker cluster. Each environment can now define its own resources (CPU, RAM, disk, GPU), fees, and access controls. It also implements global physical resource limits and environment-specific CPU allocation. A new feature to automatically create a benchmark environment is added. Configuration schemas, environment variable parsing, and core compute logic have been updated to support these changes.

Comments:
• [INFO][other] The DOCKER_COMPUTE_ENVIRONMENTS configuration has undergone a major breaking change, now requiring an environments array. This needs thorough documentation for node operators, including migration steps if applicable, to avoid confusion and misconfigurations during upgrades.
• [WARNING][performance] The start method in C2DEngineDocker has grown quite complex, handling physical limit detection, benchmark environment setup, and iterating through multiple environment configurations. Consider breaking down this method into smaller, more focused private methods to improve readability and maintainability. For example, a _initializePhysicalLimits() and _createEnvironment(envDef, sysinfo, platform, consumerAddress, supportedChains).
• [INFO][bug] The statfsSync call for disk size is wrapped in a try/catch. If this fails for some reason, the disk physical limit will not be set, potentially leading to incorrect disk resource management if a fallback isn't provided. While a warning is logged, for critical resources like disk, a more robust fallback or an explicit error might be considered if disk detection is essential for operation.
• [WARNING][style] The sepoliaChainId and usdcToken for the benchmark environment are hardcoded here. It would be better practice to define these as constants in src/utils/config/constants.ts or make them configurable via environment variables, similar to BENCHMARK_MONITORING_ADDRESS, for better maintainability and flexibility across different network configurations.
• [INFO][bug] The .refine rule for mandatory 'disk' resource: if (!data.resources) return false causes the schema to fail if the resources array is omitted entirely from an environment configuration. This is likely intended behavior to ensure disk is always specified, but it's worth confirming that environments without an explicit resources array (relying on implicit defaults or auto-detection) are not meant to be valid.
• [INFO][style] The C2DDockerConfigSchema has a .min(1) requirement for the environments array. This ensures that at least one compute environment must be defined for each Docker cluster. This is a good constraint for operational stability.
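
The disk-detection concern raised above could be addressed with a conservative fallback instead of defaulting to 0. A sketch using Node's statfsSync follows; the 10 GiB floor is an assumption for illustration, not the PR's behavior:

```typescript
import { statfsSync } from 'node:fs'

// Sketch of a fallback for failed physical disk detection, as suggested above.
// DEFAULT_DISK_BYTES is an assumed floor, not a value from the PR.
const DEFAULT_DISK_BYTES = 10 * 1024 ** 3 // 10 GiB

function detectDiskBytes(path: string): number {
  try {
    const stats = statfsSync(path)
    return stats.bsize * stats.blocks // block size times total blocks
  } catch {
    // detection failed: fall back instead of reporting zero capacity
    return DEFAULT_DISK_BYTES
  }
}
```

Whether a silent fallback or a hard startup failure is preferable depends on how critical accurate disk limits are for the deployment, as the review notes.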

@dnsi0 dnsi0 force-pushed the feat/adapt-node-envs branch from 7cd00ae to 95f1ef8 on April 2, 2026 at 07:17

dnsi0 commented Apr 2, 2026

/run-security-scan

@alexcos20 alexcos20 left a comment

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces a major architectural change to the Compute-to-Data (C2D) module, refactoring the configuration to support multiple, distinct compute environments under a single Docker daemon. Previously, a Docker compute cluster was configured with a single set of resources and policies. Now, a Docker socket can host an array of environments, each with its own granular resource allocations (CPU, RAM, Disk, GPU), job duration limits, access controls, and fee structures.

Key changes include:

  • Multi-Environment Support: DOCKER_COMPUTE_ENVIRONMENTS now expects an array of environments for each Docker socket.
  • Granular Resource Management: Resources are defined per environment, including total, max, min, and type properties.
  • Global Resource Limits: A new system tracks physical limits for CPU, RAM, and Disk, performing global availability checks for non-GPU resources, in addition to per-environment checks. GPUs are treated as shared-exclusive resources.
  • Dynamic Benchmark Environment: A new enableBenchmark flag allows for an auto-generated benchmark environment with specific access rules and token/chain configurations.
  • Configuration Schema Updates: All configuration examples, CI/CD, and internal validation schemas are updated to reflect the new nested and array-based structure.
  • CPU Affinity: CPU core allocation is now managed per environment.

This is a significant improvement for flexibility and resource isolation within C2D setups.

Comments:
• [INFO][style] The C2DEnvironmentConfig interface looks good and correctly captures the new structure for individual compute environments. Consider adding JSDoc comments to each property to explain its purpose and expected values, especially for id, description, storageExpiry, minJobDuration, maxJobDuration, and maxJobs. This will greatly improve readability and maintainability for developers interacting with this type.
• [WARNING][bug] The logic if (!isSharedExclusive && !isThisEnv) continue is crucial for differentiating shared vs. exclusive resources. While gpu is hardcoded as shared-exclusive here, it might be beneficial to make isSharedExclusive a property of ComputeResource itself in the future if other resource types might also become shared. For now, this is acceptable, but something to keep in mind for future extensibility and configuration clarity.
• [WARNING][performance] In checkGlobalResourceAvailability, iterating allEnvironments for every resource check could potentially become a performance bottleneck if the number of environments or resource types grows significantly, or if resource checks are extremely frequent. While likely fine for current expected scale, consider if globalUsed and globalTotal for non-GPU resources could be cached and updated reactively when a job starts/ends, rather than re-calculating on every checkIfResourcesAreAvailable call. This might be an over-optimization for now, but worth noting for future scaling discussions.
• [INFO][other] The createBenchmarkEnvironment method is a good addition for development and testing. Ensure that the BENCHMARK_MONITORING_ADDRESS and token details (USDC_TOKEN, SEPOLIA_CHAIN_ID) are clearly documented as test-specific or developer-tooling-specific in relevant documentation (e.g., README.md, developer guides) to avoid confusion or accidental usage in production deployments.
• [WARNING][bug] The environment ID generation uses create256Hash(JSON.stringify(env.fees) + envIdSuffix). If env.fees can be null or undefined (which it can be if not configured), JSON.stringify(null) results in the string 'null'. This might lead to inconsistent or unexpected hash values if the presence or absence of fees is not consistently represented in the JSON stringification. Consider explicitly coercing env.fees to an empty object {} if it's not provided, to ensure hashing consistency.
• [INFO][style] The C2DEnvironmentConfigSchema includes several .refine calls for validation. This is excellent for enforcing complex configuration rules. Please ensure that the error messages provided in these .refine calls are sufficiently clear, user-friendly, and actionable for node operators who might encounter validation failures during configuration.
• [ERROR][bug] In C2DDockerConfigSchema, paymentClaimInterval is defined as z.number().int().optional(). However, the previous implementation in compute_engine_docker.ts (before the refactor in the start method, specifically this.paymentClaimInterval = clusterConfig.paymentClaimInterval || 3600) had a default value of 3600 seconds (1 hour). With optional() and no default in the schema, if paymentClaimInterval is not explicitly provided in the configuration, it will be undefined. This could lead to runtime errors or unexpected behavior in logic that expects a number. It should either have a default value defined in the schema or be made mandatory if it's always required.
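
Both issues flagged above (unstable hashing when fees is absent, and the lost 3600 s default for paymentClaimInterval) have fixes of roughly this shape; the helper names are hypothetical, not the PR's actual functions:

```typescript
import { createHash } from 'node:crypto'

// Illustrative fixes for the two issues flagged above; helper names are
// hypothetical, not the PR's actual functions.

// 1) Coerce missing fees to {} so the derived environment id is stable
//    whether fees is null, undefined, or simply omitted.
function environmentIdHash(fees: unknown, suffix: string): string {
  const normalizedFees = fees ?? {}
  return createHash('sha256')
    .update(JSON.stringify(normalizedFees) + suffix)
    .digest('hex')
}

// 2) Restore the pre-refactor default of 3600 s (1 hour) when the
//    configuration omits paymentClaimInterval.
function resolvePaymentClaimInterval(configured?: number): number {
  return configured ?? 3600
}
```

In the Zod schema itself, the second fix could instead be expressed as `z.number().int().default(3600)`.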


dnsi0 commented Apr 2, 2026

/run-security-scan

@alexcos20 alexcos20 left a comment

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces a significant refactoring of the Compute-to-Data (C2D) environment configuration and resource management. The core change is the ability to define multiple distinct compute environments within a single Docker cluster configuration, each with its own resources, fees, access rules, and job duration limits.

Key changes include:

  • Restructuring of DOCKER_COMPUTE_ENVIRONMENTS to nest environment-specific settings under an environments array.
  • Implementation of global resource availability checks (CPU, RAM, Disk) to prevent overallocation across environments within a Docker engine, ensuring physical limits are respected.
  • CPU core allocation is now managed per environment, with a mechanism to assign physical CPU cores to specific environments.
  • GPU resources are explicitly handled as 'shared-exclusive' across environments.
  • A new enableBenchmark configuration option allows for automatic generation of a benchmark compute environment based on system resources.
  • Configuration schemas (schemas.ts), .env.example, ci.yml, config.json, and various integration tests have been updated to reflect these structural and functional changes.

Comments:
• [INFO][style] The example DOCKER_COMPUTE_ENVIRONMENTS string is very long and complex. While typical for .env files, it might be challenging for developers to parse and customize. Consider adding a comment suggesting using multiline JSON syntax if the shell supports it, or point to the config.json example for better readability during setup.
• [INFO][other] The example shows "type":"cpu", "type":"ram", "type":"disk", "type":"gpu". While these are now mandatory in the schema, it's worth confirming that the backend logic correctly leverages and validates these type fields for all resource types, especially for any custom resources a user might define.
• [WARNING][other] The dockerComputeEnvironments structure in config.json (and .env equivalent) has been fundamentally changed, with most properties now nested under an environments array. This is a breaking change for existing deployments. It's crucial to document this prominently in release notes and provide clear migration instructions for node operators.
• [INFO][other] The checkGlobalResourceAvailability function is a critical addition that prevents over-allocation of resources (CPU, RAM, disk) across multiple environments within a single Docker engine. This significantly improves resource management robustness. Ensure physicalLimits is always reliably populated for essential resources.
• [INFO][other] The globalTotal logic caps the total available resource to physicalLimit if globalTotal (sum of totals from environments) exceeds it. This is a sensible default, but clarifies that environment total definitions collectively can't exceed the detected physical capacity.
• [INFO][style] Adding allEnvironments?: ComputeEnvironment[] as a parameter to checkIfResourcesAreAvailable makes the signature more complex. While necessary for the global check, consider if allEnvironments could be a property of the C2DEngine class that is updated when this.envs changes, reducing the need to pass it explicitly in every call.
• [INFO][other] The createBenchmarkEnvironment function is a valuable feature for node operators, automatically configuring a benchmark environment based on detected system resources and aggregating GPU information. This promotes easier benchmarking and testing.
• [INFO][other] Populating this.physicalLimits with NCPU, MemTotal, and statfsSync for disk is crucial for the global resource checks. This ensures the system respects its true hardware constraints.
• [WARNING][bug] The CPU affinity logic now uses this.envCpuCoresMap which is populated per C2DEngineDocker instance. Previously, cpuOffset in compute_engines.ts ensured distinct CPU core ranges across multiple C2DEngineDocker instances running on the same physical machine. With cpuOffset removed, if multiple C2DEngineDocker instances are started (e.g., from multiple c2dClusters entries in config.json of type DOCKER, on the same host), they will each independently detect sysinfo.NCPU and assign CPU cores to their respective environments, potentially leading to overlapping CPU core assignments and resource contention between these C2DEngineDocker instances. This might be a regression in multi-Docker-engine deployment scenarios on a single physical host.
• [INFO][other] The C2DEnvironmentConfigSchema includes a refine rule requiring storageExpiry >= maxJobDuration. This is a good logical constraint that prevents jobs from running longer than their storage allows.
• [INFO][other] The simplification of C2DDockerConfigSchema to primarily hold Docker connection details and an array of C2DEnvironmentConfigSchema is a good architectural improvement, promoting modularity and cleaner separation of concerns between Docker daemon configuration and compute environment specifics.
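
The per-environment cpuCores partitioning and the overlap warning above boil down to a disjointness check over all configured core arrays; this sketch uses an illustrative function name:

```typescript
// Sketch of the config-time overlap check: cpuCores arrays across all
// environments of a cluster must be pairwise disjoint. Illustrative only.
function findCoreOverlaps(envCpuCores: number[][]): number[] {
  const seen = new Set<number>()
  const overlaps = new Set<number>()
  for (const cores of envCpuCores) {
    for (const core of cores) {
      if (seen.has(core)) overlaps.add(core)
      seen.add(core)
    }
  }
  return [...overlaps].sort((a, b) => a - b)
}
```

Running the same check across engine instances on one host, not just across environments within one engine, would address the potential multi-engine regression described in the warning above.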

dnsi0 commented Apr 2, 2026

/run-security-scan

@alexcos20 alexcos20 left a comment

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces a major refactoring of the compute-to-data (C2D) environment configuration, allowing a single Docker compute cluster to host multiple distinct compute environments. Each environment can now define its own resources (CPU, RAM, disk, GPU), fees, access controls, and free-tier options. This change significantly enhances flexibility and resource management for C2D providers.

Key changes include:

  • Multi-Environment Support: DOCKER_COMPUTE_ENVIRONMENTS now accepts an array of environments under each Docker cluster, enabling granular control.
  • Resource Management: CPU, RAM, and disk resources are managed per environment, with global checks ensuring overall physical capacity is not exceeded. GPUs are treated as shared-exclusive resources.
  • Automatic Resource Detection: CPU, RAM, and disk defaults are auto-detected from the system if not explicitly configured.
  • Benchmark Environment: A new enableBenchmark configuration option allows the node to automatically create a benchmark compute environment, simplifying setup and monitoring.
  • Configuration Validation: Updated Zod schemas enforce the new structure and validate constraints (e.g., storageExpiry vs. maxJobDuration, mandatory disk resource).
  • Code Refactoring: The compute_engine_docker.ts and compute_engine_base.ts files have been heavily refactored to support the multi-environment and advanced resource management logic.
  • Documentation & Examples: .env.example, config.json, and docs/env.md have been updated to reflect the new configuration structure.
  • Test Coverage: All relevant integration and unit tests have been updated to match the new configuration schema, indicating thorough propagation of changes.

Comments:
• [INFO][other] This change effectively shifts the core C2D environment configuration from the C2DDockerConfig level to a nested C2DEnvironmentConfig[] array. This is a significant architectural improvement, enabling much more flexible and granular control over compute environments for providers. It's important to clearly communicate this breaking change to users.
• [INFO][performance] The introduction of physicalLimits and checkGlobalResourceAvailability is a crucial safety measure to prevent over-allocation of shared physical resources across multiple compute environments. This enhances the robustness of the C2D engine. Ensure logging is sufficient for debugging potential resource contention.
• [INFO][other] The refactoring of CPU allocation from a single cpuOffset and envCpuCores to a envCpuCoresMap for per-environment CPU affinity is a well-designed change that properly supports the new multi-environment architecture. This ensures that CPU resources are correctly isolated and allocated for each specific environment.
• [WARNING][other] The createBenchmarkEnvironment method hardcodes BENCHMARK_MONITORING_ADDRESS, SEPOLIA_CHAIN_ID, and USDC_TOKEN. While useful for a default setup, this limits flexibility for providers who might want to benchmark on different chains, with different tokens, or target different monitoring addresses. Consider making these configurable via environment variables or the config.json for broader applicability, or explicitly document these limitations.
• [INFO][style] The new C2DEnvironmentConfigSchema is comprehensive and includes important refine rules for validation (e.g., mandatory 'disk' resource, storageExpiry vs. maxJobDuration, presence of fees or free tier). This greatly improves the robustness of configuration parsing and reduces potential runtime errors due to misconfiguration.
• [INFO][other] The start method's refactoring to iterate over envConfig.environments and dynamically create ComputeEnvironment instances, along with physical resource detection and CPU affinity mapping per environment, demonstrates a robust implementation of the new architecture. This will greatly improve the scalability and configurability of compute resources.


Development

Successfully merging this pull request may close these issues.

Do not share compute resources between environments
