Slurm adapter: real HPC cluster integration #37

@jeremymanning

Description

adapters/slurm/src/main.rs (135 lines) has a CLI scaffold and config struct but no actual Slurm API integration. The status command returns "not yet implemented".

Requirements

  • Connect to Slurm head node via slurmrestd REST API or SSH+sbatch fallback
  • Advertise aggregate cluster capacity to World Compute broker
  • Dispatch incoming tasks as Slurm batch jobs (sbatch)
  • Monitor job status via sacct/squeue
  • Report results back to World Compute data plane
  • Handle Slurm-specific job states and failure modes (jobs stuck in PENDING, TIMEOUT, NODE_FAIL)
  • Implement install/configure/status CLI commands
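
To make the monitoring and error-handling requirements concrete, here is a minimal sketch of how the adapter might map the `State` column reported by `sacct` onto a next action. The `JobOutcome` enum and `classify_state` function are hypothetical names, not part of the existing scaffold, and the split between retryable and permanent failures (for example, treating `TIMEOUT` as permanent) is an assumed policy that would need to be agreed on.

```rust
/// What the adapter should do next for a job, given its Slurm state.
/// Hypothetical type for illustration; not part of the existing scaffold.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobOutcome {
    /// Job has not finished yet; keep polling.
    InProgress,
    /// Job finished successfully; collect results.
    Succeeded,
    /// Infrastructure-level failure; resubmitting is reasonable (assumed policy).
    Retryable,
    /// Failure that a blind retry is unlikely to fix (assumed policy).
    Failed,
    /// State string this sketch does not recognise.
    Unknown,
}

/// Classify a raw `sacct` state string such as "PENDING" or "CANCELLED by 1001".
pub fn classify_state(raw: &str) -> JobOutcome {
    // sacct can append "by <uid>" to CANCELLED and mark truncated values
    // with a trailing '+', so only the first token is inspected.
    let token = raw
        .split_whitespace()
        .next()
        .unwrap_or("")
        .trim_end_matches('+');

    match token {
        "PENDING" | "CONFIGURING" | "RUNNING" | "COMPLETING" | "SUSPENDED" | "REQUEUED" => {
            JobOutcome::InProgress
        }
        "COMPLETED" => JobOutcome::Succeeded,
        "NODE_FAIL" | "BOOT_FAIL" | "PREEMPTED" => JobOutcome::Retryable,
        "TIMEOUT" | "FAILED" | "CANCELLED" | "OUT_OF_MEMORY" | "DEADLINE" => JobOutcome::Failed,
        _ => JobOutcome::Unknown,
    }
}
```

Note that `PENDING` is treated here as a normal in-progress state rather than an error; detecting a job that stays pending for too long would need a separate timeout on the adapter side.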

Success Criteria

  • Adapter connects to real Slurm cluster
  • Jobs dispatched via sbatch and results collected
  • Cluster capacity advertised accurately
  • Error handling for common Slurm failure modes
  • Integration test on real 2+ node Slurm testbed
  • Adapter appears as aggregate node in worldcompute cluster peers
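
For the "jobs dispatched via sbatch" criterion, the adapter needs the job ID back from each submission so it can poll for status later. The sketch below parses the standard output of `sbatch`, accepting both the default message (`Submitted batch job <id>`) and the `--parsable` form (`<id>` or `<id>;<cluster>`). The function name `parse_sbatch_job_id` is hypothetical, and the sketch assumes the caller has already captured stdout over SSH or a local process.

```rust
/// Extract the numeric job ID from `sbatch` standard output.
/// Hypothetical helper for illustration; returns `None` when no ID is found.
pub fn parse_sbatch_job_id(stdout: &str) -> Option<u64> {
    // Use the first non-empty line of output.
    let line = stdout.lines().map(str::trim).find(|l| !l.is_empty())?;

    let candidate = match line.strip_prefix("Submitted batch job ") {
        // Default form: "Submitted batch job 12345"
        Some(rest) => rest.split_whitespace().next()?,
        // --parsable form: "12345" or "12345;cluster"
        None => line.split(';').next()?,
    };

    candidate.parse().ok()
}
```

Anything that does not parse as an ID (for example an `sbatch: error: ...` line) yields `None`, which the caller can report as a dispatch failure.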

Testing (Principle V)

  • Deploy on real Slurm cluster (even 2-node test setup)
  • Submit SHA-256 test job → verify correct result
  • Simulate node failure → verify job rescheduled
  • Verify resource reporting matches sinfo output
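
The resource-reporting check could be driven by a small aggregation over `sinfo` output. The sketch below assumes the adapter runs `sinfo -N -h -o "%n %c %m"` (one line per node: hostname, CPU count, memory in MB) and sums the result. `ClusterCapacity` and `aggregate_capacity` are hypothetical names, and the sketch deliberately ignores node state, so down or drained nodes would still be counted unless filtered out beforehand.

```rust
use std::collections::HashSet;

/// Aggregate capacity advertised for the whole cluster.
/// Hypothetical type for illustration; not part of the existing scaffold.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ClusterCapacity {
    pub nodes: usize,
    pub cpus: u64,
    pub memory_mb: u64,
}

/// Sum CPUs and memory from lines of the form "<hostname> <cpus> <memory_mb>".
pub fn aggregate_capacity(sinfo_stdout: &str) -> ClusterCapacity {
    // A node that belongs to several partitions is listed once per partition,
    // so hostnames are tracked to avoid double counting.
    let mut seen: HashSet<&str> = HashSet::new();
    let mut total = ClusterCapacity::default();

    for line in sinfo_stdout.lines() {
        let mut fields = line.split_whitespace();
        let (Some(host), Some(cpus), Some(mem)) =
            (fields.next(), fields.next(), fields.next())
        else {
            continue; // skip blank or short lines
        };
        let (Ok(cpus), Ok(mem)) = (cpus.parse::<u64>(), mem.parse::<u64>()) else {
            continue; // skip lines with non-numeric CPU or memory fields
        };
        if seen.insert(host) {
            total.nodes += 1;
            total.cpus += cpus;
            total.memory_mb += mem;
        }
    }
    total
}
```

In an integration test, the totals returned here can be compared against what the adapter advertises to the broker for the same cluster.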
