
SLURM CONFIGURATION

This file describes the step-by-step Slurm configuration:

SLURM CONFIGURATION ON THE HEAD NODE

The first step in configuring Slurm on the head node is to define and export an environment variable holding the UID/GID that will be assigned to the 'munge' user and group. A system account named 'munge' is then created, since MUNGE is the default authentication mechanism used by Slurm. This is accomplished using the following commands:

  export MUNGEUSER=1001 
  sudo groupadd -g $MUNGEUSER munge 
  sudo useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge 

Likewise, an environment variable is defined and exported for the 'slurm' UID/GID, the 'slurm' group is created, and a user account named 'slurm' is added. This is done using the following commands:

  export SLURMUSER=1002 
  sudo groupadd -g $SLURMUSER slurm 
  sudo useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm -s /bin/bash slurm

Finally, MUNGE is installed using the following command:

  sudo apt-get install -y munge 

After installing MUNGE, ownership of the MUNGE directories is set to "munge:munge" and their permissions are restricted. This task is accomplished using the following commands:

  sudo chown -R munge: /etc/munge/ /var/log/munge/ /var/lib/munge/ /run/munge/
  sudo chmod 0700 /etc/munge/ /var/log/munge/ /var/lib/munge/ /run/munge/

Once MUNGE is configured, the munge.key file is copied to the /nfs/slurm directory so that the same key can be used for authentication on the client nodes. The command used for this step is:

  sudo scp /etc/munge/munge.key /nfs/slurm/

The MUNGE service can be enabled and started using the following commands:

  sudo systemctl enable munge
  sudo systemctl start munge

After setting up MUNGE, the following command can be used to check the status of MUNGE:

  sudo systemctl status munge 
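
Optionally, local MUNGE authentication can be verified by encoding a credential and decoding it on the same host; unmunge should report STATUS: Success (0):

  munge -n | unmunge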

After configuring MUNGE, the next step is to install a database server that will store job history and scheduling reports. MariaDB is chosen for this project because it is simple to configure. The following command installs the MariaDB server:

  sudo apt-get install mariadb-server

In addition to mariadb-server, the Slurm database daemon (slurmdbd) and slurm-wlm are installed using the following commands:

  sudo apt-get install slurmdbd
  sudo apt-get install slurm-wlm

The database is then configured: the 'slurm' user is granted full privileges on the accounting database at 'localhost', and the slurm_acct_db database is created to store Slurm accounting records. The following commands can be used:

sudo mysql
  grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'toor' with grant option; 
  create database slurm_acct_db;
  exit
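
The database and the privileges of the 'slurm' user can optionally be verified afterwards:

  sudo mysql -e "SHOW DATABASES;"
  sudo mysql -e "SHOW GRANTS FOR 'slurm'@'localhost';"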

Once the database is configured, the slurmdbd.conf file is created. The following command can be used to create the slurmdbd.conf file:

  sudo nano /etc/slurm/slurmdbd.conf

The database configuration file should contain the following lines:

  AuthType=auth/munge
  DbdAddr=localhost
  #DbdHost=headnode
  DbdHost=localhost
  DbdPort=6819
  SlurmUser=slurm
  DebugLevel=4
  LogFile=/var/log/slurm/slurmdbd.log
  PidFile=/run/slurm/slurmdbd.pid
  StorageType=accounting_storage/mysql
  StorageHost=localhost
  StorageLoc=slurm_acct_db
  StoragePass=toor
  StorageUser=slurm
  ###Setting database purge parameters
  PurgeEventAfter=12months
  PurgeJobAfter=12months
  PurgeResvAfter=2months
  PurgeStepAfter=2months
  PurgeSuspendAfter=1month
  PurgeTXNAfter=12months
  PurgeUsageAfter=12months

The required ownership and permissions should then be given to the configuration file:

  chown slurm:slurm /etc/slurm/slurmdbd.conf
  chmod 600 /etc/slurm/slurmdbd.conf

The Slurm configuration file can be generated based on the system requirements using the provided link: Slurm Configuration Generator.

The following command can be used to create the configuration file and place it inside the /etc/slurm/ directory:

  sudo nano /etc/slurm/slurm.conf 
  (Paste the configuration generated using the link above)
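
For reference, a minimal slurm.conf might look like the sketch below. This is only an illustration: the cluster name, host names, node hardware (CPUs, memory), and partition definition are placeholders and should come from the generator output for the actual cluster.

  # Minimal example only; all values below are placeholders - generate the real file with the configurator
  ClusterName=mycluster
  SlurmctldHost=headnode
  SlurmUser=slurm
  AuthType=auth/munge
  ProctrackType=proctrack/linuxproc
  SchedulerType=sched/backfill
  SelectType=select/cons_tres
  StateSaveLocation=/var/spool/slurmctld
  SlurmdSpoolDir=/var/spool/slurmd
  SlurmctldLogFile=/var/log/slurm/slurmctld.log
  SlurmdLogFile=/var/log/slurm/slurmd.log
  AccountingStorageType=accounting_storage/slurmdbd
  AccountingStorageHost=localhost
  NodeName=computenode01 CPUs=4 RealMemory=3900 State=UNKNOWN
  PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP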

The firewall is configured to allow inbound traffic on ports 6817 (slurmctld), 6818 (slurmd), and 6819 (slurmdbd) using the following commands:

  sudo ufw allow 6817
  sudo ufw allow 6818
  sudo ufw allow 6819
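
If ufw is the active firewall, its state can be confirmed afterwards; note that the rules only take effect when ufw itself is enabled:

  sudo ufw status verbose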

Directories are created to store logs and reports generated by SLURM. Subsequently, permissions and ownership are adjusted accordingly.

  mkdir /var/spool/slurmctld
  chown slurm:slurm /var/spool/slurmctld
  chmod 755 /var/spool/slurmctld

  mkdir  /var/log/slurm
  touch /var/log/slurm/slurmctld.log
  touch /var/log/slurm/slurm_jobacct.log /var/log/slurm/slurm_jobcomp.log
  chown -R slurm:slurm /var/log/slurm/
  chmod 755 /var/log/slurm

After completing the configuration of SLURM on the Head Node, the following commands are used to enable and start all the necessary SLURM services:

  systemctl daemon-reload
  systemctl enable slurmdbd
  systemctl start slurmdbd
  systemctl enable slurmctld
  systemctl start slurmctld

Lastly, the following commands are used to check the status of the SLURM services:

  systemctl status slurmdbd
  systemctl status slurmctld
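
With both daemons running, a few optional sanity checks can be run from the head node. sinfo and scontrol display the node and partition state defined in slurm.conf, and sacctmgr queries slurmdbd (the list may be empty until a cluster has been registered with sacctmgr add cluster):

  sinfo
  scontrol show nodes
  sacctmgr show cluster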

SLURM CONFIGURATION ON THE CLIENT NODE

The process of configuring SLURM on the client node closely resembles that of the head node; the primary difference is that no database needs to be set up. As on the head node, the 'munge' and 'slurm' groups and users are created, and MUNGE is then installed. The following commands are used to accomplish these tasks:

  export MUNGEUSER=1001
  sudo groupadd -g $MUNGEUSER munge
  sudo useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
  export SLURMUSER=1002
  sudo groupadd -g $SLURMUSER slurm
  sudo useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm -s /bin/bash slurm
  sudo apt-get install -y munge

Now, the munge.key file previously copied to /nfs/slurm can be distributed to each client node by copying it to /etc/munge/. This can be achieved using the following commands:

  sudo scp /nfs/slurm/munge.key /etc/munge/
  sudo chown munge:munge /etc/munge/munge.key
  sudo chmod 400 /etc/munge/munge.key

Once the key has been copied and permissions have been adjusted, the MUNGE service is ready to be enabled and started. This task can be accomplished using the following commands:

  sudo systemctl enable munge
  sudo systemctl start munge
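
Authentication across nodes can optionally be tested by encoding a credential on the client and decoding it on the head node. Here 'headnode' is a placeholder for the actual head-node hostname, and SSH access between the nodes is assumed:

  munge -n | ssh headnode unmunge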

After setting up MUNGE, SLURM is installed on the compute node using the following command:

  sudo apt-get install slurm-wlm

The compute node must use the same configuration files as the head node. Therefore, the configuration files are copied from the head node to /nfs/slurm, and from there to the /etc/slurm/ directory on each compute node.

Copy slurm.conf and slurmdbd.conf to the /nfs/slurm directory on the head node:

  sudo scp /etc/slurm/slurm.conf /nfs/slurm/
  sudo scp /etc/slurm/slurmdbd.conf /nfs/slurm/

Copy slurm.conf and slurmdbd.conf from /nfs/slurm to the /etc/slurm/ directory on the compute node:

  sudo scp /nfs/slurm/slurm.conf /etc/slurm
  sudo scp /nfs/slurm/slurmdbd.conf /etc/slurm
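
As an optional check, the copied files can be compared against the shared copies; diff produces no output when the files match:

  sudo diff /nfs/slurm/slurm.conf /etc/slurm/slurm.conf
  sudo diff /nfs/slurm/slurmdbd.conf /etc/slurm/slurmdbd.conf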

Once all the configuration files are copied, the following commands are used to set up the required directories and log files for SLURM, ensuring ownership and permissions are appropriately configured to collect SLURM messages.

  mkdir /var/spool/slurmd 
  chown slurm: /var/spool/slurmd
  chmod 755 /var/spool/slurmd

  mkdir /var/log/slurm/
  touch /var/log/slurm/slurmd.log
  chown -R slurm:slurm /var/log/slurm/slurmd.log
  chmod 755 /var/log/slurm

  mkdir /run/slurm
  touch /run/slurm/slurmd.pid
  chown slurm:slurm /run/slurm
  chmod -R 770 /run/slurm
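
Note that /run is a temporary filesystem, so /run/slurm will not survive a reboot. One optional way to have it recreated automatically at boot on a systemd-based system is a tmpfiles.d entry; this is an assumed addition, not part of the original steps. Create /etc/tmpfiles.d/slurm.conf containing:

  # recreate /run/slurm at boot, owned by slurm with mode 0770
  d /run/slurm 0770 slurm slurm -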

After setting up these files, the PIDFile path inside the slurmd.service unit file has to be changed to match /run/slurm/slurmd.pid. The service file can be edited using the following command:

  nano /usr/lib/systemd/system/slurmd.service
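
Within the unit file, the PIDFile entry under the [Service] section should point at the path created above. Unit-file contents vary by distribution and Slurm version, so only the relevant line is shown; the rest of the file is left unchanged:

  [Service]
  # ... existing entries left as-is ...
  PIDFile=/run/slurm/slurmd.pid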

After completing the configuration of SLURM on the Compute Node, the following commands are used to enable and start the SLURM service:

  systemctl daemon-reload
  systemctl enable slurmd
  systemctl start slurmd

Lastly, the following command is used to check the status of the SLURM service:

  systemctl status slurmd
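
Finally, the cluster can be exercised end to end from the head node. sinfo should list the compute node in an idle state, and srun should run a trivial job on it (partition and node names depend on slurm.conf):

  sinfo
  srun -N1 hostname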