From b7749a35d670c766d8bb8d6f5310781625d2ec05 Mon Sep 17 00:00:00 2001
From: Omen Wild
Date: Mon, 3 Mar 2025 18:46:29 -0800
Subject: [PATCH 1/2] Initial docs for our internal scripts.

---
 docs/admin/scripts.md | 76 +++++++++++++++++++++++++++++++++++++++++++
 mkdocs.yml            |  1 +
 2 files changed, 77 insertions(+)
 create mode 100644 docs/admin/scripts.md

diff --git a/docs/admin/scripts.md b/docs/admin/scripts.md
new file mode 100644
index 00000000..85bed13d
--- /dev/null
+++ b/docs/admin/scripts.md
@@ -0,0 +1,76 @@
+---
+template: admin.html
+title: Internal HPCCF Scripts
+---
+
+Like every sysadmin group in the world, we have a number of internal-use scripts to accomplish various tasks. Some of
+the standout scripts are documented here.
+
+## /opt/hpccf/bin/
+
+- `cleanup.sh` `directory`: cleanup the emacs and jed editor detritus.
+
+- `cvmfs-transaction` `open | close | status`: a wrapper for starting/stopping a CVMFS transaction.
+
+- `ipmi.sh` `NodeName` `ipmi command to run`: wrap ipmitool to automatically use our IPMI username and password.
+
+- `sacctmgr-show-qos.sh`: show the Slurm QoS settings.
+
+- `showuser.sh` `LoginID`: Show information about a user from LDAP and Slurm.
+
+- `slurm-job-history.sh` `[today|yesterday|week|(2,3-)months|year] [user|node|job|account] query-subject`: a wrapper
+  for `sacct`.
+
+- `ucd` `Name | email`: check our LDAP tree to try to figure out who a user is.
+
+- `upgrade-reboot-ALL-down-nodes.sh`: without arguments, looks for down nodes, and run the `upgrade-reboot-node.sh`
+  script on them, one node per screen window.
+
+  - Optional arguments, one of:
+    - `node1 node2 node3 ...`
+    - `partition-name`
+
+- `upgrade-reboot-node.sh` `NodeName`: run the sequence of steps that bring most nodes back into compliance and tries
+  to re-add the node to Slurm.
+
+- `victoria-download.sh` `logs | metrics` `version`: download the specified version of the Victoria product and stow
+  it.
+
+- `whoami-ssh`: process the authlog for SSH ssh keys to figure out who the actual logged-in user is.
+
+- `zfs-list.sh`: show zfs file-systems with compression and quota.
+  - Optional arguments passed directly to `zfs list`.
+
+## /opt/hpccf/sbin/
+
+These commands normally require sudo.
+
+- `cobbler-add-*.sh`: obsolete scripts to add new nodes. The replacement [pulls from NetBox](/admin/cobbler/).
+
+- `hpccf-mkswap.sh`: generate the swap file in the `/scratch/` directory. Called by a systemd unit file on boot.
+
+- `ib-topology-generate.py`: a wrapper around `ibnetdiscover` to pretty-print the InfiniBand links. - Optional
+  arguments:
+
+  - `-s | --switches-only`: only show switch-to-switch links.
+
+  - `-g | --graphviz`: generate GraphViz output in `ib-topology.dot`. Visualize with `xdot` or
+    [Graphviz Visual Editor](https://magjac.com/graphviz-visual-editor/).
+
+- `iptables-disable.sh`: purge the host's iptables firewall. Use for testing only. Puppet will re-create the rules on
+  next run.
+
+- `ldap-authorized-keys.sh`: pull the user's keys from LDAP. Used when users log in.
+
+- `mlx-node-desc.sh`: update the InfiniBand node description with the hostname so it shows up in `ibnetdiscover`.
+
+- `puppetserver-production-git-pull.sh`: called from cron to pull the newest git Puppet data.
+
+- `purge-local-users.sh`: post-migration to LDAP cleanup script to purge all local users.
+
+- `reset-gpu-power.sh`: reset NVIDIA GPUs to default power limits.
+
+- `set-gpu-power.sh` `PowerLimitInWatts`: set the NVIDIA GPU power limits.
+
+- `slurmd-launch` | `slurmctld-launch` | `slurmdbd-launch`: a wrapper around Slurm services that exit non-zero if the
+  Slurm storage is not yet available.
diff --git a/mkdocs.yml b/mkdocs.yml
index de551a00..3ea03801 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -54,6 +54,7 @@ nav:
     - Provisioning: admin/provisioning.md
     - PXE boot: admin/PXE.md
     - Software: admin/software.md
+    - Scripts: admin/scripts.md
     - Virtual Machines: admin/vms.md
 #
 # Here be dragons! Don't edit these settings unless you Know What You're Doing (tm).

From dd3c4051266d16b073a6e0c69626193fef8081d3 Mon Sep 17 00:00:00 2001
From: Omen Wild
Date: Tue, 4 Mar 2025 10:39:55 -0800
Subject: [PATCH 2/2] Moving scripts around.

---
 docs/admin/scripts.md | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/docs/admin/scripts.md b/docs/admin/scripts.md
index 85bed13d..d4a13fab 100644
--- a/docs/admin/scripts.md
+++ b/docs/admin/scripts.md
@@ -10,10 +10,6 @@ the standout scripts are documented here.
 
 - `cleanup.sh` `directory`: cleanup the emacs and jed editor detritus.
 
-- `cvmfs-transaction` `open | close | status`: a wrapper for starting/stopping a CVMFS transaction.
-
-- `ipmi.sh` `NodeName` `ipmi command to run`: wrap ipmitool to automatically use our IPMI username and password.
-
 - `sacctmgr-show-qos.sh`: show the Slurm QoS settings.
 
 - `showuser.sh` `LoginID`: Show information about a user from LDAP and Slurm.
@@ -23,21 +19,6 @@ the standout scripts are documented here.
 
 - `ucd` `Name | email`: check our LDAP tree to try to figure out who a user is.
 
-- `upgrade-reboot-ALL-down-nodes.sh`: without arguments, looks for down nodes, and run the `upgrade-reboot-node.sh`
-  script on them, one node per screen window.
-
-  - Optional arguments, one of:
-    - `node1 node2 node3 ...`
-    - `partition-name`
-
-- `upgrade-reboot-node.sh` `NodeName`: run the sequence of steps that bring most nodes back into compliance and tries
-  to re-add the node to Slurm.
-
-- `victoria-download.sh` `logs | metrics` `version`: download the specified version of the Victoria product and stow
-  it.
-
-- `whoami-ssh`: process the authlog for SSH ssh keys to figure out who the actual logged-in user is.
-
 - `zfs-list.sh`: show zfs file-systems with compression and quota.
   - Optional arguments passed directly to `zfs list`.
 
@@ -47,6 +28,8 @@ These commands normally require sudo.
 
 - `cobbler-add-*.sh`: obsolete scripts to add new nodes. The replacement [pulls from NetBox](/admin/cobbler/).
 
+- `cvmfs-transaction` `open | close | status`: a wrapper for starting/stopping a CVMFS transaction.
+
 - `hpccf-mkswap.sh`: generate the swap file in the `/scratch/` directory. Called by a systemd unit file on boot.
 
 - `ib-topology-generate.py`: a wrapper around `ibnetdiscover` to pretty-print the InfiniBand links. - Optional
@@ -57,6 +40,8 @@ These commands normally require sudo.
   - `-g | --graphviz`: generate GraphViz output in `ib-topology.dot`. Visualize with `xdot` or
     [Graphviz Visual Editor](https://magjac.com/graphviz-visual-editor/).
 
+- `ipmi.sh` `NodeName` `ipmi command to run`: wrap ipmitool to automatically use our IPMI username and password.
+
 - `iptables-disable.sh`: purge the host's iptables firewall. Use for testing only. Puppet will re-create the rules on
   next run.
 
@@ -74,3 +59,18 @@ These commands normally require sudo.
 
 - `slurmd-launch` | `slurmctld-launch` | `slurmdbd-launch`: a wrapper around Slurm services that exit non-zero if the
   Slurm storage is not yet available.
+
+- `upgrade-reboot-ALL-down-nodes.sh`: without arguments, looks for down nodes and runs the `upgrade-reboot-node.sh`
+  script on them, one node per screen window.
+
+  - Optional arguments, one of:
+    - `node1 node2 node3 ...`
+    - `partition-name`
+
+- `upgrade-reboot-node.sh` `NodeName`: run the sequence of steps that bring most nodes back into compliance, then try
+  to re-add the node to Slurm.
+
+- `victoria-download.sh` `logs | metrics` `version`: download the specified version of the Victoria product and stow
+  it.
+
+- `whoami-ssh`: process the authlog for SSH keys to figure out who the actual logged-in user is.
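
Review note on the `slurmd-launch` / `slurmctld-launch` / `slurmdbd-launch` entry: the docs say only that these wrappers exit non-zero if the Slurm storage is not yet available. The pattern might look roughly like the sketch below. This is not the actual HPCCF script; the function name `launch_if_ready`, the "directory exists and is readable" check, and any path passed to it are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, not the real slurmd-launch: refuse to start a service
# until a required storage directory is present.
#
# Usage: launch_if_ready STORAGE_DIR COMMAND [ARGS...]
launch_if_ready() {
    storage_dir=$1
    shift

    # Assumption: "storage available" means the directory exists and is readable.
    if [ ! -d "$storage_dir" ] || [ ! -r "$storage_dir" ]; then
        echo "storage not available: $storage_dir" >&2
        return 1
    fi

    # Run the wrapped command and pass its exit status through.
    "$@"
}
```

Called as `launch_if_ready /path/to/slurm/storage slurmd -D` (the path is a placeholder), the non-zero return gives a service manager configured to restart on failure a chance to try again once the storage appears.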