From a4d5de83beaaddbeca0c9ccb2513ff96963fb8ec Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Thu, 4 Jun 2026 11:42:32 +0200 Subject: [PATCH 1/7] Add initial files for each site-building approach --- docs/site_build/overview.md | 45 +++++++++++++++++++++++++++++++++++ docs/site_build/shared_fs.md | 1 + docs/site_build/site_cvmfs.md | 1 + 3 files changed, 47 insertions(+) create mode 100644 docs/site_build/overview.md create mode 100644 docs/site_build/shared_fs.md create mode 100644 docs/site_build/site_cvmfs.md diff --git a/docs/site_build/overview.md b/docs/site_build/overview.md new file mode 100644 index 0000000000..9c2317d327 --- /dev/null +++ b/docs/site_build/overview.md @@ -0,0 +1,45 @@ +# Introduction +This documentation is aimed at HPC sites or other facilities that make EESSI available on their system, but would like to offer additional installations that are performed 'on top' of EESSI (i.e. using dependencies provided by EESSI). + +There are several reasons why, as a site, you may want to offer additional software on top of EESSI. For example: +1. You want to offer software that is not suitable for upstream deployment in EESSI (e.g. because it is proprietary, or because it is a development build / otherwise very specific build that is not useful for a general audience). +2. You need to make software available on (very) short notice to your users, and cannot wait for it to be deployed in upstream EESSI. +3. You want to retain full autonomy over what gets deployed. + +While all of these are valid arguments, note that there is also one major downside to deploying things locally: you lose one of the core benefits of EESSI, namely that it provides _the same software on every system_. The more site-specific installations you have, the more difficult it will be for your users to move their workflows from e.g. their own development machine/cloud environment to your cluster, or scale up to larger clusters. 
If you're doing site-builds to make software available to your users on short notice, we highly encourage you to _also_ contribute the same software installation in upstream EESSI. This way, once accepted upstream, users that rely on that software retain their 'mobility'. + +# Choosing your approach +There are two approaches to doing site builds, each with their own advantages and disadvantages. + +1. Performing site builds using EESSI-extend on a shared filesystem. +2. Leveraging all of EESSI's tooling for site builds. In this approach, you use the EESSI build bot (`EESSI/eessi-bot-software-layer`), together with the EESSI build scripts (`EESSI/software-layer-scripts`) to build and deploy software into a CernVM-FS repository of your own. Essentially, this means you'll build in a way that is essentially identical to how it is done for upstream EESSI - with the only major difference being the target CernVM-FS repository. + +In both cases, you build 'on top' of EESSI, meaning that dependencies that are already provided by EESSI will not be reinstalled: they will simply be loaded from EESSI. + +Here, we list some advantages and disadvantages to help you choose which approach best suits your requirements. + +## Approach 1: using EESSI-extend on shared FS + +Advantages: +- Easy to get started: no additional setup or knowledge needed +- Automatically optimizes for the host on which you run the installation, and installs in an architecture-specific prefix that matches the host architecture. This means you can install optimized software for each of your CPU/GPU architectures in an organized way. + +Disadvantages: +- This is a manual procedure (unless you create your own automation around it). As such, it doesn't scale well to installing large amounts of software and/or installing software for many different hardware targets. 
+- The fact that you get optimized installations means that on a very heterogeneous system, you will have to run the installation many times - once for each architecture on which you want to offer that particular piece of software. +- Shared filesystems (and especially _parallel_ filesystems) are generally ill-suited to serve software. This means start-up time can be quite long (you can find some numbers [here](../training-events/2025/tutorial-best-practices-cvmfs-hpc/performance.md)). + +## Approach 2: leveraging all of EESSI's tooling for site builds + +Advantages: +- Highly automated +- Scalable to many architectures & installations +- Site builds are done based on a list of software in a GitHub repo - making it very transparent what is available / got added on your system +- Share maintenance on the automation with the EESSI community +- End-user look & feel are very similar to EESSI + +Disadvantages: +- More setup time +- Requires more extensive knowledge (CVMFS, EESSI build bot, object store) +- More hardware resources (CVMFS infrastructure, bot infrastructure) +- More components (software/hardware) to maintain diff --git a/docs/site_build/shared_fs.md b/docs/site_build/shared_fs.md new file mode 100644 index 0000000000..1333ed77b7 --- /dev/null +++ b/docs/site_build/shared_fs.md @@ -0,0 +1 @@ +TODO diff --git a/docs/site_build/site_cvmfs.md b/docs/site_build/site_cvmfs.md new file mode 100644 index 0000000000..1333ed77b7 --- /dev/null +++ b/docs/site_build/site_cvmfs.md @@ -0,0 +1 @@ +TODO From 259e935c81d562d031ba6e67fb708a4448f039ff Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Thu, 4 Jun 2026 17:12:51 +0200 Subject: [PATCH 2/7] Made a start with the site builds as a CVMFS repo --- docs/site_build/site_cvmfs.md | 33 ++++++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/docs/site_build/site_cvmfs.md b/docs/site_build/site_cvmfs.md index 1333ed77b7..9b63b76d9c 100644 --- a/docs/site_build/site_cvmfs.md +++ 
b/docs/site_build/site_cvmfs.md @@ -1 +1,32 @@ -TODO +# Leverage EESSI's build procedure for site builds +In this approach, you use the EESSI build bot (`EESSI/eessi-bot-software-layer`), together with the EESSI build scripts (`EESSI/software-layer-scripts`) to build and deploy software into a CernVM-FS repository of your own. Essentially, this means you'll build in a way that is essentially identical to how it is done for upstream EESSI - with the only major difference being the target CernVM-FS repository. + +## Setup steps +What we need: +- Infrastructure for a site-specific CVMFS repository (Stratum 0, Stratum 1, proxies, client configuration) +- An instance of the EESSI build bot +- A bucket in an AWS S3-compatible object store (though you could work around this) +- A GitHub organization on which you can install GitHub Apps +- A GitHub repository within that organization which will be used to list the software you want to build +- Optionally: an automated procedure to ingest tarballs + +This documentation will go through the steps to set each of these up, in order. Since many of these individual steps are documented elsewhere, we will often reference that (and only list a very short summary here). + +### A site-specific CVMFS infrastructure +The recommended CVMFS setup for a site-specific CVMFS repository is: +- A Stratum 0 server +- Two (or more) Stratum 1 servers +- Two (or more) proxies + +Main reason here is: +- Having two Stratum 1's provides redundancy: if one dies, proxies seamlessly failover to the other one. +- Having two proxies provides both redundancy _and_ load balancing. If one proxy dies, clients failover to the other one. If clients are configured to use the proxies in a [proxy group](https://cvmfs.readthedocs.io/en/2.8/cpt-configure.html#proxy-lists), each client selects a proxy randomly, thus providing load balancing. + +!!! note + + The recommended CVMFS setup requires a fair amount of machines. 
If this is more than you can afford, there are some tricks you can pull. First, you can combine each proxy with a Stratum 1 on the same machine, only use the proxies for proxy-ing upstream EESSI, and simply have your clients contact your site-specific Stratum 1's directly (without proxy). In this scenario, you can achieve load-balancing by configuring half your clients with `CVMFS_SERVER_URL=";"` and half with `CVMFS_SERVER_URL=";"`, where `instance_1` and `instance_2` are the IPs of your Stratum 1's. Finally, you can even use the Stratum 0 instead of a second Stratum 1. Note that this has security implications, as it means your Stratum 0 needs to be directly accessible to your clients. This is a potential concern: if there are vulnerabilities in the Stratum 0 software, end-users may be able to push (malicious) software in there. + +An extensive [tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/) is available that teaches how to setup each of these machines, and how to configure the clients to use the relevant Stratum 1's and proxies. Below, we will summarize some of the key steps, and point out things that are specifically relevant for this setup. + +#### Setting up the Stratum 0 + From 6aebcec4d37628f515e3732c5238c806c480363d Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Mon, 8 Jun 2026 18:06:45 +0200 Subject: [PATCH 3/7] Started to write Stratum 0 install docs. Work in progress --- docs/site_build/site_cvmfs.md | 191 +++++++++++++++++++++++++++++++++- 1 file changed, 190 insertions(+), 1 deletion(-) diff --git a/docs/site_build/site_cvmfs.md b/docs/site_build/site_cvmfs.md index 9b63b76d9c..c478d55edd 100644 --- a/docs/site_build/site_cvmfs.md +++ b/docs/site_build/site_cvmfs.md @@ -28,5 +28,194 @@ Main reason here is: An extensive [tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/) is available that teaches how to setup each of these machines, and how to configure the clients to use the relevant Stratum 1's and proxies. 
Below, we will summarize some of the key steps, and point out things that are specifically relevant for this setup. -#### Setting up the Stratum 0 +#### Setting up the CVMFS Stratum 0 +For extensive instructions, see [the CVMFS tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/02_stratum0_client/#21-setting-up-the-stratum-0) or the [upstream documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html). + +They key steps are: + +1. Define a repository name, typically something like `software..tld` + +```bash +repo_name=name.sitename.tld +``` +Note that while this looks like a URL, it is not: it is simply a name for the CVMFS repository. If you set up any DNS though, it is conventional to use the same domain structure, to avoid confusion. + +2. Install the `cvmfs` and `cvmfs-server` packages. Typically: +```bash +wget https://cvmrepo.s3.cern.ch/cvmrepo/apt/cvmfs-release-latest_all.deb +sudo dpkg -i cvmfs-release-latest_all.deb +rm -f cvmfs-release-latest_all.deb +sudo apt-get -y update +sudo apt-get -y install cvmfs cvmfs-server +``` + +3. To facilitate ingestion later on, we make sure that the `software.eessi.io` repository is available on our Stratum 0 machine as well. Because the `cvmfs-server` cannot perform certain actions when `autofs` is enabled (which is usually how CVMFS repositories are mounted), we have to mount it manually. 
We also mount the `cvmfs-config.cern.ch` repository, as that provides the configuration for `software.eessi.io` + +```bash +sudo mkdir -p /cvmfs/{cvmfs-config.cern.ch,software.eessi.io} +sudo bash -c "echo 'cvmfs-config.cern.ch /cvmfs/cvmfs-config.cern.ch cvmfs defaults 0 0' >> /etc/fstab" +sudo bash -c "echo 'software.eessi.io /cvmfs/software.eessi.io cvmfs defaults 0 0' >> /etc/fstab" +sudo systemctl daemon-reload +sudo mount -a +``` + +You should now be able to see the `cvmfs-config.cern.ch` and `software.eessi.io` repositories: +```bash +ls -al /cvmfs/cvmfs-config.cern.ch +ls -al /cvmfs/software.eessi.io +``` + +4. By default, CVMFS will store data for repositories in `/srv/cvmfs`. If you want to store this elsewhere, create a link `/srv/cvmfs` that points to where you want to store the repository data. +```bash +sudo ln -s /my/desired/data/prefix /srv/cvmfs +``` + +5. Create the repository owned by `root`: +```bash +sudo cvmfs_server mkfs -o root $repo_name +``` + +!!! note + + The reason we configure `root` to be the owner of the CVMFS repository is that EasyBuild, when configured through `EESSI-extend`, by default creates read-only installations. This causes issues if CVMFS has to put catalog files (`.cvmfscatalog`) files in these directories, which are metadata files that CVMFS uses to list the files/directories present in the repository. While it is technically possible to use a regular user, this would require making all directories in which CVMFS would create a `.cvmfscatalog` file writeable in a transaction, then create the catalog files, then remove the write permissions again. The same approach would need to be taken to reinstall software that was already installed. We consider this unnecessarily complex, and instead prefer to have the repository owned by root. + + +Then, we create a `.cvmfsdirtab` file, that will tell CVMFS in at which directory levels to create [catalog files](https://cvmfs.readthedocs.io/en/stable/cpt-details.html#nested-catalogs). 
We advise that you simply use the latest `.cvmfsdirtab` that is used for the upstream EESSI repository as well. You can get it from [the `EESSI/filesystem-layer` repository](https://github.com/EESSI/filesystem-layer/blob/main/roles/create_cvmfs_content_structure/files/.cvmfsdirtab) or simply copy it from `/cvmfs/software.eessi.io/.cvmfsdirtab` on a system where `EESSI` is available. Alternatively, you can configure your CVMFS server to do [automatic catalog creation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#automatic-management-of-nested-catalogs) by setting `CVMFS_AUTOCATALOGS=true` in the server configuration file (`/etc/cvmfs/repositories.d/$repo_name/server.conf`). + +To get the `.cvmfsdirtab` in our repository, we have to open a transaction, move the file into the repository, and publish the transaction. In the same transaction, we can remove the `new_repository` file that is present by default in any newly created repository + +```bash +wget https://raw.githubusercontent.com/EESSI/filesystem-layer/refs/heads/main/roles/create_cvmfs_content_structure/files/.cvmfsdirtab +sudo cvmfs_server transaction $repo_name +mv .cvmfsdirtab /cvmfs/$repo_name/ +rm /cvmfs/$repo_name/new_repository +sudo cvmfs_server publish -m "Add .cvfmsdirtab file and remove new_repository file" +``` + +As we already have a `.cvmfsdirtab` file in place, you should see CVMFS going through the logic of creating catalogs. None will be created at this point, as none of the directory structures listed in the `.cvmfsdirtab` file match existing directories in your repository (since it is still empty). CVMFS will warn you about the patterns that don't have any match (these are harmless). 
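The pathspec warnings described above can be previewed before publishing. The following is a rough sketch rather than anything CVMFS ships: the function name `report_dirtab_matches` is hypothetical, and it assumes that plain shell globbing is a close enough approximation of how CVMFS evaluates the patterns in `.cvmfsdirtab`. Comment lines, empty lines, and exclusion lines (starting with `!`) are skipped.

```bash
# Hypothetical helper: report_dirtab_matches REPO_ROOT DIRTAB_FILE
# For every pattern in DIRTAB_FILE, print "match" or "no-match", a tab,
# and the pattern, depending on whether a directory under REPO_ROOT
# currently matches it. Assumes patterns contain no whitespace.
report_dirtab_matches() {
    local root="$1" dirtab="$2" pattern candidate found
    while IFS= read -r pattern || [ -n "$pattern" ]; do
        # Skip empty lines, comments and exclusion rules
        case "$pattern" in
            ''|'#'*|'!'*) continue ;;
        esac
        found="no-match"
        # $pattern is left unquoted on purpose so the shell expands its wildcards
        for candidate in "$root"/$pattern; do
            [ -d "$candidate" ] && { found="match"; break; }
        done
        printf '%s\t%s\n' "$found" "$pattern"
    done < "$dirtab"
}
```

On a Stratum 0 this might be called as `report_dirtab_matches /cvmfs/$repo_name /cvmfs/$repo_name/.cvmfsdirtab`; on a freshly created repository every pattern would be expected to report `no-match`.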
+ +For convenience, we list all the commands here together: + +```bash +# Define CVMFS repository name +repo_name=name.sitename.tld +echo "Defined CVMFS repository name as $repo_name" + +# Install cvmfs client and cvmfs-server packages +echo "Installing cvmfs and cvmfs-server packages" +wget https://cvmrepo.s3.cern.ch/cvmrepo/apt/cvmfs-release-latest_all.deb +sudo dpkg -i cvmfs-release-latest_all.deb +rm -f cvmfs-release-latest_all.deb +sudo apt-get -y update +sudo apt-get -y install cvmfs cvmfs-server + +# Manually mount cvmfs-config.cern.ch and software.eessi.io repositories +echo "Manually mounting cvmfs-config.cern.ch and software.eessi.io repositories" +sudo mkdir -p /cvmfs/{cvmfs-config.cern.ch,software.eessi.io} +sudo bash -c "echo 'cvmfs-config.cern.ch /cvmfs/cvmfs-config.cern.ch cvmfs defaults 0 0' >> /etc/fstab" +sudo bash -c "echo 'software.eessi.io /cvmfs/software.eessi.io cvmfs defaults 0 0' >> /etc/fstab" +sudo systemctl daemon-reload +sudo mount -a + +# Check that manually mounted repositories are available +echo "Check that we can access manually mounted cvmfs-config.cern.ch" +ls -al /cvmfs/cvmfs-config.cern.ch/ +echo "Check that we can access manually mounted software.eessi.io" +ls -al /cvmfs/software.eessi.io/ + +# Create the cvmfs repository, owned by root +echo "Creating new CVMFS repository $repo_name" +sudo cvmfs_server mkfs -o root $repo_name + +# Create the .cvmfsdirtab file in the root of the repository +echo "Opening transaction, adding .cvmfsdirtab file to the root of $repo_name, and then publish the transaction" +sudo cvmfs_server transaction $repo_name +cp /cvmfs/software.eessi.io/.cvmfsdirtab /cvmfs/$repo_name/ +rm /cvmfs/$repo_name/new_repository +sudo cvmfs_server publish -m "Add .cvfmsdirtab file and remove new_repository file" +``` + +#### Sanity checking your Stratum 0 setup + +On the machine where you've set up your CVMFS stratum 0, you can perform some checks to see if things were set up correctly: + +1. 
Check that the repository was created correctly: + +```bash +cvmfs_server list +``` + +lists all the Stratum servers installed on this machine and should report something like `$repo_name (stratum0 / local)`. + +2. Check that two mount points are now present related to your repository: + +```bash +mount +``` + +Should print something like + +``` +$repo_name on /var/spool/cvmfs/$repo_name/rdonly type fuse (...) +overlay_$repo_name on /cvmfs/$repo_name type overlay (...) +``` + +The first is a read-only mount of the current state of your repository. The second is an overlay filesystem that shows the current state of your repository (as `lowerdir`) with any changes done in a currently open transaction (if any) overlaid on top (as `upperdir`, for which it uses `/var/spool/cvmfs/$repo_name/scratch/current`). I.e. it displays the state of your repository under `/cvmfs/$repo_name` as it will be once you publish any open transactions. + +3. The directory + +```bash +ls /srv/cvmfs/$repo_name +``` + +should now contain some hidden `.cvmfs<...>` files and a `data` directory. The latter is where the data in your repository will actually be stored. + +4. The directory + +```bash +ls -al /cvmfs/$repo_name +``` + +should now show you the `.cvmfsdirtab` file we added in our transaction. 
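The first two checks above can be scripted so that they are easy to repeat after a reinstall. This is only a sketch under stated assumptions: the function names `has_stratum0_repo` and `has_repo_mounts` are hypothetical, and the strings they search for are taken from the example output shown above, which may differ slightly between CVMFS versions. Both functions read the command output from standard input, so they can be tried out without a CVMFS installation.

```bash
# Hypothetical helper: succeeds if the `cvmfs_server list` output on stdin
# lists repository $1 as a Stratum 0.
has_stratum0_repo() {
    grep -Fq "$1 (stratum0"
}

# Hypothetical helper: succeeds if the `mount` output on stdin contains both
# the read-only mount and the overlay mount for repository $1.
has_repo_mounts() {
    local mounts
    mounts=$(cat)
    printf '%s\n' "$mounts" | grep -Fq " on /var/spool/cvmfs/$1/rdonly type fuse" &&
        printf '%s\n' "$mounts" | grep -Fq " on /cvmfs/$1 type overlay"
}
```

On the Stratum 0 these could be used as `cvmfs_server list | has_stratum0_repo "$repo_name"` and `mount | has_repo_mounts "$repo_name"`, checking the exit status of each.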
+ +#### Setting up a CVMFS Stratum 1 + +#### Setting up proxies + +#### Configuring your CVMFS clients + +### Setting up an object store to stage build tarballs + +#### Creating a bucket +- create bucket +- set policies + +#### Create tokens to access bucket +- consider creating separate IAM identities with separate permissions for your build bot and Stratum 0 + +### Setting up the EESSI build bot + +#### Creating a SMEE channel + +#### Registering a GitHub App for the bot + +#### Installing the GitHub App onto a repository +- create new repo to hold easystacks +- install GH app on new repo + +#### Install EESSI build bot on a machine +- app.cfg +- set up an environment for the bot to run in +- run the necessary scripts + +### Set up automatic ingestion on CVMFS Stratum 0 (optional) +- list steps this needs to do + +### Add your first software +- add easystack +- create PR +- bot:show_config first! +- bot:build +- Add deploy label +- See deployment happening in real-time :) From dcf695f01cdbc4a8fffb76effc902375ccc27514 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Tue, 9 Jun 2026 12:00:11 +0200 Subject: [PATCH 4/7] Slight rephrasing of site-build overview section and add new pages to the mkdocs.yml --- docs/site_build/overview.md | 4 ++-- mkdocs.yml | 4 ++++ 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/site_build/overview.md b/docs/site_build/overview.md index 9c2317d327..78611fe2c8 100644 --- a/docs/site_build/overview.md +++ b/docs/site_build/overview.md @@ -11,8 +11,8 @@ While all of these are valid arguments, note that there is also one major downsi # Choosing your approach There are two approaches to doing site builds, each with their own advantages and disadvantages. -1. Performing site builds using EESSI-extend on a shared filesystem. -2. Leveraging all of EESSI's tooling for site builds. 
In this approach, you use the EESSI build bot (`EESSI/eessi-bot-software-layer`), together with the EESSI build scripts (`EESSI/software-layer-scripts`) to build and deploy software into a CernVM-FS repository of your own. Essentially, this means you'll build in a way that is essentially identical to how it is done for upstream EESSI - with the only major difference being the target CernVM-FS repository. +1. Perform site builds using EESSI-extend on a shared filesystem. +2. Leverage EESSI's build procedure for site builds. In this approach, you use the EESSI build bot (`EESSI/eessi-bot-software-layer`), together with the EESSI build scripts (`EESSI/software-layer-scripts`) to build and deploy software into a CernVM-FS repository of your own. Essentially, this means you'll build in a way that is essentially identical to how it is done for upstream EESSI - with the only major difference being the target CernVM-FS repository. In both cases, you build 'on top' of EESSI, meaning that dependencies that are already provided by EESSI will not be reinstalled: they will simply be loaded from EESSI. 
diff --git a/mkdocs.yml b/mkdocs.yml index 81eeb94c7d..d2b14be224 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -50,6 +50,10 @@ nav: - Advanced usage: - Setting up your Stratum: filesystem_layer/stratum1.md - Building software with EESSI: using_eessi/building_on_eessi.md + - Build a site software stack on EESSI: + - Introduction: site_builds/overview.md + - EESSI-extend on shared FS: site_build/shared_fs.md + - EESSI build infra on site CVMFS: site_build/site_cvmfs.md - Test suite: - Overview: test-suite/index.md - Installation & configuration: test-suite/installation-configuration.md From c887a09e6cfea706d70976bd79b69d0f2955671e Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Tue, 9 Jun 2026 12:15:44 +0200 Subject: [PATCH 5/7] Corrected typo --- mkdocs.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mkdocs.yml b/mkdocs.yml index 65817b9d15..c239f7446a 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -51,7 +51,7 @@ nav: - Setting up your Stratum: filesystem_layer/stratum1.md - Building software with EESSI: using_eessi/building_on_eessi.md - Build a site software stack on EESSI: - - Introduction: site_builds/overview.md + - Introduction: site_build/overview.md - EESSI-extend on shared FS: site_build/shared_fs.md - EESSI build infra on site CVMFS: site_build/site_cvmfs.md - Test suite: From 729cf888cccbc9a0035dccb3af9429636fcfa059 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Tue, 9 Jun 2026 15:20:42 +0200 Subject: [PATCH 6/7] More or less completed the docs on how to setup a Stratum 0 --- docs/site_build/site_cvmfs.md | 121 +++++++++++++++++++++------------- 1 file changed, 77 insertions(+), 44 deletions(-) diff --git a/docs/site_build/site_cvmfs.md b/docs/site_build/site_cvmfs.md index c478d55edd..1521e8ba78 100644 --- a/docs/site_build/site_cvmfs.md +++ b/docs/site_build/site_cvmfs.md @@ -3,6 +3,7 @@ In this approach, you use the EESSI build bot (`EESSI/eessi-bot-software-layer`) ## Setup steps What we need: + - Infrastructure for 
a site-specific CVMFS repository (Stratum 0, Stratum 1, proxies, client configuration) - An instance of the EESSI build bot - A bucket in an AWS S3-compatible object store (though you could work around this) @@ -12,13 +13,15 @@ What we need: This documentation will go through the steps to set each of these up, in order. Since many of these individual steps are documented elsewhere, we will often reference that (and only list a very short summary here). -### A site-specific CVMFS infrastructure +## Site-specific CVMFS infrastructure The recommended CVMFS setup for a site-specific CVMFS repository is: + - A Stratum 0 server - Two (or more) Stratum 1 servers - Two (or more) proxies Main reason here is: + - Having two Stratum 1's provides redundancy: if one dies, proxies seamlessly failover to the other one. - Having two proxies provides both redundancy _and_ load balancing. If one proxy dies, clients failover to the other one. If clients are configured to use the proxies in a [proxy group](https://cvmfs.readthedocs.io/en/2.8/cpt-configure.html#proxy-lists), each client selects a proxy randomly, thus providing load balancing. @@ -28,21 +31,23 @@ Main reason here is: An extensive [tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/) is available that teaches how to setup each of these machines, and how to configure the clients to use the relevant Stratum 1's and proxies. Below, we will summarize some of the key steps, and point out things that are specifically relevant for this setup. -#### Setting up the CVMFS Stratum 0 +### Setting up your Stratum 0 -For extensive instructions, see [the CVMFS tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/02_stratum0_client/#21-setting-up-the-stratum-0) or the [upstream documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html). 
+The documentation below provides you with the minimal steps required to set up a working Stratum 0 and is specifically aimed at setting up a Stratum 0 for hosting a site software stack on top of EESSI (which is why e.g. it makes the `software.eessi.io` repository available on this Stratum 0 as well). However, there is a vast amount of things you can configure for a CVMFS Stratum 0, and nothing beats the detail of the extensive [upstream documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html). The [CVMFS tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/02_stratum0_client/#21-setting-up-the-stratum-0) may also be helpful. -They key steps are: +**1. Choose a repository name** -1. Define a repository name, typically something like `software..tld` +Define a repository name, typically something like `software..tld` -```bash +``` { .bash .copy } repo_name=name.sitename.tld ``` Note that while this looks like a URL, it is not: it is simply a name for the CVMFS repository. If you set up any DNS though, it is conventional to use the same domain structure, to avoid confusion. -2. Install the `cvmfs` and `cvmfs-server` packages. Typically: -```bash +**2. Install the `cvmfs` and `cvmfs-server` packages** + +Typically: +``` { .bash .copy } wget https://cvmrepo.s3.cern.ch/cvmrepo/apt/cvmfs-release-latest_all.deb sudo dpkg -i cvmfs-release-latest_all.deb rm -f cvmfs-release-latest_all.deb @@ -50,10 +55,14 @@ sudo apt-get -y update sudo apt-get -y install cvmfs cvmfs-server ``` -3. To facilitate ingestion later on, we make sure that the `software.eessi.io` repository is available on our Stratum 0 machine as well. Because the `cvmfs-server` cannot perform certain actions when `autofs` is enabled (which is usually how CVMFS repositories are mounted), we have to mount it manually. We also mount the `cvmfs-config.cern.ch` repository, as that provides the configuration for `software.eessi.io` +**3. 
Make `software.eessi.io` available on your Stratum 0** -```bash +To facilitate ingestion later on, we make sure that the `software.eessi.io` repository is available on our Stratum 0 machine as well. This allows us to leverage e.g. the Lmod installation from there to build the Lmod cache. Because the `cvmfs-server` cannot perform certain actions when `autofs` is enabled (which is usually how CVMFS repositories are mounted), we have to mount it manually. We also mount the `cvmfs-config.cern.ch` repository, as that provides the configuration for `software.eessi.io` + +``` { .bash .copy } sudo mkdir -p /cvmfs/{cvmfs-config.cern.ch,software.eessi.io} +sudo bash -c "echo 'CVMFS_CLIENT_PROFILE="single"' > /etc/cvmfs/default.local" +sudo bash -c "echo 'CVMFS_QUOTA_LIMIT=10000' >> /etc/cvmfs/default.local" sudo bash -c "echo 'cvmfs-config.cern.ch /cvmfs/cvmfs-config.cern.ch cvmfs defaults 0 0' >> /etc/fstab" sudo bash -c "echo 'software.eessi.io /cvmfs/software.eessi.io cvmfs defaults 0 0' >> /etc/fstab" sudo systemctl daemon-reload @@ -61,43 +70,60 @@ sudo mount -a ``` You should now be able to see the `cvmfs-config.cern.ch` and `software.eessi.io` repositories: -```bash +``` { .bash .copy } ls -al /cvmfs/cvmfs-config.cern.ch ls -al /cvmfs/software.eessi.io ``` -4. By default, CVMFS will store data for repositories in `/srv/cvmfs`. If you want to store this elsewhere, create a link `/srv/cvmfs` that points to where you want to store the repository data. -```bash +**4. (Optional) Change location to store Stratum 0 data** + +By default, CVMFS will store data for repositories in `/srv/cvmfs`. If you want to store this elsewhere, create a link `/srv/cvmfs` that points to where you want to store the repository data. + +``` { .bash .copy } sudo ln -s /my/desired/data/prefix /srv/cvmfs ``` -5. Create the repository owned by `root`: -```bash +**5. 
Create a new CVMFS repository** + +To create a new CVMFS repository on the Stratum 0, run + +``` { .bash .copy } sudo cvmfs_server mkfs -o root $repo_name ``` +The `-o root` tells CVMFS that this repository should be owned by root. + !!! note The reason we configure `root` to be the owner of the CVMFS repository is that EasyBuild, when configured through `EESSI-extend`, by default creates read-only installations. This causes issues if CVMFS has to put catalog files (`.cvmfscatalog`) files in these directories, which are metadata files that CVMFS uses to list the files/directories present in the repository. While it is technically possible to use a regular user, this would require making all directories in which CVMFS would create a `.cvmfscatalog` file writeable in a transaction, then create the catalog files, then remove the write permissions again. The same approach would need to be taken to reinstall software that was already installed. We consider this unnecessarily complex, and instead prefer to have the repository owned by root. +**6. Configure CVMFS catalog creation** + +Here, we have two options. -Then, we create a `.cvmfsdirtab` file, that will tell CVMFS in at which directory levels to create [catalog files](https://cvmfs.readthedocs.io/en/stable/cpt-details.html#nested-catalogs). We advise that you simply use the latest `.cvmfsdirtab` that is used for the upstream EESSI repository as well. You can get it from [the `EESSI/filesystem-layer` repository](https://github.com/EESSI/filesystem-layer/blob/main/roles/create_cvmfs_content_structure/files/.cvmfsdirtab) or simply copy it from `/cvmfs/software.eessi.io/.cvmfsdirtab` on a system where `EESSI` is available. 
Alternatively, you can configure your CVMFS server to do [automatic catalog creation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#automatic-management-of-nested-catalogs) by setting `CVMFS_AUTOCATALOGS=true` in the server configuration file (`/etc/cvmfs/repositories.d/$repo_name/server.conf`). +**Option 1:** we create a `.cvmfsdirtab` file in the root of the repository. This will tell CVMFS at which directory levels to create [catalog files](https://cvmfs.readthedocs.io/en/stable/cpt-details.html#nested-catalogs). We advise that you simply use the latest `.cvmfsdirtab` that is used for the upstream EESSI repository as well. You can get it from [the `EESSI/filesystem-layer` repository](https://github.com/EESSI/filesystem-layer/blob/main/roles/create_cvmfs_content_structure/files/.cvmfsdirtab) or simply copy it from `/cvmfs/software.eessi.io/.cvmfsdirtab` on a system where `EESSI` is available. The upside of this approach is that it creates catalog files at the root of each EasyBuild installation prefix. This causes files that are typically accessed together (namely: that belong to the same software installation) to be indexed within the same catalog, which is typically good for performance. The downside is that if installations are extremely big, the catalog may exceed the largest size that CVMFS recommends (up to 200k files/dirs per catalog). -To get the `.cvmfsdirtab` in our repository, we have to open a transaction, move the file into the repository, and publish the transaction. In the same transaction, we can remove the `new_repository` file that is present by default in any newly created repository + +**Option 2:** you can configure your CVMFS server to do [automatic catalog creation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#automatic-management-of-nested-catalogs) by setting `CVMFS_AUTOCATALOGS=true` in the server configuration file (`/etc/cvmfs/repositories.d/$repo_name/server.conf`). 
The upside is that this option will ensure that the number of files per catalog stays within the recommended limits. The downside is that CVMFS does not know which files are commonly accessed together (e.g. because they belong to the same software installation) and might spread them over multiple catalogs - even when that's not strictly needed in terms of catalog size. -```bash -wget https://raw.githubusercontent.com/EESSI/filesystem-layer/refs/heads/main/roles/create_cvmfs_content_structure/files/.cvmfsdirtab +Here, we follow **Option 1**. + +To get the `.cvmfsdirtab` in your repository, you have to open a transaction, add the file to the repository, and publish the transaction. In the same transaction, you can immediately remove the `new_repository` file that is present by default in any newly created repository: + +``` { .bash .copy } sudo cvmfs_server transaction $repo_name -mv .cvmfsdirtab /cvmfs/$repo_name/ -rm /cvmfs/$repo_name/new_repository +# Essentially copy the .cvmfsdirtab from EESSI, but strip every pattern related to the compatibility layer +sudo bash -c "cat /cvmfs/software.eessi.io/.cvmfsdirtab | grep -v '^/versions/\*/compat' > /cvmfs/$repo_name/.cvmfsdirtab" +sudo rm /cvmfs/$repo_name/new_repository sudo cvmfs_server publish -m "Add .cvmfsdirtab file and remove new_repository file" ``` -As we already have a `.cvmfsdirtab` file in place, you should see CVMFS going through the logic of creating catalogs. None will be created at this point, as none of the directory structures listed in the `.cvmfsdirtab` file match existing directories in your repository (since it is still empty). CVMFS will warn you about the patterns that don't have any match (these are harmless). +As you now have a `.cvmfsdirtab` file in place, you should see CVMFS going through the logic of creating catalogs as soon as you run the `cvmfs_server publish` command. 
No catalogs will be created at this point, as none of the directory structures listed in the `.cvmfsdirtab` file match existing directories in your repository (since it is still empty). CVMFS will warn you about the patterns that don't have any match ('WARNING: cannot apply pathspec') - these warnings are harmless and only serve as an indication that not all pathspecs in your `.cvmfsdirtab` file actually exist (yet) in your repository. + +**Scripted summary of steps** -For convenience, we list all the commands here together: +For convenience, we list all the commands from the prior steps together: -```bash +``` { .bash .copy } # Define CVMFS repository name repo_name=name.sitename.tld echo "Defined CVMFS repository name as $repo_name" @@ -113,9 +139,15 @@ sudo apt-get -y install cvmfs cvmfs-server # Manually mount cvmfs-config.cern.ch and software.eessi.io repositories echo "Manually mounting cvmfs-config.cern.ch and software.eessi.io repositories" sudo mkdir -p /cvmfs/{cvmfs-config.cern.ch,software.eessi.io} +# Creating minimal client config (inner double quotes are escaped so they end up in the file) +sudo bash -c "echo 'CVMFS_CLIENT_PROFILE=\"single\"' > /etc/cvmfs/default.local" +sudo bash -c "echo 'CVMFS_QUOTA_LIMIT=10000' >> /etc/cvmfs/default.local" +# Adding the cvmfs mounts to fstab sudo bash -c "echo 'cvmfs-config.cern.ch /cvmfs/cvmfs-config.cern.ch cvmfs defaults 0 0' >> /etc/fstab" sudo bash -c "echo 'software.eessi.io /cvmfs/software.eessi.io cvmfs defaults 0 0' >> /etc/fstab" +# Rerun the fstab generator to create the .mount files sudo systemctl daemon-reload +# Actually trigger mounting the cvmfs filesystems sudo mount -a # Check that manually mounted repositories are available @@ -131,18 +163,19 @@ sudo cvmfs_server mkfs -o root $repo_name # Create the .cvmfsdirtab file in the root of the repository echo "Opening transaction, adding .cvmfsdirtab file to the root of $repo_name, and then publish the transaction" sudo cvmfs_server transaction $repo_name -cp /cvmfs/software.eessi.io/.cvmfsdirtab 
/cvmfs/$repo_name/ -rm /cvmfs/$repo_name/new_repository +# Essentially copy the .cvmfsdirtab from EESSI, but strip every pattern related to the compatibility layer +sudo bash -c "cat /cvmfs/software.eessi.io/.cvmfsdirtab | grep -v '^/versions/\*/compat' > /cvmfs/$repo_name/.cvmfsdirtab" +sudo rm /cvmfs/$repo_name/new_repository sudo cvmfs_server publish -m "Add .cvmfsdirtab file and remove new_repository file" ``` -#### Sanity checking your Stratum 0 setup +### Sanity checking your Stratum 0 setup On the machine where you've set up your CVMFS Stratum 0, you can perform some checks to see if things were set up correctly: 1. Check that the repository was created correctly: -```bash +``` { .bash .copy } cvmfs_server list ``` @@ -150,7 +183,7 @@ lists all the Stratum servers installed on this machine and should report someth 2. Check that two mount points are now present related to your repository: -```bash +``` { .bash .copy } mount ``` @@ -165,7 +198,7 @@ The first is a read-only mount of the current state of your repository. The seco 3. The directory -```bash +``` { .bash .copy } ls /srv/cvmfs/$repo_name ``` @@ -173,46 +206,46 @@ should now contain some hidden `.cvmfs<...>` files and a `data` directory. The l 4. The directory -```bash +``` { .bash .copy } ls -al /cvmfs/$repo_name ``` should now show you the `.cvmfsdirtab` file we added in our transaction. 
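The filter used when writing the `.cvmfsdirtab` can be tried out on sample data before touching the repository. The sketch below is only an illustration: `filter_dirtab` is a hypothetical helper name, and the sample patterns are made up for the example rather than taken from the actual upstream file. Only the `grep` expression comes from the commands above.

```bash
# Strip compatibility-layer patterns from a .cvmfsdirtab read on stdin.
# Uses the same grep expression as the transaction above; the function
# name is hypothetical.
filter_dirtab() {
    grep -v '^/versions/\*/compat'
}

# Illustrative sample input (NOT the actual upstream .cvmfsdirtab)
sample_dirtab='/versions/*
/versions/*/compat
/versions/*/compat/*/*
/versions/*/software/*/*/*'

# Prints only the two patterns that do not start with /versions/*/compat
printf '%s\n' "$sample_dirtab" | filter_dirtab
```

Comparing the filtered output with `/cvmfs/$repo_name/.cvmfsdirtab` after publishing is one way to confirm that no compatibility-layer patterns ended up in your repository.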
-#### Setting up a CVMFS Stratum 1 +### Setting up a CVMFS Stratum 1 -#### Setting up proxies +### Setting up proxies -#### Configuring your CVMFS clients +### Configuring your CVMFS clients -### Setting up an object store to stage build tarballs +## Setting up an object store to stage build tarballs -#### Creating a bucket +### Creating a bucket - create bucket - set policies -#### Create tokens to access bucket +### Create tokens to access bucket - consider creating separate IAM identities with separate permissions for your build bot and Stratum 0 -### Setting up the EESSI build bot +## Setting up the EESSI build bot -#### Creating a SMEE channel +### Creating a SMEE channel -#### Registering a GitHub App for the bot +### Registering a GitHub App for the bot -#### Installing the GitHub App onto a repository +### Installing the GitHub App onto a repository - create new repo to hold easystacks - install GH app on new repo -#### Install EESSI build bot on a machine +### Install EESSI build bot on a machine - app.cfg - set up an environment for the bot to run in - run the necessary scripts -### Set up automatic ingestion on CVMFS Stratum 0 (optional) +## Set up automatic ingestion on CVMFS Stratum 0 (optional) - list steps this needs to do -### Add your first software +## Add your first software - add easystack - create PR - bot:show_config first! 
From c31746d28b709d267e61f00acbde8e0e884366f8 Mon Sep 17 00:00:00 2001 From: Caspar van Leeuwen Date: Tue, 9 Jun 2026 20:43:09 +0200 Subject: [PATCH 7/7] Finished description for Stratum 0 and Stratum 1 --- docs/site_build/overview.md | 7 +- docs/site_build/site_cvmfs.md | 184 ++++++++++++++++++++++++++++++++-- 2 files changed, 181 insertions(+), 10 deletions(-) diff --git a/docs/site_build/overview.md b/docs/site_build/overview.md index 78611fe2c8..413ddb55e6 100644 --- a/docs/site_build/overview.md +++ b/docs/site_build/overview.md @@ -2,9 +2,10 @@ This documentation is aimed at HPC sites or other facilities that make EESSI available on their system, but would like to offer additional installations that are performed 'on top' of EESSI (i.e. using dependencies provided by EESSI). There are several reasons why, as a site, you may want to offer additional software on top of EESSI. For example: + 1. You want to offer software that is not suitable for upstream deployment in EESSI (e.g. because it is proprietary, or because it is a development build / otherwise very specific build that is not useful for a general audience). 2. You need to make software available on (very) short notice to your users, and cannot wait for it to be deployed in upstream EESSI. -3. You want to retain full autonomy over what gets deployed +3. You want to retain full autonomy over what gets deployed. While all of these are valid arguments, note that there is also one major downside to deploying things locally: you lose one of the core benefits of EESSI, namely that it provides _the same software on every system_. The more site-specific installations you have, the more difficult it will be for your users to move their workflows from e.g. their own development machine/cloud environment to your cluster, or scale up to larger clusters. 
If you're doing site-builds to make software available to your users on short notice, we highly encourage you to _also_ contribute the same software installation to upstream EESSI. This way, once accepted upstream, users that rely on that software retain their 'mobility'. @@ -21,10 +22,12 @@ Here, we list some advantages and disadvantages to help you choose which approac ## Approach 1: using EESSI-extend on shared FS Advantages: + - Easy to get started: no additional setup or knowledge needed - Automatically optimizes for the host on which you run the installation, and installs in an architecture-specific prefix that matches the host architecture. This means you can install optimized software for each of your CPU/GPU architectures in an organized way. Disadvantages: + - This is a manual procedure (unless you create your own automation around it). As such, it doesn't scale well to installing large amounts of software and/or installing software for many different hardware targets. - The fact that you get optimized installations means that on a very heterogeneous system, you will have to run the installation many times - once for each architecture on which you want to offer that particular piece of software. - Shared filesystems (and especially _parallel_ filesystems) are generally ill-suited to serve software. This means start-up time can be quite long (you can find some numbers [here](../training-events/2025/tutorial-best-practices-cvmfs-hpc/performance.md)). 
@@ -32,6 +35,7 @@ Disadvantages: ## Approach 2: leveraging all of EESSI's tooling for site builds Advantages: + - Highly automated - Scalable to many architectures & installations - Site builds are done based on a list of software in a GitHub repo - making it very transparent what is available / got added on your system @@ -39,6 +43,7 @@ Advantages: - End-user look & feel are very similar to EESSI Disadvantages: + - More setup time - Requires more extensive knowledge (CVMFS, EESSI build bot, object store) - More hardware resources (CVMFS infrastructure, bot infrastructure) diff --git a/docs/site_build/site_cvmfs.md b/docs/site_build/site_cvmfs.md index 1521e8ba78..8fa0bfa159 100644 --- a/docs/site_build/site_cvmfs.md +++ b/docs/site_build/site_cvmfs.md @@ -57,7 +57,7 @@ sudo apt-get -y install cvmfs cvmfs-server **3. Make `software.eessi.io` available on your Stratum 0** -To facilitate ingestion later on, we make sure that the `software.eessi.io` repository is available on our Stratum 0 machine as well. This allows us to leverage e.g. the Lmod installation from there to build the Lmod cache. Because the `cvmfs-server` cannot perform certain actions when `autofs` is enabled (which is usually how CVMFS repositories are mounted), we have to mount it manually. We also mount the `cvmfs-config.cern.ch` repository, as that provides the configuration for `software.eessi.io` +To facilitate ingestion later on, we make sure that the `software.eessi.io` repository is available on our Stratum 0 machine as well. This allows us to leverage e.g. the Lmod installation from there to build the Lmod cache. Because the `cvmfs-server` cannot perform certain actions when `autofs` is enabled (which is usually how CVMFS repositories are mounted), we have to mount `software.eessi.io` manually. 
We also mount the `cvmfs-config.cern.ch` repository, as that provides the configuration for `software.eessi.io` ``` { .bash .copy } sudo mkdir -p /cvmfs/{cvmfs-config.cern.ch,software.eessi.io} @@ -101,9 +101,9 @@ The `-o root` tells CVMFS that this repository should be owned by root. Here, we have two options. -**Option 1:** we create a `.cvmfsdirtab` file in the root of the repository. This will tell CVMFS at which directory levels to create [catalog files](https://cvmfs.readthedocs.io/en/stable/cpt-details.html#nested-catalogs). We advise that you simply use the latest `.cvmfsdirtab` that is used for the upstream EESSI repository as well. You can get it from [the `EESSI/filesystem-layer` repository](https://github.com/EESSI/filesystem-layer/blob/main/roles/create_cvmfs_content_structure/files/.cvmfsdirtab) or simply copy it from `/cvmfs/software.eessi.io/.cvmfsdirtab` on a system where `EESSI` is available. The upside of this approach is that it creates catalogue files at the root of each EasyBuild installation prefix. This causes files that are typically accessed together (namely: that belong to the same software installation) to be indexed within the same catalog, which is typically good for performance. The downside is that if installations are extremely big, the catalog may exceed the largest size that CVMFS recommends (upto 200k files/dirs per catalog). +**Option 1:** we create a `.cvmfsdirtab` file in the root of the repository. This will tell CVMFS at which directory levels to create [catalog files](https://cvmfs.readthedocs.io/en/stable/cpt-details.html#nested-catalogs). We advise that you simply use the latest `.cvmfsdirtab` that is used for the upstream EESSI repository as well. You can get it from [the EESSI/filesystem-layer repository](https://github.com/EESSI/filesystem-layer/blob/main/roles/create_cvmfs_content_structure/files/.cvmfsdirtab) or simply copy it from `/cvmfs/software.eessi.io/.cvmfsdirtab` on a system where `EESSI` is available. 
The upside of Option 1 is that it creates catalog files at the root of each EasyBuild installation prefix. This causes files that are typically accessed together (namely, those that belong to the same software installation) to be indexed within the same catalog, which is typically good for performance. The downside is that if installations are extremely big, the catalog may exceed the largest size that CVMFS recommends (up to 200k files/dirs per catalog). -**Option 2:** you can configure your CVMFS server to do [automatic catalog creation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#automatic-management-of-nested-catalogs) by setting `CVMFS_AUTOCATALOGS=true` in the server configuration file (`/etc/cvmfs/repositories.d/$repo_name/server.conf`). The upside is that this option will ensure that the number of files per catalog stays within the recommended limits. The downside is that CVMFS does not know which files are commonly accessed together (e.g. because they belong to the same software installation) and might spread them over multiple catalogues - even when that's not strictly needed in terms of catalog size. +**Option 2:** you can configure your CVMFS server to do [automatic catalog creation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#automatic-management-of-nested-catalogs) by setting `CVMFS_AUTOCATALOGS=true` in the server configuration file (`/etc/cvmfs/repositories.d/$repo_name/server.conf`). The upside of Option 2 is that it will ensure that the number of files per catalog stays within the recommended limits. The downside is that CVMFS does not know which files are commonly accessed together (e.g. because they belong to the same software installation) and might spread them over multiple catalogs - even when that's not strictly needed in terms of catalog size. Here, we follow **Option 1**. 
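To get a rough feeling for whether Option 1 would keep a given installation within the recommended catalog size, you could count the files and directories below its prefix. This is a minimal sketch, assuming the 200k figure mentioned above as the limit. `count_entries` and `check_catalog_size` are hypothetical helper names, not part of CVMFS, and the count ignores any nested catalogs that CVMFS might create further down the tree.

```bash
# Count all files and directories below a directory (hypothetical helper).
# Note: paths containing newlines would be counted more than once.
count_entries() {
    find "$1" -mindepth 1 | wc -l | tr -d ' '
}

# Compare the entry count of a directory against a limit.
# The default of 200000 is an assumption based on the recommendation above.
check_catalog_size() {
    local dir="$1" limit="${2:-200000}" n
    n="$(count_entries "$dir")"
    if [ "$n" -gt "$limit" ]; then
        echo "WARNING: $dir has $n entries (limit $limit)"
        return 1
    fi
    echo "OK: $dir has $n entries (limit $limit)"
}
```

Calling `check_catalog_size` with the path of a single software installation prints a warning and returns a non-zero exit code when that directory holds more entries than the limit, which may be a hint that Option 2 deserves a closer look for your repository.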
@@ -119,6 +119,25 @@ sudo cvmfs_server publish -m "Add .cvfmsdirtab fi As you now have a `.cvmfsdirtab` file in place, you should see CVMFS going through the logic of creating catalogs as soon as you run the `cvmfs_server publish` command. No catalogs will be created at this point, as none of the directory structures listed in the `.cvmfsdirtab` file match existing directories in your repository (since it is still empty). CVMFS will warn you about the patterns that don't have any match ('WARNING: cannot apply pathspec') - these warnings are harmless and only serve as an indication that not all pathspecs in your `.cvmfsdirtab` file actually exist (yet) in your repository. +**7. Set up automatic whitelist resigning** +Each CVMFS repository has a whitelist (`.cvmfswhitelist`) with fingerprints of certificates that are allowed to sign a repository manifest (`.cvmfspublished`) (see [signature details](https://cvmfs.readthedocs.io/en/stable/apx-security.html#signature-details)). This whitelist has to be resigned with the repository master key every 30 days (or every 7 days if using a smartcard, like a YubiKey, to store the master key) (see [master keys](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#master-keys)). You can check the current validity of the signature using + +``` { .bash .copy } +sudo cvmfs_server info $repo_name +``` + +which will print something like: + +``` +Whitelist is valid for another X days +``` + +We recommend that you set up automatic resigning in a daily cronjob, e.g. 
+ +``` { .bash .copy } +sudo bash -c "echo '0 11 * * * root /usr/bin/cvmfs_server resign $repo_name' > /etc/cron.d/cvmfs_resign" +``` + **Scripted summary of steps** For convenience, we list all the commands from the prior steps together: @@ -167,13 +186,17 @@ sudo cvmfs_server transaction $repo_name sudo bash -c "cat /cvmfs/software.eessi.io/.cvmfsdirtab | grep -v '^/versions/\*/compat' > /cvmfs/$repo_name/.cvmfsdirtab" sudo rm /cvmfs/$repo_name/new_repository sudo cvmfs_server publish -m "Add .cvmfsdirtab file and remove new_repository file" + +# Set up a daily cronjob to resign the .cvmfswhitelist +echo "Setting up a cronjob for daily whitelist resigning" +sudo bash -c "echo '0 11 * * * root /usr/bin/cvmfs_server resign $repo_name' > /etc/cron.d/cvmfs_resign" ``` ### Sanity checking your Stratum 0 setup On the machine where you've set up your CVMFS Stratum 0, you can perform some checks to see if things were set up correctly: -1. Check that the repository was created correctly: +**1. Check that the repository was created correctly** ``` { .bash .copy } cvmfs_server list @@ -181,10 +204,10 @@ cvmfs_server list lists all the Stratum servers installed on this machine and should report something like `$repo_name (stratum0 / local)`. -2. Check that two mount points are now present related to your repository: +**2. Check mount points for your repository** ``` { .bash .copy } -mount +mount | grep "$repo_name" ``` should print something like @@ -196,15 +219,17 @@ overlay_$repo_name on /cvmfs/$repo_name type overlay (...) The first is a read-only mount of the current state of your repository. The second is an overlay filesystem that shows the current state of your repository (as `lowerdir`) with any changes done in a currently open transaction (if any) overlaid on top (as `upperdir`, for which it uses `/var/spool/cvmfs/$repo_name/scratch/current`). I.e. 
it displays the state of your repository under `/cvmfs/$repo_name` as it will be once you publish any open transactions. -3. The directory +**3. Check the repository storage backend** + +The directory ``` { .bash .copy } -ls /srv/cvmfs/$repo_name +ls -al /srv/cvmfs/$repo_name ``` should now contain some hidden `.cvmfs<...>` files and a `data` directory. The latter is where the data in your repository will actually be stored. -4. The directory +**4. Check the repository contents** ``` { .bash .copy } ls -al /cvmfs/$repo_name @@ -212,8 +237,149 @@ ls -al /cvmfs/$repo_name should now show you the `.cvmfsdirtab` file we added in our transaction. +**5. Check the repository info** + +``` { .bash .copy } +sudo cvmfs_server info $repo_name +``` + ### Setting up a CVMFS Stratum 1 +Again, the documentation below provides you with the minimal steps to set up a working Stratum 1 specifically aimed at hosting a site software stack on top of EESSI. There are a lot of things you can configure here, which are described in detail in the [upstream documentation](https://cvmfs.readthedocs.io/en/stable/cpt-replica.html). Also, the [CVMFS tutorial](https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/03_stratum1_proxies/) may be helpful. + +**1. Set up your environment** + +For convenience, let's start by redefining the repository name in an environment variable on our Stratum 1 machine, as well as our Stratum 0's IP (or DNS name): + +``` { .bash .copy } +site_tld=sitename.tld +repo_name="name.${site_tld}" +stratum0_ip= +``` + +**2. Install the `cvmfs-server` and `mod-wsgi` packages** + +Note that although we will not use the `mod-wsgi` functionality (which is required for Geo-API lookups), we still need to install it. 
+ +Typically: + +``` { .bash .copy } +wget https://cvmrepo.s3.cern.ch/cvmrepo/apt/cvmfs-release-latest_all.deb +sudo dpkg -i cvmfs-release-latest_all.deb +rm -f cvmfs-release-latest_all.deb +sudo apt-get -y update +sudo apt-get -y install cvmfs-server +sudo apt-get -y install libapache2-mod-wsgi-py3 +``` + +Note that the client package (`cvmfs`) is not needed on Stratum 1's. + +**3. Add repository master public key** + +On your CVMFS **Stratum 0**, print the contents of your public master key: + +``` { .bash .copy } +cat "/etc/cvmfs/keys/${repo_name}.pub" +``` + +and copy that to `/etc/cvmfs/keys/${site_tld}/${repo_name}.pub` on your CVMFS **Stratum 1** (note that this is one level deeper than it was on the CVMFS Stratum 0). + +**4. Disable use of the Geo-API** + +The Geo-API is an API that clients normally use to figure out which Stratum 1 is closest to them. This is useful for CVMFS repositories that have Stratum 1's all over the world, but for a site repository, where all Stratum 1's are typically very close to the clients that use them anyway, it adds complexity we don't need, so we disable it. Note that if you want, you can keep it enabled and set it up [as documented upstream](https://cvmfs.readthedocs.io/en/stable/cpt-replica.html#geo-api-setup). + +``` { .bash .copy } +sudo bash -c "echo 'CVMFS_GEO_DB_FILE=NONE' > /etc/cvmfs/server.local" +``` + +**5. Create a replica** + +Now, we create a replica of the Stratum 0, owned by the current user `$USER` (no need for it to be owned by `root` here, as we will never want to overwrite anything here): + +``` { .bash .copy } +sudo cvmfs_server add-replica -o $USER http://${stratum0_ip}/cvmfs/${repo_name} /etc/cvmfs/keys/${site_tld}/ +``` + +Note that this command creates two configuration files for the replication: + +``` +/etc/cvmfs/repositories.d/$repo_name/server.conf +/etc/cvmfs/repositories.d/$repo_name/replica.conf +``` + +**6. 
Initiate first synchronization** + +We trigger the first synchronization manually: + +``` { .bash .copy } +sudo cvmfs_server snapshot ${repo_name} +``` + +**7. Set up a cronjob for synchronization** + +We create a cronjob that synchronizes your Stratum 1 with the Stratum 0 every 5 minutes. Note that if a previous `cvmfs_server snapshot` command is still running, the new invocation is simply skipped, so a short interval should not cause trouble. You can pick a different synchronization frequency if you like - just realize that this affects the delay with which new software will be visible on your clients. + +``` { .bash .copy } +sudo bash -c "echo '*/5 * * * * root output=\$(/usr/bin/cvmfs_server snapshot -a -i 2>&1) || echo \"\$output\"' > /etc/cron.d/cvmfs_stratum1_snapshot" +``` + +**8. Confirm the synchronization is working** + +While it is not easily possible to check which files are hosted on a Stratum 1, you can check the synchronization log at `/var/log/cvmfs/snapshots.log` to see if the synchronization process finishes correctly. The log also states the revision the Stratum 1 is serving ('Serving revision X'). You can cross-check that this is the latest revision by running on the **Stratum 0**: + +``` { .bash .copy } +sudo cvmfs_server tag "$repo_name" +``` + +**Scripted summary of steps** + +For convenience, we list all the commands from the prior steps together. Note that you'll manually have to copy in the CVMFS Stratum 0's public key. 
+ +``` { .bash .copy } +# Define environment variables +site_tld=sitename.tld +repo_name="name.${site_tld}" +stratum0_ip= +echo "Setting up Stratum 1 for CVMFS repository: ${repo_name}, which is hosted on ${stratum0_ip}" + +# Install cvmfs-server and mod-wsgi +echo "Installing cvmfs-server and mod-wsgi" +wget https://cvmrepo.s3.cern.ch/cvmrepo/apt/cvmfs-release-latest_all.deb +sudo dpkg -i cvmfs-release-latest_all.deb +rm -f cvmfs-release-latest_all.deb +sudo apt-get -y update +sudo apt-get -y install cvmfs-server +sudo apt-get -y install libapache2-mod-wsgi-py3 + +# Add repository master public key +echo "You'll need to add the CVMFS Stratum 0 master public key before this step" +echo "Checking that it exists by printing the content of the public key file..." +cat "/etc/cvmfs/keys/${site_tld}/${repo_name}.pub" + +# Disable Geo-API +echo "Disabling Geo-API" +sudo bash -c "echo 'CVMFS_GEO_DB_FILE=NONE' > /etc/cvmfs/server.local" + +# Create replica +echo "Creating replica from Stratum 0 at 'http://${stratum0_ip}/cvmfs/${repo_name}', using public key from directory '/etc/cvmfs/keys/${site_tld}/'. Replica will be owned by $USER." +sudo cvmfs_server add-replica -o $USER http://${stratum0_ip}/cvmfs/${repo_name} /etc/cvmfs/keys/${site_tld}/ + +# Creating first snapshot +echo "Creating first snapshot for $repo_name" +sudo cvmfs_server snapshot ${repo_name} + +# Setting up synchronization cronjob +echo "Setting up cronjob for synchronization" +sudo bash -c "echo '*/5 * * * * root output=\$(/usr/bin/cvmfs_server snapshot -a -i 2>&1) || echo \"\$output\"' > /etc/cron.d/cvmfs_stratum1_snapshot" +echo "Content of cronjob:" +cat /etc/cron.d/cvmfs_stratum1_snapshot + +# Check that we are serving the latest revision +echo "Checking that we are serving the latest revision by checking the snapshots.log:" +tail /var/log/cvmfs/snapshots.log +``` + + ### Setting up proxies ### Configuring your CVMFS clients