Document approaches for site builds on top of EESSI #778
Draft
casparvl wants to merge 8 commits into
ocaisa
requested changes
Jun 9, 2026
| 2. You need to make software available on (very) short notice to your users, and cannot wait for it to be deployed in upstream EESSI. | ||
| 3. You want to retain full autonomy over what gets deployed | ||
|
|
||
| While all of these are valid arguments, note that there is also one major downside to deploying things locally: you loose one of the core benefits of EESSI, namely that it provides _the same software on every system_. The more site-specific installations you have, the more difficult it will be for your users to move their workflows from e.g. their own development machine/cloud environment to your cluster, or scale up to larger clusters. If you're doing site-builds to make software available to your users on short notice, we highly encourage you to _also_ contribute the same software installation in upstream EESSI. This way, once accepted upstream, users that rely on that software retain their 'mobility'. |
Member
Suggested change
| While all of these are valid arguments, note that there is also one major downside to deploying things locally: you loose one of the core benefits of EESSI, namely that it provides _the same software on every system_. The more site-specific installations you have, the more difficult it will be for your users to move their workflows from e.g. their own development machine/cloud environment to your cluster, or scale up to larger clusters. If you're doing site-builds to make software available to your users on short notice, we highly encourage you to _also_ contribute the same software installation in upstream EESSI. This way, once accepted upstream, users that rely on that software retain their 'mobility'. | |
| While all of these are valid arguments, note that there is also one major downside to deploying things locally: you lose one of the core benefits of EESSI, namely that it provides _the same software on every system_. The more site-specific installations you have, the more difficult it will be for your users to move their workflows from, e.g., their own development machine/cloud environment to your cluster, or scale up to larger clusters. If you're doing site-builds to make software available to your users on short notice, we highly encourage you to _also_ contribute the same software installation in upstream EESSI. This way, once accepted upstream, users that rely on that software retain their 'mobility'. |
|
|
||
| In both cases, you build 'on top' of EESSI, meaning that dependencies that are already provided by EESSI will not be reinstalled: they will simply be loaded from EESSI. | ||
|
|
||
| Here, we list some advantages and disadvantages to help you choose which approach best suites your requirements. |
Member
Suggested change
| Here, we list some advantages and disadvantages to help you choose which approach best suites your requirements. | |
| Here, we list some advantages and disadvantages to help you choose which approach best suits your requirements. |
| - Automatically optimizes for the host on which you run the installation, and installs in architecture-specific prefix that matches the host architecture. This means you can install optimized software for each of your CPU/GPU architectures in an organized way. | ||
|
|
||
| Disadvantages: | ||
| - This is a manual procedure (unless you create your own automation around it). As such, doesn't scale well to installing large amounts of software and/or installing software for many different hardware targets. |
Member
Suggested change
| - This is a manual procedure (unless you create your own automation around it). As such, doesn't scale well to installing large amounts of software and/or installing software for many different hardware targets. | |
| - This is a manual procedure (unless you create your own automation around it). As such, it doesn't scale well to installing large amounts of software and/or installing software for many different hardware targets. |
| Disadvantages: | ||
| - This is a manual procedure (unless you create your own automation around it). As such, doesn't scale well to installing large amounts of software and/or installing software for many different hardware targets. | ||
| - The fact that you get optimized installations means that on a very heterogeneous system, you will have to run the installation many times - once for each architecture on which you want to offer that particular piece of software. | ||
| - Shared filesystems (and especially _parallal_ filesystems) are generally ill-suited to serve software. This means start-up time can be quite long (you can find some numbers [here](../training-events/2025/tutorial-best-practices-cvmfs-hpc/performance.md)). |
Member
Suggested change
| - Shared filesystems (and especially _parallal_ filesystems) are generally ill-suited to serve software. This means start-up time can be quite long (you can find some numbers [here](../training-events/2025/tutorial-best-practices-cvmfs-hpc/performance.md)). | |
| - Shared filesystems (and especially _parallel_ filesystems) are generally ill-suited to serve software. This means start-up time can be quite long (you can find some numbers [here](../training-events/2025/tutorial-best-practices-cvmfs-hpc/performance.md)). |
|
|
||
| Disadvantages | ||
| - More setup time | ||
| - Requires more extnesive knowledge (CVMFS, EESSI build bot, object store) |
Member
Why is codespell not catching this?
Suggested change
| - Requires more extnesive knowledge (CVMFS, EESSI build bot, object store) | |
| - Requires more extensive knowledge (CVMFS, EESSI build bot, object store) |
| @@ -0,0 +1,221 @@ | |||
| # Leverage EESSI's build procedure for site builds | |||
| In this approach, you use the EESSI build bot (`EESSI/eessi-bot-software-layer`), together with the EESSI build scripts (`EESSI/software-layer-scripts`) to build and deploy software into a CernVM-FS repository of your own. Essentially, this means you'll build in a way that is essentially identical to how it is done for upstream EESSI - with the only major difference being the target CernVM-FS repository. | |||
Member
Suggested change
| In this approach, you use the EESSI build bot (`EESSI/eessi-bot-software-layer`), together with the EESSI build scripts (`EESSI/software-layer-scripts`) to build and deploy software into a CernVM-FS repository of your own. Essentially, this means you'll build in a way that is essentially identical to how it is done for upstream EESSI - with the only major difference being the target CernVM-FS repository. | |
| In this approach, you use the EESSI build bot (`EESSI/eessi-bot-software-layer`), together with the EESSI build scripts (`EESSI/software-layer-scripts`) to build and deploy software into a CernVM-FS repository of your own. Essentially, this means you'll build in a way that is effectively identical to how it is done for upstream EESSI - with the only major difference being the target CernVM-FS repository. |
|
|
||
| !!! note | ||
|
|
||
| The recommended CVMFS setup requires a fair amount of machines. If this is more than you can afford, there are some tricks you can pull. First, you can combine each proxy with a Stratum 1 on the same machine, only use the proxies for proxy-ing upstream EESSI, and simply have your clients contact your site-specific Stratum 1's directly (without proxy). In this scenario, you can achieve load-balancing by configuring half your clients with `CVMFS_SERVER_URL="<instance_1>;<instance_2>"` and half with `CVMFS_SERVER_URL="<instance_2>;<instance_1>"`, where `instance_1` and `instance_2` are the IPs of your Stratum 1's. Finally, you can even use the Stratum 0 instead of a second Stratum 1. Note that this has security implications, as it means your Stratum 0 needs to be directly accessible to your clients. This is a potential concern: if there are vulnarebilities in the Stratum 0 software, end-users may be able to push (malicious) software in there. |
Member
Suggested change
| The recommended CVMFS setup requires a fair amount of machines. If this is more than you can afford, there are some tricks you can pull. First, you can combine each proxy with a Stratum 1 on the same machine, only use the proxies for proxy-ing upstream EESSI, and simply have your clients contact your site-specific Stratum 1's directly (without proxy). In this scenario, you can achieve load-balancing by configuring half your clients with `CVMFS_SERVER_URL="<instance_1>;<instance_2>"` and half with `CVMFS_SERVER_URL="<instance_2>;<instance_1>"`, where `instance_1` and `instance_2` are the IPs of your Stratum 1's. Finally, you can even use the Stratum 0 instead of a second Stratum 1. Note that this has security implications, as it means your Stratum 0 needs to be directly accessible to your clients. This is a potential concern: if there are vulnarebilities in the Stratum 0 software, end-users may be able to push (malicious) software in there. | |
| The recommended CVMFS setup requires a fair number of machines. If this is more than you can afford, there are some tricks you can pull. First, you can combine each proxy with a Stratum 1 on the same machine, only use the proxies for proxy-ing upstream EESSI, and simply have your clients contact your site-specific Stratum 1's directly (without proxy). In this scenario, you can achieve load-balancing by configuring half your clients with `CVMFS_SERVER_URL="<instance_1>;<instance_2>"` and half with `CVMFS_SERVER_URL="<instance_2>;<instance_1>"`, where `instance_1` and `instance_2` are the IPs of your Stratum 1's. Finally, you can even use the Stratum 0 instead of a second Stratum 1. Note that this has security implications, as it means your Stratum 0 needs to be directly accessible to your clients. This is a potential concern: if there are vulnerabilities in the Stratum 0 software, end-users may be able to push (malicious) software in there. |
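To illustrate the load-balancing trick described in this note, here is a minimal sketch of how the two `CVMFS_SERVER_URL` variants could be generated per client. The helper name `server_url_line` and the idea of keying on a client index are assumptions made for illustration; they are not part of the PR or of CernVM-FS itself. The `<instance_1>` and `<instance_2>` placeholders are kept exactly as in the note.

```python
def server_url_line(client_index, stratum1_instances):
    """Return a CVMFS_SERVER_URL line with the server order rotated per client.

    Hypothetical helper: with two instances, even-numbered clients get
    instance 1 first and odd-numbered clients get instance 2 first, so the
    first-choice server is split roughly evenly across clients.
    """
    if not stratum1_instances:
        raise ValueError("at least one Stratum 1 instance is required")
    offset = client_index % len(stratum1_instances)
    # Rotate the list so a different instance is listed first for each offset.
    rotated = stratum1_instances[offset:] + stratum1_instances[:offset]
    return 'CVMFS_SERVER_URL="' + ";".join(rotated) + '"'


if __name__ == "__main__":
    instances = ["<instance_1>", "<instance_2>"]  # placeholders from the note
    for index in range(4):
        print(index, server_url_line(index, instances))
```

How the resulting line ends up in each client's CVMFS configuration (for example through a configuration management tool) is left open here, since the note does not prescribe it.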
|
|
||
| !!! note | ||
|
|
||
| The reason we configure `root` to be the owner of the CVMFS repository is that EasyBuild, when configured through `EESSI-extend`, by default creates read-only installations. This causes issues if CVMFS has to put catalog files (`.cvmfscatalog`) files in these directories, which are metadata files that CVMFS uses to list the files/directories present in the repository. While it is technically possible to use a regular user, this would require making all directories in which CVMFS would create a `.cvmfscatalog` file writeable in a transaction, then create the catalog files, then remove the write permissions again. The same approach would need to be taken to reinstall software that was already installed. We consider this unnecessarily complex, and instead prefer to have the repository owned by root. |
Member
Suggested change
| The reason we configure `root` to be the owner of the CVMFS repository is that EasyBuild, when configured through `EESSI-extend`, by default creates read-only installations. This causes issues if CVMFS has to put catalog files (`.cvmfscatalog`) files in these directories, which are metadata files that CVMFS uses to list the files/directories present in the repository. While it is technically possible to use a regular user, this would require making all directories in which CVMFS would create a `.cvmfscatalog` file writeable in a transaction, then create the catalog files, then remove the write permissions again. The same approach would need to be taken to reinstall software that was already installed. We consider this unnecessarily complex, and instead prefer to have the repository owned by root. | |
| The reason we configure `root` to be the owner of the CVMFS repository is that EasyBuild, when configured through `EESSI-extend`, by default creates read-only installations. This causes issues if CVMFS has to put catalog files (`.cvmfscatalog`) in these directories, which are metadata files that CVMFS uses to list the files/directories present in the repository. While it is technically possible to use a regular user, this would require making all directories in which CVMFS would create a `.cvmfscatalog` file writeable in a transaction, then create the catalog files, then remove the write permissions again. The same approach would need to be taken to reinstall software that was already installed. We consider this unnecessarily complex, and instead prefer to have the repository owned by root. |
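For readers wondering what the regular-user workaround mentioned in this note would involve, the following is a rough sketch of the permission juggling only: make a read-only directory writable, create the `.cvmfscatalog` file, then restore the original mode. The function names are hypothetical, and the sketch operates on an ordinary directory; in a real setup this would presumably have to happen inside an open CVMFS transaction on the repository path, which is not shown here.

```python
import os
import stat
from contextlib import contextmanager


@contextmanager
def temporarily_writable(directory):
    """Add owner write permission to a directory, restoring the old mode after."""
    original_mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, original_mode | stat.S_IWUSR)
    try:
        yield directory
    finally:
        # Always restore the original (read-only) mode, even on errors.
        os.chmod(directory, original_mode)


def add_catalog_marker(directory):
    """Create an empty .cvmfscatalog file in a (possibly read-only) directory."""
    marker = os.path.join(directory, ".cvmfscatalog")
    with temporarily_writable(directory):
        # Opening in append mode creates the file without truncating an existing one.
        with open(marker, "a"):
            pass
    return marker
```

Having to wrap every catalog creation (and every reinstallation) in this kind of toggle is the extra complexity the note refers to; a root-owned repository avoids it.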
casparvl
commented
Jun 10, 2026
| This documentation will go through the steps to set each of these up, in order. Since many of these individual steps are documented elsewhere, we will often reference that (and only list a very short summary here). | ||
|
|
||
| ## Site-specific CVMFS infrastructure | ||
| The recommended CVMFS setup for a site-specific CVMFS repository is: |
Collaborator
Author
TODO: Add picture for this
No description provided.