---
authors: [admccartney]
date: 2026-04-02
slug: eessi-musica
---

# Choosing EESSI as a base for MUSICA

<figure markdown="span">
  {width=75%}
  <figcaption>(c) Matthias Heisler 2026</figcaption>
</figure>

MUSICA (Multi-Site Computer Austria) is the latest addition to Austria's
national supercomputing infrastructure. Its compute resources are
distributed across three locations in Austria: Vienna, Innsbruck, and
Linz. In this post, we describe the process that led the [Austrian
Scientific Computing (ASC) research center](https://asc.ac.at/home/) to
adopt EESSI as the base for the software stack on MUSICA.

<!-- more -->

The background section gives a brief history of how cluster computing
at ASC has evolved, with a particular focus on the various incarnations
of the software stack. We then outline our motivations for redesigning
the way the software stack is delivered, starting with the MUSICA HPC
system. We describe the timeline of events that led to our experiments
with EESSI and EasyBuild, and give details of the two complementary
approaches to building a software stack that we compared. Finally, we
offer a critical reflection on these experiments and explain why we
ultimately chose EESSI as the base and blueprint for the software stack.

## Background

The ASC (formerly VSC) is a national center for high performance
computing and research powered by scientific software. Its first
cluster, VSC-1, was in service from 2009 to 2015 and was succeeded by a
series of clusters, VSC-2 through VSC-5[^1]. VSC-4 and VSC-5 remain in
service as of 2026. In April 2026 they will be joined by a new cluster,
MUSICA (Multi-Site Computer Austria). MUSICA is a GPU-centric cluster
run on OpenStack, and it has so far been the main testing ground for our
initial experiments with EasyBuild and EESSI.

The management of the software stack at ASC evolved along the following
lines:

+ VSC-1, VSC-2: These clusters initially catered to small groups of
  expert users, and all software was installed manually.

+ VSC-3, VSC-4: Software was still partially managed by hand, with a set
  of scripting tools for structuring the software directory trees. These
  tools were initially copied from Innsbruck and adapted to work on the
  VSC. Tcl modules were also adopted at this time.

+ VSC-4, VSC-5: Spack was introduced. It reduced the need for custom
  install scripts, made it possible to install lots of software quickly,
  and pulled in dependencies automatically.

## Motivation

Internal discussions gave us a comprehensive picture of where the
current software stack was lacking and what it should ideally look like.
During these discussions, members of the user support team were able to
clearly articulate the various use cases raised by users. This led us to
set a number of high-level goals from which we derived requirements. At
a very high level, some of the more important goals can be summarized
as:

- Improved reproducibility and redeployment.
- Establishment of clear release cycles.
- A more organized and user-friendly presentation of the software for
  cluster users.

We articulated what an ideal software stack should look like, and we
identified a number of issues with the way the software stack was
being managed at the time.

### Tooling & Presentation

The way we had been using Spack and Tcl Modules had led to a fairly
unmanageable situation on our clusters. To meet user requests for
software, we had adopted a pragmatic approach, which resulted in a
myriad of software variants being installed into the shared file system
that hosts the systems' software. The list of available modules
presented to users quickly became overwhelming. A second major problem
was deduplication. We don't know its root cause; it may simply have been
a misconfigured Spack. In any case, we ended up in an untenable situation
where certain dependencies were installed many times over. For example,
there were multiple installs of the same OpenMPI version on the system,
each built slightly differently and most of them untested on the
systems. As a result, there was no way to tell users which installation
of a particular piece of software was the one that worked.
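
To illustrate the duplication problem: Spack identifies each installation by a hash of its fully concretized spec, so two builds of the same package version that differ in a variant, compiler or dependency end up as separate installs with different hashes. The sketch below flags such duplicates in a plain-text listing with one `name version hash` triple per line. The function name and the sample data are hypothetical, and producing such a listing with something like `spack find --format "{name} {version} {hash:7}"` is an assumption to check against the Spack documentation.

```python
from collections import defaultdict


def find_duplicate_installs(listing: str) -> dict[tuple[str, str], list[str]]:
    """Group installs by (name, version); keep groups with more than one hash.

    `listing` is assumed to contain one "name version hash" triple per line.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        name, version, spec_hash = fields
        groups[(name, version)].append(spec_hash)
    # Several hashes for one (name, version) pair means several distinct builds.
    return {key: hashes for key, hashes in groups.items() if len(hashes) > 1}


# Hypothetical listing: three differently built installs of one OpenMPI version.
sample = """\
openmpi 4.1.6 abc1234
openmpi 4.1.6 def5678
openmpi 4.1.6 0a1b2c3
hdf5 1.14.3 9f8e7d6
"""

for (name, version), hashes in find_duplicate_installs(sample).items():
    # prints: openmpi@4.1.6: 3 installs (abc1234, def5678, 0a1b2c3)
    print(f"{name}@{version}: {len(hashes)} installs ({', '.join(hashes)})")
```

An audit like this only exposes the symptom; it cannot tell which of the builds actually works, which is exactly the information our users were missing.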

### Build procedure hard to reproduce

During the last operating system upgrade, we painfully felt the need for
a more automated build process. Because most software was built ad hoc
in response to user requests, sometimes the only record of the build
procedure was the build artefacts themselves. This meant manually going
over a very large software repository and rebuilding everything more or
less by hand for the new operating system.

### Poor bus factor

The bus factor is a well-known metric from software engineering that
describes how widely knowledge of a specialized domain is shared within
a team: how many people would have to be hit by a bus before the team
could no longer carry out its work? In our case, the knowledge about the
software stack was concentrated in one or two individuals.

## Searching

The issues with the existing stack outlined above framed our search for
a set of tools and methods that would make it easier to achieve the
high-level goals for the software stack. To reiterate, manageability and
user-friendliness were at the top of the list.

### Timeline

We formed the Software And Modules (SAM) working group in Q4 2024. SAM
consists of five people who dedicate the majority of their time to
exploring alternative ways of building, managing and presenting the
software stack to users. The members draw on expertise from different
areas, notably from their work on the user support, sysadmin and
platform teams. The goal was to have the new software stack up and
running on the new MUSICA system towards the end of 2025.

+ *Summer 2024*:
  Initial meetings highlighted the need to reform the management of
  software so that it would be easy to use, transparent and logical, as
  well as tested and performant. These meetings also saw the first
  mention of EESSI/EasyBuild as possible alternatives to Spack, and of
  Lmod as an alternative to Tcl Modules.

+ *Autumn 2024*:
  The working group was established and compared a broad set of tools
  and approaches, namely an installation of Spack with Environment
  Modules, an installation of Guix, and an installation of EESSI. These
  tools were evaluated against a set of high-level user requirements
  that we had agreed on. The outcome was to focus on EasyBuild and
  EESSI.

+ *Winter 2024 - Spring 2025*:
  We made the strategic decision to install EESSI on the MUSICA system.
  We also decided to run a small experiment in which a small software
  stack would be built and installed, in order to compare and contrast
  two approaches: "EESSI on the side" vs. "EESSI as a base".

+ *Summer 2025*:
  In June 2025, the system entered a closed test phase, during which it
  was open to a small number of power users. The core software was
  provided by EESSI. The custom stack was extended during this phase in
  response to user software requests, which centered mostly on
  proprietary software.

+ *Autumn 2025 - Winter 2025/2026*:
  In November 2025, the MUSICA open test phase began. At this stage,
  anyone with an existing ASC account was granted access to the system
  upon request. At the end of the open test phase, users participated in
  a survey. Overall, the response to the setup of the system was quite
  positive.

  - Users categorized their usage by scientific domain; the largest
    groups were Physics (45), AI (41), Chemistry (24), Data Science (15)
    and Bioinformatics (11).

  - Asked whether they used the module system or relied on individual
    installations, 32 users used the module system, 24 preferred an
    individual installation, and 43 used a mixture of both.

  - What did users use to build, install or run their software? Of 99
    respondents:
    + 63: Conda/Pip
    + 21: `EESSI-extend`
    + 16: None of these
    + 15: Containers
    + 13: `buildenv`
    + 5: Spack

  - 5 out of 77 comments on the experience of compiling software on the
    system explicitly mention using `$LD_LIBRARY_PATH`. Although we had
    highlighted the recommendation to use the `buildenv` modules when
    compiling, these users preferred their own approach. The `buildenv`
    modules and the use of RPATH wrappers are generally not well
    understood within the SAM team, so it is hard to explain to users
    *why* they should be using this approach.
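
The survey counts are easier to compare as shares of the 99 respondents. The short calculation below does this with the numbers above. It also makes one property of the data visible: the answers on module-system use add up to exactly 99, whereas the tooling answers add up to 133, so respondents were evidently able to name more than one tool. The variable and function names are our own.

```python
RESPONDENTS = 99

# Counts taken from the survey results above.
module_usage = {
    "module system": 32,
    "individual installation": 24,
    "mixture of both": 43,
}
tooling = {
    "Conda/Pip": 63,
    "EESSI-extend": 21,
    "None of these": 16,
    "Containers": 15,
    "buildenv": 13,
    "Spack": 5,
}


def shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Express each count as a percentage of `total`, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


print(sum(module_usage.values()))  # 99, the number of respondents
print(sum(tooling.values()))  # 133, more than the number of respondents
for label, pct in shares(tooling, RESPONDENTS).items():
    print(f"{label:>15}: {pct:5.1f} %")
```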
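Part of the *why* comes down to how the dynamic loader locates shared libraries. The ld.so(8) manual page documents the search order for a needed library: the DT_RPATH entries embedded in the binary (only if it has no DT_RUNPATH entry), then `$LD_LIBRARY_PATH`, then the DT_RUNPATH entries, then the loader cache `/etc/ld.so.cache`, and finally the default system directories. The sketch below models that order in simplified form. It is not the loader's implementation, it ignores details such as secure-execution mode, and the library paths and default directories in it are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed default directories; the real list depends on the platform.
DEFAULT_DIRS = ["/lib64", "/usr/lib64"]


@dataclass
class Binary:
    """Simplified view of the library-path entries of an ELF binary."""

    rpath: list[str] = field(default_factory=list)  # DT_RPATH
    runpath: list[str] = field(default_factory=list)  # DT_RUNPATH


def search_order(binary: Binary, ld_library_path: str = "") -> list[str]:
    """Directories searched for a needed library, following ld.so(8) in simplified form."""
    order: list[str] = []
    # 1. DT_RPATH, but only if the binary has no DT_RUNPATH.
    if not binary.runpath:
        order += binary.rpath
    # 2. LD_LIBRARY_PATH (empty entries are ignored in this sketch).
    order += [d for d in ld_library_path.split(":") if d]
    # 3. DT_RUNPATH.
    order += binary.runpath
    # 4. The loader cache, shown as a placeholder entry.
    order.append("<ld.so.cache>")
    # 5. The default directories.
    order += DEFAULT_DIRS
    return order


user_env = "/home/user/mylibs"  # hypothetical LD_LIBRARY_PATH set by a user
with_rpath = Binary(rpath=["/opt/stack/openmpi/lib"])
with_runpath = Binary(runpath=["/opt/stack/openmpi/lib"])

print(search_order(with_rpath, user_env)[0])  # /opt/stack/openmpi/lib
print(search_order(with_runpath, user_env)[0])  # /home/user/mylibs
```

In this model, a library path embedded as DT_RPATH at link time wins over whatever a user puts in `$LD_LIBRARY_PATH`, while a DT_RUNPATH entry does not. `$LD_LIBRARY_PATH` is also inherited by every process started from the same shell, so a single setting can change which libraries unrelated programs load.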

## Experiments

### Test stack

The following software was agreed upon as a way to come into contact
with specific workflows, such as writing easyconfig files, writing
custom EasyBuild hooks, installing commercial software, and installing
GPU-specific application software.

+ AOCC 5.0.0
+ Intel Compilers
+ VASP 6.5.0
+ One piece of commercial software (STAR-CCM+, Mathematica)
+ NVHPC
+ VASP 6.5.0 (GPU)
+ Containers (Singularity, Docker, NVIDIA)
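
Custom EasyBuild hooks are plain Python functions in a hooks file that is passed to EasyBuild with its `--hooks` option; EasyBuild calls them at fixed points of an installation, and EESSI itself relies on such a file. As an illustration of the kind of hook the test stack called for, the sketch below defines a `parse_hook`, which is triggered when an easyconfig file is parsed, that adds a license-server environment variable to the module of commercial software through the `modextravars` easyconfig parameter. The file name, software names, environment variable names and server addresses are hypothetical placeholders, not our actual configuration.

```python
# Hypothetical site hooks file, passed to EasyBuild via `eb --hooks=site_hooks.py`.
# Software names, variable names and servers are placeholders.

LICENSE_SERVERS = {
    # easyconfig name -> (environment variable, license server)
    "STAR-CCM+": ("CDLMD_LICENSE_FILE", "1999@license.example.org"),
    "Mathematica": ("MATHEMATICA_LICENSE_SERVER", "license.example.org"),
}


def parse_hook(ec, *args, **kwargs):
    """Add a license-server variable to the module of known commercial software."""
    if ec.name not in LICENSE_SERVERS:
        return
    var, server = LICENSE_SERVERS[ec.name]
    # Build a new dict and assign it back through the easyconfig's item assignment.
    extra = dict(ec["modextravars"] or {})
    extra.setdefault(var, server)  # keep a value the easyconfig already sets
    ec["modextravars"] = extra
```

Using `setdefault` means an easyconfig that already sets the variable keeps its own value. Keeping site-specific details like these in a hook rather than in the easyconfig files themselves leaves the easyconfigs closer to their upstream versions.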

### EESSI on the side

In a sense, this approach represents the traditional way of building a
software stack: everything is built directly on the host (Rocky Linux
9) and relies on system libraries. It used scripts and wrappers from the
sse2 toolkit from the National Supercomputer Centre (NSC) at Linköping
University to manage and structure the modules and software
installations. The software builds were a mixture of EasyBuild
installations and makefiles. EESSI was offered as a module in its pure
form, and users were generally discouraged from using `EESSI-extend`,
or told that they did so at their own risk.

### EESSI as a base

With this approach, we used `EESSI-extend` extensively and aimed to
build the whole stack on top of the EESSI compatibility layer. As we
climbed the learning curve for building software, we moved back and
forth between three distinct phases, each using one of the possible
settings of the `EESSI-extend` module:

+ Phase 0 -> `$EESSI_USER_INSTALL`
+ Phase 1 -> `$EESSI_SITE_INSTALL`
+ Phase 2 -> `$EESSI_PROJECT_INSTALL` set to `/cvmfs/software.asc.ac.at`
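
Because each phase is selected through an environment variable, it is easy to end up with a shell in which more than one of them is set when switching between phases. A small pre-flight check along the lines of the sketch below can catch that before `EESSI-extend` is loaded. The function name is hypothetical, and the sketch assumes, for illustration, that exactly one of the three variables should be set and that `EESSI-extend` applies its own default when none is; the authoritative rules are the ones in the `EESSI-extend` documentation.

```python
from collections.abc import Mapping
from typing import Optional

# Install-target variable of each of the three phases described above.
PHASE_VARIABLES = {
    "EESSI_USER_INSTALL": 0,
    "EESSI_SITE_INSTALL": 1,
    "EESSI_PROJECT_INSTALL": 2,
}


def active_phase(env: Mapping[str, str]) -> Optional[tuple[int, str, str]]:
    """Return (phase, variable, value) for the install variable that is set.

    Returns None when no variable is set (EESSI-extend then uses its own
    default, which this sketch does not model) and raises ValueError when
    several are set, which this sketch treats as a mistake.
    """
    chosen = [var for var in PHASE_VARIABLES if env.get(var)]
    if len(chosen) > 1:
        raise ValueError("conflicting install targets: " + ", ".join(chosen))
    if not chosen:
        return None
    var = chosen[0]
    return PHASE_VARIABLES[var], var, env[var]


# Phase 2 as used on MUSICA; prints (2, 'EESSI_PROJECT_INSTALL', '/cvmfs/software.asc.ac.at')
print(active_phase({"EESSI_PROJECT_INSTALL": "/cvmfs/software.asc.ac.at"}))
```

In practice, the current shell could be checked with `active_phase(os.environ)`.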

## Reflections

### EESSI on the side

By comparison, this approach made it much quicker and easier to build
all the software in the list. It also gives a lot of control to the
sysadmin who builds the software; for example, module files could be
tweaked or modified in place. The downsides were reproducibility and
portability: rebuilding the stack after the next OS upgrade would
obviously involve considerable work. That said, everything worked much
more smoothly than with `EESSI-extend`: we were able to build all the
software on the list and run basic tests with Slurm. We had some open
questions around interoperability between the custom modules and
EESSI, and in particular whether modules from the two independent
stacks could be mixed without running into problems (we suspect not,
because the two stacks are built against different libc versions).
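
The libc concern can be made concrete. A binary linked against glibc records the versioned symbols it needs (tags such as `GLIBC_2.17`), and it only loads with a glibc that is at least as new as the highest of those tags. Software built on the host is linked against the host's glibc, while EESSI software is built against the glibc in the EESSI compatibility layer, so the two can disagree. The sketch below compares a binary's highest required `GLIBC_x.y` tag with an available glibc version. The function names, symbol list and version numbers are hypothetical inputs; in practice the tags could be read with a tool such as `objdump -T`.

```python
import re

GLIBC_TAG = re.compile(r"GLIBC_(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.34' into (2, 34) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def required_glibc(symbol_versions: list[str]) -> tuple[int, ...]:
    """Highest GLIBC_x.y[.z] tag among a binary's symbol versions."""
    versions = []
    for tag in symbol_versions:
        match = GLIBC_TAG.fullmatch(tag)
        if match:  # non-glibc tags such as GCC_3.0 are skipped
            versions.append(tuple(int(g) for g in match.groups() if g is not None))
    return max(versions, default=(0,))


def can_run(symbol_versions: list[str], available: str) -> bool:
    """True if the available glibc is at least as new as every required tag."""
    return required_glibc(symbol_versions) <= parse_version(available)


# Hypothetical symbol versions of a binary built against a newer glibc.
needed = ["GLIBC_2.2.5", "GLIBC_2.17", "GLIBC_2.38", "GCC_3.0"]
print(required_glibc(needed))  # (2, 38)
print(can_run(needed, available="2.34"))  # False, since 2.38 > 2.34
```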

### EESSI as a base

By the end of the closed test phase of MUSICA, the engineering team
chose EESSI as the foundation for the software stack. While this approach
introduced complexity into our build and installation workflows, it
enabled us to meet certain key requirements for the MUSICA software
infrastructure.

Specifically, we used CernVM-FS to distribute the software stack across
the three sites: Vienna, Linz, and Innsbruck. EESSI offers access to
approximately 1960 modules that are ready to load on the target
architecture. Setting up EESSI was quite straightforward, and although
team members found the many installation options of the `EESSI-extend`
module too complex, adopting this method aligned us with modern
practices for managing HPC software. EESSI is open source, well
documented, and maintained by colleagues within Europe's HPC ecosystem.
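
A figure such as the number of modules EESSI offers can be checked on a running system. The sketch below counts Lua module files, the format Lmod reads natively, below every directory in `$MODULEPATH`. It is only an approximation: hidden modules, modules written in Tcl, and modules that appear in more than one directory are not treated specially, and the result depends on which architecture-specific EESSI software subdirectory is active. The function name is hypothetical.

```python
import os
from pathlib import Path


def count_lua_modules(modulepath: str) -> int:
    """Count *.lua module files below each directory of a MODULEPATH string."""
    total = 0
    for entry in modulepath.split(os.pathsep):
        root = Path(entry)
        if entry and root.is_dir():  # skip empty and missing entries
            total += sum(1 for _ in root.rglob("*.lua"))
    return total


if __name__ == "__main__":
    print(count_lua_modules(os.environ.get("MODULEPATH", "")))
```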

Engaging with EESSI's documentation, source code, and community proved
valuable. We identified a reusable blueprint that we could adapt to fit
our specific needs. Despite the initial learning curve, this approach
provided long-term benefits in terms of maintainability and scalability.

[^1]: <https://docs.vsc.ac.at/systems/>