Commit 67edc97

Merge pull request #723 from adammccartney/blogpost-eessi-musica
Blogpost eessi musica
2 parents 4db6139 + 4276a0a

4 files changed: 281 additions & 1 deletion
File tree

.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion

@@ -21,7 +21,7 @@ jobs:
       check_filenames: true
       # MarkDown files in docs/available_software/detail are skipped because they are auto-generated
       skip: '*.pdf,.git,*.json,./docs/available_software/detail/*.md'
-      ignore_words_list: Fram,fram,ND,nd
+      ignore_words_list: Fram,fram,ND,nd,Linz
 
 #    - name: Markdown Linting Action
 #      uses: avto-dev/markdown-lint@v1.2.0

docs/blog/.authors.yml

Lines changed: 5 additions & 0 deletions

@@ -119,3 +119,8 @@ authors:
     description: European Molecular Biology Laboratory, Germany
     avatar: https://avatars.githubusercontent.com/u/44709261?v=4
     slug: https://github.com/stefanomarangoni495
+  admccartney:
+    name: Adam McCartney
+    description: Austrian Scientific Computing (ASC), TU Wien
+    avatar: https://avatars.githubusercontent.com/u/35410331?v=4
+    slug: https://github.com/adammccartney
2.24 MB binary file (not rendered)
Lines changed: 275 additions & 0 deletions
@@ -0,0 +1,275 @@
---
authors: [admccartney]
date: 2026-04-02
slug: eessi-musica
---

# Choosing EESSI as a base for MUSICA

<figure markdown="span">
![MUSICA](MUSICA-v2-32-Matthias_Heisler.jpg){width=75%}
<figcaption>(c) Matthias Heisler 2026</figcaption>
</figure>

MUSICA (Multi-Site Computer Austria) is the latest addition to Austria's
national supercomputing infrastructure. The system's compute resources
are distributed across three locations in Austria: Vienna, Innsbruck,
and Linz. We describe the process that led to the adoption of EESSI
as a base for the software stack on the MUSICA system at the [Austrian
Scientific Computing (ASC) research center](https://asc.ac.at/home/).

<!-- more -->

The background section provides a brief history of how cluster computing
at ASC has evolved, with a particular focus on the various incarnations
of the software stack. We outline our motivations for redesigning the
system that delivers the software stack, initially for use on the MUSICA
HPC system. We describe the timeline of events that led to the
experiments with EESSI and EasyBuild, and offer details of the two
complementary approaches to building a software stack that we compared.
Finally, we offer a critical reflection on our experiments and outline
our ultimate reasons for choosing EESSI as a base and blueprint for the
software stack.

## Background

The ASC (formerly VSC) is a national center for high-performance
computing and research powered by scientific software. The flagship
cluster VSC-1 was in service from 2009 to 2015, succeeded by a series of
clusters (2-5)[^1]. VSC-4 and VSC-5 are the two clusters that remain in
service as of 2026; they will be joined in April 2026 by a new cluster,
MUSICA (Multi-Site Computer Austria). MUSICA is a GPU-centric cluster
run on OpenStack and has so far been the main testing ground for our
initial experiments with EasyBuild and EESSI.

The management of the software stack at ASC evolved along the following
lines:

+ VSC 1, 2: Initially catered to small groups of expert users; all
  software was installed manually.

+ VSC 3, 4: Still partially managed by hand. A set of scripting tools
  was used for structuring software directory trees. These tools were
  initially copied from Innsbruck and adapted to work on the VSC. Use of
  Tcl modules was also adopted at this time.

+ VSC 4, 5: Spack was introduced (it reduced the need for custom install
  scripts, made it possible to install lots of software quickly, and
  pulled in dependencies automatically).

## Motivation

Internal discussions led to a comprehensive understanding of where the
current software stack was lacking and where it should ideally be.
During the discussions, members of the user support team were able to
clearly articulate the various use cases generated by users. This led
to setting a number of high-level goals that were used to derive
requirements. At a very high level, some of the more important goals can
be summarized as:

- Improved reproducibility and redeployment.
- Establishment of clear release cycles.
- Creation of a more organized and user-friendly representation for
  cluster users.

We articulated what an ideal software stack should look like, and we
identified a number of issues with the way the software stack was
currently managed.

### Tooling & Presentation

The way we had been using Spack and Tcl Modules had led to a fairly
unmanageable situation on our clusters. To meet user requests for
software, we adopted a pragmatic approach. This led to a situation in
which a myriad of software variants were installed into the shared file
system hosting the systems' software, which in turn led to a fairly
overwhelming presentation of available modules to the user. Another
major problem was deduplication. We don't know the root cause of this;
it may simply have been a misconfigured Spack. In any case, we ended up
in an untenable situation where certain dependencies would get installed
many times over. For example, there were multiple installs of the same
OpenMPI version on the system, all built slightly differently and most
untested on the systems. This meant that there was no way to indicate to
the user which version of a particular piece of software was the one
that worked.

### Build procedure hard to reproduce

During the last operating system upgrade, the need for a more automated
build process was painfully felt. Because most software was built ad hoc
in response to user requests, sometimes the only record of the build
procedure was the build artefacts themselves. This meant manually going
over a very large software repository and rebuilding everything more or
less by hand for the new operating system.

### Poor bus factor

The bus factor is the well-known software engineering metric for the
degree of shared knowledge of a specialized domain within a team: how
many people would have to be hit by a bus before the team could no
longer carry out its work? In our case, the knowledge about the software
stack was concentrated in one or two individuals.

## Searching

As outlined above, the numerous issues with the current stack
established the frame in which to search for a set of tools and methods
to ease the realisation of the high-level goals for the software stack.
To reiterate, manageability and user-friendliness were at the top of the
list.

### Timeline

We formed the Software And Modules (SAM) working group in Q4 2024. SAM
consists of five people who dedicate the majority of their time to
exploring possible alternative ways of building, managing and presenting
the software stack to users. The members draw on expertise from
different areas, notably from their work on the user-support, sysadmin
and platform teams. The goal for the new software stack was to have it
up and running on the new MUSICA system towards the end of 2025.

+ *Summer 2024*:
  Initial meetings highlighted the need to reform the management of
  software so that it could be easy to use, transparent and logical, as
  well as tested and performant. This was the first mention of
  EESSI/EasyBuild as possible alternatives to Spack, and of Lmod as an
  alternative to Tcl Modules.

+ *Autumn 2024*:
  The working group was established and a broad set of tools and
  approaches were compared, namely an installation of Spack with
  Environment Modules, an installation of Guix, and an installation of
  EESSI. These tools were evaluated against a set of high-level user
  requirements that we agreed on. The outcome was to focus on EasyBuild
  and EESSI.

+ *Winter 2024 - Spring 2025*:
  We made the strategic decision to have EESSI installed on the MUSICA
  system, and decided to run a small experiment whereby a small software
  stack would be built and installed, in order to compare and contrast
  two approaches: "EESSI on the side" vs. "EESSI as a base".

+ *Summer 2025*:
  In June 2025, the system entered a closed test phase, in which it was
  open to a small number of power users. The core software was provided
  by EESSI. The custom stack was extended during this phase, in response
  to user software requests that centered mostly on proprietary
  software.

+ *Autumn 2025 - Winter 2025/2026*:
  In November 2025 the MUSICA open test phase began. At this stage,
  anyone with an existing account at ASC was granted access to the
  system upon request. At the end of the open test phase, users
  participated in a survey. Generally, the response to the setup of the
  system was quite positive.

- Users categorized their usage according to scientific domain; the
  largest groups were: Physics (45), AI (41), Chemistry (24), Data
  Science (15), Bioinformatics (11).

- In response to a question as to whether the module system was used, or
  if the user relied on individual installations: 32 used the module
  system; 24 preferred an individual installation; 43 used a mixture of
  both.

- What did users use to build, install or run their software? Of 99
  respondents:
  + 63: Conda/Pip
  + 21: `EESSI-extend`
  + 16: None of these
  + 15: Containers
  + 13: buildenv
  + 5: Spack

- 5 out of 77 comments on the experience of compiling software on the
  system explicitly mention using `$LD_LIBRARY_PATH`. Despite our
  highlighting the recommendation to use the `buildenv` modules when
  compiling, these users preferred their own approach. Generally, the
  `buildenv` modules and the use of RPATH wrappers are not that well
  understood within the SAM team, so it is hard to explain to users
  *why* they should be using this approach.

## Experiments

### Test stack

The following programs were agreed upon as a way to come into contact
with specific workflows, such as writing easyconfig files, writing
custom EasyBuild hooks, installing commercial software, and installing
GPU-specific application software:

+ AOCC 5.0.0
+ Intel Compilers
+ VASP 6.5.0
+ One commercial software package (STAR-CCM+, Mathematica)
+ NVHPC
+ VASP 6.5.0 (GPU)
+ Containers (Singularity, Docker, NVIDIA)

### EESSI on the side

This approach in a sense represents the traditional way to build a
software stack: building everything directly on the host (Rocky Linux 9)
and relying on system libraries. It used scripts and wrappers from the
sse2 toolkit from the National Supercomputer Centre at Linköping
University as a way to manage and structure the modules and software
installations. The software builds were a mixture of EasyBuild scripts
and makefiles. EESSI was offered as a module in its pure form, and in
general users were discouraged from using `EESSI-extend`, or did so at
their own risk.

### EESSI as a base

With this approach, we leveraged `EESSI-extend` extensively and aimed to
build the whole stack with the compatibility layer from EESSI as a base.
The process of building software moved more or less back and forth
between three distinct phases, leveraging the various possible settings
for the `EESSI-extend` module:

+ Phase 0 -> `$EESSI_USER_INSTALL`
+ Phase 1 -> `$EESSI_SITE_INSTALL`
+ Phase 2 -> `$EESSI_PROJECT_INSTALL` set to `/cvmfs/software.asc.ac.at`

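In shell terms, a phase is selected by exporting the corresponding variable before loading `EESSI-extend`. A minimal sketch (the project prefix is our own repository; the `module load` line is commented out because it only works on a node where EESSI is initialised):

```shell
# Sketch: choose where EESSI-extend directs EasyBuild installations.
# Exactly one of the following is set before loading the module.

# Phase 0: per-user installs (e.g. under $HOME):
#export EESSI_USER_INSTALL=1

# Phase 1: site-wide installs on the host:
#export EESSI_SITE_INSTALL=1

# Phase 2: installs into our own CernVM-FS repository:
export EESSI_PROJECT_INSTALL=/cvmfs/software.asc.ac.at

#module load EESSI-extend   # requires an initialised EESSI environment
echo "EasyBuild install prefix target: ${EESSI_PROJECT_INSTALL}"
```

Moving between the phases was then mostly a matter of swapping which variable was exported and rerunning the same `eb` invocations.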
## Reflections

### EESSI on the side

By comparison, it was much quicker and easier to build all the software
in the list using this approach. It also offers a lot of control to the
sysadmin who builds the software, and doing things like tweaking or
modifying module files in place was possible. The downsides were
reproducibility and portability: there would be obvious work involved in
building the stack again upon the next OS upgrade. That said, everything
worked much more smoothly than with `EESSI-extend`; it was possible to
build all the software that was listed and run basic tests with Slurm.
We had some open questions around interoperability between custom
modules and EESSI, and whether it would be possible to mix modules from
the two independent stacks without running into issues (probably not,
due to the different libc versions).

### EESSI as a base

By the end of the closed test phase of MUSICA, the engineering team
chose EESSI as the foundation for the software stack. While this
approach introduced complexity into our build and installation
workflows, it enabled us to meet certain key requirements for the MUSICA
software infrastructure.

Specifically, we leveraged CernVM-FS to distribute the software stack
across the three sites - Vienna, Linz, and Innsbruck. EESSI offers
access to approximately 1960 modules that are ready to load on the
target architecture. Setting up EESSI was quite straightforward, and
although team members found the many installation options of the
`EESSI-extend` module complex, adopting this method aligned with modern
practices for managing HPC software. EESSI is open source, well
documented, and maintained by colleagues within Europe's HPC ecosystem.

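For context, making those modules available on a node amounts to sourcing the init script from the CernVM-FS mount. A sketch, assuming EESSI version `2023.06` and a configured CernVM-FS client:

```shell
# Initialise EESSI on a node where the CernVM-FS repository is mounted;
# the init script detects the CPU and sets up the module environment.
EESSI_INIT=/cvmfs/software.eessi.io/versions/2023.06/init/bash
if [ -f "$EESSI_INIT" ]; then
    # shellcheck disable=SC1090
    source "$EESSI_INIT"
    module avail        # lists the modules built for this architecture
else
    echo "EESSI repository not mounted on this node"
fi
```

Because the same mount is visible at all three sites, this one entry point gives users an identical software environment in Vienna, Linz, and Innsbruck.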
Engaging with EESSI's documentation, source code, and community proved
valuable. We identified a reusable blueprint that we could adapt to fit
our specific needs. Despite the initial learning curve, this approach
provided long-term benefits in terms of maintainability and scalability.

[^1]: <https://docs.vsc.ac.at/systems/>
