About the service

The CISM manages several compute clusters. A cluster is made of computers (nodes) that are interconnected and appear to the user as one large machine. The cluster is accessed through a frontend where users manage their jobs and the data in their home directory. Each node has a certain number of processors, a certain amount of memory (RAM) and some local storage (scratch space). Each processor comprises several independent computing units (cores). In a hardware context, a CPU is often understood as a processor die, which you buy from a vendor and which fits into a socket on the motherboard, while in a software context, a CPU is often understood as one computing unit, a.k.a. a core.

The CISM operates two clusters: Lemaitre and Manneback. Lemaitre is named after Georges Lemaitre (17 July 1894 – 20 June 1966), a Belgian priest, astronomer and professor of physics at UCLouvain. He is seen by many as the father of the Big Bang theory, and he is also the one who brought the first supercomputer to our University. Manneback is named after Charles Manneback (1894-1975), professor of physics at UCLouvain. A close friend of Georges Lemaitre, he led the FNRS-IRSIA project that built the first supercomputer in Belgium in the 1950s.

Lemaitre

The Lemaitre cluster is a CÉCI cluster and is shared among all CÉCI users, by contrast with Manneback, which is a UCLouvain-only machine. Consequently, all documentation regarding Lemaitre can be found on the CÉCI documentation website, while only Manneback is described in this document.

Manneback

Manneback is a cluster built with hardware acquired progressively thanks to multiple funding sources brought by CISM users.

Job submission

As on all CÉCI clusters, job submission on Manneback is managed by Slurm. More details about Slurm and how to submit your jobs with Slurm can be found here.
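
For reference, a minimal Slurm submission script for Manneback could look like the sketch below; the job name, requested resources and executable are placeholders to adapt to your own work:

#!/bin/bash
#SBATCH --job-name=myjob          # placeholder job name
#SBATCH --time=01:00:00           # requested walltime (hh:mm:ss)
#SBATCH --ntasks=1                # one task, i.e. one core for a serial job
#SBATCH --mem-per-cpu=2048        # memory per core, in MB

# load the software environment you need, then run your program
# module load <your modules>
srun ./myprogram                  # placeholder executable

Save it as, for example, submit.sh, submit it with sbatch submit.sh and follow it with squeue -u $USER.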

Available hardware

While CÉCI clusters are mostly homogeneous, Manneback is made of several different generations of hardware.

Use the sinfo command to learn about the available hardware on Manneback:

[dfr@mbackf1 ~]$ sinfo
Partitions:
Def* (5days)        Zoe (12hours)   cp3 (5days)     cp3-gpu (5days) gpu (5days)
Nodes:
#Nodes  Partition  CPU                       Cores/Slots  Memory  GPUs
1       cp3-gpu    SandyBridge,Xeon,E5-2640  20           63G     TeslaK80:2
14      cp3        IvyBridge,Xeon,E5-2695v2  48           126G
2       cp3        K10,Opteron,6134          16           31G
8       cp3        Rome,EPYC,7452            128          504G
2       cp3        SandyBridge,Xeon,E5-2670  32           63G
20      cp3        SkyLake,Xeon,4116         48           187G
16      cp3        SkyLake,Xeon,4214         48           187G
2       Def*       Bulldozer,Opteron,6276    32           126G
23      Def*       Haswell,Xeon,E5-2630v3    16           63G
8       Def*       IvyBridge,Xeon,E5-2650v2  16           63G
3       Def*       Nehalem,Xeon,L5520        16           23G
8       Def*       Nehalem,Xeon,L5520        8            23G
19      Def*       SandyBridge,Xeon,E5-2650  16           63G
1       Def*       SandyBridge,Xeon,E5-4620  32           126G
2       Def*       SandyBridge,Xeon,E5-4640  32           252G
10      Def*       Xeon,E5649                12           47G
61      Def*       Xeon,E5649                24           47G
1       Def*       Zen,EPYC,7551             128          504G
2       gpu        Rome,EPYC,7302            64           504G    TeslaA100:2
1       gpu        Rome,EPYC,7352            96           504G    GeForceRTX3090:4
1       gpu        SandyBridge,Xeon,E5-2660  16           63G     TeslaM10:2
1       gpu        SkyLake,Xeon,5217         16           377G    TeslaV100:2
1       gpu        SkyLake,Xeon,5217         32           377G    TeslaV100:2
1       gpu        SkyLake,Xeon,6244         32           376G    GeForceRTX2080Ti:6
1       gpu        Zen,EPYC,7313             64           252G    TeslaA100:2
6       Zoe        SkyLake,Xeon,5118         24           94G
Filesystems:
Filesystem      quota
$CECIHOME       100.0GiB
$CECITRSF       1.0TiB
$HOME           50G
$GLOBALSCRATCH  unlimited

Multiple CPU vendors and generations are represented. The CPU column in the above list shows a triplet with the CPU code name, the CPU family (Xeon is Intel’s server CPU brand, EPYC is AMD’s) and the CPU reference.

The code name is representative of the generation of the CPU (from older to more recent):

  • Intel: Nehalem > Westmere > SandyBridge > IvyBridge > Haswell > Broadwell > SkyLake
  • AMD: K10 > Zen > Rome

In your submission script, you can select one or more specific features with the --constraint= option, like --constraint="Nehalem|Westmere" for instance to choose a compute node with an older CPU.
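
You can list the feature tags Slurm attaches to each node and request one or more of them in your script; a short sketch, using feature names taken from the list above:

# show node names, core counts, memory and feature tags
sinfo -o "%20N %4c %8m %40f"

# in a submission script, restrict the job to SkyLake nodes, for instance
#SBATCH --constraint="SkyLake"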

Some nodes are equipped with GPUs. They are listed in the last column as a GPU name followed by the number of GPUs in each node.

As for CPUs, multiple GPU generations are available, in chronological order:

  • nVidia: TeslaM10 > TeslaV100 (Volta) > GeForceRTX2080Ti (Turing) > TeslaA100, GeForceRTX3090 (Ampere)

The GPUs are considered a “generic resource” in Slurm, meaning that you can reserve GPUs for your job with an option like --gres="gpu:TeslaV100:2" in your submission script. This would request two V100 GPUs.
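
As an illustration, the relevant lines of a GPU job script could look like the sketch below; the partition, GPU type and count are to be adapted to your needs:

#SBATCH --partition=gpu
#SBATCH --gres="gpu:TeslaV100:2"   # request two V100 GPUs on one node

# with the usual Slurm GPU setup, only the allocated devices are visible to the job
echo $CUDA_VISIBLE_DEVICES
nvidia-smi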

The partitions

The nodes are organised into partitions. Three CPU partitions are available on Manneback: the default one (Def), which is open to everyone; Zoe, which is open to all but gives higher priority to specific users; and cp3, which is reserved for IRMP/CP3 users. The GPU nodes are grouped into two GPU partitions: gpu and cp3-gpu. Note that you can specify #SBATCH --partition=Def,Zoe to submit a job that will run on whichever of the two partitions becomes available first.
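
To see the time limits and node lists attached to a given partition, you can query Slurm directly, for instance:

# summary of all partitions, with their time limits and node counts
sinfo -s

# full details (time limit, allowed groups, default memory, ...) for one partition
scontrol show partition Def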

You can use the sload -p command (-p as in “partition”) to see the current cluster usage and the number of pending jobs (PD) in the different partitions:

[dfr@mbackf1 ~]$ sload
MBack : 3992/6816=58%
 mb Zoe            : 72/144=50% (0 PD)
 mb Grid           : 2000/3552=56%  (1780 PD)
 mb cp3-local      : 470/3552=13% (0 PD)
 mb cp3-gpu        : 0/20=0% (0 PD)
 mb Def            : 1786/2876=62% (0 PD)
 mb gpu            : 64/368=17% (16 PD)

Disk space

Every user has access to a home directory with a 100GB quota.

A global scratch, /globalscratch, is available, offering 90TB of space NFS-mounted on all the compute nodes. There is no quota enforced on that filesystem; it is the responsibility of the users to remove files that are no longer needed by any job.

Each node furthermore has a local /scratch space. The local scratch is a smaller filesystem directly attached to the worker node. It is created at the beginning of a job and deleted automatically at the end. Here again, no quota is enforced, but the available space is limited by the hardware.
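
A common pattern, sketched below, is to copy the input data to the local scratch, run the computation there, and copy the results back before the job ends. The per-job scratch directory is assumed here to be exposed as $LOCALSCRATCH, following the CÉCI convention; check the environment on the compute nodes (e.g. env | grep -i scratch) if it is named differently:

# copy input data to the node-local scratch (assumed to be $LOCALSCRATCH)
cp -r "$HOME/mycase" "$LOCALSCRATCH/"
cd "$LOCALSCRATCH/mycase"

# run the computation on the fast local disk
srun ./myprogram

# copy the results back to the global scratch before the local scratch is wiped
cp -r results "$GLOBALSCRATCH/mycase_results"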

Access and file transfer

Clusters are accessed with your CÉCI login and SSH key. Refer to the CÉCI documentation for more details. Access to Manneback is achieved by pointing your SSH client to manneback.cism.ucl.ac.be with your CÉCI login and SSH key. The frontend’s SSH key fingerprints are:

  • SHA256:iR1HQsjGvKxo4uwswD/xLepW6DA3e45jUbNEZTntWRc (ECDSA)
  • SHA256:i2Hb6HDaeMz6h99/qHu3lIqGUX6Zrx8Yuz0ELTQzsjc (ED25519)
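
In practice, connecting to the frontend and copying files from a terminal look like the sketch below; replace the login with your own CÉCI login and adapt the key path if your private key is named differently (id_rsa.ceci is the usual CÉCI convention):

# open a session on the frontend
ssh -i ~/.ssh/id_rsa.ceci your_ceci_login@manneback.cism.ucl.ac.be

# copy a file from your workstation to your home directory on the cluster
scp -i ~/.ssh/id_rsa.ceci input.dat your_ceci_login@manneback.cism.ucl.ac.be: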

About the cost

Access to the computing facilities is free of charge. Usage of the equipment for fundamental research has been free since 2017 for most researchers with normal usage.

Note

Although access is free, the acquisition model of the hardware for Manneback is solely based on funding brought by users.

However, we encourage users who foresee heavy use of the equipment over a significant duration to contact us and to include a budget for additional equipment in their project funding requests. When equipment is acquired thanks to funds brought by a specific group, the equipment is shared with all users, but the funding entity can obtain exclusive reservation periods on the equipment or request specific configurations.

If the expected usage does not justify buying new equipment, but the project’s budget includes computation time, the CISM can also bill the usage of the equipment for the duration of the project. Rates vary based on the funding agency (European, Federal, Regional, etc.) and the objective of the research (fundamental, applied, commercial, etc.).

Before 2017, if a research group’s (pôle de recherche) usage exceeded 200.000 hCPU (CPU hours), equivalent to using 23 processors during a full year, the cost was computed as a function of the yearly consumption as follows:

Rates applied in 2016, based on your research group’s consumption in 2015:

- Below 200.000 hCPU: 0 €/hCPU
- Between 200.000 and 2.500.000 hCPU: 0.00114 €/hCPU
- Over 2.500.000 hCPU: 0.00065 €/hCPU

For 2017 and beyond, thanks to a participation by the SGSI in the budget of the CISM, the cost will be null for the users who do not have specific funding for computational resources. Users funded by a Regional, Federal, European or Commercial project with specific needs should contact the CISM team for a quote.