The CISM manages several compute clusters. A cluster is made of computers (nodes) that are interconnected and appear to the user as one large machine. The cluster is accessed through a frontend where users can manage their jobs and the data in their home directory. Each node has a certain number of processors, a certain amount of memory (RAM) and some local storage (scratch space). Each processor comprises several independent computing units (cores). In a hardware context, a CPU is often understood as a processor die, which you can buy from a vendor and which fits into a socket on the motherboard, while in a software context, a CPU is often understood as one compute unit, a.k.a. a core.

The CISM operates two clusters: Lemaitre and Manneback. Lemaitre was named in reference to Georges Lemaitre (17 July 1894 – 20 June 1966), a Belgian priest, astronomer and professor of physics at UCLouvain. He is seen by many as the father of the Big Bang theory, and he is also the one who brought the first supercomputer to our University. Manneback was named after Charles Manneback (1894-1975), Professor of Physics at UCLouvain. A close friend of Georges Lemaitre, he led the FNRS-IRSIA project to build the first supercomputer in Belgium in the 1950s.

The Lemaitre cluster is a CÉCI cluster and is shared among all CÉCI users, in contrast with Manneback, which is a UCL-only machine. Consequently, all documentation regarding Lemaitre can be found on the CÉCI documentation website, while only Manneback is described in this document.

Manneback¶

Job submission¶

As on all the CÉCI clusters, job submission on Manneback is managed by Slurm. More details about Slurm and how to submit your job with Slurm can be found here.

Available hardware¶

While CÉCI clusters are mostly homogeneous, Manneback is made of several different generations of hardware.

Use the sinfo command to learn about the available hardware on Manneback:

[user@manneback ~]$ sinfo -o "%15N %10c %10m  %25f %40G"
NODELIST        CPUS       MEMORY      FEATURES                  GRES
mb-ivy[201-208] 16         64224       IvyBridge,Xeon,E5-2650v2  localscratch:211
mb-neh070       8          23937       Nehalem,Xeon,X5550        localscratch:853,gpu:TeslaC1060/M1060:2
mb-neh[201-209, 8+         23938       Nehalem,Xeon,L5520        localscratch:853
mb-sab[001-021] 16         64098+      SandyBridge,Xeon,E5-2650  localscratch:395
mb-sab103       32         128723      SandyBridge,Xeon,E5-4620  localscratch:853
mb-wes[001-030, 12+        48128       Xeon,E5649                localscratch:395
mb-wes031       24         48128       Xeon,E5649                localscratch:624
mb-zen001       128        515708      Zen,EPYC,7551             localscratch:1802
mb-sab101       32         258285      SandyBridge,Xeon,E5-4640  localscratch:355
mb-sab102       32         257738      SandyBridge,Xeon,E5-4640  localscratch:395
mb-opt[111-112, 32         112641+     SkyLake,Bulldozer,Opteron localscratch:395
mb-sab040       16         64237       SandyBridge,Xeon,E5-2660  localscratch:46,gpu:TeslaM2090:2
mb-wes[251-252] 12         96509       Westmere,Xeon,X5675       localscratch:211,gpu:TeslaC2050/C2075:1
mb-has[001-004, 16+        64011+      Haswell,Xeon,E5-2630v3    localscratch:486
mb-has005       16         64171       Haswell,Xeon,E5-2630v3    localscratch:447
mb-skg[001-006] 24         96191       SkyLake,Xeon,5118         localscratch:156
mb-har[022,024, 8          15947       Harpertown,Xeon,L5420     localscratch:46
mb-har[021,023, 8          15947       Harpertown,Xeon,E5420     localscratch:46
mb-ivy[211-212, 48         129022      IvyBridge,Xeon,E5-2695v2  localscratch:814
mb-opt[012-032, 16         32103+      K10,Opteron,6134          localscratch:126
mb-opt033       16         32009       K10,Opteron,6134          localscratch:166
mb-sab[081,084, 32         64386       SandyBridge,Xeon,E5-2670  localscratch:355
mb-sky[001-008] 48         191882      Skylake,Xeon,4116         localscratch:1770
mb-opt[043-046, 16         64423       K10,Opteron,6134          localscratch:814
mb-opt[049-054] 16         32103+      K10,Opteron,6134          localscratch:916
mb-bro080       20         64153       SandyBridge,Xeon,E5-2640  localscratch:156,gpu:TeslaK80:2


You can use the sload command to see the current cluster usage on the different architectures:

[user@manneback ~]$ sload
MBack : 4067/6304=64%
mb-clo Clovertown :    0/96=0%
mb-har Harpertown :    240/248=96%
mb-opt Opteron    :    980/1152=85%
mb-neh Nehalem    :    10/120=8%
mb-wes Westmere   :    577/2124=27%
mb-sab SandyBridge:    485/608=79%
mb-ivy IvyBridge  :    807/848=95%
mb-has Haswell(p zoe): 416/432=96%
mb-sky Skylake Silver: 384/384=100%
mb-skg Skylake Gold  : 144/144=100%
mb-zen AMD EPYC      : 4/128=3%
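
To spot the least loaded architecture quickly, the sload output can be piped through a small filter. The following is a minimal sketch that assumes the line format shown above (architecture lines starting with mb- and ending in =NN%). The helper name sort_by_load is hypothetical and not part of sload.

```shell
# sort_by_load: hypothetical helper, not part of sload.
# Reads sload-style text on stdin and prints the architecture lines
# ordered from least to most loaded, using the trailing "=NN%" field.
sort_by_load() {
    awk -F'=' '/^mb-/ {
        pct = $NF              # text after the last "=", e.g. "96%"
        sub(/%.*/, "", pct)    # keep only the number
        print pct "\t" $0      # prefix each line with its percentage
    }' | sort -n | cut -f2-    # sort numerically, then drop the prefix
}

# On the frontend (requires sload):
#   sload | sort_by_load | head -n 3
```

The summary line (MBack) is skipped because it does not start with mb-.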


In your submission script, you can select one or more specific features using:

--constraint="feature1&feature2" (both features are required) or --constraint="feature1|feature2" (either feature is accepted)

GPUs are considered a “generic resource” in Slurm, meaning that a GPU is reserved for your job only if you request it explicitly, for example with --gres="gpu:TeslaM2090:2" in your submission script if you want access to both GPUs on the node.
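
As an illustration, the feature and GPU options can be combined in one submission script. This is a minimal sketch: the job name, task count and walltime are placeholder values, and the feature and GPU names are taken from the mb-sab040 line of the sinfo listing above.

```shell
#!/bin/bash
# Placeholder job name, task count and walltime: adapt them to your job.
#SBATCH --job-name=gpu-demo
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
# Require a node that has both the SandyBridge and the Xeon features.
#SBATCH --constraint="SandyBridge&Xeon"
# Reserve both TeslaM2090 GPUs of that node.
#SBATCH --gres="gpu:TeslaM2090:2"

# SLURMD_NODENAME is set by Slurm inside a job; fall back to uname elsewhere.
node="${SLURMD_NODENAME:-$(uname -n)}"
echo "Running on $node"

# CUDA_VISIBLE_DEVICES is normally set by Slurm when GPUs are allocated
# (this depends on the site configuration).
echo "GPUs visible to the job: ${CUDA_VISIBLE_DEVICES:-none}"
```

The script is submitted with sbatch followed by the script file name.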

The partitions¶

Three different partitions are available on Manneback: the default one (Def), which is open to everyone; Zoe, which is open to everyone but gives higher priority to specific users; and cp3, which is reserved for the IRMP/CP3 users. Note that you can specify #SBATCH --partition=Def,Zoe to submit a job that will run on the earliest available partition.
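
A submission script using two partitions could look like the sketch below. The job name, task count and walltime are placeholders, and only the partition line comes from the text above.

```shell
#!/bin/bash
# Placeholder job name, task count and walltime: adapt them to your job.
#SBATCH --job-name=partition-demo
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
# Let Slurm start the job in whichever of the two partitions is free first.
#SBATCH --partition=Def,Zoe

# SLURM_JOB_PARTITION is set by Slurm to the partition the job runs in.
partition="${SLURM_JOB_PARTITION:-not running under Slurm}"
echo "Partition used: $partition"
```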

You can use the sload -p command (-p for “partition”) to see the current cluster usage and the number of pending jobs (PD) on the different partitions:

[user@manneback ~]$ sload -p
MBack : 4067/6304=64%
mb Zoe            : 560/560=100% (4592 PD)
mb Grid           : 1651/2520=65%  (1352 PD)
mb cp3-local      : 797/2520=31% (418 PD)
mb cp3-fast       : 0/48=0% (0 PD)
mb cp3-gpu        : 20/20=100% (100 PD)
mb Def            : 1039/3152=32% (408 PD)


Limits¶

• Maximum walltime: run sinfo --summarize
• Maximum home space: run quota -us

Global scratch space¶

The global scratch space /globalfs is a 42TB space NFS-mounted on all the compute nodes. No quota is enforced on that filesystem; it is the responsibility of the users to remove the files that are no longer needed by any job.

Each node furthermore has a local /scratch space. Local scratch is a smaller file system directly attached to the worker node.
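
A common pattern is to stage data from the global scratch to the local scratch, compute there, and copy the results back. The sketch below illustrates it under several assumptions that are not stated in this document: a per-user directory /globalfs/$USER, a /scratch/<user>/<jobid> layout, and hypothetical names for the input file (input.dat) and the program (my_program).

```shell
#!/bin/bash
# Placeholder job name, task count and walltime.
#SBATCH --job-name=scratch-demo
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# make_workdir ROOT: create a private per-job directory under ROOT and print
# its path. The ROOT/<user>/<jobid> layout is an assumption of this sketch.
make_workdir() {
    local dir="$1/${USER:-user}/${SLURM_JOB_ID:-manual}"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Do the actual work only when running inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    workdir=$(make_workdir /scratch) || exit 1
    # Remove the directory when the script exits, so the local scratch is freed.
    trap 'rm -rf "$workdir"' EXIT

    # Hypothetical input file and program; replace with your own.
    cp "/globalfs/$USER/input.dat" "$workdir/" || exit 1
    cd "$workdir" || exit 1
    "$SLURM_SUBMIT_DIR/my_program" input.dat > output.dat

    # Copy the result back to the global scratch before the cleanup runs.
    cp output.dat "/globalfs/$USER/"
fi
```

Cleaning up in a trap keeps the local scratch free even when the program fails partway through.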

Access and file transfer¶

Clusters are accessed with your CÉCI login and SSH key. Refer to the CÉCI documentation for more details. Access to Manneback is achieved by pointing your SSH client to manneback.cism.ucl.ac.be with your CÉCI login and SSH key. The frontend’s SSH key fingerprints are:

• SHA256:Q3IYMwb5QElBkqmVbJyi8UgFoyKZMZQsWRRU3CEvV8s
• MD5:b3:cf:6c:21:3e:fa:e8:9a:b7:ea:eb:13:7d:4b:70:f6
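
Before connecting for the first time, the host key offered by the frontend can be compared with the published SHA256 fingerprint. The sketch below is one possible way to do so with standard OpenSSH tools. The helper name has_expected_fingerprint is hypothetical, and the document does not say which key type the fingerprint belongs to, so the check simply looks for it among all keys offered.

```shell
# Fingerprint published for the Manneback frontend (SHA256 form).
EXPECTED_SHA256='SHA256:Q3IYMwb5QElBkqmVbJyi8UgFoyKZMZQsWRRU3CEvV8s'

# has_expected_fingerprint: hypothetical helper. Reads `ssh-keygen -l` output
# on stdin and succeeds if the published fingerprint appears in it.
has_expected_fingerprint() {
    grep -qF -- "$EXPECTED_SHA256"
}

# On your own machine (needs network access to the frontend):
#   ssh-keyscan manneback.cism.ucl.ac.be 2>/dev/null \
#       | ssh-keygen -lf - | has_expected_fingerprint && echo "host key matches"
```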

Access to the computing facilities is free of charge for most researchers with normal usage. However, beyond a certain limit, usage may become billable.

If a research group (pôle de recherche) exceeds 200.000 hCPU (CPU hours), equivalent to the usage of 23 processors during a full year, the cost is computed as a function of the yearly consumption.
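
The equivalence between 200.000 hCPU and 23 processors can be checked with a short calculation, assuming a year of 365 days of 24 hours:

```shell
# 200,000 CPU hours spread over one full year.
hcpu=200000
hours_per_year=$((365 * 24))   # 8760 hours

# Number of processors kept busy all year long, with one decimal.
procs=$(LC_ALL=C awk -v h="$hcpu" -v y="$hours_per_year" 'BEGIN { printf "%.1f", h / y }')
echo "$hcpu hCPU is about $procs processors used during a full year"
```

The result, 22.8, rounds to the 23 processors quoted above.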

The rate applied in 2016, based on your research group's consumption in 2015, was computed as follows:

- Below 200.000 hCPU: 0 €/hCPU
- Between 200.000 and 2.500.000 hCPU: 0.00114 €/hCPU
- Over 2.500.000 hCPU: 0.00065 €/hCPU


Note

This rate is only applied to researchers whose funding doesn’t support access to a computing facility. In this case, the rate is to be considered as a contribution to the CISM platform operations. For any other situation, please contact the CISM team.

For 2017 and beyond, thanks to a contribution by the SGSI to the CISM budget, the cost will be zero for users who do not have specific funding for computational resources. Users funded by a regional, federal, European or commercial project with specific needs should contact the CISM team for a quote.