About the service¶
The CISM manages several compute clusters. A cluster is made of computers (nodes) that are interconnected and appear to the user as one large machine. The cluster is accessed through a frontend, where users manage their jobs and the data in their home directory. Each node has a certain number of processors, a certain amount of memory (RAM), and some local storage (scratch space). Each processor comprises several independent computing units (cores). In a hardware context, a CPU is often understood as a processor die, which you buy from a vendor and fit into a socket on the motherboard, while in a software context, a CPU is often understood as one compute unit, a.k.a. a core.
The CISM operates three clusters: Hmem, Lemaitre and Manneback. The name Hmem was chosen in reference to the "High Memory" characteristics of this cluster. Lemaitre was named in reference to Georges Lemaitre (17 July 1894 – 20 June 1966), a Belgian priest, astronomer and professor of physics at UCLouvain. He is seen by many as the father of the Big Bang theory, and he is also the one who brought the first supercomputer to our University. Manneback was named after Charles Manneback (1894–1975), professor of physics at UCLouvain. A close friend of Georges Lemaitre, he led the FNRS-IRSIA project to build the first supercomputer in Belgium in the 1950s.
The Lemaitre and Hmem clusters are CÉCI clusters and are shared among all CÉCI users, in contrast to Manneback, which is a UCL-only machine. Consequently, all documentation regarding Lemaitre and Hmem is found on the CÉCI documentation website, while only Manneback is described in this document.
While CÉCI clusters are mostly homogeneous, Manneback is made of several different generations of hardware.
You can use the sinfo command to learn about the available hardware on Manneback:
[user@manneback ~]$ sinfo -o "%15N %10c %10m %25f %40G"
NODELIST        CPUS  MEMORY   FEATURES                  GRES
mb-ivy[201-208] 16    64224    IvyBridge,Xeon,E5-2650v2  localscratch:211
mb-neh070       8     23937    Nehalem,Xeon,X5550        localscratch:853,gpu:TeslaC1060/M1060:2
mb-neh[201-209, 8+    23938    Nehalem,Xeon,L5520        localscratch:853
mb-sab[001-021] 16    64098+   SandyBridge,Xeon,E5-2650  localscratch:395
mb-sab103       32    128723   SandyBridge,Xeon,E5-4620  localscratch:853
mb-wes[001-030, 12+   48128    Xeon,E5649                localscratch:395
mb-wes031       24    48128    Xeon,E5649                localscratch:624
mb-zen001       128   515708   Zen,EPYC,7551             localscratch:1802
mb-sab101       32    258285   SandyBridge,Xeon,E5-4640  localscratch:355
mb-sab102       32    257738   SandyBridge,Xeon,E5-4640  localscratch:395
mb-opt[111-112, 32    112641+  SkyLake,Bulldozer,Opteron localscratch:395
mb-sab040       16    64237    SandyBridge,Xeon,E5-2660  localscratch:46,gpu:TeslaM2090:2
mb-wes[251-252] 12    96509    Westmere,Xeon,X5675       localscratch:211,gpu:TeslaC2050/C2075:1
mb-has[001-004, 16+   64011+   Haswell,Xeon,E5-2630v3    localscratch:486
mb-has005       16    64171    Haswell,Xeon,E5-2630v3    localscratch:447
mb-skg[001-006] 24    96191    SkyLake,Xeon,5118         localscratch:156
mb-har[022,024, 8     15947    Harpertown,Xeon,L5420     localscratch:46
mb-har[021,023, 8     15947    Harpertown,Xeon,E5420     localscratch:46
mb-ivy[211-212, 48    129022   IvyBridge,Xeon,E5-2695v2  localscratch:814
mb-opt[012-032, 16    32103+   K10,Opteron,6134          localscratch:126
mb-opt033       16    32009    K10,Opteron,6134          localscratch:166
mb-sab[081,084, 32    64386    SandyBridge,Xeon,E5-2670  localscratch:355
mb-sky[001-008] 48    191882   Skylake,Xeon,4116         localscratch:1770
mb-opt[043-046, 16    64423    K10,Opteron,6134          localscratch:814
mb-opt[049-054] 16    32103+   K10,Opteron,6134          localscratch:916
mb-bro080       20    64153    SandyBridge,Xeon,E5-2640  localscratch:156,gpu:TeslaK80:2
You can use the sload command to see the current cluster usage on the different architectures:
[user@manneback ~]# sload
MBack : 4067/6304=64%
  mb-clo Clovertown    :    0/96=0%
  mb-har Harpertown    :  240/248=96%
  mb-opt Opteron       :  980/1152=85%
  mb-neh Nehalem       :   10/120=8%
  mb-wes Westmere      :  577/2124=27%
  mb-sab SandyBridge   :  485/608=79%
  mb-ivy IvyBridge     :  807/848=95%
  mb-has Haswell(p zoe):  416/432=96%
  mb-bro Broadwell+K80 :   20/20=100%
  mb-sky Skylake Silver:  384/384=100%
  mb-skg Skylake Gold  :  144/144=100%
  mb-zen AMD EPYC      :    4/128=3%
In your submission script, you can request nodes having one or more of these specific features.
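As a minimal sketch, assuming standard Slurm: features listed in the FEATURES column above are requested with the --constraint option of sbatch. The script name and executable are placeholders.

```shell
#!/bin/bash
#SBATCH --job-name=feature-demo
# Restrict the job to SandyBridge nodes (feature name taken from the
# FEATURES column of the sinfo output above).
#SBATCH --constraint=SandyBridge
# Several features can be combined, e.g. --constraint="SandyBridge|IvyBridge"
# to accept either architecture.

srun ./my_program    # my_program is a placeholder for your executable
```

Check the sbatch man page on the cluster for the exact constraint syntax supported by the installed Slurm version.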
The GPUs are considered a "generic resource" (GRES) in Slurm, meaning that you must request them explicitly in your submission script if you want access to the GPUs on a node.
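For instance, a sketch of a request for the two Tesla M2090 GPUs of mb-sab040 (as listed in the GRES column of the sinfo output above) could look like this; the GRES name must match what sinfo reports on the cluster.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-demo
# Request both GPUs of a node carrying gpu:TeslaM2090:2;
# a plain --gres=gpu:2 would accept any two GPUs.
#SBATCH --gres=gpu:TeslaM2090:2

srun ./my_gpu_program    # placeholder for your GPU-enabled executable
```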
Three different partitions are available on Manneback: the default one (Def), which is open to everyone; Zoe, which is open to everyone but gives higher priority to specific users; and finally cp3, which is reserved for IRMP/CP3 users.
Note that you can specify #SBATCH --partition=Def,Zoe to submit a job that will run on the earliest available partition.
You can use the sload -p command (-p as "partition") to see the current cluster usage and the number of pending jobs (PD) on the different partitions:
[user@manneback ~]# sload -p
MBack : 4067/6304=64%
  mb Zoe       :  560/560=100% (4592 PD)
  mb Grid      : 1651/2520=65% (1352 PD)
  mb cp3-local :  797/2520=31% (418 PD)
  mb cp3-fast  :    0/48=0%    (0 PD)
  mb cp3-gpu   :   20/20=100%  (100 PD)
  mb Def       : 1039/3152=32% (408 PD)
- Maximum walltime: run
- Maximum home space: run
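The exact commands to run are missing from the text above; as a general sketch, assuming standard Slurm tooling and Linux disk quotas on the frontend, these limits can usually be queried as follows.

```shell
# Maximum walltime: the MaxTime field of each partition
scontrol show partition | grep -E "PartitionName|MaxTime"

# Maximum home space: your quota on the home filesystem,
# in human-readable units (works only if quotas are enabled there)
quota -s
```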
Global scratch space¶
A global scratch space, /globalfs, is a 42 TB space NFS-mounted on all the compute nodes. There is no quota enforced on that filesystem; it is the responsibility of the users to remove files that are no longer needed by any job.
Each node furthermore has a local /scratch space, a smaller filesystem directly attached to the worker node.
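A typical pattern is to stage data on the node-local /scratch for I/O-intensive work, then copy results back to the home directory. The sketch below assumes local scratch is mounted at /scratch as stated above; input.dat and my_program are placeholders.

```shell
#!/bin/bash
#SBATCH --job-name=scratch-demo
# Create a private working directory on the node-local scratch;
# $SLURM_JOB_ID is set by Slurm and makes the path unique per job.
WORKDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR"

cp "$HOME/input.dat" "$WORKDIR/"       # stage input data (placeholder file)
cd "$WORKDIR"
./my_program input.dat > output.dat    # placeholder executable

cp output.dat "$HOME/"                 # copy results back to the home directory
rm -rf "$WORKDIR"                      # no quota on scratch: clean up after yourself
```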
Access and file transfer¶
Clusters are accessed with your CÉCI login and SSH key; refer to the CÉCI documentation for more details. Access to Manneback is achieved by pointing your SSH client to manneback.cism.ucl.ac.be with your CÉCI login and SSH key. The frontend's SSH key fingerprints are
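As an illustration, assuming jdoe is a placeholder for your CÉCI login, logging in and transferring files with standard OpenSSH tools looks like this:

```shell
# Log in to the Manneback frontend
ssh jdoe@manneback.cism.ucl.ac.be

# Copy a file from your machine to your home directory on the cluster
scp data.tar.gz jdoe@manneback.cism.ucl.ac.be:~/

# Copy a result file back from the cluster to the current directory
scp jdoe@manneback.cism.ucl.ac.be:~/output.dat .
```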
About the cost¶
Access to the computing facilities is free of charge for most researchers under normal usage. However, beyond a certain limit, this use can become billable.
If a research group (pôle de recherche) exceeds 200.000 hCPU (CPU hours), equivalent to using 23 processors continuously during a full year, the cost is computed as a function of the yearly consumption.
For 2016, the rate was computed as follows, based on your research group's consumption in 2015:

- Below 200.000 hCPU: 0 €/hCPU
- Between 200.000 and 2.500.000 hCPU: 0.00114 €/hCPU
- Over 2.500.000 hCPU: 0.00065 €/hCPU
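As an illustration only, and assuming the rates apply per bracket (whether the brackets are marginal or flat is not stated in the tariff above), a yearly consumption of 3.000.000 hCPU would cost 2.300.000 × 0.00114 € + 500.000 × 0.00065 € = 2.947 €. In shell, with awk:

```shell
# Hypothetical tiered computation: rates are ASSUMED to apply per bracket.
hours=3000000
awk -v h="$hours" 'BEGIN {
  cost = 0
  if (h > 2500000) { cost += (h - 2500000) * 0.00065; h = 2500000 }
  if (h > 200000)  { cost += (h - 200000) * 0.00114 }  # first 200000 hCPU are free
  printf "%.2f EUR\n", cost
}'
# prints 2947.00 EUR for 3000000 hCPU under this assumption
```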
This rate only applies to researchers whose funding does not cover access to a computing facility; in that case, it is to be considered a contribution to the operation of the CISM platform. For any other situation, please contact the CISM team.
For 2017 and beyond, thanks to a contribution by the SGSI to the CISM budget, the cost is zero for users who do not have specific funding for computational resources. Users funded by a regional, federal, European or commercial project with specific needs should contact the CISM team for a quote.