About the service¶
The CISM manages several compute clusters. A cluster is made of computers (nodes) that are interconnected and appear to the user as one large machine. The cluster is accessed through a frontend where users can manage their jobs and the data in their home directory. Each node has a certain number of processors, a certain amount of memory (RAM) and some local storage (scratch space). Each processor comprises several independent computing units (cores). In a hardware context, a CPU is often understood as a processor die, which you can buy from a vendor and which fits into a socket on the motherboard, while in a software context, a CPU is often understood as one computing unit, a.k.a. a core.
The CISM operates two clusters: Lemaitre and Manneback. Lemaitre was named in reference to Georges Lemaitre (17 July 1894 – 20 June 1966), a Belgian priest, astronomer and professor of physics at UCLouvain. He is seen by many as the father of the Big Bang theory, and he is also the one who brought the first supercomputer to our University. Manneback was named after Charles Manneback (1894-1975), Professor of Physics at UCLouvain. A close friend of Georges Lemaitre, he led the FNRS-IRSIA project to build the first supercomputer in Belgium in the 1950s.
The Lemaitre cluster is a CÉCI cluster and is shared among all CÉCI users, by contrast with Manneback, which is a UCL-only machine. Consequently, all documentation regarding Lemaitre can be found on the CÉCI documentation website, while only Manneback is described in this document.
Manneback¶
Job submission¶
As for all the CÉCI clusters, job submission on Manneback is managed by Slurm. More details about Slurm and how to submit your jobs can be found here.
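For example, the typical workflow is to write a submission script and hand it to Slurm; in the session below, job.sh is a placeholder name for your own script:
[user@manneback ~]$ sbatch job.sh     # submit the script; Slurm returns a job ID
[user@manneback ~]$ squeue -u $USER   # list your pending and running jobs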
Available hardware¶
While CÉCI clusters are mostly homogeneous, Manneback is made of several different generations of hardware.
Use the sinfo command to learn about the available hardware on Manneback:
[user@manneback ~]$ sinfo -o "%15N %10c %10m %25f %40G"
NODELIST        CPUS       MEMORY     FEATURES                  GRES
mb-ivy[201-208] 16         64224      IvyBridge,Xeon,E5-2650v2  localscratch:211
mb-neh070       8          23937      Nehalem,Xeon,X5550        localscratch:853,gpu:TeslaC1060/M1060:2
mb-neh[201-209, 8+         23938      Nehalem,Xeon,L5520        localscratch:853
mb-sab[001-021] 16         64098+     SandyBridge,Xeon,E5-2650  localscratch:395
mb-sab103       32         128723     SandyBridge,Xeon,E5-4620  localscratch:853
mb-wes[001-030, 12+        48128      Xeon,E5649                localscratch:395
mb-wes031       24         48128      Xeon,E5649                localscratch:624
mb-zen001       128        515708     Zen,EPYC,7551             localscratch:1802
mb-sab101       32         258285     SandyBridge,Xeon,E5-4640  localscratch:355
mb-sab102       32         257738     SandyBridge,Xeon,E5-4640  localscratch:395
mb-opt[111-112, 32         112641+    SkyLake,Bulldozer,Opteron localscratch:395
mb-sab040       16         64237      SandyBridge,Xeon,E5-2660  localscratch:46,gpu:TeslaM2090:2
mb-wes[251-252] 12         96509      Westmere,Xeon,X5675       localscratch:211,gpu:TeslaC2050/C2075:1
mb-has[001-004, 16+        64011+     Haswell,Xeon,E5-2630v3    localscratch:486
mb-has005       16         64171      Haswell,Xeon,E5-2630v3    localscratch:447
mb-skg[001-006] 24         96191      SkyLake,Xeon,5118         localscratch:156
mb-har[022,024, 8          15947      Harpertown,Xeon,L5420     localscratch:46
mb-har[021,023, 8          15947      Harpertown,Xeon,E5420     localscratch:46
mb-ivy[211-212, 48         129022     IvyBridge,Xeon,E5-2695v2  localscratch:814
mb-opt[012-032, 16         32103+     K10,Opteron,6134          localscratch:126
mb-opt033       16         32009      K10,Opteron,6134          localscratch:166
mb-sab[081,084, 32         64386      SandyBridge,Xeon,E5-2670  localscratch:355
mb-sky[001-008] 48         191882     Skylake,Xeon,4116         localscratch:1770
mb-opt[043-046, 16         64423      K10,Opteron,6134          localscratch:814
mb-opt[049-054] 16         32103+     K10,Opteron,6134          localscratch:916
mb-bro080       20         64153      SandyBridge,Xeon,E5-2640  localscratch:156,gpu:TeslaK80:2
You can use the sload command to see the current cluster usage on the different architectures:
[user@manneback ~]# sload
MBack : 4067/6304=64%
mb-clo Clovertown    : 0/96=0%
mb-har Harpertown    : 240/248=96%
mb-opt Opteron       : 980/1152=85%
mb-neh Nehalem       : 10/120=8%
mb-wes Westmere      : 577/2124=27%
mb-sab SandyBridge   : 485/608=79%
mb-ivy IvyBridge     : 807/848=95%
mb-has Haswell(p zoe): 416/432=96%
mb-bro Broadwell+K80 : 20/20=100%
mb-sky Skylake Silver: 384/384=100%
mb-skg Skylake Gold  : 144/144=100%
mb-zen AMD EPYC      : 4/128=3%
In your submission script, you can select one or more specific features with --constraint="feature1&feature2" (all listed features required) or --constraint="feature1|feature2" (any one of them).
The GPUs are considered a “generic resource” in Slurm, meaning that you reserve them for your job with the --gres option in your submission script; for instance, --gres="gpu:TeslaM2090:2" requests access to both GPUs on the node.
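A sketch of a GPU submission script; the job name, time limit and program name below are placeholders:
#!/bin/bash
#SBATCH --job-name=gpu-demo             # placeholder name
#SBATCH --gres="gpu:TeslaM2090:2"       # both GPUs on a node such as mb-sab040
#SBATCH --time=01:00:00

srun ./my_gpu_program                   # hypothetical GPU executable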
The partitions¶
Three different partitions are available on Manneback: the default one (Def), which is open to everyone; the Zoe partition, which is open to all but with higher priority for specific users; and finally cp3, which is reserved for the IRMP/CP3 users.
Note that you can specify #SBATCH --partition=Def,Zoe to submit a job that will run on the earliest available partition.
You can use the sload -p command (-p for “partition”) to see the current cluster usage and the number of pending jobs (PD) on the different partitions:
[user@manneback ~]# sload -p
MBack : 4067/6304=64%
mb Zoe       : 560/560=100% (4592 PD)
mb Grid      : 1651/2520=65% (1352 PD)
mb cp3-local : 797/2520=31% (418 PD)
mb cp3-fast  : 0/48=0% (0 PD)
mb cp3-gpu   : 20/20=100% (100 PD)
mb Def       : 1039/3152=32% (408 PD)
Limits¶
- Maximum walltime: run sinfo --summarize
- Maximum home space: run quota -us
Global scratch space¶
A global scratch space, /globalfs, is a 42 TB filesystem NFS-mounted on all the compute nodes. No quota is enforced on that filesystem; it is the responsibility of the users to remove files that are no longer needed by any job.
Each node furthermore has a local /scratch space, a smaller filesystem directly attached to the worker node.
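A common pattern is to stage data into the local scratch, compute there, and copy the results back. The sketch below assumes a per-user directory layout under /scratch, and the input file and program names are placeholders:
#!/bin/bash
#SBATCH --job-name=scratch-demo         # placeholder name
#SBATCH --time=02:00:00

# Stage into node-local scratch (the directory layout is an assumption)
WORKDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cp ~/input.dat "$WORKDIR"               # placeholder input file
cd "$WORKDIR"
./my_program input.dat > output.dat     # placeholder executable
cp output.dat ~/                        # copy results back to home
rm -rf "$WORKDIR"                       # clean up the local scratch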
Access and file transfer¶
Clusters are accessed with your CÉCI login and SSH key; refer to the CÉCI documentation for more details. Access to Manneback is achieved by pointing your SSH client to manneback.cism.ucl.ac.be. The frontend’s SSH key fingerprints are
SHA256:Q3IYMwb5QElBkqmVbJyi8UgFoyKZMZQsWRRU3CEvV8s
MD5:b3:cf:6c:21:3e:fa:e8:9a:b7:ea:eb:13:7d:4b:70:f6
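For instance, logging in and transferring a file could look like this; mylogin is a placeholder for your CÉCI login, and the key path assumes the usual CÉCI naming, so adapt both to your own setup:
[user@laptop ~]$ ssh -i ~/.ssh/id_rsa.ceci mylogin@manneback.cism.ucl.ac.be
[user@laptop ~]$ scp -i ~/.ssh/id_rsa.ceci data.tar.gz mylogin@manneback.cism.ucl.ac.be:~/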
About the cost¶
Access to the computing facilities is free of charge for most researchers with a normal usage. Beyond a certain limit, however, this usage can become billable.
If a research group (pôle de recherche) exceeds 200.000 hCPU (CPU hours), equivalent to the usage of 23 processors during a full year, the cost is computed as a function of the yearly consumption.
For 2016, the rate was computed as follows, based on your research group’s consumption in 2015:
- Below 200.000 hCPU: 0 €/hCPU
- Between 200.000 and 2.500.000 hCPU: 0.00114 €/hCPU
- Over 2.500.000 hCPU: 0.00065 €/hCPU
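For example, assuming the rate applies to the full yearly consumption, a group that used 1.000.000 hCPU in 2015 falls in the middle bracket and would be billed 1.000.000 × 0.00114 € = 1140 € for 2016.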
Note
This rate only applies to researchers whose funding doesn’t support access to a computing facility; in that case, it is to be considered as a contribution to the operation of the CISM platform. For any other situation, please contact the CISM team.
For 2017 and beyond, thanks to a participation of the SGSI in the budget of the CISM, the cost will be zero for users who do not have specific funding for computational resources. Users funded by a Regional, Federal, European or Commercial project with specific needs should contact the CISM team for a quote.