1. F.A.Q. Intensive Computing

Contents

About cluster computing at CISM :

  • What is a cluster/node/cpu/core ? Which cluster can I use ? How do I use a cluster ?
  • How do I get a login ?
  • How do I copy data to and from my home directory on the cluster ? How much space can I use ? How do I log into the frontend ?
  • Why do I have to submit jobs rather than just launching them; and how do I do that ?
  • How do I create a parallel job ? What is the /scratch space ? How do I know my job has finished ?
  • Where can I get additional information ?
  • How do I thank you for all this ?

More about connecting / copying :

  • What is a public/private key pair; how do I create one ?
  • How do I connect from a Windows computer ? What is the difference between a unix/linux text file and a dos/windows text file ?
  • How do I connect from outside UCLouvain ? Why do you sometimes need my IP address ? How do I find it ?
  • Can you remind me of my password ?
  • Do you have backups of my directory ?

Submitting jobs with SGE :

  • What is SGE ?
  • How do I monitor my jobs ? Why is my job still waiting ? How do I cancel a job ?
  • What are the options specific to lemaitre/green ?
  • How do I get more information about running and managing jobs ? Who decides when my job is scheduled ?
  • What is a job array ? What is resource reservation ?
  • How do I know how busy the cluster is at the moment ?

Submitting jobs with Slurm :

  • What is a job step, a task ?
  • What does the submission script look like ?
  • How do I submit a script ? How do I monitor my jobs ? Why is my job still waiting ?
  • How do I cancel a job ?
  • How do I submit a job to a specific queue ? How do I choose a parallel environment?
  • How do I choose a node with certain features (e.g. CPU, GPU, etc.) ? How do I get the list of features and resources of each node ?
  • Is OpenMP 'slurm-aware' ? Is MPI 'slurm-aware' ?
  • How do I get more information about running and managing jobs ? Who decides when my job is scheduled ?
  • How do I get the node list in full rather than in compressed format ?
  • How do I know which slots exactly are assigned to my job ?
  • Is there a quicker way to submit a job than writing a submission script ?

Available hardware :

  • Are the nodes connected with rapid interconnection ? Are there nodes with large memory ?
  • Are there nodes with external accelerators (e.g. GPU) ? Are there Itanium processors ?
  • Are there Apple Mac computers ?

Available software :

  • What is the module command ?
  • Is my favorite compiler installed ?
  • Is math library BLAS/LAPACK/… installed ?
  • Is … installed ?

Green specifics :

  • Why the name ? When was Green acquired ? Where is Green located ?
  • What is the maximum number of cores I can ask for on Green ? What should a submission script look like ? What are the SGE options specific to Green ?
  • What is /workdir ?
  • What is 'New Branch' ?

Manneback specifics :

  • Why the name ?
  • What hardware is available on Manneback ?
  • Is there a global temporary storage space I can use to dump large data ?

Hmem specifics :

  • Why the name ?
  • What partitions are available on Hmem ?
  • Is there a global temporary storage space I can use to dump large data ?

Lemaitre2 specifics :

  • Why the name ?
  • What partitions are available on Lemaitre2 ?
  • Is there a global temporary storage space I can use to dump large data ?

About cluster computing at CISM

:?: What is a cluster/node/processor/core ?

A cluster is a set of several computers (nodes) that are interconnected and appear to the user as one large machine. The cluster is accessed through a frontend where users can manage their jobs and the data in their home directory. Each node has a certain number of processors, a certain amount of memory (RAM) and some local storage (scratch space). Each processor comprises several independent computing units (cores). In a hardware context, a CPU is often understood as a processor die, which you buy from a vendor and which fits into a socket on the motherboard, while in a software context, a CPU is often understood as one compute unit, a.k.a. a core.

:?: Which cluster can I use ?

The following clusters/computers are available for use:

^ Name ^ Login ^ Type ^ Dedication ^
| Manneback | CISM | Cluster (Slurm) | Many single-core jobs |
| Green (to be decommissioned soon) | CISM | Cluster (SGE) | Many single-core jobs |
| Hmem | CÉCI | Cluster (Slurm) | Large memory jobs |
| Lemaitre2 | CÉCI | Cluster (Slurm) | Massively parallel jobs |
| SMCS1&2 | CISM | Interactive | Statistical software |
| lm9 | CISM | Interactive | Engineering software |
| Dragon1 (UMons) | CÉCI | Cluster (Slurm) | Very long jobs |
| Hercules (UNamur) | CÉCI | Cluster (Slurm) | Very long jobs |
| Vega (ULB) | CÉCI | Cluster (Slurm) | SMP or many single-core jobs |

See more detailed specifications on the CISM infrastructure web page and the CÉCI clusters web page.

:?: How do I get a login ?

For CISM clusters, go to http://www.cism.ucl.ac.be/login, while for CÉCI clusters, go to http://www.ceci-hpc.be and click on the “Create account” link in the top right corner of the page.

Note that you need to be connected to the network of a CÉCI university to access those pages. Creating an account from home is not possible.

:?: How do I use a cluster ?

Each cluster has a frontend which you can access to test pre-installed software, compile your own code, copy your data, and submit your jobs. All computation nodes run a Linux operating system. Typical usage of a cluster consists in the following steps:

  1. copy your data/code from your local computer to your home directory on the cluster (i.e. on the frontend)
  2. log into the frontend
  3. (optionally) compile your code if you use home-made software
  4. submit your job
  5. wait (i.e. do something else in the meantime)
  6. copy the results back to your local computer.

You can learn all this from scratch by attending the training sessions that are organised each year.

:?: How do I copy data to and from my home directory on the cluster ?

The usual way is to use the commands scp, sftp, and/or rsync. You can always use any fancy graphical user interface you like as long as it uses scp or sftp to make the transfers. For instance, you can run the following in a terminal on your desktop/laptop, provided it runs Linux or MacOS and you are in the UCLouvain network,

scp -r mywork/ mylogin@clustername.cism.ucl.ac.be:

Make sure to replace mylogin with your actual login and clustername with the name of the cluster you want your data copied to. If you do not use an SSH key, you will be prompted for your password on the cluster. Then, your directory mywork will be copied in your home directory on the cluster. Copying back is done by

scp -r mylogin@clustername.cism.ucl.ac.be:mywork/ .

Your directory mywork in your home directory on the cluster will be copied in the current directory on your desktop/laptop.

Type ''man scp'' or ''man rsync'' to get information about those commands. You can alternatively attend the training sessions, or contact the sysadmins. If your computer runs Windows or if you would like to connect from outside UCLouvain, please see More about connecting / copying.

:?: How much space can I use ?

Quotas are enforced on home directories. For CISM clusters, limits are enforced by group (see them with quota -gs), while on CÉCI clusters, limits are individual (use quota -s.)

If you need more room temporarily for a job, you can use the scratch spaces. See the cluster-specific sections of this FAQ for information about which scratch space is available on a given cluster. There is no limitation on the scratch space used by the users, but old data are subject to cleanup without notice once the job is finished.

If you need more room permanently, you must use our mass storage services.

:?: How do I log into the frontend ?

You need an SSH client i.e. a piece of software that allows connecting to a remote computer using the SSH protocol.

On Linux, simply type in

ssh -X mylogin@clustername.cism.ucl.ac.be

to access the frontend of cluster clustername. Do not forget to replace mylogin with your actual login and clustername with the actual name of the cluster. The -X option allows using software with a graphical user interface (GUI). You can ignore it if all you need is a command line interface (CLI).

With CÉCI clusters you need also to provide your CÉCI private key (called id_rsa.ceci) to your SSH client (see the CÉCI FAQ):

ssh -X -i id_rsa.ceci mylogin@clustername.cism.ucl.ac.be

Note that the CÉCI cluster names do not all end in “cism.ucl.ac.be” !

Interestingly, if you set the following in a file named config in your .ssh directory (where your id_rsa.ceci file also lies),

Host clustername
    HostName clustername.cism.ucl.ac.be
    User mylogin
    ForwardX11 yes
    IdentityFile ~/.ssh/id_rsa.ceci

you can just issue commands like

ssh clustername

to connect.

:?: Why do I have to submit jobs rather than just launching them; and how do I do that ?

Because you are not the only one to use the cluster. There is a piece of software called a job scheduler (e.g. Sun (Oracle) Grid Engine or Slurm) that makes sure that the jobs are dispatched to the cluster nodes as optimally as possible with respect to the available resources and as fairly as possible with respect to users.

To submit a job, you first have to write a shell script (i.e. a file containing a sequence of Shell commands) that contains information about the requirements of the task you want to launch, the environment that must be set up for it to launch correctly, the executable file that is to be launched, etc.

Then, once logged into the frontend of the cluster you want to use, you type either qsub my_script or sbatch my_script depending on the job management software installed on the cluster; you then receive a job identification number. With that job id, you can monitor your job, stop it, etc.

:?: How do I create a parallel job ?

You can use or develop parallel software based on one of the two main standards, OpenMP and MPI.

  • OpenMP is a compiler extension (Fortran or C) that allows distributing operations across several cores (on the same node) through compiler directives. In practice, you add specific directives in your code around loops to allow those loops to run in parallel.
  • MPI (Message Passing Interface) is a standard for explicitly managing the communication between several processes that may run on distinct nodes.

Alternatively, you can launch several instances of the same program on distinct pieces of data, or with different parameters, using 'job arrays' with the SGE option -t (cf. man qsub), or srun with Slurm.

:?: What is the /scratch space ?

Every node of the cluster has access to the home directories of the users; those are mounted through the network. When a job performs a lot of reading and writing from/to files (e.g. temporary files, or data files), its performance can be degraded by the network traffic needed to access the home directory. The main alternative in that case is to have the job read/write from/to another, faster (but possibly less robust) location in the directory tree, typically /scratch. Once the job has finished all the I/O-intensive parts, the data can be copied back to the home directory. Note that on some clusters this space is visible only from the node itself, while on others it is shared by all nodes. The scratch space is furthermore shared among all users, so you have to create a subdirectory for your files, and make sure to delete them once you have copied them back to your home directory.

:?: How do I know my job has finished ?

You typically get an email when your job has finished. You will then find, in your directory on the cluster, files named after the job id that contain the output of your program, the errors, etc.

:?: Where can I get additional information ?

The rest of this wiki contains additional information, notably in the Knowledgebase and Troubleshooting sections. Additionally, the CISM organizes training sessions every year, with topics ranging from the beginner's introduction to Linux to complex optimization of parallel software. Finally, the following reference books can be borrowed from CISM (a small deposit will be asked for and returned when the book is given back):

  1. MPI, The Complete Reference (Marc Snir et al.; MIT Press)
  2. High Performance Computing (Kevin Dowd, Charles Severance; O'Reilly)
  3. Performance Optimization of Numerically Intensive Codes (Stefan Goedecker, Adolfy Hoisie; Siam)
  4. Using MPI (William Gropp, Ewing Lusk, Anthony Skjellum; MIT Press)
  5. Parallel Programming (Barry Wilkinson, Michael Allen; Prentice Hall)
  6. Parallel Programming in OpenMP (Rohit Chandra et al.; Morgan Kaufmann Publishers)
  7. Introduction to Parallel Computing (Ananth Grama et al.; Addison Wesley)

:?: How do I thank you for all this ?

Two things you need to do:

  1. Acknowledge the use of the CISM/CECI infrastructure in your publications. For instance: “Computational resources have been provided by the supercomputing facilities of the Université catholique de Louvain (CISM/UCL) and the Consortium des Équipements de Calcul Intensif en Fédération Wallonie Bruxelles (CÉCI) funded by the Fond de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under convention 2.5020.11.”
  2. Add the single keyword CISM:CECI when you register your publications with Dial.pr, in the 'Institut/Pole' field as depicted below.

More about connecting / copying

:?: What is a public/private key pair; how do I create one ?

Just as it sounds, a public/private key pair is a pair of keys, one of which is public while the other is private. The public key should actually be seen as the keyhole corresponding to the private key. On your desktop/laptop computer, you first have to generate such a pair and then copy the public key to a special directory (.ssh/) in your home on the cluster. Then you can log onto the cluster from your local computer without typing in a password. Find detailed instructions here. Note that for CÉCI clusters, the private key is given to you by the CÉCI. That private key is protected by a password (passphrase) you choose.
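On Linux or macOS, the pair is generated with the standard OpenSSH tool ssh-keygen. Below is a minimal sketch; the directory and file name are illustrative, and -N '' creates a key without a passphrase for the demo only. Protect a real key with a passphrase.

```shell
# Generate a 2048-bit RSA key pair in a scratch directory.
# The private key (id_rsa_demo) stays on your machine; the public half
# (id_rsa_demo.pub) is what you append to ~/.ssh/authorized_keys on the cluster.
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -b 2048 -N '' -f "$keydir/id_rsa_demo"
ls "$keydir"   # id_rsa_demo  id_rsa_demo.pub
```

In practice you would keep the key in ~/.ssh/ and choose a passphrase when prompted.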

:?: How do I connect from a Windows computer ?

If you want to have a Unix-like environment on your Windows computer, Cygwin and MinGW are interesting choices. Once installed, they offer 'A collection of tools which provide Linux look and feel.', among which those necessary for accessing remote computers with SSH.

If you just want a lightweight, free, SSH client for windows you should consider Putty.

Under Host Name simply put the name of the frontend of the cluster you want to connect to, e.g. green.cism.ucl.ac.be.

Note that to be able to use software with a graphical user interface, you also need a local X server installed on your Windows computer, such as Xming.

:?: What is the difference between a unix/linux text file and a dos/windows text file ?

Line breaks are encoded in Unix with one character, represented as '\n' (LF: line feed), while in DOS/Windows they are encoded with two characters, '\r\n' (CR: carriage return, then LF: line feed). Some programs, such as text editors, are mostly insensitive to the difference, but some others (a.o. Slurm utilities) are not. See here how to deal with it.
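A minimal sketch of the difference, and of one common fix: stripping the carriage returns with tr, which is essentially what the dos2unix utility does. The file names are illustrative.

```shell
# Build a DOS-style file: each line ends in CR+LF.
printf 'line1\r\nline2\r\n' > dosfile.txt
# Strip every carriage return to obtain a Unix-style file (LF only).
tr -d '\r' < dosfile.txt > unixfile.txt
wc -c < dosfile.txt    # 14 bytes: 5 characters + CR + LF per line
wc -c < unixfile.txt   # 12 bytes: 5 characters + LF per line
```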

:?: How do I connect from outside UCLouvain ?

If you have access to UCLouvain's VPN, you can simply use it. Otherwise, you can go through the gateway hall.cism.ucl.ac.be with your CISM account. Either you log into hall.cism.ucl.ac.be and then into the cluster frontend you want (i.e. green.cism.ucl.ac.be or lemaitre2.cism.ucl.ac.be), or you can set up an SSH tunnel.

One way to create an SSH tunnel is to type

ssh my_cism_login@hall.cism.ucl.ac.be -L 1234:hmem.cism.ucl.ac.be:22

in a terminal on your local computer. Make sure to replace my_cism_login with your CISM login. To connect to a cluster other than hmem, simply replace hmem with the name of that cluster. Note that 1234 is the number of the local port you want to use. If you create several tunnels from your computer, each one will need a distinct port (between 1024 and 65535.)

Leave that terminal open.

Then, in another terminal, on the same computer, a command such as

ssh -i .ssh/id_rsa.ceci -p 1234 my_CECI_login@localhost 

will actually connect to hmem.cism.ucl.ac.be through hall.cism.ucl.ac.be in a transparent manner. Make sure to replace my_CECI_login with your CÉCI login. Make sure to provide your CÉCI private key with -i .ssh/id_rsa.ceci or, better, using an SSH agent (then you can omit the -i .ssh/id_rsa.ceci part). Make sure to write localhost; do not replace that part with something else.

If you want to connect to a CISM cluster (e.g. manneback or green), you need of course to use, in the above command, your CISM login and password.

The commands scp and sftp will work the same way (adapt the login and SSH key if necessary):

scp -r -P 1234 myworkdir localhost:

will copy your local directory myworkdir/ into your home directory on the frontend at the other end of the tunnel (hmem.cism.ucl.ac.be in the example above).

:?: Why do you sometimes need my IP address ? How do I find it ?

If you travel abroad, you might find it impossible to connect to hall.cism.ucl.ac.be. This is because the firewall blocks access from certain regions for historical, security-related reasons. In such a case, you need to contact us.

We will then ask for your IP address. We need your external IP address, which may or may not be the one you see in your computer's settings. If your computer has a regular (public) address, then your external address matches the one in the configuration; but if your computer has a private address, you will need to find out the external address by visiting, for instance, http://www.myipaddress.com/, or by running the command `curl ifconfig.me` .

It can also happen that your IP address changes from time to time. If the connection suddenly fails again while it worked the day before, check that your IP address hasn't changed. If it has, let us know so we can update the firewall rules accordingly.

:?: Can you remind me of my password ?

No we cannot. We do not store your passwords in plain text, rather in hashed form. That means we are able to check that the password you provide when you connect is correct, but we are not able to give it back to you.

In case you forget your password, you simply need to go back to http://www.cism.ucl.ac.be/login and follow the same instructions as when you created your account. Just make sure to give the exact same email address each time and your existing account will be updated with the new information.

:?: Do you have backups of my directory ?

No we do not, due to lack of storage space. Make sure to always keep a copy of your code/scripts/data on either your personal computer or a storage machine (see the mass storage FAQ). rsync should be your best friend.

Submitting jobs with SGE (Green cluster only)

:?: What is SGE ?

SGE is the Sun Grid Engine, now named Oracle Grid Engine since Oracle bought Sun. Its earlier versions were open source. See http://en.wikipedia.org/wiki/Oracle_Grid_Engine

:?: What does the submission script look like ?

Here is a generic submission script we could name submission_script.sh. The .sh extension means that it is a script that contains shell commands.

submission_script.sh
#!/bin/sh
# SGE generic - beware that specific clusters may need specific options !!
#
# Advised: define the job name
#$ -N My_Job
#
# Optional: choose a parallel environment (e.g. mpi, openmp)
# and the number of cores needed. Example here mpich with 6 proc.
#$ -pe mpich 6
#
# Advised: requested memory for each core
#$ -l h_vmem=2G 
#
# Mandatory!! the requested run-time, expressed as
#$ -l h_rt=xxxx
# (xxxx in seconds, or as hh:mm:ss; max 5 days = 120:0:0)
# SGE will kill your job after the requested period.
#
# Advised: your Email here, for job notification
#$ -M my@mail.ucl.ac.be
# SGE: when do you want to be notified (b for begin, e for end, s for error)?
#$ -m bes
#
# Optional: ask for specific resources (licence, etc.) with 
## -l resourcename = ...
#
#
# Optional: activate resources reservation when you need a large number of cores
## -R y
#
# Advised: output in the current working dir
#$ -cwd    
# Advised: combine output/error messages into one file
#$ -j y
#
# Launch job
echo "Got $NSLOTS slots. Temp dir is $TMPDIR, Node file is:"
cat $TMPDIR/machines
echo Start at
date
... 
echo End at
date
# end of job

:?: How do I submit a script ?

The first step is to log into the frontend of the cluster you want to use (e.g. green, lemaitre, etc.). Then you type in qsub submission_script.sh. Upon success, you will get a message like:

Your job 418187 ("My_Job") has been submitted

The number 418187 is the job id of your job. You can use that number to refer to that job in subsequent commands.

:?: How do I monitor my jobs ?

On the frontend, type qstat. The output of the command looks like:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 418187 0.00000 My_Job     dfr          qw    11/26/2010 09:55:25                                    1        

The column 'state' indicates whether your job is waiting ( qw ), running ( r ), or done ( d ). If the output is empty, it means all your jobs are finished.

:?: Why is my job still waiting ?

To know why a specific job is waiting and, more generally, to get information about one job, use the qstat command with the -j option and the job id.

dfr@lemaitre ~/Formation/Part1 >qstat -j 418187
==============================================================
job_number:                 418187
exec_file:                  job_scripts/418187
submission_time:            Fri Nov 26 10:08:11 2010
owner:                      dfr
uid:                        106
group:                      grppan
gid:                        205
sge_o_home:                 /home/pan/dfr
sge_o_log_name:             dfr
sge_o_path:                 /gridware/sge/bin/lx24-amd64:/opt/intel/fce/9.1.036/bin:/opt/intel/cce/9.1.042/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/bin:/opt/gnome/bin:/opt/kde3/bin:/opt/pathscale/bin:/usr/pgi/linux86-64/6.0/bin:/mnt/optlm/bin:/home/pan/dfr/bin:.
sge_o_shell:                /bin/bash
sge_o_workdir:              /mnt/homezfs/pan/dfr/Formation/Part1
sge_o_host:                 lemaitre
account:                    sge
cwd:                        /home/pan/dfr/Formation/Part1
hard resource_list:         h_rt=20,num_proc=8,matlab=true
mail_list:                  my@mail.ucl.ac.be
notify:                     FALSE
job_name:                   My_Job
jobshare:                   0
env_list:                   MPICH_PROCESS_GROUP=0
script_file:                submission_script.sh
usage    1:                 cpu=00:00:02, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=165.320M
scheduling info:            queue instance "all.q@lmexec-11" dropped because it is temporarily not available
                            queue instance "all.q@lmexec-79" dropped because it is overloaded: load_avg=2.080000 (no load adjustment) >= 1.5
                            queue instance "all.q@lmexec-90" dropped because it is disabled
                            queue instance "all.q@lmexec-92" dropped because it is full
                            ...

Under scheduling info, you will find, for each node, the reason why it cannot take care of your job. Messages such as temporarily not available, overloaded, disabled, or full are self-explanatory. You might also get messages about the lack of a specific resource you asked for with particular options in the submission script.

:?: How do I cancel a job ?

Use the qdel jobid command on the frontend with the jobid of the job you want canceled.

:?: What are the options specific to lemaitre/green ?

See the parts of this FAQ that are related to those clusters.

:?: How do I get more information about running and managing jobs ?

Lemaitre uses the Sun Grid Engine 6.1 while Green uses the Sun Grid Engine version 6.2. You can consult the full documentation here.

:?: Who decides when my job is scheduled ?

The grid engine, SGE, is configured to ensure all users fair access to the resources. SGE constantly keeps track of who has used which resources for how long. The priority of a new job is established as an (undocumented) decaying function of the number of cores and the amount of memory used in the past. The priority lies between 0 and 1 (maximum priority); it is printed in the output of the qstat command.

:?: What is a job array ?

A job array, also called a parametric job, is a job made of several independent tasks. In SGE, it is created with the -t a-b option. For instance, writing -t 1-10 will create a job made of 10 tasks labeled from 1 to 10. Each task has its environment variable SGE_TASK_ID set to match its label. The software that is run can access that variable to decide what to compute.
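As a sketch (the script and input file names are hypothetical), an array task script typically uses $SGE_TASK_ID to select its own piece of work:

```shell
#!/bin/bash
# Hypothetically submitted as: qsub -t 1-10 task.sh
# SGE then runs this script once per task, with SGE_TASK_ID set to 1..10.
# Outside SGE we default to 1 so the sketch stays runnable.
SGE_TASK_ID=${SGE_TASK_ID:-1}
INPUT="input_${SGE_TASK_ID}.dat"
echo "task ${SGE_TASK_ID} processes ${INPUT}"
```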

:?: What is resource reservation ?

Resource reservation is a mechanism that allows jobs requesting a large number of cores to be scheduled as fairly as the others. For instance, without reservation, a job requesting 32 cores will not start until 32 cores are free at the same time, which is virtually impossible when the cluster is used at 80% of its capacity, as it is most of the time. If that same job is submitted with the option -R y, which activates resource reservation, then SGE will, at each scheduling cycle, reserve some cores for the job, based on the following set of rules:

  • if there is a job waiting on the same queue with a higher priority, that job is scheduled first on that core and no reservation is performed.
  • if there is a job waiting on the queue with a lower priority, then
    • if the expected run time of that job is shorter than SGE's estimate of the time needed to reserve all requested cores, then the lower-priority job is scheduled, and the core is added to the reservation pool afterwards.
    • if not, then the core is added to the reservation pool.

Once the reservation pool contains all requested cores, the job is scheduled.

:?: How do I know how busy the cluster is at the moment ?

The command

qload 

will give you an instant overview of the load of the cluster, while the command

qstat -u "*"

will tell you how many jobs are waiting, with their respective priorities.

Submitting jobs with Slurm

:?: What is Slurm ?

Slurm stands for “Simple Linux Utility for Resource Management”, although some people insist it means “Sophisticated Linux Utility for Resource Management”. It was first developed at the Lawrence Livermore National Laboratory and is getting more and more attention. See http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management.

:?: What does the submission script look like ?

See examples on the CÉCI website.
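As a minimal hedged sketch (all values are placeholders to adapt, and the program name is hypothetical), a Slurm submission script combines #SBATCH directives and the commands to run:

```shell
#!/bin/bash
#SBATCH --job-name=My_Job
#SBATCH --ntasks=1
#SBATCH --time=1:00:00           # requested run time; Slurm kills the job afterwards
#SBATCH --mem-per-cpu=2048       # requested memory per core, in MB
#SBATCH --mail-user=my@mail.ucl.ac.be
#SBATCH --mail-type=ALL          # email notification on begin/end/fail

echo "Start at $(date)"
./my_program                     # hypothetical executable
echo "End at $(date)"
```

The script is submitted with sbatch, exactly as the SGE script is submitted with qsub.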

:?: How do I submit a script ?

The first step is to log into the frontend of the cluster you want to use (e.g. hmem, lemaitre2, etc.). Then you type in sbatch submission_script.sh. Upon success, you will get a message like:

Submitted batch job 125512

The number 125512 is the job id of your job. You can use that number to refer to that job in subsequent commands.

:?: How do I monitor my jobs ?

On the frontend, type squeue. The output of the command looks like:

dfr@hmem00:~ $ squeue -u dfr
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
 125512       Low My_Test_      dfr   R       0:33      2 hmem[15,17]

The column 'ST' indicates whether your job is pending ( PD ), or running ( R ). If the output is empty, it means all your jobs are finished.

:?: Why is my job still waiting ?

To know why a specific job is waiting, and, more generally, to get information about one job, use the squeue command with the -j option and the job id, and the -l option to get more information.

dfr@hmem00:~ $ squeue -l -j 125503
Wed Feb 22 11:04:45 2012
  JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
 125503      High quartz_s      dfr  PENDING       0:00 5-00:00:00      1 (Priority)

:?: How do I cancel a job ?

Use the scancel jobid command on the frontend with the jobid of the job you want canceled.

:?: Who decides when my job is scheduled ?

The scheduler, Slurm, is configured to ensure all users fair access to the resources. Slurm constantly keeps track of who has used which resources for how long. The priority of a new job is established as a decaying function of the number of cores and the amount of memory used in the past. The priority lies between 0 and 1 (maximum priority); it is printed in the output of the sprio command.

:?: How do I submit a job to a specific queue ?

Slurm uses the term partition rather than queue. To submit a job to a given partition, use the partition option.

       -p, --partition=<partition_names>
              Request a specific partition for the  resource  allocation.   If
              not  specified, the default behaviour is to allow the slurm con-
              troller to select the default partition  as  designated  by  the
              system  administrator.  If  the job can use more than one parti-
              tion, specify their names in a comma separate list and  the  one
              offering earliest initiation will be used.

:?: How do I choose a parallel environment?

Slurm has no concept of a parallel environment as such. Slurm simply requires that the number of nodes, or the number of cores, be specified. But you can control how the cores are allocated (on a single node, on several nodes, etc.) using the --cpus-per-task and --ntasks-per-node options, for instance.

With those options, there are several ways to get the same allocation. For instance, the following :

--nodes=4 --ntasks=4 --cpus-per-task=4

is equivalent in terms of resource allocation to

--ntasks=16 --ntasks-per-node=4 

but it will lead to environment variables being set, and understood, differently by srun and mpirun (in the first case 4 processes are launched while in the second one 16 processes are launched).

Suppose you need 16 cores. Here are some use cases

  • you use MPI and do not care about where those cores are distributed. set ntasks=16
  • you want to launch 16 independent processes (no communication). set ntasks=16
  • you want those cores to spread across distinct nodes. set (ntasks=16 and ntasks-per-node=1) or (ntasks=16 and nodes=16)
  • you want those cores to spread across distinct nodes, with no interference from other jobs. set ntasks=16 nodes=16 exclusive
  • you want 16 processes to spread across 8 nodes to have two processes per node. set ntasks=16 ntasks-per-node=2
  • you want 16 processes to stay on the same node (the good old 'snode'). set ntasks=16 ntasks-per-node=16
  • you want one process that can use 16 cores for multithreading. set ntasks=1 cpus-per-task=16
  • you want 4 processes that can use 4 cores each for multithreading. set ntasks=4 cpus-per-task=4
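The last use case above (4 processes that can each use 4 cores for multithreading), for instance, could be written in a submission script as follows; this is a sketch, and the program name is hypothetical:

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4

# Run one OpenMP thread per core allocated to each task:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./hybrid_program   # srun launches the 4 tasks on the allocation
```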

:?: How do I choose a node with certain features (e.g. CPU, GPU, etc.) ?

Slurm associates with each node a set of Features and a set of Generic resources. Features are immutable characteristics of the node (e.g. network connection type) while generic resources are “consumable” resources, meaning that as users reserve them, they become unavailable to others (e.g. compute accelerators).

Features are requested with

--constraint="feature1&feature2"

or

--constraint="feature1|feature2"

the former requests both features, while the latter, as one would expect, requests at least one of feature1 and feature2. More complex expressions can be constructed; see man sbatch for details.

Generic resources are requested with

--gres="resource:2"

to request 2 resources.

:?: How do I get the list of features and resources of each node ?

The command sinfo gives such information. You need to run it with specific output parameters though:

sinfo -o "%15N %10c %10m  %25f %10G"

It will output something like:

dfr@manneback:~ $ sinfo -o "%15N %10c %10m  %25f %10G"
NODELIST        CPUS       MEMORY      FEATURES                  GRES      
mback[01-02]    8          31860+      Opteron,875,InfiniBand    (null)    
mback[03-04]    4          31482+      Opteron,852,InfiniBand    (null)    
mback05         8          64559       Opteron,2356              (null)    
mback06         16         64052       Opteron,885               (null)    
mback07         8          24150       Xeon,X5550                TeslaC1060
mback[08-19]    8          24151       Xeon,L5520,InfiniBand     (null)    
mback[20-32,34] 8          16077       Xeon,L5420                (null)    

:?: Is OpenMP 'slurm-aware' ?

No. You need to set the number of threads yourself with

export OMP_NUM_THREADS=...

For instance, if the job was submitted with --cpus-per-task:

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
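
The Slurm variables exist only inside a job, so a script that may also run outside Slurm can fall back to a sensible default. A minimal, locally runnable sketch (the unset line only simulates running outside a job):

```shell
# Fall back to 1 thread when SLURM_CPUS_PER_TASK is unset
unset SLURM_CPUS_PER_TASK                        # simulate running outside Slurm
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "$OMP_NUM_THREADS"
```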

:?: Is MPI 'slurm-aware' ?

Yes. You do not need to specify the -np, -host or -hostfile options. Simply go with

mpirun ./a.out

or

srun ./a.out

depending on the MPI implementation you choose (OpenMPI, MVAPICH, etc.). See the Slurm documentation for precise information.

But do not forget to set the environment correctly with something like

module load openmpi/gcc

:?: How do I get more information about running and managing jobs ?

You can find the Slurm documentation here: https://computing.llnl.gov/linux/slurm/

:?: Who decides when my job is scheduled ?

All users are ensured a fair usage. The command

sprio

gives you the priority of your job, which depends on several factors. One of these factors is the fair share, about which you can find further information with

sshare

which gives you the fair share you can claim and your past usage.

:?: Can I send a Unix signal to my processes ?

Yes, with the scancel command. For instance,

scancel --signal=9 <JOBID>

will send signal 9 (SIGKILL) to the processes of job <JOBID> and kill them.

:?: How do I know how much memory my job used ?

sacct -j JOB_ID -o OUTPUT_FIELD_LIST

The OUTPUT_FIELD_LIST is a comma-separated list of the following items:

JobID          The number of the job or job step, in the form job.jobstep.
jobname        The name of the job or job step.
nodelist       List of nodes in job/step.
nnodes         Number of nodes in a job or step.
NTasks         Total number of tasks in a job or step.
AvePages       Average number of page faults of all tasks in job.
AveRSS         Average resident set size of all tasks in job.
AveVMSize      Average virtual memory size of all tasks in job.
MaxPages       Maximum number of page faults of all tasks in job.
MaxPagesNode   The node on which the maxpages occurred.
MaxPagesTask   The task ID where the maxpages occurred.
MaxRSS         Maximum resident set size of all tasks in job.
MaxRSSNode     The node on which the maxrss occurred.
MaxRSSTask     The task ID where the maxrss occurred.
MaxVMSize      Maximum virtual memory size of all tasks in job.
MaxVMSizeNode  The node on which the maxvmsize occurred.
MaxVMSizeTask  The task ID where the maxvmsize occurred.

e.g.

sacct -o JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize

:?: How do I use the local scratch space ?

Slurm offers the sbcast command, which propagates a file to the local file systems of the nodes allocated to the job. However, sbcast works one file at a time; it is therefore unsuited for copying entire data directories, for instance.

One neat way is to use a construction like

srun cp 

For instance, in the script below

#!/bin/bash
#SBATCH -N 2
#SBATCH -o output.txt
SCRATCH=/scratch/$USER/$SLURM_JOB_ID

echo Creating temp dir $SCRATCH
srun mkdir -p $SCRATCH || exit $?
echo Copying files. srun cp is equivalent to a loop over each node + scp
srun cp -r $SLURM_SUBMIT_DIR/*  $SCRATCH || exit $?

the data are copied from the home directory to the local scratch. A directory named after the login and the job ID is created. Suppose the script creates a file myres.txt with some results on each machine, and that the rest can be discarded:

# Do some work here
srun touch $SCRATCH/myres.txt

In this case, we want to copy each file from its node into a distinct directory:

for node in `srun hostname`;
do
  echo Copying from $node 
  mkdir -p $node
  scp -r $node:$SCRATCH/myres.txt hmem00:$SLURM_SUBMIT_DIR/$node/ || exit $?;
done;

If each result file has a distinct name, we can simply srun cp from the scratch to the home.

At the end, make sure to clean the scratch space.

echo Removing $SCRATCH
srun rm -rf $SCRATCH || exit $?

:?: How do I get the node list in full rather than in compressed format ?

Slurm describes node lists with notations like hmem[05-07,09-17]. To get the full list, use scontrol

dfr@hmem00:~ $ scontrol show hostname hmem[05-07,09-17] | paste -d, -s
hmem05,hmem06,hmem07,hmem09,hmem10,hmem11,hmem12,hmem13,hmem14,hmem15,hmem16,hmem17
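
If scontrol is not at hand (e.g. on your own workstation), the simple single-bracket form of the notation can be expanded with a small hypothetical bash helper; this is only a sketch, and scontrol remains the authoritative tool:

```shell
#!/bin/bash
# Hypothetical helper: expand a Slurm-style node list like hmem[05-07,09-17].
# Assumes a single [...] group; scontrol handles the general case.
expand_nodelist() {
  local spec=$1
  local prefix=${spec%%\[*}
  local ranges=${spec#*\[}
  ranges=${ranges%\]}
  local -a out=()
  local p lo hi w i parts
  IFS=, read -ra parts <<< "$ranges"
  for p in "${parts[@]}"; do
    if [[ $p == *-* ]]; then
      lo=${p%-*}; hi=${p#*-}; w=${#lo}
      for ((i=10#$lo; i<=10#$hi; i++)); do
        out+=( "$(printf '%s%0*d' "$prefix" "$w" "$i")" )  # keep zero padding
      done
    else
      out+=( "$prefix$p" )
    fi
  done
  ( IFS=,; echo "${out[*]}" )
}

expand_nodelist 'hmem[05-07,09-17]'
```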

:?: How do I know which slots exactly are assigned to my job ?

The command

scontrol show -d job <JOBID>

gives very detailed information about jobs.

:?: Is there a quicker way of submitting a batch job than writing a submission script ?

Well, you can submit a one-line command with default SBATCH parameters using the --wrap option.

       --wrap=<command string>
              Sbatch  will  wrap  the  specified  command string in a simple "sh" shell script, 
              and submit that script to the slurm controller.  When --wrap is used, a script name and
              arguments may not be specified on the command line; instead the sbatch-generated wrapper script is used.

e.g.

sbatch --wrap="hostname"

Available hardware

:?: Are the nodes connected with rapid interconnection ?

Yes. Except for the Green cluster, all nodes are interconnected with a QDR InfiniBand connection.

:?: Are there nodes with large memory ?

By desktop/laptop standards, all nodes have large memory. For instance, on Green, all nodes have either 16GB or 32GB of RAM. On Lemaitre2, the nodes have 48GB of memory.

If you need really large memory, use Hmem. Its nodes have a minimum of 128GB of RAM and a maximum of half a terabyte.

:?: Are there nodes with external accelerators (e.g. GPU) ?

Yes. The three postprocessing nodes of Lemaitre2 each have a Q4000 nVidia GPU. The SMCS computers each have a C2075 nVidia GPU. One node of Manneback has two C1060 nVidia GPUs, and another one has two nVidia M2090 GPUs and a Xeon Phi. If you would like some guidance on how to use the Xeon Phi, see this quick tutorial and feel free to contact us.

:?: Are there Itanium processors ?

There is one computer with Itanium processors. Contact the managing team for more information.

Available software

:?: What is the module command ?

Modules have been set to ease the definition of the environment for specific purposes. Rather than setting PATHs and other environment variables, you simply have to use a command such as

module load intel/cce

to use the intel compiler, or

module load blas

to run a program that links to the BLAS library.

For a complete list of available modules, type

module avail

You should see something like this

---------------------------------------------------------------------------------------- /cvos/local/modulefiles -----------------------------------------------------------------------------------------
cluster-tools/3.1   ganglia/3.0.7       ipmitool/1.8.9      modules             shared              version
dot                 installer-tools/3.1 module-info         null                use.own

---------------------------------------------------------------------------------------- /cvos/shared/modulefiles ----------------------------------------------------------------------------------------
acml/gcc/64/3.6.0                   blas/intel/64/1                     globalarrays/gcc/openmpi/64/4.0.6   intel/idbe/10.1.008                 netperf/2.4.2
acml/gcc/mp/64/3.6.0                blas/pgi/64/1                       globalarrays/intel/openmpi/64/4.0.6 intel/mkl/9.0.018                   openmpi/gcc/64/1.2.6
[...]

For a list of modules that have been loaded, type

module list

Modules can be removed with module rm module_name or with module purge to remove them all.

:?: Is my favorite compiler installed ?

Compiler      Cluster     Version             Path or module
Intel C/C++   Green       10.1.008            module load intel/cce/<version>
                          11.1                source /usr/local/intel/Compiler/11.1/038/bin/iccvars.sh intel64
              Hmem        11.1.038, 11.1.073  module load intel/compiler/<version>
                          12.0.0.084          module load intel/compilerpro/<version>
              Lemaitre2   10.1.008            module load intel/cc(e)|fc(e)/<version>
                          11.1.038, 11.1.073  module load intel/compiler/<version>
                          12.0.0.084          module load intel/compilerpro/<version>
              Manneback   10.1.008            module load intel/cc(e)|fc(e)/<version>
                          11.1.038, 11.1.073  module load intel/compiler/<version>
                          12.0.0.084          module load intel/compilerpro/<version>
GNU C/C++     Green       4.1.2               /usr/bin/{gcc,g++}
                          4.2.1               /usr/local/gcc-<version>/bin/{gcc,g++}
              Hmem        4.4.4               /usr/bin/{gcc,g++}
              Lemaitre2   4.4.6               /usr/bin/{gcc,g++}
              Manneback   4.4.6               /usr/bin/{gcc,g++}
PGI C/C++     Green       8.0.3, 7.0.7        module load pgi/<version>
              Hmem        11.2-1              module load pgi/<version>
              Lemaitre2   11.2-1              module load pgi/<version>
              Manneback   11.2-1              module load pgi/<version>
The version written in bold characters is the one used by default, when applicable.

:?: Is … installed ?

Matlab (INMA) At the moment, Matlab is installed only on lm9.

Note that the Matlab runtime libraries are installed on each node of each cluster, so compiled Matlab code can run on every node. See Matlab on the cluster for more detailed information.

The Matlab Component Runtime (MCR) is installed on Hmem. Use module load mcr/v713, for instance, to use it with your compiled Matlab code.

Thermocalc (IMAP) is installed on lm9 for interactive use. Connect to lm9 and launch the following command

source /usr/imap/tcinit.sh

That command sets up the environment to use Thermocalc. It should be run once when you connect. To avoid having to type it every time, add the line to your .bashrc file, for instance with

echo "source /usr/imap/tcinit.sh" >> ~/.bashrc

Then you can launch Thermocalc simply by typing

tcs

To use dictra simply type

dictra

Note that the TC_Matlab toolbox is not installed on the server as it comes only in Windows version.

Green specifics

:?: Why the name ?

The Green cluster was built with environmental concerns in mind, hence its name. It was specifically designed to maximize both computing performance and energy efficiency, i.e. to maximize GFlops while minimizing the Watts consumed.

:?: When was Green acquired ?

The frontend, the 96 Dell M600 and the 6 PowerEdge 1950 were acquired in 2008.

:?: Where is Green located ?

The Green cluster is located in the 'Tier2' room of the 'Marc de Hemptine' building (aka Cyclotron).

:?: What is the maximum number of cores I can ask for on Green ?

Maximum 204 cores can be used at once by a single user.

:?: What should a submission script look like ?

Here is a generic submission script.

sge_green.sh
#!/bin/sh
# SGE example1 on Green Cluster
#
# SGE: the job name
#$ -N My_Green_Job
#
# SGE: pe=parallel environment request
# and the number of proc. needed
# SGE: here mpich with 6 proc.
# SGE: snode or snode8 also exist
#$ -pe mpich 6
#
# SGE: mandatory!! the requested run-time,
# expressed in seconds or as hh:mm:ss (max 5 days = 120:0:0)
#$ -l h_rt=xxxx
# SGE will kill your job after the requested period.
#
# SGE: mandatory!! the required available RAM, e.g. 2G
#$ -l mf=...
#
# SGE: your Email here, for job notification
#$ -M my@mail.ucl.ac.be
# SGE: when do you want to be notified (b for begin, e for end, s for error)?
#$ -m bes
#
# SGE: output in the current working dir
#$ -cwd    
#
# Load modules
. /etc/profile.d/modules.sh
module purge
module load ...
module list
 
# Launch job
echo "Got $NSLOTS slots. Temp dir is $TMPDIR, Node file is:"
cat $TMPDIR/machines
echo Start at
date
... 
echo End at
date
# end of job

:?: What are the SGE options specific to Green ?

Resources
-l h_vmem=x size of the required memory per core, e.g. “0.6G” for 600MB. This imposes an upper limit. Warning: if you are using the modules and set this option, you also need to set -l h_stack=128m for modules to work.
-l mf=x amount of free memory required per core, e.g. “2G”. This option considers RAM only and disregards swap size; it furthermore makes sure that the amount of free memory is available given what the other jobs running on the node use. This imposes a lower limit.
-l hm=true/false if set to true, requests a node with 32GB of RAM.
Parallel environments
-pe mpich x requests an MPI environment. SGE allocates x cores across the available nodes. The list of the allocated nodes is put by SGE in the file FIXME
-pe openmpi x Same as mpich but using OpenMPI.
-pe snode x requests that all x cores be allocated on the same node, for multi-threading (e.g. OpenMP).
-pe snode8 x same as snode except that the required number of cores must be 8, 16, 32 or 64
Queues
-q all.q submit the job to the generic queue (default)
-q fast.q submit the job to the fast queue. If your job takes less than 24 hours, submitting it to this queue will reduce its waiting time (think of the 'fast lane for customers with fewer than 10 items' at the checkout of your favorite supermarket)
-q snode8.q submit the job to nodes dedicated to jobs requiring all cores from a unique node.

:?: What is /workdir ?

The directory /workdir is mounted over the network from a disk array connected with fiber optics and set up to write to the disks in parallel. As a result, reading/writing from/to /workdir is much faster than from/to the home directories; however, the space is limited. /workdir should thus be used like /scratch (see previous sections): create a subdirectory at the beginning of the job and empty it at the end. /workdir is not as fast as the /scratch space, but it can be accessed by all nodes, just like the home directories. It can be seen as a trade-off between the speed of /scratch and the accessibility of the home directories.
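
Following that pattern, a job using /workdir could be sketched as below; the program name my_program and its option are placeholders:

```shell
#!/bin/bash
#SBATCH -o output.txt
# Sketch: use /workdir like /scratch -- per-job subdirectory, cleaned at the end.
# Since /workdir is shared by all nodes, plain mkdir/rm suffice (no srun needed).
WORKDIR=/workdir/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR" || exit $?

./my_program --output "$WORKDIR/results.dat"   # placeholder program
cp "$WORKDIR/results.dat" "$SLURM_SUBMIT_DIR/"

rm -rf "$WORKDIR"
```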

:?: What is 'New Branch' ?

New Branch was a set of 16 HP ProLiant servers acquired in 2010. They were later removed from Green and attached to Manneback.

Manneback specifics

:?: Why the name ?

The cluster Manneback is named after Charles Manneback (1894-1975), Professor of Physics at UCLouvain. A close friend of Georges Lemaitre, he led the FNRS-IRSIA project to build the first supercomputer in Belgium in the 1950s.

:?: What hardware is available on Manneback ?

Use the sinfo command to learn about the available hardware

[root@manneback ~]# sinfo -a  -S hostname -o "%15n %10C %10m  %25f %18G %T"
HOSTNAMES       CPUS(A/I/O MEMORY      FEATURES                  GRES               STATE
mback07         0/8/0/8    24150       Xeon,X5550                TeslaC1060:2       idle
mback08         8/0/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             allocated
mback09         4/4/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             allocated
mback10         4/4/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             allocated
mback11         8/0/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             allocated
mback12         8/0/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             allocated
mback13         0/8/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             idle
mback14         0/8/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             idle
mback15         0/8/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             idle
mback16         0/8/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             idle
mback17         0/8/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             idle
mback18         0/8/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             idle
mback19         0/8/0/8    24151       Xeon,L5520,InfiniBand,Fhg (null)             idle
mback20         7/1/0/8    16077       Xeon,L5420                (null)             allocated
mback21         3/5/0/8    16077       Xeon,L5420                (null)             allocated
mback22         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback23         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback24         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback25         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback26         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback27         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback28         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback29         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback30         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback31         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback32         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback33         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback34         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback35         0/8/0/8    16077       Xeon,L5420                (null)             idle
mback40         0/16/0/16  64405       Xeon,E5-2660              TeslaM2090,XeonPhi idle

In your submission script, setting

#SBATCH --constraint="L5520"

will get your job to run on a compute node with a Xeon L5520. Setting

#SBATCH --constraint="InfiniBand"

will make sure your job is allocated to a node with an InfiniBand connection.

The GPUs are considered a “generic resource” in Slurm, meaning that you can reserve them for your job's exclusive use with a directive like

#SBATCH --gres=TeslaC1060:2

in your submission script if you want access to both GPUs on the node.
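
Putting both directives together, a minimal GPU job might be sketched as follows; my_gpu_program is a placeholder for your own executable:

```shell
#!/bin/bash
# Sketch: one task, with both GPUs of the node reserved for this job
#SBATCH --ntasks=1
#SBATCH --gres=TeslaC1060:2

srun ./my_gpu_program
```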

:?: Is there a global temporary storage space I can use to dump large data ?

Two 'global' scratch spaces are available.

  • /workdir is a 7TB NFS shared space accessible from all Manneback nodes (front-end and compute nodes)
  • /globalfs is a 10TB FraunhoferFS space putting together all scratch spaces from the compute nodes. It is also accessible from front-end and compute nodes.

/workdir is slower than /globalfs, but the latter might be less robust, and its stability under heavy load is not guaranteed yet.

Each node furthermore has a local /scratch space.

:?: What is the Indus partition ?

New computers have been installed in a new partition named Indus. You can use them (#SBATCH --partition=Indus), but be warned that some jobs have absolute priority on that partition: your job will be stopped and re-queued to let such a job start if one is subsequently submitted. The maximum job length on that partition is 24 hours.

Hmem specifics

:?: Why the name ?

Hmem stands for High memory. All Hmem nodes have at least 128GB of RAM, which is considered large at the time of writing.

:?: What partitions are available on Hmem ?

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
High         up 10-00:00:0      2  alloc hmem[01-02]
Middle       up 5-00:00:00      7  alloc hmem[03-09]
Low*         up 5-00:00:00     15  alloc hmem[03-17]
Fast         up 1-00:00:00      3   idle hmem[18-20]

Partition High groups the nodes with 512GB of RAM; it has a time limit of 10 days. Partitions Middle and Low group the nodes with 256GB and 128GB of RAM, respectively. Nodes in those partitions have 48 cores and the time limit is 5 days. Partition Fast comprises nodes with 128GB of RAM and 8 cores. Its maximum allowed run time is 24 hours.

Partition Low is the default partition.

:?: Is there a global temporary storage space I can use to dump large data ?

One 'global' scratch space is available.

  • /globalfs is a 31TB FraunhoferFS space putting together all scratch spaces from the compute nodes. It is also accessible from front-end and compute nodes, except from those of the Fast partition.

The old /workdir filesystem is not mounted anymore.

Each node furthermore has a local /scratch space.

Lemaitre2 specifics

:?: Why the name ?

This cluster was meant as a replacement for the older cluster named Lemaitre; it was called Lemaitre2 to avoid possible confusion. The cluster Lemaitre was named after Georges Lemaitre (17 July 1894 – 20 June 1966), a Belgian priest, astronomer and professor of physics at UCLouvain. He is seen by many as the father of the Big Bang theory, and he is also the one who brought the first supercomputer to our University.

:?: What partitions are available on Lemaitre2 ?

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
Def*         up 5-00:00:00   108   idle lmWn[001-008]
PostP        up    6:00:00      3  alloc lmPp[001-003]

Most nodes are in the default, Def, partition. Its time limit is 5 days. Partition PostP comprises nodes with powerful GPU cards.

:?: Is there a global temporary storage space I can use to dump large data ?

The directory /scratch is a Lustre file system shared by all the compute nodes and the frontend. It has a capacity of approx. 110TB.

2. F.A.Q. Stockage de masse

Contents

About mass storage at CISM :

  • What is/are mass storage/archiving/backups
  • What hardware is available ?
  • How do I access the storage servers ?

More about connecting/copying :

  • How do I copy data to and from my storage space ?
  • What is the ALSOreplicus directory ?
  • How can I speed up transfers ?

More about the costs :

  • How much does it cost ?
  • What is the difference between package ('forfait') and quota ?
  • I need more quota! What do I do ?

About mass storage at CISM

:?: What is/are mass storage/archiving/backups

Mass storage consists in storing large amounts of data while ensuring:

  • high availability: data are accessible 24/7 from any computer connected to the local network.
  • high transfer rates: data are accessed through high-bandwidth networks and stored on high-throughput disks.
  • security: data are protected from unwanted access with software and hardware means.
  • safety: data are protected against hardware failure with enterprise-level disks (designed to spin 24/7) and redundancy.

Archiving consists in storing data that are not accessed for long periods of time. By contrast with mass storage, high availability and high transfer rates are irrelevant; archiving is often done on low-consumption infrastructures with disks designed for stability when powered off. The mass storage infrastructure at CISM is not designed for archiving. Users can of course use the mass storage facilities for archiving, but the cost will correspond to mass storage and not to archiving.

Backing up consists in storing data in a way that allows recovering previously deleted information. The CISM does not offer a full backup solution (it does offer the simple Replicus service; see below). What is deleted cannot be restored. Users can of course use the mass storage facilities for backups, but they need their own set of scripts/software to implement a full backup solution.

:?: What hardware is available ?

The following servers are currently available.

Name Net capacity File system Note
diskus 12TB ext3 Honorably discharged
lmstor 14TB ZFS Honorably discharged
lmx 18TB ZFS Honorably discharged
storage02 36TB ZFS compressed Attached to Green
storage03 42TB ZFS compressed + 2 SSD cache disks
storage04 74TB ZFS compressed
storage05 74TB+90TB ZFS compressed

:?: How do I access the storage servers ?

You need an SSH client i.e. a piece of software that allows connecting to a remote computer using the SSH protocol.

On Linux, simply type in

ssh -X mylogin@storagexxx.cism.ucl.ac.be

to access the server storagexxx. Do not forget to replace mylogin with your actual login. The -X option allows using software with a graphical user interface (GUI). You can ignore it if all you need is a command line interface (CLI).

More about connecting/copying

:?: How do I copy data to and from my storage space ?

The usual way is to use the Unix commands scp and/or rsync. The home directory can be accessed by all nodes on the cluster. You can always use any fancy graphical user interface you like as long as it uses scp or sftp to make the transfers. For instance, you can launch, from your desktop/laptop, provided it runs Linux or MacOS and you are in the UCLouvain network,

scp -r mywork/ mylogin@storagexxx.cism.ucl.ac.be:/path/to/my/space

You will be prompted for your password on the storage server (the same as on the cluster if you did not change it). Then, your directory mywork will be copied in your directory on the server. The path to your directory on the server has been communicated to you when you registered for a storage space. Copying back is done by

scp -r mylogin@storagexxx.cism.ucl.ac.be:/path/to/my/space/mywork/ .

After you enter your password, your directory mywork on the server will be copied in the current directory on your desktop/laptop. To avoid having to enter your password every time, use a public/private key pair.

Type ''man scp'' or ''man rsync'' to get information about those commands. You can alternatively attend the training sessions, or contact the sysadmins. If your computer runs Windows or if you would like to connect from outside UCLouvain, please see More about connecting / copying.
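
For repeated transfers, rsync is particularly convenient: it only sends the files that are new or were modified since the last run. A sketch using the same placeholder path as above:

```shell
# -a preserves permissions and timestamps, -v lists the transferred files
rsync -av mywork/ mylogin@storagexxx.cism.ucl.ac.be:/path/to/my/space/mywork/
```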

Another alternative is to use FTP.

Open an FTP session

 ftp mylogin@storagexxx.cism.ucl.ac.be

Copy to and from the server

 put filename
 get filename

Close the FTP session

 bye

GUI FTP clients are plentiful and will ease the job for you.

One final word: if you need to transfer many small files, you will save much time by merging them into a single large compressed tar or zip file, copying that one file, and uncompressing it on the other side. Transferring many small files really kills the transfer bandwidth.
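
The bundling step can be done locally before the transfer and undone on the other side; a self-contained sketch with made-up file names:

```shell
# Bundle many small files into one compressed archive, then unpack it,
# as one would do on either side of a transfer.
mkdir -p results
for i in 1 2 3; do echo "data $i" > "results/file$i.txt"; done
tar -czf results.tar.gz results/

mkdir -p dest
tar -xzf results.tar.gz -C dest
cat dest/results/file2.txt
```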

:?: What is the ALSOreplicus directory ?

This directory was created for you when you registered. It is provided as a simple means to organize a replication-based backup strategy, which consists in duplicating all your data on distinct servers in distinct rooms so as to minimize the risk of loss.

Every file that you copy into this directory is synchronized every hour onto another server in another room. Note that, therefore, the space you use in that directory is charged twice!

:?: How can I speed up transfers ?

If you have a small number of large files, you can instruct SSH to use a faster but “less secure” encryption scheme with the option -c arcfour. You can also use the -C option to compress the data if the file itself is not already compressed.

If you have a large number of small files, you should first gather them in a single archive file (a so-called .tar file) and compress it. You can do it on the fly like this

tar -czf - directory/ | ssh mylogin@storagexxx tar -xzf -

This will have the same effect as

scp -r directory mylogin@storagexxx:

More about the costs

:?: How much does it cost ?

Using the mass storage facilities comes at a cost. The cost per Gigabyte per year is determined each year by the Comité de Gestion. The cost structure is designed so as to amortize the cost of the hardware over its lifetime.

The total cost per year is computed from the average (integral) use of the storage space over the year. Alternatively, you can opt for a package ('forfait'), expressed in Gigabytes. In that case, you pay for the space corresponding to the package ('forfait') for the whole year. The price per Gigabyte when you opt for the package ('forfait') is lower than in the other case.

The precise formula is given here.

:?: What is the difference between package ('forfait') and quota ?

The package ('forfait') is a concept related to how the storage is charged. If you buy a package ('forfait'), you have a fixed cost that does not depend on the actual storage used (provided you stay within the bounds of your package); otherwise, you have a variable cost that depends on the storage used. Packages cost less per Gigabyte.

A quota is a technical concept related to a particular server. Servers indeed have a fixed capacity which is shared among users. Each user group (in the Unix sense or in the research-group sense, depending on the server) has a fixed quota that the group cannot exceed.

A group with large needs will probably have several quotas on several servers; the sum of these quotas should correspond to their package ('forfait') size.

:?: I need more quota! What do I do ?

Contact the CISM team (egs-cism@listes.uclouvain.be). Depending on the free space on the server for which you request a quota increase, you might be offered to migrate to another server or to open a new space for your group on another server.

:?: Why is the space used in my folder ALSOreplicus charged twice ?

Because the space is used twice, on distinct servers; data in ALSOreplicus are copied every hour onto another server, as a simple backup procedure.

| 3/06/2011 |