1. F.A.Q. High-performance computing (Calcul intensif)
Contents
About cluster computing at CISM
More about connecting / copying
About cluster computing at CISM
A cluster is a set of several computers (nodes) that are interconnected and appear to the user as one large machine. The cluster is accessed through a frontend where users can manage their jobs and data in their home directory. Each node has a certain number of processors, a certain amount of memory (RAM) and some local storage (scratch space). Each processor comprises several independent computing units (cores). In a hardware context, a CPU is often understood as a processor die, which you can buy from a vendor and which fits into a socket on the motherboard, while in a software context, a CPU is often understood as one compute unit, a.k.a. a core.
The following clusters/computers are available for use:
See more detailed specifications on the CISM infrastructure web page and the CÉCI clusters web page.
For CISM clusters, go to http://www.cism.ucl.ac.be/login, while for CÉCI clusters, go to http://www.ceci-hpc.be and click on the “Create account” link in the top right corner of the page. Note that you need to be connected to the network of a CÉCI university to access those pages. Creating an account from home is not possible.
Each cluster has a frontend which you can access to test pre-installed software, compile your own code, copy your data, and submit your jobs. All computation nodes run a Linux operating system. Typical usage of a cluster consists in the following steps:
You can learn all this from scratch by attending the training sessions that are organised each year.
The usual way is to use the commands scp or rsync. To copy a directory to the cluster, type
scp -r mywork/ mylogin@clustername.cism.ucl.ac.be:
Make sure to replace mylogin and clustername with your login and the name of the cluster. To copy a directory back from the cluster, type
scp -r mylogin@clustername.cism.ucl.ac.be:mywork/ .
Your directory mywork is then copied into the current directory. Type ''man scp'' or ''man rsync'' to get information about those commands. You can alternatively attend the training sessions, or contact the sysadmins. If your computer runs Windows or if you would like to connect from outside UCLouvain, please see More about connecting / copying.
You need an SSH client, i.e. a piece of software that allows connecting to a remote computer using the SSH protocol. On Linux, simply type in
ssh -X mylogin@clustername.cism.ucl.ac.be
to access the frontend of the cluster. With CÉCI clusters, you also need to provide your CÉCI private key (called id_rsa.ceci) to your SSH client (see the CÉCI FAQ):
ssh -X -i id_rsa.ceci mylogin@clustername.cism.ucl.ac.be
Note that not all CÉCI cluster names end in “cism.ucl.ac.be”!
Interestingly, if you set the following in your SSH client configuration file (typically ~/.ssh/config)
Host clustername
HostName clustername.cism.ucl.ac.be
User mylogin
ForwardX11 yes
IdentityFile ~/.ssh/id_rsa.ceci
you can just issue commands like ssh clustername to connect.
Because you are not the only one to use the cluster. There is a piece of software called a job scheduler (e.g. Sun (Oracle) Grid Engine or Slurm) that makes sure that the jobs are dispatched to the cluster nodes as optimally as possible with respect to the available resources and as fairly as possible with respect to users. To submit a job, you first have to write a shell script (i.e. a file containing a sequence of Shell commands) that contains information about the requirements of the task you want to launch, the environment that must be set up for it to launch correctly, the executable file that is to be launched, etc.
Then, once logged into the frontend of the cluster you want to use, you type either
You can either use/develop parallel software using one of the main standards, namely OpenMP and MPI.
Or you can launch several instances of the same program on distinct data pieces, or with different parameters, using 'Job arrays' with SGE option -t (cfr
Every node of the cluster has access to the home directories of the users. Those are indeed
You typically get an email when your job has finished. Then you will find, in your directory on the cluster, files named after the job ID that contain the output of your program, the errors, etc.
The rest of this wiki contains additional information, notably in the Knowledgebase and Troubleshooting sections. Additionally, the CISM organizes training sessions every year on topics ranging from a beginner's introduction to Linux to complex optimization of parallel software. Finally, the following reference books can be borrowed from CISM (a small deposit will be asked and returned when the book is given back):
Two things you need to do:
More about connecting / copying
Just as it sounds, a public/private key pair is a pair of keys, one of which is public, while the other is private. The public key should actually be seen as the keyhole corresponding to the private key. On your desktop/laptop computer, you first have to generate such a pair and then copy the public key to a special directory (
If you want to have a Unix-like environment on your Windows computer, Cygwin and MinGW are interesting choices. Once installed, they offer 'A collection of tools which provide Linux look and feel.', among which are those necessary for accessing remote computers with SSH. If you just want a lightweight, free SSH client for Windows, you should consider PuTTY.
Under Host Name simply put the name of the frontend of the cluster you want to connect to, e.g. Note that to be able to use software with a graphical user interface, you also need a local X server installed on your Windows computer, such as Xming.
Line breaks are encoded in Unix with one character, represented as '\n', or LF (line feed), while in Windows they are encoded with two characters, '\r\n', or CR LF (carriage return, line feed). Some programs, such as text editors, are mostly insensitive to the difference, but others (among them the Slurm utilities) are not. See here how to deal with it.
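As a minimal sketch of how such line endings can be detected and removed with standard Unix tools, the example below builds a small demonstration file with Windows line endings, counts the lines that contain a carriage return, and strips them with tr. The files are temporary files created only for illustration; a dedicated tool such as dos2unix, where installed, does the same job.

```shell
#!/bin/bash
# Create a demonstration script with Windows (CRLF) line endings.
demo=$(mktemp)
printf '#!/bin/bash\r\necho hello\r\n' > "$demo"

# A literal carriage return, used as the search pattern.
cr=$(printf '\r')

# Count the lines that contain a carriage return.
cr_before=$(grep -c "$cr" "$demo")
echo "Lines with CR before conversion: $cr_before"

# Remove every carriage return, leaving Unix (LF) line endings.
fixed=$(mktemp)
tr -d '\r' < "$demo" > "$fixed"

# grep -c prints 0 but returns a non-zero status when nothing matches.
cr_after=$(grep -c "$cr" "$fixed" || true)
echo "Lines with CR after conversion: $cr_after"
```

The converted copy can then replace the original file once you have checked it.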
If you have access to UCLouvain's VPN, you can simply use it. Otherwise, you can go through the gateway hall.cism.ucl.ac.be. One way to create an SSH tunnel is to type
ssh my_cism_login@hall.cism.ucl.ac.be -L 1234:hmem.cism.ucl.ac.be:22
in a terminal on your local computer. Make sure to replace my_cism_login with your CISM login. Leave that terminal open. Then, in another terminal, on the same computer, a command such as
ssh -i .ssh/id_rsa.ceci -p 1234 my_CECI_login@localhost
will actually connect to hmem. If you want to connect to a CISM cluster (e.g. manneback or green), you of course need to use, in the above command, your CISM login and password.
The commands scp -r -P 1234 myworkdir localhost:
will copy your local directory
No we cannot. We do not store your passwords in plain text, rather in hashed form. That means we are able to check that the password you provide when you connect is correct, but we are not able to give it back to you. In case you forget your password, you simply need to go back to http://www.cism.ucl.ac.be/login and follow the same instructions as when you created your account. Just make sure to give the exact same email address each time and your existing account will be updated with the new information.
No, we do not, due to lack of storage space. Make sure to always keep a copy of your code/scripts/data on either your personal computer or a storage machine (see the mass storage FAQ).
Submitting jobs with SGE (Green cluster only)
SGE is the Sun Grid Engine, now named Oracle Grid Engine since Oracle bought Sun. Its earlier versions were open source. See http://en.wikipedia.org/wiki/Oracle_Grid_Engine
Here is a generic submission script we could name
The first step is to login onto the frontend of the cluster you want to use (e.g. green, lemaitre, etc.). Then you type in
Your job 418187 ("My_Job") has been submitted
The number
On the frontend, use qstat. The output looks like:
job-ID  prior    name    user  state  submit/start at      queue  slots  ja-task-ID
-----------------------------------------------------------------------------------
418187  0.00000  My_Job  dfr   qw     11/26/2010 09:55:25         1
The column 'state' indicates whether your job is waiting ( qw ), running ( r ), or done ( d ). If the output is empty, it means all your jobs are finished.
To know why a specific job is waiting, and, more generally, to get information about one job, use the qstat -j command:
dfr@lemaitre ~/Formation/Part1 >qstat -j 418190
==============================================================
job_number: 418187
exec_file: job_scripts/418187
submission_time: Fri Nov 26 10:08:11 2010
owner: dfr
uid: 106
group: grppan
gid: 205
sge_o_home: /home/pan/dfr
sge_o_log_name: dfr
sge_o_path: /gridware/sge/bin/lx24-amd64:/opt/intel/fce/9.1.036/bin:/opt/intel/cce/9.1.042/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/bin:/opt/gnome/bin:/opt/kde3/bin:/opt/pathscale/bin:/usr/pgi/linux86-64/6.0/bin:/mnt/optlm/bin:/home/pan/dfr/bin:.
sge_o_shell: /bin/bash
sge_o_workdir: /mnt/homezfs/pan/dfr/Formation/Part1
sge_o_host: lemaitre
account: sge
cwd: /home/pan/dfr/Formation/Part1
hard resource_list: h_rt=20,num_proc=8,matlab=true
mail_list: my@mail.ucl.ac.be
notify: FALSE
job_name: My_Job
jobshare: 0
env_list: MPICH_PROCESS_GROUP=0
script_file: submission_script.sh
usage 1: cpu=00:00:02, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=165.320M
scheduling info: queue instance "all.q@lmexec-11" dropped because it is temporarily not available
queue instance "all.q@lmexec-79" dropped because it is overloaded: load_avg=2.080000 (no load adjustment) >= 1.5
queue instance "all.q@lmexec-90" dropped because it is disabled
queue instance "all.q@lmexec-92" dropped because it is full
...
Under scheduling info, you will find, for each node, the reason why it cannot take care of your job. Messages such as
Use the
See the parts of this FAQ that are related to those clusters.
Lemaitre uses the Sun Grid Engine 6.1 while Green uses the Sun Grid Engine version 6.2. You can consult the full documentation here.
The grid engine, SGE, is configured to ensure all users fair access to the resources. SGE constantly keeps track of who has used what resources for how long. The priority for a new job is established as an (undocumented) decaying function of the number of cores and the amount of memory used in the past. The priority lies between 0 and 1 (maximum priority); it is printed in the output of the
A job array, also called a parametric job, is a job that is made of several independent tasks. In SGE, it is created with the -t option.
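A minimal sketch of an SGE job array script is given here for illustration. It assumes ten tasks, a job name My_Array, and a hypothetical naming scheme with one input file per task (input_1.dat, input_2.dat, ...); none of these come from the CISM configuration. SGE exposes the index of the current task in the SGE_TASK_ID environment variable.

```shell
#!/bin/bash
#$ -N My_Array
#$ -cwd
#$ -t 1-10

# SGE sets SGE_TASK_ID to the index of the current task (1 to 10 here).
# The fallback to 1 only lets this sketch run outside SGE.
TASK_ID=${SGE_TASK_ID:-1}

# Hypothetical naming scheme: one input file per task.
INPUT="input_${TASK_ID}.dat"
echo "Task ${TASK_ID} processes ${INPUT}"
# ./my_program "$INPUT"    # hypothetical executable, left commented out
```

Each task runs the same script; only the value of SGE_TASK_ID, and hence the input file, differs.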
Resource reservation is a mechanism that allows jobs requesting a large number of cores to be scheduled as fairly as the others. For instance, without reservation, when a job is submitted with a request for 32 cores, the job will not start unless 32 cores are available, which can be virtually impossible when the cluster is used at 80% of its capacity, which is the case most of the time.
If that same job, if submitted with the option
Once the reservation pool contains all requested cores, the job is scheduled.
The command qload will give you an instant overview of the load of the cluster, while the command qstat -u "*" will tell you how many jobs are waiting, with their respective priorities.
Submitting jobs with Slurm
Slurm stands for “Simple Linux Utility for Resource Management” although some people insist it means “Sophisticated Linux Utility for Resource Management”. It was first developed at the Lawrence Livermore National Laboratory and is getting more and more attention. Cfr http://en.wikipedia.org/wiki/Simple_Linux_Utility_for_Resource_Management.
Here is a generic submission script we could name
The first step is to login onto the frontend of the cluster you want to use (e.g. green, lemaitre, etc.). Then you type in
Submitted batch job 125512
The number
On the frontend, type
dfr@hmem00:~ $ squeue -u dfr
JOBID   PARTITION  NAME      USER  ST  TIME  NODES  NODELIST(REASON)
125512  Low        My_Test_  dfr   R   0:33  2      hmem[15,17]
The column 'ST' indicates whether your job is pending ( PD ), or running ( R ). If the output is empty, it means all your jobs are finished.
To know why a specific job is waiting, and, more generally, to get information about one job, use the squeue -l -j command:
dfr@hmem00:~ $ squeue -l -j 125503
Wed Feb 22 11:04:45 2012
JOBID   PARTITION  NAME      USER  STATE    TIME  TIMELIMIT   NODES  NODELIST(REASON)
125503  High       quartz_s  dfr   PENDING  0:00  5-00:00:00  1      (Priority)
Use the
The resource manager, Slurm, is configured to ensure all users fair access to the resources. Slurm constantly keeps track of who has used what resources for how long. The priority for a new job is established as a decaying function of the number of cores and the amount of memory used in the past. The priority lies between 0 and 1 (maximum priority); it is printed in the output of the
Slurm uses the term partition rather than queue. To submit a job to a given partition, use the -p option:
-p, --partition=<partition_names>
Request a specific partition for the resource allocation. If not specified, the default behaviour is to allow the slurm controller to select the default partition as designated by the system administrator. If the job can use more than one partition, specify their names in a comma separate list and the one offering earliest initiation will be used.
Slurm has no concept of parallel environment as such. It simply requires that the number of nodes, or the number of cores, be specified. But you can control how the cores are allocated (on a single node, on several nodes, etc.) using the
With those options, there are several ways to get the same allocation. For instance, the following:
--nodes=4 --ntasks=4 --cpus-per-task=4
is equivalent in terms of resource allocation to
--ntasks=16 --ntasks-per-node=4
but it will lead to environment variables being set, and understood, differently.
Suppose you need 16 cores. Here are some use cases
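To see how two equivalent 16-core requests differ from the job's point of view, the sketch below prints the Slurm variables that describe the layout. It assumes the standard sbatch output variables SLURM_NTASKS, SLURM_NTASKS_PER_NODE and SLURM_CPUS_PER_TASK; the last two are only defined when the matching option was given. The helper name show_layout is made up for this example.

```shell
#!/bin/bash
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=4
# Alternative request for the same 16 cores (use instead of the two lines above):
#   --nodes=4 --ntasks=4 --cpus-per-task=4

# Print the layout as Slurm describes it to the job.
# "unset" marks a variable that Slurm did not define for this request.
show_layout() {
    echo "ntasks=${SLURM_NTASKS:-unset}"
    echo "ntasks_per_node=${SLURM_NTASKS_PER_NODE:-unset}"
    echo "cpus_per_task=${SLURM_CPUS_PER_TASK:-unset}"
}

show_layout
```

With the first request, a program started by srun or mpirun sees 16 tasks of one core each; with the alternative, it sees 4 tasks that may each use 4 cores, which suits hybrid MPI/OpenMP codes.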
Slurm associates with each node a set of Features and a set of Generic resources. Features are immutable characteristics of the node (e.g. network connection type) while generic resources are “consumable” resources, meaning that as users reserve them, they become unavailable for the others (e.g. compute accelerators). Features are requested with
--constraint="feature1&feature2"
or
--constraint="feature1|feature2"
the former requesting both, while the latter, as one would expect, requesting at least one of the two. Generic resources are requested with
--gres="resource:2"
to request 2 resources.
The command
sinfo -o "%15N %10c %10m %25f %10G"
lists the features and generic resources of each node. It will output something like:
dfr@manneback:~ $ sinfo -o "%15N %10c %10m %25f %10G"
NODELIST         CPUS  MEMORY  FEATURES                GRES
mback[01-02]     8     31860+  Opteron,875,InfiniBand  (null)
mback[03-04]     4     31482+  Opteron,852,InfiniBand  (null)
mback05          8     64559   Opteron,2356            (null)
mback06          16    64052   Opteron,885             (null)
mback07          8     24150   Xeon,X5550              TeslaC1060
mback[08-19]     8     24151   Xeon,L5520,InfiniBand   (null)
mback[20-32,34]  8     16077   Xeon,L5420              (null)
No, you need to specify it yourself with export OMP_NUM_THREADS=... For instance:
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
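As an illustration, a submission script for an OpenMP program could look like the sketch below. It is written under the assumption that the cores are requested with --cpus-per-task, in which case Slurm defines SLURM_CPUS_PER_TASK; it therefore uses that variable rather than the SLURM_NTASKS_PER_NODE shown above. The job name and the executable are hypothetical.

```shell
#!/bin/bash
#SBATCH --job-name=omp_test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Slurm defines SLURM_CPUS_PER_TASK when --cpus-per-task is given.
# The fallback to 1 only lets this sketch run outside Slurm.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

echo "Running with ${OMP_NUM_THREADS} OpenMP threads"
# ./my_openmp_program    # hypothetical executable, left commented out
```

Tying OMP_NUM_THREADS to the allocation keeps the thread count consistent with the cores actually reserved, so changing the request in one place is enough.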
Yes. You do not need to specify the -np, -host or hostfile options. Simply go with
mpirun ./a.out
or
srun ./a.out
depending on the MPI implementation you choose (OpenMPI, mvapich, etc.). See the Slurm documentation for precise information. But do not forget to set the environment correctly with something like
module load openmpi/gcc
You can find the Slurm documentation here: https://computing.llnl.gov/linux/slurm/
All users are ensured fair usage. The command sprio gives you the priority of your job depending on several factors. One of the factors is the fair share, for which you can find further information with sshare, which gives you the fair share you can claim and your past usage.
Yes, with the scancel command; scancel --signal=9, for instance, will kill your processes.
sacct -j JOB_ID -o OUTPUT_FIELD_LIST
The OUTPUT_FIELD_LIST is a comma-separated list of the following items:
e.g. sacct -o JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize
Slurm offers the sbcast command that propagates a file to the local file systems of the nodes that were allocated to the job. However, sbcast works one file at a time. It is therefore unsuited for copying entire data directories, for instance. One neat way is to use a construction like srun cp. For instance, in the script below
#!/bin/bash
#SBATCH -N 2
#SBATCH -o output.txt
SCRATCH=/scratch/$USER/$SLURM_JOB_ID
echo Creating temp dir $SCRATCH
srun mkdir -p $SCRATCH || exit $?
echo Copying files
# srun cp is equivalent to a loop over each node + scp
srun cp -r $SLURM_SUBMIT_DIR/* $SCRATCH || exit $?
the data are copied from the home to the local scratch. A directory is created with the login and the job ID.
Suppose the script creates a file
# Do some work here
srun touch $SCRATCH/myres.txt
In this case, we want to get each file from the distinct nodes into a distinct directory.
for node in `srun hostname`; do
    echo Copying from $node
    mkdir -p $node
    scp -r $node:$SCRATCH/myres.txt hmem00:$SLURM_SUBMIT_DIR/$node/ || exit $?
done
If each result file has a distinct name, we can simply
At the end, make sure to clean the scratch space.
echo Removing $SCRATCH
srun rm -rf $SCRATCH || exit $?
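One point the script above leaves open is that an early "exit $?" skips the final cleanup. A possible pattern, sketched here on local temporary directories instead of real node scratch space, is to remember the exit status, clean up in all cases, and only then report the status. The directory names are stand-ins chosen for this example, and srun is left out so that the sketch runs anywhere.

```shell
#!/bin/bash
# Local stand-ins for the node scratch and the submission directory.
SCRATCH=$(mktemp -d)/job_demo   # plays the role of /scratch/$USER/$SLURM_JOB_ID
RESULTS=$(mktemp -d)            # plays the role of $SLURM_SUBMIT_DIR

mkdir -p "$SCRATCH"

# Do the work, but remember the exit status instead of exiting at once.
status=0
( cd "$SCRATCH" && echo "some result" > myres.txt ) || status=$?

# Copy the results back only if the work succeeded.
if [ "$status" -eq 0 ]; then
    cp "$SCRATCH/myres.txt" "$RESULTS/"
fi

# Always clean the scratch space, then report the saved status.
rm -rf "$SCRATCH"
echo "Job finished with status $status"
```

In a real job script the last line would typically be "exit $status", so that Slurm records the failure while the scratch space is still released.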
Slurm describes node lists with notations like hmem[05-07,09-17]. To expand such a list, use
dfr@hmem00:~ $ scontrol show hostname hmem[05-07,09-17] | paste -d, -s
hmem05,hmem06,hmem07,hmem09,hmem10,hmem11,hmem12,hmem13,hmem14,hmem15,hmem16,hmem17
The command scontrol show -d job <JOBID> gives very detailed information about jobs.
Well, you can submit a one-line command with default sbatch parameters with the --wrap option:
--wrap=<command string>
Sbatch will wrap the specified command string in a simple "sh" shell script, and submit that script to the slurm controller. When --wrap is used, a script name and arguments may not be specified on the command line; instead the sbatch-generated wrapper script is used.
e.g.
sbatch --wrap="hostname"
Available hardware
Yes. Except for the Green cluster, all nodes are interconnected with a QDR InfiniBand connection.
By desktop/laptop standards, all nodes have large memory. For instance, on Green, all nodes have either 16GB or 32GB of RAM. On Lemaitre2, the nodes have 48GB of memory. If you need really large memory, use Hmem. Its nodes have a minimum of 128GB of RAM and a maximum of half a terabyte.
Yes. The three postprocessing nodes of Lemaitre2 have a Q4000 NVIDIA GPU each. The SMCS computers have an M2090 NVIDIA GPU each. One node of Manneback has two C1060 NVIDIA GPUs, and another one has an NVIDIA M2090 and a Xeon Phi. If you would like some guidance on how to use the Xeon Phi, see this quick tutorial and feel free to contact us.
There is one computer with Itanium processors. Contact the managing team for more information.
Available software
Modules have been set up to ease the definition of the environment for specific purposes. Rather than setting PATHs and other environment variables, you simply have to use a command such as
module load intel/cce
to use the Intel compiler, or
module load blas
to run a program that links to the BLAS library. For a complete list of available modules, type
module avail
You should see something like this
------------------------ /cvos/local/modulefiles ------------------------
cluster-tools/3.1  ganglia/3.0.7        ipmitool/1.8.9  modules  shared   version
dot                installer-tools/3.1  module-info     null     use.own
------------------------ /cvos/shared/modulefiles -----------------------
acml/gcc/64/3.6.0     blas/intel/64/1  globalarrays/gcc/openmpi/64/4.0.6    intel/idbe/10.1.008  netperf/2.4.2
acml/gcc/mp/64/3.6.0  blas/pgi/64/1    globalarrays/intel/openmpi/64/4.0.6  intel/mkl/9.0.018    openmpi/gcc/64/1.2.6
[...]
For a list of modules that have been loaded, type
module list
Modules can be removed with
The version written in bold characters is the default one, when applicable.
Matlab (INMA)
At the moment, Matlab is installed only on lm9. Note that the Matlab runtime libraries are installed on each node of each cluster, so compiled Matlab code can run on every node. See Matlab on the cluster for more detailed information.
The Matlab Component Runtime is installed on Hmem. Use
Thermocalc (IMAP) is installed on lm9 for interactive use. Connect to lm9 and launch the following command
source /usr/imap/tcinit.sh
That command sets up the environment to use Thermocalc. It should be run once when you connect. To avoid having to type it every time, add the line to your .bashrc file, for instance with
echo "source /usr/imap/tcinit.sh" >> ~/.bashrc
Then you can launch Thermocalc simply by typing
tcs
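Running the echo command above several times adds the same line to .bashrc several times. A small sketch of a way around that is shown below; the helper name add_line_once is made up for this example, and the demonstration works on a temporary file rather than on your real ~/.bashrc.

```shell
#!/bin/bash
# Append a line to a file only if that exact line is not there yet.
add_line_once() {
    line=$1
    file=$2
    touch "$file"
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || echo "$line" >> "$file"
}

# Demonstration on a temporary file; on the cluster the target
# would be ~/.bashrc.
rcfile=$(mktemp)
add_line_once "source /usr/imap/tcinit.sh" "$rcfile"
add_line_once "source /usr/imap/tcinit.sh" "$rcfile"
cat "$rcfile"
```

Calling the helper twice leaves a single copy of the line in the file.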
To use dictra
Note that the TC_Matlab toolbox is not installed on the server, as it comes only in a Windows version.
Green specifics
The Green cluster was built with environmental concerns, hence its name. It was specifically designed to maximize both the computing performance and the energy performance, i.e. maximize GFlops and minimize consumed Watts.
The frontend, the 96 Dell M600 and the 6 PowerEdge 1950 were acquired in 2008. The 16 HP ProLiant (so-called 'New Branch') were acquired in 2010. They were later removed from Green and attached to Manneback.
The Green cluster is located in the 'Tier2' room of the 'Marc de Hemptine' building (aka Cyclotron).
Maximum 204 cores can be used at once by a single user.
Here is a generic submission script that will work for all nodes of Green, except for the newer HP ProLiant nodes (see below)
?: What are the SGE options specific to Green ?
The directory
Manneback specifics
The cluster Manneback is named after Charles Manneback (1894-1975), Professor of Physics at UCLouvain. A close friend of Georges Lemaitre, he led the FNRS-IRSIA project to build the first supercomputer in Belgium in the 1950s.
Use the sinfo command:
[root@manneback ~]# sinfo
HOSTNAMES  CPUS(A/I/O  MEMORY  FEATURES                   GRES                STATE
mback07    0/8/0/8     24150   Xeon,X5550                 TeslaC1060:2        idle
mback08    8/0/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              allocated
mback09    4/4/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              allocated
mback10    4/4/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              allocated
mback11    8/0/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              allocated
mback12    8/0/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              allocated
mback13    0/8/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              idle
mback14    0/8/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              idle
mback15    0/8/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              idle
mback16    0/8/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              idle
mback17    0/8/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              idle
mback18    0/8/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              idle
mback19    0/8/0/8     24151   Xeon,L5520,InfiniBand,Fhg  (null)              idle
mback20    7/1/0/8     16077   Xeon,L5420                 (null)              allocated
mback21    3/5/0/8     16077   Xeon,L5420                 (null)              allocated
mback22    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback23    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback24    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback25    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback26    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback27    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback28    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback29    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback30    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback31    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback32    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback33    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback34    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback35    0/8/0/8     16077   Xeon,L5420                 (null)              idle
mback40    0/16/0/16   64405   Xeon,E5-2660               TeslaM2090,XeonPhi  idle
In your submission script, setting
#SBATCH --constraint="L5520"
will get your job to run on a compute node with a Xeon L5520. Setting
#SBATCH --constraint="InfiniBand"
will make sure your job is allocated to a node with an InfiniBand connection.
The GPUs are considered a “generic resource” in Slurm, meaning that you can reserve the GPU for your job only with a command like
#SBATCH --gres=TeslaC1060:2
in your submission script if you want access to both GPUs on the node.
Two 'global' scratch spaces are available.
/workdir is slower than /globalfs, but the latter might be less robust, and its stability under heavy load is not guaranteed yet.
Each node furthermore has a local
New computers have been installed in a new partition named Indus. You can use them (
Hmem specifics
Hmem stands for High memory. All Hmem nodes have at least 128GB of RAM, which is considered large at the time of writing.
PARTITION  AVAIL  TIMELIMIT   NODES  STATE  NODELIST
High       up     10-00:00:0  2      alloc  hmem[01-02]
Middle     up     5-00:00:00  7      alloc  hmem[03-09]
Low*       up     5-00:00:00  15     alloc  hmem[03-17]
Fast       up     1-00:00:00  3      idle   hmem[18-20]
Partition
Partition
Two 'global' scratch spaces are available.
/workdir is much slower than /globalfs, but the latter has not been fully tested yet and its stability under heavy load is not guaranteed yet. It nevertheless passed all regular tests we performed. /globalfs is due to replace /workdir soon.
Each node furthermore has a local
Lemaitre2 specifics
This cluster was meant as a replacement for the older cluster named Lemaitre. It was named Lemaitre2 to avoid possible confusion. The cluster Lemaitre was named after Georges Lemaitre (17 July 1894 – 20 June 1966), a Belgian priest, astronomer and professor of physics at UCLouvain. He is seen by many as the father of the Big Bang theory, and he is also the one who brought the first supercomputer to our University.
Def*   up  5-00:00:00  108  idle   lmWn[001-008]
PostP  up  6:00:00     3    alloc  lmPp[001-003]
Most nodes are in the default,
The directory
2. F.A.Q. Mass storage (Stockage de masse)
Contents
About mass storage at CISM
More about connecting/Copying
More about the costs
About mass storage at CISM
Mass storage consists in storing large amounts of data while ensuring
Archiving consists in storing data that are not accessed for long periods of time. By contrast with mass storage, high availability and high transfer rates are irrelevant; archiving is often done on low-consumption infrastructures with disks designed for stability when powered off. The mass storage infrastructure at CISM is not designed for archiving. Users can of course use the mass storage facilities for archiving, but the cost will correspond to mass storage and not to archiving.
Backing up consists in storing data in a way that allows recovering previously deleted information. The CISM does not offer a full backup solution (although it does offer a simple Replicus service; see below). What is deleted cannot be restored. Users can of course use the mass storage facilities for backups, but they need their own set of scripts/software to implement a full backup solution.
The following servers are currently available.
You need an SSH client, i.e. a piece of software that allows connecting to a remote computer using the SSH protocol. On Linux, simply type in
ssh -X mylogin@storagexxx.cism.ucl.ac.be
to access the server storagexxx. Do not forget to replace mylogin and storagexxx with your login and the name of the server.
More about connecting/copying
The usual way is to use the Unix commands scp or rsync. To copy a directory to the server, type
scp -r mywork/ mylogin@storagexxx.cism.ucl.ac.be:/path/to/my/space
You will be prompted for your password on the storage server (the same as on the cluster if you did not change it). Then, your directory mywork is copied to the server. To copy it back, type
scp -r mylogin@storagexxx.cism.ucl.ac.be:/path/to/my/space/mywork/ .
After you enter your password, your directory mywork is copied into the current directory. Type ''man scp'' or ''man rsync'' to get information about those commands. You can alternatively attend the training sessions, or contact the sysadmins. If your computer runs Windows or if you would like to connect from outside UCLouvain, please see More about connecting / copying.
Another alternative is to use FTP. Open an FTP session
ftp mylogin@storagexxx.cism.ucl.ac.be
Copy to and from the server
put filename
get filename
Close the FTP session
bye
GUI FTP clients are plentiful and will ease the job for you.
One final word: if you have many small files and need to transfer them, you will gain much time by merging them together in a large compressed tar file or a zip file, then copying the large file, and finally uncompressing it. Having many small files really kills the transfer bandwidth.
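The advice about bundling many small files can be sketched as follows. The example runs entirely on temporary local directories, with hypothetical file names, so that it can be tried anywhere; the actual copy to the storage server is only shown as a comment.

```shell
#!/bin/bash
# Build a demonstration directory with several small files.
workdir=$(mktemp -d)
mkdir -p "$workdir/mywork"
for i in 1 2 3 4 5; do
    echo "data $i" > "$workdir/mywork/file_$i.txt"
done

# Merge the directory into one compressed tar file.
tar -czf "$workdir/mywork.tar.gz" -C "$workdir" mywork

# The single archive would then be copied in one go, for instance with
#   scp mywork.tar.gz mylogin@storagexxx.cism.ucl.ac.be:/path/to/my/space
# (not run here).

# Uncompress at the destination (simulated by a local directory).
mkdir -p "$workdir/dest"
tar -xzf "$workdir/mywork.tar.gz" -C "$workdir/dest"
ls "$workdir/dest/mywork"
```

One large transfer avoids the per-file overhead that makes copying thousands of small files slow.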
This directory was created for you when you registered. It is given as a simple means to organize a replication-based backup strategy. This strategy consists in duplicating all your data on distinct servers in distinct rooms so as to minimize losses due to hazards. Every file that you copy into this directory is synchronized every hour to another server in another room. Note that, therefore, the space you use in that directory is charged twice!
If you have a small number of large files, you can instruct SSH to use a faster but “less secure” encryption scheme with the option
If you have a large number of small files, you should first gather them in a single archive file (a so-called tarball). For instance,
tar -cf - directory/ | ssh mylogin@storagexxx tar -xf -
This will have the same effect as
scp -r directory mylogin@storagexxx:
More about the costs
Using the mass storage facilities comes at a cost. The cost per Gigabyte per year is determined each year by the Comité de Gestion. The cost structure is designed so as to amortize the cost of the hardware over its lifetime. The total cost per year is computed from the average (integral) use of the storage space over the year. Alternatively, you can opt for a package ('forfait'), expressed in Gigabytes. In the latter case, you will pay for the space corresponding to the package ('forfait') for the whole year. The price per Gigabyte if you opt for the package ('forfait') is lower than in the other case. The precise formula is given here.
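As an illustration only, and not the official CISM formula, the "average (integral) use" can be read as follows, assuming a yearly price p per Gigabyte, a billing period T of one year, and S(t) the space used at time t (all three symbols are introduced here for the example):

```latex
C = p \cdot \frac{1}{T} \int_0^T S(t)\,\mathrm{d}t
```

Under that reading, a group that uses 100 GB during the first half of the year and 300 GB during the second half has an average use of (100 + 300) / 2 = 200 GB, and would be charged 200 p.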
The package ('forfait') is a concept that is related to how the storage is charged. If you buy a package ('forfait'), then you have a fixed cost that does not depend on the actual storage used (provided you stay within the bounds of your package); otherwise, you have a variable cost that depends on the storage used. Packages cost less per Gigabyte. A quota is a technical concept that is related to a particular server. Servers indeed have a fixed capacity which is shared among users. Each user group (in the Unix sense or in the research group sense, depending on the server) has a fixed quota that the group cannot exceed. A group with large needs will probably have several quotas on several servers, the sum of which should correspond to their package size ('forfait').
Contact the CISM team (egs-cism@listes.uclouvain.be). Depending on the free space on the server for which you request a quota increase, you might be offered to migrate to another server or to open a new space for your group on another server.
Because it is used twice on distinct servers ; data in ALSOreplicus are copied every hour onto another server to serve as a simple backup procedure.
3/06/2011