About Storage

Data are usually stored in files, and those files are organised in a hierarchy of directories in a filesystem. Although many filesystems exist on the clusters, they are not meant for long-term storage of large amounts of data.

The CISM hosts two types of file storage systems that fulfil that purpose:

  • mass storage
  • common CÉCI storage

Access and conditions

  • access to mass storage is billed by volume (see below) and is organised by Pôle
  • access to the common CÉCI storage is free but available space is limited

Mass storage

Mass storage consists in storing large amounts of data while ensuring:

high availability: data are normally accessible at all times from any computer connected to the local network.

high transfer rates: data are accessed through high-bandwidth networks and stored on high-throughput disk arrays.

security: data are protected from unwanted access by software and hardware means.

safety: data are protected against hardware failure with enterprise-level disks (designed to spin 24/7) and redundancy.

Archiving consists in storing data that are not accessed for long periods of time. By contrast with mass storage, high availability and high transfer rates are irrelevant; archiving is often done on low-consumption infrastructures with disks designed for stability when powered off. The mass storage infrastructure at CISM is not designed for archiving. Users can of course use the mass storage facilities for archiving, but they will be charged at the mass storage rate, not at an archiving rate.

Backing up consists in storing data in a way that allows recovering previously deleted information. The CISM does not offer a full backup solution (it does offer a simple Replicus service; see also Replicus). What is deleted cannot be restored. Users can of course use the mass storage facilities for backups and use their own set of scripts or software to implement a full backup solution. Incremental backups can also be performed by the CISM, but only upon explicit request, and the user must bear the corresponding cost.

The following servers are currently available:

Name        Net capacity   File system   Note
storage08   220 TB         ZFS           compressed, 2 SSD caches, 2 parity disks per 11 disks, 3 spare disks
storage09   260 TB         ZFS           compressed, 2 SSD caches, 2 parity disks per 11 disks, 3 spare disks
storage10   270 TB         ZFS           compressed, 2 SSD caches, 2 parity disks per 11 disks, 3 spare disks

Common CÉCI storage

The common CÉCI storage offers a common home directory (limited to 100 GB per user) available from every CÉCI cluster. It also features a TRSF partition with a much larger quota, meant for transferring data from one cluster to another. More information about it can be found on the CÉCI website.

Connection and file transfer

The common CÉCI storage is accessible from any CÉCI cluster and also from Manneback.

Access and file transfers to the mass storage systems are done through the SSH protocol (ssh, scp, rsync, sftp, or any graphical user interface built upon SSH: MobaXterm, FileZilla, Cyberduck, etc.) in the standard way, with your CISM login and password. The main entry point is storage.cism.ucl.ac.be. See the section of the same name in Interactive computing for more details.

Note that these machines are directly accessible only from within the network of the university or through the UCLouvain VPN. You won’t be able to reach them directly from your home or from abroad. To access them from outside the university, you will need to use a gateway as explained here.

About the costs

Using the mass storage facilities comes at a cost. The cost per Gigabyte per year is determined each year by the Comité de Gestion. The cost structure is designed so as to amortize the cost of the hardware over its lifetime.

The total cost per year is computed from the average (time-integrated) use of the storage space over the year. Alternatively, you can opt for a package (‘forfait’), expressed in Terabytes. In that case, you pay for the space corresponding to the package for the whole year, whatever your actual use. The price per Terabyte is lower with a package than with proportional billing.
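To illustrate what an average (integral) use over the year means, here is a minimal sketch. It assumes that usage is known as a sequence of periods during which the stored volume is constant, and that the bill is simply the time-weighted average volume multiplied by the yearly rate. The function name, the input format and the 365-day year are hypothetical choices made for this illustration; the actual accounting performed by the CISM may differ.

```python
def proportional_cost(usage_periods, rate_eur_per_tb_year, days_in_year=365):
    """Estimate the yearly cost under proportional billing.

    usage_periods: list of (days, terabytes) pairs, meaning that
        `terabytes` were stored for `days` consecutive days
        (hypothetical input format, for illustration only).
    rate_eur_per_tb_year: rate for proportional usage, in EUR/TB/year.
    """
    # Integral of the usage over time, in TB x days.
    tb_days = sum(days * terabytes for days, terabytes in usage_periods)
    # Time-weighted average usage over the whole year, in TB.
    average_tb = tb_days / days_in_year
    return average_tb * rate_eur_per_tb_year


if __name__ == "__main__":
    # 2 TB during the first half of the year, 4 TB during the second half.
    periods = [(182.5, 2.0), (182.5, 4.0)]
    print(proportional_cost(periods, rate_eur_per_tb_year=50))
```

In this example the average use is 3 TB, so with an assumed rate of 50 €/TB/year the cost would be 150 €.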

The precise formula for the cost of a package is given below:

\[APC = 0.8 \frac{UC} { \frac{1}{VP} + \frac{1}{2\cdot VPM}}\]

with:

  • \(APC\) : Annual Package Cost
  • \(UC\) : Rate for proportional usage in €/TB/Year
  • \(VP\) : Total Volume of the requested package
  • \(VPM\) : Total Volume of the biggest package requested by a research group

For example, in 2020, the rate was computed using the following values:

UC = 50 €/TB/Year
VPM = 320 TB

Example: APC = 39.94 € for a requested package of 1 TB (VP = 1 TB)
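The formula above can be evaluated with a few lines of Python. This is only a sketch: the function and parameter names are chosen for this illustration, and the 2020 values (UC = 50 €/TB/Year, VPM = 320 TB) are used as defaults purely as an example, since the rate is set each year.

```python
def annual_package_cost(vp_tb, uc_eur_per_tb_year=50.0, vpm_tb=320.0):
    """Annual Package Cost (APC) in EUR, following the formula above.

    vp_tb: total volume of the requested package (VP), in TB.
    uc_eur_per_tb_year: rate for proportional usage (UC), in EUR/TB/year.
    vpm_tb: volume of the biggest package requested by a research
        group (VPM), in TB.
    The default values are the 2020 figures, used here as an example.
    """
    if vp_tb <= 0 or vpm_tb <= 0:
        raise ValueError("volumes must be strictly positive")
    return 0.8 * uc_eur_per_tb_year / (1.0 / vp_tb + 1.0 / (2.0 * vpm_tb))


if __name__ == "__main__":
    for vp in (1, 10, 100, 320):
        apc = annual_package_cost(vp)
        # Also show the effective price per TB for this package size.
        print(f"{vp:>4} TB: {apc:10.2f} EUR/year ({apc / vp:.2f} EUR/TB/year)")
```

With the 2020 values, a 1 TB package gives 39.94 €, as in the example above. The effective price per Terabyte decreases as the package grows, down to about 26.67 €/TB/Year when the requested volume equals VPM.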

The package (‘forfait’) is a concept that is related to how the storage is charged. If you buy a package, you have a fixed cost that does not depend on the actual storage used (provided you stay within the bounds of your package). Otherwise, you have a variable cost that depends on the storage used. Packages cost less per Gigabyte.

A quota is a technical concept that is related to a particular server. Each server has a fixed capacity, which is shared among users. Each user group (in the Unix sense or in the research group sense, depending on the server) has a fixed quota that the group cannot exceed.

A group with large needs will probably have several quotas on several servers, the sum of which should correspond to the size of their package (‘forfait’).

If you need more storage space, contact the CISM team:

egs-cism@listes.uclouvain.be