About Data sharing
Sharing with collaborators outside UCLouvain
The CISM has set up two web services for sharing data:
Nextcloud: a Dropbox-like service. It can be connected to the mass storage infrastructure to share data from the mass storage servers with collaborators outside UCLouvain through protected download web links (URLs). You can log in to Nextcloud with your CISM account.
Dataverse: a platform for publishing datasets with a DOI, for easy and permanent referencing from published papers. You must log in to Dataverse with your UCLouvain portal account.
Both services allow sharing data with collaborators outside UCLouvain, but they serve different purposes. Nextcloud is better suited to sharing data during the research process. Dataverse is meant for sharing data afterwards, once the data is ready to be published and cited in a paper.
We can call these two services “external sharing” services, aimed at collaborators who do not have access to the CISM infrastructure. To share data with collaborators who do have access to the CISM or CÉCI infrastructure, you can rely on the UNIX permissions of your files, as described below.
Sharing with everyone on the same server
To share with everyone, make sure that your home directory has read and execute permissions for others (e.g. rwxr-xr-x, or 755 in octal form). Then everyone can list the content of your home directory and read any file in it that has read permission for others. The same reasoning applies recursively to sub-directories. To reach a file, another user needs execute permission on every directory along its path, and read permission on a directory to list its content.
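The path rule above can be checked mechanically. The following is a minimal bash sketch, not a CÉCI-provided tool, and it rests on a few assumptions:

- The function name check_others_traverse is hypothetical.
- It assumes GNU coreutils (stat -c, readlink -f), as found on Linux clusters.
- It only looks at the execute bit for others, so it ignores ACLs and group membership.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a CÉCI tool): for every directory from TARGET's
# parent up to STOP (default: /), report whether "others" hold the execute
# bit needed to traverse it. Returns 1 if any directory blocks them,
# 2 if a path cannot be resolved.
check_others_traverse() {
    local target stop dir parent mode status=0
    target=$(readlink -f "$1") || return 2      # resolve symlinks, as ls -L does
    stop=$(readlink -f "${2:-/}") || return 2
    dir=$(dirname "$target")
    while :; do
        mode=$(stat -c '%A' "$dir") || return 2  # e.g. drwxr-xr-x
        # Character 9 (counting from 0) is the execute slot for others:
        # "x", or "t" when the sticky bit is also set, as on /tmp.
        case "${mode:9:1}" in
            x|t) echo "ok      $mode $dir" ;;
            *)   echo "BLOCKED $mode $dir"; status=1 ;;
        esac
        [ "$dir" = "$stop" ] && break
        parent=$(dirname "$dir")
        [ "$parent" = "$dir" ] && break          # reached /
        dir=$parent
    done
    return "$status"
}

# Example: can other users reach ~/fileA, checking up to the home directory?
check_others_traverse ~/fileA ~ || echo "some directory blocks other users"
```

On Linux, the util-linux command namei -l PATH gives a similar per-directory listing of permissions.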
To find out which permissions apply to your home directory, run the ls
command. Remember that ~ is the home directory of the current user. The
-l option shows permissions, and the -d option lists the directory itself
rather than its content. The -L option is necessary because on most
CÉCI systems the home directory is a symbolic link:
ceciuser1@cecicluster:~ $ ls -dlL ~
drwxr-xr-x 56 ceciuser1 ceciuser1 8192 Aug 12 16:20 /home/ceciuser1
Here, the home directory of the ceciuser1 user on cecicluster is readable and
accessible by everyone.
Should the above command return something like the following:
ceciuser1@cecicluster:~ $ ls -dlL ~
drwxr-x--- 56 ceciuser1 ceciuser1 8192 Aug 12 16:20 /home/ceciuser1
then another user (here ceciuser2) would not be able to see its
contents:
ceciuser2@cecicluster:~ $ ls -l ~ceciuser1
ls: cannot open directory /home/ceciuser1: Permission denied
In that case, use the chmod command to give access to others. Its
manual page can be displayed with the man chmod command.
ceciuser1@cecicluster:~ $ chmod o=rx ~
ceciuser1@cecicluster:~ $ ls -dlL ~
drwxr-xr-x 56 ceciuser1 ceciuser1 8192 Aug 12 16:20 /home/ceciuser1
The above chmod command reads, in plain English, “change the
permissions of my home directory (~) so that others (o) have exactly (=)
read (r) and execute (x) access to it”. As a result, other users are able
to list its contents. Indeed, user ceciuser2 can now list the content of
the home directory of user ceciuser1:
ceciuser2@cecicluster:~ $ ls -l ~ceciuser1
drwxr-xr-x ceciuser1 ceciuser1 Aug 12 16:20 shared_dir
drwxr-x--- ceciuser1 ceciuser1 Aug 12 16:20 private_dir
-rw-r--r-- ceciuser1 ceciuser1 Aug 12 16:20 fileA
-rw-r----- ceciuser1 ceciuser1 Aug 12 16:20 fileB
This shows that all other users have read access to shared_dir and
fileA, but not to private_dir or fileB. For instance:
ceciuser2@cecicluster:~ $ cat ~ceciuser1/fileA
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
ceciuser2@cecicluster:~ $ cat ~ceciuser1/fileB
cat: /home/ceciuser1/fileB: Permission denied
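Before changing the permissions of your home directory, you can try the symbolic forms of chmod used on this page on a throwaway directory. This is a small sketch assuming bash and GNU stat, where stat -c '%A %a' prints the symbolic and octal modes. The expected modes in the comments assume the scratch directory does not inherit a setgid bit from its parent, which is the usual case for /tmp.

```shell
#!/usr/bin/env bash
# Try chmod's symbolic modes on a scratch directory rather than on ~.
demo=$(mktemp -d)
show() { stat -c '%A %a' "$demo"; }   # symbolic and octal mode

chmod 750 "$demo";  show   # drwxr-x--- 750  others have no access
chmod o=rx "$demo"; show   # drwxr-xr-x 755  "=" sets others to exactly r-x
chmod o=x "$demo";  show   # drwxr-x--x 751  others may traverse, not list
chmod o-x "$demo";  show   # drwxr-x--- 750  "-" removes a permission
chmod g+w "$demo";  show   # drwxrwx--- 770  "+" adds one, others unchanged
rmdir "$demo"
```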
Restricting access through hiding
Sometimes, though, you do not want to share your files with all other users.
To share with only a restricted list of users, one option is to remove the read permission for others on your home directory while keeping the execute permission. Other users then won’t be able to list the content of your home directory. However, if they know the exact name of a file and that file has read permission for others, they will be able to read it. If the name is not easily guessable, the file is effectively hidden, yet accessible to those who know it exists.
To reach this state, use the chmod command this way:
ceciuser1@cecicluster:~ $ chmod o=x ~
ceciuser1@cecicluster:~ $ ls -dlL ~
drwxr-x--x ceciuser1 ceciuser1 Aug 12 16:20 /home/ceciuser1
We see that the permissions are now drwxr-x--x rather than drwxr-xr-x, and other users can’t see what’s inside the directory:
ceciuser2@cecicluster:~ $ ls -l ~ceciuser1
ls: cannot open directory /home/ceciuser1: Permission denied
Nevertheless, if ceciuser2 knows that ceciuser1 has a file named
fileA in his home directory, he can read its contents, since fileA is
readable by others:
ceciuser2@cecicluster:~ $ cat ~ceciuser1/fileA
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
With this setting, ceciuser1 can decide with whom he shares a file.
Setting the file or directory name to an unguessable character string,
and giving that string to ceciuser2, is roughly equivalent to giving
ceciuser2 a password to access the file. Note, though, that anyone who
knows the ‘password’ (i.e. the secret file or directory name) will be able
to access it. For instance:
ceciuser1@cecicluster:~ $ ls -l ~
drwxr-xr-x ceciuser1 ceciuser1 Aug 12 16:20 shared_dir_&aqw1AQW
drwxr-x--- ceciuser1 ceciuser1 Aug 12 16:20 private_dir
-rw-r--r-- ceciuser1 ceciuser1 Aug 12 16:20 fileA
-rw-r----- ceciuser1 ceciuser1 Aug 12 16:20 fileB
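Everyone who knows about shared_dir_&aqw1AQW will be able to access
it, but it won’t be discoverable by others. Because & is a special
character for the shell, that name must be quoted when typed, e.g.
ls ~ceciuser1/'shared_dir_&aqw1AQW'.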
This hack is suitable for one-off file sharing, but for long-term sharing of files (e.g. code, scripts, input data), you might want to consider the next option.
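The hiding approach can be scripted so that the secret name is random rather than hand-picked. This is a minimal bash sketch with the following assumptions:

- The helper name make_hidden_share and the shared_dir_ prefix are hypothetical.
- The randomness comes from /dev/urandom, which is available on Linux.
- Using only hexadecimal digits avoids characters such as & that must be quoted in the shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create BASE/shared_dir_<32 random hex digits>,
# readable by others, and leave BASE traversable but not listable by
# others (the drwxr-x--x setup described above). Prints the new path.
make_hidden_share() {
    local base=$1 secret dir
    # 16 bytes from the kernel's random source, printed as 32 hex digits.
    secret=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    dir="$base/shared_dir_$secret"
    mkdir "$dir" || return 1
    chmod o=rx "$dir"    # others may list and enter the secret directory
    chmod o=x "$base"    # others may traverse BASE but not list it
    printf '%s\n' "$dir"
}

# Usage on the cluster (this changes the permissions of your home directory):
#   make_hidden_share ~
```

Files placed in the new directory still need read permission for others (e.g. chmod o+r), and anyone given the printed path can read them.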
Enforcing the restricted access (one to many)
The above solution is sufficient in most cases. However, if you want to guarantee that only a specific set of users has access to your files, even if others know their names, the CÉCI system administrators need to be involved.
Consider sharing with only a restricted list of users in a ‘one to many’ fashion, while enforcing that others cannot access the data. Examples include installing software for everyone in your research group, or keeping all your group’s data in your home directory. In that case, one option is to have each authorized user added to your group. The above reasoning then applies to the members of your group, provided you replace every occurrence of others with group in the above paragraphs. Users not in your group will not be able to access your data at all (as long as your home directory grants no execute permission to others, of course).
Assuming the CÉCI system administrators have included ceciuser2 in
ceciuser1’s group:
ceciuser2@cecicluster:~$ id
uid=3000014(ceciuser2) gid=3000014(ceciuser2) groups=3000014(ceciuser2),3000003(ceciuser1)
then ceciuser2 has access to any file or directory of ceciuser1 that is
group-readable. Beware: under the default settings, that usually means
all of ceciuser1’s files!
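To check what a group member would actually be able to read, you can list the group-readable entries of a directory. This sketch assumes GNU find (for -printf). The name list_group_readable is hypothetical, and the function only inspects the group read bit, not ACLs or the permissions of parent directories.

```shell
#!/usr/bin/env bash
# Sketch: list entries under DIR whose group read bit (octal 040) is set,
# i.e. what any member of the owning group could read, provided the
# directories on the way grant the group execute permission.
# Relies on GNU find for -printf.
list_group_readable() {
    find "$1" -mindepth 1 -perm -040 -printf '%M %g %p\n'
}

# On the cluster, for instance:
#   id -nG ceciuser2           # groups ceciuser2 belongs to
#   list_group_readable ~      # what your group members could read
#   chmod g-rwx ~/private_dir  # withdraw group access where needed
```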
Note
CÉCI projects (Tier-1 projects, Common_storage projects, etc.) come with a dedicated UNIX group that can be used as described above.
Enforcing the restricted access (many to many)
Suppose you want to share with a larger group, allow everyone in the group to share as well, and still enforce that others cannot access the data. The most convenient option is then to create a dedicated UNIX group. The reasoning in the previous section applies, provided you first change the group to which the data belong.
Assuming the CÉCI system administrators have created a group
ourgroup with ceciuser1 and ceciuser2,
ceciuser1@cecicluster:~ $ getent group ourgroup
ourgroup:*:4999998:ceciuser1,ceciuser2
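then ceciuser1 can share the shared_dir directory with chgrp. This changes the group of the directory only; chgrp -R would also change the group of everything it contains: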
ceciuser1@cecicluster:~ $ chgrp ourgroup shared_dir
ceciuser1@cecicluster:~ $ ls -l ~
drwxr-x--- ceciuser1 ourgroup Aug 12 16:20 shared_dir
drwxr-x--- ceciuser1 ceciuser1 Aug 12 16:20 private_dir
-rw-r--r-- ceciuser1 ceciuser1 Aug 12 16:20 fileA
-rw-r----- ceciuser1 ceciuser1 Aug 12 16:20 fileB
Note the permissions of shared_dir shown here: drwxr-x---, i.e. with all
access for others already removed, for instance with chmod o= shared_dir.
They give members of the ourgroup group read access to shared_dir but no
write access, and give no access at all to users outside ourgroup. Should
ceciuser1 want other members of the group to be able to write files in
his directory, that can easily be done:
ceciuser1@cecicluster:~ $ chmod g+w shared_dir
ceciuser1@cecicluster:~ $ ls -l ~
drwxrwx--- ceciuser1 ourgroup Aug 12 16:20 shared_dir
drwxr-x--- ceciuser1 ceciuser1 Aug 12 16:20 private_dir
-rw-r--r-- ceciuser1 ceciuser1 Aug 12 16:20 fileA
-rw-r----- ceciuser1 ceciuser1 Aug 12 16:20 fileB
The above chmod command reads, in plain English, “add (+) write (w) permission for members of the group (g) owning shared_dir”.
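For a directory shared by several members of a group such as ourgroup, a common refinement goes beyond the steps above: setting the setgid bit on the directory, so that files created inside it automatically belong to the group. Below is a bash sketch under that assumption. The name setup_group_dir is hypothetical, and chgrp only succeeds if you own the directory and are a member of the target group.

```shell
#!/usr/bin/env bash
# Sketch: prepare DIR so that every member of GROUP can read and write in it.
# The setgid bit (the leading 2 in 2770) goes beyond the steps above: on
# Linux it makes entries created inside DIR belong to GROUP instead of the
# creator's primary group, so they stay shared.
setup_group_dir() {
    local dir=$1 group=$2
    mkdir -p "$dir" || return 1
    chgrp "$group" "$dir" || return 1
    chmod 2770 "$dir"    # rwx for owner and group, setgid, nothing for others
}

# Usage on the cluster, with the example group from above:
#   setup_group_dir ~/shared_dir ourgroup
#   ls -ld ~/shared_dir      # mode shown as drwxrws---, group ourgroup
```

With a umask of 0022, files created in such a directory get mode 644. Other group members can then read them but not modify them. To make new files group-writable, use a umask of 0002 or run chmod g+w on them afterwards. As with chgrp above, these commands apply to the directory only; existing content needs chgrp -R.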
Concluding remarks
- UNIX groups corresponding to Tier-1 projects are also created on the CÉCI clusters and can be used there.
- On the clusters, the umask (the configuration setting that determines which permission bits are removed from newly created files) is set to 0002 or 0022. This means that by default your files are created with permissions 664 or 644 respectively (775 or 755 for directories).
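The umask arithmetic can be made explicit. Files created by typical programs (touch, editors) start from mode 0666, new directories start from 0777, and every bit set in the umask is cleared. Below is a small bash sketch of that computation; perms_from_umask is a hypothetical name, and the umask builtin prints your current value:

```shell
#!/usr/bin/env bash
# Worked example of the umask arithmetic: clear from 0666 (files) and
# 0777 (directories) every bit that is set in the umask.
perms_from_umask() {
    local -i mask=$(( 8#$1 ))     # read the umask argument as octal
    printf 'files %03o  directories %03o\n' \
        $(( 0666 & ~mask )) $(( 0777 & ~mask ))
}

perms_from_umask 0002   # files 664  directories 775
perms_from_umask 0022   # files 644  directories 755
umask                   # your current umask, e.g. 0022
```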