/work


/work is a meta-filesystem which provides several hundred terabytes of persistent (but not backed up!) storage for various projects, exported via NFS to on-site CUE systems (including the Farm).  Its directory structure is hierarchical, usually first by Hall, then experiment, e.g. a /work area for experiment e98123 in Hall A would be /work/halla/e98123.

There is no backup copy of data stored in /work.  Data, including snapshots (which are just second references to the original, single copy of data), may be permanently lost in a variety of circumstances, including but not limited to unintentional deletion or overwriting (whether by you or someone else), administrative mistakes, malicious tampering, equipment failure, hardware damage (e.g. in a fire, flood, or earthquake), and software bugs resulting in filesystem corruption.  Do not keep anything on /work which you cannot restore from another, non-/work location (e.g. GitHub, /group, or /home) or regenerate automatically.

File transfers

Use Globus.  If you'd rather use SSH (e.g. rsync, scp, or sftp), you will need to proxy your connection through one of JLab's login gateways, since /work is no longer mounted on any host that accepts direct SSH connections from outside JLab's network.  In particular, the mounts were removed from ftp.jlab.org. on April 18th, 2023.
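
For example, a minimal rsync invocation proxied through a gateway might look like the following; the gateway and interactive host names (login.jlab.org and ifarm.jlab.org) and the file path are placeholders, so substitute whatever you normally use.

# Pull a file from /work to the current local directory, jumping through a login
# gateway with ssh -J (ProxyJump).  Hostnames and the path are illustrative only.
rsync -av -e 'ssh -J user@login.jlab.org' \
    user@ifarm.jlab.org:/work/halla/e98123/output.root ./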

Quotas

Each Hall has been allocated a set amount of work disk space, which it manages and distributes among the experiments taking place in that Hall.  Requests for work disk space, or for increased allocations, should be made to the individual in charge of computing for the Hall.  It is up to the Hall and/or experiment to implement procedures for purging files when more space is required.  Automated procedures that some have implemented include deleting the least recently used files, deleting the oldest files, and deleting large files.  Check with the individual in charge of computing for the Hall and experiment to find out whether such procedures are in place for a given work disk area.  Current usage can be found on the scicomp.jlab.org. website.
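
As an illustration only (not an actual JLab procedure), a least-recently-used purge of an experiment's own area could be as simple as the following; the path and the 90-day threshold are hypothetical, and you should confirm any such policy with your Hall's computing contact before deleting anything.

# List files not read in the last 90 days, then (optionally) delete them.
# Path and threshold are placeholders; review the list before purging.
find /work/halla/e98123 -type f -atime +90 -print
# find /work/halla/e98123 -type f -atime +90 -delete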

Paths

To facilitate incremental migration to new equipment, /work is structured as a filesystem of links to filesystems automounted in /w.  Make sure to use the /work path to access data, rather than the mount point of a specific host (i.e. the /w path): the /w path will change over time as hardware is updated, producing "no such file or directory" or "read-only file system" errors for anything that still references it.
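
You can see where a /work path currently resolves with readlink; the resolved /w name below is hypothetical and will differ (and change over time) on the real systems, which is exactly why scripts and configuration should record the /work path instead.

# Show the current backing mount for a /work path (resolved name is illustrative).
readlink -f /work/halla/e98123
# -> /w/halla-sciwork/e98123   (example only; do not hard-code this)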

Performance

There are several servers which host /work filesystems, but any particular path is hosted by only one.  A job which runs well individually can easily overwhelm a /work server when several hundred or thousand copies run simultaneously on the Farm, making that server and the subfilesystems it hosts unresponsive for all users.  Therefore, be cautious when scheduling jobs that will read or write a lot of data on /work.  Consider staggering their release (if I/O intensity varies over a job's lifetime), using /cache or /volatile for large files (those filesystems are hosted by up to twelve servers in parallel), or using a compute node's local /scratch for small files (where the node can handle many operations in memory).
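
If your jobs run under the Farm's Slurm scheduler, one simple way to stagger them is a throttled job array, which caps how many tasks (and therefore how many simultaneous /work readers and writers) run at once; the script name, array size, and throttle below are placeholders.

# Run at most 50 of the 2000 array tasks at any one time.
sbatch --array=1-2000%50 analyze_run.sh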

Compression

We enable lz4 compression by default, since the overhead of that algorithm is generally regarded as small enough that the reduction in disk access is almost always a net benefit to performance.  This setting can result in du reporting a figure that is much smaller than the apparent length of a file.  Quotas are charged against the compressed figure.  Compression is configurable per subfilesystem, so let us know via your computing coordinator if you would like to disable compression or try a different algorithm.
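
To see the effect on a particular file, compare the apparent length with the space actually charged; the file path below is a placeholder.

# Apparent (uncompressed) length in bytes:
ls -l /work/halla/e98123/run042.dat
# Space actually consumed on disk (and charged against quota) after compression:
du -h /work/halla/e98123/run042.dat
# du can also report the apparent size, for a direct comparison:
du -h --apparent-size /work/halla/e98123/run042.dat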

Snapshots (which are not a backup!)

Since the fall of 2021, the /work/hall[abcd] filesystems (and a few others) have had snapshots enabled.  Snapshots protect you from some cases[1] of one type of data loss: mistaken modifications or deletions.  Other circumstances that may destroy data, as described above, will destroy the snapshot data at the same time as the original data, because the snapshots are merely an additional reference to the same data, not an actual copy.  Only an independent copy, stored elsewhere and preferably offline, can protect you from other risks, and can properly be termed a "backup."  You still must not keep anything on /work that you can't replace.

That said, if all you need is a previous version of a file, and the filesystem is otherwise uncompromised, you may be able to retrieve it from a snapshot.  Snapshots are available at the root of a subfilesystem under .zfs/snapshot (the timestamps in the labels are UTC), e.g.

lsh@ifarm1801 ~> ls /work/halla/moller12gev/.zfs/snapshot
zfs-auto-snap_daily-2022-07-02-0400/     zfs-auto-snap_hourly-2022-07-08-0300/
zfs-auto-snap_daily-2022-07-03-0400/     zfs-auto-snap_hourly-2022-07-08-0400/
zfs-auto-snap_daily-2022-07-04-0400/     zfs-auto-snap_hourly-2022-07-08-0500/
zfs-auto-snap_daily-2022-07-05-0400/     zfs-auto-snap_hourly-2022-07-08-0600/
zfs-auto-snap_daily-2022-07-06-0400/     zfs-auto-snap_hourly-2022-07-08-0700/
zfs-auto-snap_daily-2022-07-07-0400/     zfs-auto-snap_hourly-2022-07-08-0800/
zfs-auto-snap_daily-2022-07-08-0400/     zfs-auto-snap_hourly-2022-07-08-0900/
zfs-auto-snap_frequent-2022-07-08-1900/  zfs-auto-snap_hourly-2022-07-08-1000/
zfs-auto-snap_frequent-2022-07-08-1915/  zfs-auto-snap_hourly-2022-07-08-1100/
zfs-auto-snap_frequent-2022-07-08-1930/  zfs-auto-snap_hourly-2022-07-08-1200/
zfs-auto-snap_frequent-2022-07-08-1945/  zfs-auto-snap_hourly-2022-07-08-1300/
zfs-auto-snap_hourly-2022-07-07-2000/    zfs-auto-snap_hourly-2022-07-08-1400/
zfs-auto-snap_hourly-2022-07-07-2100/    zfs-auto-snap_hourly-2022-07-08-1500/
zfs-auto-snap_hourly-2022-07-07-2200/    zfs-auto-snap_hourly-2022-07-08-1600/
zfs-auto-snap_hourly-2022-07-07-2300/    zfs-auto-snap_hourly-2022-07-08-1700/
zfs-auto-snap_hourly-2022-07-08-0000/    zfs-auto-snap_hourly-2022-07-08-1800/
zfs-auto-snap_hourly-2022-07-08-0100/    zfs-auto-snap_hourly-2022-07-08-1900/
zfs-auto-snap_hourly-2022-07-08-0200/
lsh@ifarm1801 ~>
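
Restoring a previous version is then an ordinary copy out of the read-only snapshot directory; the snapshot label and file path below are illustrative, so list .zfs/snapshot as above to see what is actually available.

# Copy yesterday's version of a file back into place (label and path are examples).
cp -p /work/halla/moller12gev/.zfs/snapshot/zfs-auto-snap_daily-2022-07-08-0400/analysis/config.yaml \
      /work/halla/moller12gev/analysis/config.yaml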

Effect on (apparent) quota

The "Used" and "Quota"/"Size"/"Total" figures in the output of df/du and the display on our website show the current usage (i.e. without snapshots) and that amount plus what's available, respectively.2  When you delete something that was there when a snapshot was taken, it is no longer counted in df/du and starts consuming space in the snapshots until all snapshots referring to it are retired, and as the space used by snapshots thereby varies, the Quota/Size/Total will appear to change,3 and with the default schedule it takes a week for space to be made available again (the advantage, of course, being that you have that much time to restore the data).  There is not presently a way for users to directly query the size of snapshots.

While providing that benefit, the quota consumed by snapshots can leave a filesystem full in a way that users cannot immediately resolve themselves.  If that happens, please submit an incident.  Please also let us know via your computing coordinator if you would like us to delete snapshots, change their schedule, disable them entirely, or explore other options for snapshot accounting (e.g. oversubscribing or reserving space for snapshots), whether for one or several filesystems.


[1] The data you want to restore must have been on the filesystem when a snapshot was taken, and you must notice and act before all snapshots referring to the desired data are retired, which happens automatically on a schedule and may also happen manually.

[2] Ultimately, this goes back to the POSIX statfs(2) interface only providing two of the three figures, so Total, Used, and Available cannot all vary independently.  Usually the most important one is how many bytes could be written right now, and in filesystem configurations for which their relationship is not strictly additive (such as ZFS, and Lustre MDT inodes), for Used and Available to be current and actual, Total must vary.

[3] In the past, changes in the available space without a corresponding change in used space were due to oversubscription, where one subfilesystem would start using space available to several subfilesystems and therefore what was available to all would decrease.  We are no longer oversubscribed, so the space for snapshots has to come out of the space allocated to the subfilesystem.