SupRsync Agent

The SupRsync agent keeps a local directory synced with a remote server. It continuously copies over new files to its destination, verifying the copy by checking the md5sum and deleting the local files after a specified amount of time if the local and remote checksums match.

usage: python3 agent.py [-h] --archive-name ARCHIVE_NAME --remote-basedir
                        REMOTE_BASEDIR --db-path DB_PATH [--ssh-host SSH_HOST]
                        [--ssh-key SSH_KEY]
                        [--delete-local-after DELETE_LOCAL_AFTER]
                        [--max-copy-attempts MAX_COPY_ATTEMPTS]
                        [--copy-timeout COPY_TIMEOUT]
                        [--cmd-timeout CMD_TIMEOUT]
                        [--files-per-batch FILES_PER_BATCH]
                        [--sleep-time SLEEP_TIME] [--compression]
                        [--bwlimit BWLIMIT] --suprsync-file-root
                        SUPRSYNC_FILE_ROOT

Agent Options

--archive-name

Name of managed archive. Determines which files should be copied

--remote-basedir

Base directory on the remote server where files will be copied

--db-path

Path to the suprsync sqlite db

--ssh-host

Remote host to copy files to (e.g. ‘<user>@<host>’). If None, will copy files locally

--ssh-key

Path to ssh-key needed to access remote host

--delete-local-after

Time (sec) after which this agent will delete local copies of successfully transfered files. If None, will not delete files.

--max-copy-attempts

Number of failed copy attempts before the agent will stop trying to copy a file

Default: 5

--copy-timeout

Time (sec) before the rsync command will timeout

--cmd-timeout

Time (sec) before remote commands will timeout

--files-per-batch

Number of files to copy over per batch. Default is None, which will copy over all available files.

--sleep-time

Time to sleep (sec) in between copy iterations

Default: 60

--compression

Activate gzip on data transfer (rsync -z)

Default: False

--bwlimit

Bandwidth limit arg (passed through to rsync)

--suprsync-file-root

Local path where agent will write suprsync files

Configuration File Examples

Below is an example of what the SupRsync configuration might look like on a smurf-server, with an instance copying g3 files and an instance copying smurf auxiliary files.

OCS Site Config

Below is an example configuration that copies files to the base directory /path/to/base/dir on a remote system. This will delete successfully transferred files after 7 days, or 604800 seconds:

{'agent-class': 'SupRsync',
 'instance-id': 'timestream-sync',
 'arguments':[
   '--archive-name', 'timestreams',
   '--remote-basedir', '/path/to/base/dir/timestreams',
   '--db-path', '/data/so/dbs/suprsync.db',
   '--ssh-host', '<user>@<hostname>',
   '--ssh-key', '<path_to_ssh_key>',
   '--delete-local-after', '604800',
   '--max-copy-attempts', '10',
   '--copy-timeout', '60',
   '--cmd-timeout', '5',
   '--suprsync-file-root', '/data/so/suprsync',
   ]},

{'agent-class': 'SupRsync',
 'instance-id': 'smurf-sync',
 'arguments':[
   '--archive-name', 'smurf',
   '--remote-basedir', '/path/to/base/dir/smurf',
   '--db-path', '/data/so/dbs/suprsync.db',
   '--ssh-host', '<user>@<hostname>',
   '--ssh-key', '<path_to_ssh_key>',
   '--delete-local-after', '604800',
   '--max-copy-attempts', '10',
   '--copy-timeout', '20',
   '--cmd-timeout', '5',
   '--suprsync-file-root', '/data/so/suprsync',
   ]},

Note

Make sure you add the public-key corresponding to the ssh id file you will be using to the remote server using the ssh-copy-id function.

Note

If for some reason the user running the suprsync agent does not have file permissions, if the --delete-local-after param is set the agent will crash when trying to delete files. In such an environment, make sure this option is not used.

Docker Compose

Below is a sample docker-compose entry for the SupRsync agents. Because the data we’ll be transfering is owned by the cryo:smurf user, we set that as the user of the agent so it has the correct permissions. This is only possible because the cryo:smurf user is already built into the SuprSync docker:

ocs-timestream-sync:
     image: simonsobs/socs:latest
     hostname: ocs-docker
     user: cryo:smurf
     network_mode: host
     container_name: ocs-timestream-sync
     environment:
         - INSTANCE_ID=timestream-sync
         - SITE_HUB=ws://${CB_HOST}:8001/ws
         - SITE_HTTP=http://${CB_HOST}:8001/call
     volumes:
         - ${OCS_CONFIG_DIR}:/config
         - /data:/data
         - /home/cryo/.ssh:/home/cryo/.ssh

ocs-smurf-sync:
     image: simonsobs/socs:latest
     hostname: ocs-docker
     user: cryo:smurf
     network_mode: host
     container_name: ocs-smurf-sync
     environment:
         - INSTANCE_ID=smurf-sync
         - SITE_HUB=ws://${CB_HOST}:8001/ws
         - SITE_HTTP=http://${CB_HOST}:8001/call
     volumes:
         - ${OCS_CONFIG_DIR}:/config
         - /data:/data
         - /home/cryo/.ssh:/home/cryo/.ssh

Note

If copying to a remote host from a docker container it is required that the corresponding ssh-key is also mounted into the container, and that it belongs to the user inside of the container with permission 600. The configuration above works because the cryo user in the container is the same as the one on the host, so the .ssh directory is already properly configured. For instance if using an ocs user in the docker, you may want to add a .ssh directory for the ocs user on the host (with correct permissions) and mount that directory into the container.

Copying Files

The SupRsync agent works by monitoring a table of files to be copied in a sqlite database. The full table is described here, but importantly it contains information such as the full local path, the remote path relative to a base directory determined by the SupRsync cfg, checksumming info, several timestamps, and an “archive” name which is used to determine which SupRsync agent should manage each file. This makes it possible for multiple SupRsync agents to monitor the same db, but write their archives to different base-directories or remote computers.

Note

This agent does not remove empty directories, since I can’t think of a foolproof way to determine whether or not more files will be written to a given directory, and pysmurf output directories will likely have unregistered files that stick around even after copied files have been removed. I think it’s probably best to have a separate cronjob make that determination and remove directory husks.

Adding Files to the SupRsyncFiles Database

To add a file to the SupRsync database, you can use the SupRsyncFiles Manager class as follows:

from socs.db.suprsync import SupRsyncFilesManager

local_abspath = '/path/to/local/file.g3'
remote_relpath = 'remote/path/file.g3'
archive = 'timestreams'
srfm = SupRsyncFilesManager('/path/to/suprsync.db')
srfm.add_file(local_abspath, remote_relpath, archive)

The SupRsync agent will then copy the file /path/to/local/file.g3 to the <remote_basedir>/remote/path/file.g3 on the remote server (or locally if none is specified). If the --delete-local-after option is used, the original file will be deleted after the specified amount of time.

Interfacing with Smurf

The primary use case of this agent is copying files from the smurf-server to a daq node or simons1. The pysmurf monitor agent now populates the files table with both pysmurf auxiliary files (using the archive name “smurf”) and g3 timestream files (using the archive name “timestreams”), as described in this section. On the smurf-server we’ll be running one SupRsync agent for each of these two archives.

Timecode Directories

For data packaging, file destinations are usually grouped into timecode directories, based on the first 5-digits of the ctime at which the file was created. Destination paths (or remote_paths in the db) will typically look like:

<remote_basedir>/<5-digit timecode>/<any-number-of-subdirs>/<filename>

where the remote-basedir is set in the suprsync site config, and everything after is what is registered as the remote_path in the suprsync database.

With this schema, it is important to transmit information that will allow downstream processors to determine whether or not a timecode directory is complete, as opposed to not copied over yet.

The SupRsync database will now automatically detect the timecode directory of new files that are added, and will track whether a timecode directory is complete (meaning suprsync expects no new files will be added) and synced (that all files in the timecode directory have been synced).

Once a timecode directory is complete and fully synced, suprsync will write a finalization file:

<timestamp>_<archive_name>_<dir timecode>_finalized.yaml

that will contain information about the timecode directory, including the number of files that this suprsync instance has synced over, the sub-directories that this suprsync instance has added to, the instance-id of the suprsync agent, and the finalization time. The directory on the local host that these files will be written to is set using the --suprsync-file-root argument, and the path on the remote host will be automatically generated.

Timecode Dir Completion Requirements

A timecode directory will be marked as complete if any of the following are true:

  • There are files in the suprsync archive written to a newer timecode directory

  • tc_now - tc_directory > 1, or we are roughly 1 day past when the timecode directory ended.

Example

For example, say we have a suprsync agent running on smurf-srv20 with instance-id smurf-sync-srv20, with suprsync-file-root = /data/so/suprsync. Suppose that at timestamp 1686152766 this agent detects that all files in the timecode directory 16750 of the smurf archive have been successfully synced. This agent will write the local file:

/data/so/suprsync/16861/smurf-sync-srv20/1686152766_smurf_16750_finalized.yaml

This file will be synced to the remote path:

<remote_basedir>/16861/suprsync/smurf-sync-srv20/1686152766_smurf_16750_finalized.yaml

where it can be processed by downstream data packaging software.

Agent API

class socs.agents.suprsync.agent.SupRsync(agent, args)[source]

Agent to rsync files to a remote (or local) destination, verify successful transfer, and delete local files after a specified amount of time.

Parameters:
  • agent (OCSAgent) – OCS agent object

  • args (Namespace) – Namespace with parsed arguments

agent

OCS agent object

Type:

OCSAgent

log

txaio logger object, created by the OCSAgent

Type:

txaio.tx.Logger

archive_name

Name of the managed archive. Sets which files in the suprsync db should be copied.

Type:

string

ssh_host

Remote host to copy data to. If None, will copy data locally.

Type:

str, optional

ssh_key

ssh-key to use to access the ssh host.

Type:

str, optional

remote_basedir

Base directory on the destination server to copy files to

Type:

path

db_path

Path of the sqlite db to monitor

Type:

path

delete_after

Seconds after which this agent will delete successfully copied files.

Type:

float

cmd_timeout

Time (sec) for which cmds run on the remote will timeout

Type:

float

copy_timeout

Time (sec) after which a copy command will timeout

Type:

float

run()[source]

Process - Main run process for the SupRsync agent. Continuosly checks the suprsync db checking for files that need to be handled.

Notes

The session data object contains info about recent transfers:

>>> response.session['data']
{
  "activity": "idle",
  "timestamp": 1661284493.5398622,
  "last_copy": {
    "start_time": 1661284493.5333128,
    "files": [],
    "stop_time": 1661284493.5398622
  },
  "archive_stats": {
    "smurf": {
        "finalized_until": 1687797424.652119,
        "num_files": 3,
        "uncopied_files": 0,
        "last_file_added": "/path/to/file",
        "last_file_copied": "/path/to/file"
    }
  },
  "counters": {
    "iterations": 1,
    "copies": 0,
    "errors_timeout": 0,
    "errors_nonzero": 0
  },
}

Supporting APIs

SupRsyncFiles Table

class socs.db.suprsync.SupRsyncFile(**kwargs)[source]

Files table utilized by the SupRsync agent.

local_path

Absolute path of the local file to be copied

Type:

String

local_md5sum

locally calculated checksum

Type:

String

archive_name

Name of the archive, i.e. timestreams or smurf. Each archive is managed by its own SupRsync instance, so they can be copied to different base-dirs or hosts.

Type:

String

remote_path

Path of the file on the remote server relative to the base-dir. specified in the SupRsync agent config.

Type:

String

remote_md5sum

Md5sum calculated on remote machine

Type:

String, optional

timestamp

Timestamp that file was added to db

Type:

Float

copied

Time at which file was transfered

Type:

Float, optional

removed

Time at which file was removed from local server.

Type:

Float, optional

failed_copy_attempts

Number of failed copy attempts

Type:

Int

deletable

Whether file should be deleted after copying

Type:

Bool

ignore

If true, file will be ignored by SupRsync agent and not included in finalized_until.

Type:

Bool

SupRsyncFiles Manager

class socs.db.suprsync.SupRsyncFilesManager(db_path, create_all=True, echo=False)[source]

Helper class for accessing and adding entries to the SupRsync files database.

Parameters:
  • db_path (path) – path to sqlite db

  • create_all (bool) – Create table if it hasn’t been generated yet.

  • echo (bool) – If true, writes sql statements to stdout

get_finalized_until(archive_name, session=None)[source]

Returns a timetamp for which all files preceding are either successfully copied, or ignored. If all files are copied, returns the current time.

Parameters:
  • archive_name (String) – Archive name to get finalized_until for

  • session (sqlalchemy session) – SQLAlchemy session to use. If none is passed, will create a new session

add_file(local_path, remote_path, archive_name, local_md5sum=None, timestamp=None, session=None, deletable=True)[source]

Adds file to the SupRsyncFiles table.

Parameters:
  • local_path (String) – Absolute path of the local file to be copied

  • remote_path (String) – Path of the file on the remote server relative to the base-dir. specified in the SupRsync agent config.

  • archive_name (String) – Name of the archive, i.e. timestreams or smurf. Each archive is managed by its own SupRsync instance, so they can be copied to different base-dirs or hosts.

  • local_md5sum (String, optional) – locally calculated checksum. If not specified, will calculate md5sum automatically.

  • session (sqlalchemy session) – Session to use to add the SupRsyncFile. If None, will create a new session and commit afterwards.

  • deletable (bool) – If true, can be deleted by suprsync agent

get_copyable_files(archive_name, session=None, max_copy_attempts=None, num_files=None)[source]
Gets all SupRsyncFiles that are copyable, meaning they satisfy:
  • local and remote md5sums do not match

  • Failed copy attempts is below the max number of attempts

Parameters:
  • archive_name (string) – Name of archive to get files from

  • session (sqlalchemy session) – Session to use to get files. If none is specified, one will be created. You need to specify this if you wish to change file data and commit afterwards.

  • max_copy_attempts (int) – Max number of failed copy atempts

  • num_files (int) – Number of files to return

get_deletable_files(archive_name, delete_after, session=None)[source]

Gets all files that are deletable, meaning that the local and remote md5sums match, and they have existed longer than delete_after seconds.

Parameters:
  • archive_name (str) – Name of archive to pull files from

  • delete_after (float) – Time since creation (in seconds) for which it’s ok to delete files.

  • session (sqlalchemy session) – Session to use to query files.

get_known_files(archive_name, session=None, min_ctime=None)[source]

Gets all files. This can be used to help avoid double-registering files.

Parameters:
  • archive_name (str) – Name of archive to pull files from

  • session (sqlalchemy session) – Session to use to query files.

  • min_ctime (float, optional) – minimum ctime to use when querying files.

TimecodeDir Table

class socs.db.suprsync.TimecodeDir(**kwargs)[source]

Table for information about ‘timecode’ directories. These are directories in particular archives such as ‘smurf’ that we care about tracking whether or not they’ve completed syncing.

Timecode directories must start with a 5-digit time-code.

timecode

Timecode for directory. Must be 5 digits, and will be roughly 1 a day.

Type:

int

archive_name

Archive the directory is in.

Type:

str

completed

True if we expect no more files to be added to this directory.

Type:

bool

synced

True if all files in this directory have been synced to the remote.

Type:

bool

finalized

True if the ‘finalization’ file has been written and added to the db.

Type:

bool

finalize_file_id

ID for the SupRsyncFile object that is the finalization file for this timecode dir.

Type:

int