SupRsync Agent
The SupRsync agent keeps a local directory synced with a remote server. It continuously copies new files to their destination, verifying each copy by comparing md5 checksums, and deletes the local files after a specified amount of time once the local and remote checksums match.
usage: python3 agent.py [-h] --archive-name ARCHIVE_NAME --remote-basedir
REMOTE_BASEDIR --db-path DB_PATH [--ssh-host SSH_HOST]
[--ssh-key SSH_KEY]
[--delete-local-after DELETE_LOCAL_AFTER]
[--max-copy-attempts MAX_COPY_ATTEMPTS]
[--copy-timeout COPY_TIMEOUT]
[--cmd-timeout CMD_TIMEOUT]
[--files-per-batch FILES_PER_BATCH]
[--sleep-time SLEEP_TIME] [--compression]
[--bwlimit BWLIMIT] --suprsync-file-root
SUPRSYNC_FILE_ROOT
Agent Options
- --archive-name
Name of managed archive. Determines which files should be copied
- --remote-basedir
Base directory on the remote server where files will be copied
- --db-path
Path to the suprsync sqlite db
- --ssh-host
Remote host to copy files to (e.g. ‘<user>@<host>’). If None, will copy files locally
- --ssh-key
Path to ssh-key needed to access remote host
- --delete-local-after
Time (sec) after which this agent will delete local copies of successfully transferred files. If None, will not delete files.
- --max-copy-attempts
Number of failed copy attempts before the agent will stop trying to copy a file
Default: 5
- --copy-timeout
Time (sec) before the rsync command will timeout
- --cmd-timeout
Time (sec) before remote commands will timeout
- --files-per-batch
Number of files to copy over per batch. Default is None, which will copy over all available files.
- --sleep-time
Time to sleep (sec) in between copy iterations
Default: 60
- --compression
Activate gzip on data transfer (rsync -z)
Default: False
- --bwlimit
Bandwidth limit arg (passed through to rsync)
- --suprsync-file-root
Local path where agent will write suprsync files
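Putting the required arguments together, a minimal command-line invocation might look like the following (the paths, host, and key below are placeholders taken from the examples in this document, not defaults):

```shell
python3 agent.py \
  --archive-name timestreams \
  --remote-basedir /path/to/base/dir/timestreams \
  --db-path /data/so/dbs/suprsync.db \
  --suprsync-file-root /data/so/suprsync \
  --ssh-host '<user>@<hostname>' \
  --ssh-key '<path_to_ssh_key>' \
  --delete-local-after 604800
```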
Configuration File Examples
Below is an example of what the SupRsync configuration might look like on a smurf-server, with an instance copying g3 files and an instance copying smurf auxiliary files.
OCS Site Config
Below is an example configuration that copies files to the base directory /path/to/base/dir on a remote system. This will delete successfully transferred files after 7 days (604800 seconds):
{'agent-class': 'SupRsync',
'instance-id': 'timestream-sync',
'arguments':[
'--archive-name', 'timestreams',
'--remote-basedir', '/path/to/base/dir/timestreams',
'--db-path', '/data/so/dbs/suprsync.db',
'--ssh-host', '<user>@<hostname>',
'--ssh-key', '<path_to_ssh_key>',
'--delete-local-after', '604800',
'--max-copy-attempts', '10',
'--copy-timeout', '60',
'--cmd-timeout', '5',
'--suprsync-file-root', '/data/so/suprsync',
]},
{'agent-class': 'SupRsync',
'instance-id': 'smurf-sync',
'arguments':[
'--archive-name', 'smurf',
'--remote-basedir', '/path/to/base/dir/smurf',
'--db-path', '/data/so/dbs/suprsync.db',
'--ssh-host', '<user>@<hostname>',
'--ssh-key', '<path_to_ssh_key>',
'--delete-local-after', '604800',
'--max-copy-attempts', '10',
'--copy-timeout', '20',
'--cmd-timeout', '5',
'--suprsync-file-root', '/data/so/suprsync',
]},
Note
Make sure you add the public key corresponding to the ssh identity file you will be using to the remote server, e.g. with the ssh-copy-id command.
Note
If the user running the suprsync agent does not have permission to delete the local files, the agent will crash when trying to delete them if the --delete-local-after param is set. In such an environment, make sure this option is not used.
Docker Compose
Below is a sample docker-compose entry for the SupRsync agents.
Because the data we'll be transferring is owned by the cryo:smurf user, we set that as the user of the agent so it has the correct permissions. This is only possible because the cryo:smurf user is already built into the SupRsync docker image:
ocs-timestream-sync:
image: simonsobs/socs:latest
hostname: ocs-docker
user: cryo:smurf
network_mode: host
container_name: ocs-timestream-sync
environment:
- INSTANCE_ID=timestream-sync
- SITE_HUB=ws://${CB_HOST}:8001/ws
- SITE_HTTP=http://${CB_HOST}:8001/call
volumes:
- ${OCS_CONFIG_DIR}:/config
- /data:/data
- /home/cryo/.ssh:/home/cryo/.ssh
ocs-smurf-sync:
image: simonsobs/socs:latest
hostname: ocs-docker
user: cryo:smurf
network_mode: host
container_name: ocs-smurf-sync
environment:
- INSTANCE_ID=smurf-sync
- SITE_HUB=ws://${CB_HOST}:8001/ws
- SITE_HTTP=http://${CB_HOST}:8001/call
volumes:
- ${OCS_CONFIG_DIR}:/config
- /data:/data
- /home/cryo/.ssh:/home/cryo/.ssh
Note
If copying to a remote host from a docker container, the corresponding ssh key must also be mounted into the container, and it must belong to the user inside the container with permissions set to 600. The configuration above works because the cryo user in the container is the same as the one on the host, so the .ssh directory is already properly configured. If instead you run the container as, for instance, an ocs user, you may want to create a .ssh directory for the ocs user on the host (with correct permissions) and mount that directory into the container.
Copying Files
The SupRsync agent works by monitoring a table of files to be copied in a sqlite database. The full table is described here, but importantly it contains information such as the full local path, the remote path relative to a base directory determined by the SupRsync configuration, checksumming info, several timestamps, and an “archive” name which is used to determine which SupRsync agent should manage each file. This makes it possible for multiple SupRsync agents to monitor the same db, but write their archives to different base-directories or remote computers.
Note
This agent does not remove empty directories, since I can’t think of a foolproof way to determine whether or not more files will be written to a given directory, and pysmurf output directories will likely have unregistered files that stick around even after copied files have been removed. I think it’s probably best to have a separate cronjob make that determination and remove directory husks.
Adding Files to the SupRsyncFiles Database
To add a file to the SupRsync database, you can use the SupRsyncFilesManager class as follows:
from socs.db.suprsync import SupRsyncFilesManager
local_abspath = '/path/to/local/file.g3'
remote_relpath = 'remote/path/file.g3'
archive = 'timestreams'
srfm = SupRsyncFilesManager('/path/to/suprsync.db')
srfm.add_file(local_abspath, remote_relpath, archive)
The SupRsync agent will then copy the file /path/to/local/file.g3 to <remote_basedir>/remote/path/file.g3 on the remote server (or locally, if no ssh-host is specified). If the --delete-local-after option is used, the original file will be deleted after the specified amount of time.
Interfacing with Smurf
The primary use case of this agent is copying files from the smurf-server to a daq node or simons1. The pysmurf monitor agent now populates the files table with both pysmurf auxiliary files (using the archive name “smurf”) and g3 timestream files (using the archive name “timestreams”), as described in this section. On the smurf-server we’ll be running one SupRsync agent for each of these two archives.
Timecode Directories
For data packaging, file destinations are usually grouped into timecode directories, based on the first five digits of the ctime at which the file was created. Destination paths (or remote_paths in the db) will typically look like:
<remote_basedir>/<5-digit timecode>/<any-number-of-subdirs>/<filename>
where the remote-basedir is set in the suprsync site config, and everything
after is what is registered as the remote_path
in the suprsync database.
With this schema, it is important to transmit information that will allow downstream processors to determine whether a timecode directory is complete, as opposed to simply not yet copied over.
The SupRsync database will now automatically detect the timecode directory of new files that are added, and will track whether a timecode directory is complete (meaning suprsync expects no new files will be added) and synced (that all files in the timecode directory have been synced).
Once a timecode directory is complete and fully synced, suprsync will write a finalization file:
<timestamp>_<archive_name>_<dir timecode>_finalized.yaml
that will contain information about the timecode directory, including the
number of files that this suprsync instance has synced over, the sub-directories
that this suprsync instance has added to, the instance-id of the suprsync agent,
and the finalization time. The directory on the local host that these files will
be written to is set using the --suprsync-file-root
argument, and the
path on the remote host will be automatically generated.
Timecode Dir Completion Requirements
A timecode directory will be marked as complete if any of the following are true:
- There are files in the suprsync archive written to a newer timecode directory
- tc_now - tc_directory > 1, i.e. we are roughly one day past when the timecode directory ended
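The timecode arithmetic above can be sketched in Python. The helper names here are hypothetical, not the agent's internal API:

```python
# Hypothetical helpers illustrating the timecode completion rule;
# the agent's internal logic may differ.
import time

def timecode(ctime: float) -> int:
    """First five digits of a unix timestamp; one timecode spans 1e5 s (~1.16 days)."""
    return int(ctime // 1e5)

def roughly_complete(dir_timecode: int, now: float = None) -> bool:
    # Complete once we are more than one full timecode past the directory.
    if now is None:
        now = time.time()
    return timecode(now) - dir_timecode > 1
```

For instance, a file created at ctime 1686152766 lands in timecode directory 16861, and directory 16750 would be considered complete at that time.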
Example
For example, say we have a suprsync agent running on smurf-srv20 with instance-id smurf-sync-srv20 and suprsync-file-root = /data/so/suprsync. Suppose that at timestamp 1686152766 this agent detects that all files in the timecode directory 16750 of the smurf archive have been successfully synced. This agent will write the local file:
/data/so/suprsync/16861/smurf-sync-srv20/1686152766_smurf_16750_finalized.yaml
This file will be synced to the remote path:
<remote_basedir>/16861/suprsync/smurf-sync-srv20/1686152766_smurf_16750_finalized.yaml
where it can be processed by downstream data packaging software.
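Based on the fields listed above (file count, sub-directories, instance-id, and finalization time), the contents of a finalization file might look roughly like the following. The key names here are illustrative guesses, not the exact schema written by the agent:

```yaml
# Hypothetical finalization file contents; actual key names may differ.
archive_name: smurf
instance_id: smurf-sync-srv20
timecode: 16750
num_files: 1234          # files this suprsync instance synced for this timecode
subdirs:                 # sub-directories this instance added to
  - <subdir>
finalized_at: 1686152766
```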
Agent API
- class socs.agents.suprsync.agent.SupRsync(agent, args)[source]
Agent to rsync files to a remote (or local) destination, verify successful transfer, and delete local files after a specified amount of time.
- Parameters:
agent (OCSAgent) – OCS agent object
args (Namespace) – Namespace with parsed arguments
- agent
OCS agent object
- Type:
OCSAgent
- log
txaio logger object, created by the OCSAgent
- Type:
txaio.tx.Logger
- archive_name
Name of the managed archive. Sets which files in the suprsync db should be copied.
- Type:
string
- ssh_host
Remote host to copy data to. If None, will copy data locally.
- Type:
str, optional
- ssh_key
ssh-key to use to access the ssh host.
- Type:
str, optional
- remote_basedir
Base directory on the destination server to copy files to
- Type:
path
- db_path
Path of the sqlite db to monitor
- Type:
path
- delete_after
Seconds after which this agent will delete successfully copied files.
- Type:
float
- cmd_timeout
Time (sec) for which cmds run on the remote will timeout
- Type:
float
- copy_timeout
Time (sec) after which a copy command will timeout
- Type:
float
- run()[source]
Process - Main run process for the SupRsync agent. Continuously checks the suprsync db for files that need to be handled.
Notes
The session data object contains info about recent transfers:
>>> response.session['data']
{
  "activity": "idle",
  "timestamp": 1661284493.5398622,
  "last_copy": {
    "start_time": 1661284493.5333128,
    "files": [],
    "stop_time": 1661284493.5398622
  },
  "archive_stats": {
    "smurf": {
      "finalized_until": 1687797424.652119,
      "num_files": 3,
      "uncopied_files": 0,
      "last_file_added": "/path/to/file",
      "last_file_copied": "/path/to/file"
    }
  },
  "counters": {
    "iterations": 1,
    "copies": 0,
    "errors_timeout": 0,
    "errors_nonzero": 0
  }
}
Supporting APIs
SupRsyncFiles Table
- class socs.db.suprsync.SupRsyncFile(**kwargs)[source]
Files table utilized by the SupRsync agent.
- local_path
Absolute path of the local file to be copied
- Type:
String
- local_md5sum
locally calculated checksum
- Type:
String
- archive_name
Name of the archive, i.e. timestreams or smurf. Each archive is managed by its own SupRsync instance, so they can be copied to different base-dirs or hosts.
- Type:
String
- remote_path
Path of the file on the remote server, relative to the base-dir specified in the SupRsync agent config.
- Type:
String
- remote_md5sum
Md5sum calculated on remote machine
- Type:
String, optional
- timestamp
Timestamp that file was added to db
- Type:
Float
- copied
Time at which file was transferred
- Type:
Float, optional
- removed
Time at which file was removed from local server.
- Type:
Float, optional
- failed_copy_attempts
Number of failed copy attempts
- Type:
Int
- deletable
Whether file should be deleted after copying
- Type:
Bool
- ignore
If true, file will be ignored by SupRsync agent and not included in finalized_until.
- Type:
Bool
SupRsyncFiles Manager
- class socs.db.suprsync.SupRsyncFilesManager(db_path, create_all=True, echo=False)[source]
Helper class for accessing and adding entries to the SupRsync files database.
- Parameters:
db_path (path) – path to sqlite db
create_all (bool) – Create table if it hasn’t been generated yet.
echo (bool) – If true, writes sql statements to stdout
- get_finalized_until(archive_name, session=None)[source]
Returns a timestamp before which all files are either successfully copied or ignored. If all files are copied, returns the current time.
- Parameters:
archive_name (String) – Archive name to get finalized_until for
session (sqlalchemy session) – SQLAlchemy session to use. If none is passed, will create a new session
- add_file(local_path, remote_path, archive_name, local_md5sum=None, timestamp=None, session=None, deletable=True)[source]
Adds file to the SupRsyncFiles table.
- Parameters:
local_path (String) – Absolute path of the local file to be copied
remote_path (String) – Path of the file on the remote server, relative to the base-dir specified in the SupRsync agent config.
archive_name (String) – Name of the archive, i.e. timestreams or smurf. Each archive is managed by its own SupRsync instance, so they can be copied to different base-dirs or hosts.
local_md5sum (String, optional) – locally calculated checksum. If not specified, will calculate md5sum automatically.
session (sqlalchemy session) – Session to use to add the SupRsyncFile. If None, will create a new session and commit afterwards.
deletable (bool) – If true, can be deleted by suprsync agent
- get_copyable_files(archive_name, session=None, max_copy_attempts=None, num_files=None)[source]
- Gets all SupRsyncFiles that are copyable, meaning they satisfy:
- local and remote md5sums do not match
- failed copy attempts are below the max number of attempts
- Parameters:
archive_name (string) – Name of archive to get files from
session (sqlalchemy session) – Session to use to get files. If none is specified, one will be created. You need to specify this if you wish to change file data and commit afterwards.
max_copy_attempts (int) – Max number of failed copy attempts
num_files (int) – Number of files to return
- get_deletable_files(archive_name, delete_after, session=None)[source]
Gets all files that are deletable, meaning that the local and remote md5sums match, and they have existed longer than delete_after seconds.
- Parameters:
archive_name (str) – Name of archive to pull files from
delete_after (float) – Time since creation (in seconds) for which it’s ok to delete files.
session (sqlalchemy session) – Session to use to query files.
- get_known_files(archive_name, session=None, min_ctime=None)[source]
Gets all files. This can be used to help avoid double-registering files.
- Parameters:
archive_name (str) – Name of archive to pull files from
session (sqlalchemy session) – Session to use to query files.
min_ctime (float, optional) – minimum ctime to use when querying files.
TimecodeDir Table
- class socs.db.suprsync.TimecodeDir(**kwargs)[source]
Table for information about ‘timecode’ directories. These are directories in particular archives such as ‘smurf’ that we care about tracking whether or not they’ve completed syncing.
Timecode directories must start with a 5-digit time-code.
- timecode
Timecode for directory. Must be 5 digits, and will increment roughly once per day.
- Type:
int
- archive_name
Archive the directory is in.
- Type:
str
- completed
True if we expect no more files to be added to this directory.
- Type:
bool
- synced
True if all files in this directory have been synced to the remote.
- Type:
bool
- finalized
True if the ‘finalization’ file has been written and added to the db.
- Type:
bool
- finalize_file_id
ID for the SupRsyncFile object that is the finalization file for this timecode dir.
- Type:
int