Lightweight implementation of rsync specifically designed to regularly copy Bruker NMR datasets from instrument computers to a server.

greenwoodad/nmrsync

nmrsync

nmrsync is a bash script that periodically synchronizes Bruker NMR data to a data server using rsync. It takes an input file as an argument, which provides paths, SSH aliases, rsync options, and other settings. The script searches for files that have been modified recently (within the last AgeDay days, as set in the input file), looking in folders down to the level of RemoteDataPath/(user)/nmr/(data set). For example, if a new spectrum 1/ or 2/ appears in the data set folder, the data set is flagged for syncing, but if something deeper (e.g. a proc file in a pdata folder) is changed, it won't be flagged.
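
The depth-limited search described above can be illustrated with a short sketch. It is not taken from the nmrsync source: `recent_datasets` is a hypothetical helper, it runs against a local directory tree for simplicity, and it assumes the default (user)/nmr/(data set) layout and GNU find.

```bash
#!/usr/bin/env bash
# Sketch only: list data set folders whose own modification time is newer
# than age_days. recent_datasets is a hypothetical helper, not part of nmrsync.
# Assumes the layout <data_root>/<user>/nmr/<data set>.
recent_datasets() {
  local data_root=$1 age_days=$2
  # Depth 3 below data_root is the data set folder. A directory's mtime
  # changes when an entry (such as a new 1/ or 2/) is created directly inside
  # it, but not when a file deeper down (e.g. in a pdata folder) is edited.
  find "$data_root" -mindepth 3 -maxdepth 3 -type d -mtime "-$age_days" |
    LC_ALL=C sort
}
```

Calling `recent_datasets /opt/topspin/data 3` (the path is only an example) would print one data set folder per line for anything touched in the last three days.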

Before syncing, the script can search for folders that are identically named except for case differences (which are distinct on Linux but indistinguishable on Windows and Mac), as well as for folders that end in a period (which is permissible on Linux and Mac but not on Windows). The names of these spectra are then placed in SkipFileOld and they are not synced. Instead, an email is sent to the NMR manager, who can manually change the folder names once the data has finished acquiring. The renaming can also be done automatically using nmrfolderfix (https://github.com/greenwoodad/nmrfolderfix/).
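
As a rough illustration of the two checks, the sketch below reads folder names on standard input and reports the problematic ones. The function names `case_duplicates` and `trailing_period` are hypothetical, and the actual script may detect these cases differently.

```bash
#!/usr/bin/env bash
# Sketch only: both helpers read one folder name per line on stdin.
# case_duplicates and trailing_period are hypothetical names.

# Print every name that collides with another name when case is ignored.
case_duplicates() {
  awk '{ key = tolower($0); count[key]++; names[key] = names[key] $0 "\n" }
       END { for (key in count) if (count[key] > 1) printf "%s", names[key] }' |
    LC_ALL=C sort
}

# Print every name that ends in a period (not allowed on Windows).
trailing_period() {
  grep '\.$' || true   # grep exits 1 when nothing matches; treat that as success
}
```

For example, `ls /path/to/user/nmr | case_duplicates` would list names such as Sample1 and sample1 together, since they would collide on a case-insensitive filesystem.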

There is an option to perform a second rsync to a second local location by specifying a third path in the input file. This rsync is performed with its own customizable options. In the example input file, the second rsync does not copy permissions, owner, or group information because these cannot be changed on my mounted Windows filesystem.
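
A minimal sketch of the two-pass idea follows. The option strings are the defaults quoted in this README; everything else is an assumption for illustration: `sync_dataset` is a hypothetical helper, and the second pass is assumed to copy from the first local destination rather than from the instrument again.

```bash
#!/usr/bin/env bash
# Sketch only: sync_dataset is a hypothetical helper, not the nmrsync code.
# The option strings are the defaults listed in this README.
RsyncOptions_1='-auvr --protect-args'
RsyncOptions_2='-uvrltD --modify-window=1 --protect-args'

sync_dataset() {
  local remote_src=$1 local_dest=$2 mounted_dest=${3:-}
  local -a opts_1 opts_2
  # Split the option strings into words so each option is its own argument.
  read -r -a opts_1 <<< "$RsyncOptions_1"
  read -r -a opts_2 <<< "$RsyncOptions_2"

  # First pass: instrument computer to local server. -a implies -rlptgoD.
  rsync "${opts_1[@]}" "$remote_src" "$local_dest" || return

  # Optional second pass (assumed here to start from the local copy).
  # -rltD is -a without -p, -g and -o, so permissions, group and owner
  # are not copied.
  if [ -n "$mounted_dest" ]; then
    rsync "${opts_2[@]}" "$local_dest" "$mounted_dest"
  fi
}
```

A call might look like `sync_dataset 'nmrsu@AV400:/opt/topspin/data/alice/' /data/av400/alice/ /mnt/share/av400/alice/`, where the user name and all three paths are placeholders. The `--modify-window=1` option lets rsync treat timestamps that differ by up to one second as equal, which helps on filesystems with coarse timestamp resolution.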

I personally run this as a cron job every five minutes, and once a week with a second input file, to ensure data is still eventually transferred after network or power outages.

Prerequisites

This script requires a Linux operating system with rsync. It has been tested on CentOS 6.8, CentOS 7.5, and Ubuntu 20 on the local side and CentOS 7.5, CentOS 5.1, and RHEL 7.3 on the remote side. I've only tested this with Bruker NMR data, but future releases may be able to handle file structures generated by other instruments.

The email feature requires that the application sendmail is working on the machine running the script.

Installing

git clone https://github.com/greenwoodad/nmrsync

or

git clone https://(your github username)@github.com/greenwoodad/nmrsync.git

followed by:

chmod +x ./nmrsync/nmrsync

Getting Started

Setting up password-less ssh logins to instrument machines

Because this script is intended to be run as a cron job, it is necessary to authorize the local machine to access the remote machine(s) with password-less ssh login using ssh keys. Tutorials are available here:

Briefly:

  1. On the machine that will run the script and send emails, logged in as the user that will run it, run the command:
ssh-keygen -t rsa -b 4096

This will generate files ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub

Press enter at the prompt "Enter passphrase (empty for no passphrase):" to skip passphrase generation.

  2. Next, run this command (from the local machine) for each remote workstation:
ssh-copy-id remote_username@remote_ip_address

You will be prompted for the password for this remote workstation.

If ssh-copy-id is not available, you should be able to run this instead:

cat ~/.ssh/id_rsa.pub | ssh remote_username@remote_ip_address "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"

  3. Last, add SSH aliases to your hosts file. In /etc/hosts, add entries:
IPAddress DomainName SSHAlias

for each remote workstation.

For example:

198.51.100.50     dmx500.chem.university.edu       DMX500
198.51.100.54     av400.chem.university.edu        AV400
198.51.100.59     neo400.chem.university.edu       NEO400

The SSHAliases here should be the same SSHAliases you enter in the nmrsync input file.

You should now be able to SSH to the remote workstations without entering a password by typing:

ssh remote_username@SSHAlias

in addition to

ssh remote_username@IPAddress 

and

ssh remote_username@DomainName 

The first time you connect to each workstation, you will need to type "yes" in response to the question "Are you sure you want to continue connecting (yes/no)?". After this, you will be able to run the script automatically without manual password entry.
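
Before handing the script to cron, it can be useful to confirm that every alias really works without a prompt. The sketch below is one way to do that; `check_ssh` is a hypothetical helper and is not part of nmrsync. It relies on the OpenSSH `BatchMode` option, which makes ssh fail rather than ask for a password, so a missing key shows up as a failure. A host key that has not yet been accepted should show up the same way.

```bash
#!/usr/bin/env bash
# Sketch only: check_ssh is a hypothetical helper, not part of nmrsync.
# BatchMode=yes makes ssh fail instead of prompting, which mirrors how it
# behaves when run from cron.
check_ssh() {
  local target
  local failed=0
  for target in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" true; then
      echo "OK      $target"
    else
      echo "FAILED  $target"
      failed=1
    fi
  done
  return "$failed"
}
```

For the example hosts above, this could be run as `check_ssh remote_username@DMX500 remote_username@AV400 remote_username@NEO400`. The function returns a non-zero status if any login fails.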

Configuring the input file

In the input file (nmrsync_input) there are a number of parameters and paths to set:

  • ScriptsPath: Full path to the location of the main script and the input, emailtxt, and log folders on the local machine. Use full path!

  • ManagerEmail: Email address of the NMR facility manager.

  • AgeDay: How many days back to look for recent experiments to sync. Default is 3, which works well unless you run spectra that take longer than 3 days to acquire.

  • RsyncOptions_1: Rsync options for first rsync. Default is '-auvr --protect-args'

  • RsyncOptions_2: Rsync options for second rsync (optional). Default is '-uvrltD --modify-window=1 --protect-args'

  • SkipFlag: Defines which folders are not synced. 'period' to skip folders ending in a period, 'dup' to skip folders with case-insensitive duplicates, 'both', or 'none'. Default is 'both'. Note that if a different value of SkipFlag is specified with -s when the script is run, it overrules the value specified in the input file.

  • Instrument: Name of instrument. Can be anything (no spaces) but make sure it is unique (not entered twice in the table).

  • SSHAlias: Alias for password-less SSH to this instrument computer.

  • RemoteUser: User on the remote computer that you can SSH as.

  • /nmr folder?: Set this to 'y' for the default /(user)/nmr/(data set)/(expt #) data organization on the remote computer. Set it to 'n' for data organized as /(user)/(data set)/(expt #)

  • RemoteDataPath: Full path containing NMR data on the remote computer. Topspin/ICON-NMR usernames should be found in this folder. Use full path!

  • LocalDataPath: Full path on local computer to transfer the data to. Use full path!

  • MountedPath: (optional) A second path on the local computer to transfer data to. This can be an external hard drive or a mounted Windows file share, for instance. (There's no requirement that this actually refer to a mounted file system, but it should be local.)

IMPORTANT: When editing this file, entries should be separated by either a tab or multiple spaces.

Instruments in the instrument table can be commented out with a #.
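
To show why the separator rule matters, here is a sketch of how a row of the instrument table could be split. It is not the parser used by nmrsync: `parse_instruments` is a hypothetical helper, and the column order (Instrument, SSHAlias, RemoteUser, /nmr folder flag, RemoteDataPath, LocalDataPath, MountedPath) is assumed from the order of the parameter list above. Splitting on a tab or on two or more spaces means a single space inside a path does not break a field.

```bash
#!/usr/bin/env bash
# Sketch only: parse_instruments is a hypothetical helper. It reads instrument
# table rows on stdin and prints the seven assumed columns separated by "|".
parse_instruments() {
  awk -F '\t+|  +' '
    /^[[:space:]]*#/ { next }   # instrument commented out with #
    /^[[:space:]]*$/ { next }   # blank line
    {
      sub(/^[[:space:]]+/, "")  # drop leading whitespace; awk re-splits $0
      print $1 "|" $2 "|" $3 "|" $4 "|" $5 "|" $6 "|" $7
    }
  '
}
```

Rows without a MountedPath simply produce an empty last field.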

NOTE: Additional modifications can be made to the variables 'SendMailPath', 'ManualFlag', 'ExcludeFlag', and 'FullFlag' at the top of the script itself. Most of these are the default values for options that can be provided when the script is run (see Usage, below).

Usage

nmrsync [OPTIONS]... path/to/nmrsync_input

Options

-h, -?, --help Show help message.

-i, --input Set input file (flag optional).

-s, --skip (default 'both') Set to 'period' to skip folders ending in period, 'dup' to skip case-insensitive duplicates, 'both' to skip both and 'none' to skip none.

-m, --manual (default n) Set to y to operate in manual mode (enter password instead of using SSH keys--not recommended).

-e, --excludelist (default y) Set to n to ignore the instrument-specific exclude list in the input folder (processed data will then be copied).

-f, --full (default n) Set to y to copy over all data instead of just recently added data.

The defaults here can be modified at the top of the script itself.
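
The precedence between the command line, the input file, and the built-in default can be pictured with the following sketch for the skip setting. It is an illustration under stated assumptions, not the script's own logic: `resolve_skip_flag` is a hypothetical helper, and whether nmrsync validates the value in this way is not documented here.

```bash
#!/usr/bin/env bash
# Sketch only: resolve_skip_flag is a hypothetical helper, not part of nmrsync.
# Usage: resolve_skip_flag <value from -s, may be empty> <value from input file, may be empty>
resolve_skip_flag() {
  local cli_value=$1 file_value=$2
  # Precedence: command line, then input file, then the documented default.
  local flag=${cli_value:-${file_value:-both}}
  case "$flag" in
    period|dup|both|none) echo "$flag" ;;
    *) echo "invalid SkipFlag: $flag" >&2; return 1 ;;
  esac
}
```

With this helper, `resolve_skip_flag none dup` yields none, because the value given with -s wins over the one in the input file.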

To run this as a cron job, make an entry in your crontab like this:

*/5 * * * * /path/to/nmrsync "/path/to/input/nmrsync_input"
40 6 * * 0 /path/to/nmrsync "/path/to/input/nmrsync_input_weekly"

In the preceding example, the second input file is configured to look for data that has been collected over the last week. The "fast" version is set to run every five minutes (*/5), while the "slow" version is set to run on Sunday (0) at 6:40 AM (40 6).

Contributing

Pull requests and bug reports are welcome.

Authors

License

MIT
