Worker Node updates queue files too slowly #3
Sorry, I meant this...
This makes perfect sense. The best way to work around this is likely to either move away from launchd (not really problematic) or more intelligently label and migrate queue files when a job starts. I think the latter will be more universally usable (as aw-queue.pl is sort of a standalone app outside of Castor). I think the best way to accomplish this is to build some logic that datestamps a queue file at the start of an aw-queue call and cleans the file BEFORE the job runs. This way, if another restore call gets made before the first job is finished, the next call of aw-queue will reference the now empty (or possibly small) queue file.

Another way to work around this is to build a quick little widget on the desktop (I used AppleScript) to call aw-queue with the proper variables on double click. This way, you could completely eliminate the need to have it in launchd.

The final option would be to add another metadata field in CatDV called "submit restore" so that you could build an archive queue and then monitor that field to call your aw-queue script.

All great ideas here - now to find some time to implement them.
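For what it's worth, the "datestamp and migrate" idea could be sketched roughly like this. This is a hypothetical helper, not part of Castor; the function name and demo paths are illustrative only:

```shell
#!/bin/bash
# Rotate the queue file aside under a timestamped name and leave an empty
# queue behind, so restore calls made while a job is still running append
# to a fresh file instead of resubmitting the same paths.
rotate_queue() {
    local queue="$1"
    local stamp
    stamp=$(date +%Y%m%d-%H%M%S)
    if [[ -s "$queue" ]]; then
        mv "$queue" "${queue}.${stamp}"   # the running job reads the copy
        : > "$queue"                      # new calls see an empty queue
    fi
}

# demo on a temporary file
demo=$(mktemp -d)
printf '%s\n' /path/to/clip1.mov /path/to/clip2.mov > "$demo/restore-queue.txt"
rotate_queue "$demo/restore-queue.txt"
```

The job would then be pointed at the datestamped copy, which no concurrent aw-queue call will touch.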
Good ideas. Here's another one: write a simple bash script using the CLI to output the queue file, reset the PresStore field, and call aw-queue:

```shell
#!/bin/bash
/Applications/CatDV\ Worker/catdv -query U30=Restore -print1 MF >> /usr/local/Castor/queues/restore-queue.txt
```

To trigger this, you could do it a few different ways:
Most of the time, in an ideal world, it would be fine for WN to take its sweet old time to step through each clip, output each media path, and change each PresStore field, one by one until it's done. But I'm also thinking of the situations where an editor or producer forgets to check to make sure their media is online, they are booked for a block of 4 hours, and they need their media. But it's in the tape library. So now they have to restore it... but how long will it take? An hour, or four hours?
Mike, the third method I suggested above works like a charm. A restore queue file is generated immediately and the field is reset moments later. This could drastically improve performance on restores. Here's the script I used (`/usr/local/Castor/aw-restore.sh`):

```shell
#!/bin/bash
# /usr/local/Castor/aw-restore.sh
if [[ ! "$1" || ! "$2" || ! "$3" || ! "$4" || ! "$5" ]]; then
    exit 1
fi

/Applications/CatDV\ Worker/catdv -query "$5"=Restore -print1 MF >> "$1"
```

Ok... that said, I had to make some modifications to org.provideotech.aw-queue-restore.plist:
I'm hoping to have time to test this out in the next few days and possibly incorporate it into the master branch. If all works well, I will update the configure.pl script to reference the worker batch commands instead of doing the poll that slows everything down.
Cool! The only downside I see to this is that aw-queue.pl has to be on the same machine as Worker Node, but that is probably the case in a lot of installations. In my case, I had to completely change my configuration to make this work. I have Worker Node on one server, CatDV Server on another server, and PresStore on a third. I was running Castor on the PresStore server and had modified it so that the queue and temp files were being written to and read from our Xsan. Now Castor is running on the same server as Worker Node and I am using `nsdchat -s awsock:/`. Which brings me to another issue... I will submit it separately, though.
Another discovery today on this issue. I was having a problem where, when I marked a large quantity of assets for archive in the evening, PresStore would get a whole bunch of separate archive jobs with only 1-2 files each. The first one was a significant number of files, but the rest were these little jobs. I discovered after examining the log that at 2AM, aw-queue would run, but then it would run again every minute until 3AM. During that time, WN would be continuing to add files to archive-queue, about 1-2 per minute. So PresStore would get flooded with small 1-2 file jobs every minute until 3AM.

The solution, I believe, is to add a Minute key to the `org.provideotech.aw-queue-archive.plist` that's generated by the configure script. Will test this evening.
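For reference, launchd treats a missing Minute key in a StartCalendarInterval dict as a wildcard, so the job fires every minute of the matching hour, which is exactly the flood described above. A sketch of the fix (the Hour value of 2 mirrors the 2AM schedule mentioned in this thread; adjust to the real plist):

```xml
<key>StartCalendarInterval</key>
<dict>
    <key>Hour</key>
    <integer>2</integer>
    <key>Minute</key>
    <integer>0</integer>
</dict>
```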
The other day I tried to restore an entire project's worth of clips. There were 664 files, about 250-300GB of data. I have the restore queue interval set to 5 minutes. What happened was that Worker Node took too long to step through all 664 files - it got through about 35-45 of the clips, and then aw-queue.pl would run, submit those 35-45 files to PresStore, and it would continue. By the time all the clips were submitted to PresStore, there were about 21 restore jobs running at the same time, each with between 35 and 45 files to restore. All the clips were on the same volume in PresStore, so it seemed that PresStore was constantly trying to accommodate "waiting for volume xxxx to become available" requests. After about 4 hours, only 7 of those jobs had completed, so I told PresStore to cancel the remaining jobs. It took a while to cancel the jobs as well. By the time it was all said and done, after 4 hours, only about 222 of the 664 files were restored.

In an ideal world, producers and editors would plan and schedule in advance, and when they needed to edit an archived project, it would be ready because they will have unarchived it with plenty of time. But that doesn't always happen, and usually they need it back more quickly. One of the reasons we went with PresStore was the fact that we knew we could archive and restore relatively quickly.
So the short-term answer to this problem would be to increase the restore queue interval to 10, 15, 20 minutes, or greater. That would help, but we'd still be likely to see more jobs submitted to PresStore than we want (ideally just one large restore job), and we'd have to wait for that restore interval before PresStore even started working.
So I did an experiment. I went into the CatDV catalog, made a new view that showed me only the media path, did a grouping of all offline files in this project (remaining 442 files), copied the media path column, and pasted it into the restore-queue.txt file. When aw-queue.pl ran, it submitted the remaining 442 files as one job. That job took only 50 minutes to complete.
So it seems like PresStore prefers having small numbers of large jobs, as opposed to large numbers of smaller jobs. To accommodate this, we could use the WN command line located at /Applications/CatDV Worker Node/catdv, do a query for the PresStore field (when set to Restore), output the media path, and redirect the output to the queue file. Then a second query of the same field would reset it back to blank. This accomplishes the task of sending all media paths to the queue file and also accomplishes the task of resetting the archive action in a matter of a few seconds. Here's the command I used (my User 30 field is the trigger field for Castor/PresStore):
```shell
catdv -query U30=Restore -print1 MF >> /usr/local/Castor/queues/restore-queue.txt
```

and to reset the archive action field:

```shell
catdv -query U30=Restore -set U30=
```
I tried accomplishing this in one query command, but it didn't work: it does either the -print1 or the -set, depending on which one is first, and ignores the second one. So two lines it is.
So then, instead of using WN to monitor for Restore jobs, we would have to have a launchd daemon periodically run these commands.
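A minimal LaunchDaemon sketch for that, assuming the two catdv commands above live in a wrapper script. The label, script path, and 60-second interval here are all hypothetical placeholders, not anything Castor ships:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- hypothetical label and script path; adjust to the real install -->
    <key>Label</key>
    <string>org.provideotech.aw-restore-trigger</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>/usr/local/Castor/aw-restore-trigger.sh</string>
    </array>
    <key>StartInterval</key>
    <integer>60</integer>
</dict>
</plist>
```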
Now, for archive jobs, everything works great - I have the archive time set in org.provideotech.aw-queue-restore.plist to 2am, and by that time even large projects have everything added to the queue.

Also, there doesn't seem to be a quick way to export a CatDV XML file using the WN command line. You can use

```shell
catdv -query U30=Archive -print MF,CREF,U11 -format xml
```

But that will print out an XML batch file, which would have to be split into individual XML files, and you have to make sure you include all the fields you want in the -print option. So using the WN GUI seems to be the better way to go for archives.
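If splitting the batch file ever becomes necessary, it could be sketched with awk. Note the `<CLIP>` element name is an assumption for illustration; check it against an actual CatDV export before using anything like this:

```shell
#!/bin/bash
# Split a batch XML export into one file per clip, assuming each asset
# is wrapped in a hypothetical <CLIP>...</CLIP> element.
split_batch() {
    local batch="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /<CLIP>/   { n++; out = dir "/clip-" n ".xml" }  # start a new file
        n && out   { print > out }                        # copy clip lines
        /<\/CLIP>/ { out = "" }                           # close this clip
    ' "$batch"
}

# demo on a synthetic two-clip batch file
demo=$(mktemp -d)
cat > "$demo/batch.xml" <<'EOF'
<CATDV>
<CLIP>
<NAME>a</NAME>
</CLIP>
<CLIP>
<NAME>b</NAME>
</CLIP>
</CATDV>
EOF
split_batch "$demo/batch.xml" "$demo/out"
```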
But for quicker restores, the command line paired with a launchd daemon might be the faster route.
Thanks for your time!