
Recovery on failure in the middle of an upload/download #62

Open
yo8192 opened this issue Dec 15, 2013 · 5 comments

Comments

yo8192 commented Dec 15, 2013

Hi,

Apologies if this is explained in the docs and I missed it (in which case please point me to the right place), but what happens if an upload/download stops suddenly before it is finished?
e.g. the server crashes, there's a power cut, or someone hits CTRL-C by mistake.

Will mt-aws-glacier handle that well, and:

  1. not upload another copy of the same file, which would mean paying Amazon for more storage space than necessary/intended? I don't know if Glacier allows such a thing, whether it would not "commit" the new file if it fails to upload fully, or whether it would overwrite the existing file if we upload it again. But I'm not keen on finding out with a big bill at the end of the month! ;)
  2. resume the upload/download where it stopped the next time the upload/download is attempted?

If not, is this something you would consider adding?

Failing this, I would have to split my reasonably big backup files (tens of GB) to limit the risk, which is not very convenient.

Thx,
Thibault.

vsespb (Owner) commented Dec 15, 2013

Hello.

When uploading:

If the upload is terminated at a random point, it is not finished, and next time the upload will start from scratch. I think the only additional charges will be for requests (i.e. $0.05 per 1000 requests currently). The file will not be uploaded, so you won't be charged for additional storage.

There can be a race condition: after an upload finishes, mtglacier writes a record about it into the journal file (within several milliseconds, I think). If the process is terminated during that window, you'll get one duplicate, untracked file in mtglacier and will pay for its storage in the future.

When downloading (restore-completed):

Same thing: the download will start from scratch. Downloaded data stored in a temporary file before the crash is not reused (it is left on disk after a crash, or removed from disk if it was Ctrl-C). I think you'll pay $0.05 per 1000 requests plus bandwidth.

There are no other race conditions here.

For retrieving (i.e. the expensive operation):

The file is either retrieved or not. There can be a race condition (i.e. if the file is retrieved but the record did not reach the journal because of a crash within that short few-milliseconds window).

Two things can be improved here:

  1. Reuse already uploaded/downloaded data to save bandwidth. This is possible.

  2. Avoid the race conditions listed above (by writing special records indicating an unfinished operation, before that operation starts). I think this is useless for uploads, because even if a possible race condition is detected, mtglacier would have to wait 24+4 hours to fix it. For retrieval it's pretty useful and doable.

Also, you can work around the upload race condition yourself: wait 24h (don't upload anything), run retrieve-inventory/download-inventory, compare the new journal with the old one, and use the new journal if there is an extra file.
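The journal comparison in that workaround boils down to a set difference. A minimal sketch (treating each journal as a set of archive IDs is a simplification; the real mtglacier journal format differs, and `untracked_archives` is a hypothetical helper, not part of the tool):

```python
# Sketch: detect untracked duplicates by diffing the old local journal
# against a freshly downloaded inventory. Archive IDs here are
# illustrative; the real journal is a structured log file.

def untracked_archives(old_journal_ids, inventory_ids):
    # Archives Amazon knows about but the local journal does not:
    # candidates for duplicates left behind by an interrupted upload.
    return set(inventory_ids) - set(old_journal_ids)

old_journal = {"archive-1", "archive-2"}
new_inventory = {"archive-1", "archive-2", "archive-3"}
print(untracked_archives(old_journal, new_inventory))  # {'archive-3'}
```

If the difference is non-empty, the freshly downloaded journal is the one to keep, since it reflects what Amazon is actually billing for.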

Note that when we talk about race conditions here, we are assuming there is no bug in the software and that the crash time is truly random; such race conditions are really rare.
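The "write special records before the operation" idea from point 2 above is essentially a write-ahead intent record. A minimal sketch of the principle (record names and functions are hypothetical illustrations, not mtglacier internals):

```python
# Sketch of write-ahead intent records: log intent before an expensive
# operation, log completion after. After a crash, any INTENT without a
# matching DONE flags a possibly half-completed operation.

def begin_operation(journal, file_id):
    # Written *before* the retrieval/upload starts.
    journal.append(("INTENT", file_id))

def finish_operation(journal, file_id):
    # Written only once the operation fully succeeded.
    journal.append(("DONE", file_id))

def unfinished(journal):
    intents = {fid for kind, fid in journal if kind == "INTENT"}
    done = {fid for kind, fid in journal if kind == "DONE"}
    return intents - done

journal = []
begin_operation(journal, "a.bin")
finish_operation(journal, "a.bin")
begin_operation(journal, "b.bin")  # process crashes before DONE
print(unfinished(journal))         # {'b.bin'}
```

On restart, the tool could re-check any flagged file against the server instead of blindly repeating the expensive operation.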

Failing this I would have to split my reasonably big backup files (tens of GB)

If that size is big for you (i.e. it's a high percentage of all your data), it's recommended to split files into small parts, because:
a) you'll have the ability to pay less by retrieving over a long period (see Amazon's pricing for retrieval);
b) you can't download a file if it's too big for your bandwidth. Amazon downloads are discarded after 24h, so you can only download a file if your bandwidth allows you to do it within 24h (and factor in the risk of a possible crash, download restart, bandwidth downtime, etc.). So it would be safe, say, if you can download each file within about 6 hours.
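Point b) gives a rule of thumb for choosing a part size: each part should be downloadable well inside the 24-hour window, say in ~6 hours, to leave margin for restarts. A small sketch of that arithmetic (the 6-hour target and link speed are illustrative assumptions):

```python
# Sketch: largest part size (in GB) that a given link can download
# within a target time, leaving margin inside the 24h window.

def safe_part_size_gb(bandwidth_mbit_s, hours=6):
    # hours: target download time per part, well under 24h.
    bytes_per_second = bandwidth_mbit_s * 1_000_000 / 8
    return bytes_per_second * hours * 3600 / 1e9

# e.g. a 20 Mbit/s link, aiming to finish each part in ~6 hours:
print(round(safe_part_size_gb(20), 1))  # 54.0
```

So on that example link, parts of a few tens of GB are about the upper limit, and splitting larger backups is indeed the safer choice.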

vsespb (Owner) commented Dec 15, 2013

Note:

For retrieving (i.e. the expensive operation)
There can be a race condition

The race condition applies here only if you retrieve twice. There is no race condition if you retrieve once + download.

quote from documentation of restore command:

Initiate Amazon Glacier RETRIEVE operation for files listed in Journal, which don't exist on local filesystem and for which RETRIEVE was not initiated during last 24 hours (that information is obtained from the Journal)

quote from documentation of restore-completed command:

Unlike restore command, list of retrieved files is requested from Amazon Glacier servers at runtime using API, not from journal.

i.e. restore-completed takes the file list from Amazon's servers.

yo8192 (Author) commented Dec 15, 2013

Hi,

Thanks for the very quick feedback.

If Amazon Glacier doesn't "commit" the file until it has finished uploading successfully, then the worst that can happen is that we need to start again from scratch, which takes time but doesn't cost more (there's no per-GB upload fee). I'm ignoring the request fee, which is not really a problem for me.

And good point about the file split; it does look like splitting will be a good idea in my case anyway.

So we are left with a feature to resume downloads, which I think would be useful to avoid paying retrieval fees twice in case of crashes/failures, and better handling of the race conditions, which again sounds like a good idea to me.

Thx,
Thibault.

vsespb (Owner) commented Dec 15, 2013

then the worst that can happen is that we need to start again from scratch

Yes, except if the race condition happens.

So we are left with a feature to resume downloads, which I think would be useful to avoid paying retrieval

When you download, you don't pay the high retrieval fee, only the bandwidth and request fees. The retrieval fee is paid when you retrieve the file with the restore command. After that you can download the file several times with restore-completed without paying the high fee again.

which again sound like a good idea to me

Yes, I will leave this ticket open as an enhancement. Most likely I will split it into several tickets in the future.

It's unlikely that everything listed here can be implemented soon (enhancements are low priority for me, bugfixes are high priority; some of these enhancements are hard to implement, and some are not really important: I don't think other software vendors ever care about rare race conditions).

yo8192 (Author) commented Dec 15, 2013

No problem at all; it's free software, and I fully understand if you don't have the time or the will to implement some or all of what is discussed here.
Thanks for all the work you have already done on this useful tool.
