
Recovery on failure in the middle of an upload/download #62

Open
yo8192 opened this issue Dec 15, 2013 · 5 comments

Comments

yo8192 commented Dec 15, 2013

Hi,

Apologies if this is explained in the docs and I missed it (in which case please point me to the right place), but what happens if an upload/download stops suddenly before it is finished?
e.g. the server crashes, there's a power cut, or someone hits CTRL-C by mistake.

Will mt-aws-glacier handle that well, and:

  1. not upload another copy of the same file, which would mean paying Amazon for more storage space than necessary/intended? I don't know if Glacier allows such a thing, whether it would not "commit" the new file if it fails to upload fully, or whether it would overwrite the existing file if we upload it again. But I'm not keen on finding out with a big bill at the end of the month! ;)
  2. resume the upload/download where it stopped the next time the upload/download is attempted?

If not, is this something you would consider adding?

Failing this, I would have to split my reasonably big backup files (tens of GB) to limit the risk, which is not very convenient.

Thx,
Thibault.

vsespb (Owner) commented Dec 15, 2013

Hello.

When uploading:

If the upload is terminated at a random point, it is not finished, and next time the upload will start from scratch. I think the only additional charges will be for requests (i.e. $0.05 per 1000 requests currently). The file will not be uploaded, so you won't be charged for additional storage.

There can be a race condition: after an upload finishes, mtglacier writes a record about it into the journal file (within several milliseconds, I think). If the process is terminated during that window, you'll get one duplicate, untracked file in mtglacier and will pay for its storage in the future.

When downloading (restore-completed):

Same thing: the download will start from scratch. Downloaded data stored in a temporary file before the crash is not reused (it is left on disk after a crash, or removed from disk if it was Ctrl-C). I think you'll pay $0.05 per 1000 requests plus bandwidth.

There are no other race conditions here.

For retrieving (i.e. the expensive operation):

The file is either retrieved or not. There can be a race condition (i.e. if the file is retrieved but the record did not reach the journal because of a crash within that short few-milliseconds window).

Two things can be improved here:

  1. Reuse already uploaded/downloaded data to save bandwidth. This is possible.

  2. Avoid the race conditions listed above (by writing special records indicating an unfinished operation, before that operation starts). I think this is useless for uploads, because even if a possible race condition is detected, mtglacier would have to wait 24+4 hours to fix it. For retrieval it's pretty useful and doable.

Also, you can work around the upload race condition yourself: wait 24h (don't upload anything), run retrieve-inventory/download-inventory, compare the new journal with the old one, and use the new journal if there is an extra file.
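The journal comparison in that workaround boils down to a set difference. A minimal sketch (treating each journal as a set of archive IDs is a simplification; the real mtglacier journal format differs, and `untracked_archives` is a hypothetical helper, not part of the tool):

```python
# Sketch: detect untracked duplicates by diffing the old local journal
# against a freshly downloaded inventory. Archive IDs here are
# illustrative; the real journal is a structured log file.

def untracked_archives(old_journal_ids, inventory_ids):
    # Archives Amazon knows about but the local journal does not:
    # candidates for duplicates left behind by an interrupted upload.
    return set(inventory_ids) - set(old_journal_ids)

old_journal = {"archive-1", "archive-2"}
new_inventory = {"archive-1", "archive-2", "archive-3"}
print(untracked_archives(old_journal, new_inventory))  # {'archive-3'}
```

If the difference is non-empty, the freshly downloaded journal is the one to keep, since it reflects what Amazon is actually billing for.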

Note that when we talk about race conditions here, we are assuming there is no bug in the software and that the crash time is truly random; such race conditions are really rare.
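The "write special records before the operation" idea from point 2 above is essentially a write-ahead intent record. A minimal sketch of the principle (record names and functions are hypothetical illustrations, not mtglacier internals):

```python
# Sketch of write-ahead intent records: log intent before an expensive
# operation, log completion after. After a crash, any INTENT without a
# matching DONE flags a possibly half-completed operation.

def begin_operation(journal, file_id):
    # Written *before* the retrieval/upload starts.
    journal.append(("INTENT", file_id))

def finish_operation(journal, file_id):
    # Written only once the operation fully succeeded.
    journal.append(("DONE", file_id))

def unfinished(journal):
    intents = {fid for kind, fid in journal if kind == "INTENT"}
    done = {fid for kind, fid in journal if kind == "DONE"}
    return intents - done

journal = []
begin_operation(journal, "a.bin")
finish_operation(journal, "a.bin")
begin_operation(journal, "b.bin")  # process crashes before DONE
print(unfinished(journal))         # {'b.bin'}
```

On restart, the tool could re-check any flagged file against the server instead of blindly repeating the expensive operation.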

Failing this I would have to split my reasonably big backup files (tens of GB)

If that size is big for you (i.e. it's a high percentage of all your data), it's recommended to split files into small parts, because:
a) you'll have the ability to pay less by retrieving over a long period (see Amazon's pricing for retrieval);
b) you can't download a file if it's too big for your bandwidth. Amazon downloads are discarded after 24h, so you can only download a file if your bandwidth allows you to do it within 24h (and factor in the risk of a possible crash, download restart, bandwidth downtime, etc.). So it would be safe, say, if you can download each file within about 6 hours.
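Point b) gives a rule of thumb for choosing a part size: each part should be downloadable well inside the 24-hour window, say in ~6 hours, to leave margin for restarts. A small sketch of that arithmetic (the 6-hour target and link speed are illustrative assumptions):

```python
# Sketch: largest part size (in GB) that a given link can download
# within a target time, leaving margin inside the 24h window.

def safe_part_size_gb(bandwidth_mbit_s, hours=6):
    # hours: target download time per part, well under 24h.
    bytes_per_second = bandwidth_mbit_s * 1_000_000 / 8
    return bytes_per_second * hours * 3600 / 1e9

# e.g. a 20 Mbit/s link, aiming to finish each part in ~6 hours:
print(round(safe_part_size_gb(20), 1))  # 54.0
```

So on that example link, parts of a few tens of GB are about the upper limit, and splitting larger backups is indeed the safer choice.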

vsespb (Owner) commented Dec 15, 2013

Note:

For retrieving (i.e. the expensive operation)
There can be a race condition

The race condition applies here only if you retrieve twice. There is no race condition if you retrieve once + download.

quote from documentation of restore command:

Initiate Amazon Glacier RETRIEVE operation for files listed in Journal, which don't exist on local filesystem and for which RETRIEVE was not initiated during last 24 hours (that information is obtained from the Journal)

quote from documentation of restore-completed command:

Unlike restore command, list of retrieved files is requested from Amazon Glacier servers at runtime using API, not from journal.

i.e. restore-completed takes the file list from Amazon's servers.

yo8192 (Author) commented Dec 15, 2013

Hi,

Thanks for the very quick feedback.

If Amazon Glacier doesn't "commit" the file until it has finished uploading successfully, then the worst that can happen is that we need to start again from scratch, which takes time but doesn't cost more (there's no per-GB upload fee). I'm ignoring the request fee, which is not really a problem for me.

And good point about the file split; it does look like splitting will be a good idea in my case anyway.

So we are left with a feature to resume downloads, which I think would be useful to avoid paying retrieval fees twice in case of crashes/failures, and better handling of the race conditions, which again sounds like a good idea to me.

Thx,
Thibault.

vsespb (Owner) commented Dec 15, 2013

then the worst that can happen is that we need to start again from scratch

Yes, except if the race condition happens.

So we are left with a feature to resume downloads, which I think would be useful to avoid paying retrieval

When you download, you don't pay the high retrieval fee, only the bandwidth and request fees. The retrieval fee is paid when you retrieve the file with the restore command. After that you can download the file several times with restore-completed without paying the high fee again.

which again sound like a good idea to me

Yes, I will leave this ticket open as an enhancement. Most likely I will split it into several tickets in the future.

It's unlikely that everything listed here can be implemented soon (enhancements are low priority for me, bugfixes are high priority; some of these enhancements are hard to implement, and some are not really important: I don't think other software vendors ever care about rare race conditions).

yo8192 (Author) commented Dec 15, 2013

No problem at all; it's free software, and I fully understand if you don't have the time or the will to implement some or all of what is discussed here.
Thanks for all the work you have already done on this useful tool.
