New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand PutBlob API to allow for idempotent creates #341
Conversation
The new error type `ErrBlobAlreadyExists` should be returned when we try to put a blob using a blob.ID that already exists, and that is somehow problematic for this operation. This error type may be useful for other operations in the future.
This will be used to indicate that a PutBlob operation should fail if it is attempting to modify a blob that already exists. We can think of it also as "do not overwrite" or "create only if exists".
Expanded the function `VerifyStorage` to test both write paths where DoNotRecreate is true and false.
For all storage types that do not (or do not yet) support idempotent blob creates, throw an explicit error indicating so.
Was using the old name `RecreateIfExists`, changed to use `DoNotRecreate`.
@adowair Don't know if you were aware of it, but looks like the GCS API is changing a bit relating to idempotency. It may or may not affect this change, just wanted to make sure you were aware: googleapis/google-cloud-go#5210 |
Thanks @ewhamilton, this change adds some logic to retry puts (with backoff). As far as I know, Kopia implements its own retry/backoff logic, so we don't use any of the GCS options that are affected by the PR #5210. |
Nice work untangling the validation sequence. Looks good to me. You should be able to close this PR and reopen from your kastenhq branch targeting the kopia master. |
`ErrBlobAlready exists` -> `ErrBlobAlreadyExists`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG. See comments inside. Open upstream and run it by Jarek.
Also, please cleanup the commit sequence for the upstream PR.
case http.StatusPreconditionFailed: | ||
return blob.ErrBlobAlreadyExists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this implementation, it is the case that a precondition failure means the blob already exists, since it is the only condition being set.
But a StatusPreconditionFailed
could be returned if a different precondition were set. How can we future proof this so a bug is not inadvertently introduced if for some reason we end up setting other pre condition checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion @julio-lopez, the context needed for this is not available within translateError()
, so we'd have to pass the put options opts
into translateError()
, or do some special error handling outside said function. Neither solution is particularly neat, but I am leaning towards the former. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate to say this, but how about a simple error string match ?
I think passing in an error code or wrapping the error with more context in the same function where the pre-conditions are set is a better option, but I would avoid putting too much scope on this one PR.
Doing the wrap closer to the condition setup is the key. Passing in put-options is also a neat idea, but it may not be quite intuitive to code readers why a generic-op translateError() needs the put-opts. Not saying it cannot/should not be done though.
case opts.DoNotRecreate: | ||
return errors.New("setting blob do-not-recreate is not supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be a function of the actual storage backend? It is true that currently the sharded provider is only used with storage backends that do not support DoNotRecreate, but still, it seems that it should be relayed to the backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point @julio-lopez. I can push this logic down to the individual storage providers.
@Shrekster , I see you implement handling of opts.HasRetentionOpts()
in a similar way to me. Should that be pushed down to the storage implementations too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
Yes, my bad. I should have taken opts.HasRetentionOpts()
down to the three implementations of the shared.Impl.PutBlobInPath
interface.
@adowair Can you move both the checks to the storage backend implementations of the shared interface ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd may be a good idea to break down and restructure the PR in self-contained changes (commits) that can be individually reviewed. For example, the change about passing down the options to the storage backends can be one of those steps. Merely a suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the right way to do things. Rebasing upstream.
Currently, the
PutBlob()
endpoint is used to both create and update Kopia blobs. When this function is called with a blob ID that already exists, the existing blob will be overwritten by the new contents, which may not be intended. This PR expands thePutBlob
SDK API to allow for idempotent creates, wherein a client can create a blob only if it doesn't already exist, and fail otherwise. This option will be controlled by a new fieldDoNotRecreate
in thePutOptions
struct.