Parse.Object.saveAll can result in duplicate objects #859

Open

simonaberry opened this issue Nov 22, 2021 · 13 comments

Comments

@simonaberry

Issue Description

This issue has been specifically created to reference in this PR.

Parse.Object.saveAll uses the /batch endpoint.

If a large batch of large objects is saved via saveAll over a very slow network, the initial request can time out and be automatically retried. However, some of the objects in the first batch may already have been saved successfully, so the retried batch creates duplicate objects.
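
For illustration, here is a minimal sketch of the pattern that can trigger the problem (the "Reading" class name and attributes are made up for the example; it assumes the Parse JS SDK is loaded and initialized):

```js
// Sketch of the failure mode: saveAll sends all objects in one /batch
// request. If that request times out after the server has persisted some
// of the objects, the SDK's automatic retry resends the whole batch and
// the already-saved objects are created again.
const Reading = Parse.Object.extend('Reading');
const objects = [];
for (let i = 0; i < 20; i++) {
  objects.push(new Reading({ index: i }));
}
// Over a slow network this single call may be retried in full by the SDK.
await Parse.Object.saveAll(objects);
```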

Steps to reproduce

Watch this Loom

Actual Outcome

Duplicate objects are created.

Expected Outcome

No duplicate objects are created.

Environment

JS SDK used in a web client over spotty networks.

Server

  • Parse Server version: any
  • Operating system: browser
  • Local or remote host (AWS, Azure, Google Cloud, Heroku, Digital Ocean, etc): remote

Database

  • System (MongoDB or Postgres): mongo
  • Database version: any
  • Local or remote host (MongoDB Atlas, mLab, AWS, Azure, Google Cloud, etc): remote

Client

  • Parse JS SDK version: latest

Logs

@parse-github-assistant

parse-github-assistant bot commented Nov 22, 2021

Thanks for opening this issue!

  • 🚀 You can help us to fix this issue faster by opening a pull request with a failing test. See our Contribution Guide for how to make a pull request, or read our New Contributor's Guide if this is your first time contributing.

@mtrezza
Member

mtrezza commented Nov 22, 2021

Could you please fill out the template to describe the issue and indicate whether you can reproduce it (see the checkbox at the top)? If the issue does not exist in the latest version, we may close this issue and related PRs.

@simonaberry
Author

Here is a video explaining the issue: https://www.loom.com/share/9744297bacfd41f3bd18e60527f0eacc

@mtrezza
Member

mtrezza commented Nov 25, 2021

Thanks for updating the issue.

If the cause of the failure was a network delay, then the saveAll will automatically retry, with all 20 objects again.

This sounds like a bug, because duplicate objects would hardly be an intended outcome. So maybe we don't need a new Parse.Object.saveAllSettled method but could instead fix the existing Parse.Object.saveAll method to retry only the failed objects?

I've classified this as a bug with severity:medium, as the workaround is to not use Parse.Object.saveAll but a custom batch saving mechanism.

@simonaberry
Author

@mtrezza I guess it is debatable whether it is a bug or not... but the issue stems from the fact that the saveAll call is a batch call, so if there is an error, the entire batch gets resent. The PR we have proposed uses individual object.save() calls as a workaround.
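
A minimal sketch of that kind of workaround (not the PR's exact code): each object gets its own request, so the caller learns per object which saves succeeded, and a retry can only ever duplicate a single object.

```js
// Save each object individually instead of via /batch.
const results = await Promise.allSettled(objects.map((obj) => obj.save()));
// Collect the objects whose save failed, so only those are retried.
const failed = objects.filter((_, i) => results[i].status === 'rejected');
```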

@mtrezza
Member

mtrezza commented Nov 25, 2021

I guess it is debatable if it is a bug or not

Can you think of a use case in which one desires this behavior leading to duplicate objects? My assumption would rather be the opposite, like you pointed out in your issue.

@simonaberry
Author

OK, point taken. It's a bug.

Just surprised no one else has reported it in the last ~10 years!

@mtrezza
Member

mtrezza commented Nov 25, 2021

Well, if we had an "archeological bug excavator" badge, you would be the first to get it 🙂
Do you have any suggestion for how that bug could be fixed without introducing a breaking change?

@simonaberry
Author

I have had a very quick look at the batch saving and I think it would be possible to change that logic from using the 'batch' endpoint to a series of individual saves (which would give the client the benefit of knowing which individual items were saved successfully and which were not).

But I guess the first step is agreement (from you guys, who have to see the big picture on the overall direction of changes) that in principle that is the correct approach... I imagine the reason for using /batch calls in the first place was to reduce the overhead associated with bulk transactions... saving objects individually would undo that benefit.

@simonaberry
Author

You could argue that the same risk exists with individual saves: if there is a network interruption during an individual save, you could equally end up with duplicate objects on the server. This is true. However, the 'damage' is reduced to just one (or two) duplicates, not 20 (the default batch size?).
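
Relatedly, the exposure can be bounded without giving up batching entirely, assuming the batchSize option that saveAll accepts in the JS SDK:

```js
// Smaller batches bound how many duplicates a single retried /batch
// request can create (at the cost of more requests).
await Parse.Object.saveAll(objects, { batchSize: 5 });
```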

@mtrezza
Member

mtrezza commented Nov 26, 2021

you could argue that the same risk exists that if there is a network interruption during the individual save, you could equally end up with duplicate objects on the server.

That is why the idempotency feature has been introduced.

However, there is a difference between what you originally described in your issue and your last comment:

  • a) If the server response fails, duplicate objects cannot be prevented unless an idempotency strategy is used.
  • b) If the server response fails, resending the whole batch of N objects can create up to N-1 duplicates. Even with the idempotency feature enabled, since the /batch endpoint receives a single network request wrapping multiple save requests, it will only have one UUID (if batch requests are even currently supported by the idempotency feature); a configuration sketch follows below.
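
For reference, a sketch of enabling the idempotency feature on the server (option names as in the Parse Server docs; the connection values are placeholders):

```js
// Sketch of a Parse Server configuration with idempotency enabled.
const { ParseServer } = require('parse-server');

const server = new ParseServer({
  databaseURI: 'mongodb://localhost:27017/dev', // placeholder
  appId: 'myAppId',                             // placeholder
  masterKey: 'myMasterKey',                     // placeholder
  idempotencyOptions: {
    paths: ['.*'], // enforce deduplication on all endpoints
    ttl: 300,      // seconds a client request ID is remembered
  },
});
// Caveat from the point above: a /batch request carries a single request
// ID, so deduplication applies to the batch as a whole, not per object.
```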

Some thoughts:

  • The advantage of using the /batch endpoint is that multiple requests are mapped into a single request, reducing the request frequency. This can be a significant cost factor for cloud services with pay-per-request, so a change would likely have to be considered a breaking change.
  • The current alternative to ParseObject.saveAll using /batch is to make individual ParseObject.save requests and handle each response individually. This reduces the risk that a single failing request leads to duplicates of more than 1 object. To further reduce the risk, activating the idempotency feature ensures that objects are not duplicated.
  • Since the issue is caused by the automatic retry mechanism built into the various Parse client SDKs, an alternative solution could be to deactivate the retry mechanism (setting it to 0 retries; a sketch follows below).
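
A sketch of that last option, assuming the JS SDK's CoreManager setting REQUEST_ATTEMPT_LIMIT (which controls the total number of attempts per request):

```js
// Limit the SDK to a single attempt per request so a timed-out /batch
// request is not automatically resent. Failed saves must then be retried
// explicitly by the application.
Parse.CoreManager.set('REQUEST_ATTEMPT_LIMIT', 1);
```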

My suggestion is that we just add a caveat warning to the saveAll and REST batch endpoint docs that saving N objects can create up to N-1 duplicates in bad network conditions if idempotency is not activated.

@mtrezza
Member

mtrezza commented Nov 27, 2021

@simonaberry Kindly let me know if you agree with the suggestion above and whether we can regard this issue as merely a "docs" issue.

@simonaberry
Author

@mtrezza yes - I think your summary of the issue above is good and logical.

We have already resolved the problem ourselves by building our own error handling logic - but at the price of multiple requests.

So yes - happy that we just improve the documentation to make everyone aware of the potential downsides of saveAll.
