Parse.Object.saveAll can result in duplicate objects #859

Open

simonaberry opened this issue Nov 22, 2021 · 13 comments

Comments

@simonaberry

Issue Description

This issue has been specifically created to reference in this PR.

Parse.Object.saveAll uses the /batch endpoint.

If a large batch of large objects is saved via saveAll over a very slow network, the initial request can time out and be automatically retried. However, some of the objects in the first batch may already have been saved successfully, so the retried batch creates duplicate objects.
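
For illustration, here is a minimal sketch of the pattern that can trigger the problem (the "Reading" class name and attributes are made up for the example; it assumes the Parse JS SDK is loaded and initialized):

```js
// Sketch of the failure mode: saveAll sends all objects in one /batch
// request. If that request times out after the server has persisted some
// of the objects, the SDK's automatic retry resends the whole batch and
// the already-saved objects are created again.
const Reading = Parse.Object.extend('Reading');
const objects = [];
for (let i = 0; i < 20; i++) {
  objects.push(new Reading({ index: i }));
}
// Over a slow network this single call may be retried in full by the SDK.
await Parse.Object.saveAll(objects);
```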

Steps to reproduce

Watch this Loom

Actual Outcome

Duplicate objects are created.

Expected Outcome

No duplicate objects are created.

Environment

JS SDK used in a web client over spotty networks.

Server

  • Parse Server version: any
  • Operating system: browser
  • Local or remote host (AWS, Azure, Google Cloud, Heroku, Digital Ocean, etc): remote

Database

  • System (MongoDB or Postgres): mongo
  • Database version: any
  • Local or remote host (MongoDB Atlas, mLab, AWS, Azure, Google Cloud, etc): remote

Client

  • Parse JS SDK version: latest

Logs

@parse-github-assistant

parse-github-assistant bot commented Nov 22, 2021

Thanks for opening this issue!

  • 🚀 You can help us to fix this issue faster by opening a pull request with a failing test. See our Contribution Guide for how to make a pull request, or read our New Contributor's Guide if this is your first time contributing.

@mtrezza
Member

mtrezza commented Nov 22, 2021

Could you please fill out the template to describe the issue and indicate whether you can reproduce it (see the checkbox at the top)? If the issue does not exist in the latest version, we may close this issue and related PRs.

@simonaberry
Author

Here is a video explaining the issue: https://www.loom.com/share/9744297bacfd41f3bd18e60527f0eacc

@mtrezza
Member

mtrezza commented Nov 25, 2021

Thanks for updating the issue.

If the cause of the failure was a network delay, then the saveAll will automatically retry, with all 20 objects again.

This sounds like a bug, because duplicate objects would hardly be an intended outcome. So maybe we don't need a new Parse.Object.saveAllSettled method but could instead fix the existing Parse.Object.saveAll method to retry only the failed objects?

I've classified this as a bug with severity:medium, as the workaround is to not use Parse.Object.saveAll but a custom batch saving mechanism.

@simonaberry
Author

@mtrezza I guess it is debatable whether it is a bug or not... but the issue stems from the fact that the saveAll call is a batch call, so if there is an error, the entire batch gets resent. The PR we have proposed uses individual object.save() calls as a workaround.
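
A minimal sketch of that kind of workaround (not the PR's exact code): each object gets its own request, so the caller learns per object which saves succeeded, and a retry can only ever duplicate a single object.

```js
// Save each object individually instead of via /batch.
const results = await Promise.allSettled(objects.map((obj) => obj.save()));
// Collect the objects whose save failed, so only those are retried.
const failed = objects.filter((_, i) => results[i].status === 'rejected');
```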

@mtrezza
Member

mtrezza commented Nov 25, 2021

I guess it is debatable if it is a bug or not

Can you think of a use case in which one desires this behavior leading to duplicate objects? My assumption would rather be the opposite, like you pointed out in your issue.

@simonaberry
Author

OK, point taken. It's a bug.

Just surprised no one else has reported it in the last ~10 years!

@mtrezza
Member

mtrezza commented Nov 25, 2021

Well, if we had an "archeological bug excavator" badge, you would be the first to get it 🙂
Do you have any suggestion for how that bug could be fixed without introducing a breaking change?

@simonaberry
Author

I have had a very quick look at the batch saving and I think it would be possible to change that logic from using the 'batch' endpoint to a series of individual saves (which would give the client the benefit of knowing which individual items were saved successfully and which were not).

But I guess the first step is agreement (from you guys, who have to see the big picture on the overall direction of changes) that in principle that is the correct approach... I imagine the reason for using /batch calls in the first place was to reduce the overhead associated with bulk transactions... saving objects individually would undo that benefit.

@simonaberry
Author

You could argue that the same risk exists with individual saves: if there is a network interruption during an individual save, you could equally end up with duplicate objects on the server. This is true. However, the 'damage' is reduced to just one (or two) duplicates, not 20 (the default batch size?).
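
Relatedly, the exposure can be bounded without giving up batching entirely, assuming the batchSize option that saveAll accepts in the JS SDK:

```js
// Smaller batches bound how many duplicates a single retried /batch
// request can create (at the cost of more requests).
await Parse.Object.saveAll(objects, { batchSize: 5 });
```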

@mtrezza
Member

mtrezza commented Nov 26, 2021

you could argue that the same risk exists that if there is a network interruption during the individual save, you could equally end up with duplicate objects on the server.

That is why the idempotency feature has been introduced.

However, there is a difference between what you originally described in your issue and your last comment:

  • a) If the server response fails, duplicate objects cannot be prevented unless an idempotency strategy is used.
  • b) If the server response fails, resending the whole batch of N objects can create up to N-1 duplicates. Even with the idempotency feature enabled, since the /batch endpoint receives a single network request wrapping multiple save requests, it will only have one UUID (if batch requests are even currently supported by the idempotency feature); a configuration sketch follows below.
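
For reference, a sketch of enabling the idempotency feature on the server (option names as in the Parse Server docs; the connection values are placeholders):

```js
// Sketch of a Parse Server configuration with idempotency enabled.
const { ParseServer } = require('parse-server');

const server = new ParseServer({
  databaseURI: 'mongodb://localhost:27017/dev', // placeholder
  appId: 'myAppId',                             // placeholder
  masterKey: 'myMasterKey',                     // placeholder
  idempotencyOptions: {
    paths: ['.*'], // enforce deduplication on all endpoints
    ttl: 300,      // seconds a client request ID is remembered
  },
});
// Caveat from the point above: a /batch request carries a single request
// ID, so deduplication applies to the batch as a whole, not per object.
```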

Some thoughts:

  • The advantage of using the /batch endpoint is that multiple requests are mapped into a single request, reducing the request frequency. This can be a significant cost factor for cloud services with pay-per-request, so a change would likely have to be considered a breaking change.
  • The current alternative to ParseObject.saveAll using /batch is to make individual ParseObject.save requests and handle each response individually. This reduces the risk that a single failing request leads to duplicates of more than 1 object. To further reduce the risk, activating the idempotency feature ensures that objects are not duplicated.
  • Since the issue is caused by the automatic retry mechanism built into the various Parse client SDKs, an alternative solution could be to deactivate the retry mechanism (setting it to 0 retries; a sketch follows below).
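
A sketch of that last option, assuming the JS SDK's CoreManager setting REQUEST_ATTEMPT_LIMIT (which controls the total number of attempts per request):

```js
// Limit the SDK to a single attempt per request so a timed-out /batch
// request is not automatically resent. Failed saves must then be retried
// explicitly by the application.
Parse.CoreManager.set('REQUEST_ATTEMPT_LIMIT', 1);
```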

My suggestion is that we just add a caveat warning to the saveAll and REST batch endpoint docs that saving N objects can create up to N-1 duplicates in bad network conditions if idempotency is not activated.

@mtrezza
Member

mtrezza commented Nov 27, 2021

@simonaberry Kindly let me know if you agree with the suggestion above and whether we can regard this issue as merely a "docs" issue.

@simonaberry
Author

@mtrezza yes - I think your summary of the issue above is good and logical.

We have already resolved the problem ourselves by building our own error handling logic - but at the price of multiple requests.

So yes - happy that we just improve the documentation to make everyone aware of the potential downsides of saveAll.
