
batch write ignoring failed writes #29

Open
ktwbc opened this issue Jul 21, 2018 · 2 comments

Comments

@ktwbc
Contributor

ktwbc commented Jul 21, 2018

The response back from DocumentClient on batchWrite includes any failed writes, which usually need to be resubmitted (sometimes with a linear backoff).

The call to DynamoDB returns them in UnprocessedItems, but batch_write.js does not check this, which results in missed items when writing batches to the table.

I will try to submit a PR when I have time. I have code from a different attempt that was working, but it would need to be redone to fit how this file works:

async.forEachOf(arrParams, function (params: DynamoDB.DocumentClient.BatchWriteItemInput, key: number, eachDone: Callback) {

  let backoffCount = 1;

  const processItemsCallback = (error: AWSError, result: DynamoDB.DocumentClient.BatchWriteItemOutput) => {

    if (error) {
      return eachDone(error);
    }

    if (Object.keys(result.UnprocessedItems || {}).length !== 0) {

      // Resubmit only the items DynamoDB reported as unprocessed,
      // with a linearly increasing delay between attempts.
      const retryParams = { RequestItems: result.UnprocessedItems! };

      backoffCount++;

      setTimeout(() => {
        dynamoDb.batchWrite(retryParams, processItemsCallback);
      }, 1000 * backoffCount);

    } else {
      eachDone();
    }
  };

  dynamoDb.batchWrite(params, processItemsCallback);

}, function done(err: Error) {
...etc
@breath103
Contributor

breath103 commented Jul 23, 2018

https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
The individual PutItem and DeleteItem operations specified in BatchWriteItem are atomic; however BatchWriteItem as a whole is not. If any requested operations fail because the table's provisioned throughput is exceeded or an internal processing failure occurs, the failed operations are returned in the UnprocessedItems response parameter. You can investigate and optionally resend the requests. Typically, you would call BatchWriteItem in a loop. Each iteration would check for unprocessed items and submit a new BatchWriteItem request with those unprocessed items until all items have been processed.
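
(For reference, a minimal sketch of that loop with the v2 DocumentClient promise API might look like the following; it deliberately has no delay and no retry cap, which is exactly the interface question below. The function and variable names are placeholders, not anything in this library.)

  import { DynamoDB } from "aws-sdk";

  // Keep calling batchWrite until DynamoDB reports no unprocessed items.
  // A real implementation would also add a delay between attempts and a
  // cap on the number of retries.
  async function batchWriteWithRetry(
    docClient: DynamoDB.DocumentClient,
    requestItems: DynamoDB.DocumentClient.BatchWriteItemRequestMap,
  ): Promise<void> {
    let remaining = requestItems;

    while (Object.keys(remaining).length > 0) {
      const result = await docClient.batchWrite({ RequestItems: remaining }).promise();
      remaining = result.UnprocessedItems || {};
    }
  }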
So this is legit, thanks for pointing it out :). However, I'm slightly concerned about the interface here, since there are several options users might want to customize: how many retries, the delay between retries, and what happens when a write keeps failing.

For

  • how many retries (even "infinite", retrying until UnprocessedItems comes back empty), and
  • the delay between retries,

we can decide on default values at the library level and let users customize them if they want. That should be sufficient for most common use cases, and it means existing code doesn't have to be rewritten because of this change. (If we added "maximumRetry" as a required parameter on batchDelete / batchPut, existing code would have to be revised. A sketch of an optional alternative follows below.)
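
(Rough sketch only; the names RetryOptions, maxRetries, and retryDelayMs are placeholders, not an API this library actually has:)

  // Hypothetical options object; both fields default at the library level,
  // so existing callers of batchPut / batchDelete keep compiling unchanged.
  interface RetryOptions {
    maxRetries?: number;    // e.g. default 5, or Infinity to retry until done
    retryDelayMs?: number;  // e.g. default 1000
  }

  // table.batchPut(records);                                        // uses defaults
  // table.batchPut(records, { maxRetries: 10, retryDelayMs: 500 }); // opt-in tuning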

But

  • what happens when all of the retries fail?

is a more complicated problem. Should we just raise an error, or something else? Let me know if you have an opinion on this; one possibility is sketched below. I'd work on this personally once the specification becomes clearer. This is something we've clearly been missing.
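
(One possibility, sketched with a placeholder class name: raise an error that carries the items that were still unprocessed when the library gave up, so the caller can inspect or resubmit them.)

  import { DynamoDB } from "aws-sdk";

  // Hypothetical error type thrown after the last retry still left items behind.
  class UnprocessedItemsError extends Error {
    constructor(
      public readonly unprocessedItems: DynamoDB.DocumentClient.BatchWriteItemRequestMap,
    ) {
      super(`batchWrite retries exhausted with ${Object.keys(unprocessedItems).length} table(s) still unprocessed`);
    }
  }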

@ktwbc
Contributor Author

ktwbc commented Aug 5, 2018

Sorry for the delay in responding. Since the error only comes about through throttling, and throttling will eventually let the writes through, infinite retries are probably not a bad option. (I'm using this in a Lambda function, which would time out on its own in 5 minutes or less; I'm not sure whether a standard Node process running Express would ever kill a runaway loop on its own. But throttling by definition basically says "eventually this will succeed".)

For the delay strategy, an exponential backoff would probably be best, though a setting to choose between exponential and linear might be better still.
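
(A sketch of the two delay strategies, assuming a 1-second base delay:)

  const BASE_DELAY_MS = 1000;

  // Linear: 1s, 2s, 3s, ... for attempts 1, 2, 3, ...
  function linearDelay(attempt: number): number {
    return BASE_DELAY_MS * attempt;
  }

  // Exponential: 1s, 2s, 4s, 8s, ... doubles each attempt
  function exponentialDelay(attempt: number): number {
    return BASE_DELAY_MS * 2 ** (attempt - 1);
  }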
