Idempotency Concern in DocumentBatchingFunction #11

NullPointer4096 · 2023-04-16T15:40:56Z

Description:
I would like to bring attention to a potential issue with the DocumentBatchingFunction, which batches articles into a single JSON file when there are more than five articles in the S3 bucket. The function downloads all articles and previously batched article lists to the function runtime's local file system, deletes the copies in the bucket, and then uploads the newly generated article list as one JSON file. This multi-step workflow is not idempotent: if the function runtime crashes after deleting any articles from the S3 bucket but before the new JSON file is uploaded, the retry will not find those articles, because the only remaining copies were on the local file system of the crashed runtime. In other words, some articles will be lost.

Suggested Fix:
To address this issue, please consider uploading the newly concatenated JSON file before deleting any old articles. If a retry happens before the new batch file has been uploaded, no documents have been lost, so the workflow can start over and run normally with the original documents. If the batch has been uploaded and the runtime then crashes, the retry only needs to delete the old unbatched articles that still exist. Currently, the batch name is generated from the current time, so a retry cannot tell whether an earlier attempt already uploaded its batch. If the name is instead derived from context.aws_request_id, which is constant across retries, the program can check whether the batch has already been uploaded.
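
The upload-then-delete ordering could look roughly like the sketch below. It is only an illustration under stated assumptions, not the function's actual code: `InMemoryStore` is a hypothetical stand-in for the S3 bucket so the example runs on its own, the `articles/` and `batches/` prefixes are made up, and recording the source keys inside the batch file is one possible way (my assumption) for a retry to know which articles are safe to delete. In the real function, `request_id` would come from `context.aws_request_id`.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in for the S3 bucket, only to keep the sketch runnable."""

    def __init__(self):
        self.objects = {}

    def list_keys(self, prefix):
        return sorted(k for k in self.objects if k.startswith(prefix))

    def get(self, key):
        return self.objects[key]

    def put(self, key, body):
        self.objects[key] = body

    def delete(self, key):
        # Deleting a missing key is a no-op, which keeps retries harmless.
        self.objects.pop(key, None)

    def exists(self, key):
        return key in self.objects


def batch_articles(store, request_id,
                   article_prefix="articles/", batch_prefix="batches/"):
    """Batch unbatched articles so that a retry with the same request_id is safe."""
    # Deterministic name: the same request ID always maps to the same batch key.
    batch_key = f"{batch_prefix}{request_id}.json"

    if store.exists(batch_key):
        # An earlier attempt already uploaded the batch. Reuse its recorded
        # source keys so that only articles contained in it get deleted.
        sources = json.loads(store.get(batch_key))["sources"]
    else:
        sources = store.list_keys(article_prefix)
        if not sources:
            return None
        batch = {
            "sources": sources,
            "articles": [json.loads(store.get(key)) for key in sources],
        }
        # Step 1: upload the batch while every original still exists.
        store.put(batch_key, json.dumps(batch))

    # Step 2: only now delete the originals.
    for key in sources:
        store.delete(key)
    return batch_key
```

With this ordering, a crash at any point leaves every article either in its original object, in the uploaded batch, or in both. An article that arrives between the first attempt and the retry is not listed in the batch's `sources`, so the retry leaves it alone for a later invocation.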

Thank you for considering this feedback. I hope my suggestion proves helpful in enhancing the reliability and idempotency of the DocumentBatchingFunction. Please don't hesitate to reach out if you have any questions or concerns.
