Added the ability to stream results from the bulk API. #530

Open
wants to merge 1 commit into
base: release/v1.12.6

Conversation

bkbeckman

I have been using simple-salesforce to pull information from our very large SFDC instance. I discovered that the bulk API implementation can allocate large amounts of memory when pulling extremely large result sets.

This commit adds the ability to stream results from the bulk API, which keeps the memory footprint low.

I have provided two mechanisms for streaming from the bulk API: streamed JSON, which uses the json-stream library, and streamed plain text.

…pports streamed json parsing via the json-stream library as well as streamed plain text.
@jon-wobken
Collaborator

Does the existing lazy_operation that returns a generator handle the memory issue you are facing?

@bkbeckman
Author

No, because lazy_operation must still download the entire response body and parse it before it can begin to return results.
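To illustrate the distinction, here is a minimal stdlib-only sketch. It is not the simple-salesforce code: `eager_then_lazy` and `incremental` are hypothetical names, and the hand-rolled incremental parser is a simplified stand-in for a real streaming parser such as json-stream. It assumes the body is a top-level JSON array whose elements are objects.

```python
import json
from typing import Any, Iterable, Iterator


def eager_then_lazy(chunks: Iterable[str]) -> Iterator[Any]:
    """Generator in form only: the whole body is joined and parsed first."""
    body = "".join(chunks)       # every chunk is consumed here
    yield from json.loads(body)  # the full list is already in memory


def incremental(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield each array element as soon as its text is complete.

    Assumption: a top-level JSON array whose elements are objects.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    started = False
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not started:
                if not buffer:
                    break
                if buffer[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                buffer, started = buffer[1:], True
                continue
            if buffer.startswith(","):
                buffer = buffer[1:].lstrip()
            if not buffer or buffer[0] == "]":
                break
            try:
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # element is incomplete; wait for the next chunk
            yield item
            buffer = buffer[end:]
```

With `eager_then_lazy`, asking for the first record forces every chunk to be read; with `incremental`, only the chunks needed to complete that record are read. The sketch re-parses the pending buffer on each chunk and does not report malformed elements, so it is meant to show the idea rather than replace a real streaming parser.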

This PR does two things:

  1. Passes the stream=True flag to the session.request method, which prevents the entire response body from being downloaded before the response is returned.
  2. Adds the json-stream library, which can parse JSON without first loading the entire string into memory.
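The first point can be sketched as follows. This is an assumption-laden illustration, not the code in this PR: `stream_text_lines` is a hypothetical helper, and `session` is assumed to behave like a `requests.Session`.

```python
from typing import Any, Iterator


def stream_text_lines(
    session: Any, method: str, url: str, **kwargs: Any
) -> Iterator[str]:
    """Yield response lines without buffering the whole body.

    `session` is assumed to behave like requests.Session (hypothetical helper).
    """
    # stream=True defers downloading the body until it is iterated.
    response = session.request(method, url, stream=True, **kwargs)
    try:
        response.raise_for_status()
        # iter_lines pulls the body in chunks as the caller consumes lines.
        for line in response.iter_lines(decode_unicode=True):
            if not line:
                continue  # skip blank keep-alive lines
            # requests yields bytes when the response encoding is unknown.
            yield line.decode("utf-8") if isinstance(line, bytes) else line
    finally:
        response.close()  # release the connection once iteration ends
```

A caller would pass a real `requests.Session()` and a result URL, then iterate over the generator; memory use then depends on the chunk size rather than on the size of the result set. The streamed JSON path would feed the same kind of streamed response into an incremental parser instead of splitting it into lines.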

@jon-wobken jon-wobken added the Backlog (Issue will be looked at sometime in the future) label May 12, 2022
@jon-wobken jon-wobken self-assigned this May 23, 2022
@jon-wobken jon-wobken changed the base branch from master to release/v1.12.0 May 23, 2022 17:59
@jon-wobken
Collaborator

I stepped away from this project for a bit. I will work on getting this reviewed and merged into the release branch (v1.12).

@bkbeckman
Author

I appreciate it. Let me know if there's anything I can assist with.

@jon-wobken jon-wobken deleted the branch simple-salesforce:release/v1.12.6 July 14, 2022 13:02
@jon-wobken jon-wobken closed this Jul 14, 2022
@jon-wobken jon-wobken reopened this Jul 25, 2022
@jon-wobken jon-wobken changed the base branch from release/v1.12.0 to release/v1.12.2 July 25, 2022 14:57
@jon-wobken jon-wobken changed the base branch from release/v1.12.2 to release/v1.12.3 September 12, 2022 20:53
@jon-wobken jon-wobken changed the base branch from release/v1.12.3 to release/v1.12.4 January 12, 2023 19:02
@jon-wobken jon-wobken changed the base branch from release/v1.12.4 to release/v1.12.6 October 24, 2023 12:04
Labels
Backlog (Issue will be looked at sometime in the future), In Review

2 participants