-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize bulk 1 and 2 APIs expoing "streams" to bulk2.query #1317
base: 2.0
Are you sure you want to change the base?
Conversation
…now the data is piped to a Readable stream. Change the way maxRecords is converted to string
- create getRecords to remove code repetition - rename tests with a better description of what they do
… a query param in the request url - add a counter for the number of batches - rename the counter that counts the number of records from numberOfRecordsProcessed to numberRecordsRetrieved
|
||
while (this.locator !== 'null') { | ||
const nextResults = await this.request<Record[]>({ | ||
private async *getResults(): AsyncGenerator<Record[], void, unknown> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather you name this method something different, like getResultsGenerator
, and still have a public getResults() method that does:
return new Promise((resolve) => {
let records = [];
this.stream()
.on('data', (data) => {
records = records.concat(data);
})
.on('end', () => {
resolve(records);
});
});
That way everyone who is used to the getResult method can still access it from the query job and doesn't have to write it themselves. Migrating from v1 -v2 would be easier, as it would only need to change bulk.query()
to bulkd.query().getResults()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the getRecords()
method because bulk v1 does not have it, and because I wanted to make v2's API equal to v1. Another good reason for not creating this method, is because it could cause Fatal error: Javascript heap out of memory
if used to query huge data sets without using the LIMIT keyword.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AllanOricil Even though jsforce v2 is still in beta there's lot of projects using it so I would prefer not break the current bulk v2 implementation. What do you think about deprecating this and add an option to the query job, then make that the default at v2 GA?
maybe a third arg to BulkV2.query
that makes it return the job instead of the results could work, default to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Np. I'm going to add this arg 👍
Hey @AllanOricil, I opened a PR a few days ago to fix this in v3: At first I was about to merge your PR and start from there but after playing a bit whit the record stream system in jsforce I decided to start from scratch (we had some planned breaking changes in bulk2 too). After it gets merged we'll start using jsforce v3 in the CLI but this issue will not be solved because |
If u are going to add the whole file in mem, do some additional checks to avoid unnecessary processing: Something like this:
If(!canProcess(...)) throw new Error ("sorry you can't use this file because o XYZ. Please do something like bla and then try again") This way users won't wait for the mem to be all filled to see an error. |
BEFORE
soql queries that return huge number of rows could throw
FATAL Error ... Javascript heap out of memory
becausebulk2.query
stores all records in memory.AFTER
query results can be retrieved in batches with size equal to the value passed to
maxRecords
.obs: there isn't a real stream from Salesforce to jsforce because bulk api v2 does not offer it. These changes are just sending pages to a stream so that the api for v2 is the same as v1.
obs: page results are not stored in memory unless the developer chooses it by concatenation results from every page
or, if developers want to gather all records in memory before doing something with them