provides a workaround for unreasonable overhead encountered in prepro… #303
base: main
…cessors - specifically in datasets.map applied to the tokenizer
To verify:
Examine profile stats with:
after the fix:
Regarding NQ dataset: I ran
Turns out a huge amount of time was spent in
I can't find the corresponding
is still suspicious.
Should this PR be closed, @jsmcibm?
During code review, Bhavani was unable to replicate the problem. I suspect that some additional factor we haven't identified (Python version, etc.) is influencing the behavior in
PrimeQA Pull Request
What does this PR do?
provides a workaround for unreasonable overhead encountered in preprocessors - specifically in datasets.map applied to the tokenizer
Closes #(issue)
Notes:
(issue) above ↑↑↑ with the issue this PR closes to automatically link the two. This must be done when the PR is created.
Closes #(issue) as needed.
Closes
Description
Describe the changes proposed by this PR below to give the reviewer context below ↓↓↓
Wraps the tokenizer output in a dictionary of np.arrays; datasets.map is observed to be much faster with this data structure than with the standard tokenizer output object.
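A minimal sketch of the wrapping idea, not the PR's actual code. It assumes the tokenizer output is dict-like (exposes `.items()`) and that the sequences in a batch are padded to equal length. The helper name `to_numpy_dict` and the sample data are hypothetical.

```python
import numpy as np


def to_numpy_dict(encoding):
    """Copy a dict-like tokenizer output into a plain dict of NumPy arrays.

    Assumption: every value is a list of equal-length sequences (padded),
    so each one converts to a regular 2-D array.
    """
    return {key: np.asarray(values) for key, values in encoding.items()}


# Stand-in for a tokenizer's batch output (hypothetical token ids).
fake_encoding = {
    "input_ids": [[101, 2054, 102, 0], [101, 2129, 2116, 102]],
    "attention_mask": [[1, 1, 1, 0], [1, 1, 1, 1]],
}

features = to_numpy_dict(fake_encoding)
print(type(features).__name__, features["input_ids"].shape)
```

A preprocessing function passed to datasets.map with batched=True could return a dictionary built this way instead of the tokenizer's own output object. Whether this yields a speedup appears to depend on the environment, as the discussion above suggests.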
Request Review
Be sure to request a review from one or more reviewers (unless the PR is to an unprotected branch).
Versioning
When opening a PR to make changes to PrimeQA (i.e. primeqa/) master, be sure to increment the version following semantic versioning. The VERSION is stored here and is incremented using bump2version {patch,minor,major} as described in the development guide documentation (https://github.com/primeqa/primeqa/blob/main/docs/development.md).
primeqa package or was not into master?
After pulling in changes from master to an existing PR, ensure the VERSION is updated appropriately. This may require bumping the version again if it has been previously bumped.
If you're not quite ready yet to post a PR for review, feel free to open a draft PR.
Releases
After Merging
If merging into master and VERSION was updated, after this PR is merged:
Checklist
Review the following and mark as completed: