FActScore can be slow to run #23

Open
martiansideofthemoon opened this issue Aug 30, 2023 · 0 comments

Pointed out by a number of people including Katherine Tian.

Sewon's reply / points:

You are right that FActScore does not work in batches. I think it should be possible to modify the code so that it does. My recommendation is to identify which part of the pipeline is the speed bottleneck, and then parallelize the slowest parts one at a time. There are four possible bottlenecks: (1) atomic fact generation, (2) GTR retrieval, (3) InstLLAMA generation, (4) NPM verification.
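
To find out which of the four stages dominates, one option is to accumulate wall-clock time per stage. The following is a minimal sketch, not code from the FActScore repository: `timed`, `stage_seconds`, `slowest_first`, and `fake_stage` are hypothetical names, and the `fake_stage` calls stand in for the real pipeline calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per pipeline stage.
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the elapsed time of the enclosed block to stage_seconds[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def slowest_first():
    """Return (stage, seconds) pairs, slowest stage first."""
    return sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical stand-in for a real stage; replace with the actual call.
def fake_stage(seconds):
    time.sleep(seconds)


for _ in range(3):
    with timed("atomic_facts"):
        fake_stage(0.05)
    with timed("retrieval"):
        fake_stage(0.0)

for stage, seconds in slowest_first():
    print(f"{stage}: {seconds:.3f}s")
```

Wrapping each of the four stages in its own `with timed(...)` block and reading the sorted totals gives the order in which to parallelize.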

If (1) is the bottleneck: the slowdown comes from the OpenAI API, possibly because you are sharing the API key with many others and running into the Rate Limit error. You can check whether this is the case by always printing these lines. If it is, the best strategy is to use another API key that is not shared with others. Other than that, it is not straightforward to speed this part up. (We have heard from users that this was the main speed bottleneck in their cases, but it may well not be in yours.)
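
One way to see how much time is lost to rate limiting is to count the errors and the time spent waiting on them. This is a rough sketch under stated assumptions, not the repository's own retry logic: `RateLimitError` is a locally defined placeholder for whatever exception your API client raises, and `call_with_retry` and its parameters are hypothetical.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate-limit exception your API client raises."""


def call_with_retry(request_fn, max_retries=5, base_delay=1.0,
                    sleep=time.sleep, stats=None):
    """Call request_fn(), retrying on RateLimitError with exponential backoff.

    stats (a dict) records how many rate-limit errors were hit and how many
    seconds were spent waiting, so the cost of rate limiting is visible.
    """
    if stats is None:
        stats = {}
    stats.setdefault("rate_limit_errors", 0)
    stats.setdefault("seconds_waited", 0.0)
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RateLimitError:
            stats["rate_limit_errors"] += 1
            if attempt == max_retries:
                raise  # Give up after the last allowed retry.
            delay = base_delay * 2 ** attempt
            stats["seconds_waited"] += delay
            print(f"rate limited (attempt {attempt + 1}), sleeping {delay:.1f}s")
            sleep(delay)
```

If `seconds_waited` accounts for most of the run time, the shared key is the likely cause and batching the other stages will not help much.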

If (2) is the bottleneck, you can make encoding of the query vector (in passage retrieval) work in a batch.

If (3) is the bottleneck, you can make the _generate function work in a batch.
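
The batching suggested for (2) and (3) follows the same pattern: group the inputs, make one model call per group, and keep the outputs in input order. The sketch below is generic and is not taken from the FActScore code; `batched`, `run_in_batches`, and `toy_encode` are hypothetical names, and `batch_fn` stands in for a batched encoder or generator.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(items, batch_fn, batch_size=8):
    """Apply batch_fn to each batch and return the outputs in input order.

    batch_fn is a hypothetical callable that takes a list of inputs and
    returns one output per input, e.g. a batched encoder or generator.
    """
    outputs = []
    for batch in batched(items, batch_size):
        results = batch_fn(batch)
        if len(results) != len(batch):
            raise ValueError("batch_fn must return one output per input")
        outputs.extend(results)
    return outputs


# Toy batch function standing in for a model call.
def toy_encode(texts):
    return [len(t) for t in texts]


queries = ["a", "bb", "ccc", "dddd", "eeeee"]
lengths = run_in_batches(queries, toy_encode, batch_size=2)
```

If the retriever is built on sentence-transformers (an assumption about your setup), `SentenceTransformer.encode` already accepts a list of strings and a `batch_size` argument, so collecting the queries into one list may be enough for (2).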

If (4) is the bottleneck, you can make NPM work in a batch, or skip NPM by specifying retrieval+llama instead of retrieval+llama+npm, which I think should give decent results.
