Batch processing does not speed up en_core_web_trf #13500

Open
njaramish opened this issue May 16, 2024 · 0 comments

How to reproduce the behaviour

import spacy

spacy.prefer_gpu()
nlp = spacy.load(
    "en_core_web_trf",
    disable=['tagger', 'ner', 'lemmatizer', 'textcat']
)

node = """Some really long string, 3000 characters"""

# simulating 96 pretty long docs
nodes = [node*25]*96

Then run each of the lines below separately and time it:

# 1 min 7.5 s
[list(doc.sents) for doc in nlp.pipe(nodes, batch_size=96)]

# 1 min 7.3 s
[list(doc.sents) for doc in nlp.pipe(nodes, batch_size=32)]

# 1 min 8.2 s
[list(doc.sents) for doc in nlp.pipe(nodes, batch_size=1)]

Running the same code with en_core_web_lg shows substantial gains from batching: with the largest batch size, the runtime is roughly 1/4 of that with batch_size=1.
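For anyone who wants to repeat the comparison, here is a minimal sketch of a timing helper. The name `time_batches` and its parameters are hypothetical, not part of spaCy; it only assumes that `pipe` accepts the texts plus a `batch_size` keyword and yields one result per text, as `nlp.pipe` does.

```python
import time


def time_batches(pipe, texts, batch_sizes, consume=lambda doc: None):
    """Time one full pass over `texts` for each batch size.

    `pipe` is any callable taking (texts, batch_size=...) and yielding
    one result per text, e.g. `nlp.pipe`. `consume` is called on every
    result so that lazy work (such as reading doc.sents) is included
    in the measurement. Returns a dict {batch_size: seconds}.
    """
    timings = {}
    for batch_size in batch_sizes:
        start = time.perf_counter()
        for doc in pipe(texts, batch_size=batch_size):
            consume(doc)
        timings[batch_size] = time.perf_counter() - start
    return timings


# Hypothetical usage with the objects defined in the issue:
#   time_batches(nlp.pipe, nodes, [96, 32, 1],
#                consume=lambda doc: list(doc.sents))
```

Running all batch sizes in one process means the first run may also absorb one-off warm-up costs, so it can be worth repeating the measurement or discarding the first pass.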

Your Environment

Using a single RTX A6000

python -m spacy info --markdown:

Info about spaCy

  • spaCy version: 3.7.4
  • Platform: Linux-5.15.0-94-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Pipelines: en_core_web_lg (3.7.1), en_core_web_trf (3.7.3), en_core_web_sm (3.7.1), de_core_news_sm (3.7.0)

Expected Behavior

My understanding from the documentation and this issue is that we should expect significant gains from batching, as observed with en_core_web_lg. However, using en_core_web_trf does not yield significant gains from batching.

I'm wondering if this is a bug, or if we should not expect improved performance due to batching for a Transformer-Parser pipeline. Thanks for this awesome package, and in advance for your help!
