
How to set the batch size? #61

Open
atoutou opened this issue Jul 14, 2021 · 1 comment
Assignees
Labels
question Further information is requested

Comments

@atoutou

atoutou commented Jul 14, 2021

Hi,

The prediction process takes a long time to finish, so I checked the GPU memory usage and found that it only uses 3 GB of memory (I have a 16 GB GPU).
I want to set a larger batch size to speed up the process, but I can't find the argument.
How do I set the batch size when using the predict function?

import nlu
pipe = nlu.load('xx.embed_sentence.labse', gpu=True)
pipe.predict(text, output_level='document')

Thanks

@C-K-Loan
Member

Hi @atoutou

pipe = nlu.load('xx.embed_sentence.labse', gpu=True)
pipe.print_info()

will print

The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :
>>> pipe['bert_sentence@labse'] has settable params:
pipe['bert_sentence@labse'].setBatchSize(8)          | Info: Size of every batch | Currently set to : 8
pipe['bert_sentence@labse'].setIsLong(False)         | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False
pipe['bert_sentence@labse'].setMaxSentenceLength(128)  | Info: Max sentence length to process | Currently set to : 128
pipe['bert_sentence@labse'].setDimension(768)        | Info: Number of embedding dimensions | Currently set to : 768
pipe['bert_sentence@labse'].setCaseSensitive(False)  | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False
pipe['bert_sentence@labse'].setStorageRef('labse')   | Info: unique reference name for identification | Currently set to : labse
>>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False)  | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97')  | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@7f47d7d6)  | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@7f47d7d6
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e'])  | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn')  | Info: Model architecture (CNN) | Currently set to : cnn
>>> pipe['document_assembler'] has settable params:
pipe['document_assembler'].setCleanupMode('shrink')  | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink

Calling
pipe['bert_sentence@labse'].setBatchSize(...)
with a larger value before predict should fix your problem.
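Putting the two snippets together, a minimal sketch of the call sequence (running the real pipeline needs Spark NLP and a JVM, so a stand-in component is used here purely to illustrate the setter pattern; with the actual library you would call setBatchSize directly on pipe['bert_sentence@labse'], and 128 is just an example value, not a recommendation):

```python
# Stand-in for an NLU annotator that exposes setBatchSize/getBatchSize,
# mirroring the settable params shown by pipe.print_info() above.
class StubComponent:
    def __init__(self, batch_size=8):
        self._batch_size = batch_size

    def setBatchSize(self, n):
        # Spark NLP setters return self, so calls can be chained.
        self._batch_size = n
        return self

    def getBatchSize(self):
        return self._batch_size

# With the real library this would be: pipe = nlu.load('xx.embed_sentence.labse', gpu=True)
pipe = {'bert_sentence@labse': StubComponent()}

# Raise the batch size before calling predict:
pipe['bert_sentence@labse'].setBatchSize(128)
print(pipe['bert_sentence@labse'].getBatchSize())  # 128
```

The key point is that the batch size lives on the individual annotator inside the pipeline, not on predict itself, which is why there is no batch-size argument on the predict call.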

Let me know if it helps

@C-K-Loan C-K-Loan self-assigned this Jul 17, 2021
@C-K-Loan C-K-Loan added the question Further information is requested label Jul 17, 2021