What is the best way to deploy models in terms of translation speed? #70

Open

jamie0725 opened this issue Nov 2, 2022 · 3 comments

jamie0725 commented Nov 2, 2022

Hi,

Firstly, thanks for making your translation models publicly available. It is really helpful for the industry.

I have a question, though, related to this one: if I am going to translate a large amount of text, what is the best way to use your models? Currently I am using the transformers library, but the speed is quite slow even on a GPU, which is not fast enough for my use case.
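A minimal sketch of batched GPU translation with transformers (an illustration, not from the thread; the en-es checkpoint name and batch size are assumptions). Padding a batch of sentences and generating them together typically uses the GPU far better than sentence-by-sentence calls:

import torch
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-es"  # assumed language pair
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name).to("cuda").eval()

def translate(sentences, batch_size=32):
    # Long documents should be split into sentences first; Marian models
    # truncate overly long inputs, silently losing text.
    translations = []
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True).to("cuda")
        with torch.no_grad():
            outputs = model.generate(**inputs)
        translations.extend(
            tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations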


jamie0725 commented Nov 2, 2022

In terms of speed: translating 30k documents of about 300 words each currently takes 10+ hours on a single GPU. Is this expected?
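A quick back-of-envelope check of that rate (assuming uniform throughput; not from the thread):

docs, hours = 30_000, 10
print(hours * 3600 / docs)  # ~1.2 seconds per ~300-word document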


opme commented Jan 10, 2023

There was a good answer to this on Stack Overflow: Helsinki-NLP models were originally trained with Marian and then converted to Hugging Face Transformers. Marian is a specialized tool for MT and is very fast. If you do not need the internals of the models and only need the translations, it should be the better choice.

I have been able to get the Transformers version working, and it takes 2-3 seconds per paragraph on a new 2022 laptop with 64 GB RAM and an A2000 GPU.

I am trying the dockerized version of Opus-MT, but when I run it, it has so far been giving me incomplete or garbage responses.

docker build -f Dockerfile.gpu . -t opus-mt-gpu
nvidia-docker run -p 8888:8888 opus-mt-gpu:latest
~/git/Opus-MT$ echo "I am a dog" | ./opusMT-client.py -H 172.17.0.2 -P 10001 -s en -t es
Soy un

I am trying to translate company profiles for 80k stock symbols into European languages (just venting). I don't want to start the bulk translations until I can get it running at <1 second per translation.

jorgtied (Member) commented

For batch translation it would be better to run marian-decoder directly rather than going through the server/client setup. Also note that the Opus-MT server/client implementation does not do batching and therefore does not really use the full power of a GPU.
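A sketch of such a bulk run, with several assumptions (not from the thread): marian-decoder is on PATH, a downloaded OPUS-MT model package provides a decoder.yml pointing at the model and SentencePiece vocabularies, the input file has already been segmented with the package's preprocessing script, and the batch sizes are illustrative.

import subprocess

# Bulk-translate a preprocessed file with marian-decoder (paths are assumptions).
with open("input.sp.txt") as src, open("output.txt", "w") as out:
    subprocess.run(
        ["marian-decoder",
         "-c", "decoder.yml",      # model/vocab config shipped with the model package
         "-d", "0",                # run on GPU 0
         "--mini-batch", "64",     # sentences decoded per batch
         "--maxi-batch", "1000"],  # sentences pre-loaded and sorted for batching
        stdin=src, stdout=out, check=True)

Batching plus length-based sorting of the input is where most of the GPU speedup comes from, which is exactly what the server/client path skips.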
