Support for improved LASER2 embeddings #45

Open
wants to merge 5 commits into base: master

Thommy96
Hi,

in the current version of Facebook's LASER repository, they provide an improved LASER2 model trained on the same languages as the original LASER model. However, they also introduced a SentencePiece model (SPM) for tokenization. So I made a few changes to your code so that the improved model can be used easily from Python. To make sure it works, I compared the generated embeddings with the original LASER2 embeddings (obtained using a fork of your test data repository). The resulting report shows a near-perfect match.
The tests for the new embeddings might still need to be adapted so that they can be run with poetry and pytest.
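
For anyone who wants to repeat a check like this, here is a minimal sketch of how generated embeddings could be compared against reference embeddings using cosine similarity. It is not the code from this PR: the function names `cosine_similarity` and `compare_embeddings` and the `0.999` threshold are assumptions chosen for illustration, and the embeddings are assumed to be plain lists of floats in matching order.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def compare_embeddings(
    generated: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
    threshold: float = 0.999,  # hypothetical cut-off, not taken from the PR
) -> Tuple[float, List[int]]:
    """Return the mean similarity and the indices of pairs below the threshold."""
    if len(generated) != len(reference):
        raise ValueError("both sets must contain the same number of embeddings")
    if not generated:
        raise ValueError("nothing to compare")
    sims = [cosine_similarity(g, r) for g, r in zip(generated, reference)]
    mismatches = [i for i, s in enumerate(sims) if s < threshold]
    return sum(sims) / len(sims), mismatches
```

Calling `compare_embeddings(generated, reference)` returns the average similarity together with the positions of any sentences whose embeddings diverge, which makes it easy to inspect individual mismatches. For large test sets a vectorized version would likely be faster; plain Python is used here only to keep the sketch dependency-free.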

Feel free to check out the changes :)
