Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
embed.sh		embed.sh

README.md

LASER: calculation of sentence embeddings

This codes shows how to calculate sentence embeddings for an arbitrary text file:

bash ./embed.sh INPUT-FILE LANGUAGE OUTPUT-FILE

The input will be tokenized, using the mode of the specified language, BPE will be applied and the sentence embeddings will be calculated.

Output format

The embeddings are stored in float32 matrices in raw binary format. They can be read in Python by:

import numpy as np
dim = 1024
X = np.fromfile("my_embeddings.raw", dtype=np.float32, count=-1)                                                                          
X.resize(X.shape[0] // dim, dim)

X is a N x 1024 matrix where N is the number of lines in the text file.

Example

./embed.sh ${LASER}/data/tatoeba/v1/tatoeba.fra-eng.fra fr my_embeddings.raw

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

embed

embed

README.md

README.md

embed.sh

embed.sh

README.md

LASER: calculation of sentence embeddings

Output format

Example

Files

embed

Directory actions

More options

Directory actions

More options

Latest commit

History

embed

Folders and files

parent directory

README.md

README.md

embed.sh

embed.sh

README.md

LASER: calculation of sentence embeddings

Output format

Example