lukmanaj/Cohere-Parallel-Language-Sentence-Alignment

Cohere-Parallel-Language-Sentence-Alignment

Open In Colab

Cohere-Align

This repo takes two text files, one in the source language and one in the target language, and returns the sentence pairs that are most likely translations of each other.

Before running, create a Cohere account to get your API key.

Then install the cohere package with the following command:

pip install cohere

To align sentences, create two text files, one for the source language and one for the target language, with each line containing one distinct sentence. Then run the following command:
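The input format described above can be produced with a few lines of Python. This is a minimal sketch, not part of the repo: the helper name `write_sentences` and the sample sentences are hypothetical, and it assumes the alignment scripts read UTF-8 files with one sentence per line.

```python
from pathlib import Path


def write_sentences(path, sentences):
    """Write one sentence per line; return the number of lines written.

    Internal whitespace (including stray newlines) is collapsed to single
    spaces so that every sentence stays on its own line, and empty entries
    are skipped.
    """
    cleaned = [" ".join(s.split()) for s in sentences]
    cleaned = [s for s in cleaned if s]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


if __name__ == "__main__":
    # Illustrative placeholder data (Hausa and English), not from the repo.
    write_sentences("src.txt", ["Ina kwana?", "Na gode."])
    write_sentences("trg.txt", ["Thank you.", "Good morning."])
```

The two files do not need to list the sentences in the same order, since finding the matching pairs is the job of the alignment step.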

Cohere

python3 scripts/cohere_align.py \
   --cohere_api_key '<api_key>' \
   -m 'embed-multilingual-v2.0' \
   -s src.txt \
   -t trg.txt \
   -o cohere \
   --retrieval 'nn' \
   --dot \
   --cuda

The repo also includes a comparison with the LASER autoencoder on the same files:

Laser

python3 scripts/laser_align.py \
  -s src.txt \
  -t trg.txt \
  -o cohere \
  --src_lang ha \
  --trg_lang en \
  --retrieval 'nn' \
  --dot \
  --cuda

Here -m is the model name, -s is the source text path, -t is the target text path, and -o is the output directory path. Pass the --cuda option if you have a GPU. For more parameters, see the alignment script.
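To give a feel for what the --retrieval 'nn' and --dot options suggest, here is a rough sketch of nearest-neighbour retrieval with dot-product similarity. It is an assumption about the general technique, not the code in the alignment scripts: the function names are hypothetical, and the toy 2-D vectors stand in for the sentence embeddings a model would return.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def nearest_neighbour_align(src_vecs, trg_vecs):
    """For each source vector, pick the target vector with the highest
    dot-product score. Returns a list of (src_index, trg_index, score)."""
    pairs = []
    for i, u in enumerate(src_vecs):
        scores = [dot(u, v) for v in trg_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((i, j, scores[j]))
    return pairs


# Toy vectors standing in for sentence embeddings (illustrative only).
src = [[1.0, 0.0], [0.0, 1.0]]
trg = [[0.1, 0.9], [0.8, 0.2]]

print(nearest_neighbour_align(src, trg))
# prints [(0, 1, 0.8), (1, 0, 0.9)]
```

In this sketch each source sentence is matched independently, so two source sentences can end up paired with the same target sentence; a real pipeline would typically filter the pairs by score.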

You can also use the Jupyter notebook linked above to align the sentences.

About

Code base for CohereAIHack submission
