
Cross-Lingual and Multilingual Word Alignment


This project provides an API to perform word alignment.
The set of supported languages depends on the transformer model used.

How to use

See main.py for a runnable example:

sentence1 = "Today I went to the supermarket to buy apples".split()
sentence2 = "Oggi io sono andato al supermercato a comprare le mele".split()
BERT_NAME = "bert-base-multilingual-cased"
wa = WordAlignment(model_name=BERT_NAME, tokenizer_name=BERT_NAME, device='cpu', fp16=False)
_, decoded = wa.get_alignment(sentence1, sentence2, calculate_decode=True)
for (sentence1_w, sentence2_w) in decoded:
    print(sentence1_w, "\t--->", sentence2_w)

Output:

Today           ---> Oggi
I               ---> io
went            ---> andato
to              ---> al
the             ---> al
supermarket     ---> supermercato
to              ---> a
buy             ---> comprare
apples          ---> mele

get_alignment API

The signature of the function is List[str], List[str], bool -> Tuple[List[int], List[List[str]]].
To speed up the computation, you can skip the decoding step by setting calculate_decode to False.
If calculate_decode is False, the second returned value will be None.
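To make the decoded output concrete, here is a minimal, self-contained sketch of how alignment pairs can be decoded from a word-to-word similarity matrix by taking the best-scoring target word for each source word. This is only an illustration of the idea, not the library's internal algorithm; the scores below are made up, whereas the real model derives them from BERT embeddings.

```python
# Illustrative sketch (not the library's internals): decode word-alignment
# pairs from a toy similarity matrix by argmax over target words.

def decode_alignment(source, target, scores):
    """For each source word, pick the target word with the highest score.

    Returns (indices, decoded): the chosen target index per source word,
    and the corresponding (source_word, target_word) pairs.
    """
    indices = [max(range(len(target)), key=lambda j: row[j]) for row in scores]
    decoded = [(source[i], target[j]) for i, j in enumerate(indices)]
    return indices, decoded

source = "I went".split()
target = "io sono andato".split()
scores = [  # rows: source words, columns: target words (toy values)
    [0.9, 0.1, 0.2],  # "I"    -> best match "io"
    [0.2, 0.4, 0.8],  # "went" -> best match "andato"
]

indices, decoded = decode_alignment(source, target, scores)
for src_w, tgt_w in decoded:
    print(src_w, "\t--->", tgt_w)
```

With calculate_decode=True the library returns pairs in this spirit; with False, only the index-level result is computed.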

FP16 Support

WordAlignment supports FP16, but we discourage its use.

How to install

Word Alignment is fully compatible with NVIDIA CUDA.
To use CUDA, you must install the CUDA version of the Torch-Scatter library; the following script automates the installation:

bash cuda_install_requirements.sh

N.B.: The CUDA installation of Torch-Scatter takes several minutes to compile.

Dependencies

  • Python3
  • Torch
  • Transformers
  • Torch-Scatter
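For a CPU-only setup, the dependencies can typically be installed with pip; the package names below are the common distribution names and are an assumption, since the repository's own requirements file may differ:

```shell
# CPU-only install (assumed package names); for CUDA builds of
# Torch-Scatter, use cuda_install_requirements.sh as described above.
pip install torch transformers torch-scatter
```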

Authors