Translation of pre-split and pre-tokenized sentences #407

BLKSerene · 2024-04-19T09:31:26Z

Hi, the docs say Argos Translate uses SentencePiece (and maybe Sacremoses?) for tokenization and Stanza for sentence boundary detection. Is it possible to translate pre-split and pre-tokenized sentences (a list of lists of tokens)? If so, I could drop many of Argos Translate's dependencies, whose strict version pins cause a number of problems (cf. #362, #395).
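To make the request concrete, here is a rough sketch of the kind of interface I have in mind. The name `translate_pretokenized` and the `backend` callable are hypothetical and not part of Argos Translate's current API, and `stub_backend` only stands in for a real translation model so the snippet runs on its own.

```python
from typing import Callable, List

Tokens = List[str]
# Hypothetical backend type: takes a batch of token lists, returns a batch of token lists.
Backend = Callable[[List[Tokens]], List[Tokens]]


def translate_pretokenized(sentences: List[Tokens], backend: Backend) -> List[Tokens]:
    """Pass already split and tokenized sentences straight to the backend.

    No sentence boundary detection or tokenization happens here, so the
    caller is responsible for both.
    """
    if not all(isinstance(tok, str) for sent in sentences for tok in sent):
        raise TypeError("each sentence must be a list of string tokens")
    translated = backend(sentences)
    if len(translated) != len(sentences):
        raise ValueError("backend must return one output per input sentence")
    return translated


def stub_backend(batch: List[Tokens]) -> List[Tokens]:
    # Placeholder for a real model (assumption): just upper-cases every token.
    return [[tok.upper() for tok in sent] for sent in batch]


sentences = [["Hello", "world", "."], ["How", "are", "you", "?"]]
result = translate_pretokenized(sentences, stub_backend)
print(result)
```

With something like this, the sentence splitter and tokenizer would only need to be installed by users who pass raw text.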
