do not translate words from a given vocabulary #72

Open
rotcx opened this issue Nov 16, 2023 · 5 comments
Comments

rotcx commented Nov 16, 2023

For example, do not translate LLM as 法学硕士; leave it as LLM.

Likewise, do not translate Transformer as 变压器; leave it as Transformer.

rotcx commented Nov 18, 2023

If we cannot set such a do-not-translate vocabulary for the translators (Google, Tencent, ...),

the only remedy is to replace the (wrongly) translated words with the original English words after translation.

rotcx commented Nov 18, 2023

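An implementation could be:

    from functools import reduce

    # Wrong translation -> original English term
    replace_dict = {"法学硕士": "LLM", "变压器": "Transformer", "代币": "token"}
    # Apply each replacement in turn; text_final holds the translated text
    text_final = reduce(lambda text, kv: text.replace(*kv), replace_dict.items(), text_final)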
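
Chained `str.replace` calls apply the entries one after another, so an entry can in principle rewrite text that an earlier entry produced. A minimal sketch of a single-pass variant built on `re.sub` follows; the helper name `restore_terms` is hypothetical and not part of the project.

```python
import re

# Wrong translation -> original English term (same entries as in the snippet above)
replace_dict = {"法学硕士": "LLM", "变压器": "Transformer", "代币": "token"}


def restore_terms(text, mapping):
    """Replace every key of `mapping` found in `text` in a single pass."""
    if not mapping:
        return text
    # Longest keys first, so a short key cannot pre-empt a longer one
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is looked up once; replaced text is never rescanned
    return pattern.sub(lambda m: mapping[m.group(0)], text)


print(restore_terms("变压器是法学硕士的基础", replace_dict))
```

Because the text is scanned only once, the result does not depend on the order of the dictionary entries.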

rotcx commented Nov 18, 2023

Another (downstream) way is to post-process the translated main.tex file:

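#!/bin/bash

# Wrong translation -> original English term
declare -A replace_dict=(["法学硕士"]="LLM" ["变压器"]="Transformer" ["代币"]="token")

# Print main.tex to stdout with each wrong translation replaced.
# IFS= keeps leading whitespace; the || test keeps a final line without a newline.
while IFS= read -r line || [[ -n $line ]]; do
    for key in "${!replace_dict[@]}"; do
        line=${line//"$key"/${replace_dict[$key]}}
    done
    # printf with a quoted variable avoids word splitting and globbing
    printf '%s\n' "$line"
done < main.tex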

rotcx commented Nov 18, 2023

Iterate over all .tex files in the directory dir and process each one (since in general we cannot know which .tex file is the main one):

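#!/bin/bash

# Wrong translation -> original English term
declare -A replace_dict=(["法学硕士"]="LLM" ["变压器"]="Transformer" ["代币"]="token")

# Rewrite each .tex file under dir in place (via a temporary file),
# so the output of different files is not merged on stdout.
find dir -name "*.tex" -print0 | while IFS= read -r -d '' file; do
    while IFS= read -r line || [[ -n $line ]]; do
        for key in "${!replace_dict[@]}"; do
            line=${line//"$key"/${replace_dict[$key]}}
        done
        printf '%s\n' "$line"
    done < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done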
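
The same directory walk can be sketched in Python, which may be convenient if the fix is applied from within a Python tool. This is only an illustration: the function name `fix_tex_files` is hypothetical, and the files are assumed to be UTF-8 encoded.

```python
from pathlib import Path

# Wrong translation -> original English term
replace_dict = {"法学硕士": "LLM", "变压器": "Transformer", "代币": "token"}


def fix_tex_files(root, mapping):
    """Apply `mapping` to every .tex file under `root`, rewriting files in place.

    Returns the list of files that were actually changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.tex")):
        original = path.read_text(encoding="utf-8")  # assumption: UTF-8 files
        fixed = original
        for wrong, term in mapping.items():
            fixed = fixed.replace(wrong, term)
        # Only touch files whose content differs after replacement
        if fixed != original:
            path.write_text(fixed, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `fix_tex_files("dir", replace_dict)` would then process the same files as the shell loop above.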

@sherrylixuecheng
Collaborator

Thank you for reporting this issue. Since this is a general translation tool rather than one that only targets CS or DL, we think it is better to leave the behavior as it is for now.
We could consider adding a "user dictionary" feature that lets users define their own vocabulary manually. The only thing a user would need to do is load a vocabulary list. This would be similar to your solution here, but more systematic and user-friendly. @SUSYUSTC
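
One way such a user dictionary could look is sketched below. Everything here is an assumption made for illustration, not the project's design: the file format (one `wrong translation<TAB>preferred term` entry per line, `#` for comments) and the function names `load_user_dictionary` and `apply_user_dictionary` are hypothetical.

```python
import io


def load_user_dictionary(lines):
    """Parse 'wrong<TAB>preferred' lines into a dict (assumed format)."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        wrong, sep, preferred = line.partition("\t")
        if not sep or not wrong.strip() or not preferred.strip():
            raise ValueError(f"malformed entry: {raw!r}")
        mapping[wrong.strip()] = preferred.strip()
    return mapping


def apply_user_dictionary(text, mapping):
    """Replace each wrong translation in `text` with the user's preferred term."""
    for wrong, preferred in mapping.items():
        text = text.replace(wrong, preferred)
    return text


# A file object opened with open(path, encoding="utf-8") works the same way
sample = io.StringIO("# user vocabulary\n法学硕士\tLLM\n变压器\tTransformer\n")
vocab = load_user_dictionary(sample)
print(apply_user_dictionary("法学硕士基于变压器", vocab))
```

Keeping the vocabulary in a plain text file would let users maintain field-specific lists without editing any code.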
