Skip to content

Latest commit

 

History

History
126 lines (108 loc) · 6.02 KB

加别的翻译器.md

File metadata and controls

126 lines (108 loc) · 6.02 KB

如果你有python编程基础, 知道怎么用python调用需要的翻译器api或翻译模型, 按如下步骤实现一个类写进dl/translators.init.py里就能直接在程序里用.
下面作为实例的DummyTranslator在dl/translator/init.py里被注释掉了, 可以反注释在程序里看结果.

# "dummy translator" is the name showed in the app
@register_translator('dummy translator')
class DummyTranslator(BaseTranslator):

    concate_text = True

    # parameters showed in the config panel. 
    # keys are parameter names, if value type is str, it will be a text editor(required key)
    # if value type is dict, you need to spicify the 'type' of the parameter, 
    # following 'device' is a selector, options a cpu and cuda, default is cpu
    params: Dict = {
        'api_key': '', 
        'device': {
            'type': 'selector',
            'options': ['cpu', 'cuda'],
            'value': 'cpu'
        }
    }

    def _setup_translator(self):
        '''
        do the setup here.  
        keys of lang_map are those languages options showed in the app, 
        assign corresponding language keys accepted by API to supported languages.  
        Only the languages supported by the translator are assigned here, this translator only supports Japanese, and English.
        For a full list of languages see LANGMAP_GLOBAL in translator.__init__
        '''
        self.lang_map['日本語'] = 'ja'
        self.lang_map['English'] = 'en'  
        
    def _translate(self, src_list: List[str]) -> List[str]:
        '''
        do the translation here.  
        This translator do nothing but return the original text.
        '''
        source = self.lang_map[self.lang_source]
        target = self.lang_map[self.lang_target]
        
        translation = text
        return translation

    def updateParam(self, param_key: str, param_content):
        '''
        required only if some state need to be updated immediately after user change the translator params,
        for example, if this translator is a pytorch model, you can convert it to cpu/gpu here.
        '''
        super().updateParam(param_key, param_content)
        if param_key == 'device':
            # get current state from params
            # self.model.to(self.params['device']['value'])
            pass

    @property
    def supported_tgt_list(self) -> List[str]:
        '''
        required only if the translator's language supporting is asymmetric, 
        for example, this translator only supports English -> Japanese, no Japanese -> English.
        '''
        return ['English']

    @property
    def supported_src_list(self) -> List[str]:
        '''
        required only if the translator's language supporting is asymmetric.
        '''
        return ['日本語']

首先这个翻译器必须用register_translator装饰并继承基类BaseTranslator, 装饰器内的参数'dummy translator'是最终在界面里显示的翻译器名字, 注意不要和已有翻译器重名.
这个concate_text留到后面再提, 如果是离线模型或在线api接受字符串表就设成False.

@register_translator('dummy translator')
class DummyTranslator(BaseTranslator):  
    concate_text = True

如果新翻译器需要用户配置参数就仿照下面构造一个名为params的字典, 否则不用管或者赋值为None.
params里的键值是界面里显示的对应参数名, 值可以是str, 下面的api_key在界面里会是一个默认值为空的文本编辑器.
参数值也可以是字典, 但是必须指定类型'type', 指定为'selector'后在界面里显示为选择器, 下面的device是一个选择器, 可以选择cpu和cuda, 默认是cpu.

    params: Dict = {
        'api_key': '', 
        'device': {
            'type': 'selector',
            'options': ['cpu', 'cuda'],
            'value': 'cpu'
        }
    }

上面参数字典在界面设置面板里的显示结果

翻译器需要实现_setup_translator, 这里做初始化. lang_map字典的键值是界面里显示的语言选项, 赋的是API接受的这种语言关键字, 比如谷歌翻译简体中文对应'zh'. 这里只对翻译器支持的语言赋值, 完整的语言列表见translator.__init__里的LANGMAP_GLOBAL.

    def _setup_translator(self):
        self.lang_map['简体中文'] = 'zh'
        self.lang_map['日本語'] = 'ja'
        self.lang_map['English'] = 'en'  

翻译器还需要实现_translate, 下面的lang_source和lang_target是此时界面里选择的语言, 可以通过之前的lang_map获取对应的api关键字, 以拼接api参数并发送请求.
注意如果前面的concate_text设置为False, 这里传入的text会是字符串表, 对应当前翻译页面的每个文本块原文内容, 翻译的输出也应当是一一对应的译文表. 设置为True时传入的text是所有文本块内容拼接成的纯字符串, 输出应当是这个字符串的翻译文本.
每个文本块都发请求太慢了所以拼接后整页一起翻译, concate_text设置后拼/拆是自动的这里不用管, 默认会将'\n###\n'作为分隔符拼接成一整个文本块, 再将译文用'###'分割回文本表. 这种方法对我测试过的多数翻译器管用, 但是有些翻译器会把这些#处理掉, 这时可以禁用concate_text逐个文本块翻译或者实现自己的拼接方法.

    def _translate(self, src_list: List[str]) -> List[str]:
        api_key = self.params['api_key']  # 如此获取用户修改过的api_key
        source = self.lang_map[self.lang_source]
        target = self.lang_map[self.lang_target]
        return text

这个dummy translator什么都不做只返回原文.

如果有必要重新实现updateParam, supported_tgt_list, supported_src_list, 详见这些函数注释.

翻译器实现后建议仿照tests/test_translators.py下的例子写个自己翻译器的测试查看输出是否正确. 测试通过就能在程序里正常使用了.