Unicode Character training issue #40

rahat10120141 · 2022-07-29T04:56:06Z

I tried to train my model to translate English to Bengali. After training, when I run the code, the output is not in Unicode Bengali characters.

I Eat Rice (eng)=> আমি ভাত খাই (Bn)

This is the type of data fed to the model during training. After training completed, I tested the model with the input "I Eat Rice", expecting "আমি ভাত খাই" as output. Instead, the model gave me "Ich esse Reis." I don't know what language this is. It's not related to Bengali.

rahat10120141 · 2022-07-29T04:57:22Z

I checked the output. It is in German. But why is it in German?

rahat10120141 · 2022-07-29T05:07:13Z

    import pandas as pd
    from simplet5 import SimpleT5
    from sklearn.model_selection import train_test_split

    model = SimpleT5()
    model.from_pretrained(model_type="t5", model_name="t5-base")
    path = "D:\\Python\\Quilbot\\Dataset\\translation.csv"
    df = pd.read_csv(path, encoding='utf8', quotechar="'")
    # df.apply(lambda x: pd.lib.infer_dtype(x.values))
    # print(df)
    # simpleT5 expects the columns to be named "source_text" and "target_text"
    df = df.rename(columns={"headlines": "source_text", "text": "target_text"})
    df = df[['source_text', 'target_text']]
    # T5 expects a task prefix: since this is a translation task, the prefix "tn2bn: " is added
    df['source_text'] = "tn2bn: " + df['source_text']
    print(df)
    train_df, test_df = train_test_split(df, test_size=0.2)
    print(train_df.shape, test_df.shape)
    model.train(train_df=train_df,
                eval_df=test_df,
                source_max_token_len=128,
                target_max_token_len=50,
                batch_size=8,
                max_epochs=3,
                use_gpu=False
                )
    # Load the trained checkpoint for inference
    model.load_model("t5", "outputs/translate", use_gpu=False)

    # Note: this prefix ("translate: ") differs from the training prefix ("tn2bn: ")
    text_to_translate = "translate: I eat rice."
    print(model.predict(text_to_translate))

rahat10120141 · 2022-07-29T05:10:08Z

I have tested it with the task prefix: "tn2bn"
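
For reference, one way to keep the prefix identical between training and prediction is to define it once and reuse it. This is only a minimal sketch: `TASK_PREFIX` and `add_prefix` are hypothetical names, not part of simpleT5.

```python
# Hypothetical constant: the same prefix that was prepended to source_text for training.
TASK_PREFIX = "tn2bn: "


def add_prefix(text: str, prefix: str = TASK_PREFIX) -> str:
    """Prepend the task prefix unless the text already starts with it."""
    return text if text.startswith(prefix) else prefix + text


# Build the prediction input with the same prefix used for training.
query = add_prefix("I eat rice.")
print(query)
```

The resulting string would then be passed to `model.predict(...)` in place of a hand-typed prefix.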

Shivanandroy · 2022-07-29T11:05:57Z

@rahat10120141: What does your train_df look like before you feed it to the model?
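
A quick way to answer that is to check whether the target column really contains Bengali script after the CSV is loaded. The following is a rough sketch in plain Python: the helper names and the sample rows are hypothetical stand-ins for the real `train_df`, and the check relies only on the Bengali Unicode block (U+0980 to U+09FF).

```python
# Bengali Unicode block: U+0980 to U+09FF.
BENGALI_START, BENGALI_END = 0x0980, 0x09FF


def contains_bengali(text: str) -> bool:
    """Return True if any character of text lies in the Bengali Unicode block."""
    return any(BENGALI_START <= ord(ch) <= BENGALI_END for ch in text)


def rows_missing_bengali(rows):
    """Return the (source, target) pairs whose target has no Bengali characters."""
    return [(src, tgt) for src, tgt in rows if not contains_bengali(tgt)]


# Hypothetical sample rows standing in for train_df[['source_text', 'target_text']].
sample_rows = [
    ("tn2bn: I eat rice.", "আমি ভাত খাই"),
    ("tn2bn: I eat rice.", "Ich esse Reis."),
]

# Any row listed here has a target without Bengali script and is worth inspecting.
bad_rows = rows_missing_bengali(sample_rows)
print(bad_rows)
```

With the real data, the same check could be applied as `train_df['target_text'].map(contains_bengali).all()`, assuming the column holds strings.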

rahat10120141 · 2022-07-31T16:53:51Z

T5 doesn't have an English-to-Bengali translation task. From the beginning, it was giving me German results.
