
Machine Translation Task with DiffuSeq #74

Open
chiral-carbon opened this issue Feb 20, 2024 · 6 comments

Comments

@chiral-carbon

chiral-carbon commented Feb 20, 2024

Hi @summmeer,

I was wondering how I might go about implementing a machine translation task with DiffuSeq. I have trained DiffuSeq for the paraphrase task, but I would like to use it for translation tasks as well.
Would supplying a translation dataset to the existing codebase (since it is designed for seq2seq tasks) suffice, or would further changes be required?

Would appreciate any advice, thanks!
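As a point of reference, one plausible first step is to put the parallel corpus into the same jsonl layout as the released DiffuSeq datasets, i.e. one JSON object per line with "src" and "trg" fields. The helper below is a hypothetical sketch under that assumption; the file names are placeholders, and the field names should be double-checked against the repo's data loaders.

```python
# Hypothetical conversion helper: turn a line-aligned parallel corpus
# (one sentence per line in the source and target files) into jsonl with
# {"src": ..., "trg": ...} per line, mirroring the released QQP-style files.
import json


def to_jsonl(src_path: str, trg_path: str, out_path: str) -> None:
    with open(src_path, encoding="utf-8") as fs, \
         open(trg_path, encoding="utf-8") as ft, \
         open(out_path, "w", encoding="utf-8") as fo:
        for src_line, trg_line in zip(fs, ft):
            pair = {"src": src_line.strip(), "trg": trg_line.strip()}
            fo.write(json.dumps(pair, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # e.g. an IWSLT14 De->En split; paths are placeholders
    to_jsonl("train.de", "train.en", "train.jsonl")
```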

@summmeer
Collaborator

Hi,
You can have a try. But different hyper-parameters may lead to different results, including bsz, steps, dim, seq_len, and the choice of tokenizer. Many follow-up works now achieve better MT performance, and you can refer to their codebases, too.
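To make that advice concrete, here is an illustrative starting grid over the knobs named above. The values are assumptions for a small MT setup such as IWSLT14, not settings from the DiffuSeq paper, and the keys would need to be mapped onto whatever argument names the training script actually uses.

```python
# Illustrative only: candidate sweep over the hyper-parameters mentioned
# above (bsz, steps, dim, seq_len, tokenizer). Values are assumptions,
# not reproductions of any reported configuration.
mt_sweep = {
    "bsz": [1024, 2048],           # effective batch size
    "steps": [40_000, 80_000],     # total training steps
    "dim": [128, 256],             # hidden/embedding dimension
    "seq_len": [64, 128],          # max combined source+target length
    "tokenizer": ["bert-base-uncased", "joint-bpe"],  # subword scheme per language pair
}

# e.g. iterate the grid with itertools.product when scripting runs
```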

@chiral-carbon
Author

Yeah, that makes sense, thanks! Are you referring to works like SeqDiffuSeq, which builds on DiffuSeq directly?

@summmeer
Collaborator

It depends on your goal in using a diffusion model for MT tasks. The follow-up works are not exactly the same as DiffuSeq: SeqDiffuSeq is based on an encoder-decoder architecture, while RDM is based on discrete text diffusion. This work also involves pre-trained MLMs. If you're aiming for performance, you could refer to the SOTA models.

@chiral-carbon
Author

@summmeer thanks, this is very helpful! In the DiNoiSer paper, the authors claim to have surpassed DiffuSeq's performance on the WMT14 EN->DE task, so I wanted to run a similar comparison between DiffuSeq and DiNoiSer on the IWSLT14 task, but DiffuSeq takes a long time to train.
Even for the QQP task reported in the paper, when I tried to replicate the results, training took 6.5 days on 4 A100 GPUs (WandB overview). Do you think additional distributed-training code is needed to train DiffuSeq more efficiently?
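For reference, below is a minimal, generic PyTorch DDP training loop, not DiffuSeq's own launcher, showing the standard one-process-per-GPU pattern that `torchrun --nproc_per_node=4` drives. The tiny linear model and random tensors are placeholders standing in for the diffusion model and MT corpus; it only illustrates the launch/sampler/wrapping mechanics.

```python
# Generic DDP sketch (placeholder model/data), launched with:
#   torchrun --nproc_per_node=4 ddp_sketch.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy stand-ins for the real model and dataset.
    model = DDP(torch.nn.Linear(16, 16).cuda(local_rank), device_ids=[local_rank])
    data = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 16))
    sampler = DistributedSampler(data)
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards across ranks each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = torch.nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```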

Sorry for the trivial question, your replies are really helpful, thanks!

@summmeer
Collaborator

Hi,
Maybe you can try our updated version 2, which is 4x faster in training and 800x faster in sampling on the QQP dataset. [We have updated the information about v2 in the README.md.]

@chiral-carbon
Author

I will, thanks a lot!
