
Machine Translation Task with DiffuSeq #74

Open
chiral-carbon opened this issue Feb 20, 2024 · 6 comments

Comments

@chiral-carbon

chiral-carbon commented Feb 20, 2024

Hi @summmeer,

I was wondering how I might go about implementing a machine translation task with DiffuSeq. I have trained DiffuSeq for the paraphrase task, but I would like to use it for translation tasks as well.
Would supplying a translation dataset to the existing codebase (since it is designed for seq2seq tasks) suffice, or would further changes be required?

Would appreciate any advice, thanks!
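As a point of reference, one plausible first step is to put the parallel corpus into the same jsonl layout as the released DiffuSeq datasets, i.e. one JSON object per line with "src" and "trg" fields. The helper below is a hypothetical sketch under that assumption; the file names are placeholders, and the field names should be double-checked against the repo's data loaders.

```python
# Hypothetical conversion helper: turn a line-aligned parallel corpus
# (one sentence per line in the source and target files) into jsonl with
# {"src": ..., "trg": ...} per line, mirroring the released QQP-style files.
import json


def to_jsonl(src_path: str, trg_path: str, out_path: str) -> None:
    with open(src_path, encoding="utf-8") as fs, \
         open(trg_path, encoding="utf-8") as ft, \
         open(out_path, "w", encoding="utf-8") as fo:
        for src_line, trg_line in zip(fs, ft):
            pair = {"src": src_line.strip(), "trg": trg_line.strip()}
            fo.write(json.dumps(pair, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # e.g. an IWSLT14 De->En split; paths are placeholders
    to_jsonl("train.de", "train.en", "train.jsonl")
```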

@summmeer
Collaborator

Hi,
You can have a try. But different hyper-parameters may lead to different results, including bsz, steps, dim, seq_len, and the choice of tokenizer. Many follow-up works now achieve better MT performance, and you can refer to their codebases, too.
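To make that advice concrete, here is an illustrative starting grid over the knobs named above. The values are assumptions for a small MT setup such as IWSLT14, not settings from the DiffuSeq paper, and the keys would need to be mapped onto whatever argument names the training script actually uses.

```python
# Illustrative only: candidate sweep over the hyper-parameters mentioned
# above (bsz, steps, dim, seq_len, tokenizer). Values are assumptions,
# not reproductions of any reported configuration.
mt_sweep = {
    "bsz": [1024, 2048],           # effective batch size
    "steps": [40_000, 80_000],     # total training steps
    "dim": [128, 256],             # hidden/embedding dimension
    "seq_len": [64, 128],          # max combined source+target length
    "tokenizer": ["bert-base-uncased", "joint-bpe"],  # subword scheme per language pair
}

# e.g. iterate the grid with itertools.product when scripting runs
```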

@chiral-carbon
Author

Yeah, that makes sense, thanks! Are you referring to works like SeqDiffuSeq, which builds on DiffuSeq directly?

@summmeer
Collaborator

It depends on your goal in using a diffusion model for MT tasks. The follow-up works are not exactly the same as DiffuSeq: SeqDiffuSeq is based on an encoder-decoder architecture, while RDM is based on discrete text diffusion. This work also involves pre-trained MLMs. If you're aiming for performance, you could refer to the SOTA models.

@chiral-carbon
Author

@summmeer thanks, this is very helpful! In the DiNoiSer paper, the authors claim to have surpassed DiffuSeq's performance on the WMT14 EN->DE task, so I wanted to run a similar comparison between DiffuSeq and DiNoiSer on the IWSLT14 task, but DiffuSeq takes a long time to train.
Even for the QQP task reported in the paper, when I tried to replicate the results, training took 6.5 days on 4 A100 GPUs (WandB overview). Do you think additional distributed-training code is needed to train DiffuSeq more efficiently?
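For reference, below is a minimal, generic PyTorch DDP training loop, not DiffuSeq's own launcher, showing the standard one-process-per-GPU pattern that `torchrun --nproc_per_node=4` drives. The tiny linear model and random tensors are placeholders standing in for the diffusion model and MT corpus; it only illustrates the launch/sampler/wrapping mechanics.

```python
# Generic DDP sketch (placeholder model/data), launched with:
#   torchrun --nproc_per_node=4 ddp_sketch.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy stand-ins for the real model and dataset.
    model = DDP(torch.nn.Linear(16, 16).cuda(local_rank), device_ids=[local_rank])
    data = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 16))
    sampler = DistributedSampler(data)
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards across ranks each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = torch.nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```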

Sorry for the trivial question, your replies are really helpful, thanks!

@summmeer
Collaborator

Hi,
Maybe you can try our updated version 2, which is 4x faster in training and 800x faster in sampling on the QQP dataset. [We have updated the information about v2 in the README.md.]

@chiral-carbon
Author

I will, thanks a lot!
