
Guided alignments in Sockeye. #1105

Open
tomsbergmanis opened this issue Feb 19, 2024 · 6 comments

@tomsbergmanis

Hi!
We finally have time to reopen work on guided alignments for Sockeye 3.
To recap: guided alignments are handy for formatted document translation, non-translatable entity and placeholder handling, and variations of automatic post-editing. Guided alignments are described in the paper Jointly Learning to Align and Translate with Transformer Models.

Previously, we were advised to start from the metadata branch. Would it still be the best starting point? If so, would bringing it up to date be complicated?

Cheers!
Toms

@mjdenkowski (Contributor)

Hi Toms,

At this point, the metadata branch is somewhat out of sync with main, but it could still be helpful as a reference. One path forward would be to follow how metadata is woven through data preparation and training in the metadata branch and add alignment tracking in similar places in the main branch.

Best,
Michael

@iPRET commented Mar 14, 2024

Hi Michael,
I am the developer at Tilde implementing guided alignments in Sockeye 3. Things are going well, but I have a question: the sockeye.layers.MultiHeadAttention class uses torch.nn.functional.multi_head_attention_forward, which applies dropout to the attention weights after the softmax. That breaks the cross-entropy loss's assumption that its inputs are valid probability distributions, and it makes training a lot worse ༼ つ ◕_◕ ༽つ.
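To make this concrete, here is a minimal plain-PyTorch illustration (not Sockeye code, just the effect we mean): after post-softmax dropout, the attention rows no longer sum to 1, so they are no longer valid inputs for a cross-entropy-style alignment loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy attention scores with shape (target_len, source_len)
scores = torch.randn(4, 6)

probs = F.softmax(scores, dim=-1)                 # each row sums to 1
dropped = F.dropout(probs, p=0.1, training=True)  # post-softmax dropout, as in the fused kernel

print(probs.sum(dim=-1))    # tensor([1., 1., 1., 1.])
print(dropped.sum(dim=-1))  # zeroed entries and 1/(1-p) scaling: rows no longer sum to 1
```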
So, we currently see two options:

  • To reimplement (mostly copy and modify) torch.nn.functional.multi_head_attention_forward
  • To turn off attention dropout for the entire layer used to learn guided alignments

Do you have any preference? Or do you see another way forward?

Thanks,
Ingus Jānis Pretkalniņš

P.S. We were surprised that dropout on the attention weights is applied after the softmax rather than before it, yet post-softmax dropout seems to be the standard in transformer implementations. Do you know why that is?

@mjdenkowski (Contributor)

Hi Ingus,

I'm not familiar with the internals of torch.nn.functional.multi_head_attention_forward. I believe we use it during training because it is faster than our inference implementation (layers.py#L544-L570, layers.py#L655-L678). When we switch between implementations, we need to either interleave or separate the parameters to match what different layers expect (layers.py#L455-L510).

If the inference implementation doesn't have the dropout issue, one option would be to also use that implementation during training when guided alignments are enabled. This may be a shorter path than the reimplementation you mentioned.
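Roughly the shape I have in mind (a sketch only, with assumed names, not the actual Sockeye code): compute attention manually on the guided-alignment path and hand the pre-dropout probabilities to the alignment loss.

```python
import math
import torch
import torch.nn.functional as F

def manual_attention(q, k, v, dropout_p: float = 0.0, training: bool = True):
    """q, k, v: tensors of shape (batch, heads, seq_len, head_dim)."""
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    probs = F.softmax(scores, dim=-1)                          # valid distribution per row
    weights = F.dropout(probs, p=dropout_p, training=training)  # dropout only affects the output path
    out = torch.matmul(weights, v)
    # Return the pre-dropout probabilities; these are what a guided-alignment
    # loss (e.g. cross-entropy against reference alignments) would consume.
    return out, probs
```

You would still want to confirm that the manual path is fast enough for training.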

Best,
Michael

@iPRET commented Apr 15, 2024

Hello Michael,

We're doing some final internal checks on the changes we've made (it's about 1000 lines of changes (づ。◕‿‿◕。)づ), and we'll probably open the pull request very soon.
Apart from the developer requirements (https://awslabs.github.io/sockeye/development.html), are there any graphs/checks/experiments you would like to see before investing time in a code review?

Thanks,
IP

@mjdenkowski (Contributor)

It sounds like you've made a lot of progress toward your goal. If you're primarily making these changes to enable your own work, you could keep them on a fork of Sockeye without the need to go through a full code review.

If you're interested in merging your changes into Sockeye's main branch, you could run additional experiments to verify the following:

  • The feature works at the scale of model it would be used with (according to your measure of success).
  • The changes do not negatively impact baseline training (and inference, if changed). This includes speed, accuracy, and memory usage; a rough way to measure speed and memory is sketched below.
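One possible way to compare a baseline run against a guided-alignment run on step time and peak GPU memory (an illustrative helper, not part of Sockeye):

```python
import time
import torch

def profile_steps(step_fn, n_steps: int = 100):
    """step_fn: a callable that runs one training step; returns timing and memory stats."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return {
        "sec_per_step": elapsed / n_steps,
        "peak_mem_gb": torch.cuda.max_memory_allocated() / 1e9,
    }
```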

@iPRET commented May 15, 2024

Hello Michael,

We've prepared a report reviewing the ups and downs of adding alignment matrices to Sockeye:
Sockeye_Alignment_Matrix_Report-6.pdf

I will open a pull request promptly. ٩(◕‿◕)۶

Thanks,
IP
