
transformer #16

Open
wants to merge 10 commits into base: master

Conversation

@mrityunjay-tripathi (Member) commented Jun 4, 2020

Hello everyone! I've implemented the transformer encoder and decoder. Although this PR has other dependencies, I'm opening it now to get some insights and opinions. Things still remaining for this PR:

  • Get the multihead attention PR (mlpack#2375) merged.
  • Implement positional encoding.
  • Add the wiki-dataset2.
  • Add tests for the encoder and decoder.
  • Add documentation for the encoder and decoder.
  • Do something useful with this code.

@mlpack-bot (bot) commented Jun 4, 2020

Thanks for opening your first pull request in this repository! Someone will review it when they have a chance. In the meantime, please make sure you've handled the following things to make the review process quicker and easier:

  • All code should follow the style guide
  • Documentation added for any new functionality
  • Tests added for any new functionality
  • Tests that are added follow the testing guide
  • Headers and license information added to the top of any new code files
  • HISTORY.md updated if the changes are big or user-facing
  • All CI checks should be passing

Thank you again for your contributions! 👍

@kartikdutt18 (Member) commented Jun 4, 2020

Hey @mrityunjay-tripathi, thanks for opening this PR. If possible, could you create a models folder and place the transformer inside it? Since many models will be added to the repo, it would be better to have a models folder rather than a lot of models in the main folder. The path would then be models/models/transformer (see the sketch below).
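
For reference, the requested layout would look roughly like this (the file names are taken from the review threads later in this conversation; transformer.hpp itself is an assumption):

models/                  <- repository root
  models/                <- new folder holding all model implementations
    transformer/
      decoder.hpp
      decoder_impl.hpp
      encoder.hpp
      encoder_impl.hpp
      transformer.hpp
      transformer_impl.hpp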

@mrityunjay-tripathi (Member, Author) commented
Sure @kartikdutt18! Actually I was also thinking about why it was not that way. But no worries. I will make it that way now 👍

@kartikdutt18 (Member) commented Jun 4, 2020

> Sure @kartikdutt18! Actually I was also thinking about why it was not that way. But no worries. I will make it that way now 👍

Awesome. The reason is that it's part of Restructuring - 3. All existing models will be replaced with something similar to what you have implemented, i.e. wrapped in a class, so that users can use pre-trained models.

Review threads (outdated, resolved): models/transformer/decoder.hpp, models/transformer/decoder_impl.hpp
@mlpack-bot (bot) commented Jul 9, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mrityunjay-tripathi force-pushed the transformer_model branch 2 times, most recently from 0c08f69 to 37e2414, on August 22, 2020 03:45
@mrityunjay-tripathi changed the title from "transformer encoder and decoder" to "transformer" on Aug 22, 2020
@mrityunjay-tripathi (Member, Author) commented

@lozhnikov I've made the changes as you suggested. Can you please take a look? It's mostly done and once the required layers are merged we can test this.

@lozhnikov commented

Sure, I'll review the PR today (in the evening).

@lozhnikov left a review

Some comments.

Review threads (mostly outdated, resolved): models/transformer/encoder.hpp, models/transformer/encoder_impl.hpp, models/transformer/decoder_impl.hpp, models/transformer/transformer_impl.hpp
@mrityunjay-tripathi (Member, Author) commented Aug 25, 2020

I was trying to test this locally and I got the following error:

error: matrix multiplication: incompatible matrix dimensions: 0x0 and 16x10
unknown location(0): fatal error: in "FFNModelsTests/TransformerEncoderTest": std::logic_error: matrix multiplication: incompatible matrix dimensions: 0x0 and 16x10

I feel there is some problem with Reset: the weights and biases are not being allocated memory, but I can't find out why.

@lozhnikov commented Aug 25, 2020

> I feel there is some problem with Reset: the weights and biases are not being allocated memory, but I can't find out why.

I'll look into it in the morning.

Upd: I'd use GDB in order to find the actual place where it happens.

@lozhnikov commented Aug 26, 2020

@mrityunjay-tripathi I think I get it. It looks like my comment #16 (comment) was wrong. We need to pass model = true to the constructors (I mean Sequential, Concat, AddMerge). In this case all the visitors will go through their sublayers, and there will be no memory issues since the Delete visitor does the same.
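
For reference, a minimal sketch of the change being described, assuming the mlpack 3.x ann container constructors take a leading model flag (the variable names follow the snippets quoted later in the review, not the PR's exact code):

// Pass model = true so the visitors (Reset, Delete, ...) recurse into the sublayers.
Sequential<>* decoderBlockBottom = new Sequential<>(true /* model */);
Concat<>* encDecAttnInput = new Concat<>(true /* model */);
AddMerge<>* residualAdd2 = new AddMerge<>(true /* model */);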

@mrityunjay-tripathi (Member, Author) commented
> We need to pass model = true to the constructors (I mean Sequential, Concat, AddMerge).

Yeah. Got it. Thanks :)

@lozhnikov commented
@mrityunjay-tripathi Is this PR ready? Can I review this?

@mrityunjay-tripathi (Member, Author) commented
Yes. This is ready for review.

@lozhnikov left a review

Sorry for the slow response. The beginning of the term was hard; now things have settled a bit. I found a tiny flaw in the decoder implementation. I'll suggest the fix in the evening.

Review threads: models/transformer/decoder.hpp (3 comments, 2 outdated, resolved)
Comment on lines 151 to 154
MultiheadAttention<>* mha1 = new MultiheadAttention<>(tgtSeqLen,
                                                      tgtSeqLen,
                                                      dModel,
                                                      numHeads);

Shouldn't the second argument be equal to srcSeqLen?
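
If so, the suggested change would presumably look like this (a sketch based on the question above, assuming the constructor argument order is tgtSeqLen, srcSeqLen, embedDim, numHeads):

MultiheadAttention<>* mha1 = new MultiheadAttention<>(tgtSeqLen,
                                                      srcSeqLen,
                                                      dModel,
                                                      numHeads);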

// This layer concatenates the output of the bottom decoder block (query)
// and the output of the encoder (key, value).
Concat<>* encDecAttnInput = new Concat<>(true);
encDecAttnInput->Add<Subview<>>(1, 0, dModel * tgtSeqLen - 1, 0, -1);

I think this is incorrect: this is the decoder's bottom input, but the encoder-decoder attention block should receive the output of the decoder bottom.

Suggested change
encDecAttnInput->Add<Subview<>>(1, 0, dModel * tgtSeqLen - 1, 0, -1);
encDecAttnInput->Add(decoderBlockBottom);

// Residual connection.
AddMerge<>* residualAdd2 = new AddMerge<>(true);
residualAdd2->Add(encoderDecoderAttention);
residualAdd2->Add(decoderBlockBottom);

You can't pass the same block twice (see the comment to encDecAttnInput). Looks like we need to change the model a bit. I have to go now. I'll come up with the idea in the evening.
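
For context, a minimal illustration (not the PR's actual code) of why the same raw layer pointer can't be added to two owning containers: each container's Delete visitor frees its sublayers, so the shared pointer would be freed twice.

Sequential<>* decoderBlockBottom = new Sequential<>(true);

Concat<>* encDecAttnInput = new Concat<>(true);
encDecAttnInput->Add(decoderBlockBottom);  // first owner

AddMerge<>* residualAdd2 = new AddMerge<>(true);
residualAdd2->Add(decoderBlockBottom);     // second owner: double delete when both containers are destroyed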

@mlpack-bot (bot) commented Nov 9, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

4 participants