
transformer #16

Open
wants to merge 10 commits into base: master

Conversation

@mrityunjay-tripathi (Member) commented Jun 4, 2020

Hello everyone! I've implemented the transformer encoder and decoder. Although this PR has other dependencies, I'm opening it now to get some insights and opinions. Things still remaining for this PR:

  • Get the multihead attention PR (mlpack#2375) merged.
  • Implement positional encoding.
  • Add the wiki-dataset2.
  • Add tests for the encoder and decoder.
  • Add documentation for the encoder and decoder.
  • Do something useful with this code.

@mlpack-bot (bot) commented Jun 4, 2020

Thanks for opening your first pull request in this repository! Someone will review it when they have a chance. In the meantime, please make sure you've handled the following things to make the review process quicker and easier:

  • All code should follow the style guide
  • Documentation added for any new functionality
  • Tests added for any new functionality
  • Tests that are added follow the testing guide
  • Headers and license information added to the top of any new code files
  • HISTORY.md updated if the changes are big or user-facing
  • All CI checks should be passing

Thank you again for your contributions! 👍

@kartikdutt18 (Member) commented Jun 4, 2020

Hey @mrityunjay-tripathi, thanks for opening this PR. If possible, could you create a models folder and place the transformer inside it? Since many models will be added to the repo, it would be better to have a models folder rather than a lot of models in the main folder. The path would then be models/models/transformer (see the sketch below).
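
For reference, the requested layout would look roughly like this (the file names are taken from the review threads later in this conversation; transformer.hpp itself is an assumption):

models/                  <- repository root
  models/                <- new folder holding all model implementations
    transformer/
      decoder.hpp
      decoder_impl.hpp
      encoder.hpp
      encoder_impl.hpp
      transformer.hpp
      transformer_impl.hpp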

@mrityunjay-tripathi (Member, Author) commented
Sure @kartikdutt18! Actually I was also thinking about why it was not that way. But no worries. I will make it that way now 👍

@kartikdutt18 (Member) commented Jun 4, 2020

> Sure @kartikdutt18! Actually I was also thinking about why it was not that way. But no worries. I will make it that way now 👍

Awesome. The reason is that it's part of Restructuring - 3. All existing models will be replaced with something similar to what you have implemented, i.e. wrapped in a class, so that users can use pre-trained models.

Review threads (outdated, resolved): models/transformer/decoder.hpp, models/transformer/decoder_impl.hpp
@mlpack-bot (bot) commented Jul 9, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mrityunjay-tripathi force-pushed the transformer_model branch 2 times, most recently from 0c08f69 to 37e2414, on August 22, 2020 03:45
@mrityunjay-tripathi changed the title from "transformer encoder and decoder" to "transformer" on Aug 22, 2020
@mrityunjay-tripathi (Member, Author) commented

@lozhnikov I've made the changes as you suggested. Can you please take a look? It's mostly done and once the required layers are merged we can test this.

@lozhnikov commented

Sure, I'll review the PR today (in the evening).

@lozhnikov left a review

Some comments.

Review threads (mostly outdated, resolved): models/transformer/encoder.hpp, models/transformer/encoder_impl.hpp, models/transformer/decoder_impl.hpp, models/transformer/transformer_impl.hpp
@mrityunjay-tripathi (Member, Author) commented Aug 25, 2020

I was trying to test this locally and I got the following error:

error: matrix multiplication: incompatible matrix dimensions: 0x0 and 16x10
unknown location(0): fatal error: in "FFNModelsTests/TransformerEncoderTest": std::logic_error: matrix multiplication: incompatible matrix dimensions: 0x0 and 16x10

I feel there is some problem with Reset: the weights and biases are not being allocated memory, but I can't find out why.

@lozhnikov commented Aug 25, 2020

> I feel there is some problem with Reset: the weights and biases are not being allocated memory, but I can't find out why.

I'll look into it in the morning.

Upd: I'd use GDB in order to find the actual place where it happens.

@lozhnikov commented Aug 26, 2020

@mrityunjay-tripathi I think I get it. It looks like my comment #16 (comment) was wrong. We need to pass model = true to the constructors (I mean Sequential, Concat, AddMerge). In this case all the visitors will go through their sublayers, and there will be no memory issues since the Delete visitor does the same.
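
For reference, a minimal sketch of the change being described, assuming the mlpack 3.x ann container constructors take a leading model flag (the variable names follow the snippets quoted later in the review, not the PR's exact code):

// Pass model = true so the visitors (Reset, Delete, ...) recurse into the sublayers.
Sequential<>* decoderBlockBottom = new Sequential<>(true /* model */);
Concat<>* encDecAttnInput = new Concat<>(true /* model */);
AddMerge<>* residualAdd2 = new AddMerge<>(true /* model */);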

@mrityunjay-tripathi (Member, Author) commented
> We need to pass model = true to the constructors (I mean Sequential, Concat, AddMerge).

Yeah. Got it. Thanks :)

@lozhnikov commented
@mrityunjay-tripathi Is this PR ready? Can I review this?

@mrityunjay-tripathi (Member, Author) commented
Yes. This is ready for review.

@lozhnikov left a review

Sorry for the slow response. The beginning of the term was hard; now things have settled a bit. I found a tiny flaw in the decoder implementation. I'll suggest the fix in the evening.

Review threads: models/transformer/decoder.hpp (3 comments, 2 outdated, resolved)
Comment on lines 151 to 154
MultiheadAttention<>* mha1 = new MultiheadAttention<>(tgtSeqLen,
                                                      tgtSeqLen,
                                                      dModel,
                                                      numHeads);

Shouldn't the second argument be equal to srcSeqLen?
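
If so, the suggested change would presumably look like this (a sketch based on the question above, assuming the constructor argument order is tgtSeqLen, srcSeqLen, embedDim, numHeads):

MultiheadAttention<>* mha1 = new MultiheadAttention<>(tgtSeqLen,
                                                      srcSeqLen,
                                                      dModel,
                                                      numHeads);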

// This layer concatenates the output of the bottom decoder block (query)
// and the output of the encoder (key, value).
Concat<>* encDecAttnInput = new Concat<>(true);
encDecAttnInput->Add<Subview<>>(1, 0, dModel * tgtSeqLen - 1, 0, -1);

I think this is incorrect: this is the decoder's bottom input, but the encoder-decoder attention block should receive the output of the decoder bottom.

Suggested change
encDecAttnInput->Add<Subview<>>(1, 0, dModel * tgtSeqLen - 1, 0, -1);
encDecAttnInput->Add(decoderBlockBottom);

// Residual connection.
AddMerge<>* residualAdd2 = new AddMerge<>(true);
residualAdd2->Add(encoderDecoderAttention);
residualAdd2->Add(decoderBlockBottom);

You can't pass the same block twice (see the comment to encDecAttnInput). Looks like we need to change the model a bit. I have to go now. I'll come up with the idea in the evening.
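
For context, a minimal illustration (not the PR's actual code) of why the same raw layer pointer can't be added to two owning containers: each container's Delete visitor frees its sublayers, so the shared pointer would be freed twice.

Sequential<>* decoderBlockBottom = new Sequential<>(true);

Concat<>* encDecAttnInput = new Concat<>(true);
encDecAttnInput->Add(decoderBlockBottom);  // first owner

AddMerge<>* residualAdd2 = new AddMerge<>(true);
residualAdd2->Add(decoderBlockBottom);     // second owner: double delete when both containers are destroyed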

@mlpack-bot (bot) commented Nov 9, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

4 participants