"Competitive Example" (2.5): Sequence-to-sequence #1391

Open. Wants to merge 14 commits into base branch master.
Conversation

@gpengzhi (Contributor) commented Jun 5, 2018

Here is a comparison between DyNet and PyTorch on the seq2seq translator example used in the PyTorch tutorial (https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html).

@gpengzhi (Contributor, Author) commented Jun 6, 2018

I think I found a bug in my code. I will fix it and submit a commit very soon.

@neubig (Contributor) commented Jun 15, 2018

Thanks! Is this fixed and ready for review?

@gpengzhi (Contributor, Author):

@neubig Yes. The bug is fixed.

@neubig (Contributor) left a comment

First, thanks for contributing! I added a few comments.

Also, there are ".DS_Store" files that need to be removed.


## Usage (DyNet)

The architecture of the DyNet model `seq2seq_dynet.py` is the same as that in the [PyTorch example](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html). Here we also implement the attention mechanism in the model.
Review comment (Contributor):
dynet -> Dynet

Reply (Author):
Modified.


The architecture of the DyNet model is shown as follows.

```python
# (model code elided in this excerpt)
```
Review comment (Contributor):
I don't think this should be included in README.md, because if we make changes to the actual code they might get out of sync. Let's keep the code in .py files.

Reply (Author):
Deleted the architecture part in README.md.


Then, run the training:

`<pre>`
Review comment (Contributor):
Instead of using `<pre>` tags, can you just use a 4-space indent?

Reply (Author):
Modified.


| Time (DyNet) | Iteration (DyNet) | Loss (DyNet) | Time (PyTorch) | Iteration (PyTorch) | Loss (PyTorch) |
| --- | --- | --- | --- | --- | --- |
| 0m 28s | 5000 5% | 3.2687 | 1m 30s | 5000 5% | 2.8794 |
Review comment (Contributor):
The losses of DyNet and PyTorch are quite different, even at the start. Maybe this is due to different initialization of the parameters? Could you try to match the parameter initialization strategies between the two toolkits and see if they come more in line? Since PyTorch's numbers are better, you could try matching PyTorch's initialization strategy in DyNet.

Reply (Author):

3.2687 and 2.8794 are actually the losses after 5000 iterations for DyNet and PyTorch respectively, not the initial losses. The initial loss of DyNet is 7.9808, and the initial loss of PyTorch is 7.9615.
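One way to act on the initialization suggestion above would be to sample DyNet's weights from the same distribution PyTorch uses. This is only a sketch under an assumption stated in the code: PyTorch initializes GRU and Linear weights from U(-k, k) with k = 1/sqrt(hidden_size), and the helper below (the name `pytorch_style_uniform` is hypothetical) generates values with that distribution.

```python
import math
import random

def pytorch_style_uniform(n_rows, n_cols, hidden_size, seed=0):
    """Sample an (n_rows x n_cols) weight matrix the way PyTorch
    initializes GRU/Linear layers: U(-k, k) with k = 1/sqrt(hidden_size).
    (Assumption about PyTorch's default; not taken from this PR.)"""
    k = 1.0 / math.sqrt(hidden_size)
    rng = random.Random(seed)
    return [[rng.uniform(-k, k) for _ in range(n_cols)] for _ in range(n_rows)]

# For hidden_size=256, every entry lies in [-1/16, 1/16].
w = pytorch_style_uniform(4, 4, hidden_size=256)
```

In the DyNet example these values could then be loaded into a parameter (e.g. with `set_value`, depending on the DyNet version), or, equivalently, DyNet's built-in uniform initializer could be constructed with the same bound `k`.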

@pmichel31415 (Collaborator) commented:

Hi @gpengzhi, thanks for the PR! Do you think you'll have time to fix the lint errors?

@gpengzhi (Contributor, Author) commented Sep 7, 2018

Hi @pmichel31415, I have already fixed the lint errors.

@pmichel31415 (Collaborator) left a comment

Thanks for the great contribution @gpengzhi! I have a few comments here and there; can you take a look? The main points of contention are:

  • Big discrepancy in loss between DyNet/PyTorch: can you check that the hyperparameters are as close as possible in both cases (e.g. the learning rate is different)?
  • Add docstrings so that people can navigate the code more easily if they want to reuse it.
  • Change function names to snake_case.

| 8m 12s | 85000 85% | 0.7621 | 21m 44s | 85000 85% | 0.5210 |
| 8m 41s | 90000 90% | 0.7453 | 22m 55s | 90000 90% | 0.5054 |
| 9m 10s | 95000 95% | 0.6795 | 24m 9s | 95000 95% | 0.4417 |
| 9m 39s | 100000 100% | 0.6442 | 25m 24s | 100000 100% | 0.4297 |
Review comment (Collaborator):
This big difference is still weird to me. Can we make these numbers similar so that DyNet is competitive with PyTorch?

For instance, I've noticed that there are some differences in the optimization procedure (DyNet uses learning rate 0.2 while PyTorch uses 0.01, if I'm not mistaken).

Also do you have any intuition on why there is such a big speed difference? Do you think it is due to your setup or the differences in implementation?
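To make the learning-rate point concrete: both trainers ultimately perform the same plain SGD update, so with identical gradients a rate of 0.2 takes steps 20 times larger than 0.01, which alone can explain diverging loss curves. A minimal, framework-free sketch (the constants 0.2 and 0.01 come from the comment above; the parameter and gradient values are made up for illustration):

```python
def sgd_step(param, grad, lr):
    # the vanilla SGD update that both a simple DyNet SGD trainer
    # and torch.optim.SGD perform: param <- param - lr * grad
    return param - lr * grad

# same parameter and gradient, different learning rates
after_dynet = sgd_step(1.0, 0.5, lr=0.2)     # 1.0 - 0.2 * 0.5  = 0.9
after_pytorch = sgd_step(1.0, 0.5, lr=0.01)  # 1.0 - 0.01 * 0.5 = 0.995
```

Aligning the two examples would mean constructing the DyNet trainer with the same learning rate as the PyTorch optimizer (the exact keyword argument depends on the DyNet version).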


Here is the comparison between Dynet and PyTorch on the [seq2seq translator example](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html).

The data we use is a set of many thousands of English-to-French translation pairs. Download the data from [here](https://download.pytorch.org/tutorial/data.zip) and extract it to the current directory.
Review comment (Collaborator):
Can you credit the original source of the data, as in the PyTorch tutorial? https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#loading-data-files


Format:

`<pre>`
Review comment (Collaborator):
`<pre>` -> fenced code block (```)


> vous etes tellement belle dans cette robe !
= you re so beautiful in that dress .
< you re so beautiful in that dress . <EOS>
Review comment (Collaborator):
Why not use the same examples for PyTorch and DyNet, for comparison?

EOS_token = 1


class Lang(object):
Review comment (Collaborator):
Can you add a small docstring for each class/method/function, so that it is easier for people who want to reuse the code?
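As an illustration of this request, the `Lang` class from the context above could be documented like this. The method bodies are a sketch following the PyTorch tutorial's vocabulary class, not the exact code in the PR:

```python
class Lang(object):
    """Vocabulary for one language: maps words to integer indices and back."""

    def __init__(self, name):
        """Create an empty vocabulary for `name`, reserving 0 for SOS and 1 for EOS."""
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2

    def addSentence(self, sentence):
        """Add every whitespace-separated word of `sentence` to the vocabulary."""
        for word in sentence.split(" "):
            self.addWord(word)

    def addWord(self, word):
        """Register `word`, assigning it a fresh index on first occurrence."""
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1
```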

        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2

    def addSentence(self, sentence):
Review comment (Collaborator):
Can you use snake_case for function/method/variable names (but not class names), to be consistent with the DyNet API? You should probably do it for the PyTorch example as well.
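The rename being asked for is mechanical. For the method shown in the context above it would look like this (a deliberately minimal class, just to show the convention; the `was:` comments mark the old camelCase names):

```python
class Lang(object):
    def __init__(self):
        self.words = []

    # was: def addSentence(self, sentence):
    def add_sentence(self, sentence):
        for word in sentence.split(" "):
            self.add_word(word)

    # was: def addWord(self, word):
    def add_word(self, word):
        self.words.append(word)
```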

EOS_token = 1


class Lang:
Review comment (Collaborator):
Can you put this class and all the other utilities that are not dependent on the framework in a separate file like utils.py, to avoid code duplication?
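A sketch of the suggested split: framework-independent pieces (the token constants, `Lang`, data loading) move into one shared module that both scripts import. The file layout and the trimmed class body below are a suggestion, not taken from the PR:

```python
# utils.py -- framework-agnostic helpers shared by seq2seq_dynet.py and
# seq2seq_pytorch.py (hypothetical layout, not from this PR)

SOS_token = 0
EOS_token = 1


class Lang(object):
    """Vocabulary shared by both the DyNet and the PyTorch example."""

    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2

# each example script would then start with:
#   from utils import Lang, SOS_token, EOS_token
```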

import time
import math
import random

import dynet as dy

r = random.SystemRandom()
Review comment (Collaborator):
Can you fix the random seed (for reproducibility)? Please also fix the DyNet/PyTorch random seeds as well.
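A sketch of what fixing the seeds could look like. The seed value 42 is an arbitrary choice, and the framework-level mechanisms in the comments are assumptions that depend on the DyNet/PyTorch versions in use:

```python
import random

SEED = 42  # arbitrary fixed value; the point is reproducibility

# Python-level randomness: random.SystemRandom() cannot be seeded,
# so swap it for a seeded generator with the same interface.
r = random.Random(SEED)

# Framework-level seeds (sketch; exact mechanism is version-dependent):
#   DyNet:   pass --dynet-seed 42 on the command line
#   PyTorch: torch.manual_seed(SEED)
```

With this change, `r.choice(pairs)` and the sampled training examples become reproducible across runs.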


for _ in range(n):
    pair = r.choice(pairs)
    print('>', pair[0])
Review comment (Collaborator):
This is a very nitpicky comment, but can you use " or ' consistently everywhere for strings? A simple search-and-replace should fix it. I'm partial to ".

3 participants