
How to select the 5000/1500 words when building the dictionaries? #24

Closed
fallingstar621 opened this issue Feb 9, 2018 · 6 comments

Comments

@fallingstar621

fallingstar621 commented Feb 9, 2018

Hi, I was wondering how the 5000+ pairs and 1500+ pairs were selected to build the training/testing dictionaries? Since the full dictionary can contain 100K+ pairs, do we just take the most frequent words? I understand the pre-defined dictionary is only used in the first iteration of supervised training, but how much does the initial selection of translation pairs affect the alignment performance? Another question: why select 5000? Would it help to include more translation pairs in the training dictionary? Thanks in advance!

@glample
Contributor

glample commented Feb 10, 2018

Hello,

In the supervised approach, we generated translations for all words from the source language to the target language, and vice versa (a translation being a pair (x, y) associated with the probability that y is the correct translation of x). We then considered all pairs of words (x, y) such that y has a high probability of being a translation of x, and x also has a high probability of being a translation of y. Finally, we sorted all generated translation pairs by frequency of the source word, and took the first 5000 resulting pairs for training and the following 1500 for testing.
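For concreteness, here is a minimal Python sketch of that selection procedure; it is not the actual generation script, and the inputs (forward/backward translation probabilities, source-word frequencies) and the probability threshold are assumptions made for illustration.

```python
# Hypothetical sketch of the selection described above, not the actual script.
# Assumed inputs: p_src2tgt[(x, y)] = P(y | x), p_tgt2src[(y, x)] = P(x | y),
# and src_freq[x] = corpus frequency of the source word x.

def build_train_test_dico(p_src2tgt, p_tgt2src, src_freq,
                          threshold=0.5, n_train=5000, n_test=1500):
    # keep pairs that are confident translations in both directions
    pairs = [
        (x, y) for (x, y), p in p_src2tgt.items()
        if p >= threshold and p_tgt2src.get((y, x), 0.0) >= threshold
    ]
    # sort by source-word frequency, most frequent first
    pairs.sort(key=lambda xy: -src_freq.get(xy[0], 0))
    # first 5000 pairs for training, following 1500 for testing
    return pairs[:n_train], pairs[n_train:n_train + n_test]
```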

The initial pair selection most likely has an impact on the alignment performance, but we did not study this extensively. We did notice that the results in the supervised setting changed depending on how we selected the pairs. In particular, when we selected pairs with very little ambiguity (no multiple possible translations), the translation accuracy was better; but note that the test set was also not the same, and the difference in test pairs alone may be enough to explain the differences.
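As an illustration of the "low ambiguity" variant mentioned above (not necessarily the exact filtering used in the experiments), one could keep only pairs whose source word has a single candidate translation:

```python
from collections import Counter

def keep_unambiguous(pairs):
    # drop every pair whose source word appears with more than one translation
    counts = Counter(x for x, _ in pairs)
    return [(x, y) for x, y in pairs if counts[x] == 1]
```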

Previous work has shown that using more than 5000 pairs of words does not improve performance (Artetxe et al., 2017), and can even be detrimental (see Dinu et al., 2015). This is why we decided to consider only 5000 pairs (and also because we wanted to be consistent with previous work).

@fallingstar621
Author

@glample Thank you for providing more insights! Also, congratulations on the acceptance of the paper!

@glample
Contributor

glample commented Feb 12, 2018

Thank you :)

glample closed this as completed on Feb 12, 2018
@fallingstar621
Author

fallingstar621 commented Feb 15, 2018

@glample Can I ask another question? Why is the pre-defined dictionary only used in the first iteration of supervised training? Can we use the pre-defined dictionary rather than the one built from the embeddings in the following iterations? I tried supervised training for several language pairs. In some cases, I observed that the precision@k metric actually drops over iterations (starting from the second iteration), and the number of generated translation pairs changes as well. Does that mean Procrustes can make the alignment worse? Have you experienced this kind of "convergence" problem in your experiments? Any suggestions on changing the parameters (e.g., number of iterations, dico_threshold, dico_max_rank, etc.)? Thanks in advance!

@glample
Contributor

glample commented Feb 15, 2018

> Can we use the pre-defined dictionary rather than the one built from the embeddings in the following iterations?

Do you mean using the pre-defined dictionary in addition to the dictionary generated by the alignment, or instead of the generated dictionary? Currently we use the generated dictionary for the next iteration and completely discard the pre-defined dictionary. But it is true that you could probably use a combination of both and make the supervised + refinement model even stronger.
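To make that suggestion concrete, here is a rough numpy sketch of a refinement loop where the seed dictionary could be kept at every iteration instead of being discarded. The dictionary-generation step is a simplified mutual-nearest-neighbour stand-in (MUSE actually uses CSLS), and all function names are illustrative, not MUSE's API.

```python
import numpy as np

def procrustes(X, Y):
    # closed-form solution of min_W ||W X - Y||_F with W orthogonal:
    # W = U V^T, where U S V^T is the SVD of Y X^T
    U, _, Vt = np.linalg.svd(Y @ X.T)
    return U @ Vt

def generate_dico(mapped_src, tgt, max_rank=10000):
    # simplified mutual-nearest-neighbour dictionary (MUSE uses CSLS instead)
    a = mapped_src[:max_rank]
    b = tgt[:max_rank]
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sims = a @ b.T
    s2t, t2s = sims.argmax(axis=1), sims.argmax(axis=0)
    return [(i, int(j)) for i, j in enumerate(s2t) if t2s[j] == i]

def refine(src_emb, tgt_emb, seed_pairs, n_iter=5, keep_seed=False):
    pairs = list(seed_pairs)
    W = np.eye(src_emb.shape[1])
    for _ in range(n_iter):
        # columns of X and Y are the paired source/target embeddings
        X = src_emb[[i for i, _ in pairs]].T
        Y = tgt_emb[[j for _, j in pairs]].T
        W = procrustes(X, Y)
        generated = generate_dico(src_emb @ W.T, tgt_emb)
        # the current behaviour discards the seed dictionary here;
        # keep_seed=True would instead combine it with the generated one
        pairs = list(seed_pairs) + generated if keep_seed else generated
    return W
```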

We sometimes observed that the iterations at step t >= 2 were not as good as the initial one, but only for language pairs whose embeddings are difficult to align, like en-ru or en-zh. For pairs of European languages we did not observe anything like this.

@fallingstar621
Author

@glample Thanks for the reply. Again, great insights!
