How to select the 5000/1500 words when building the dictionaries? #24

Hi, I was wondering how the 5000 training pairs and 1500 test pairs were selected when building the dictionaries. As the full dictionary can contain 100K+ pairs, do we just take the most frequent words? I understand the pre-defined dictionary is only used in the first iteration of supervised training, but how much does the initial selection of translation pairs affect the alignment performance? Another question: why select 5000? Would including more translation pairs in the training dictionary help? Thanks in advance!

Comments
Hello. In the supervised approach, we generated translations for all words from the source language to the target language, and vice versa (a translation being a pair (x, y) associated with the probability of y being the correct translation of x). We then kept all pairs (x, y) such that y has a high probability of being a translation of x, and x also has a high probability of being a translation of y. Finally, we sorted the resulting translation pairs by frequency of the source word, and took the first 5000 pairs for training and the following 1500 for testing.

The initial selection of pairs most likely has an impact on the alignment performance, but we did not study this extensively. We did notice that the results in the supervised setting varied with how the pairs were selected. In particular, when we selected pairs with very little ambiguity (no multiple possible translations), the translation accuracy was better; but note that the test set was not the same either, and the difference in test pairs alone may be enough to explain the gap.

Previous work has shown that using more than 5000 pairs of words does not improve performance (Artetxe et al., 2017) and can even be detrimental (Dinu et al., 2015). This is why we decided to use only 5000 pairs (and also because we wanted to be consistent with previous work).
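For concreteness, here is a minimal sketch of the selection procedure described above. The function names are hypothetical, and the softmax over cosine similarities is an illustrative stand-in for the translation model actually used to score pairs, not the repository's implementation:

```python
import numpy as np

def translation_probs(src_emb, tgt_emb, temperature=0.1):
    # P[i, j]: probability that target word j translates source word i,
    # from a softmax over cosine similarities (an illustrative stand-in
    # for the translation model actually used to score pairs)
    sims = src_emb @ tgt_emb.T          # rows assumed L2-normalized
    e = np.exp(sims / temperature)
    return e / e.sum(axis=1, keepdims=True)

def build_dictionary(src_emb, tgt_emb, p_min=0.5, n_train=5000, n_test=1500):
    # both embedding matrices are assumed sorted by word frequency
    # (row 0 = most frequent word)
    p_s2t = translation_probs(src_emb, tgt_emb)   # source -> target
    p_t2s = translation_probs(tgt_emb, src_emb)   # target -> source
    pairs = []
    for x in range(src_emb.shape[0]):
        y = int(p_s2t[x].argmax())
        # keep (x, y) only if the translation is confident in BOTH directions
        if p_s2t[x, y] > p_min and p_t2s[y, x] > p_min:
            pairs.append((x, y))
    # since rows are frequency-sorted, `pairs` is already ordered by
    # source-word frequency: first 5000 for training, next 1500 for testing
    return pairs[:n_train], pairs[n_train:n_train + n_test]
```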
@glample thank you for providing more insights! Also, congratulations on the acceptance of the paper!
Thank you :)
@glample Can I ask another question? Why is the pre-defined dictionary only used in the first iteration of supervised training? Can we use the pre-defined dictionary, rather than the one built from the embeddings, in the following iterations? I tried supervised training for several language pairs, and in some cases I observed that the precision@k metric actually drops over iterations (starting from the second iteration); in particular, the number of generated translation pairs decreases. Does that mean Procrustes can make the alignment worse? Have you experienced this kind of "convergence" problem in your experiments? Any suggestions on changing the parameters (e.g., number of iterations, dico_threshold, dico_max_rank, etc.)? Thanks in advance!
We sometimes observed that the iterations at step t >= 2 were not as good as the initial one, but only for language pairs whose embeddings are difficult to align, like en-ru or en-zh. For pairs of European languages we did not observe anything like this.
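For reference, the refinement loop being discussed is roughly the following. This is a minimal sketch under simplifying assumptions: embeddings are L2-normalized, the dictionary is rebuilt with plain mutual nearest neighbors rather than the CSLS criterion used in the paper, and all names (`refine`, `mutual_nn`, `max_rank`) are hypothetical:

```python
import numpy as np

def procrustes(src, tgt):
    # closed-form orthogonal Procrustes: W = argmin ||src @ W - tgt||_F
    # over orthogonal W, obtained from the SVD of src^T tgt
    U, _, Vt = np.linalg.svd(src.T @ tgt)
    return U @ Vt

def mutual_nn(x_aligned, z, max_rank=10000):
    # index pairs (i, j) that are each other's nearest neighbor, restricted
    # to the max_rank most frequent words (the role dico_max_rank plays above)
    sims = x_aligned[:max_rank] @ z[:max_rank].T
    s2t = sims.argmax(axis=1)             # best target for each source word
    t2s = sims.argmax(axis=0)             # best source for each target word
    return [(i, int(j)) for i, j in enumerate(s2t) if t2s[j] == i]

def refine(X, Z, pairs, n_iter=5):
    # X, Z: (vocab, dim) L2-normalized source / target embeddings;
    # `pairs` is the initial training dictionary of (src, tgt) indices
    W = np.eye(X.shape[1])
    for it in range(n_iter):
        src = X[[i for i, _ in pairs]]
        tgt = Z[[j for _, j in pairs]]
        W = procrustes(src, tgt)          # re-fit the mapping on the dictionary
        # rebuild the dictionary from the newly aligned spaces; this is the
        # step where precision can degrade if the generated pairs are noisy
        pairs = mutual_nn(X @ W, Z)
    return W
```

Lowering `max_rank` (the analogue of `dico_max_rank` mentioned above) or raising the similarity threshold trades dictionary size for pair quality, which is one thing to try when precision@k degrades across iterations.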
@glample Thanks for the reply. Again, great insights!