How to make the decoder only use the vocab from the input sequence? #508

wolfshow · 2018-01-31T15:49:20Z

For example, if the input sequence is "a b c d e", how can I generate an output sequence using only the vocabulary {a, b, c, d, e}? The target vocabulary would also have to change dynamically, because each input sequence contains different words.
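A minimal sketch of the hard constraint being asked about, written in Python rather than OpenNMT's Lua code. At every decoding step, the scores of all target ids that do not occur in the current source sentence are set to -inf before the next token is chosen. Because the mask is rebuilt for each input, the allowed vocabulary changes per sentence. It assumes source and target share the same token ids. The names step_fn, toy_step and the helper functions are hypothetical and are not part of OpenNMT.

```python
import math
from typing import Callable, List, Sequence

NEG_INF = -math.inf


def source_vocab_mask(src_ids: Sequence[int], vocab_size: int,
                      always_allowed: Sequence[int] = ()) -> List[bool]:
    """True for every target id that occurs in this source sentence (plus e.g. EOS)."""
    allowed = set(src_ids) | set(always_allowed)
    return [i in allowed for i in range(vocab_size)]


def apply_mask(scores: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Give every disallowed token a score of -inf so it can never be selected."""
    return [s if ok else NEG_INF for s, ok in zip(scores, mask)]


def constrained_greedy_decode(step_fn: Callable[[List[int]], Sequence[float]],
                              src_ids: Sequence[int], vocab_size: int,
                              eos_id: int, max_len: int = 50) -> List[int]:
    """Greedy decoding restricted to the source sentence's own vocabulary."""
    # The mask is rebuilt for each input, so the allowed vocabulary changes
    # dynamically from one sentence to the next.
    mask = source_vocab_mask(src_ids, vocab_size, always_allowed=[eos_id])
    output: List[int] = []
    for _ in range(max_len):
        scores = apply_mask(step_fn(output), mask)
        best = max(range(vocab_size), key=lambda i: scores[i])
        if best == eos_id:
            break
        output.append(best)
    return output


# Hypothetical toy "model": it always prefers id 9, which is not in the source,
# and otherwise emits 7, 3, 5 and then EOS (id 0).
def toy_step(prefix: List[int]) -> List[float]:
    scores = [0.0] * 10
    scores[9] = 10.0
    if len(prefix) < 3:
        scores[[7, 3, 5][len(prefix)]] = 5.0
    else:
        scores[0] = 5.0
    return scores


print(constrained_greedy_decode(toy_step, src_ids=[3, 5, 7], vocab_size=10, eos_id=0))
# prints [7, 3, 5]
```

Without the mask the toy model would emit id 9 at every step; with it, only ids from the source (and EOS) can ever be produced.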

shahbazsyed · 2018-01-31T17:02:12Z

You can try passing the same vocab file path to tgt_vocab as to src_vocab in the train.lua command. It would help if you described your use case in more detail.

jsenellart · 2018-02-01T08:31:45Z

Yes, can you describe your need in a bit more detail? If you want to implement hypothesis filtering, the entry point in the code is DecoderAdvancer:filter. You can also look at the lexical_constraints option, but it constrains the decoder to use certain tokens; it will not prevent it from using other tokens.
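Below is a small Python sketch of what hypothesis filtering means in this case: any hypothesis containing a token absent from the source sentence is discarded before the best one is picked. It is not OpenNMT's DecoderAdvancer:filter API, whose signature is not shown here; the names filter_hypotheses and best_allowed and the "</s>" end token are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (tokens, cumulative log-probability) for one beam hypothesis
Hypothesis = Tuple[List[str], float]


def filter_hypotheses(hyps: Sequence[Hypothesis], source_tokens: Sequence[str],
                      special: Sequence[str] = ("</s>",)) -> List[Hypothesis]:
    """Keep only hypotheses whose tokens all occur in the source (or are special)."""
    allowed = set(source_tokens) | set(special)
    return [(toks, score) for toks, score in hyps
            if all(t in allowed for t in toks)]


def best_allowed(hyps: Sequence[Hypothesis],
                 source_tokens: Sequence[str]) -> Optional[Hypothesis]:
    """Highest-scoring surviving hypothesis, or None if the filter removed them all."""
    kept = filter_hypotheses(hyps, source_tokens)
    return max(kept, key=lambda h: h[1]) if kept else None


beam = [(["a", "x", "c"], -0.5),   # uses "x", which is not in the source
        (["c", "a"], -0.9),
        (["b", "</s>"], -1.2)]
print(best_allowed(beam, "a b c d e".split()))
# prints (['c', 'a'], -0.9)
```

Unlike masking scores at each step, filtering finished hypotheses after the fact can leave nothing at all if the model strongly prefers out-of-source tokens, which is one practical drawback of a hard constraint.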

However, IMHO it is not necessarily a good idea to put a hard constraint in the decoder: if you are training a reordering model, providing enough training examples should effectively drive your decoder to use only source tokens.
