This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

How to make the decoder only use the vocab from the input sequence? #508

Open
wolfshow opened this issue Jan 31, 2018 · 2 comments


wolfshow commented Jan 31, 2018

For example, if the input sequence is "a b c d e", how can I generate an output sequence using only the vocab {a, b, c, d, e}? The target vocab would also need to change dynamically, because each input sequence contains different words.
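To make the request concrete, here is a minimal sketch of a per-sentence target vocabulary implemented as score masking at one decoding step. It is plain Python for illustration, not OpenNMT code: `mask_to_source_vocab`, the toy `vocab`, the scores, and the always-allowed `</s>` token are all assumptions.

```python
import math

def mask_to_source_vocab(scores, vocab, source_tokens, always_allowed=("</s>",)):
    """Return a copy of `scores` where every token outside the source
    sentence (plus a few always-allowed tokens) is set to -inf."""
    allowed = set(source_tokens) | set(always_allowed)
    return [s if tok in allowed else -math.inf for tok, s in zip(vocab, scores)]

# Hypothetical target vocabulary and raw decoder scores for one step.
vocab = ["</s>", "a", "b", "c", "d", "e", "f", "g"]
scores = [0.1, 0.3, 0.2, 0.9, 0.4, 0.0, 1.5, 1.2]

# The allowed set is rebuilt for each input sentence, so it changes dynamically.
masked = mask_to_source_vocab(scores, vocab, "a b c".split())
best = vocab[max(range(len(vocab)), key=masked.__getitem__)]
print(best)
```

Without the mask the highest-scoring token would be "f", which is not in the input. With the mask, the argmax can only land on a source token or the end-of-sentence marker.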

@shahbazsyed
Contributor

You can try passing the same vocab file path for tgt_vocab as for src_vocab in the train.lua command. It would also help if you could describe your use case in more detail.

@jsenellart
Contributor

Yes, can you describe your need in a bit more detail? If you want to implement some hypothesis filtering, the entry point in the code is DecoderAdvancer:filter. You can also look at the lexical_constraints option, but that constrains the decoder to use certain tokens; it will not prevent it from using other tokens.

However, IMHO it is not necessarily a good idea to put a hard constraint on the decoder: if you are training a reordering model, providing enough training examples should efficiently drive your decoder to use only source tokens.
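As a rough illustration of the hypothesis-filtering idea mentioned above, the sketch below drops beam hypotheses that contain any token absent from the source sentence. It is plain Python and does not reproduce the signature or behavior of DecoderAdvancer:filter; `filter_hypotheses` and the (tokens, score) beam format are assumptions.

```python
def filter_hypotheses(hypotheses, source_tokens):
    """Keep only hypotheses whose tokens all appear in the source sentence.

    `hypotheses` is a list of (tokens, score) pairs; this layout is an
    assumption made for the example, not the OpenNMT beam format.
    """
    allowed = set(source_tokens)
    return [
        (tokens, score)
        for tokens, score in hypotheses
        if all(tok in allowed for tok in tokens)
    ]

# Hypothetical beam for the source sentence "a b c".
beam = [
    (["b", "a", "c"], -1.2),
    (["b", "x", "c"], -0.9),  # contains "x", which is not in the source
    (["c", "a"], -2.0),
]
kept = filter_hypotheses(beam, ["a", "b", "c"])
print(kept)
```

Note that a filter like this can leave the beam empty if every hypothesis uses an out-of-source token. That is one practical reason to prefer learning the behavior from training data, as suggested above.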
