Backtracking in beam search #151

Open
shubhamagarwal92 opened this issue Jun 30, 2018 · 6 comments


@shubhamagarwal92

Compared to OpenNMT, why do we need this block, which handles sequences that were dropped because they saw EOS earlier? (There is no such block in their beam search implementation.) They do something similar by not letting EOS have children here; however, they have an end condition for when EOS is at the top of the beam, and they reconstruct the hypothesis using the get_hyp function.
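
For context, that get_hyp-style reconstruction is essentially backpointer following; a rough paraphrase with generic names, not their exact code:

    # Rough paraphrase of get_hyp-style backtracking (generic names, not
    # OpenNMT's exact code): walk the backpointers from the final step back
    # to the start to recover one hypothesis for beam slot k.
    def get_hyp(prev_ks, next_ys, timestep, k):
        hyp = []
        for j in range(timestep - 1, -1, -1):
            hyp.append(next_ys[j + 1][k])  # token emitted at step j + 1
            k = prev_ks[j][k]              # beam slot it was extended from
        return hyp[::-1]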

More specifically, could you explain in detail what we are doing here:

            #       1. If there is survived sequences in the return variables, replace
            #       the one with the lowest survived sequence score with the new ended
            #       sequences
            #       2. Otherwise, replace the ended sequence with the lowest sequence
            #       score with the new ended sequence

I understand why we need to handle EOS sequences, since their information is kept in the backtracking variables. But why do we need to "replace the one with the lowest survived sequence score with the new ended sequences"? AFAIK, res_k_idx tracks which beam slot (counted from the end) we can write that information into (per the two conditions in the comment above). However, we are not replacing the contents of the beam that emitted EOS, i.e.:

t_predecessors[res_idx] = predecessors[t][idx[0]]
t_predecessors[idx] = ??

I understand that after this process all the beams remain static and we use index_select at each step to select the top beams.
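
My rough mental model of that replacement step (made-up names, not the repo's code) is something like the following; please correct me if this is off:

    # Hypothetical sketch of the replacement idea described in the quoted
    # comment (made-up names, not the repo's code): a newly ended sequence
    # overwrites whichever result slot currently holds the lowest score.
    def store_finished(result_hyps, result_scores, finished_hyp, finished_score):
        worst = min(range(len(result_scores)), key=lambda i: result_scores[i])
        result_hyps[worst] = finished_hyp
        result_scores[worst] = finished_score
        return result_hyps, result_scores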

Also, the unit test for top_k_decoder is not deterministic: it fails when batch_size > 2, and sometimes even when batch_size == 2.
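
A minimal sketch of the kind of seeding that would make such a test reproducible (assuming the test only touches Python's, NumPy's, and PyTorch's RNGs):

    import random
    import numpy as np
    import torch

    def seed_everything(seed=0):
        # Seed every RNG the decoder's sampling path might touch so the
        # test produces the same top-k beams on every run.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)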

@pskrunner14

@shubhamagarwal92 thanks for pointing this out. I'll check their implementation and see what's different. Working on the test to make it more deterministic. Will test more beam sizes too.

@Mehrad0711

Hi,
Are there any updates on this issue?
I've implemented a similar beam search strategy that uses the _backtrack(...) function from this repo, but even with a beam_size of 1, I get worse results than greedy decoding.
It would be really helpful if you could double-check the implementation. Thanks.
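
A sanity check that might help localize the problem (function names below are placeholders, not this repo's API): with beam_size=1, beam search should reproduce greedy decoding exactly.

    # Placeholder names, not this repo's API: with a beam of 1, the beam
    # search output should match greedy decoding token for token.
    greedy_tokens = greedy_decode(model, src_batch)
    beam_tokens = beam_search_decode(model, src_batch, beam_size=1)
    assert greedy_tokens == beam_tokens, "beam_size=1 should reduce to greedy"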

@GZJAS

GZJAS commented Jun 11, 2019

I studied the code these days, and I think you can use torch.repeat_interleave, for example:
hidden = tuple([torch.repeat_interleave(h, self.k, dim=1) for h in encoder_hidden])
inflated_encoder_outputs = torch.repeat_interleave(encoder_outputs, self.k, dim=0)
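
A minimal, self-contained illustration of what this inflation does (the shapes and k here are just for demonstration):

    import torch

    k = 3                                   # beam size (illustrative)
    encoder_outputs = torch.randn(2, 5, 8)  # (batch, seq_len, hidden)
    inflated = torch.repeat_interleave(encoder_outputs, k, dim=0)
    # Each batch element is repeated k times along dim 0: (2, 5, 8) -> (6, 5, 8)
    assert inflated.shape == (2 * k, 5, 8)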

@shubhamagarwal92
Author

@Mehrad0711 maybe you can try integrating beam search from allennlp; their implementation is here.
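
A rough sketch of what wiring in allennlp's BeamSearch tends to look like (the decoder call, ids, and state dict below are placeholders, and exact signatures may differ across allennlp versions):

    import torch
    from allennlp.nn.beam_search import BeamSearch

    beam_search = BeamSearch(end_index=eos_id, max_steps=50, beam_size=5)

    def step(last_predictions, state):
        # last_predictions: (group_size,) token ids from the previous step;
        # state: dict of tensors with a leading group_size dimension.
        logits, state = decoder_step(last_predictions, state)  # placeholder decoder call
        return torch.log_softmax(logits, dim=-1), state

    # start_predictions: (batch_size,) filled with the start-of-sequence id;
    # start_state: dict of per-example tensors (encoder outputs, hidden state, ...).
    top_k_predictions, log_probabilities = beam_search.search(
        start_predictions, start_state, step
    )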

@pskrunner14

Hey @Mehrad0711 @shubhamagarwal92, sorry, I haven't gotten the time to work on this yet. You're welcome to submit a PR :)

@ghost

ghost commented May 7, 2021

Any update on this issue?
