Fix DiverseBeamSearch so that no diversity groups will be dropped. #5069
DiverseBeamSearch results were observed to be insufficiently diverse.
a) In DiverseBeamSearch (search.py), we enforce diversity among the different groups; each group contains 2 x group_beam_size candidates (group_beam_size = total beam_size / num_groups).
b) However, during sequence generation (sequence_generator.py), the final top beam_size tokens are selected across all candidates from all groups. This selection is not aware of the groups used in DiverseBeamSearch.
c) We iterate a) and b) at each step. This can eventually cause all surviving candidates to descend from the same group, and it tends to happen because the original first group receives no diversity penalty and therefore scores highest in fluency.
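The collapse described in b) and c) can be illustrated with a minimal, hypothetical sketch (the names and scores below are made up, not fairseq's actual data structures): when the final top-beam_size selection is taken globally across all groups, the unpenalized first group can crowd out the others entirely.

```python
# Hypothetical per-group candidates: (hypothesis, log-probability score).
# Group 0 receives no diversity penalty, so its scores are highest.
candidates = {
    0: [("the cat sat", -0.1), ("the cat lay", -0.2)],
    1: [("a cat sat", -0.9), ("one cat sat", -1.0)],
    2: [("cats were sitting", -1.1), ("a feline sat", -1.2)],
}
beam_size = 3

# Group-unaware selection (the bug): flatten all groups and take the
# global top beam_size by score. Group 2 is dropped entirely.
flat = [c for group in candidates.values() for c in group]
unaware = sorted(flat, key=lambda t: t[1], reverse=True)[:beam_size]

# Group-aware selection (the fix, sketched): keep the best
# beam_size / num_groups candidates from each group, so every group
# survives to the next step.
aware = [max(group, key=lambda t: t[1]) for group in candidates.values()]
```

Here `unaware` keeps two hypotheses from group 0 and none from group 2, while `aware` retains one hypothesis per group.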
This patch includes two changes: making the final top-beam_size selection in sequence_generator.py group-aware so that no diversity group is dropped, and adding a cumulative diversity function that interpolates with Hamming diversity via diversity_discount (see footnote).
The additional bookkeeping needed for cumulative diversity is estimated to incur a ~5% latency overhead, measured on a BART-base model with batch_size=9 and num_groups=12 on a V100.
Footnotes
Diversity function illustration:
A) I like dogs.
B) I like ____.
C) There are ___.
Assuming each word is a token and we are at step=2, trying to fill in the blank:
Current/Hamming diversity:
Penalty for B from A is 1 for "dogs" and 0 for any other like "cats".
Penalty for C from A is 1 for "dogs" and 0 for any other like "cats".
Cumulative diversity:
Penalty for B from A is 3 for "dogs" and 0 for any other like "cats".
Penalty for C from A is 1 for "dogs" and 0 for any other like "cats".
B and C differ because B matches A on "I" and "like" at their respective steps, incurring 2 additional cumulative penalty.
Using diversity_discount to interpolate between these two:
If diversity_discount = 0.5, then
Penalty for B from A is 1.75 (1 + 0.5 + 0.25) for "dogs" and 0 for any other words like "cats".
Penalty for C from A is 1 for "dogs" and 0 for any other words like "cats".
"I" and "like" matched for B and A at step 0 and 1 respectively. Since "I" is two steps away and "like" is one step away, they are discounted by (0.5)^2 and 0.5 respectively.
When diversity_discount = 0, we recover Hamming diversity, and when diversity_discount = 1, we recover cumulative diversity.
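The worked numbers above can be reproduced with a short sketch (this is an illustration of the penalty definition, not fairseq's implementation; the function name is made up). A match s steps before the current step contributes diversity_discount**s, so discount = 0 recovers Hamming diversity (using 0**0 == 1) and discount = 1 recovers cumulative diversity.

```python
def diversity_penalty(prev_tokens, cand_tokens, discount):
    """Penalty a candidate receives from a previously selected hypothesis.

    Each position where the two sequences match contributes
    discount ** (steps_away_from_current_step).
    """
    cur = len(cand_tokens) - 1  # index of the current step
    penalty = 0.0
    for step, (p, c) in enumerate(zip(prev_tokens, cand_tokens)):
        if p == c:
            penalty += discount ** (cur - step)
    return penalty

# The footnote's example at step=2, one word per token:
A = ["I", "like", "dogs"]
B = ["I", "like", "dogs"]        # B fills its blank with "dogs"
C = ["There", "are", "dogs"]     # C fills its blank with "dogs"

print(diversity_penalty(A, B, 0.0))  # 1.0  -> Hamming
print(diversity_penalty(A, B, 1.0))  # 3.0  -> cumulative
print(diversity_penalty(A, B, 0.5))  # 1.75 = 1 + 0.5 + 0.25
print(diversity_penalty(A, C, 0.5))  # 1.0  (only "dogs" matches)
```

With any other word than "dogs" in the blank, both B and C drop the current-step match and the penalties fall by 1, matching the table above.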