Fix DiverseBeamSearch so that no diversity groups will be dropped. #5069

Merged (1 commit, Apr 12, 2023)

Conversation

@shuminghu (Contributor) commented Apr 10, 2023

DiverseBeamSearch results have been observed to lack diversity. Why:
a) In DiverseBeamSearch (search.py), we try to enforce distance among different groups, and each group contains 2 x group_beam_size candidates (group_beam_size = total beam_size / num_groups).
b) However, during sequence generation (sequence_generator.py), we select the final top beam_size tokens among all candidates from all groups. This selection is not aware of the groups used in DiverseBeamSearch.
c) We iterate a) and b) at each step, which can eventually make all groups converge to descendants of candidates from the same group. This tends to happen because the original first group is the one that receives no diversity penalty and therefore scores highest in fluency (see the sketch below).
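A minimal sketch of this failure mode (illustrative toy scores and PyTorch code, not fairseq's actual sequence_generator.py logic): the final selection flattens all groups' candidates and takes a global top-beam_size, so nothing guarantees every group survives.

```python
import torch

beam_size, num_groups = 6, 3
group_beam_size = beam_size // num_groups  # 2

# Each group proposes 2 * group_beam_size candidates; group 0 pays no
# diversity penalty, so its scores dominate (hypothetical numbers).
scores = torch.tensor([
    [9.0, 8.5, 8.2, 8.0],  # group 0 (unpenalized, highest fluency)
    [7.0, 6.5, 6.2, 6.0],  # group 1
    [5.0, 4.5, 4.2, 4.0],  # group 2
])

# Group-unaware selection: flatten and take the global top-beam_size.
top_idx = scores.flatten().topk(beam_size).indices
print((top_idx // scores.size(1)).unique())  # tensor([0, 1]) -- group 2 is dropped
```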

This patch includes two changes:

  1. Switches to cumulative diversity, and offers a way to interpolate between cumulative diversity and Hamming diversity through diversity_discount (illustrated in footnote 1).
  2. Limits the number of search candidates as a workaround for the candidate selection bug (dropping diversity groups); see the sketch after the latency note below.

The additional bookkeeping needed for cumulative diversity incurs an estimated 5% latency overhead, measured on a BART-base model with batch_size=9 and num_groups=12 on a V100.
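As a sketch of change 2 (same toy scores as above; this is my illustration of the idea, not the patch's exact code): if the search emits only group_beam_size candidates per group, the group-unaware final selection receives exactly beam_size candidates, so no group can be starved.

```python
import torch

beam_size, num_groups = 6, 3
group_beam_size = beam_size // num_groups  # 2

scores = torch.tensor([
    [9.0, 8.5, 8.2, 8.0],  # group 0 (unpenalized)
    [7.0, 6.5, 6.2, 6.0],  # group 1
    [5.0, 4.5, 4.2, 4.0],  # group 2
])

# Cap each group's output at group_beam_size before handing candidates to
# the final top-beam_size selection: every group survives by construction.
per_group = scores.topk(group_beam_size, dim=1).values  # shape (3, 2)
final = per_group.flatten().topk(beam_size).values
print(final)  # tensor([9.0, 8.5, 7.0, 6.5, 5.0, 4.5]) -- all groups kept
```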

Footnotes

  1. Diversity function illustration:
    A) I like dogs.
    B) I like ____.
    C) There are ___.
    Assuming each word is a token and we are at step=2, trying to fill in the blank:
    Current/Hamming diversity:
    Penalty for B from A is 1 for "dogs" and 0 for any other token like "cats".
    Penalty for C from A is 1 for "dogs" and 0 for any other token like "cats".
    Cumulative diversity:
    Penalty for B from A is 3 for "dogs" and 0 for any other token like "cats".
    Penalty for C from A is 1 for "dogs" and 0 for any other token like "cats".
    B and C differ because B also matches A on "I" and "like" at their respective steps, incurring 2 extra cumulative penalty.
    Using diversity_discount to interpolate between these two:
    If diversity_discount = 0.5, then
    Penalty for B from A is 1.75 (1 + 0.5 + 0.25) for "dogs" and 0 for any other word like "cats".
    Penalty for C from A is 1 for "dogs" and 0 for any other word like "cats".
    "I" and "like" matched for B and A at steps 0 and 1 respectively. Since "I" is two steps away and "like" is one step away, they are discounted by (0.5)^2 and 0.5 respectively.
    When diversity_discount = 0, we recover Hamming diversity; when diversity_discount = 1, we recover cumulative diversity. (A runnable check of these numbers follows below.)
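A toy check of the footnote's numbers (illustrative code; the recurrence is my reading of the rule, not the patch's exact implementation): choosing token w at step t is penalized only when w matches the other group's token at step t, and earlier matches amplify that penalty with a per-step decay of diversity_discount.

```python
A = ["I", "like", "dogs"]

def penalty(prefix, candidate, other, discount):
    """Penalty against `other` for appending `candidate` to `prefix`."""
    cum = 0.0
    for tok, ref in zip(prefix, other):
        cum = float(tok == ref) + discount * cum  # discounted match count
    step = len(prefix)
    return float(candidate == other[step]) * (1.0 + discount * cum)

B_prefix = ["I", "like"]     # B) I like ____.
C_prefix = ["There", "are"]  # C) There are ___.

for d in (0.0, 0.5, 1.0):    # Hamming, interpolated, cumulative
    print(d, penalty(B_prefix, "dogs", A, d), penalty(C_prefix, "dogs", A, d))
# 0.0 -> 1.0 and 1.0   (Hamming diversity)
# 0.5 -> 1.75 and 1.0  (1 + 0.5 + 0.25)
# 1.0 -> 3.0 and 1.0   (cumulative diversity)
```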

@cbalioglu merged commit 3f6ba43 into facebookresearch:main on Apr 12, 2023
2 of 5 checks passed