Modified factor decoding for automatic dubbing #1082
The main addition in this PR is modified decoding with specific kinds of target factors. This is designed for automatic dubbing models (https://iwslt.org/2023/dubbing), where we are dealing with numeric factors which are calculated on the basis of other numeric target factors. There are 4 kinds of factors that have been implemented:
`target_segment_durations` field in JSON inputs.
`[pause]` tokens remaining.

These are specified with `sockeye-translate --force-factors-stepwise`, e.g. `--force-factors-stepwise none frames total_remaining none none` will mean that the second and third target factors will be calculated according to the rules corresponding to frames and total frames remaining, while the rest of the target factors are unaffected.

I've also added an option to use sinusoidal embeddings instead of randomly initialized embeddings for numeric target factors.
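To illustrate how a `--force-factors-stepwise` specification such as the one above could be read, here is a minimal Python sketch. It is not the code from this PR: `forced_factor_positions` and `total_remaining_stepwise` are hypothetical helper names, and the "total remaining" rule shown (subtract the frames emitted at each step, clipping at zero) is only an assumption about what such a rule might look like.

```python
from typing import List, Sequence


def forced_factor_positions(spec: Sequence[str]) -> List[int]:
    """Return the 1-based positions of target factors whose rule is not 'none'."""
    return [pos for pos, rule in enumerate(spec, start=1) if rule != "none"]


def total_remaining_stepwise(total_frames: int, frames_per_step: Sequence[int]) -> List[int]:
    """Hypothetical rule: frames still available after each decoding step.

    Assumption for illustration only: the remaining budget is reduced by the
    frames factor at every step and never drops below zero.
    """
    remaining = total_frames
    trace = []
    for frames in frames_per_step:
        remaining = max(remaining - frames, 0)
        trace.append(remaining)
    return trace


if __name__ == "__main__":
    spec = "none frames total_remaining none none".split()
    print(forced_factor_positions(spec))            # [2, 3]
    print(total_remaining_stepwise(10, [3, 4, 5]))  # [7, 3, 0]
```

The point of the sketch is that one numeric factor (total remaining) is derived step by step from another (frames), which is why these factors have to be computed during decoding rather than predicted independently.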
These features also require use of specially created vocabs, for which I've added `sockeye_contrib/create_seq_vocab.py` and help messages.

Changes should be fully backwards-compatible -- all tests pass.
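For readers unfamiliar with the sinusoidal embedding option mentioned in this description, the following sketch shows the standard Transformer sinusoidal formulation applied to a numeric factor value instead of a position. This is an assumption about the general idea, not the implementation in this PR; `sinusoidal_embedding` is a hypothetical name, and the actual layout (for example interleaved versus concatenated sine and cosine) may differ.

```python
import math
from typing import List


def sinusoidal_embedding(value: int, dim: int) -> List[float]:
    """Map a numeric factor value to a fixed sinusoidal vector of size `dim`.

    Uses the standard formulation: even indices hold sin(value / 10000^(i/dim)),
    odd indices hold the matching cosine.
    """
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("dim must be a positive even number")
    embedding = []
    for i in range(0, dim, 2):
        angle = value / (10000 ** (i / dim))
        embedding.append(math.sin(angle))
        embedding.append(math.cos(angle))
    return embedding


if __name__ == "__main__":
    print(sinusoidal_embedding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

Unlike randomly initialized embeddings, vectors built this way vary smoothly with the value, so nearby numeric factor values get similar representations without having to be learned.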
Pending internal discussion; cc: @thompsonb
Pull Request Checklist
until you can check this box.
(`pytest`)
(`pytest test/system`)
(`./style-check.sh`)
`sockeye/__init__.py`. Major version bump if this is a backwards incompatible change.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.