Releases: kdexd/virtex

Drop Python 3.6 support, update docs theme.

09 Jan 21:44
2baba8a

Major changes

  • Python 3.6 support is dropped, the minimum requirement is Python 3.8. All major library versions are bumped to the latest releases (PyTorch, OpenCV, Albumentations, etc.).
  • Model zoo URLs are changed to Dropbox. All pre-trained checkpoint weights are unchanged.
  • Fixed a spike in training loss that occurred when resuming training with pretrain_virtex.py.
  • Documentation theme is changed from alabaster to Read the Docs, looks fancier!

Fix beam search bug, add nucleus sampling support.

15 Jul 21:44

Bug Fix: Beam Search

The beam search implementation adapted from AllenNLP was better suited to recurrent models (LSTM/GRU) than to autoregressive transformers.
This version removes the "backpointer" trick from the AllenNLP implementation and improves captioning results for all VirTex models. In the table below, "Old" metrics are v1.1 (ArXiv v2) and "New" metrics are v1.2 (ArXiv v3).

[Table: captioning metrics for all VirTex models, Old (v1.1 / ArXiv v2) vs. New (v1.2 / ArXiv v3)]

This bug does not affect pre-training or other downstream task results. Thanks to Nicolas Carion (@alcinos) and Aishwarya Kamath (@ashkamath) for spotting this issue and helping me to fix it!
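To illustrate the fix conceptually (this is not the VirTex or AllenNLP code): for autoregressive transformers, each beam hypothesis can simply carry its full token sequence and be re-fed to the model at every step, with no per-step backpointers to reconstruct sequences from. A minimal sketch, where `step_fn`, the token ids, and the toy scores are all hypothetical:

```python
import math

def beam_search(step_fn, start_token, eos_token, beam_size=5, max_steps=20):
    """Minimal autoregressive beam search. Each hypothesis is a complete
    token sequence; `step_fn(seq)` returns a {token: log_prob} dict for
    the next token given the full sequence so far."""
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-prob)
    finished = []
    for _ in range(max_steps):
        candidates = []
        for seq, score in beams:
            for token, logp in step_fn(seq).items():
                candidates.append((seq + [token], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            # Hypotheses ending in [EOS] are set aside; others stay live.
            (finished if seq[-1] == eos_token else beams).append((seq, score))
            if len(beams) == beam_size:
                break
        if not beams:
            break
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])[0]
```

Because every hypothesis is re-scored from its full prefix, there is no hidden recurrent state to track, which matches how transformer decoders consume their inputs.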

Feature: Nucleus Sampling

This codebase now supports decoding through Nucleus Sampling, as introduced in The Curious Case of Neural Text Degeneration. Try running the captioning evaluation script with --config-override MODEL.DECODER.NAME nucleus_sampling MODEL.DECODER.NUCLEUS_SIZE 0.9! For consistent behavior with prior versions, the default decoding method remains Beam Search with 5 beams.

Note: Nucleus sampling will typically give worse metrics on COCO Captions specifically, but produces more interesting-sounding language with larger transformers trained on much more data than COCO Captions.

New config arguments to support this:

MODEL:
  DECODER:
    # What algorithm to use for decoding. Supported values: {"beam_search",
    # "nucleus_sampling"}.
    NAME: "beam_search"

    # Number of beams to decode (1 = greedy decoding). Ignored when decoding
    # through nucleus sampling.
    BEAM_SIZE: 5

    # Size of nucleus for sampling predictions. Ignored when decoding through
    # beam search.
    NUCLEUS_SIZE: 0.9

    # Maximum length of decoded caption. Decoding may end earlier when [EOS]
    # token is sampled.
    MAX_DECODING_STEPS: 50  # Same as DATA.MAX_CAPTION_LENGTH
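The idea behind NUCLEUS_SIZE is simple to sketch: at each step, sample only from the smallest set of tokens whose cumulative probability mass exceeds the nucleus size. The following is an illustrative NumPy implementation, not the codebase's own decoder; the function name and signature are hypothetical:

```python
import numpy as np

def nucleus_sample(logits, nucleus_size=0.9, rng=None):
    """Sample one token id via nucleus (top-p) sampling: restrict to the
    smallest top-probability set whose cumulative mass > nucleus_size,
    renormalize, and sample from that set."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())   # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]         # most probable first
    cumulative = np.cumsum(probs[order])
    # Number of tokens needed for cumulative mass to exceed the nucleus.
    cutoff = np.searchsorted(cumulative, nucleus_size) + 1
    keep = order[:cutoff]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))
```

With a sharply peaked distribution the nucleus collapses to a single token (greedy behavior); with a flat distribution it spans most of the vocabulary, which is where the more varied language comes from.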

Remove obsolete modules and rename config parameters.

04 Apr 12:35

This version is a small increment over v1.0 with only cosmetic changes and obsolete code removals. The final results of models trained with this codebase remain unchanged.

Removed feature extraction support:

  • Removed virtex.downstream.FeatureExtractor and its usage in scripts/clf_voc07.py. By default, the script will only evaluate on global average pooled features (2048-d), as with the CVPR 2021 paper version.

  • Removed virtex.modules.visual_backbones.BlindVisualBackbone. I introduced it long ago for debugging; it is not very useful anymore.

Two config-related changes:

  1. Renamed config parameters: (OPTIM.USE_LOOKAHEAD —> OPTIM.LOOKAHEAD.USE), (OPTIM.LOOKAHEAD_ALPHA —> OPTIM.LOOKAHEAD.ALPHA) and (OPTIM.LOOKAHEAD_STEPS —> OPTIM.LOOKAHEAD.STEPS).

  2. Renamed TransformerTextualHead to TransformerDecoderTextualHead for clarity. Model names in config also change accordingly: "transformer_postnorm" —> "transdec_postnorm" (same for prenorm).

These changes may be breaking if you wrote your own config and explicitly added these arguments.
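For example, a config that set the old lookahead parameters would be updated to the nested form like this (the values shown are illustrative, not defaults):

```yaml
OPTIM:
  # v1.0 used flat keys: USE_LOOKAHEAD, LOOKAHEAD_ALPHA, LOOKAHEAD_STEPS.
  LOOKAHEAD:
    USE: true    # was OPTIM.USE_LOOKAHEAD
    ALPHA: 0.5   # was OPTIM.LOOKAHEAD_ALPHA
    STEPS: 5     # was OPTIM.LOOKAHEAD_STEPS
```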

CVPR 2021 release

07 Mar 11:13

CVPR 2021 release of VirTex.
Code and pre-trained models reproduce the results reported in the paper: https://arxiv.org/abs/2006.06666v2