
Releases: NVIDIA-Merlin/Transformers4Rec

v23.12.00

11 Jan 14:03
d0cce61

What's Changed

New Contributors

Full Changelog: v23.08.00...v23.12.00

v23.08.00: adding unit test for end-to-end example (#669)

29 Aug 16:28
348c963
* adding unit test for multi-gpu example

* added test for notebook 03

* fixed formatting

* update

* update

* Update 01-ETL-with-NVTabular.ipynb

Day of week is between 0 and 6, so it must be scaled with a max value of 6 to produce correct values in the 0-1 range. If we do col+1 and scale with 7, then a section of the 0-2pi range (for sine purposes) will not be represented.

* Update 01-ETL-with-NVTabular.ipynb

Reverted the previous edit for weekday scaling. Scaling between 0 and 7 is correct, because day 0 (unused after adding 1) overlaps with day 7 for sine purposes. Monday should scale to 1/7 and Sunday to 7/7 to distribute the days evenly along the sine curve.
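The scaling argument in the two notebook commits can be checked numerically. The sketch below is only an illustration and is not the notebook's code: the function name and its `offset` and `period` parameters are hypothetical, and it assumes a 0-based weekday column (0 to 6) encoded with both sine and cosine.

```python
import math


def weekday_sin_cos(day_index, offset=1, period=7):
    """Map a 0-based weekday (0..6) to a point on the unit circle.

    Hypothetical helper: offset=1, period=7 corresponds to scaling (col + 1) / 7.
    """
    angle = 2 * math.pi * (day_index + offset) / period
    return math.sin(angle), math.cos(angle)


# (col + 1) / 7: angles 2*pi/7 ... 2*pi, seven evenly spaced points.
shifted = [weekday_sin_cos(d) for d in range(7)]

# col / 6: day 0 lands on angle 0 and day 6 on 2*pi, which is the same point.
unshifted = [weekday_sin_cos(d, offset=0, period=6) for d in range(7)]


def count_distinct(points):
    """Count distinct points after rounding away floating-point noise."""
    return len({(round(s, 6), round(c, 6)) for s, c in points})


print(count_distinct(shifted), count_distinct(unshifted))
```

With `col / 6`, the first and last weekday collapse onto the same point of the circle, which is the overlap the second commit describes.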

* reduce num_rows

* Update test_end_to_end_session_based.py

* Update 01-ETL-with-NVTabular.ipynb

* updated test script and notebook

* updated file

* removed nb3 test due to multi-gpu freezing issue

* revised notebooks, added back nb3 test

* fixed test file with black

* update test py

* update test py

* Use `python -m torch.distributed.run` instead of `torchrun`

The `torchrun` script installed in the system is a python script with
a shebang line starting with `#!/usr/bin/python3`

This picks up the wrong version of python when running in a virtualenv
like our tox test environment.

If instead this were `#!/usr/bin/env python3` it would work ok in a
tox environment to call `torchrun`.

However, until either the pytorch package is updated for this to
happen or we update our CI image for this to take place, running the
python command directly is more reliable.
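As a rough illustration of the approach described in this commit, a test could build the launch command from the interpreter that is currently running, so a virtualenv or tox environment is respected. This is a sketch under assumptions: the helper name and the `train.py` script name are hypothetical, and the project's actual test code is not shown in these notes.

```python
import sys


def build_distributed_command(script, nproc_per_node, script_args=()):
    """Build a launch command equivalent to `torchrun`, using the active interpreter.

    Hypothetical helper: sys.executable points at the Python of the current
    environment, so the shebang line of the installed `torchrun` script is bypassed.
    """
    return [
        sys.executable,
        "-m",
        "torch.distributed.run",
        f"--nproc_per_node={nproc_per_node}",
        script,
        *script_args,
    ]


# "train.py" is a placeholder script name.
cmd = build_distributed_command("train.py", 2)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in an environment where PyTorch is installed.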

---------

Co-authored-by: rnyak <ronayak@hotmail.com>
Co-authored-by: edknv <109497216+edknv@users.noreply.github.com>
Co-authored-by: rnyak <16246900+rnyak@users.noreply.github.com>
Co-authored-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

v23.06.00

22 Jun 21:00
b14d07e
Update merlin dependency versions to match 23.06 release (#724)

v23.05.00

31 May 14:40
e5fa050

What’s Changed

🐜 Bug Fixes

  • Fixing the projection layer when using weight tying and dim from Transformer output and item embedding differs @gabrielspmoreira (#689)

🚀 Features

  • Fix the randomness in the stochastic_swap_noise tests @sararb (#707)

📄 Documentation

🔧 Maintenance

v23.04.00

26 Apr 21:17

What’s Changed

🐜 Bug Fixes

  • Update multi-gpu notebook to set cupy device @edknv (#675)
  • Fix bug in get_output_sizes_from_schema with core-schema @marcromeyn (#663)
  • Remove torch.squeeze() step from the model's forward method. @sararb (#659)
  • Set device in dataloaders @edknv (#654)
  • Fix the predictions returned by Trainer.predict(..) @sararb (#641)

🚀 Features

📄 Documentation

  • extend getting started serving example to serve NVT and TF4Rec model together @rnyak (#670)
  • Cropped the table and addressed comments from previous PR @nzarif (#510)
  • Update example notebooks to create schema object from merlin core @rnyak (#650)
  • replace NVTabulardataloader with Merlindataloader @rnyak (#644)

🔧 Maintenance

  • Update multi-gpu notebook to set cupy device @edknv (#675)
  • add concurrency setting to stop tests when new commits get pushed to a PR @nv-alaiacano (#673)
  • Switch to using 2 GPU action runners for multi-GPU testing @karlhigley (#665)
  • Add workflow to check if base branch of pull request is development @oliverholworthy (#656)
  • fix the ci script for new unit tests setup @jperez999 (#658)
  • Add unit test for serving torchscript model example notebook @rnyak (#657)
  • Separate notebook tests into their own tox environment @nv-alaiacano (#653)
  • Update usage of use_amp to use_cuda_amp for transformers>=4.20 @oliverholworthy (#627)
  • Update example notebooks to create schema object from merlin core @rnyak (#650)
  • Update padding of ragged features to enable dataloader change @oliverholworthy (#647)
  • fix model output_schema dims for BC/Regression task case @rnyak (#646)
  • fix test_remove_consecutive_interactions unit test @rnyak (#643)
  • replace NVTabulardataloader with Merlindataloader @rnyak (#644)
  • Cleanup shapes in model.input_schema and output_schema @rnyak (#628)
  • Migrate schema Tags to merlin.schema.Tags @nv-alaiacano (#632)
  • Clean up imports in tests @marcromeyn (#626)

v23.02.00

08 Mar 16:37

What's Changed

🐜 Bug Fixes

  • Adjust serving notebook to account for underlying shape changes @karlhigley (#631)

🚀 Features

  • Add docstrings and the row_groups_per_part parameter to the MerlinDataLoader class @sararb (#590)
  • Simplify getting-started ETL and fix serving with torch script notebook @rnyak (#604)

📄 Documentation

🔧 Maintenance

  • fix assert error in the test_soft_embedding unit test @rnyak (#595)
  • Small fixes in getting-started ETL and training notebooks and fix tuple error in serving notebook @rnyak (#586)
  • Fetch release branches so that we can figure out the release branch @oliverholworthy (#609)
  • Add Jenkinsfile @AyodeAwe (#537)
  • Change data_loader_engine to 'merlin' in examples @edknv (#580)
  • adding workflow for gpu ci on gha runner @jperez999 (#585)

New Contributors

Full Changelog: v0.1.16...v23.02.00

v0.1.16

03 Feb 19:26
b83d218

Highlights

1. Standardize the ModelOutput API:

  • Remove ambiguous flags: ignore_masking and hf_format: #543
  • Introduce the testing flag to differentiate between evaluation (=True) and inference (=False) modes: #543
  • All prediction tasks return the same output: #546
    • During training and evaluation: the output is a dictionary with three elements: {"loss": torch.tensor, "labels": torch.tensor, "predictions": torch.tensor}
    • During inference: the output is the tensor of predictions.
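A caller that has to handle both modes can normalize the two output shapes described above. The sketch below is an assumption-laden illustration: `extract_predictions` is a hypothetical helper, not part of Transformers4Rec, and plain Python lists stand in for the tensors so that the example runs without PyTorch.

```python
def extract_predictions(output):
    """Return the predictions from either output shape.

    Hypothetical helper. Training/evaluation output is assumed to be a dict with
    "loss", "labels" and "predictions"; inference output is the predictions alone.
    """
    if isinstance(output, dict):
        return output["predictions"]
    return output


# Stand-ins for tensors, for illustration only.
eval_output = {"loss": 0.42, "labels": [3, 1], "predictions": [[0.1, 0.9], [0.8, 0.2]]}
inference_output = [[0.1, 0.9], [0.8, 0.2]]

print(extract_predictions(eval_output) == extract_predictions(inference_output))
```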

2. Extend the Trainer class to support all prediction tasks: #564

  • The Trainer class now accepts a T4Rec model defined with binary or regression tasks.
  • Remove the HFWrapper class, as the Trainer now supports the base T4Rec Model class.
  • Set the default of the trainer's argument predict_top_k to 0 instead of 10.
    • Note that getting the top-k predictions is specific to NextItemPredictionTask and the user should explicitly set the parameter in the T4RecTrainingArguments object. If not specified, the method Trainer.predict() returns unsorted predictions for the whole item catalog.
  • Support multi-task learning in the Trainer class: it accepts any T4Rec model defined with multiple tasks and/or multiple heads.
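To make the difference between predict_top_k=0 and a positive value concrete, the following sketch mimics the two behaviours on a single list of scores. It is not the Trainer's implementation: `top_k_items` is a hypothetical function, the return format is an assumption, and plain lists replace tensors.

```python
import heapq


def top_k_items(scores, k):
    """Return (item_ids, scores) for one example.

    Hypothetical sketch: k <= 0 keeps the unsorted scores for the whole catalog,
    k > 0 keeps only the k highest-scoring items, best first.
    """
    if k <= 0:
        return list(range(len(scores))), list(scores)
    best = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    return [item_id for item_id, _ in best], [score for _, score in best]


catalog_scores = [0.1, 0.7, 0.2, 0.9]
all_ids, all_scores = top_k_items(catalog_scores, 0)
top_ids, top_scores = top_k_items(catalog_scores, 2)
print(top_ids, top_scores)
```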

3. Fix the inference performance of the Transformer-based model trained with masked language modeling (MLM): #551

  • At inference, the input sequence is extended by a [MASK] embedding after the last non-padded position to take into account the target position. The hidden representation of the [MASK] position is used to get the next-item prediction scores.
  • With this fix, the user doesn't need to add a dummy position to the input test data when calling Trainer.predict() or model(test_batch, training=False, testing=False)
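The idea of extending the sequence with a [MASK] position can be sketched on item ids. This is a simplified illustration only: the library works on embeddings inside the model, while the code below works on a list of ids, assumes right-padding with id 0, and uses a hypothetical MASK_ID placeholder and function name.

```python
PAD_ID = 0    # assumed padding id
MASK_ID = -1  # hypothetical placeholder for the [MASK] position


def append_mask_position(item_ids, pad_id=PAD_ID, mask_id=MASK_ID):
    """Insert a mask slot right after the last non-padded item.

    Assumes the sequence is right-padded; the result is one position longer.
    """
    item_ids = list(item_ids)
    n_items = sum(1 for item in item_ids if item != pad_id)
    return item_ids[:n_items] + [mask_id] + item_ids[n_items:]


session = [5, 8, 3, 0, 0]
extended = append_mask_position(session)
print(extended)
```

In this picture, the model's hidden state at the mask slot is what would be scored against the item catalog to predict the next item.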

4. Update Transformers4Rec to use the new merlin-dataloader package: #547

  • The NVTabularDataLoader is renamed to MerlinDataLoader and uses the loader from the merlin-dataloader package.
  • Users can specify the argument data_loader_engine='merlin' in the T4RecTrainingArguments object to use the merlin dataloader. It supports GPU and CPU environments. The alias nvtabular is kept to ensure backward compatibility.

What’s Changed

⚠ Breaking Changes

  • Extend trainer class to support all T4Rec prediction tasks @sararb (#564)
  • Standardize prediction tasks' outputs @nzarif (#546)
  • Uses merlin-dataloader package @edknv (#547)
  • Refactoring part1- flags modification @nzarif (#543)

🐜 Bug Fixes

  • Fix error raised by latest Torchmetrics (0.11.0) @sararb (#576)
  • Fix the test data path in Trainer.predict() @sararb (#571)
  • Fix discrepancy between evaluation and inference modes @sararb (#551)

🚀 Features

  • Support for pre-trained embeddings initializer (trainable or not) @gabrielspmoreira (#572)
  • Extend trainer class to support all T4Rec prediction tasks @sararb (#564)
  • Standardize prediction tasks' outputs @nzarif (#546)
  • Add music-streaming synthetic data to test the support of all predictions tasks with the Trainer class @sararb (#540)
  • Refactoring part1- flags modification @nzarif (#543)

📄 Documentation

🔧 Maintenance

  • Update mypy version from 0.971 to 0.991 @oliverholworthy (#574)
  • Uses merlin-dataloader package @edknv (#547)
  • fix drafter and update cpu ci to run on targeted branch @jperez999 (#549)
  • Add lint workflow to run pre-commit on all files @oliverholworthy (#545)
  • Specify packages to look for in setup.py to avoid publishing tests @oliverholworthy (#529)
  • Cleanup tensorflow dependencies @oliverholworthy (#530)
  • Add docs requirements to extras list in setup.py (#533)
  • Remove stale documentation reviews (#531)
  • Update branch name extraction for tag builds (#608)
  • run github action tests and lint via tox, with upstream deps installed (#527)

v0.1.15

22 Nov 19:27

What’s Changed

🐜 Bug Fixes

  • Fix failing ci error related to sparse_names containing features that are not part of the model's schema @sararb (#541)
  • Fix dtype mismatch in CLM masking class due to new data loader changes @sararb (#539)
  • Fix CI test based on the requirements of the new merlin loader @sararb (#536)
  • quick fix: apply masking when training next item prediction @nzarif (#514)

🚀 Features

  • Add save/load & input/output schema methods to T4Rec Model class @sararb (#507)

📄 Documentation

  • Add multi-gpu training example for T4Rec PyTorch @bbozkaya (#521)

🔧 Maintenance

  • Fix failing ci error related to sparse_names containing features that are not part of the model's schema @sararb (#541)
  • Fix CI test based on the requirements of the new merlin loader @sararb (#536)
  • Specify output dtype for Normalize op in ETL example to match model expectations @oliverholworthy (#523)
  • Fix name and bug in MeanReciprocalRankAt @rnyak (#522)
  • Update mypy version to match version in pre-commit-config @oliverholworthy (#517)

v0.1.14: Multi-GPU training with DP and DDP documentation (#503)

24 Oct 18:15
4f23a8b

What’s Changed

🚀 Features

  • Set ignore_masking to True by default @sararb (#498)
  • [feature]Multi-GPU DistributedDataParallel Fixed @nzarif (#496)

📄 Documentation

  • Multi-GPU training with DP and DDP documentation @nzarif (#503)

v0.1.13

26 Sep 18:06
bcc9392

What’s Changed

🐜 Bug Fixes

  • [BUG] trainer.model.module renamed and DataParallel mode fixed @nzarif (#483)

🔧 Maintenance