Bump horovod from 0.22.1 to 0.24.0 in /JABER-PyTorch #232

Open: dependabot[bot] wants to merge 1 commit into master

dependabot[bot] commented on behalf of GitHub on Dec 25, 2022

Bumps horovod from 0.22.1 to 0.24.0.
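
A project that pins horovod can sanity-check the bump at startup. The sketch below is a hypothetical helper, not part of horovod or of this repository. It assumes plain numeric version strings such as "0.24.0" (no pre-release tags like "rc1") and compares them as integer tuples.

```python
# Hypothetical helper, not part of horovod or this repository.
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.24.0' into (0, 24, 0). Assumes purely numeric components."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is the same as or newer than `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "0.24" and "0.24.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    print(meets_minimum("0.22.1", "0.24.0"))  # False: the old pin is too old
    print(meets_minimum("0.24.0", "0.24.0"))  # True: the new pin satisfies it
```

In practice the installed string could come from `importlib.metadata.version("horovod")` (Python 3.8+). A real project would more likely use `packaging.version.Version`, which also handles pre-release tags.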

Release notes

Sourced from horovod's releases.

Elastic mode improvements, MXNet async dependency engine, fixes for latest PyTorch and TensorFlow versions

Added

  • Ray: Added elastic keyword parameters to the RayExecutor API, which supports both static (non-elastic) and elastic Horovod jobs. (#3190)
  • TensorFlow: Added in-place broadcasting of variables. (#3128)
  • Elastic: Added support for resurrecting blacklisted hosts. (#3319)
  • MXNet: Added support for MXNet async dependency engine. (#3242, #2963)
  • Spark/Lightning: Added history to lightning estimator. (#3214)
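
The item about resurrecting blacklisted hosts (#3319) can be pictured as a blacklist whose entries expire. The following is a conceptual sketch only, not horovod's implementation. The class name `HostBlacklist`, the exponential backoff, and the `base_cooldown`/`max_cooldown` parameters are assumptions chosen for illustration. A failing host is excluded for a cooldown that doubles with each repeated failure, up to a cap, and becomes eligible again once that cooldown has elapsed.

```python
# Conceptual sketch with hypothetical names; not horovod's actual blacklist code.
from typing import Dict, Optional


class HostBlacklist:
    def __init__(self, base_cooldown: float = 10.0, max_cooldown: float = 300.0) -> None:
        self.base_cooldown = base_cooldown
        self.max_cooldown = max_cooldown
        self._failures: Dict[str, int] = {}         # host -> number of recorded failures
        self._blocked_until: Dict[str, float] = {}  # host -> time at which it may rejoin

    def record_failure(self, host: str, now: float) -> float:
        """Blacklist `host` and return the cooldown applied (doubles per failure, capped)."""
        count = self._failures.get(host, 0) + 1
        self._failures[host] = count
        cooldown = min(self.base_cooldown * 2 ** (count - 1), self.max_cooldown)
        self._blocked_until[host] = now + cooldown
        return cooldown

    def is_available(self, host: str, now: float) -> bool:
        """A host is usable if it was never blacklisted or its cooldown has expired."""
        until: Optional[float] = self._blocked_until.get(host)
        return until is None or now >= until


if __name__ == "__main__":
    bl = HostBlacklist(base_cooldown=10.0, max_cooldown=60.0)
    print(bl.record_failure("node-1", now=0.0))   # 10.0
    print(bl.is_available("node-1", now=5.0))     # False: still cooling down
    print(bl.is_available("node-1", now=10.0))    # True: host is resurrected
    print(bl.record_failure("node-1", now=10.0))  # 20.0: second failure, longer cooldown
```

Passing `now` explicitly instead of reading a clock keeps the logic deterministic and easy to test.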

Changed

  • Moved to CMake version 3.13 with first-class CUDA language support and re-enabled parallelized builds. Uses a temporary installation of CMake if CMake 3.13 is not found. (#3261, #3371)
  • Moved released Docker image horovod and horovod-cpu to Ubuntu 20.04 and Python 3.8. (#3393)
  • Spark Estimator: Don't shuffle row groups if the training data must not be shuffled. (#3369)
  • Spark/Lightning: Reduced memory footprint of async dataloader. (#3239)
  • Elastic: Improved handling of NCCL errors in elastic scenarios. (#3112)
  • Spark/Lightning: Do not overwrite model with checkpoint by default. (#3201)
  • Made the checkpoint name optional so that users can save to h5 format. (#3411)
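
The build fallback described above (#3261, #3371) comes down to a version check before configuring. The sketch below is illustrative only and does not reproduce horovod's setup.py. `needs_temporary_cmake` is a hypothetical name, and the sketch assumes that `cmake --version` prints a first line of the form `cmake version X.Y.Z`.

```python
# Illustrative sketch with hypothetical helpers; not horovod's build code.
import re
import shutil
import subprocess
from typing import Optional, Tuple

REQUIRED = (3, 13)  # minimum CMake (major, minor) named in the release notes


def parse_cmake_version(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text like 'cmake version 3.16.3'."""
    match = re.search(r"cmake version (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def needs_temporary_cmake(version_output: Optional[str]) -> bool:
    """True if CMake is missing, its output is unparsable, or it is older than REQUIRED."""
    if version_output is None:
        return True
    version = parse_cmake_version(version_output)
    return version is None or version < REQUIRED


def system_cmake_version_output() -> Optional[str]:
    """Run `cmake --version` if a cmake binary is on PATH; otherwise return None."""
    if shutil.which("cmake") is None:
        return None
    result = subprocess.run(["cmake", "--version"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    print(needs_temporary_cmake(system_cmake_version_output()))
```

The comparison uses integer tuples rather than strings on purpose: compared as strings, "3.9" sorts after "3.13".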

Deprecated

  • Deprecated ElasticRayExecutor APIs in favor of the new RayExecutor API. (#3190)

Removed

  • Spark: Removed the h5py<3 constraint, which is no longer needed for TensorFlow > 2.5.0. (#3301)

Fixed

  • Elastic Spark: Fixed indices in initial task-to-task registration. (#3410)
  • PyTorch: Fixed GIL-related deadlock with PyTorch 1.10.1. (#3352)
  • PyTorch: Fixed finalization of ProcessSetTable. (#3351)
  • Fixed remote trainers to point to the correct shared lib path. (#3258)
  • Fixed imports from tensorflow.python.keras with TensorFlow 2.6.0+. (#3403)
  • Fixed Adasum communicator init logic. (#3379)
  • Lightning: Fixed resume logger. (#3375)
  • Fixed the checkpoint directory structure for PyTorch and PyTorch Lightning. (#3362)
  • Fixed possible integer overflow in multiplication. (#3368)
  • Fixed the pytorch_lightning_mnist.py example. (#3245, #3290)
  • Fixed barrier segmentation fault. (#3313)
  • Fixed hvd.barrier() tensor queue management. (#3300)
  • Fixed PyArrow "list index out of range" IndexError. (#3274)
  • Elastic: Fixed all workers sometimes failing on elastic Horovod failure. (#3264)
  • Spark/Lightning: Fixed setting limit_train_batches and limit_val_batches. (#3237)
  • Elastic: Fixed ElasticSampler and hvd.elastic.state losing some indices of processed samples when nodes dropped. (#3143)
  • Spark/Lightning: Fixed history metrics for estimator serialization. (#3216)
  • Ray: Fixed RayExecutor so that it fails when num_workers=0 and num_hosts=None. (#3210)
  • Spark/Lightning: Fixed checkpoint callback dirpath typo. (#3204)
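
The ElasticSampler fix (#3143) concerns bookkeeping that is easy to get wrong. When nodes drop out, the samples already processed in the current epoch must be remembered, so that the surviving workers split only what is left. The sketch below shows that idea in plain Python under hypothetical names (`remaining_shard`). It is not horovod's `ElasticSampler`, which also handles shuffling and state synchronization, and it assumes the processed-index set has already been gathered from every worker.

```python
# Conceptual sketch with hypothetical names; not horovod's ElasticSampler.
from typing import Iterable, List


def remaining_shard(dataset_size: int, processed: Iterable[int], rank: int, world_size: int) -> List[int]:
    """Return the indices this rank should still process in the current epoch.

    Indices in `processed` (collected from all workers before the reset) are
    skipped, and the rest are dealt round-robin across the new world size.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    done = set(processed)
    remaining = [i for i in range(dataset_size) if i not in done]
    return remaining[rank::world_size]


if __name__ == "__main__":
    # 10 samples; indices 0-3 were processed before a node dropped.
    processed = [0, 1, 2, 3]
    # Two workers remain, and together they cover 4..9 exactly once.
    print(remaining_shard(10, processed, rank=0, world_size=2))  # [4, 6, 8]
    print(remaining_shard(10, processed, rank=1, world_size=2))  # [5, 7, 9]
```

With round-robin slicing, shard lengths can differ by one. A production sampler may pad or drop samples to keep shards equal, which this sketch does not do.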

Process sets, XLA support, improved GPU backend

... (truncated)

Changelog

Sourced from horovod's changelog.

[v0.24.0] - 2022-03-01

... (truncated)

Commits
  • b089df6 Bump version to 0.24.0 (#3433)
  • db19aa4 Move apt-get into non-interactive mode (#3441)
  • 2632c05 Build Horovod with temporarily installed CMake if necessary (#3371)
  • 7bf9b04 Make checkpoint name optional so that user can save to h5 format. (#3411)
  • b553974 Fix flaky ray tests (#3430)
  • 7b5346e Fix indices in initial task-to-task registration (#3410)
  • 71e10b4 Fixing GPU and CPU TF head CI failures (#3431)
  • 79ded4b Fix FindNVTX.cmake (#3421)
  • 642a6b3 [TF - Fix] Fix imports from tensorflow.python.keras with tf.version >= 2....
  • 046c071 Allow stderr of executed cmake python code appear in logs (#3398)
  • Additional commits viewable in compare view


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually.
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

Bumps [horovod](https://github.com/horovod/horovod) from 0.22.1 to 0.24.0.
- [Release notes](https://github.com/horovod/horovod/releases)
- [Changelog](https://github.com/horovod/horovod/blob/master/CHANGELOG.md)
- [Commits](horovod/horovod@v0.22.1...v0.24.0)

---
updated-dependencies:
- dependency-name: horovod
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies label (Pull requests that update a dependency file) on Dec 25, 2022.
CLAassistant commented:

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
