
Releases: GoogleCloudPlatform/hpc-toolkit

v1.34.0: Slurm-GCP v6 Generally Available

24 May 00:37
5b360ae

What's Changed

In this release, we promote Slurm-GCP v6 to general availability (GA), making it the recommended version of Slurm-GCP. Find out more in the Announcement.

Key New Features 🎉

Module Improvements 🔨

Improvements 🛠

Deprecations 💤

Version Updates ⏫

  • Update a3-highgpu-8g blueprint to use latest v5 tag by @tpdownes in #2572
  • Update Slurm-GCP v5 modules and examples to 5.11.1 by @tpdownes in #2595
  • Update Slurm-GCP v6 modules and examples to 6.5.2 by @tpdownes in #2594

Bug fixes 🐞

Other changes

  • Revert "Allow specific reservation for node-group in slurm-gcp v5" by @harshthakkar01 in #2621
  • Revert "Revert "Allow specific reservation for node-group in slurm-gcp v5"" by @harshthakkar01 in #2622

Full Changelog: v1.33.0...v1.34.0

v1.33.0: "ghpc_stage" function; Slurm-GCP v6 improvements

16 May 13:57
146ebbe

What's Changed

Key New Features 🎉

  • Add docs about ghpc_stage and other functions by @mr0re1 in #2485
  • Add startup-script option to automatically install Docker at boot by @tpdownes in #2489

New Modules 🧱

Module Improvements 🔨

  • Address feature requests for HTCondor functionality in Windows by @tpdownes in #2469
  • Slurm6. Replace service_account with service_account_email|scopes by @mr0re1 in #2495
  • Slurm6. Replace vars disable_X -> enable_X by @mr0re1 in #2486
  • Remove "hard" dependency between login instance and controller instance by @mr0re1 in #2413
  • Allow the wait-for-startup module to take a list of instance names by @rohitramu in #2515
  • Simplify "cleanup compute" by @mr0re1 in #2479
  • Copy labels from the batch-job-template module to the actual Batch job spec by @aaronegolden in #2514
  • Slurm6. Automatically set login instances' names, don't put role into them by @mr0re1 in #2531
  • Adopt Slurm-GCP 6.4.6 by @tpdownes in #2511

Improvements 🛠

Bug fixes 🐞

Full Changelog: v1.32.1...v1.33.0

v1.32.1: Fix version number in modules

19 Apr 18:40
bec99bb

What's Changed

Fix version number in modules

Full Changelog: v1.32.0...v1.32.1

v1.32.0: Deployment files and Slurm-GCP v6 examples

18 Apr 18:20
d4754d4

What's Changed

Key New Features 🎉

  • Deployment files allow merging generic blueprints with configurations specific to single deployments

New Modules 🧱

  • Decoupling private access from Cloud SQL to allow multiple instances in same VPC by @cboneti in #2397

Improvements 🛠

Bug fixes 🐞

  • Revert "Add example using Slurm static compute nodes" by @nick-stroud in #2404
  • Fixes for workstation creation - new extension added for yaml by @cdunbar13 in #2421
  • Updating garther startup script and integration test by @cdunbar13 in #2449

Full Changelog: v1.31.1...v1.32.0

v1.31.1: Updated provisioning guide for A3 VM family

01 Apr 21:23
b830db4

What's Changed

The A3 provisioning guide was updated by @tpdownes to support two use cases:

  • user-created reservations without compact placement policies that are automatically consumed by matching VMs
  • Google Cloud-created reservations that must be specifically identified by the Slurm cluster for consumption

See #2420 and reservation consumption documentation for details.

Full Changelog: v1.31.0...v1.31.1

v1.31.0: Improved Local File Management

28 Mar 19:06
fe6b653

What's Changed

Key New Features 🎉

  • Implement ghpc_stage function to stage files into deployment by @mr0re1 in #2339

Module Improvements 🔨

  • Support http_proxy in HTCondor Windows installation by @tpdownes in #2368
  • Slurm6. Add support for dynamic nodeset. by @mr0re1 in #1986

Improvements 🛠

Deprecations 💤

  • Deprecate schedmd-slurm-gcp-v6-partition.network_storage by @mr0re1 in #2379
  • Remove quota validator by @mr0re1 in #2382

Bug fixes 🐞

  • Packer service account fix and alignment with Toolkit naming convention by @tpdownes in #2367

Full Changelog: v1.30.0...v1.31.0

v1.30.0 - Cloud HPC Toolkit A3 VM + NeMo Framework Solution

18 Mar 21:51
08ae77e

What's Changed

Key New Features 🎉

  • Introduction of the Cloud HPC Toolkit A3 VM family blueprint featuring
    • A Slurm cluster composed of A3 VMs each with 8 NVIDIA H100 GPUs
    • An example for running the NVIDIA NeMo framework
    • An example for running the common nccl-tests benchmark

Module Improvements 🔨

Improvements 🛠

  • Add TPU v4 blueprint and tutorial to demonstrate running TPU workload by @harshthakkar01 in #2287
  • Update parameters for TPU nodeset module and add precondition checks and bump TPU to v3 by @harshthakkar01 in #2293
  • Add Slurm v6 version for image builder blueprint by @harshthakkar01 in #2297
  • Allow ghpc deploy blueprint.yaml by @mr0re1 in #2323
  • Slurm GCP version update; will cool down before deleting orphan nodes by @nick-stroud in #2322
  • Add SlurmGCP v6 example of slurm compatible with startup scripts and integration test by @harshthakkar01 in #2346

Version Updates ⏫

Bug fixes 🐞

  • Added enable_devel for packer build to fix issue with bp by @cdunbar13 in #2334

New Contributors

Full Changelog: v1.29.0...v1.30.0

v1.29.0: New Firewall Rules module & Slurm-GCP v6 Improvements

07 Mar 21:27
c024e72

What's Changed

Key New Features 🎉

New Modules 🧱

  • Split service account creation from htcondor-setup by @tpdownes in #2250

Module Improvements 🔨

  • Set http_proxy, https_proxy variables for user login and during startup-script by @tpdownes in #2237
  • Update documentation for Packer to include minimum operational requirements by @tpdownes in #2241
  • Modify cloud-storage-bucket to include ability to set bucket viewers by @tpdownes in #2247
  • Add "submit" option to batch-job-template module by @aaronegolden in #2210
  • Prevent usage of placement with static and auto-scale nodes in same nodeset by @nick-stroud in #2279

Improvements 🛠

Version Updates ⏫

Bug fixes 🐞

Full Changelog: v1.28.1...v1.29.0

v1.28.1: Slurm-GCP v4 reaches End-of-Life, improved Slurm-GCP v6 support

15 Feb 23:13
75a04d4

What's Changed

Key New Features 🎉

Module Improvements 🔨

  • Slurm6. Make subnetwork_self_link required, don't pass subnetwork_project by @mr0re1 in #2067
  • Slurm6. Automagically set nodeset.name from module id. by @mr0re1 in #2068
  • Slurm6. Add support for additional_networks, access_config & reservation_name by @mr0re1 in #2062
  • Reduce default maximum number of HTCondor execute points by @tpdownes in #2127
  • Startup stackdriver option by @nick-stroud in #2120
  • HTCondor: variable MIG behavior by @tpdownes in #2140
  • Extending GKE Scheduler module by @ek-nag in #2137
  • Copies python binaries instead of symlink for more isolated venv by @nick-stroud in #2151
  • Increase dynamic node count to a more reasonable default value by @nick-stroud in #2153
  • Update Chrome Remote Desktop to Debian 12 by default by @tpdownes in #2180
  • Update startup-script module to latest release by @tpdownes in #2183
  • Updates to HTCondor autoscaler by @tpdownes in #2204
  • Change batch-job-base template from JSON to YAML by @aaronegolden in #2199
  • Add Slurm configuration template for long Prolog/Epilog scripts by @tpdownes in #2218

Improvements 🛠

Deprecations 💤

Version Updates ⏫

Bug fixes 🐞

  • Update spack openfoam example to use /opt/apps directory by @harshthakkar01 in #2131
  • Fix HTCondor Windows URI for latest 23.0 LTS release by @tpdownes in #2141
  • Validation added to Slurm v5 login_startup_scripts_timeout by @cdunbar13 in #2148
  • Ensure Windows VMs start HTCondor only after successful secret download by @tpdownes in #2174

New Contributors

Full Changelog: v1.27.0...v1.28.0

v1.27.0: Spack support for non-root users

10 Jan 01:12
fcdc5e5

What's Changed

Key New Features 🎉

New Modules 🧱

Module Improvements 🔨

  • Make Cloud SQL use an internal IP address instead of an external one for the Slurm accounting DB by @ek-nag in #1795
  • OFE: Various new features and fixes. by @ek-nag in #2040
  • Disable firewall rule logging by default by @tpdownes in #2057
  • Slurm6. Add support for enable_slurm_gcp_plugins by @mr0re1 in #2066
  • Support explicit reserved_ip_range for Filestore instances by @tpdownes in #2072
  • Adopt gcloud storage over gsutil by default by @tpdownes in #2075
  • Skip upgrade of wheel/setuptools if already installed by @tpdownes in #2074
  • Support use of http/https proxy for pip/apt/yum package managers by @tpdownes in #2079

Improvements 🛠

Version Updates ⏫

Bug fixes 🐞

New Contributors

Full Changelog: v1.26.1...v1.27.0