
Releases: Lightning-AI/pytorch-lightning

Weekly patch release

10 Feb 16:57
c24b4bb

App

Added

  • Added lightning open command (#16482)
  • Added experimental support for interruptible GPU in the cloud (#16399)
  • Added FileSystem abstraction to simplify manipulating files (#16581)
  • Added Storage Commands (#16606)
    • ls: List files in your Cloud Platform filesystem
    • cd: Change the current directory within your Cloud Platform filesystem (terminal session based)
    • pwd: Return the current folder in your Cloud Platform filesystem
    • cp: Copy files between your Cloud Platform filesystem and the local filesystem
  • Prevented cd from changing into non-existent folders (#16645)
  • Enabled cp (upload) at project level (#16631)
  • Enabled ls and cp (download) at project level (#16622)
  • Added lightning connect data to register data connections to S3 buckets (#16670)
  • Added support for running with multiprocessing in the cloud (#16624)
  • Initial plugin server (#16523)
  • Added support for connecting and disconnecting a node (#16700)

Changed

  • Changed the default LightningClient(retry=False) to retry=True (#16382)
  • Added support for an async predict method in PythonServer and removed the torch context (#16453)
  • Renamed lightning.app.components.LiteMultiNode to lightning.app.components.FabricMultiNode (#16505)
  • Changed the command lightning connect to lightning connect app for consistency (#16670)
  • Refactored cloud dispatch and updated it to the new API (#16456)
  • Updated app URLs to the latest format (#16568)

Fixed

  • Fixed a deadlock causing apps not to exit properly when running locally (#16623)
  • Fixed the Drive root_folder not being parsed properly (#16454)
  • Fixed malformed path when downloading files using lightning cp (#16626)
  • Fixed app name in URL (#16575)

Fabric

Fixed

  • Fixed error handling for accelerator="mps" and ddp strategy pairing (#16455)
  • Fixed strict availability check for torch_xla requirement (#16476)
  • Fixed an issue where PL would wrap DataLoaders with XLA's MpDeviceLoader more than once (#16571)
  • Fixed the batch_sampler reference for DataLoaders wrapped with XLA's MpDeviceLoader (#16571)
  • Fixed an import error when torch.distributed is not available (#16658)

PyTorch

Fixed

  • Fixed an unintended limitation for calling save_hyperparameters on mixin classes that don't subclass LightningModule/LightningDataModule (#16369) (see the sketch after this list)
  • Fixed an issue with MLFlowLogger logging the wrong keys with .log_hyperparams() (#16418)
  • Fixed MLFlowLogger so that logging more than 100 parameters no longer fails and long values are truncated (#16451)
  • Fixed strict availability check for torch_xla requirement (#16476)
  • Fixed an issue where PL would wrap DataLoaders with XLA's MpDeviceLoader more than once (#16571)
  • Fixed the batch_sampler reference for DataLoaders wrapped with XLA's MpDeviceLoader (#16571)
  • Fixed an import error when torch.distributed is not available (#16658)
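
As context for the save_hyperparameters fix above, a minimal sketch of the now-supported pattern, assuming only that pytorch_lightning is installed; the Backbone class is illustrative, not from the release:

```python
# save_hyperparameters() on a class that uses the mixin but is not a
# LightningModule/LightningDataModule -- the case addressed by #16369.
from pytorch_lightning.core.mixins import HyperparametersMixin

class Backbone(HyperparametersMixin):
    def __init__(self, hidden_dim: int = 128):
        super().__init__()
        self.save_hyperparameters()  # records hidden_dim in self.hparams

backbone = Backbone(hidden_dim=256)
print(backbone.hparams.hidden_dim)  # 256
```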

Contributors

@akihironitta, @awaelchli, @Borda, @BrianPulfer, @ethanwharris, @hhsecond, @justusschock, @Liyang90, @RuRo, @senarvi, @shenoynikhil, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Stability and additional improvements

17 Jan 17:26
fc195b9

App

Added

  • Added the ability to set up basic authentication for Lightning apps (#16105)

Changed

  • The LoadBalancer now uses the internal IP and port instead of the exposed URL (#16119)
  • Added support for logging in different trainer stages with DeviceStatsMonitor (#16002)
  • Changed lightning_app.components.serve.gradio to lightning_app.components.serve.gradio_server (#16201)
  • Made cluster creation/deletion async by default (#16185)

Fixed

  • Fixed not being able to run multiple lightning apps locally due to port collision (#15819)
  • Avoided a relpath bug on Windows (#16164)
  • Avoided using the deprecated LooseVersion (#16162)
  • Ported fixes to the autoscaler component (#16249)
  • Fixed a bug where lightning login with env variables would not correctly save the credentials (#16339)

Fabric

Added

  • Added Fabric.launch() to programmatically launch processes (e.g. in Jupyter notebook) (#14992)
  • Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the run method (#14992)
  • Added Fabric.setup_module() and Fabric.setup_optimizers() to support strategies that need to set up the model before an optimizer can be created (#15185)
  • Added support for Fully Sharded Data Parallel (FSDP) training in Lightning Lite (#14967)
  • Added lightning_fabric.accelerators.find_usable_cuda_devices utility function (#16147)
  • Added basic support for LightningModules (#16048)
  • Added support for managing callbacks via Fabric(callbacks=...) and emitting events through Fabric.call() (#16074)
  • Added Logger support (#16121)
    • Added Fabric(loggers=...) to support different Logger frameworks in Fabric
    • Added Fabric.log for logging scalars using multiple loggers
    • Added Fabric.log_dict for logging a dictionary of multiple metrics at once
    • Added Fabric.loggers and Fabric.logger attributes to access the individual logger instances
    • Added support for calling self.log and self.log_dict in a LightningModule when using Fabric
    • Added access to self.logger and self.loggers in a LightningModule when using Fabric
  • Added lightning_fabric.loggers.TensorBoardLogger (#16121)
  • Added lightning_fabric.loggers.CSVLogger (#16346)
  • Added support for a consistent .zero_grad(set_to_none=...) on the wrapped optimizer regardless of which strategy is used (#16275) (see the sketch below)
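
A hedged sketch combining several of the Fabric additions above (loggers, callbacks, setup_module/setup_optimizers, Fabric.log, and the consistent zero_grad); the model, optimizer, and callback names are illustrative, not part of the release:

```python
import torch
from lightning.fabric import Fabric
from lightning.fabric.loggers import TensorBoardLogger

class PrintLoss:
    # Any method name works; Fabric.call("hook_name", ...) invokes it on all callbacks.
    def on_after_step(self, loss):
        print(f"loss={loss.item():.4f}")

fabric = Fabric(accelerator="cpu", loggers=TensorBoardLogger("logs"), callbacks=[PrintLoss()])
fabric.launch()

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model = fabric.setup_module(model)              # set up the model independently of the optimizer
optimizer = fabric.setup_optimizers(optimizer)

x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
fabric.backward(loss)
optimizer.step()
optimizer.zero_grad(set_to_none=True)           # consistent across strategies (#16275)
fabric.log("train_loss", loss)                  # goes to all configured loggers
fabric.call("on_after_step", loss=loss)         # emits the event to registered callbacks
```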

Changed

  • Renamed the class LightningLite to Fabric (#15932, #15938)
  • The Fabric.run() method is no longer abstract (#14992)
  • The XLAStrategy now inherits from ParallelStrategy instead of DDPSpawnStrategy (#15838)
  • Merged the implementation of DDPSpawnStrategy into DDPStrategy and removed DDPSpawnStrategy (#14952)
  • The dataloader wrapper returned from .setup_dataloaders() now calls .set_epoch() on the distributed sampler if one is used (#16101) (illustrated after this list)
  • Renamed Strategy.reduce to Strategy.all_reduce in all strategies (#16370)
  • When using multiple devices, the strategy now defaults to "ddp" instead of "ddp_spawn" when none is set (#16388)
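
The set_epoch change above removes a manual step that raw DDP training scripts need; a hedged sketch, intended to be launched as a script (dataset contents are placeholders):

```python
# After #16101, the dataloader returned by setup_dataloaders() calls
# set_epoch() on the DistributedSampler itself, so batches are reshuffled
# every epoch without a manual sampler.set_epoch(epoch) call.
import torch
from torch.utils.data import DataLoader, TensorDataset
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=2, strategy="ddp")
fabric.launch()  # starts the additional process for the second device

dataset = TensorDataset(torch.arange(16).float())
dataloader = fabric.setup_dataloaders(DataLoader(dataset, batch_size=4, shuffle=True))

for epoch in range(2):
    for (batch,) in dataloader:  # different order each epoch on every rank
        pass
```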

Removed

  • Removed support for FairScale's sharded training (strategy='ddp_sharded'|'ddp_sharded_spawn'). Use Fully-Sharded Data Parallel instead (strategy='fsdp') (#16329)
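
A hedged migration sketch for this removal; the accelerator and device count are illustrative:

```python
from lightning.fabric import Fabric

# Before (removed): Fabric(strategy="ddp_sharded")
fabric = Fabric(strategy="fsdp", accelerator="cuda", devices=2)
```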

Fixed

  • Restored sampling parity between PyTorch and Fabric dataloaders when using the DistributedSampler (#16101)
  • Fixed an issue where the error message wouldn't tell the user the real value that was passed through the CLI (#16334)

PyTorch

Added

  • Added support for native logging of MetricCollection with enabled compute groups (#15580)
  • Added support for custom artifact names in pl.loggers.WandbLogger (#16173)
  • Added support for DDP with LRFinder (#15304)
  • Added utilities to migrate checkpoints from one Lightning version to another (#15237)
  • Added support to upgrade all checkpoints in a folder using the pl.utilities.upgrade_checkpoint script (#15333)
  • Added an axes argument ax to .lr_find().plot() to enable writing to a user-defined axes in a matplotlib figure (#15652) (see the sketch after this list)
  • Added log_model parameter to MLFlowLogger (#9187)
  • Added a check to validate that wrapped FSDP models are used while initializing optimizers (#15301)
  • Added a warning when self.log(..., logger=True) is called without a configured logger (#15814)
  • Added support for colossalai 0.1.11 (#15888)
  • Added LightningCLI support for optimizer and learning schedulers via callable type dependency injection (#15869)
  • Added support for activation checkpointing for the DDPFullyShardedNativeStrategy strategy (#15826)
  • Added the option to set DDPFullyShardedNativeStrategy(cpu_offload=True|False) via bool instead of needing to pass a configuration object (#15832)
  • Added info message for Ampere CUDA GPU users to enable tf32 matmul precision (#16037)
  • Added support for returning optimizer-like classes in LightningModule.configure_optimizers (#16189)
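
A hedged sketch of the new ax argument to .lr_find().plot(); TinyModel and the data are placeholders, not part of the release:

```python
import matplotlib.pyplot as plt
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import LightningModule, Trainer

class TinyModel(LightningModule):
    def __init__(self, lr: float = 1e-3):
        super().__init__()
        self.lr = lr  # lr_find tunes this attribute
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.lr)

loader = DataLoader(TensorDataset(torch.randn(64, 4), torch.randn(64, 1)), batch_size=8)
trainer = Trainer(max_epochs=1)
lr_finder = trainer.tuner.lr_find(TinyModel(), train_dataloaders=loader)

fig, ax = plt.subplots()
lr_finder.plot(ax=ax, suggest=True)  # draws into the caller's axes (#15652)
fig.savefig("lr_find.png")
```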

Changed

  • Switched from tensorboard to tensorboardx in TensorBoardLogger (#15728)
  • From now on, Lightning Trainer and LightningModule.load_from_checkpoint automatically upgrade the loaded checkpoint if it was produced in an old version of Lightning (#15237)
  • Trainer.{validate,test,predict}(ckpt_path=...) no longer restores the Trainer.global_step and trainer.current_epoch values from the checkpoint; from now on, only Trainer.fit restores these values (#15532)
  • The ModelCheckpoint.save_on_train_epoch_end attribute is now computed dynamically every epoch, accounting for changes to the validation dataloaders (#15300)
  • The Trainer now raises an error if it is given multiple stateful callbacks of the same type with colliding state keys (#15634)
  • MLFlowLogger now logs hyperparameters and metrics in batched API calls (#15915)
  • Overriding the on_train_batch_{start,end} hooks in conjunction with taking a dataloader_iter in the training_step no longer errors out and instead shows a warning (#16062)
  • Moved tensorboardX to extra dependencies and made CSVLogger the default (#16349)
  • Dropped PyTorch 1.9 support (#15347)

Deprecated

  • Deprecated description, env_prefix and env_parse parameters in LightningCLI.__init__ in favour of giving them through parser_kwargs (#15651)
  • Deprecated pytorch_lightning.profiler in favor of pytorch_lightning.profilers (#16059)
  • Deprecated Trainer(auto_select_gpus=...) in favor of pytorch_lightning.accelerators.find_usable_cuda_devices (#16147)
  • Deprecated pytorch_lightning.tuner.auto_gpu_select.{pick_single_gpu,pick_multiple_gpus} in favor of pytorch_lightning.accelerators.find_usable_cuda_devices (#16147)
  • nvidia/apex deprecation (#16039) (see the migration sketch after this list)
    • Deprecated pytorch_lightning.plugins.NativeMixedPrecisionPlugin in favor of pytorch_lightning.plugins.MixedPrecisionPlugin
    • Deprecated the LightningModule.optimizer_step(using_native_amp=...) argument
    • Deprecated the Trainer(amp_backend=...) argument
    • Deprecated the Trainer.amp_backend property
    • Deprecated the Trainer(amp_level=...) argument
    • Deprecated the pytorch_lightning.plugins.ApexMixedPrecisionPlugin class
    • Deprecated the pytorch_lightning.utilities.enums.AMPType enum
    • Deprecated the DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...) arguments
  • horovod deprecation (#16141)
    • Deprecated Trainer(strategy="horovod")
    • Deprecated the HorovodStrategy class
  • Deprecated pytorch_lightning.lite.LightningLite in favor of lightning.fabric.Fabric (#16314)
  • FairScale deprecation (in favor of PyTorch's FSDP implementation) (#16353)
    • Deprecated the pytorch_lightning.overrides.fairscale.LightningShardedDataParallel class
    • Deprecated the pytorch_lightning.plugins.precision.fully_sharded_native_amp.FullyShardedNativeMixedPrecisionPlugin class
    • Deprecated the pytorch_lightning.plugins.precision.sharded_native_amp.ShardedNativeMixedPrecisionPlugin class
    • Deprecated the pytorch_lightning.strategies.fully_sharded.DDPFullyShardedStrategy class
    • Deprecated the pytorch_lightning.strategies.sharded.DDPShardedStrategy class
    • Deprecated the pytorch_lightning.strategies.sharded_spawn.DDPSpawnShardedStrategy class
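
A hedged migration sketch for the amp_backend deprecations above:

```python
from pytorch_lightning import Trainer

# Before (deprecated): Trainer(amp_backend="apex", amp_level="O2")
# After: use native automatic mixed precision via the precision flag.
trainer = Trainer(accelerator="gpu", devices=1, precision=16)
```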

Removed

  • Removed deprecated pytorch_lightning.utilities.memory.get_gpu_memory_map in favor of pytorch_lightning.accelerators.cuda.get_nvidia_gpu_stats (#15617)
  • Temporarily removed support for Hydra multi-run (#15737)
  • Removed deprecated pytorch_lightning.profiler.base.AbstractProfiler in favor of pytorch_lightning.profilers.profiler.Profiler (#15637)
  • Removed deprecated pytorch_lightning.profiler.base.BaseProfiler in favor of pytorch_lightning.profilers.profiler.Profiler (#15637)
  • Removed deprecated code in pytorch_lightning.utilities.meta (#16038)
  • Removed the deprecated LightningDeepSpeedModule (#16041)
  • Removed the deprecated pytorch_lightning.accelerators.GPUAccelerator in favor of pytorch_lightning.accelerators.CUDAAccelerator (#16050)
  • Removed the deprecated pytorch_lightning.profiler.* classes in favor of pytorch_lightning.profilers (#16059)
  • Removed the deprecated pytorch_lightning.utilities.cli module in favor of pytorch_lightning.cli (#16116)
  • Removed the deprecated pytorch_lightning.loggers.base module in favor of pytorch_lightning.loggers.logger (#16120)
  • Removed the deprecated pytorch_lightning.loops.base module in favor of pytorch_lightning.loops.loop (#16142)
  • Removed the deprecated pytorch_lightning.core.lightning module in favor of pytorch_lightning.core.module (#16318)
  • Removed the deprecated pytorch_lightning.callbacks.base module in favor of pytorch_lightning.callbacks.callback (#16319)
  • Removed the deprecated Trainer.reset_train_val_dataloaders() in favor of Trainer.reset_{train,val}_dataloader (#16131)
  • Removed support for `LightningCLI(seed_ever...

Weekly patch release

21 Dec 18:35
caa3329

App

Added

  • Added partial support for fastapi Request annotation in configure_api handlers (#16047) (see the sketch after this list)
  • Added a nicer UI with URL and examples for the autoscaler component (#16063)
  • Enabled users to have more control over scaling out/in intervals (#16093)
  • Added more datatypes to the serving component (#16018)
  • Added work.delete method to delete the work (#16103)
  • Added display_name property to LightningWork for the cloud (#16095)
  • Added ColdStartProxy to the AutoScaler (#16094)
  • Added a status endpoint and enabled ready (#16075)
  • Implemented ready for components (#16129)
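
A hedged sketch of the fastapi Request support above; the flow, route, and handler names are illustrative:

```python
from fastapi import Request
from lightning_app import LightningApp, LightningFlow
from lightning_app.api import Post

class Flow(LightningFlow):
    def run(self):
        pass

    async def predict(self, request: Request):  # Request annotation per #16047
        payload = await request.json()
        return {"received": payload}

    def configure_api(self):
        return [Post(route="/predict", method=self.predict)]

app = LightningApp(Flow())
```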

Changed

  • The default start_method for creating Work processes locally on macOS is now 'spawn' (previously 'fork') (#16089)
  • The utility lightning.app.utilities.cloud.is_running_in_cloud now returns True during the loading of the app locally when running with --cloud (#16045)
  • Updated the multi-node warning (#16091)
  • Updated app testing (#16000)
  • Changed overwrite to True (#16009)
  • Simplified messaging in cloud dispatch (#16160)
  • Added annotations endpoint (#16159)

Fixed

  • Fixed PythonServer messaging "Your app has started" (#15989)
  • Fixed auto-batching to batch requests that arrive after the batch interval but are still in the queue (#16110)
  • Fixed a bug where AutoScaler would fail with min_replica=0 (#16092)
  • Fixed a non-thread safe deepcopy in the scheduler (#16114)
  • Fixed HTTP Queue sleeping for 1 sec by default if no delta was found (#16114)
  • Fixed the endpoint info tab not showing up in the AutoScaler UI (#16128)
  • Fixed an issue where an exception would be raised in the logs when using a recent version of streamlit (#16139)
  • Fixed e2e tests (#16146)

Full Changelog: 1.8.5.post0...1.8.6

Minor patch release

16 Dec 14:12
a8a3519

App

  • Fixed install/upgrade by removing a stray single quote (#16079)
  • Fixed bug where components that are re-instantiated several times failed to initialize if they were modifying self.lightningignore (#16080)
  • Fixed a bug where apps that had previously been deleted could not be run again from the CLI (#16082)

PyTorch

  • Added a function to remove checkpoints, allowing overrides in extended classes (#16067)

Full Changelog: 1.8.5...1.8.5.post0

Weekly patch release

15 Dec 17:19
e5d5901

App

Added

  • Added Lightning{Flow,Work}.lightningignores attributes to programmatically ignore files before uploading to the cloud (#15818)
  • Added a progress bar while connecting to an app through the CLI (#16035)
  • Added support for running on multiple clusters (#16016)
  • Added guards to cluster deletion from the CLI (#16053)
  • Added creation of the default .lightningignore that ignores venv (#16056)

Changed

  • Cleaned up cluster waiting (#16054)

Fixed

  • Fixed DDPStrategy import in app framework (#16029)
  • Fixed AutoScaler raising an exception when non-default cloud compute is specified (#15991)
  • Fixed and improved the login flow (#16052)
  • Fixed the debugger detection mechanism for a Lightning App in VSCode (#16068)

PyTorch

  • Minor internal cleanups

Full Changelog: 1.8.4.post0...1.8.5

Minor patch release

09 Dec 23:43
60b3cc9

App

  • Fixed MultiNode Component to use separate cloud computes (#15965)
  • Fixed Registration for CloudComputes of Works in L.app.structures (#15964)
  • Fixed a bug where auto-upgrading to the latest lightning via the CLI could get stuck in a loop (#15984)

PyTorch

  • Fixed the XLAProfiler not recording anything due to mismatching of action names (#15885)

Full Changelog: 1.8.4...1.8.4.post0

Dependency hotfix

09 Dec 05:02

Weekly patch release

08 Dec 18:52
7eb5ff5
Compare
Choose a tag to compare

App

Added

  • Added a code_dir argument to tracer run (#15771)
  • Added the CLI command lightning run model to launch a LightningLite accelerated script (#15506)
  • Added the CLI command lightning delete app to delete a lightning app on the cloud (#15783)
  • Added a CloudMultiProcessBackend which enables running a child App from within the Flow in the cloud (#15800)
  • Added a utility for safely pickling work objects, even from a child process (#15836)
  • Added AutoScaler component (#15769)
  • Added the ready property to LightningFlow to signal when the Open App button should be visible (#15921)
  • Added a private work attribute _start_method to customize how works are started (#15923)
  • Added a configure_layout method to the LightningWork which can be used to control how the work is handled in the layout of a parent flow (#15926) (see the sketch after this list)
  • Added the ability to run a Lightning App or Component directly from the Gallery using lightning run app organization/name (#15941)
  • Added automatic conversion of list and dict of works and flows to structures (#15961)
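
For context on the work-level configure_layout hook above, a hedged sketch of the flow-level layout API it complements; class names are illustrative:

```python
from lightning_app import LightningApp, LightningFlow, LightningWork

class Server(LightningWork):
    def run(self):
        pass  # e.g. start a UI or an HTTP server here

class Root(LightningFlow):
    def __init__(self):
        super().__init__()
        self.server = Server()

    def run(self):
        self.server.run()

    def configure_layout(self):
        # With #15926, self.server can also define configure_layout itself
        # to control how it is rendered inside this tab.
        return [{"name": "Server", "content": self.server}]

app = LightningApp(Root())
```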

Changed

  • The MultiNode components now warn the user when running with num_nodes > 1 locally (#15806)
  • Cluster creation and deletion now wait by default (#15458)
  • Running an app without a UI locally no longer opens the browser (#15875)
  • Show a message when BuildConfig(requirements=[...]) is passed but a requirements.txt file is already present in the Work (#15799)
  • Show a message when BuildConfig(dockerfile="...") is passed but a Dockerfile file is already present in the Work (#15799)
  • Dropped the name column from the cluster list (#15721)
  • Apps without UIs no longer activate the "Open App" button when running in the cloud (#15875)
  • Path / Payload now wait for the full file to be transferred (#15934)

Removed

  • Removed the SingleProcessRuntime (#15933)

Fixed

  • Fixed SSH CLI command listing stopped components (#15810)
  • Fixed bug when launching apps on multiple clusters (#15484)
  • Fixed the SIGTERM handler causing a thread lock that made KeyboardInterrupt hang (#15881)
  • Fixed an MPS error in the multi-node component (now defaults to CPU on MPS devices, as distributed operations are not supported by PyTorch on MPS) (#15748)
  • Fixed works not being stopped after succeeding when passed directly to the LightningApp (#15801)
  • Fixed PyTorch inference running locally on GPU (#15813)
  • Fixed the enable_spawn method of the WorkRunExecutor (#15812)
  • Fixed require/import decorator (#15849)
  • Fixed a bug where using L.app.structures would cause multiple apps to be opened and fail with an error in the cloud (#15911)
  • Fixed PythonServer generating noise on M1 (#15949)
  • Fixed multiprocessing breakpoint (#15950)
  • Fixed detection of a Lightning App running in debug mode (#15951)
  • Fixed an ImportError in the multi-node component when a required package is not present (#15963)

Lite

  • Fixed shuffle=False having no effect when using DDP/DistributedSampler (#15931)

PyTorch

Changed

  • Direct support for compiled models (#15922)
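
A hedged sketch of what direct compiled-model support enables, assuming PyTorch 2.0 is available; the model and data are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import LightningModule, Trainer

class Model(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

loader = DataLoader(TensorDataset(torch.randn(64, 4), torch.randn(64, 1)), batch_size=8)
compiled = torch.compile(Model())  # requires PyTorch 2.0
Trainer(max_epochs=1).fit(compiled, train_dataloaders=loader)
```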

Fixed

  • Fixed an issue with unsupported torch.inference_mode() on HPU backends (#15918)
  • Fixed LRScheduler import for PyTorch 2.0 (#15940)
  • Fixed fit_loop.restarting to be False for lr finder (#15620)
  • Fixed torch.jit.script-ing a LightningModule causing an unintended error message about deprecated use_amp property (#15947)

Full Changelog: 1.8.3...1.8.4

Hotfix for Python Server

25 Nov 19:20
92fe188

App

Changed

  • Fixed PyTorch inference running locally on GPU (#15813)

Full Changelog: 1.8.3...1.8.3

Hotfix for requirements

23 Nov 15:03
655ade6

  • Reverted the s3fs requirement (#15792)