Releases: pytorch/pytorch
PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips are now available
PyTorch 1.13 Release Notes
- Highlights
- Backwards Incompatible Changes
- New Features
- Improvements
- Performance
- Documentation
- Developers
Highlights
We are excited to announce the release of PyTorch 1.13! This release includes a stable version of BetterTransformer. We deprecated CUDA 10.2 and 11.3 and completed the migration to CUDA 11.6 and 11.7. Beta features include improved support for Apple M1 chips and functorch, a library that offers composable vmap (vectorization) and autodiff transforms, which is now included in-tree with the PyTorch release. This release is composed of over 3,749 commits from 467 contributors since 1.12.1. We want to sincerely thank our dedicated community for your contributions.
Summary:
- The BetterTransformer feature set supports fastpath execution for common Transformer models during inference out of the box, without the need to modify the model. Additional improvements include accelerated add+matmul linear algebra kernels for sizes commonly used in Transformer models, and Nested Tensors are now enabled by default.
- Deprecating older CUDA versions in a timely manner allows us to adopt the latest CUDA versions as NVIDIA® introduces them, and hence allows support for C++17 in PyTorch and for the new NVIDIA Open GPU Kernel Modules.
- Previously, functorch was released out-of-tree in a separate package. After installing PyTorch, a user will be able to `import functorch` and use functorch without needing to install another package.
- PyTorch is offering native builds for Apple® silicon machines that use Apple's new M1 chip as a beta feature, providing improved support across PyTorch's APIs.
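As a rough illustration of the composable transforms mentioned above, the sketch below computes per-sample gradients by composing `vmap` with `grad`. The function `squared_norm` and the input shapes are made up for illustration, and the fallback import from `torch.func` is an assumption for newer PyTorch releases that expose the same transforms under that namespace.

```python
import torch

try:
    # In-tree functorch, as shipped with PyTorch 1.13.
    from functorch import grad, vmap
except ImportError:
    # Assumption: newer releases expose the same transforms under torch.func.
    from torch.func import grad, vmap


def squared_norm(x):
    # Scalar-valued function of a single sample.
    return (x ** 2).sum()


xs = torch.arange(12, dtype=torch.float32).reshape(4, 3)  # batch of 4 samples

# grad(squared_norm) differentiates one sample; vmap maps it over the batch dim.
per_sample_grads = vmap(grad(squared_norm))(xs)
print(per_sample_grads.shape)  # torch.Size([4, 3])
```

Since the gradient of the squared norm is `2 * x`, each row of `per_sample_grads` should equal twice the corresponding row of `xs`.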
You can check the blogpost that shows the new features here.
Backwards Incompatible changes
Python API
uint8 and all integer dtype masks are no longer allowed in Transformer (#87106)
Prior to 1.13, `key_padding_mask` could be set to uint8 or other integer dtypes in `TransformerEncoder` and `MultiheadAttention`, which might generate unexpected results. In this release, these dtypes are no longer allowed for the mask. Please convert them to `torch.bool` before using.
1.12.1
>>> layer = nn.TransformerEncoderLayer(2, 4, 2)
>>> encoder = nn.TransformerEncoder(layer, 2)
>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.uint8)
>>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
# works before 1.13
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)
1.13
>>> layer = nn.TransformerEncoderLayer(2, 4, 2)
>>> encoder = nn.TransformerEncoder(layer, 2)
>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.bool)
>>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)
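For code that still produces integer masks, a minimal migration sketch is to convert the mask explicitly before the call. The layer sizes and `batch_first=True` below are assumptions chosen so that `d_model` is divisible by `nhead` and the mask shape `(batch, sequence)` lines up with the input.

```python
import torch
from torch import nn

# Hypothetical sizes: d_model=4 is divisible by nhead=2.
layer = nn.TransformerEncoderLayer(d_model=4, nhead=2, dim_feedforward=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# Legacy integer mask: 1 marks a padded position that attention should ignore.
legacy_mask = torch.tensor([[0, 0, 1, 1]], dtype=torch.uint8)
pad_mask = legacy_mask.to(torch.bool)  # explicit conversion required from 1.13 on

inputs = torch.randn(1, 4, 4)  # (batch, sequence, d_model)
outputs = encoder(inputs, src_key_padding_mask=pad_mask)
print(pad_mask.dtype, outputs.shape)
```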
Updated `torch.floor_divide` to perform floor division (#78411)
Prior to 1.13, `torch.floor_divide` erroneously performed truncation division (i.e. truncated the quotients). In this release, it has been fixed to perform floor division. To replicate the old behavior, use `torch.div` with `rounding_mode='trunc'`.
1.12.1
>>> a = torch.tensor([4.0, -3.0])
>>> b = torch.tensor([2.0, 2.0])
>>> torch.floor_divide(a, b)
tensor([ 2., -1.])
1.13
>>> a = torch.tensor([4.0, -3.0])
>>> b = torch.tensor([2.0, 2.0])
>>> torch.floor_divide(a, b)
tensor([ 2., -2.])
# Old behavior can be replicated using torch.div with rounding_mode='trunc'
>>> torch.div(a, b, rounding_mode='trunc')
tensor([ 2., -1.])
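If code has to behave identically before and after the upgrade, one option is to state the rounding mode explicitly instead of relying on `torch.floor_divide`. A small sketch using the same inputs as above:

```python
import torch

a = torch.tensor([4.0, -3.0])
b = torch.tensor([2.0, 2.0])

# Floor division rounds toward negative infinity (what floor_divide does from 1.13 on).
floored = torch.div(a, b, rounding_mode="floor")
# Truncation rounds toward zero (what floor_divide did before 1.13).
truncated = torch.div(a, b, rounding_mode="trunc")

print(floored)    # tensor([ 2., -2.])
print(truncated)  # tensor([ 2., -1.])
```

The two modes only differ for negative, non-integer quotients such as `-3.0 / 2.0 = -1.5`.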
Fixed `torch.index_select` on CPU to error that index is out of bounds when the `source` tensor is empty (#77881)
Prior to 1.13, `torch.index_select` would return an appropriately sized tensor filled with random values on CPU if the source tensor was empty. In this release, we have fixed this bug so that it errors out. A consequence of this is that `torch.nn.Embedding`, which utilizes `index_select`, will error out rather than returning an empty tensor when `embedding_dim=0` and `input` contains indices which are out of bounds. The old behavior cannot be reproduced with `torch.nn.Embedding`; however, since an Embedding layer with `embedding_dim=0` is a corner case, this behavior is unlikely to be relied upon.
1.12.1
>>> t = torch.tensor([4], dtype=torch.long)
>>> embedding = torch.nn.Embedding(3, 0)
>>> embedding(t)
tensor([], size=(1, 0), grad_fn=<EmbeddingBackward0>)
1.13
>>> t = torch.tensor([4], dtype=torch.long)
>>> embedding = torch.nn.Embedding(3, 0)
>>> embedding(t)
RuntimeError: INDICES element is out of DATA bounds, id=4 axis_dim=3
Disallow overflows when tensors are constructed from scalars (#82329)
Prior to this PR, overflows during tensor construction from scalars would not throw an error. In 1.13, such cases will error.
1.12.1
>>> torch.tensor(1000, dtype=torch.int8)
tensor(-24, dtype=torch.int8)
1.13
>>> torch.tensor(1000, dtype=torch.int8)
RuntimeError: value cannot be converted to type int8 without overflow
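Code that previously relied on silent wrap-around can check the range up front. The sketch below uses `torch.iinfo` for that; the helper name `fits_in_int_dtype` is hypothetical and not part of PyTorch.

```python
import torch


def fits_in_int_dtype(value: int, dtype: torch.dtype) -> bool:
    # Hypothetical helper: True if `value` is representable in the integer `dtype`.
    info = torch.iinfo(dtype)
    return info.min <= value <= info.max


print(fits_in_int_dtype(1000, torch.int8))   # False: int8 covers -128..127
print(fits_in_int_dtype(1000, torch.int16))  # True

# Pick a dtype that can hold the value before constructing the tensor.
dtype = torch.int8 if fits_in_int_dtype(1000, torch.int8) else torch.int16
t = torch.tensor(1000, dtype=dtype)
```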
Error on indexing a cpu tensor with non-cpu indices (#69607)
Prior to 1.13, `cpu_tensor[cuda_indices]` was a valid program that would return a CPU tensor. The original use case for mixed device indexing was for `non_cpu_tensor[cpu_indices]`, and allowing the opposite (`cpu_tensor[non_cpu_indices]`) was unintentional. This behavior appears to be rarely used, and a refactor of our indexing kernels made it difficult to represent an op that takes in (cpu_tensor, non_cpu_tensor) and returns another cpu_tensor, so it is now an error.
To replicate the old behavior for `base[indices]`, you can ensure that either `indices` lives on the CPU device, or `base` and `indices` both live on the same device.
1.12.1
>>> a = torch.tensor([1.0, 2.0, 3.0])
>>> b = torch.tensor([0, 2], device='cuda')
>>> a[b]
tensor([1., 3.])
1.13
>>> a = torch.tensor([1.0, 2.0, 3.0])
>>> b = torch.tensor([0, 2], device='cuda')
>>> a[b]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
# Old behavior can be replicated by moving b to CPU, or a to CUDA
>>> a[b.cpu()]
tensor([1., 3.])
>>> a.cuda()[b]
tensor([1., 3.], device='cuda:0')
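One way to make indexing code independent of where the indices were created is to move them onto the device of the indexed tensor first. This is only a sketch; the helper name `index_on_base_device` is hypothetical.

```python
import torch


def index_on_base_device(base: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: move the indices to the device of `base` before indexing.
    return base[indices.to(base.device)]


a = torch.tensor([1.0, 2.0, 3.0])
# Use a CUDA index tensor when a GPU is present; otherwise this runs entirely on CPU.
idx_device = "cuda" if torch.cuda.is_available() else "cpu"
b = torch.tensor([0, 2], device=idx_device)

result = index_on_base_device(a, b)
print(result)  # tensor([1., 3.])
```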
Remove deprecated `torch.eig`, `torch.matrix_rank`, `torch.lstsq` (#70982, #70981, #70980)
The deprecation cycle for the above functions has been completed and they have been removed in the 1.13 release.
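The notes above do not name replacements, so the following is a sketch under the assumption that the corresponding `torch.linalg` functions are the intended migration targets. The matrices are made up for illustration.

```python
import torch

A = torch.tensor([[2.0, 0.0], [0.0, 3.0]])
B = torch.tensor([[2.0], [3.0]])

# torch.eig(A) -> torch.linalg.eig(A); eigenvalues are returned as a complex tensor.
eigenvalues, eigenvectors = torch.linalg.eig(A)

# torch.matrix_rank(A) -> torch.linalg.matrix_rank(A)
rank = torch.linalg.matrix_rank(A)

# torch.lstsq(B, A) -> torch.linalg.lstsq(A, B); note the swapped argument order.
solution = torch.linalg.lstsq(A, B).solution

print(sorted(eigenvalues.real.tolist()), int(rank), solution.flatten().tolist())
```

The return types differ from the removed functions (for example, `torch.linalg.eig` always returns complex tensors), so call sites may need more than a rename.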
torch.nn
Enforce that the `bias` has the same dtype as `input` and `weight` for convolutions on CPU (#83686)
To align with the implementation on other devices, the CPU implementation for convolutions was updated to enforce that the `dtype` of the `bias` matches the `dtype` of the `input` and `weight`.
1.12.1
# input and weight are dtype torch.int64
# bias is torch.float32
>>> out = torch.nn.functional.conv2d(input, weight, bias, ...)
1.13
# input and weight are dtype torch.int64
# bias is torch.float32
>>> with assertRaisesError():
...     out = torch.nn.functional.conv2d(input, weight, bias, ...)
# Updated code to avoid the error
>>> out = torch.nn.functional.conv2d(input, weight, bias.to(input.dtype), ...)
Autograd
Disallow setting the `.data` of a tensor that `requires_grad=True` with an integer tensor (#78436)
Setting the `.data` of a tensor that `requires_grad` with an integer tensor now raises an error.
1.12.1
>>> x = torch.randn(2, requires_grad=True)
>>> x.data = torch.randint(1, (2,))
>>> x
tensor([0, 0], requires_grad=True)
1.13
>>> x = torch.randn(2, requires_grad=True)
>>> x.data = torch.randint(1, (2,))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: data set to a tensor that requires gradients must be floating point or complex dtype
Added variable_list support to ExtractVariables struct (#84583)
Prior to this change, C++ custom autograd Function considers tensors passed in TensorList to not be tensors for the purposes of recording the backward graph. After this change, custom Functions that receive TensorList must modify their backward functions to also compute gradients for these additional tensor inputs. Note that this behavior now differs from that of custom autograd Functions in Python.
1.12.1
struct MyFunction : public Function<MyFunction> {
  static Variable forward(AutogradContext* ctx, at::Tensor t, at::TensorList tensors) {
    return 2 * tensors[0] + 3 * t;
  }
  static variable_list backward(
      AutogradContext* ctx,
      variable_list grad_output) {
    return {3 * grad_output[0]};
  }
};
1.13
struct MyFunction : public Function<MyFunction> {
  static Variable forward(AutogradContext* ctx, at::Tensor t, at::TensorList tensors) {
    return 2 * tensors[0] + 3 * t;
  }
  static variable_list backward(
      AutogradContext* ctx,
      variable_list grad_output) {
    return {3 * grad_output[0], 2 * grad_output[0]};
  }
};
Don't detach when making views; force kernel to detach (#84893)
View operations registered as CompositeExplicitAutograd kernels are no longer allowed to return input tensors as-is. You must explic...
PyTorch 1.12.1 Release, small bug fix release
This release is meant to fix the following issues (regressions / silent correctness):
Optim
- Remove overly restrictive assert in adam #80222
Autograd
- Convolution forward over reverse internal asserts in specific case #81111
- 25% Performance regression from v0.1.1 to 0.2.0 when calculating hessian #82504
Distributed
- Fix distributed store to use add for the counter of DL shared seed #80348
- Raise proper timeout when sharing the distributed shared seed #81666
NN
- Allow register float16 weight_norm on cpu and speed up test #80600
- Fix weight norm backward bug on CPU when OMP_NUM_THREADS <= 2 #80930
- Weight_norm is not working with float16 #80599
- New release breaks torch.nn.weight_norm backwards pass and breaks all Wav2Vec2 implementations #80569
- Disable src mask for transformer and multiheadattention fastpath #81277
- Make nn.stateless correctly reset parameters if the forward pass fails #81262
- torchvision.transforms.functional.rgb_to_grayscale() + torch.nn.Conv2d() don`t work on 1080 GPU #81106
- Transformer and CPU path with src_mask raises error with torch 1.12 #81129
Data Loader
- Locking lower ranks seed recipients #81071
CUDA
- os.environ["CUDA_VISIBLE_DEVICES"] has no effect #80876
- share_memory() on CUDA tensors no longer no-ops and instead crashes #80733
- [Prims] Unbreak CUDA lazy init #80899
- PyTorch 1.12 cu113 wheels cudnn discoverability issue #80637
- Remove overly restrictive checks for cudagraph #80881
ONNX
- ONNX cherry picks #82435
MPS
- MPS cherry picks #80898
Other
- Don't error if _warned_capturable_if_run_uncaptured not set #80345
- Initializing libiomp5.dylib, but found libomp.dylib already initialized. #78490
- Assertion error - _dl_shared_seed_recv_cnt - pt 1.12 - multi node #80845
- Add 3.10 stdlib to torch.package #81261
- CPU-only c++ extension libraries (functorch, torchtext) built against PyTorch wheels are not fully compatible with PyTorch wheels #80489
PyTorch 1.12: TorchArrow, Functional API for Modules and nvFuser, are now available
PyTorch 1.12 Release Notes
- Highlights
- Backwards Incompatible Change
- New Features
- Improvements
- Performance
- Documentation
Highlights
We are excited to announce the release of PyTorch 1.12! This release is composed of over 3124 commits, 433 contributors. Along with 1.12, we are releasing beta versions of AWS S3 Integration, PyTorch Vision Models on Channels Last on CPU, Empowering PyTorch on Intel® Xeon® Scalable processors with Bfloat16 and FSDP API. We want to sincerely thank our dedicated community for your contributions.
Summary:
- Functional Module API to functionally apply module computation with a given set of parameters
- Complex32 and Complex Convolutions in PyTorch
- DataPipes from TorchData fully backward compatible with DataLoader
- Functorch with improved coverage for APIs
- nvFuser, a deep learning compiler for PyTorch
- Changes to float32 matrix multiplication precision on Ampere and later CUDA hardware
- TorchArrow, a new beta library for machine learning preprocessing over batch data
Backwards Incompatible changes
Python API
Updated type promotion for `torch.clamp` (#77035)
In 1.11, the `min` and `max` arguments in `torch.clamp` did not participate in type promotion, which made it inconsistent with the `minimum` and `maximum` operations. In 1.12, the `min` and `max` arguments participate in type promotion.
1.11
>>> import torch
>>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
>>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
>>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)
>>> torch.clamp(a, b, c).dtype
torch.float32
1.12
>>> import torch
>>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
>>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
>>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)
>>> torch.clamp(a, b, c).dtype
torch.float64
Complex Numbers
Fix complex type promotion (#77524)
Updates the type promotion rule such that given a complex scalar and real tensor, the value type of real tensor is preserved
1.11
>>> a = torch.randn((2, 2), dtype=torch.float)
>>> b = torch.tensor(1, dtype=torch.cdouble)
>>> (a + b).dtype
torch.complex128
1.12
>>> a = torch.randn((2, 2), dtype=torch.float)
>>> b = torch.tensor(1, dtype=torch.cdouble)
>>> (a + b).dtype
torch.complex64
LinAlg
Disable TF32 for matmul by default and add high-level control of fp32 matmul precision (#76509)
PyTorch 1.12 makes the default math mode for fp32 matrix multiplications more precise and consistent across hardware. This may affect users on Ampere or later CUDA devices and TPUs. See the PyTorch blog for more details.
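The high-level control mentioned in the title can be exercised as in the sketch below. It only toggles the setting and does not measure any speed or accuracy difference; whether a lower-precision setting is acceptable depends on the workload.

```python
import torch

# "highest" keeps full float32 precision for matmuls (the default from 1.12 on).
print(torch.get_float32_matmul_precision())

# Opt in to faster, lower-precision internals (e.g. TF32 on Ampere GPUs).
torch.set_float32_matmul_precision("high")
precision_after_opt_in = torch.get_float32_matmul_precision()

# Restore the precise default.
torch.set_float32_matmul_precision("highest")
```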
Sparse
Use ScatterGatherKernel for scatter_reduce (CPU-only) (#74226, #74608)
In 1.11.0, unlike `scatter` (which takes a `reduce` kwarg) or `scatter_add`, `scatter_reduce` was not an in-place function. That is, it did not allow the user to pass an output tensor which contains data that is reduced together with the scattered data. Instead, the scatter reduction took place on an output tensor initialized under the hood. Indices of the output that were not scattered to were filled with reduction inits (or 0 for options ‘amin’ and ‘amax’).
In 1.12.0, `scatter_reduce` (which is in beta) is in-place to align with the API of the related existing functions `scatter`/`scatter_add`. For this reason, the argument `input` in 1.11.0 has been renamed `src` in 1.12.0 and the new `self` argument now takes a destination tensor to be scattered onto. Since the destination tensor is no longer initialized under the hood, the `output_size` kwarg in 1.11.0 that allowed users to specify the size of the output at dimension `dim` has been removed. Further, in 1.12.0 we introduce an `include_self` kwarg which determines whether values in the `self` (destination) tensor are included in the reduction. Setting `include_self=True` could, for example, allow users to provide special reduction inits for the scatter reduction operation. Otherwise, if `include_self=False`, indices scattered to are treated as if they were filled with reduction inits.
In the snippet below, we illustrate how the behavior of `scatter_reduce` in 1.11.0 can be achieved with the function released in 1.12.0.
Example:
>>> src = torch.arange(6, dtype=torch.float).reshape(3, 2)
>>> index = torch.tensor([[0, 2], [1, 1], [0, 0]])
>>> dim = 1
>>> output_size = 4
>>> reduce = "prod"
1.11
>>> torch.scatter_reduce(src, dim, index, reduce, output_size=output_size)
tensor([[ 0.,  1.,  1.,  1.],
        [ 1.,  6.,  1.,  1.],
        [20.,  1.,  1.,  1.]])
1.12
>>> output_shape = list(src.shape)
>>> output_shape[dim] = output_size
# reduction init for prod is 1
# filling the output with 1 is only necessary if the user wants to preserve the behavior in 1.11
# where indices not scattered to are filled with reduction inits
>>> output = src.new_empty(output_shape).fill_(1)
>>> output.scatter_reduce_(dim, index, src, reduce)
tensor([[ 0.,  1.,  1.,  1.],
        [ 1.,  6.,  1.,  1.],
        [20.,  1.,  1.,  1.]])
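To see what `include_self=False` does, the sketch below reuses the same `src` and `index` but fills the destination with an arbitrary sentinel value (7, chosen only for illustration) instead of the reduction init.

```python
import torch

src = torch.arange(6, dtype=torch.float).reshape(3, 2)
index = torch.tensor([[0, 2], [1, 1], [0, 0]])

# Destination pre-filled with a sentinel (7) so untouched slots are easy to spot.
output = torch.full((3, 4), 7.0)

# include_self=False: existing values at scattered-to positions are ignored,
# so each of those slots holds only the product of the src values sent to it.
output.scatter_reduce_(1, index, src, "prod", include_self=False)
print(output)
```

Slots that receive no source values keep the sentinel, while slots that are scattered to hold the same products as in the 1.11 output above.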
torch.nn
`nn.GroupNorm`: Report an error if `num_channels` is not divisible by `num_groups` (#74293)
Previously, `nn.GroupNorm` would error out during the forward pass if `num_channels` was not divisible by `num_groups`. Now, the error is thrown for this case during module construction instead.
1.11
m = torch.nn.GroupNorm(3, 7)
m(...) # errors during forward pass
1.12
m = torch.nn.GroupNorm(3, 7) # errors during construction
`nn.Dropout2d`: Return to 1.10 behavior: perform 1D channel-wise dropout for 3D inputs
In PyTorch 1.10 and older, passing a 3D input to `nn.Dropout2d` resulted in 1D channel-wise dropout behavior; i.e. such inputs were interpreted as having shape `(N, C, L)` with N = batch size and C = # channels, and channel-wise dropout was performed along the second dimension.
1.10
x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x) # input is assumed to be shape (N, C, L); dropout along the second dim.
With the introduction of no-batch-dim input support in 1.11, 3D inputs were reinterpreted as having shape `(C, H, W)`, i.e. an input without a batch dimension, and dropout behavior was changed to drop along the first dimension. This was a silent breaking change.
1.11
x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x) # input is assumed to be shape (C, H, W); dropout along the first dim.
The breaking change in 1.11 resulted in a lack of support for 1D channel-wise dropout behavior, so `Dropout2d` in PyTorch 1.12 returns to 1.10 behavior with a warning to give some time to adapt before the no-batch-dim interpretation goes back into effect.
1.12
x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x) # input is assumed to be shape (N, C, L); dropout along the second dim.
# throws a warning suggesting nn.Dropout1d for 1D channel-wise dropout.
If you want 1D channel-wise dropout behavior, please switch to the newly-added `nn.Dropout1d` module instead of `nn.Dropout2d`. If you want no-batch-dim input behavior, please note that while this is not supported in 1.12, a future release will reinstate the interpretation of 3D inputs to `nn.Dropout2d` as those without a batch dimension.
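A short sketch of the suggested `nn.Dropout1d` usage follows. The input of ones and the fixed seed are only there to make the channel-wise pattern visible.

```python
import torch
from torch import nn

torch.manual_seed(0)

x = torch.ones(2, 3, 4)    # (N, C, L): batch of 2, 3 channels, length 4
m = nn.Dropout1d(p=0.5)    # module is in training mode by default
out = m(x)

# Each channel (a row of length L) is either zeroed entirely
# or kept and rescaled by 1 / (1 - p) = 2.
channel_is_uniform = (out == out[..., :1]).all(dim=-1)
print(channel_is_uniform.all().item())
```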
`F.cosine_similarity`: Improve numerical stability (#31378)
Previously, we first computed the inner product, then normalized. After this change, we first normalize, then compute the inner product. This should be more numerically stable because it avoids losing precision in the inner product for inputs with large norms. Because of this change, outputs may be different in some cases.
Composability
Functions in `torch.ops.aten.{foo}` no longer accept `self` as a kwarg
`torch.ops.aten.{foo}` objects are now instances of `OpOverloadPacket` (instead of a function) that have their `__call__` method in Python, which means that you cannot pass `self` as a kwarg. You can pass it normally as a positional argument instead.
1.11
>>> torch.ops.aten.sin(self=torch.ones(2))
tensor([0.8415, 0.8415])
1.12
# this now fails
>>> torch.ops.aten.sin(self=torch.ones(2))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __call__() got multiple values for argument 'self'
# this works
>>> torch.ops.aten.sin(torch.ones(2))
tensor([0.8415, 0.8415])
`__torch_dispatch__` now traces individual op overloads instead of op overload packets (#72673)
`torch.ops.aten.add` actually corresponds to a bundle of functions from C++, corresponding to all of the overloads of the add operator (specifically, `add.Tensor`, `add.Scalar` and `add.out`). Now, `__torch_dispatch__` will directly take in an overload corresponding to a single aten function.
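The difference between a packet and a single overload can be inspected directly from Python. The sketch below is only an illustration of the two kinds of objects, not of `__torch_dispatch__` itself.

```python
import torch

x = torch.ones(2)

packet = torch.ops.aten.add            # bundle of all `add` overloads
overload = torch.ops.aten.add.Tensor   # one specific overload

# Both are callable; the packet resolves an overload from the arguments.
print(packet(x, x))    # tensor([2., 2.])
print(overload(x, x))  # tensor([2., 2.])

# The overload remembers which packet it belongs to.
print(overload.overloadpacket == packet)
```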
1.11
class MyTensor(torch.Tensor):
    ....
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Before, func refers to a "packet" of all overloads
        # for a given operator, e.g. "add"
        assert func == torch.ops.aten.add
1.12
class MyTensor(torch.Tensor):
....
def __torch_dispatch__(cls, func, types, args=(), kwargs=No...
PyTorch 1.11, TorchData, and functorch are now available
PyTorch 1.11 Release Notes
- Highlights
- Backwards Incompatible Change
- New Features
- Improvements
- Performance
- Documentation
Highlights
We are excited to announce the release of PyTorch 1.11. This release is composed of over 3,300 commits since 1.10, made by 434 contributors. Along with 1.11, we are releasing beta versions of TorchData and functorch. We want to sincerely thank our community for continuously improving PyTorch.
- TorchData is a new library for common modular data loading primitives for easily constructing flexible and performant data pipelines. View it on GitHub.
- functorch, a library that adds composable function transforms to PyTorch, is now available in beta. View it on GitHub.
- Distributed Data Parallel (DDP) static graph optimizations available in stable.
You can check the blogpost that shows the new features here.
Backwards Incompatible changes
Python API
Fixed python `deepcopy` to correctly copy all attributes on `Tensor` objects (#65584)
This change ensures that the `deepcopy` operation on Tensor properly copies all the attributes (and not just the plain Tensor properties).
1.10.2
a = torch.rand(2)
a.foo = 3
torch.save(a, "bar")
b = torch.load("bar")
print(b.foo)
# Raises AttributeError: "Tensor" object has no attribute "foo"
1.11.0
a = torch.rand(2)
a.foo = 3
torch.save(a, "bar")
b = torch.load("bar")
print(b.foo)
# 3
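The same attribute preservation can be checked with `copy.deepcopy` directly, as in this small sketch. The attribute name `foo` follows the example above.

```python
import copy

import torch

a = torch.rand(2)
a.foo = 3  # arbitrary Python attribute attached to the tensor

b = copy.deepcopy(a)
print(b.foo)              # 3 on PyTorch 1.11 and later
print(torch.equal(a, b))  # True: the data is copied as well
```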
`steps` argument is no longer optional in `torch.linspace` and `torch.logspace`
This argument used to default to 100 in PyTorch 1.10.2, but was deprecated (previously you would see a deprecation warning if you didn’t explicitly pass in `steps`). In PyTorch 1.11, it is no longer optional.
1.10.2
# Works, but raises a deprecation warning
# Steps defaults to 100
a = torch.linspace(1, 10)
# UserWarning: Not providing a value for linspace's steps is deprecated
# and will throw a runtime error in a future release.
# This warning will appear only once per process.
# (Triggered internally at ../aten/src/ATen/native/RangeFactories.cpp:19
1.11.0
# In 1.11, you must specify steps
a = torch.linspace(1, 10, steps=100)
Remove `torch.hub.import_module` function that was mistakenly public (#67990)
This function is not intended for public use.
If you have existing code that relies on it, you can find an equivalent function at `torch.hub._import_module`.
C++ API
We’ve cleaned up many of the headers in the C++ frontend to only include the subset of `aten` operators that they actually use (#68247, #68687, #68688, #68714, #68689, #68690, #68697, #68691, #68692, #68693, #69840)
When you `#include` a header from the C++ frontend, you can no longer assume that every `aten` operator is transitively included. You can work around this by directly adding `#include <ATen/ATen.h>` in your file, which will maintain the old behavior of including every `aten` operator.
Custom implementations of the `c10::List` and `c10::Dict` move constructors have been removed (#69370)
The semantics have changed from "make the moved-from List/Dict empty" to "keep the moved-from List/Dict unchanged".
1.10.2
c10::List list1({"3", "4"});
c10::List list2(std::move(list1));
std::cout << list1.size(); // 0
1.11.0
c10::List list1({"3", "4"});
c10::List list2(std::move(list1)); // calls copy ctor
std::cout << list1.size(); // 2
CUDA
Removed `THCeilDiv` function and corresponding `THC/THCDeviceUtils.cuh` header (#65472)
As part of cleaning up `TH` from the codebase, the `THCeilDiv` function has been removed. Instead, please use `at::ceil_div`, and include the corresponding `ATen/ceil_div.h` header.
Removed `THCudaCheck` (#66391)
You can replace it with `C10_CUDA_CHECK`, which has been available since at least PyTorch 1.4, so a direct replacement is enough even if you support older versions.
Removed `THCudaMalloc()`, `THCudaFree()`, `THCThrustAllocator.cuh` (#65492)
If your extension is using `THCThrustAllocator.cuh`, please replace it with `ATen/cuda/ThrustAllocator.h` and corresponding APIs (see examples in this PR).
This PR also removes `THCudaMalloc/THCudaFree` calls. Please use `c10::cuda::CUDACachingAllocator::raw_alloc(size)/raw_delete(ptr)`, or, preferably, switch to `c10::cuda::CUDACachingAllocator::allocate`, which manages deallocation. Caching allocator APIs have been available since PyTorch 1.2, so a direct replacement is enough even if you support older versions of PyTorch.
Build
Stopped building shared library for AOT Compiler, `libaot_compiler.so` (#66227)
Building `aot_compiler.cpp` as a separate library is not necessary, as it’s already included in `libtorch.so`.
You can update your build system to only dynamically link `libtorch.so`.
Mobile
Make `typing.Union` type unsupported for mobile builds (#65556)
`typing.Union` support was added for TorchScript in 1.10. It was removed specifically for mobile due to its lack of use and the increase in binary size of PyTorch for Mobile builds.
Distributed
`torch.distributed.rpc`: Final Removal of ProcessGroup RPC backend (#67363)
The ProcessGroup RPC backend was previously deprecated. In 1.10, it threw an error to help users update their code, and, in 1.11, it is removed completely.
The backend type “PROCESS_GROUP” can no longer be used, e.g.
torch.distributed.rpc.init_rpc("worker0", backend="PROCESS_GROUP", rank=0, world_size=1)
should be replaced with:
torch.distributed.rpc.init_rpc("worker0", backend="TENSORPIPE", rank=0, world_size=1)
Quantization
Disabled the support for `getitem` in FX Graph Mode Quantization (#66647)
`getitem` used to be quantized in `FX Graph Mode Quantization`, and it is no longer quantized. This won’t break any models but could result in a slight difference in numerics.
1.10.2 | 1.11.0 |
---|---|
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx
class M(torch.nn.Module):
def __init__(self):
super().__init__()
self.linear = torch.nn.Linear(5, 5)
def forward(self, x):
x = self.linear(x)
y = torch.stack([x], 0)
return y[0]
m = M().eval()
m = prepare_fx(m, {"": torch.ao.quantization.default_qconfig})
m = convert_fx(m)
print(m)
# prints
# GraphModule(
# (linear): QuantizedLinear(in_features=5, out_features=5,
# scale=1.0, zero_point=0, qscheme=torch.per_tensor_affine)
# )
# def forward(self, x):
# linear_input_scale_0 = self.linear_input_scale_0
# linear_input_zero_point_0 = self.linear_input_zero_point_0
# quantize_per_tensor = torch.quantize_per_tensor(x,
# linear_input_scale_0, linear_input_zero_point_0, torch.quint8)
# x = linear_input_scale_0 = linear_input_zero_point_0 = None
# linear = self.linear(quantize_per_tensor)
# quantize_per_tensor = None
# stack = torch.stack([linear], 0); linear = None
# getitem = stack[0]; stack = None
# dequantize_2 = getitem.dequantize(); getitem = None
# return getitem
|
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx
class M(torch.nn.Module):
def __init__(self):
super().__init__()
self.linear = torch.nn.Linear(5, 5)
def forward(self, x):
x = self.linear(x)
y = torch.stack([x], 0)
return y[0]
m = M().eval()
m = prepare_fx(m, {"": torch.ao.quantization.default_qconfig})
m = convert_fx(m)
print(m)
# prints
# GraphModule(
# (linear): QuantizedLinear(in_features=5, out_features=5, scale=1.0,
zero_point=0, qscheme=torch.per_tensor_affine)
# )
# def forward(self, x):
# linear_input_scale_0 = self.linear_input_scale_0
# linear_input_zero_point_0 = self.linear_input_zero_point_0
# quantize_per_tensor = tor... |
PyTorch 1.10.2 Release, small bug fix release
PyTorch 1.10.1 Release, small bug fix release
This release is meant to fix the following issues (regressions / silent correctness):
- torch.nn.cross_entropy silently incorrect in PyTorch 1.10 on CUDA on non-contiguous inputs #67167
- channels_last significantly degrades accuracy #67239
- Potential strict aliasing rule violation in bitwise_binary_op (on ARM/NEON) #66119
- torch.get_autocast_cpu_dtype() returns a new dtype #65786
- Conv2d grad bias gets wrong value for bfloat16 case #68048
The release tracker should contain all relevant pull requests related to this release as well as links to related issues
PyTorch 1.10 Release, including CUDA Graphs APIs, Frontend and compiler improvements
1.10.0 Release Notes
- Highlights
- Backwards Incompatible Change
- New Features
- Improvements
- Performance
- Documentation
Highlights
We are excited to announce the release of PyTorch 1.10. This release is composed of over 3,400 commits since 1.9, made by 426 contributors. We want to sincerely thank our community for continuously improving PyTorch.
PyTorch 1.10 updates are focused on improving training and performance of PyTorch, and developer usability. Highlights include:
- CUDA Graphs APIs are integrated to reduce CPU overheads for CUDA workloads.
- Several frontend APIs such as FX, `torch.special`, and `nn.Module` Parametrization, have moved from beta to stable.
- Support for automatic fusion in JIT Compiler expands to CPUs in addition to GPUs.
- Android NNAPI support is now available in beta.
You can check the blogpost that shows the new features here.
Backwards Incompatible changes
Python API
`torch.any`/`torch.all` behavior changed slightly to be more consistent for zero-dimension, `uint8` tensors. (#64642)
These two functions match the behavior of NumPy, returning an output dtype of bool for all supported dtypes, except for `uint8` (in which case they return a 1 or a 0, but with `uint8` dtype). In some cases with 0-dim tensor inputs, the returned `uint8` value could mistakenly take on a value > 1. This has now been fixed.
1.9.1
>>> torch.all(torch.tensor(42, dtype=torch.uint8))
tensor(1, dtype=torch.uint8)
>>> torch.all(torch.tensor(42, dtype=torch.uint8), dim=0)
tensor(42, dtype=torch.uint8) # wrong, old behavior
1.10.0
>>> torch.all(torch.tensor(42, dtype=torch.uint8))
tensor(1, dtype=torch.uint8)
>>> torch.all(torch.tensor(42, dtype=torch.uint8), dim=0)
tensor(1, dtype=torch.uint8) # new, corrected and consistent behavior
Remove deprecated `torch.{is,set}_deterministic` (#62158)
This is the end of the deprecation cycle for both of these functions. You should be using `torch.use_deterministic_algorithms` and `torch.are_deterministic_algorithms_enabled` instead.
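A minimal sketch of the replacement calls, which toggles the flag and then restores it:

```python
import torch

torch.use_deterministic_algorithms(True)
enabled = torch.are_deterministic_algorithms_enabled()
print(enabled)  # True

# Switch back so later code is not forced onto deterministic kernels.
torch.use_deterministic_algorithms(False)
```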
Complex Numbers
Conjugate View: `tensor.conj()` now returns a view tensor that aliases the same memory and has conjugate bit set (#54987, #60522, #66082, #63602).
This means that `.conj()` is now an O(1) operation and returns a tensor that views the same memory as `tensor` and has conjugate bit set. This notion of conjugate bit enables fusion of operations with conjugation which gives a lot of performance benefit for operations like matrix multiplication. All out-of-place operations will have the same behavior as before, but an in-place operation on a conjugated tensor will additionally modify the input tensor.
1.9.1
>>> import torch
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.add_(2)
>>> print(x)
tensor([1.+2.j])
1.10.0
>>> import torch
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.add_(2)
>>> print(x)
tensor([3.+2.j])
Note: You can verify if the conj bit is set by calling `tensor.is_conj()`. The conjugation can be resolved at any time, i.e., you can obtain a new tensor that doesn’t share storage with the input tensor, by calling `conjugated_tensor.clone()` or `conjugated_tensor.resolve_conj()`.
Note that these conjugated tensors behave differently from the corresponding NumPy arrays obtained from `np.conj()` when an in-place operation is performed on them (similar to the example shown above).
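The following sketch shows how resolving the conjugation decouples the result from the original tensor, so that a later in-place update no longer writes through to it:

```python
import torch

x = torch.tensor([1 + 2j])
y = x.conj()              # lazy view: shares memory with x, conj bit set
print(y.is_conj())        # True

z = y.resolve_conj()      # materialized copy with its own storage
print(z.is_conj())        # False

z.add_(2)                 # in-place update touches only the copy
print(x)                  # tensor([1.+2.j])
print(z)                  # tensor([3.-2.j])
```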
Negative View: `tensor.conj().neg()` returns a view tensor that aliases the same memory as both `tensor` and `tensor.conj()` and has a negative bit set (#56058).
`conjugated_tensor.neg()` continues to be an O(1) operation, but the returned tensor shares memory with both `tensor` and `conjugated_tensor`.
1.9.1
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> z = y.imag
>>> z.add_(2)
>>> print(x)
tensor([1.+2.j])
1.10.0
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> z = y.imag
>>> print(z.is_neg())
True
>>> z.add_(2)
>>> print(x)
tensor([1.-0.j])
`tensor.numpy()` now throws `RuntimeError` when called on a tensor with conjugate or negative bit set (#61925).
Because the notion of conjugate bit and negative bit doesn’t exist outside of PyTorch, operations that return a Python object viewing the same memory as the input, like `.numpy()`, no longer work for tensors with conjugate or negative bit set.
1.9.1 | 1.10.0 |
---|---|
>>> x = torch.tensor([1+2j])
>>> y = x.conj().imag
>>> print(y.numpy())
[-2.]
|
>>> x = torch.tensor([1+2j])
>>> y = x.conj().imag
>>> print(y.numpy())
RuntimeError: Can't call numpy() on Tensor that has negative
bit set. Use tensor.resolve_neg().numpy() instead.
|
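As a sketch of the workaround named in the error message, assuming PyTorch 1.10 or later with NumPy installed: resolve the lazy bits before exporting. The `to_numpy` helper is hypothetical and not part of PyTorch.

```python
import torch

x = torch.tensor([1 + 2j])
y = x.conj().imag            # real-valued view with the negative bit set

def to_numpy(t):
    """Hypothetical helper: materialize lazy conj/neg bits, then export."""
    return t.resolve_conj().resolve_neg().numpy()

arr = to_numpy(y)
print(arr)                   # [-2.]
```

Both resolve calls return the input unchanged when the corresponding bit is not set, so the helper is also safe for ordinary tensors.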
Autograd
Raise `TypeError` instead of `RuntimeError` when assigning to a Tensor’s grad field with wrong type (#64876)
Setting the `.grad` field to an object that is neither `None` nor a Tensor used to raise a `RuntimeError`; it now properly raises a `TypeError`. If your code was catching this error, update it to catch `TypeError` instead of `RuntimeError`.
1.9.1 | 1.10.0 |
---|---|
try:
    # Assigning an int to a Tensor's grad field
    a.grad = 0
except RuntimeError as e:
    pass
|
try:
    a.grad = 0
except TypeError as e:
    pass
|
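If the same code has to run on releases before and after this change, one option is to catch both exception types. This is a sketch under that assumption, not a recommendation from the release notes; the variable names are illustrative.

```python
import torch

a = torch.ones(2, requires_grad=True)
caught = None

try:
    a.grad = 0               # invalid: neither None nor a Tensor
except (TypeError, RuntimeError) as e:
    # TypeError on 1.10+, RuntimeError on earlier releases
    caught = type(e).__name__

print(caught)
```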
Raise error when inputs to `autograd.grad` are empty (#52016)
Calling `autograd.grad` with an empty list of inputs used to behave like `backward`. To reduce confusion, it now raises the expected error. If you were relying on this, you can simply update your code as follows:
1.9.1 | 1.10.0 |
---|---|
grad = autograd.grad(out, tuple())
assert grad == tuple()
|
out.backward()
|
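When the list of inputs is built dynamically and may be empty, a small guard keeps the old behavior explicit. The `grad_or_backward` helper below is hypothetical, a sketch of one way to handle this case rather than an official API.

```python
import torch

def grad_or_backward(out, inputs):
    """Hypothetical helper: tolerate an empty `inputs` sequence."""
    inputs = tuple(inputs)
    if not inputs:
        # autograd.grad rejects empty inputs on 1.10+, so accumulate into
        # .grad via backward(), as the old behavior did.
        out.backward()
        return ()
    return torch.autograd.grad(out, inputs)

x = torch.tensor([1.0, 2.0], requires_grad=True)

empty = grad_or_backward((x * x).sum(), [])    # falls back to backward()
grads = grad_or_backward((x * x).sum(), [x])   # regular autograd.grad call

print(empty)      # ()
print(x.grad)     # gradient accumulated by backward(): 2 * x
print(grads[0])   # gradient returned by autograd.grad: 2 * x
```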
Optional arguments to `autograd.gradcheck` and `autograd.gradgradcheck` are now kwarg-only (#65290)
These two functions now have a significant number of optional arguments controlling what they do (e.g., `eps`, `atol`, `rtol`, `raise_exception`). To improve readability, we made these arguments kwarg-only. If you are passing these arguments to `autograd.gradcheck` or `autograd.gradgradcheck` as positional arguments, you can update your code as follows:
1.9.1 | 1.10.0 |
---|---|
torch.autograd.gradcheck(fn, x, 1e-6)
|
torch.autograd.gradcheck(fn, x, eps=1e-6)
|
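For a self-contained version of the updated call, here is a sketch that passes the optional arguments listed above by keyword. The tolerance values are illustrative choices, not recommendations.

```python
import torch

# gradcheck compares analytical and numerical gradients; double precision
# inputs keep the finite-difference estimate accurate.
x = torch.randn(3, dtype=torch.float64, requires_grad=True)

ok = torch.autograd.gradcheck(
    torch.sin,
    (x,),
    eps=1e-6,
    atol=1e-5,
    rtol=1e-3,
    raise_exception=True,
)
print(ok)   # True
```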
In-place detach (`detach_`) now errors for views that return multiple outputs (#58285)
This change finishes the deprecation cycle for the inplace-over-view logic. In particular, a few things that previously warned are updated:
* `detach_` will now raise an error when invoked on any view created by `split`, `split_with_sizes`, or `chunk`. You should use the non-inplace `detach` instead.
* The error message for when an in-place operation (that is not detach) is performed on a view created by `split`, `split_with_sizes`, or `chunk` has been changed from "This view is an output of a function..." to "This view is the output of a function...".
1.9.1 | 1.10.0 |
---|---|
b = a.split(1)[0]
b.detach_()
|
b = a.split(1)[0]
c = b.detach()
|
Fix saved variable unpacking version counter (#60195)
In-place operations on unpacked SavedVariables used to be ignored. They are now properly detected, which can lead to errors saying that a variable needed for backward was modified in-place.
This is a valid error and the ...
Small bug fix release
PyTorch 1.9.1 Release Notes
- Improvements
- Bug Fixes
- Documentation
Improvements
- Stop warning on `.names()` access in `max_pool2d` #60059
- Remove Caffe2 thread-pool leak warning #60318
- Add option to skip GitHub tag validation for `torch.hub.load` #62139
- Use `log.warning` in `torch.distributed.run` to print OMP_NUM_THREADS warning #63953
- TorchElastic: Pretty print the failure message captured by @record #64036
- `torch.distributed.run` to set `nproc_per_node` to 1 by default #61552
- Remove experimental API warning from `torch.distributed.elastic.utils.store` #60807
- Deprecate `use_env` in `torch.distributed.run` #59409
- Better engineering changes for torch.distributed launcher #59152
Bug fixes
Distributed / TorchElastic
- Make init_method=tcp:// compatible with `torch.distributed.run` #63910
- Fix default parameters (number of restarts, log level, number of processes per node) that regressed with the transition from `torch.distributed.launch` to `torch.distributed.run` and clarify the documentation accordingly #61294
Hub
- Fix HTTP/403 error when calling `torch.hub.load` for TorchVision models #62072
Misc
- `torch.mm` to check input matrix shapes #61394
Documentation
LTS 1.8.2, Wrap cub in its own namespace
PyTorch 1.8.2 Release Notes
- Highlights
- Bug Fixes
Highlights
We are excited to announce the release of PyTorch 1.8.2. This is the first release we are making as part of the PyTorch Enterprise Support Program. This release includes a bug fix requested by a customer in an LTS branch.
We'd like to thank Microsoft for their support and work on this release.
Bug Fixes
PyTorch 1.9 Release, including Torch.Linalg and Mobile Interpreter
PyTorch 1.9 Release Notes
- Highlights
- Backwards Incompatible Change
- Deprecations
- New Features
- Improvements
- Bug Fixes
- Performance
- Documentation
Highlights
We are excited to announce the release of PyTorch 1.9. The release is composed of more than 3,400 commits since 1.8, made by 398 contributors. Highlights include:
- Major improvements to support scientific computing, including torch.linalg, torch.special, and Complex Autograd
- Major improvements in on-device binary size with Mobile Interpreter
- Native support for elastic, fault-tolerant training through the upstreaming of TorchElastic into PyTorch Core
- Major updates to the PyTorch RPC framework to support large scale distributed training with GPU support
- New APIs to optimize performance and packaging for model inference deployment
- Support for Distributed training, GPU utilization and SM efficiency in the PyTorch Profiler
We’d like to thank the community for their support and work on this latest release. We’d especially like to thank Quansight and Microsoft for their contributions.
You can find more details on all the highlighted features in the PyTorch 1.9 Release blogpost.
Backwards Incompatible changes
Python API
- `torch.divide` with `rounding_mode='floor'` now returns infinity when a non-zero number is divided by zero (#56893).
This fixes the `rounding_mode='floor'` behavior to return the same non-finite values as other rounding modes when there is a division by zero. Previously it would always result in a NaN value, but a non-zero number divided by zero should return +/- infinity in IEEE floating point arithmetic. Note this does not affect `torch.floor_divide` or the floor division operator, which currently use `rounding_mode='trunc'` (and are also deprecated for that reason).
1.8.1 | 1.9.0 |
---|---|
>>> a = torch.tensor([-1.0, 0.0, 1.0])
>>> b = torch.tensor([0.0])
>>> torch.divide(a, b, rounding_mode='floor')
tensor([nan, nan, nan])
|
>>> a = torch.tensor([-1.0, 0.0, 1.0])
>>> b = torch.tensor([0.0])
>>> torch.divide(a, b, rounding_mode='floor')
tensor([-inf, nan, inf])
|
- Legacy tensor constructors and `Tensor.new` no longer support passing both `Tensor` and `device` as inputs (#58108).
This fixes a bug in which 1-element integer tensors were misinterpreted as specifying tensor size, yielding an uninitialized tensor. As noted in the error message, use the new-style `torch.tensor(...)` or `torch.as_tensor(...)` to copy or alias an existing tensor. If you want to create an uninitialized tensor, use `torch.empty(...)`.
1.8.1 | 1.9.0 |
---|---|
>>> a = torch.tensor([1])
>>> torch.LongTensor(a, device='cpu') # uninitialized
tensor([7022349217739848992])
>>> a.new(a, device='cpu')
tensor([4294967295]) # uninitialized
|
>>> a = torch.tensor([1])
>>> torch.LongTensor(a, device='cpu')
RuntimeError: Legacy tensor constructor of the form torch.Tensor(tensor, device=device) is
not supported. Use torch.tensor(...) or torch.as_tensor(...) instead.
>>> a.new(a, device='cpu')
RuntimeError: Legacy tensor new of the form tensor.new(tensor, device=device) is not
supported. Use torch.as_tensor(...) instead.
|
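The replacements mentioned above can be sketched as follows. This example uses `clone()` for the copy, on the assumption that an explicit copy of an existing tensor is wanted; `torch.as_tensor` aliases here because the dtype and device already match.

```python
import torch

a = torch.tensor([1])

alias = torch.as_tensor(a, device='cpu')     # no copy: dtype and device already match
copy = a.clone()                             # independent copy of the data
scratch = torch.empty(1, dtype=torch.long)   # uninitialized on purpose; contents are arbitrary

a[0] = 5
print(alias)   # reflects the write to a
print(copy)    # still holds the original value 1
```

Spelling out whether an alias, a copy, or uninitialized memory is wanted avoids the ambiguity that the legacy constructors had.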
- `torch.divide` with `rounding_mode='true'` is replaced with `rounding_mode=None` (#51988).
`torch.divide`'s undocumented `rounding_mode='true'` option has been removed, and instead `rounding_mode=None` should be passed to indicate no rounding should take place. This is equivalent to omitting the argument entirely.
1.8.1 | 1.9.0 |
---|---|
>>> a, b = torch.full((2,), 4.2), torch.full((2,), 2)
>>> torch.divide(a, b, rounding_mode='true')
tensor([2.1000, 2.1000])
|
>>> a, b = torch.full((2,), 4.2), torch.full((2,), 2)
>>> torch.divide(a, b, rounding_mode=None) # equivalent to torch.divide(a, b, rounding_mode='true') from the prior release
tensor([2.1000, 2.1000])
|
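To make the difference between the remaining rounding modes concrete, here is a small sketch with values chosen so that `'trunc'` and `'floor'` disagree on the negative element. The input values are illustrative.

```python
import torch

a = torch.tensor([-7.0, 7.0])
b = torch.tensor([2.0])

default = torch.divide(a, b)                          # true division
none = torch.divide(a, b, rounding_mode=None)         # identical to the default
trunc = torch.divide(a, b, rounding_mode='trunc')     # rounds toward zero
floor = torch.divide(a, b, rounding_mode='floor')     # rounds toward -infinity

print(default.tolist())   # [-3.5, 3.5]
print(none.tolist())      # [-3.5, 3.5]
print(trunc.tolist())     # [-3.0, 3.0]
print(floor.tolist())     # [-4.0, 3.0]
```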
- `import torch.tensor as tensor` is no longer supported (#53424).
Instead, use `from torch import tensor`.
1.8.1 | 1.9.0 |
---|---|
>>> import torch.tensor as tensor
>>> torch.tensor(1.)
tensor(1.)
|
>>> import torch.tensor as tensor
ModuleNotFoundError: No module named 'torch.tensor'
>>> from torch import tensor
>>> tensor(1.)
tensor(1.)
|
- binary release: `numpy` is no longer a required dependency
If you require `numpy` (and don't already have it installed) you will need to install it separately.
Autograd
- `torch.autograd.gradcheck.get_numerical_jacobian` and `torch.autograd.gradcheck.get_analytical_jacobian` no longer support functions that return complex valued output as well as any other values of `grad_out` not equal to 1 (#55692).
This change is a part of a refactor of `gradcheck`’s internals. Note that `gradcheck` itself still supports functions with complex output. This new restriction only applies to calls to the two internal helper functions. As a workaround, you can wrap your function to return either the real or imaginary component of its output before calling these functions. Additionally, these internal helpers no longer accept any value other than 1 for `grad_out` for any input function. Note that these helper functions are also being deprecated in this release.
1.8.1:
get_numerical_jacobian(torch.complex, (a, b), grad_out=2.0)
1.9.0:
def wrapped(fn):
    def wrapper(*input):
        return torch.real(fn(*input))
    return wrapper
get_numerical_jacobian(wrapped(torch.complex), (a, b), grad_out=1.0)
- `torch.autograd.gradcheck` now throws `GradcheckError` (#55656).
This change is a part of a refactor of `gradcheck`’s internals. All errors that can be silenced by `raise_exception=False` now raise `GradcheckError` (which inherits from `RuntimeError`). If you explicitly check that the type of the error is `RuntimeError`, you'll need to update your code to check for `GradcheckError` instead. Otherwise, if you use something like `except` or `isinstance`, no changes are necessary.
1.8.1:
# An example of a situation that will now return GradcheckError instead of
# RuntimeError is when there is a jacobian mismatch, which can happen
# for example when you forget to specify float64 for your inputs.
try:
    torch.autograd.gradcheck(torch.sin, (torch.ones(1, requires_grad=True),))
except RuntimeError as e:
    assert type(e) is RuntimeError # explicitly check type -> NEEDS UPDATE
1.9.0:
try:
    torch.autograd.gradcheck(torch.sin, (torch.ones(1, requires_grad=True),))
except RuntimeError as e:
    # GradcheckError inherits from RuntimeError so you can still catch this
    # with RuntimeError (No change necessary!)
    # BUT, if you explicitly check type...
    assert type(e) is torch.autograd.GradcheckError
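For a failure that does not depend on floating point precision, the sketch below forces a Jacobian mismatch with a deliberately wrong backward. `WrongBackward` is a hypothetical op written for this illustration, and importing `GradcheckError` from the `torch.autograd.gradcheck` module is an assumption about where the class lives.

```python
import torch
from torch.autograd.gradcheck import GradcheckError

class WrongBackward(torch.autograd.Function):
    """Hypothetical op whose backward is deliberately incorrect."""

    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * 3   # should be 2; forces a Jacobian mismatch

x = torch.randn(3, dtype=torch.float64, requires_grad=True)
caught = None

try:
    torch.autograd.gradcheck(WrongBackward.apply, (x,))
except RuntimeError as e:     # still catches GradcheckError
    caught = e

print(type(caught) is GradcheckError)     # True
print(isinstance(caught, RuntimeError))   # True

# raise_exception=False turns the same failure into a False return value.
silenced = torch.autograd.gradcheck(WrongBackward.apply, (x,), raise_exception=False)
print(silenced)                           # False
```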
- Finished deprecation cycle for in-place view error checks (#56093).
In-place modification of views will now raise an error if that view was created by a custom function or a function that returns multiple views, or if the view was created in no-grad mode. Modifying a view in-place in the situations above is error-prone and has been deprecated since v1.5.0. These in-place modifications are now forbidden. For more information on how to work around this, see the related sections of the release notes linked below:
torch.nn
- Fixed regression for `nn.MultiheadAttention` to now apply bias flag to both in and out projection layers (#52537).
In PyTorch 1.6, a regression was introduced that caused the `bias` flag of `nn.MultiheadAttention` only to apply to the input projection layer. This caused the output projection layer to always include a `bias` parameter, even with `bias=False` specified. The regression is now fixed in PyTorch 1.9, making the `bias` flag correctly apply to both the input and output projection layers. This fix is BC-breaking for the `bias=False` case as it will now result in no `bias` parameter for the output projection layer.
v1.6 - v1.8.1: | pre 1.6 & 1.9.0 |
---|---|
>>> mha = torch.nn.MultiheadAttenti... |