
Releases: pytorch/pytorch

PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips are now available

28 Oct 16:54
7c98e70

PyTorch 1.13 Release Notes

  • Highlights
  • Backwards Incompatible Changes
  • New Features
  • Improvements
  • Performance
  • Documentation
  • Developers

Highlights

We are excited to announce the release of PyTorch 1.13! This release includes stable versions of BetterTransformer. We deprecated CUDA 10.2 and 11.3 and completed migration to CUDA 11.6 and 11.7. Beta features include improved support for Apple M1 chips and functorch, a library that offers composable vmap (vectorization) and autodiff transforms, which is now included in-tree with the PyTorch release. This release is composed of over 3,749 commits from 467 contributors since 1.12.1. We want to sincerely thank our dedicated community for your contributions.

Summary:

  • The BetterTransformer feature set supports fastpath execution for common Transformer models during inference out-of-the-box, without the need to modify the model. Additional improvements include accelerated add+matmul linear algebra kernels for sizes commonly used in Transformer models, and Nested Tensor support is now enabled by default.

  • Timely deprecation of older CUDA versions allows us to introduce the latest CUDA versions as they are released by Nvidia®, and hence enables support for C++17 in PyTorch and the new NVIDIA Open GPU Kernel Modules.

  • Previously, functorch was released out-of-tree in a separate package. After installing PyTorch, users can now import and use functorch without needing to install another package (a short usage sketch follows this list).

  • PyTorch is offering native builds for Apple® silicon machines that use Apple's new M1 chip as a beta feature, providing improved support across PyTorch's APIs.
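
Since functorch now ships in-tree, a minimal sketch of what importing it looks like after installing only PyTorch; vmap and grad are two of the library's composable transforms, and the shapes below are just illustrative:

>>> import torch
>>> from functorch import vmap, grad
# vmap maps torch.dot over the leading (batch) dimension of both inputs
>>> vmap(torch.dot)(torch.randn(4, 3), torch.randn(4, 3)).shape
torch.Size([4])
# grad returns a new function that computes the gradient of a scalar-valued function
>>> g = grad(lambda x: (x ** 2).sum())
>>> torch.allclose(g(torch.ones(3)), 2 * torch.ones(3))
True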

Stable:
  • Better Transformer
  • CUDA 10.2 and 11.3 CI/CD Deprecation

Beta:
  • Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs
  • Extend NNC to support channels last and bf16
  • Functorch now in PyTorch Core Library
  • Beta Support for M1 devices

Prototype:
  • Arm® Compute Library backend support for AWS Graviton
  • CUDA Sanitizer

You can check the blogpost that shows the new features here.

Backwards Incompatible changes

Python API

uint8 and all integer dtype masks are no longer allowed in Transformer (#87106)

Prior to 1.13, key_padding_mask could be set to uint8 or other integer dtypes in TransformerEncoder and MultiheadAttention, which might generate unexpected results. In this release, these dtypes are not allowed for the mask anymore. Please convert them to torch.bool before using.

1.12.1

>>> layer = nn.TransformerEncoderLayer(2, 4, 2)
>>> encoder = nn.TransformerEncoder(layer, 2)
>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.uint8)
>>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
# works before 1.13
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)

1.13

>>> layer = nn.TransformerEncoderLayer(2, 4, 2)
>>> encoder = nn.TransformerEncoder(layer, 2)
>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.bool)
>>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)
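
If you already have an integer mask from older code, converting it at the call site is enough; a minimal sketch reusing the encoder and inputs from the example above:

>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.uint8)  # legacy integer mask
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask.to(torch.bool))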

Updated torch.floor_divide to perform floor division (#78411)

Prior to 1.13, torch.floor_divide erroneously performed truncation division (i.e. truncated the quotients). In this release, it has been fixed to perform floor division. To replicate the old behavior, use torch.div with rounding_mode='trunc'.

1.12.1

>>> a = torch.tensor([4.0, -3.0])
>>> b = torch.tensor([2.0, 2.0])
>>> torch.floor_divide(a, b)
tensor([ 2., -1.])

1.13

>>> a = torch.tensor([4.0, -3.0])
>>> b = torch.tensor([2.0, 2.0])
>>> torch.floor_divide(a, b)
tensor([ 2., -2.])
# Old behavior can be replicated using torch.div with rounding_mode='trunc'
>>> torch.div(a, b, rounding_mode='trunc')
tensor([ 2., -1.])

Fixed torch.index_select on CPU to raise an out-of-bounds error when the index is out of bounds and the source tensor is empty (#77881)

Prior to 1.13, torch.index_select on CPU would return an appropriately sized tensor filled with random values if the source tensor was empty. In this release, we have fixed this bug so that it errors out. A consequence of this is that torch.nn.Embedding, which utilizes index_select, will error out rather than returning an empty tensor when embedding_dim=0 and the input contains out-of-bounds indices. The old behavior cannot be reproduced with torch.nn.Embedding; however, since an Embedding layer with embedding_dim=0 is a corner case, this behavior is unlikely to be relied upon.

1.12.1

>>> t = torch.tensor([4], dtype=torch.long)
>>> embedding = torch.nn.Embedding(3, 0)
>>> embedding(t)
tensor([], size=(1, 0), grad_fn=<EmbeddingBackward0>)

1.13

>>> t = torch.tensor([4], dtype=torch.long)
>>> embedding = torch.nn.Embedding(3, 0)
>>> embedding(t)
RuntimeError: INDICES element is out of DATA bounds, id=4 axis_dim=3

Disallow overflows when tensors are constructed from scalars (#82329)

Prior to 1.13, overflows during tensor construction from scalars did not throw an error. In 1.13, such cases will error.

1.12.1

>>> torch.tensor(1000, dtype=torch.int8)
tensor(-24, dtype=torch.int8)

1.13

>>> torch.tensor(1000, dtype=torch.int8)
RuntimeError: value cannot be converted to type int8 without overflow

Error on indexing a cpu tensor with non-cpu indices (#69607)

Prior to 1.13, cpu_tensor[cuda_indices] was a valid program that would return a cpu tensor. The original use case for mixed device indexing was for non_cpu_tensor[cpu_indices], and allowing the opposite was unintentional (cpu_tensor[non_cpu_indices]). This behavior appears to be rarely used, and a refactor of our indexing kernels made it difficult to represent an op that takes in (cpu_tensor, non_cpu_tensor) and returns another cpu_tensor, so it is now an error.

To replicate the old behavior for base[indices], you can ensure that either indices lives on the CPU device, or base and indices both live on the same device.

1.12.1

>>> a = torch.tensor([1.0, 2.0, 3.0])
>>> b = torch.tensor([0, 2], device='cuda')
>>> a[b]
tensor([1., 3.])

1.13

>>> a = torch.tensor([1.0, 2.0, 3.0])
>>> b = torch.tensor([0, 2], device='cuda')
>>> a[b]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
# Old behavior can be replicated by moving b to CPU, or a to CUDA
>>> a[b.cpu()]
tensor([1., 3.])
>>> a.cuda()[b]
tensor([1., 3.], device='cuda:0')

Remove deprecated torch.eig, torch.matrix_rank, torch.lstsq (#70982, #70981, #70980)

The deprecation cycle for the above functions has been completed and they have been removed in the 1.13 release.
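
A minimal migration sketch; the torch.linalg functions below are the replacements pointed to by the deprecation warnings (note that torch.linalg.lstsq takes its arguments in the opposite order from the removed torch.lstsq, and the inputs here are just illustrative):

>>> A = torch.randn(3, 3)
>>> b = torch.randn(3, 2)
# torch.eig(A, eigenvectors=True)  ->  torch.linalg.eig(A)
>>> eigenvalues, eigenvectors = torch.linalg.eig(A)
# torch.matrix_rank(A)  ->  torch.linalg.matrix_rank(A)
>>> rank = torch.linalg.matrix_rank(A)
# torch.lstsq(b, A)  ->  torch.linalg.lstsq(A, b)
>>> solution = torch.linalg.lstsq(A, b).solution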

torch.nn

Enforce that the bias has the same dtype as input and weight for convolutions on CPU (#83686)

To align with the implementation on other devices, the CPU implementation for convolutions was updated to enforce that the dtype of the bias matches the dtype of the input and weight.

1.12.1

# input and weight are dtype torch.int64
# bias is torch.float32
>>> out = torch.nn.functional.conv2d(input, weight, bias, ...)

1.13

# input and weight are dtype torch.int64
# bias is torch.float32
>>> out = torch.nn.functional.conv2d(input, weight, bias, ...)  # now raises a RuntimeError

# Updated code to avoid the error
>>> out = torch.nn.functional.conv2d(input, weight, bias.to(input.dtype), ...)

Autograd

Disallow setting the .data of a tensor that requires_grad=True with an integer tensor (#78436)

Setting the .data of a tensor that requires_grad with an integer tensor now raises an error.

1.12.1

>>> x = torch.randn(2, requires_grad=True)
>>> x.data = torch.randint(1, (2,))
>>> x
tensor([0, 0], requires_grad=True)

1.13

>>> x = torch.randn(2, requires_grad=True)
>>> x.data = torch.randint(1, (2,))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: data set to a tensor that requires gradients must be floating point or complex dtype

Added variable_list support to ExtractVariables struct (#84583)

Prior to this change, C++ custom autograd Functions did not treat tensors passed in a TensorList as tensors for the purposes of recording the backward graph. After this change, custom Functions that receive a TensorList must modify their backward functions to also compute gradients for these additional tensor inputs. Note that this behavior now differs from that of custom autograd Functions in Python.

1.12.1

struct MyFunction : public Function<MyFunction> {
    static Variable forward(AutogradContext* ctx, at::Tensor t, at::TensorList tensors) {
      return 2 * tensors[0] + 3 * t;
    }

    static variable_list backward(
        AutogradContext* ctx,
        variable_list grad_output) {
      return {3 * grad_output[0]};
    }
};

1.13

struct MyFunction : public Function<MyFunction> {
    static Variable forward(AutogradContext* ctx, at::Tensor t, at::TensorList tensors) {
      return 2 * tensors[0] + 3 * t;
    }

    static variable_list backward(
        AutogradContext* ctx,
        variable_list grad_output) {
      return {3 * grad_output[0], 2 * grad_output[0]};
    }
};

Don't detach when making views; force kernel to detach (#84893)

View operations registered as CompositeExplicitAutograd kernels are no longer allowed to return input tensors as-is. You must explic...

Read more

PyTorch 1.12.1 Release, small bug fix release

05 Aug 19:35
664058f

This release is meant to fix the following issues (regressions / silent correctness):

Optim

  • Remove overly restrictive assert in adam #80222

Autograd

  • Convolution forward over reverse internal asserts in specific case #81111
  • 25% Performance regression from v0.1.1 to 0.2.0 when calculating hessian #82504

Distributed

  • Fix distributed store to use add for the counter of DL shared seed #80348
  • Raise proper timeout when sharing the distributed shared seed #81666

NN

  • Allow register float16 weight_norm on cpu and speed up test #80600
  • Fix weight norm backward bug on CPU when OMP_NUM_THREADS <= 2 #80930
  • Weight_norm is not working with float16 #80599
  • New release breaks torch.nn.weight_norm backwards pass and breaks all Wav2Vec2 implementations #80569
  • Disable src mask for transformer and multiheadattention fastpath #81277
  • Make nn.stateless correctly reset parameters if the forward pass fails #81262
  • torchvision.transforms.functional.rgb_to_grayscale() + torch.nn.Conv2d() don't work on 1080 GPU #81106
  • Transformer and CPU path with src_mask raises error with torch 1.12 #81129

Data Loader

  • Locking lower ranks seed recipients #81071

CUDA

  • os.environ["CUDA_VISIBLE_DEVICES"] has no effect #80876
  • share_memory() on CUDA tensors no longer no-ops and instead crashes #80733
  • [Prims] Unbreak CUDA lazy init #80899
  • PyTorch 1.12 cu113 wheels cudnn discoverability issue #80637
  • Remove overly restrictive checks for cudagraph #80881

ONNX

MPS

Other

  • Don't error if _warned_capturable_if_run_uncaptured not set #80345
  • Initializing libiomp5.dylib, but found libomp.dylib already initialized. #78490
  • Assertion error - _dl_shared_seed_recv_cnt - pt 1.12 - multi node #80845
  • Add 3.10 stdlib to torch.package #81261
  • CPU-only c++ extension libraries (functorch, torchtext) built against PyTorch wheels are not fully compatible with PyTorch wheels #80489

PyTorch 1.12: TorchArrow, Functional API for Modules and nvFuser, are now available

28 Jun 16:48
67ece03

PyTorch 1.12 Release Notes

  • Highlights
  • Backwards Incompatible Change
  • New Features
  • Improvements
  • Performance
  • Documentation

Highlights

We are excited to announce the release of PyTorch 1.12! This release is composed of over 3,124 commits, made by 433 contributors. Along with 1.12, we are releasing beta versions of AWS S3 Integration, PyTorch Vision Models on Channels Last on CPU, Empowering PyTorch on Intel® Xeon® Scalable processors with Bfloat16, and the FSDP API. We want to sincerely thank our dedicated community for your contributions.

Summary:

  • Functional Module API to functionally apply module computation with a given set of parameters (a short sketch follows this list)
  • Complex32 and Complex Convolutions in PyTorch
  • DataPipes from TorchData fully backward compatible with DataLoader
  • Functorch with improved coverage for APIs
  • nvFuser a deep learning compiler for PyTorch
  • Changes to float32 matrix multiplication precision on Ampere and later CUDA hardware
  • TorchArrow, a new beta library for machine learning preprocessing over batch data
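
For the functional module API in the first bullet above, a minimal sketch using torch.nn.utils.stateless.functional_call; the module, parameter values, and shapes are illustrative:

>>> import torch
>>> from torch.nn.utils import stateless
>>> m = torch.nn.Linear(3, 3)
>>> params = {"weight": torch.ones(3, 3), "bias": torch.zeros(3)}
>>> x = torch.randn(1, 3)
# Runs m's forward pass with the supplied parameters instead of m's own
>>> out = stateless.functional_call(m, params, (x,))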

Backwards Incompatible changes

Python API

Updated type promotion for torch.clamp (#77035)

In 1.11, the ‘min’ and ‘max’ arguments in torch.clamp did not participate in type promotion, which made it inconsistent with minimum and maximum operations. In 1.12, the ‘min’ and ‘max’ arguments participate in type promotion.

1.11

>>> import torch
>>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
>>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
>>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)
>>> torch.clamp(a, b, c).dtype
torch.float32

1.12

>>> import torch
>>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
>>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
>>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)
>>> torch.clamp(a, b, c).dtype
torch.float64

Complex Numbers

Fix complex type promotion (#77524)

Updates the type promotion rule so that, given a complex scalar and a real tensor, the value type of the real tensor is preserved.

1.11

>>> a = torch.randn((2, 2), dtype=torch.float)
>>> b = torch.tensor(1, dtype=torch.cdouble)
>>> (a + b).dtype
torch.complex128

1.12

>>> a = torch.randn((2, 2), dtype=torch.float)
>>> b = torch.tensor(1, dtype=torch.cdouble)
>>> (a + b).dtype
torch.complex64

LinAlg

Disable TF32 for matmul by default and add high-level control of fp32 matmul precision (#76509)

PyTorch 1.12 makes the default math mode for fp32 matrix multiplications more precise and consistent across hardware. This may affect users on Ampere or later CUDA devices and TPUs. See the PyTorch blog for more details.
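
The high-level control is exposed as torch.set_float32_matmul_precision; a minimal sketch of inspecting the new default and opting back into lower-precision (e.g. TF32) matmul math on supported hardware:

>>> import torch
>>> torch.get_float32_matmul_precision()  # new default in 1.12
'highest'
>>> torch.set_float32_matmul_precision("high")  # allow TF32-like math for float32 matmuls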

Sparse

Use ScatterGatherKernel for scatter_reduce (CPU-only) (#74226, #74608)

In 1.11.0, unlike scatter which takes a reduce kwarg or scatter_add, scatter_reduce was not an in-place function. That is, it did not allow the user to pass an output tensor which contains data that is reduced together with the scattered data. Instead, the scatter reduction took place on an output tensor initialized under the hood. Indices of the output that were not scattered to were filled with reduction inits (or 0 for options ‘amin’ and ‘amax’).

In 1.12.0, scatter_reduce (which is in beta) is in-place to align with the API of the related existing functions scatter/scatter_add. For this reason, the argument input in 1.11.0 has been renamed src in 1.12.0 and the new self argument now takes a destination tensor to be scattered onto. Since the destination tensor is no longer initialized under the hood, the output_size kwarg in 1.11.0 that allowed users to specify the size of the output at dimension dim has been removed. Further, in 1.12.0 we introduce an include_self kwarg which determines whether values in the self (destination) tensor are included in the reduction. Setting include_self=True could, for example, allow users to provide special reduction inits for the scatter_reduction operation. Otherwise, if include_self=False, indices scattered to are treated as if they were filled with reduction inits.

In the snippet below, we illustrate how the behavior of scatter_reduce in 1.11.0 can be achieved with the function released in 1.12.0.

Example:

>>> src = torch.arange(6, dtype=torch.float).reshape(3, 2)
>>> index = torch.tensor([[0, 2], [1, 1], [0, 0]])
>>> dim = 1
>>> output_size = 4
>>> reduce = "prod"

1.11

>>> torch.scatter_reduce(src, dim, index, reduce, output_size=output_size)
tensor([[ 0., 1., 1., 1.],
        [ 1., 6., 1., 1.],
        [20., 1., 1., 1.]])

1.12

>>> output_shape = list(src.shape)
>>> output_shape[dim] = output_size
# reduction init for prod is 1
# filling the output with 1 is only necessary if the user wants to preserve the behavior in 1.11
# where indices not scattered to are filled with reduction inits
>>> output = src.new_empty(output_shape).fill_(1)
>>> output.scatter_reduce_(dim, index, src, reduce)
tensor([[ 0., 1., 1., 1.],
        [ 1., 6., 1., 1.],
        [20., 1., 1., 1.]])

torch.nn

nn.GroupNorm: Report an error if num_channels is not divisible by num_groups (#74293)

Previously, nn.GroupNorm would error out during the forward pass if num_channels is not divisible by num_groups. Now, the error is thrown for this case during module construction instead.

1.11

m = torch.nn.GroupNorm(3, 7)
m(...)  # errors during forward pass

1.12

m = torch.nn.GroupNorm(3, 7)  # errors during construction

nn.Dropout2d: Return to 1.10 behavior: perform 1D channel-wise dropout for 3D inputs

In PyTorch 1.10 and older, passing a 3D input to nn.Dropout2d resulted in 1D channel-wise dropout behavior; i.e., such inputs were interpreted as having shape (N, C, L) with N = batch size and C = # channels, and channel-wise dropout was performed along the second dimension.

1.10

x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x)  # input is assumed to be shape (N, C, L); dropout along the second dim.

With the introduction of no-batch-dim input support in 1.11, 3D inputs were reinterpreted as having shape (C, H, W); i.e. an input without a batch dimension, and dropout behavior was changed to drop along the first dimension. This was a silent breaking change.

1.11

x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x)  # input is assumed to be shape (C, H, W); dropout along the first dim.

The breaking change in 1.11 resulted in a lack of support for 1D channel-wise dropout behavior, so nn.Dropout2d in PyTorch 1.12 returns to the 1.10 behavior and emits a warning, giving users time to adapt before the no-batch-dim interpretation goes back into effect.

1.12

x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x)  # input is assumed to be shape (N, C, L); dropout along the second dim.
            # throws a warning suggesting nn.Dropout1d for 1D channel-wise dropout.

If you want 1D channel-wise dropout behavior, please switch to use of the newly-added nn.Dropout1d module instead of nn.Dropout2d. If you want no-batch-dim input behavior, please note that while this is not supported in 1.12, a future release will reinstate the interpretation of 3D inputs to nn.Dropout2d as those without a batch dimension.
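
A minimal sketch of the recommended migration for 1D channel-wise dropout:

>>> x = torch.randn(2, 3, 4)       # interpreted as (N, C, L)
>>> m = torch.nn.Dropout1d(p=0.5)  # explicit 1D channel-wise dropout, no warning
>>> out = m(x)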

F.cosine_similarity: Improve numerical stability (#31378)

Previously, we first computed the inner product and then normalized. After this change, we first normalize and then compute the inner product. This should be more numerically stable because it avoids losing precision in the inner product for inputs with large norms. Because of this change, outputs may differ in some cases.
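
A small sketch of the order-of-operations change described above (conceptual only; the actual kernel also guards small norms with an eps parameter):

>>> x1, x2 = torch.randn(4, 8), torch.randn(4, 8)
# Old order (conceptually): inner product first, then divide by the product of norms
>>> old = (x1 * x2).sum(dim=1) / (x1.norm(dim=1) * x2.norm(dim=1))
# New order (conceptually): normalize each input first, then take the inner product
>>> new = (torch.nn.functional.normalize(x1, dim=1) * torch.nn.functional.normalize(x2, dim=1)).sum(dim=1)
# For well-conditioned inputs both orders agree; they differ only for extreme norms
>>> torch.allclose(old, new)
True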

Composability

Functions in torch.ops.aten.{foo} no longer accept self as a kwarg

torch.ops.aten.{foo} objects are now instances of OpOverloadPacket (instead of plain functions) whose __call__ method is implemented in Python, which means that you can no longer pass self as a keyword argument. You can pass it normally as a positional argument instead.

1.11

>>> torch.ops.aten.sin(self=torch.ones(2))
    tensor([0.8415, 0.8415])

1.12

# this now fails
>>> torch.ops.aten.sin(self=torch.ones(2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __call__() got multiple values for argument 'self'
# this works
>>> torch.ops.aten.sin(torch.ones(2))
tensor([0.8415, 0.8415])

__torch_dispatch__ now traces individual op overloads instead of op overload packets (#72673)

torch.ops.aten.add actually corresponds to a bundle of functions from C++, corresponding to all of the overloads of the add operator (specifically, add.Tensor, add.Scalar and add.out). Now, __torch_dispatch__ will directly take in an overload corresponding to a single aten function.

1.11

class MyTensor(torch.Tensor):
    ....
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Before, func refers to a "packet" of all overloads
        # for a given operator, e.g. "add"
        assert func == torch.ops.aten.add

1.12

class MyTensor(torch.Tensor):
    ....
    def __torch_dispatch__(cls, func, types, args=(), kwargs=No...
Read more

PyTorch 1.11, TorchData, and functorch are now available

10 Mar 16:59
bc2c6ed

PyTorch 1.11 Release Notes

  • Highlights
  • Backwards Incompatible Change
  • New Features
  • Improvements
  • Performance
  • Documentation

Highlights

We are excited to announce the release of PyTorch 1.11. This release is composed of over 3,300 commits since 1.10, made by 434 contributors. Along with 1.11, we are releasing beta versions of TorchData and functorch. We want to sincerely thank our community for continuously improving PyTorch.

  • TorchData is a new library for common modular data loading primitives for easily constructing flexible and performant data pipelines. View it on GitHub.
  • functorch, a library that adds composable function transforms to PyTorch, is now available in beta. View it on GitHub.
  • Distributed Data Parallel (DDP) static graph optimizations available in stable.

You can check the blogpost that shows the new features here.

Backwards Incompatible changes

Python API

Fixed python deepcopy to correctly copy all attributes on Tensor objects (#65584)

This change ensures that the deepcopy operation on Tensor properly copies all the attributes (and not just the plain Tensor properties).

1.10.2

a = torch.rand(2)
a.foo = 3
torch.save(a, "bar")
b = torch.load("bar")
print(b.foo)
# Raise AttributeError: "Tensor" object has no attribute "foo"

1.11.0

a = torch.rand(2)
a.foo = 3
torch.save(a, "bar")
b = torch.load("bar")
print(b.foo)
# 3

steps argument is no longer optional in torch.linspace and torch.logspace

This argument used to default to 100 in PyTorch 1.10.2, but was deprecated (previously you would see a deprecation warning if you didn’t explicitly pass in steps). In PyTorch 1.11, it is no longer optional.

1.10.2

# Works, but raises a deprecation warning
# Steps defaults to 100
a = torch.linspace(1, 10)
# UserWarning: Not providing a value for linspace's steps is deprecated
# and will throw a runtime error in a future release.
# This warning will appear only once per process.
# (Triggered internally at ../aten/src/ATen/native/RangeFactories.cpp:19)

1.11.0

# In 1.11, you must specify steps
a = torch.linspace(1, 10, steps=100)

Remove torch.hub.import_module function that was mistakenly public (#67990)

This function is not intended for public use.
If you have existing code that relies on it, you can find an equivalent function at torch.hub._import_module.

C++ API

We’ve cleaned up many of the headers in the C++ frontend to only include the subset of aten operators that they actually used (#68247, #68687, #68688, #68714, #68689, #68690, #68697, #68691, #68692, #68693, #69840)

When you #include a header from the C++ frontend, you can no longer assume that every aten operator is transitively included. You can work around this by directly adding #include <ATen/ATen.h> in your file, which will maintain the old behavior of including every aten operator.

Custom implementation for c10::List and c10::Dict move constructors have been removed (#69370)

The semantics have changed from "make the moved-from List/Dict empty" to "keep the moved-from List/Dict unchanged".

1.10.2

c10::List<std::string> list1({"3", "4"});
c10::List<std::string> list2(std::move(list1));
std::cout << list1.size(); // 0

1.11.0

c10::List<std::string> list1({"3", "4"});
c10::List<std::string> list2(std::move(list1)); // calls copy ctor
std::cout << list1.size(); // 2

CUDA

Removed THCeilDiv function and corresponding THC/THCDeviceUtils.cuh header (#65472)

As part of cleaning up TH from the codebase, the THCeilDiv function has been removed. Instead, please use at::ceil_div and include the corresponding ATen/ceil_div.h header.

Removed THCudaCheck (#66391)

You can replace it with C10_CUDA_CHECK, which has been available since at least PyTorch 1.4, so simply replacing it is enough even if you support older versions.

Removed THCudaMalloc(), THCudaFree(), THCThrustAllocator.cuh (#65492)

If your extension is using THCThrustAllocator.cuh, please replace it with ATen/cuda/ThrustAllocator.h and corresponding APIs (see examples in this PR).

This PR also removes THCudaMalloc/THCudaFree calls. Please use c10::cuda::CUDACachingAllocator::raw_alloc(size)/raw_delete(ptr), or, preferably, switch to c10::cuda::CUDACachingAllocator::allocate, which manages deallocation. Caching allocator APIs have been available since PyTorch 1.2, so simply replacing them is enough even if you support older versions of PyTorch.

Build

Stopped building shared library for AOT Compiler, libaot_compiler.so (#66227)

Building aot_compiler.cpp as a separate library is not necessary, as it’s already included in libtorch.so.
You can update your build system to only dynamically link libtorch.so.

Mobile

Make typing.Union type unsupported for mobile builds (#65556)

typing.Union support was added for TorchScript in 1.10. It was removed specifically for mobile due to its lack of use and increase in binary size of PyTorch for Mobile builds.

Distributed

torch.distributed.rpc: Final Removal of ProcessGroup RPC backend (#67363)

ProcessGroup RPC backend is deprecated. In 1.10, it threw an error to help users update their code, and, in 1.11, it is removed completely.

The backend type “PROCESS_GROUP” is now deprecated, e.g.
torch.distributed.rpc.init_rpc("worker0", backend="PROCESS_GROUP", rank=0, world_size=1)
and should be replaced with:
torch.distributed.rpc.init_rpc("worker0", backend="TENSORPIPE", rank=0, world_size=1)

Quantization

Disabled the support for getitem in FX Graph Mode Quantization (#66647)

getitem used to be quantized in FX Graph Mode Quantization, and it is no longer quantized. This won’t break any models but could result in a slight difference in numerics.

1.10.2

from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)
    def forward(self, x):
        x = self.linear(x)
        y = torch.stack([x], 0)
        return y[0]
m = M().eval()
m = prepare_fx(m, {"": torch.ao.quantization.default_qconfig})
m = convert_fx(m)
print(m)
# prints
# GraphModule(
#   (linear): QuantizedLinear(in_features=5, out_features=5,
#      scale=1.0, zero_point=0, qscheme=torch.per_tensor_affine)
# )
# def forward(self, x):
#     linear_input_scale_0 = self.linear_input_scale_0
#     linear_input_zero_point_0 = self.linear_input_zero_point_0
#     quantize_per_tensor = torch.quantize_per_tensor(x,
#         linear_input_scale_0, linear_input_zero_point_0, torch.quint8)
#     x = linear_input_scale_0 = linear_input_zero_point_0 = None
#     linear = self.linear(quantize_per_tensor)
#     quantize_per_tensor = None
#     stack = torch.stack([linear], 0);  linear = None
#     getitem = stack[0]; stack = None
#     dequantize_2 = getitem.dequantize();  getitem = None
#     return getitem

1.11.0

from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)
    def forward(self, x):
        x = self.linear(x)
        y = torch.stack([x], 0)
        return y[0]
m = M().eval()
m = prepare_fx(m, {"": torch.ao.quantization.default_qconfig})
m = convert_fx(m)
print(m)
# prints
# GraphModule(
#   (linear): QuantizedLinear(in_features=5, out_features=5, scale=1.0,
#                    zero_point=0, qscheme=torch.per_tensor_affine)
# )
# def forward(self, x):
#     linear_input_scale_0 = self.linear_input_scale_0
#     linear_input_zero_point_0 = self.linear_input_zero_point_0
#     quantize_per_tensor = tor...
Read more

PyTorch 1.10.2 Release, small bug fix release

27 Jan 21:51
71f889c

This release is meant to deploy additional fixes not included in 1.10.1 release:

  • fix pybind issue for get_autocast_cpu_dtype and get_autocast_gpu_dtype #66396
  • Remove fgrad_input from slow_conv2d #64280
  • fix formatting CIRCLE_TAG when building docs #67026

PyTorch 1.10.1 Release, small bug fix release

15 Dec 22:27
302ee7b

This release is meant to fix the following issues (regressions / silent correctness):

  • torch.nn.cross_entropy silently incorrect in PyTorch 1.10 on CUDA on non-contiguous inputs #67167
  • channels_last significantly degrades accuracy #67239
  • Potential strict aliasing rule violation in bitwise_binary_op (on ARM/NEON) #66119
  • torch.get_autocast_cpu_dtype() returns a new dtype #65786
  • Conv2d grad bias gets wrong value for bfloat16 case #68048

The release tracker should contain all relevant pull requests related to this release as well as links to related issues

PyTorch 1.10 Release, including CUDA Graphs APIs, Frontend and compiler improvements

21 Oct 15:49
36449ea

1.10.0 Release Notes

  • Highlights
  • Backwards Incompatible Change
  • New Features
  • Improvements
  • Performance
  • Documentation

Highlights

We are excited to announce the release of PyTorch 1.10. This release is composed of over 3,400 commits since 1.9, made by 426 contributors. We want to sincerely thank our community for continuously improving PyTorch.

PyTorch 1.10 updates are focused on improving training and performance of PyTorch, and developer usability. Highlights include:

  • CUDA Graphs APIs are integrated to reduce CPU overheads for CUDA workloads (a short capture-and-replay sketch follows this list).
  • Several frontend APIs such as FX, torch.special, and nn.Module Parametrization, have moved from beta to stable.
  • Support for automatic fusion in JIT Compiler expands to CPUs in addition to GPUs.
  • Android NNAPI support is now available in beta.
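
For the CUDA Graphs item above, a minimal capture-and-replay sketch following the documented warmup-then-capture recipe; it requires a CUDA device and the elementwise workload is only a placeholder:

>>> import torch
>>> static_in = torch.randn(8, 8, device="cuda")
# Warm up on a side stream before capture, as the CUDA Graphs docs recommend
>>> s = torch.cuda.Stream()
>>> s.wait_stream(torch.cuda.current_stream())
>>> with torch.cuda.stream(s):
...     static_out = static_in * 2
>>> torch.cuda.current_stream().wait_stream(s)
# Capture the work into a graph, then replay it against whatever data is in static_in
>>> g = torch.cuda.CUDAGraph()
>>> with torch.cuda.graph(g):
...     static_out = static_in * 2
>>> static_in.copy_(torch.ones(8, 8, device="cuda"))
>>> g.replay()  # static_out now holds the result for the new contents of static_in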

You can check the blogpost that shows the new features here.

Backwards Incompatible changes

Python API

torch.any/torch.all behavior changed slightly to be more consistent for zero-dimension, uint8 tensors. (#64642)

These two functions now match the behavior of NumPy, returning an output dtype of bool for all supported dtypes, except for uint8 (in which case they return a 1 or a 0, but with uint8 dtype). In some cases with 0-dim tensor inputs, the returned uint8 value could mistakenly take on a value > 1. This has now been fixed.

1.9.1

>>> torch.all(torch.tensor(42, dtype=torch.uint8))
tensor(1, dtype=torch.uint8)
>>> torch.all(torch.tensor(42, dtype=torch.uint8), dim=0)
tensor(42, dtype=torch.uint8) # wrong, old behavior

1.10.0

>>> torch.all(torch.tensor(42, dtype=torch.uint8))
tensor(1, dtype=torch.uint8)
>>> torch.all(torch.tensor(42, dtype=torch.uint8), dim=0)
tensor(1, dtype=torch.uint8) # new, corrected and consistent behavior

Remove deprecated torch.{is,set}_deterministic (#62158)

This is the end of the deprecation cycle for both of these functions. You should be using torch.use_deterministic_algorithms and torch.are_deterministic_algorithms_enabled instead.
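
A minimal migration sketch:

# Removed in 1.10:
#   torch.set_deterministic(True); torch.is_deterministic()
# Replacement:
>>> torch.use_deterministic_algorithms(True)
>>> torch.are_deterministic_algorithms_enabled()
True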

Complex Numbers

Conjugate View: tensor.conj() now returns a view tensor that aliases the same memory and has conjugate bit set (#54987, #60522, #66082, #63602).

This means that .conj() is now an O(1) operation and returns a tensor that views the same memory as tensor and has conjugate bit set. This notion of conjugate bit enables fusion of operations with conjugation which gives a lot of performance benefit for operations like matrix multiplication. All out-of-place operations will have the same behavior as before, but an in-place operation on a conjugated tensor will additionally modify the input tensor.

1.9.1

>>> import torch
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.add_(2)
>>> print(x)
tensor([1.+2.j])

1.10.0

>>> import torch
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.add_(2)
>>> print(x)
tensor([3.+2.j])

Note: You can verify if the conj bit is set by calling tensor.is_conj(). The conjugation can be resolved, i.e., you can obtain a new tensor that doesn’t share storage with the input tensor at any time by calling conjugated_tensor.clone() or conjugated_tensor.resolve_conj() .

Note that these conjugated tensors behave differently from the corresponding numpy arrays obtained from np.conj() when an in-place operation is performed on them (similar to the example shown above).
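
A short sketch of checking and resolving the conjugate bit, as described in the note above:

>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.is_conj()
True
# resolve_conj() materializes a tensor with its own storage and no conjugate bit set
>>> z = y.resolve_conj()
>>> z.is_conj()
False
>>> z
tensor([1.-2.j])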

Negative View: tensor.conj().neg() returns a view tensor that aliases the same memory as both tensor and tensor.conj() and has a negative bit set (#56058).

conjugated_tensor.neg() continues to be an O(1) operation, but the returned tensor shares memory with both tensor and conjugated_tensor.

1.9.1

>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> z = y.imag
>>> z.add_(2)
>>> print(x)
tensor([1.+2.j])

1.10.0

>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> z = y.imag
>>> print(z.is_neg())
True
>>> z.add_(2)
>>> print(x)
tensor([1.-0.j])

tensor.numpy() now throws RuntimeError when called on a tensor with conjugate or negative bit set (#61925).

Because the notion of conjugate bit and negative bit doesn’t exist outside of PyTorch, calling operations that return a Python object viewing the same memory as input like .numpy() would no longer work for tensors with conjugate or negative bit set.

1.9.1

>>> x = torch.tensor([1+2j])
>>> y = x.conj().imag
>>> print(y.numpy())
[2.]

1.10.0

>>> x = torch.tensor([1+2j])
>>> y = x.conj().imag
>>> print(y.numpy())
RuntimeError: Can't call numpy() on Tensor that has negative
bit set. Use tensor.resolve_neg().numpy() instead.

Autograd

Raise TypeError instead of RuntimeError when assigning to a Tensor’s grad field with wrong type (#64876)

Setting the .grad field with a non-None and non-Tensor object used to raise a RuntimeError, but it now properly raises a TypeError. If your code was catching this error, you should simply update it to catch a TypeError instead of a RuntimeError.

1.9.1

try:
    # Assigning an int to a Tensor's grad field
    a.grad = 0
except RuntimeError as e:
    pass

1.10.0

try:
    a.grad = 0
except TypeError as e:
    pass

Raise error when inputs to autograd.grad are empty (#52016)

Calling autograd.grad with an empty list of inputs used to do the same as backward. To reduce confusion, it now raises the expected error. If you were relying on this, you can simply update your code as follows:

1.9.1

grad = autograd.grad(out, tuple())
assert grad == tuple()

1.10.0

out.backward()

Optional arguments to autograd.gradcheck and autograd.gradgradcheck are now kwarg-only (#65290)

These two functions now have a significant number of optional arguments controlling what they do (i.e., eps, atol, rtol, raise_exception, etc.). To improve readability, we made these arguments kwarg-only. If you are passing these arguments to autograd.gradcheck or autograd.gradgradcheck as positional arguments, you can update your code as follows:

1.9.1

torch.autograd.gradcheck(fn, x, 1e-6)

1.10.0

torch.autograd.gradcheck(fn, x, eps=1e-6)

In-place detach (detach_) now errors for views that return multiple outputs (#58285)

This change finishes the deprecation cycle for the inplace-over-view logic. In particular, a few things that previously only warned have been updated:

  • detach_ will now raise an error when invoked on any view created by split, split_with_sizes, or chunk. You should use the non-inplace detach instead.
  • The error message for when an in-place operation (that is not detach) is performed on a view created by split, split_with_sizes, or chunk has been changed from "This view is an output of a function..." to "This view is the output of a function...".

1.9.1

b = a.split(1)[0]
b.detach_()

1.10.0

b = a.split(1)[0]
c = b.detach()

Fix saved variable unpacking version counter (#60195)

In-place operations on the unpacked SavedVariables used to be ignored. They are now properly detected, which can lead to errors saying that a variable needed for backward was modified in-place.
This is a valid error and the ...

Read more

Small bug fix release

22 Sep 12:58
dfbd030

PyTorch 1.9.1 Release Notes

  • Improvements
  • Bug Fixes
  • Documentation

Improvements

  • Stop warning on .names() access in max_pool2d #60059
  • Remove Caffe2 thread-pool leak warning #60318
  • Add option to skip GitHub tag validation for torch.hub.load #62139
  • Use log.warning in torch.distributed.run to print OMP_NUM_THREADS warning #63953
  • TorchElastic: Pretty print the failure message captured by @record #64036
  • torch.distributed.run to set nproc_per_node to 1 by default #61552
  • Remove experimental API warning from torch.distributed.elastic.utils.store #60807
  • Deprecate use_env in torch.distributed.run #59409
  • Better engineering changes for torch.distributed launcher #59152

Bug fixes

Distributed / TorchElastic

  • Make init_method=tcp:// compatible with torch.distributed.run #63910
  • Fix default parameters (number of restarts, log level, number of processes per node) that regressed with the transition from torch.distributed.launch to torch.distributed.run, and clarify the documentation accordingly #61294

Hub

  • Fix HTTP/403 error when calling torch.hub.load for TorchVision models #62072

Misc

  • torch.mm to check input matrix shapes #61394

Documentation

  • Fix broken link in elastic launch doc #62378
  • Fix typo in torch.distributed.run warning message #61127

LTS 1.8.2, Wrap cub in its own namespace

17 Aug 18:33
e0495a7

PyTorch 1.8.2 Release Notes

  • Highlights
  • Bug Fixes

Highlights

We are excited to announce the release of PyTorch 1.8.2. This is the first release we are making as part of the PyTorch Enterprise Support Program. This release includes a bug fix requested by a customer in an LTS branch.
We'd like to thank Microsoft for their support and work on this release.

Bug Fixes

PyTorch 1.9 Release, including Torch.Linalg and Mobile Interpreter

15 Jun 16:06
d69c22d

PyTorch 1.9 Release Notes

  • Highlights
  • Backwards Incompatible Change
  • Deprecations
  • New Features
  • Improvements
  • Bug Fixes
  • Performance
  • Documentation

Highlights

We are excited to announce the release of PyTorch 1.9. The release is composed of more than 3,400 commits since 1.8, made by 398 contributors. Highlights include:

  • Major improvements to support scientific computing, including torch.linalg, torch.special, and Complex Autograd
  • Major improvements in on-device binary size with Mobile Interpreter
  • Native support for elastic, fault-tolerant training through the upstreaming of TorchElastic into PyTorch Core
  • Major updates to the PyTorch RPC framework to support large scale distributed training with GPU support
  • New APIs to optimize performance and packaging for model inference deployment
  • Support for Distributed training, GPU utilization and SM efficiency in the PyTorch Profiler

We’d like to thank the community for their support and work on this latest release. We’d especially like to thank Quansight and Microsoft for their contributions.

You can find more details on all the highlighted features in the PyTorch 1.9 Release blogpost.

Backwards Incompatible changes

Python API

  • torch.divide with rounding_mode='floor' now returns infinity when a non-zero number is divided by zero (#56893).
    This fixes the rounding_mode='floor' behavior to return the same non-finite values as other rounding modes when there is a division by zero. Previously it would always result in a NaN value, but a non-zero number divided by zero should return +/- infinity in IEEE floating point arithmetic. Note this does not affect torch.floor_divide or the floor division operator, which currently use rounding_mode='trunc' (and are also deprecated for that reason).

1.8.1

>>> a = torch.tensor([-1.0, 0.0, 1.0])
>>> b = torch.tensor([0.0])
>>> torch.divide(a, b, rounding_mode='floor')
tensor([nan, nan, nan])

1.9.0

>>> a = torch.tensor([-1.0, 0.0, 1.0])
>>> b = torch.tensor([0.0])
>>> torch.divide(a, b, rounding_mode='floor')
tensor([-inf, nan, inf])

  • Legacy tensor constructors and Tensor.new no longer support passing both Tensor and device as inputs (#58108).
    This fixes a bug in which 1-element integer tensors were misinterpreted as specifying tensor size, yielding an uninitialized tensor. As noted in the error message, use the new-style torch.tensor(...) or torch.as_tensor(...) to copy or alias an existing tensor. If you want to create an uninitialized tensor, use torch.empty(...).

1.8.1

>>> a = torch.tensor([1])
>>> torch.LongTensor(a, device='cpu') # uninitialized
tensor([7022349217739848992])
>>> a.new(a, device='cpu')
tensor([4294967295]) # uninitialized

1.9.0

>>> a = torch.tensor([1])
>>> torch.LongTensor(a, device='cpu')
RuntimeError: Legacy tensor constructor of the form torch.Tensor(tensor, device=device) is
not supported. Use torch.tensor(...) or torch.as_tensor(...) instead.
>>> a.new(a, device='cpu')
RuntimeError: Legacy tensor new of the form tensor.new(tensor, device=device) is not
supported. Use torch.as_tensor(...) instead.

  • torch.divide with rounding_mode='true' is replaced with rounding_mode=None (#51988).
    torch.divide's undocumented rounding_mode='true' option has been removed, and instead rounding_mode=None should be passed to indicate no rounding should take place. This is equivalent to omitting the argument entirely.

1.8.1

>>> a, b = torch.full((2,), 4.2), torch.full((2,), 2)
>>> torch.divide(a, b, rounding_mode='true')
tensor([2.1000, 2.1000])

1.9.0

>>> a, b = torch.full((2,), 4.2), torch.full((2,), 2)
>>> torch.divide(a, b, rounding_mode=None) # equivalent to torch.divide(a, b, rounding_mode='true') from the prior release
tensor([2.1000, 2.1000])

  • import torch.tensor as tensor is no longer supported (#53424).
    Instead, use from torch import tensor

1.8.1

>>> import torch.tensor as tensor
>>> torch.tensor(1.)
tensor(1.)

1.9.0

>>> import torch.tensor as tensor
ModuleNotFoundError: No module named 'torch.tensor'
>>> from torch import tensor
>>> tensor(1.)
tensor(1.)

  • binary release: numpy is no longer a required dependency
    If you require numpy (and don't already have it installed) you will need to install it separately.

Autograd

  • torch.autograd.gradcheck.get_numerical_jacobian and torch.autograd.gradcheck.get_analytical_jacobian no longer support functions that return complex valued output as well as any other values of grad_out not equal to 1 (#55692).
    This change is a part of a refactor of gradcheck’s internals. Note that gradcheck itself still supports functions with complex output. This new restriction only applies to calls to the two internal helper functions. As a workaround, you can wrap your functions to return either the real or imaginary component of its output before calling these functions. Additionally these internal helpers no longer accept any other value except 1 for grad_out for any input function. Note that these helper functions are also being deprecated in this release.

1.8.1:

get_numerical_jacobian(torch.complex, (a, b), grad_out=2.0)

1.9.0:

def wrapped(fn):
    def wrapper(*input):
        return torch.real(fn(*input))
    return wrapper

get_numerical_jacobian(wrapped(torch.complex), (a, b), grad_out=1.0)

  • torch.autograd.gradcheck now throws GradcheckError (#55656).
    This change is a part of a refactor of gradcheck’s internals. All errors that are able to be silenced by raise_exception=False now raise GradcheckError (which inherits from RuntimeError). If you explicitly check that the type of the error is RuntimeError you'll need to update your code to check for GradcheckError instead. Otherwise if you use something like except or isinstance, no changes are necessary.

1.8.1:

# An example of a situation that will now return GradcheckError instead of
# RuntimeError is when there is a jacobian mismatch, which can happen
# for example when you forget to specify float64 for your inputs.
try:
    torch.autograd.gradcheck(torch.sin, (torch.ones(1, requires_grad=True),))
except RuntimeError as e:
    assert type(e) is RuntimeError # explicitly check type -> NEEDS UPDATE

1.9.0:

try:
    torch.autograd.gradcheck(torch.sin, (torch.ones(1, requires_grad=True),))
except RuntimeError as e:
   # GradcheckError inherits from RuntimeError so you can still catch this
   # with RuntimeError (No change necessary!)
   
   # BUT, if you explicitly check type...
   assert type(e) is torch.autograd.GradcheckError
  • Finished deprecation cycle for in-place view error checks (#56093).
    In-place modification of views will now raise an error if that view was created by a custom function or a function that returns multiple views, or if the view was created in no-grad mode. Modifying in-place a view created in the situations above are error-prone and have been deprecated since v1.5.0. Doing these in-place modifications are now forbidden. For more information on how to work around this, see the related sections the release notes linked below:
    • v1.5.0 (view created in custom autograd function, view created in no-grad block)
    • v1.7.0 (section on split and chunk, i.e., functions that return multiple views).
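
A minimal illustration of the now-forbidden pattern (chunk and split return multiple views, so modifying one of them in place errors once autograd is involved; the tensor here is just illustrative):

>>> a = torch.randn(4, requires_grad=True).clone()  # non-leaf tensor tracked by autograd
>>> b = a.chunk(2)[0]                                # chunk returns multiple views of a
>>> b.add_(1)  # warned in earlier releases; raises a RuntimeError in 1.9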

torch.nn

  • Fixed regression for nn.MultiheadAttention to now apply bias flag to both in and out projection layers (#52537).
    In PyTorch 1.6, a regression was introduced that caused the bias flag of nn.MultiheadAttention only to apply to the input projection layer. This caused the output projection layer to always include a bias parameter, even with bias=False specified. The regression is now fixed in PyTorch 1.9, making the bias flag correctly apply to both the input and output projection layers. This fix is BC-breaking for the bias=False case as it will now result in no bias parameter for the output projection layer.

v1.6 - v1.8.1 / pre-1.6 & 1.9.0
>>> mha = torch.nn.MultiheadAttenti...
Read more