
Releases: pytorch/pytorch

PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips are now available

28 Oct 16:54
7c98e70

PyTorch 1.13 Release Notes

  • Highlights
  • Backwards Incompatible Changes
  • New Features
  • Improvements
  • Performance
  • Documentation
  • Developers

Highlights

We are excited to announce the release of PyTorch 1.13! This release includes stable versions of BetterTransformer. We deprecated CUDA 10.2 and 11.3 and completed migration to CUDA 11.6 and 11.7. Beta features include improved support for Apple M1 chips and functorch, a library that offers composable vmap (vectorization) and autodiff transforms, which is now included in-tree with the PyTorch release. This release is composed of over 3,749 commits from 467 contributors since 1.12.1. We want to sincerely thank our dedicated community for your contributions.

Summary:

  • The BetterTransformer feature set supports fastpath execution for common Transformer models during inference out-of-the-box, without the need to modify the model. Additional improvements include accelerated add+matmul linear algebra kernels for sizes commonly used in Transformer models, and Nested Tensor support is now enabled by default.

  • Timely deprecation of older CUDA versions allows us to introduce the latest CUDA versions as they are released by Nvidia®, and hence enables support for C++17 in PyTorch and the new NVIDIA Open GPU Kernel Modules.

  • Previously, functorch was released out-of-tree in a separate package. After installing PyTorch, users can now import and use functorch without needing to install another package (a short usage sketch follows this list).

  • PyTorch is offering native builds for Apple® silicon machines that use Apple's new M1 chip as a beta feature, providing improved support across PyTorch's APIs.
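
Since functorch now ships in-tree, a minimal sketch of what importing it looks like after installing only PyTorch; vmap and grad are two of the library's composable transforms, and the shapes below are just illustrative:

>>> import torch
>>> from functorch import vmap, grad
# vmap maps torch.dot over the leading (batch) dimension of both inputs
>>> vmap(torch.dot)(torch.randn(4, 3), torch.randn(4, 3)).shape
torch.Size([4])
# grad returns a new function that computes the gradient of a scalar-valued function
>>> g = grad(lambda x: (x ** 2).sum())
>>> torch.allclose(g(torch.ones(3)), 2 * torch.ones(3))
True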

Stable:
  • Better Transformer
  • CUDA 10.2 and 11.3 CI/CD Deprecation

Beta:
  • Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs
  • Extend NNC to support channels last and bf16
  • Functorch now in PyTorch Core Library
  • Beta Support for M1 devices

Prototype:
  • Arm® Compute Library backend support for AWS Graviton
  • CUDA Sanitizer

You can check the blogpost that shows the new features here.

Backwards Incompatible changes

Python API

uint8 and all integer dtype masks are no longer allowed in Transformer (#87106)

Prior to 1.13, key_padding_mask could be set to uint8 or other integer dtypes in TransformerEncoder and MultiheadAttention, which might generate unexpected results. In this release, these dtypes are not allowed for the mask anymore. Please convert them to torch.bool before using.

1.12.1

>>> layer = nn.TransformerEncoderLayer(2, 4, 2)
>>> encoder = nn.TransformerEncoder(layer, 2)
>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.uint8)
>>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
# works before 1.13
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)

1.13

>>> layer = nn.TransformerEncoderLayer(2, 4, 2)
>>> encoder = nn.TransformerEncoder(layer, 2)
>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.bool)
>>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)
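
If you already have an integer mask from older code, converting it at the call site is enough; a minimal sketch reusing the encoder and inputs from the example above:

>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.uint8)  # legacy integer mask
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask.to(torch.bool))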

Updated torch.floor_divide to perform floor division (#78411)

Prior to 1.13, torch.floor_divide erroneously performed truncation division (i.e. truncated the quotients). In this release, it has been fixed to perform floor division. To replicate the old behavior, use torch.div with rounding_mode='trunc'.

1.12.1

>>> a = torch.tensor([4.0, -3.0])
>>> b = torch.tensor([2.0, 2.0])
>>> torch.floor_divide(a, b)
tensor([ 2., -1.])

1.13

>>> a = torch.tensor([4.0, -3.0])
>>> b = torch.tensor([2.0, 2.0])
>>> torch.floor_divide(a, b)
tensor([ 2., -2.])
# Old behavior can be replicated using torch.div with rounding_mode='trunc'
>>> torch.div(a, b, rounding_mode='trunc')
tensor([ 2., -1.])

Fixed torch.index_select on CPU to raise an out-of-bounds error when the index is out of bounds and the source tensor is empty (#77881)

Prior to 1.13, torch.index_select on CPU would return an appropriately sized tensor filled with random values if the source tensor was empty. In this release, we have fixed this bug so that it errors out. A consequence of this is that torch.nn.Embedding, which utilizes index_select, will error out rather than returning an empty tensor when embedding_dim=0 and the input contains out-of-bounds indices. The old behavior cannot be reproduced with torch.nn.Embedding; however, since an Embedding layer with embedding_dim=0 is a corner case, this behavior is unlikely to be relied upon.

1.12.1

>>> t = torch.tensor([4], dtype=torch.long)
>>> embedding = torch.nn.Embedding(3, 0)
>>> embedding(t)
tensor([], size=(1, 0), grad_fn=<EmbeddingBackward0>)

1.13

>>> t = torch.tensor([4], dtype=torch.long)
>>> embedding = torch.nn.Embedding(3, 0)
>>> embedding(t)
RuntimeError: INDICES element is out of DATA bounds, id=4 axis_dim=3

Disallow overflows when tensors are constructed from scalars (#82329)

Prior to 1.13, overflows during tensor construction from scalars did not throw an error. In 1.13, such cases will error.

1.12.1

>>> torch.tensor(1000, dtype=torch.int8)
tensor(-24, dtype=torch.int8)

1.13

>>> torch.tensor(1000, dtype=torch.int8)
RuntimeError: value cannot be converted to type int8 without overflow

Error on indexing a cpu tensor with non-cpu indices (#69607)

Prior to 1.13, cpu_tensor[cuda_indices] was a valid program that would return a cpu tensor. The original use case for mixed device indexing was for non_cpu_tensor[cpu_indices], and allowing the opposite was unintentional (cpu_tensor[non_cpu_indices]). This behavior appears to be rarely used, and a refactor of our indexing kernels made it difficult to represent an op that takes in (cpu_tensor, non_cpu_tensor) and returns another cpu_tensor, so it is now an error.

To replicate the old behavior for base[indices], you can ensure that either indices lives on the CPU device, or base and indices both live on the same device.

1.12.1

>>> a = torch.tensor([1.0, 2.0, 3.0])
>>> b = torch.tensor([0, 2], device='cuda')
>>> a[b]
tensor([1., 3.])

1.13

>>> a = torch.tensor([1.0, 2.0, 3.0])
>>> b = torch.tensor([0, 2], device='cuda')
>>> a[b]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
# Old behavior can be replicated by moving b to CPU, or a to CUDA
>>> a[b.cpu()]
tensor([1., 3.])
>>> a.cuda()[b]
tensor([1., 3.], device='cuda:0')

Remove deprecated torch.eig, torch.matrix_rank, torch.lstsq (#70982, #70981, #70980)

The deprecation cycle for the above functions has been completed and they have been removed in the 1.13 release.
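
A minimal migration sketch; the torch.linalg functions below are the replacements pointed to by the deprecation warnings (note that torch.linalg.lstsq takes its arguments in the opposite order from the removed torch.lstsq, and the inputs here are just illustrative):

>>> A = torch.randn(3, 3)
>>> b = torch.randn(3, 2)
# torch.eig(A, eigenvectors=True)  ->  torch.linalg.eig(A)
>>> eigenvalues, eigenvectors = torch.linalg.eig(A)
# torch.matrix_rank(A)  ->  torch.linalg.matrix_rank(A)
>>> rank = torch.linalg.matrix_rank(A)
# torch.lstsq(b, A)  ->  torch.linalg.lstsq(A, b)
>>> solution = torch.linalg.lstsq(A, b).solution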

torch.nn

Enforce that the bias has the same dtype as input and weight for convolutions on CPU (#83686)

To align with the implementation on other devices, the CPU implementation for convolutions was updated to enforce that the dtype of the bias matches the dtype of the input and weight.

1.12.1

# input and weight are dtype torch.int64
# bias is torch.float32
>>> out = torch.nn.functional.conv2d(input, weight, bias, ...)

1.13

# input and weight are dtype torch.int64
# bias is torch.float32
>>> out = torch.nn.functional.conv2d(input, weight, bias, ...)  # now raises a RuntimeError

# Updated code to avoid the error
>>> out = torch.nn.functional.conv2d(input, weight, bias.to(input.dtype), ...)

Autograd

Disallow setting the .data of a tensor that requires_grad=True with an integer tensor (#78436)

Setting the .data of a tensor that requires_grad with an integer tensor now raises an error.

1.12.1

>>> x = torch.randn(2, requires_grad=True)
>>> x.data = torch.randint(1, (2,))
>>> x
tensor([0, 0], requires_grad=True)

1.13

>>> x = torch.randn(2, requires_grad=True)
>>> x.data = torch.randint(1, (2,))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: data set to a tensor that requires gradients must be floating point or complex dtype

Added variable_list support to ExtractVariables struct (#84583)

Prior to this change, C++ custom autograd Functions did not treat tensors passed in a TensorList as tensors for the purposes of recording the backward graph. After this change, custom Functions that receive a TensorList must modify their backward functions to also compute gradients for these additional tensor inputs. Note that this behavior now differs from that of custom autograd Functions in Python.

1.12.1

struct MyFunction : public Function<MyFunction> {
    static Variable forward(AutogradContext* ctx, at::Tensor t, at::TensorList tensors) {
      return 2 * tensors[0] + 3 * t;
    }

    static variable_list backward(
        AutogradContext* ctx,
        variable_list grad_output) {
      return {3 * grad_output[0]};
    }
};

1.13

struct MyFunction : public Function<MyFunction> {
    static Variable forward(AutogradContext* ctx, at::Tensor t, at::TensorList tensors) {
      return 2 * tensors[0] + 3 * t;
    }

    static variable_list backward(
        AutogradContext* ctx,
        variable_list grad_output) {
      return {3 * grad_output[0], 2 * grad_output[0]};
    }
};

Don't detach when making views; force kernel to detach (#84893)

View operations registered as CompositeExplicitAutograd kernels are no longer allowed to return input tensors as-is. You must explic...

Read more

PyTorch 1.12.1 Release, small bug fix release

05 Aug 19:35
664058f

This release is meant to fix the following issues (regressions / silent correctness):

Optim

  • Remove overly restrictive assert in adam #80222

Autograd

  • Convolution forward over reverse internal asserts in specific case #81111
  • 25% Performance regression from v0.1.1 to 0.2.0 when calculating hessian #82504

Distributed

  • Fix distributed store to use add for the counter of DL shared seed #80348
  • Raise proper timeout when sharing the distributed shared seed #81666

NN

  • Allow register float16 weight_norm on cpu and speed up test #80600
  • Fix weight norm backward bug on CPU when OMP_NUM_THREADS <= 2 #80930
  • Weight_norm is not working with float16 #80599
  • New release breaks torch.nn.weight_norm backwards pass and breaks all Wav2Vec2 implementations #80569
  • Disable src mask for transformer and multiheadattention fastpath #81277
  • Make nn.stateless correctly reset parameters if the forward pass fails #81262
  • torchvision.transforms.functional.rgb_to_grayscale() + torch.nn.Conv2d() don't work on 1080 GPU #81106
  • Transformer and CPU path with src_mask raises error with torch 1.12 #81129

Data Loader

  • Locking lower ranks seed recipients #81071

CUDA

  • os.environ["CUDA_VISIBLE_DEVICES"] has no effect #80876
  • share_memory() on CUDA tensors no longer no-ops and instead crashes #80733
  • [Prims] Unbreak CUDA lazy init #80899
  • PyTorch 1.12 cu113 wheels cudnn discoverability issue #80637
  • Remove overly restrictive checks for cudagraph #80881

ONNX

MPS

Other

  • Don't error if _warned_capturable_if_run_uncaptured not set #80345
  • Initializing libiomp5.dylib, but found libomp.dylib already initialized. #78490
  • Assertion error - _dl_shared_seed_recv_cnt - pt 1.12 - multi node #80845
  • Add 3.10 stdlib to torch.package #81261
  • CPU-only c++ extension libraries (functorch, torchtext) built against PyTorch wheels are not fully compatible with PyTorch wheels #80489

PyTorch 1.12: TorchArrow, Functional API for Modules and nvFuser, are now available

28 Jun 16:48
67ece03

PyTorch 1.12 Release Notes

  • Highlights
  • Backwards Incompatible Change
  • New Features
  • Improvements
  • Performance
  • Documentation

Highlights

We are excited to announce the release of PyTorch 1.12! This release is composed of over 3,124 commits, made by 433 contributors. Along with 1.12, we are releasing beta versions of AWS S3 Integration, PyTorch Vision Models on Channels Last on CPU, Empowering PyTorch on Intel® Xeon® Scalable processors with Bfloat16, and the FSDP API. We want to sincerely thank our dedicated community for your contributions.

Summary:

  • Functional Module API to functionally apply module computation with a given set of parameters (a short sketch follows this list)
  • Complex32 and Complex Convolutions in PyTorch
  • DataPipes from TorchData fully backward compatible with DataLoader
  • Functorch with improved coverage for APIs
  • nvFuser a deep learning compiler for PyTorch
  • Changes to float32 matrix multiplication precision on Ampere and later CUDA hardware
  • TorchArrow, a new beta library for machine learning preprocessing over batch data
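
For the functional module API in the first bullet above, a minimal sketch using torch.nn.utils.stateless.functional_call; the module, parameter values, and shapes are illustrative:

>>> import torch
>>> from torch.nn.utils import stateless
>>> m = torch.nn.Linear(3, 3)
>>> params = {"weight": torch.ones(3, 3), "bias": torch.zeros(3)}
>>> x = torch.randn(1, 3)
# Runs m's forward pass with the supplied parameters instead of m's own
>>> out = stateless.functional_call(m, params, (x,))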

Backwards Incompatible changes

Python API

Updated type promotion for torch.clamp (#77035)

In 1.11, the ‘min’ and ‘max’ arguments in torch.clamp did not participate in type promotion, which made it inconsistent with minimum and maximum operations. In 1.12, the ‘min’ and ‘max’ arguments participate in type promotion.

1.11

>>> import torch
>>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
>>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
>>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)
>>> torch.clamp(a, b, c).dtype
torch.float32

1.12

>>> import torch
>>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
>>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
>>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)
>>> torch.clamp(a, b, c).dtype
torch.float64

Complex Numbers

Fix complex type promotion (#77524)

Updates the type promotion rule so that, given a complex scalar and a real tensor, the value type of the real tensor is preserved.

1.11

>>> a = torch.randn((2, 2), dtype=torch.float)
>>> b = torch.tensor(1, dtype=torch.cdouble)
>>> (a + b).dtype
torch.complex128

1.12

>>> a = torch.randn((2, 2), dtype=torch.float)
>>> b = torch.tensor(1, dtype=torch.cdouble)
>>> (a + b).dtype
torch.complex64

LinAlg

Disable TF32 for matmul by default and add high-level control of fp32 matmul precision (#76509)

PyTorch 1.12 makes the default math mode for fp32 matrix multiplications more precise and consistent across hardware. This may affect users on Ampere or later CUDA devices and TPUs. See the PyTorch blog for more details.
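
The high-level control is exposed as torch.set_float32_matmul_precision; a minimal sketch of inspecting the new default and opting back into lower-precision (e.g. TF32) matmul math on supported hardware:

>>> import torch
>>> torch.get_float32_matmul_precision()  # new default in 1.12
'highest'
>>> torch.set_float32_matmul_precision("high")  # allow TF32-like math for float32 matmuls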

Sparse

Use ScatterGatherKernel for scatter_reduce (CPU-only) (#74226, #74608)

In 1.11.0, unlike scatter which takes a reduce kwarg or scatter_add, scatter_reduce was not an in-place function. That is, it did not allow the user to pass an output tensor which contains data that is reduced together with the scattered data. Instead, the scatter reduction took place on an output tensor initialized under the hood. Indices of the output that were not scattered to were filled with reduction inits (or 0 for options ‘amin’ and ‘amax’).

In 1.12.0, scatter_reduce (which is in beta) is in-place to align with the API of the related existing functions scatter/scatter_add. For this reason, the argument input in 1.11.0 has been renamed src in 1.12.0 and the new self argument now takes a destination tensor to be scattered onto. Since the destination tensor is no longer initialized under the hood, the output_size kwarg in 1.11.0 that allowed users to specify the size of the output at dimension dim has been removed. Further, in 1.12.0 we introduce an include_self kwarg which determines whether values in the self (destination) tensor are included in the reduction. Setting include_self=True could, for example, allow users to provide special reduction inits for the scatter_reduction operation. Otherwise, if include_self=False, indices scattered to are treated as if they were filled with reduction inits.

In the snippet below, we illustrate how the behavior of scatter_reduce in 1.11.0 can be achieved with the function released in 1.12.0.

Example:

>>> src = torch.arange(6, dtype=torch.float).reshape(3, 2)
>>> index = torch.tensor([[0, 2], [1, 1], [0, 0]])
>>> dim = 1
>>> output_size = 4
>>> reduce = "prod"

1.11

>>> torch.scatter_reduce(src, dim, index, reduce, output_size=output_size)
tensor([[ 0., 1., 1., 1.],
        [ 1., 6., 1., 1.],
        [20., 1., 1., 1.]])

1.12

>>> output_shape = list(src.shape)
>>> output_shape[dim] = output_size
# reduction init for prod is 1
# filling the output with 1 is only necessary if the user wants to preserve the behavior in 1.11
# where indices not scattered to are filled with reduction inits
>>> output = src.new_empty(output_shape).fill_(1)
>>> output.scatter_reduce_(dim, index, src, reduce)
tensor([[ 0., 1., 1., 1.],
        [ 1., 6., 1., 1.],
        [20., 1., 1., 1.]])

torch.nn

nn.GroupNorm: Report an error if num_channels is not divisible by num_groups (#74293)

Previously, nn.GroupNorm would error out during the forward pass if num_channels is not divisible by num_groups. Now, the error is thrown for this case during module construction instead.

1.11

m = torch.nn.GroupNorm(3, 7)
m(...)  # errors during forward pass

1.12

m = torch.nn.GroupNorm(3, 7)  # errors during construction

nn.Dropout2d: Return to 1.10 behavior: perform 1D channel-wise dropout for 3D inputs

In PyTorch 1.10 and older, passing a 3D input to nn.Dropout2d resulted in 1D channel-wise dropout behavior; i.e., such inputs were interpreted as having shape (N, C, L) with N = batch size and C = # channels, and channel-wise dropout was performed along the second dimension.

1.10

x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x)  # input is assumed to be shape (N, C, L); dropout along the second dim.

With the introduction of no-batch-dim input support in 1.11, 3D inputs were reinterpreted as having shape (C, H, W); i.e. an input without a batch dimension, and dropout behavior was changed to drop along the first dimension. This was a silent breaking change.

1.11

x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x)  # input is assumed to be shape (C, H, W); dropout along the first dim.

The breaking change in 1.11 resulted in a lack of support for 1D channel-wise dropout behavior, so nn.Dropout2d in PyTorch 1.12 returns to the 1.10 behavior and emits a warning, giving users time to adapt before the no-batch-dim interpretation goes back into effect.

1.12

x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x)  # input is assumed to be shape (N, C, L); dropout along the second dim.
            # throws a warning suggesting nn.Dropout1d for 1D channel-wise dropout.

If you want 1D channel-wise dropout behavior, please switch to use of the newly-added nn.Dropout1d module instead of nn.Dropout2d. If you want no-batch-dim input behavior, please note that while this is not supported in 1.12, a future release will reinstate the interpretation of 3D inputs to nn.Dropout2d as those without a batch dimension.
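
A minimal sketch of the recommended migration for 1D channel-wise dropout:

>>> x = torch.randn(2, 3, 4)       # interpreted as (N, C, L)
>>> m = torch.nn.Dropout1d(p=0.5)  # explicit 1D channel-wise dropout, no warning
>>> out = m(x)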

F.cosine_similarity: Improve numerical stability (#31378)

Previously, we first computed the inner product and then normalized. After this change, we first normalize and then compute the inner product. This should be more numerically stable because it avoids losing precision in the inner product for inputs with large norms. Because of this change, outputs may differ in some cases.
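
A small sketch of the order-of-operations change described above (conceptual only; the actual kernel also guards small norms with an eps parameter):

>>> x1, x2 = torch.randn(4, 8), torch.randn(4, 8)
# Old order (conceptually): inner product first, then divide by the product of norms
>>> old = (x1 * x2).sum(dim=1) / (x1.norm(dim=1) * x2.norm(dim=1))
# New order (conceptually): normalize each input first, then take the inner product
>>> new = (torch.nn.functional.normalize(x1, dim=1) * torch.nn.functional.normalize(x2, dim=1)).sum(dim=1)
# For well-conditioned inputs both orders agree; they differ only for extreme norms
>>> torch.allclose(old, new)
True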

Composability

Functions in torch.ops.aten.{foo} no longer accept self as a kwarg

torch.ops.aten.{foo} objects are now instances of OpOverloadPacket (instead of plain functions) whose __call__ method is implemented in Python, which means that you can no longer pass self as a keyword argument. You can pass it normally as a positional argument instead.

1.11

>>> torch.ops.aten.sin(self=torch.ones(2))
    tensor([0.8415, 0.8415])

1.12

# this now fails
>>> torch.ops.aten.sin(self=torch.ones(2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __call__() got multiple values for argument 'self'
# this works
>>> torch.ops.aten.sin(torch.ones(2))
tensor([0.8415, 0.8415])

__torch_dispatch__ now traces individual op overloads instead of op overload packets (#72673)

torch.ops.aten.add actually corresponds to a bundle of functions from C++, corresponding to all of the overloads of the add operator (specifically, add.Tensor, add.Scalar and add.out). Now, __torch_dispatch__ will directly take in an overload corresponding to a single aten function.

1.11

class MyTensor(torch.Tensor):
    ....
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Before, func refers to a "packet" of all overloads
        # for a given operator, e.g. "add"
        assert func == torch.ops.aten.add

1.12

class MyTensor(torch.Tensor):
    ....
    def __torch_dispatch__(cls, func, types, args=(), kwargs=No...
Read more

PyTorch 1.11, TorchData, and functorch are now available

10 Mar 16:59
bc2c6ed

PyTorch 1.11 Release Notes

  • Highlights
  • Backwards Incompatible Change
  • New Features
  • Improvements
  • Performance
  • Documentation

Highlights

We are excited to announce the release of PyTorch 1.11. This release is composed of over 3,300 commits since 1.10, made by 434 contributors. Along with 1.11, we are releasing beta versions of TorchData and functorch. We want to sincerely thank our community for continuously improving PyTorch.

  • TorchData is a new library for common modular data loading primitives for easily constructing flexible and performant data pipelines. View it on GitHub.
  • functorch, a library that adds composable function transforms to PyTorch, is now available in beta. View it on GitHub.
  • Distributed Data Parallel (DDP) static graph optimizations available in stable.

You can check the blogpost that shows the new features here.

Backwards Incompatible changes

Python API

Fixed python deepcopy to correctly copy all attributes on Tensor objects (#65584)

This change ensures that the deepcopy operation on Tensor properly copies all the attributes (and not just the plain Tensor properties).

1.10.2

a = torch.rand(2)
a.foo = 3
torch.save(a, "bar")
b = torch.load("bar")
print(b.foo)
# Raise AttributeError: "Tensor" object has no attribute "foo"

1.11.0

a = torch.rand(2)
a.foo = 3
torch.save(a, "bar")
b = torch.load("bar")
print(b.foo)
# 3

steps argument is no longer optional in torch.linspace and torch.logspace

This argument used to default to 100 in PyTorch 1.10.2, but was deprecated (previously you would see a deprecation warning if you didn’t explicitly pass in steps). In PyTorch 1.11, it is no longer optional.

1.10.2

# Works, but raises a deprecation warning
# Steps defaults to 100
a = torch.linspace(1, 10)
# UserWarning: Not providing a value for linspace's steps is deprecated
# and will throw a runtime error in a future release.
# This warning will appear only once per process.
# (Triggered internally at ../aten/src/ATen/native/RangeFactories.cpp:19)

1.11.0

# In 1.11, you must specify steps
a = torch.linspace(1, 10, steps=100)

Remove torch.hub.import_module function that was mistakenly public (#67990)

This function is not intended for public use.
If you have existing code that relies on it, you can find an equivalent function at torch.hub._import_module.

C++ API

We’ve cleaned up many of the headers in the C++ frontend to only include the subset of aten operators that they actually used (#68247, #68687, #68688, #68714, #68689, #68690, #68697, #68691, #68692, #68693, #69840)

When you #include a header from the C++ frontend, you can no longer assume that every aten operator is transitively included. You can work around this by directly adding #include <ATen/ATen.h> in your file, which will maintain the old behavior of including every aten operator.

Custom implementation for c10::List and c10::Dict move constructors have been removed (#69370)

The semantics have changed from "make the moved-from List/Dict empty" to "keep the moved-from List/Dict unchanged".

1.10.2

c10::List<std::string> list1({"3", "4"});
c10::List<std::string> list2(std::move(list1));
std::cout << list1.size(); // 0

1.11.0

c10::List<std::string> list1({"3", "4"});
c10::List<std::string> list2(std::move(list1)); // calls copy ctor
std::cout << list1.size(); // 2

CUDA

Removed THCeilDiv function and corresponding THC/THCDeviceUtils.cuh header (#65472)

As part of cleaning up TH from the codebase, the THCeilDiv function has been removed. Instead, please use at::ceil_div and include the corresponding ATen/ceil_div.h header.

Removed THCudaCheck (#66391)

You can replace it with C10_CUDA_CHECK, which has been available since at least PyTorch 1.4, so simply replacing it is enough even if you support older versions.

Removed THCudaMalloc(), THCudaFree(), THCThrustAllocator.cuh (#65492)

If your extension is using THCThrustAllocator.cuh, please replace it with ATen/cuda/ThrustAllocator.h and corresponding APIs (see examples in this PR).

This PR also removes THCudaMalloc/THCudaFree calls. Please use c10::cuda::CUDACachingAllocator::raw_alloc(size)/raw_delete(ptr), or, preferably, switch to c10::cuda::CUDACachingAllocator::allocate, which manages deallocation. Caching allocator APIs have been available since PyTorch 1.2, so simply replacing them is enough even if you support older versions of PyTorch.

Build

Stopped building shared library for AOT Compiler, libaot_compiler.so (#66227)

Building aot_compiler.cpp as a separate library is not necessary, as it’s already included in libtorch.so.
You can update your build system to only dynamically link libtorch.so.

Mobile

Make typing.Union type unsupported for mobile builds (#65556)

typing.Union support was added for TorchScript in 1.10. It was removed specifically for mobile due to its lack of use and increase in binary size of PyTorch for Mobile builds.

Distributed

torch.distributed.rpc: Final Removal of ProcessGroup RPC backend (#67363)

ProcessGroup RPC backend is deprecated. In 1.10, it threw an error to help users update their code, and, in 1.11, it is removed completely.

The backend type “PROCESS_GROUP” is now deprecated, e.g.
torch.distributed.rpc.init_rpc("worker0", backend="PROCESS_GROUP", rank=0, world_size=1)
and should be replaced with:
torch.distributed.rpc.init_rpc("worker0", backend="TENSORPIPE", rank=0, world_size=1)

Quantization

Disabled the support for getitem in FX Graph Mode Quantization (#66647)

getitem used to be quantized in FX Graph Mode Quantization, and it is no longer quantized. This won’t break any models but could result in a slight difference in numerics.

1.10.2

from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)
    def forward(self, x):
        x = self.linear(x)
        y = torch.stack([x], 0)
        return y[0]
m = M().eval()
m = prepare_fx(m, {"": torch.ao.quantization.default_qconfig})
m = convert_fx(m)
print(m)
# prints
# GraphModule(
#   (linear): QuantizedLinear(in_features=5, out_features=5,
#      scale=1.0, zero_point=0, qscheme=torch.per_tensor_affine)
# )
# def forward(self, x):
#     linear_input_scale_0 = self.linear_input_scale_0
#     linear_input_zero_point_0 = self.linear_input_zero_point_0
#     quantize_per_tensor = torch.quantize_per_tensor(x,
#         linear_input_scale_0, linear_input_zero_point_0, torch.quint8)
#     x = linear_input_scale_0 = linear_input_zero_point_0 = None
#     linear = self.linear(quantize_per_tensor)
#     quantize_per_tensor = None
#     stack = torch.stack([linear], 0);  linear = None
#     getitem = stack[0]; stack = None
#     dequantize_2 = getitem.dequantize();  getitem = None
#     return getitem

1.11.0

from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)
    def forward(self, x):
        x = self.linear(x)
        y = torch.stack([x], 0)
        return y[0]
m = M().eval()
m = prepare_fx(m, {"": torch.ao.quantization.default_qconfig})
m = convert_fx(m)
print(m)
# prints
# GraphModule(
#   (linear): QuantizedLinear(in_features=5, out_features=5, scale=1.0,
#                    zero_point=0, qscheme=torch.per_tensor_affine)
# )
# def forward(self, x):
#     linear_input_scale_0 = self.linear_input_scale_0
#     linear_input_zero_point_0 = self.linear_input_zero_point_0
#     quantize_per_tensor = tor...
Read more

PyTorch 1.10.2 Release, small bug fix release

27 Jan 21:51
71f889c

This release is meant to deploy additional fixes not included in 1.10.1 release:

  • fix pybind issue for get_autocast_cpu_dtype and get_autocast_gpu_dtype #66396
  • Remove fgrad_input from slow_conv2d #64280
  • fix formatting CIRCLE_TAG when building docs #67026

PyTorch 1.10.1 Release, small bug fix release

15 Dec 22:27
302ee7b

This release is meant to fix the following issues (regressions / silent correctness):

  • torch.nn.cross_entropy silently incorrect in PyTorch 1.10 on CUDA on non-contiguous inputs #67167
  • channels_last significantly degrades accuracy #67239
  • Potential strict aliasing rule violation in bitwise_binary_op (on ARM/NEON) #66119
  • torch.get_autocast_cpu_dtype() returns a new dtype #65786
  • Conv2d grad bias gets wrong value for bfloat16 case #68048

The release tracker should contain all relevant pull requests related to this release as well as links to related issues

PyTorch 1.10 Release, including CUDA Graphs APIs, Frontend and compiler improvements

21 Oct 15:49
36449ea

1.10.0 Release Notes

  • Highlights
  • Backwards Incompatible Change
  • New Features
  • Improvements
  • Performance
  • Documentation

Highlights

We are excited to announce the release of PyTorch 1.10. This release is composed of over 3,400 commits since 1.9, made by 426 contributors. We want to sincerely thank our community for continuously improving PyTorch.

PyTorch 1.10 updates are focused on improving training and performance of PyTorch, and developer usability. Highlights include:

  • CUDA Graphs APIs are integrated to reduce CPU overheads for CUDA workloads (a short capture-and-replay sketch follows this list).
  • Several frontend APIs such as FX, torch.special, and nn.Module Parametrization, have moved from beta to stable.
  • Support for automatic fusion in JIT Compiler expands to CPUs in addition to GPUs.
  • Android NNAPI support is now available in beta.
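
For the CUDA Graphs item above, a minimal capture-and-replay sketch following the documented warmup-then-capture recipe; it requires a CUDA device and the elementwise workload is only a placeholder:

>>> import torch
>>> static_in = torch.randn(8, 8, device="cuda")
# Warm up on a side stream before capture, as the CUDA Graphs docs recommend
>>> s = torch.cuda.Stream()
>>> s.wait_stream(torch.cuda.current_stream())
>>> with torch.cuda.stream(s):
...     static_out = static_in * 2
>>> torch.cuda.current_stream().wait_stream(s)
# Capture the work into a graph, then replay it against whatever data is in static_in
>>> g = torch.cuda.CUDAGraph()
>>> with torch.cuda.graph(g):
...     static_out = static_in * 2
>>> static_in.copy_(torch.ones(8, 8, device="cuda"))
>>> g.replay()  # static_out now holds the result for the new contents of static_in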

You can check the blogpost that shows the new features here.

Backwards Incompatible changes

Python API

torch.any/torch.all behavior changed slightly to be more consistent for zero-dimension, uint8 tensors. (#64642)

These two functions now match the behavior of NumPy, returning an output dtype of bool for all supported dtypes, except for uint8 (in which case they return a 1 or a 0, but with uint8 dtype). In some cases with 0-dim tensor inputs, the returned uint8 value could mistakenly take on a value > 1. This has now been fixed.

1.9.1

>>> torch.all(torch.tensor(42, dtype=torch.uint8))
tensor(1, dtype=torch.uint8)
>>> torch.all(torch.tensor(42, dtype=torch.uint8), dim=0)
tensor(42, dtype=torch.uint8) # wrong, old behavior

1.10.0

>>> torch.all(torch.tensor(42, dtype=torch.uint8))
tensor(1, dtype=torch.uint8)
>>> torch.all(torch.tensor(42, dtype=torch.uint8), dim=0)
tensor(1, dtype=torch.uint8) # new, corrected and consistent behavior

Remove deprecated torch.{is,set}_deterministic (#62158)

This is the end of the deprecation cycle for both of these functions. You should be using torch.use_deterministic_algorithms and torch.are_deterministic_algorithms_enabled instead.
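
A minimal migration sketch:

# Removed in 1.10:
#   torch.set_deterministic(True); torch.is_deterministic()
# Replacement:
>>> torch.use_deterministic_algorithms(True)
>>> torch.are_deterministic_algorithms_enabled()
True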

Complex Numbers

Conjugate View: tensor.conj() now returns a view tensor that aliases the same memory and has conjugate bit set (#54987, #60522, #66082, #63602).

This means that .conj() is now an O(1) operation and returns a tensor that views the same memory as tensor and has conjugate bit set. This notion of conjugate bit enables fusion of operations with conjugation which gives a lot of performance benefit for operations like matrix multiplication. All out-of-place operations will have the same behavior as before, but an in-place operation on a conjugated tensor will additionally modify the input tensor.

1.9.1

>>> import torch
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.add_(2)
>>> print(x)
tensor([1.+2.j])

1.10.0

>>> import torch
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.add_(2)
>>> print(x)
tensor([3.+2.j])

Note: You can verify if the conj bit is set by calling tensor.is_conj(). The conjugation can be resolved, i.e., you can obtain a new tensor that doesn’t share storage with the input tensor at any time by calling conjugated_tensor.clone() or conjugated_tensor.resolve_conj() .

Note that these conjugated tensors behave differently from the corresponding numpy arrays obtained from np.conj() when an in-place operation is performed on them (similar to the example shown above).
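
A short sketch of checking and resolving the conjugate bit, as described in the note above:

>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.is_conj()
True
# resolve_conj() materializes a tensor with its own storage and no conjugate bit set
>>> z = y.resolve_conj()
>>> z.is_conj()
False
>>> z
tensor([1.-2.j])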

Negative View: tensor.conj().neg() returns a view tensor that aliases the same memory as both tensor and tensor.conj() and has a negative bit set (#56058).

conjugated_tensor.neg() continues to be an O(1) operation, but the returned tensor shares memory with both tensor and conjugated_tensor.

1.9.1

>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> z = y.imag
>>> z.add_(2)
>>> print(x)
tensor([1.+2.j])

1.10.0

>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> z = y.imag
>>> print(z.is_neg())
True
>>> z.add_(2)
>>> print(x)
tensor([1.-0.j])

tensor.numpy() now throws RuntimeError when called on a tensor with conjugate or negative bit set (#61925).

Because the notion of conjugate bit and negative bit doesn’t exist outside of PyTorch, calling operations that return a Python object viewing the same memory as input like .numpy() would no longer work for tensors with conjugate or negative bit set.

1.9.1

>>> x = torch.tensor([1+2j])
>>> y = x.conj().imag
>>> print(y.numpy())
[2.]

1.10.0

>>> x = torch.tensor([1+2j])
>>> y = x.conj().imag
>>> print(y.numpy())
RuntimeError: Can't call numpy() on Tensor that has negative
bit set. Use tensor.resolve_neg().numpy() instead.

Autograd

Raise TypeError instead of RuntimeError when assigning to a Tensor’s grad field with wrong type (#64876)

Setting the .grad field with a non-None and non-Tensor object used to raise a RuntimeError, but it now properly raises a TypeError. If your code was catching this error, you should simply update it to catch a TypeError instead of a RuntimeError.

1.9.1

try:
    # Assigning an int to a Tensor's grad field
    a.grad = 0
except RuntimeError as e:
    pass

1.10.0

try:
    a.grad = 0
except TypeError as e:
    pass

Raise error when inputs to autograd.grad are empty (#52016)

Calling autograd.grad with an empty list of inputs used to do the same as backward. To reduce confusion, it now raises the expected error. If you were relying on this, you can simply update your code as follows:

1.9.1

grad = autograd.grad(out, tuple())
assert grad == tuple()

1.10.0

out.backward()

Optional arguments to autograd.gradcheck and autograd.gradgradcheck are now kwarg-only (#65290)

These two functions now have a significant number of optional arguments controlling what they do (i.e., eps, atol, rtol, raise_exception, etc.). To improve readability, we made these arguments kwarg-only. If you are passing these arguments to autograd.gradcheck or autograd.gradgradcheck as positional arguments, you can update your code as follows:

1.9.1

torch.autograd.gradcheck(fn, x, 1e-6)

1.10.0

torch.autograd.gradcheck(fn, x, eps=1e-6)

In-place detach (detach_) now errors for views that return multiple outputs (#58285)

This change finishes the deprecation cycle for the inplace-over-view logic. In particular, a few things that previously only warned have been updated:

  • detach_ will now raise an error when invoked on any view created by split, split_with_sizes, or chunk. You should use the non-inplace detach instead.
  • The error message for when an in-place operation (that is not detach) is performed on a view created by split, split_with_sizes, or chunk has been changed from "This view is an output of a function..." to "This view is the output of a function...".

1.9.1

b = a.split(1)[0]
b.detach_()

1.10.0

b = a.split(1)[0]
c = b.detach()

Fix saved variable unpacking version counter (#60195)

In-place operations on the unpacked SavedVariables used to be ignored. They are now properly detected, which can lead to errors saying that a variable needed for backward was modified in-place.
This is a valid error and the ...

Read more

Small bug fix release

22 Sep 12:58
dfbd030

PyTorch 1.9.1 Release Notes

  • Improvements
  • Bug Fixes
  • Documentation

Improvements

  • Stop warning on .names() access in max_pool2d #60059
  • Remove Caffe2 thread-pool leak warning #60318
  • Add option to skip GitHub tag validation for torch.hub.load #62139
  • Use log.warning in torch.distributed.run to print OMP_NUM_THREADS warning #63953
  • TorchElastic: Pretty print the failure message captured by @record #64036
  • torch.distributed.run to set nproc_per_node to 1 by default #61552
  • Remove experimental API warning from torch.distributed.elastic.utils.store #60807
  • Deprecate use_env in torch.distributed.run #59409
  • Better engineering changes for torch.distributed launcher #59152

Bug fixes

Distributed / TorchElastic

  • Make init_method=tcp:// compatible with torch.distributed.run #63910
  • Fix default parameters (number of restarts, log level, number of processes per node) that regressed with the transition from torch.distributed.launch to torch.distributed.run, and clarify the documentation accordingly #61294

Hub

  • Fix HTTP/403 error when calling torch.hub.load for TorchVision models #62072

Misc

  • torch.mm to check input matrix shapes #61394

Documentation

  • Fix broken link in elastic launch doc #62378
  • Fix typo in torch.distributed.run warning message #61127

LTS 1.8.2, Wrap cub in its own namespace

17 Aug 18:33
e0495a7

PyTorch 1.8.2 Release Notes

  • Highlights
  • Bug Fixes

Highlights

We are excited to announce the release of PyTorch 1.8.2. This is the first release we are making as part of the PyTorch Enterprise Support Program. This release includes a bug fix requested by a customer in an LTS branch.
We'd like to thank Microsoft for their support and work on this release.

Bug Fixes

PyTorch 1.9 Release, including Torch.Linalg and Mobile Interpreter

15 Jun 16:06
d69c22d

PyTorch 1.9 Release Notes

  • Highlights
  • Backwards Incompatible Change
  • Deprecations
  • New Features
  • Improvements
  • Bug Fixes
  • Performance
  • Documentation

Highlights

We are excited to announce the release of PyTorch 1.9. The release is composed of more than 3,400 commits since 1.8, made by 398 contributors. Highlights include:

  • Major improvements to support scientific computing, including torch.linalg, torch.special, and Complex Autograd
  • Major improvements in on-device binary size with Mobile Interpreter
  • Native support for elastic, fault-tolerant training through the upstreaming of TorchElastic into PyTorch Core
  • Major updates to the PyTorch RPC framework to support large scale distributed training with GPU support
  • New APIs to optimize performance and packaging for model inference deployment
  • Support for Distributed training, GPU utilization and SM efficiency in the PyTorch Profiler

We’d like to thank the community for their support and work on this latest release. We’d especially like to thank Quansight and Microsoft for their contributions.

You can find more details on all the highlighted features in the PyTorch 1.9 Release blogpost.

Backwards Incompatible changes

Python API

  • torch.divide with rounding_mode='floor' now returns infinity when a non-zero number is divided by zero (#56893).
    This fixes the rounding_mode='floor' behavior to return the same non-finite values as other rounding modes when there is a division by zero. Previously it would always result in a NaN value, but a non-zero number divided by zero should return +/- infinity in IEEE floating point arithmetic. Note this does not affect torch.floor_divide or the floor division operator, which currently use rounding_mode='trunc' (and are also deprecated for that reason).

1.8.1

>>> a = torch.tensor([-1.0, 0.0, 1.0])
>>> b = torch.tensor([0.0])
>>> torch.divide(a, b, rounding_mode='floor')
tensor([nan, nan, nan])

1.9.0

>>> a = torch.tensor([-1.0, 0.0, 1.0])
>>> b = torch.tensor([0.0])
>>> torch.divide(a, b, rounding_mode='floor')
tensor([-inf, nan, inf])

  • Legacy tensor constructors and Tensor.new no longer support passing both Tensor and device as inputs (#58108).
    This fixes a bug in which 1-element integer tensors were misinterpreted as specifying tensor size, yielding an uninitialized tensor. As noted in the error message, use the new-style torch.tensor(...) or torch.as_tensor(...) to copy or alias an existing tensor. If you want to create an uninitialized tensor, use torch.empty(...).

1.8.1

>>> a = torch.tensor([1])
>>> torch.LongTensor(a, device='cpu') # uninitialized
tensor([7022349217739848992])
>>> a.new(a, device='cpu')
tensor([4294967295]) # uninitialized

1.9.0

>>> a = torch.tensor([1])
>>> torch.LongTensor(a, device='cpu')
RuntimeError: Legacy tensor constructor of the form torch.Tensor(tensor, device=device) is
not supported. Use torch.tensor(...) or torch.as_tensor(...) instead.
>>> a.new(a, device='cpu')
RuntimeError: Legacy tensor new of the form tensor.new(tensor, device=device) is not
supported. Use torch.as_tensor(...) instead.

  • torch.divide with rounding_mode='true' is replaced with rounding_mode=None (#51988).
    torch.divide's undocumented rounding_mode='true' option has been removed, and instead rounding_mode=None should be passed to indicate no rounding should take place. This is equivalent to omitting the argument entirely.

1.8.1

>>> a, b = torch.full((2,), 4.2), torch.full((2,), 2)
>>> torch.divide(a, b, rounding_mode='true')
tensor([2.1000, 2.1000])

1.9.0

>>> a, b = torch.full((2,), 4.2), torch.full((2,), 2)
>>> torch.divide(a, b, rounding_mode=None) # equivalent to torch.divide(a, b, rounding_mode='true') from the prior release
tensor([2.1000, 2.1000])

  • import torch.tensor as tensor is no longer supported (#53424).
    Instead, use from torch import tensor

1.8.1

>>> import torch.tensor as tensor
>>> torch.tensor(1.)
tensor(1.)

1.9.0

>>> import torch.tensor as tensor
ModuleNotFoundError: No module named 'torch.tensor'
>>> from torch import tensor
>>> tensor(1.)
tensor(1.)

  • binary release: numpy is no longer a required dependency
    If you require numpy (and don't already have it installed) you will need to install it separately.

Autograd

  • torch.autograd.gradcheck.get_numerical_jacobian and torch.autograd.gradcheck.get_analytical_jacobian no longer support functions that return complex valued output as well as any other values of grad_out not equal to 1 (#55692).
    This change is a part of a refactor of gradcheck’s internals. Note that gradcheck itself still supports functions with complex output. This new restriction only applies to calls to the two internal helper functions. As a workaround, you can wrap your functions to return either the real or imaginary component of its output before calling these functions. Additionally these internal helpers no longer accept any other value except 1 for grad_out for any input function. Note that these helper functions are also being deprecated in this release.

1.8.1:

get_numerical_jacobian(torch.complex, (a, b), grad_out=2.0)

1.9.0:

def wrapped(fn):
    def wrapper(*input):
        return torch.real(fn(*input))
    return wrapper

get_numerical_jacobian(wrapped(torch.complex), (a, b), grad_out=1.0)

  • torch.autograd.gradcheck now throws GradcheckError (#55656).
    This change is a part of a refactor of gradcheck’s internals. All errors that are able to be silenced by raise_exception=False now raise GradcheckError (which inherits from RuntimeError). If you explicitly check that the type of the error is RuntimeError you'll need to update your code to check for GradcheckError instead. Otherwise if you use something like except or isinstance, no changes are necessary.

1.8.1:

# An example of a situation that will now return GradcheckError instead of
# RuntimeError is when there is a jacobian mismatch, which can happen
# for example when you forget to specify float64 for your inputs.
try:
    torch.autograd.gradcheck(torch.sin, (torch.ones(1, requires_grad=True),))
except RuntimeError as e:
    assert type(e) is RuntimeError # explicitly check type -> NEEDS UPDATE

1.9.0:

try:
    torch.autograd.gradcheck(torch.sin, (torch.ones(1, requires_grad=True),))
except RuntimeError as e:
   # GradcheckError inherits from RuntimeError so you can still catch this
   # with RuntimeError (No change necessary!)
   
   # BUT, if you explicitly check type...
   assert type(e) is torch.autograd.GradcheckError
  • Finished deprecation cycle for in-place view error checks (#56093).
    In-place modification of views will now raise an error if that view was created by a custom function or a function that returns multiple views, or if the view was created in no-grad mode. Modifying in-place a view created in the situations above are error-prone and have been deprecated since v1.5.0. Doing these in-place modifications are now forbidden. For more information on how to work around this, see the related sections the release notes linked below:
    • v1.5.0 (view created in custom autograd function, view created in no-grad block)
    • v1.7.0 (section on split and chunk, i.e., functions that return multiple views).
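
A minimal illustration of the now-forbidden pattern (chunk and split return multiple views, so modifying one of them in place errors once autograd is involved; the tensor here is just illustrative):

>>> a = torch.randn(4, requires_grad=True).clone()  # non-leaf tensor tracked by autograd
>>> b = a.chunk(2)[0]                                # chunk returns multiple views of a
>>> b.add_(1)  # warned in earlier releases; raises a RuntimeError in 1.9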

torch.nn

  • Fixed regression for nn.MultiheadAttention to now apply bias flag to both in and out projection layers (#52537).
    In PyTorch 1.6, a regression was introduced that caused the bias flag of nn.MultiheadAttention only to apply to the input projection layer. This caused the output projection layer to always include a bias parameter, even with bias=False specified. The regression is now fixed in PyTorch 1.9, making the bias flag correctly apply to both the input and output projection layers. This fix is BC-breaking for the bias=False case as it will now result in no bias parameter for the output projection layer.

v1.6 - v1.8.1 / pre-1.6 & 1.9.0
>>> mha = torch.nn.MultiheadAttenti...
Read more