NumPy-like Functionality Request Rollup #38349

Closed
37 of 41 tasks
mruberry opened this issue May 12, 2020 · 70 comments
Assignees
Labels
high priority
module: numpy (Related to numpy support, and also numpy compatibility of our operators)
OSS contribution wanted (PR from open source contributors welcome to solve this issue)
tracker (A tracking issue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@mruberry
Collaborator

mruberry commented May 12, 2020

PyTorch often receives requests for NumPy-like functionality. This issue is a "rollup" of these requests. It will continue to be updated with additional requests and helpful links for implementing them.

Newer contributors: check out the Wiki and read the Contribution Guide for help getting started.

If one of these requests requires additional discussion, then, to avoid cluttering this rollup, please create a new issue titled “Implementing [Request]”, label it with “module: numpy”, and start the more focused discussion there.

Functions (Updated for the PyTorch 1.8 release):

Completed Functions

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @mruberry @rgommers @heitorschueroff
cc @aayn @n-gao

mruberry added the high priority, module: numpy, and OSS contribution wanted labels on May 12, 2020
@kshitij12345
Collaborator

Hi,

I'd like to work on fliplr and flipud

@rgommers
Collaborator

@mruberry there are a number of functions in there that the NumPy devs consider mistakes or outdated. We'd like to be able to deprecate those, but backwards-compat discussions make that hard. Examples are heaviside (too trivial), kaiser and all other window functions (belong in scipy.signal), ptp (unreadable, bad idea), percentile (quantile is almost identical but slightly better), sinc (belongs in scipy.special). All the *stack functions are kind of special cases of concatenate and stack; see numpy/numpy#7183.

A lot of the other functions (e.g. deg2rad) are simply utilities that would maybe be good to have somewhere, but probably not in the main torch namespace.

We'll also have to deal with existing incompatibilities between pytorch and numpy functions/objects of the same name, e.g. mapping axis to dim, the behavior of .view(), and probably many more little things.

Would it make sense to curate the above list of desired functions a little, and revisit the separate torch.numpy namespace (#2228 (comment)) for adding the utility functions?
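
For context on the *stack point above, a small NumPy-only illustration of how those helpers reduce to concatenate for 2-D inputs (just demonstrating the equivalence referred to above, not proposing a torch API):

import numpy as np

a = np.ones((2, 3))
b = np.zeros((2, 3))

# For >= 2-D inputs, the *stack helpers are concatenations along a fixed axis:
assert np.array_equal(np.vstack([a, b]), np.concatenate([a, b], axis=0))
assert np.array_equal(np.hstack([a, b]), np.concatenate([a, b], axis=1))
assert np.array_equal(np.dstack([a, b]), np.concatenate([a[..., None], b[..., None]], axis=2))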

@mruberry
Collaborator Author

Always great to hear from you, @rgommers!

Let's follow up offline.

@mruberry
Collaborator Author

Hi,

I'd like to work on fliplr and flipud

That's great! But no need to announce what you're working on ;) Go for it!

@carlosgmartin

Should take be added to the list?

@carlosgmartin

The following seems to work for quantile:

import torch
import numpy as np
from pdb import set_trace

def take(a, indices, axis):
    # NumPy-style take with an axis argument: index `a` with `indices` along
    # `axis` while leaving every other dimension untouched.
    return a[tuple(
        slice(a.shape[dim]) if dim != axis else indices
        for dim in range(a.ndim)
    )]

def quantile(a, q, axis):
    # Fractional position of the q-th quantile along `axis`.
    i = q * (a.shape[axis] - 1)
    i_hi = torch.ceil(i).long()
    i_lo = torch.floor(i).long()
    # Linear interpolation weights between the two neighboring order statistics
    # (matches np.quantile's default 'linear' interpolation).
    w_hi = i - i_lo
    w_lo = 1 - w_hi
    sort = a.sort(axis).values
    v_hi = take(sort, i_hi, axis)
    v_lo = take(sort, i_lo, axis)
    return w_lo * v_lo + w_hi * v_hi

# Fuzz test against np.quantile; drops into pdb on the first mismatch.
while True:
    a = torch.randn(4, 5, 6).float()
    q = torch.rand(size=())
    axis = int(torch.randint(a.ndim, size=()))
    try:
        x = quantile(a, q, axis)
        y = torch.tensor(np.quantile(a, q, axis)).float()
        assert torch.allclose(x, y, atol=1e-6)
    except AssertionError:
        print(x)
        print(y)
        set_trace()

Note that q could potentially be a tensor rather than just a scalar.

@mruberry
Collaborator Author

The following seems to work for [quantile]...

Yes, but we need a properly registered C++ implementation, since PyTorch has both Python and C++ APIs.

@mruberry
Collaborator Author

Should take be added to the list?

"take" isn't a great candidate at the moment because we already have torch.take.

@kshitij12345
Collaborator

@mruberry
outer will just be an alias for torch.ger right?

>>> import torch
>>> import numpy as np
>>> np.outer(np.ones((5,)), np.linspace(-2, 2, 5))
array([[-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.]])
>>> torch.ger(torch.ones((5,)), torch.linspace(-2, 2, 5))
tensor([[-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.]])

@carlosgmartin

"take" isn't a great candidate at the moment because we already have torch.take.

torch.take doesn't seem to take an axis like the numpy version.

@mruberry
Collaborator Author

outer will just be an alias for torch.ger right?

Good point! I'll remove it from the list since we'll be handling aliases later. Thanks @kshitij12345!

@mruberry
Collaborator Author

"take" isn't a great candidate at the moment because we already have torch.take.

torch.take doesn't seem to take an axis like the numpy version.

Yes, sorry, I should have elaborated: this list is intended for entirely new functions, not for functions that exist in PyTorch but don't have the same behavior as their NumPy counterparts. Those are out of scope for this issue. (They're totally valid separate issues, however.)
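
As a point of reference, the axis behavior of np.take can be approximated today with torch.index_select; a rough sketch of the correspondence (not the eventual API, and index_select only accepts a 1-D integer index):

import torch
import numpy as np

a = torch.arange(24).reshape(2, 3, 4)
idx = torch.tensor([2, 0])

# torch.take flattens its input, but np.take(a, idx, axis=1) maps onto index_select along dim 1.
out = torch.index_select(a, 1, idx)
assert np.array_equal(out.numpy(), np.take(a.numpy(), idx.numpy(), axis=1))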

@vadimkantorov
Contributor

vadimkantorov commented May 17, 2020

One problem with special numpy-like nan* ops is that, without an agreed and clearly stated principle, many other non-numpy ops may demand nan* versions as well: e.g. topk #16762

One problem with fliplr/flipud is that they make assumptions about which dimensions they operate on, if I understand correctly. But PyTorch use cases allow more flexible choices than NumPy's: BCHW is the more popular logical format, but if someone uses a BHWC logical format, then what will fliplr/flipud do?

@mruberry
Collaborator Author

I don't think the behavior of fliplr and flipud is ambiguous. You are correct that their names suggest a certain perspective on the data, but that's typical of many functions.

From my discussion with @rgommers, there probably won't be many (any?) new nan- functions, but the existing ones are frequently used and have been requested.
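
For concreteness, NumPy pins fliplr to axis 1 and flipud to axis 0 regardless of what those axes mean for a particular layout, and torch.flip already covers the general case; a small sketch of that correspondence:

import torch
import numpy as np

x = torch.arange(6).reshape(2, 3)

# flipud reverses axis 0 and fliplr reverses axis 1, i.e. torch.flip with a fixed dims argument.
assert np.array_equal(torch.flip(x, dims=[0]).numpy(), np.flipud(x.numpy()))
assert np.array_equal(torch.flip(x, dims=[1]).numpy(), np.fliplr(x.numpy()))

# For, say, a BHWC tensor, flipping "left-right" along W would simply be torch.flip(t, dims=[2]).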

@vadimkantorov
Contributor

About nan: an alternative could be adding some nan-controlling option to the regular torch.mean/sum/... and then implementing these numpy nan* versions as thin wrappers that enable that option. That way, if PyTorch (not numpy) functions want to support nan skipping, the interface would be the same (e.g. skip_nan = True or nan_policy = 'skip') and no nan* function aliases would have to be created.
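
As a rough sketch of the masking such an option would wrap (the function and keyword names here are illustrative only, not an existing or proposed torch API):

import torch

def mean(t, dim, skip_nan=False):
    # Hypothetical nan-controlling option: zero out NaNs and divide by the count of valid entries.
    if not skip_nan:
        return t.mean(dim)
    mask = torch.isnan(t)
    total = torch.where(mask, torch.zeros_like(t), t).sum(dim)
    count = (~mask).sum(dim)
    return total / count

x = torch.tensor([[1.0, float('nan')], [3.0, 4.0]])
print(mean(x, dim=0, skip_nan=True))  # tensor([2., 4.])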

@rgommers
Collaborator

From my discussion with @rgommers there probably won't be many (any?) new nan- functions

Dug up the most recent discussion on this, with a summary of how the NumPy devs feel about it: numpy/numpy#13198 (comment)

@vadimkantorov
Contributor

vadimkantorov commented May 18, 2020

Another related point: scipy.stats seems to have a nan_policy concept (not saying that either choice is much better than the other, just that whichever choice is made should hopefully be made clear in some new "NaN/Inf handling" docs section)

@alvgaona
Contributor

Awesome, thanks @rgommers

clrpackages pushed a commit to clearlinux-pkgs/pytorch that referenced this issue Mar 9, 2021
….8.0

76181208+imaginary-person@users.noreply.github.com (1):
      Remove unnecessary dtype checks for complex types & disable complex dispatch for CPU min/max pointwise ops (#50465)

AJ San Joaquin (1):
      Add SELU Activation to calculate_gain (#50664)

Abaho Katabarwa (1):
      Use CAFFE2_USE_MSVC_STATIC_RUNTIME to determine when to avoid waiting for global destructors on Windows (#43532)

Abdelrauf (2):
      Vsx initial support issue27678 (#41541)
      add mising VSX dispatches (#51217)

Aiden Nibali (1):
      Add three-phase option to OneCycleLR (#42715)

Ailing Zhang (26):
      Add alias dispatch key DefaultBackend. (#45718)
      Support DefaultBackend keyword in native_functions.yaml. (#45719)
      Update native_functions.yaml to add DefaultBackend. (#45938)
      Revert D24165889: Update native_functions.yaml to add DefaultBackend.
      Avoid computing AutogradKey if not needed. (#46252)
      [Reland] Update native_functions.yaml to add DefaultBackend. (#46236)
      Add guideline about which dispatch keyword to use in native_functions.yaml. (#46126)
      Allow Undefined to get kernel from Math/DefaultBackend. (#46352)
      Remove codegen for old RegistrationDeclarations.h (#46370)
      Update VariableTypeManual.cpp to not use catchAllKernel. (#46353)
      Remove variable_excluded_from_dispatch() check for factory functions. (#46371)
      [WIP] Move catchAll to Math (#45939)
      Remove catchAllKernel_. (#46354)
      Update error message to include link to readme. (#46613)
      Check support_as_strided before using empty_strided. (#46746)
      view_as_real doesn't work for all backends since it relies on strides. (#47018)
      Port math kernel for layer_norm from pytorch/xla. (#47882)
      Fix index_put doc. (#48673)
      Only 1 TensorImpl allocation in differentiable views. (#48896)
      Saves a copy of vector<Tensor> in view ops returning TensorList. (#49149)
      Relax the atol/rtol of layernorm math kernel test. (#49507)
      Bring back math_silu_backward which works for all backends. (#49439)
      Dump state when hitting ambiguous_autogradother_kernel. (#50246)
      Add cloud-tpu-client to xla CI. (#50823)
      Improve docs around Math/DefaultBackend & add PythonDispatcher class. (#50854)
      Move USE_NUMPY to more appropriate targets (#51143)

AishwaryaKalloli (1):
      added docs to nn.rst (#48374)

Akifumi Imanishi (2):
      Add LazyConvXd and LazyConvTransposeXd (#47350)
      Add LazyBatchNormXd (#51548)

Akshit Khurana (6):
      Mark overriden Tensor method `override` (#47198)
      Free original weight after prepacking in XNNPACK based op (#46541)
      Preserve memory format in qconv op (#49533)
      Back out "[pytorch][PR] Preserve memory format in qconv op" (#49994)
      Make sure ConstantPadNd op preserves memory format (#50898)
      Fix memory leak in qnnpack ops (#51612)

Alban Desmaison (14):
      Revert D24486972: [quant][graphmode][fx] Support sigmoid/hardsigmoid/tanh in qat
      Revert D24262885: [pytorch][PR] Added foreach_zero_ API
      Fix backcompat in master following revert (#46984)
      Back out "Providing more information while crashing process in async error handling" (#47185)
      Update doc to reflect current behavior (#46937)
      Add release note scripts (#47360)
      Autograd engine, only enqueue task when it is fully initialized (#50164)
      Update autograd related comments (#50166)
      Add link to tutorial in Timer doc (#50374)
      Add range assert in autograd engine queue lookup (#50372)
      Revert D26113953: [pytorch][PR] [ZeroRedundancyOptimizer] Elastic and pytorch compatible checkpoints
      Revert D26246231: [FX] Edits after comprehensive pass over docs
      Revert D26276903: [pytorch][PR] Add LazyBatchNormXd
      Revert D26009829: Optimize relu on cpu using clamp_min

Alberto Alfarano (1):
      Support MatMul in c2_pt_converter

Alex Henrie (3):
      Fix return value of _vmap_internals._get_name (#49951)
      Unused variables in neural net classes and functions (#50100)
      Unused exception variables (#50181)

Alex Suhan (4):
      [TensorExpr] Support LLVM versions 8 through 12 (#47033)
      [TensorExpr] Fix LLVM 12 build after LLVM API changes (#47480)
      Fix get_overlap_status for tensors without storage (#49638)
      [TensorExpr] Use wider type for scalars (#50774)

Alexander (2):
      Sparse-sparse matrix multiplication (CPU/CUDA) (#39526)
      Sparse benchmarking utils (#48397)

Alexander Golynski (3):
      Add warning on ProcessGroup and ProcessGroup::Work APIs (#46220)
      fix backwards compatibility for #48711 and its revert (#49240)
      fix slow windows test (#49258)

Alexander Grund (4):
      Workaround for bug in DistributedDataParallel (#46186)
      Undefine bool and vector after including altivec.h (#46179)
      Replace list(map(...)) constructs by list comprehensions (#46461)
      Replace map(lambda constructs (#46462)

Alexandre Saint (1):
      [cpp-extensions] Ensure default extra_compile_args (#45956)

Aliaksandr Ivanou (1):
      Add exception classification to torch.multiprocessing.spawn (#45174)

Allan Di Wu (1):
      [pytorch][glow][NNPI] Using int32 as indices for embedding_bag operators (#45878)

Amogh Akshintala (3):
      Add kernel launch checks in caffe2/aten/src/ATen/native/cuda/ (#49269)
      Add Kernel Launch Checks to files under caffe2/aten/THC (#49358)
      Replace THError() check in THCTensorMathReduce.cu with C10_CUDA_KERNEL_LAUNCH_CHECK() (#49424)

Andrei Vukolov (1):
      Workaround to pay attention for CUDA version (#46535)

Andres Suarez (1):
      [codemod][fbcode/caffe2] Apply clang-format update fixes

Andrew Millspaugh (1):
      Add string versions of argument funcs in jit Node (#45464)

Andrey Malevich (2):
      [DPER] Introduce barrier operation to force synchronization of threads in async execution (#49322)
      [C2] Revive unsafe CoalesceOp (#49402)

Andrii Grynenko (1):
      Patch death tests/fork use after D25292667 (part 3)

Andy Zhang (1):
      Define objects using classes instead of namedtuples in torch.utils.data._utils.worker (#45870)

Anjali Chourdia (1):
      Add type annotation logic for complex numbers (#50884)

Ankur Singla (3):
      [caffe2][memonger] Add support for distributed inference predict nets in DAG memonger
      [caffe2][memonger] Add support for distributed inference predict nets in DAG memonger (#47718)
      [caffe][memonger] Extend operator schema check to dag memonger (#48021)

Ansha Yu (4):
      [static runtime] fuse inference ops (1) (#48948)
      [aten] index_select dim 1 (#47077)
      [pt] fuse ClipRangesGatherSigridHash (#49181)
      [aten] embedding_bag_byte_rowwise_offsets_out (#49561)

Anshul Jain (B*8) (2):
      [MaskR-CNN] Add int8 aabb bbox_transform op
      [Mask R-CNN]Add Int8 AABB Generate proposals Op

Anshul Jain (FRL) (1):
      [Mask R-CNN]Add Int8 AABB Generate proposals Op (#49574)

Ansley Ussery (31):
      Add function signature for pixel_shuffle (#45661)
      Change signature for torch.poisson (#45656)
      Change type inferred from empty annotation (#45360)
      Fix stride printing/parsing formatting (#45156)
      Support %-based string formatting (#45976)
      Allow for source code comments at any level of indentation (#46548)
      Fix grammar and spelling errors (#46713)
      Create prototype for AST rewriter (#46410)
      Create prototype for AST rewriter (#47216)
      Support default args in symbolic tracing (#47615)
      Allow for string literal return during symbolic tracing (#47618)
      Update Graph docstring to match `__init__.py` (#48100)
      Add dict comprehension (#47774)
      Support the `in` operator with str (#47057)
      Fix errata (#49903)
      Document single op replacement (#50116)
      Update op replacement tutorial (#50377)
      Add docstring for Proxy (#50145)
      Create subgraph rewriter (#49540)
      Assemble technical overview of FX (#50291)
      Make `split_module` results deterministic (#50470)
      Improve call provenance during GraphModule scripting (#50538)
      Add alternative prettyprinting method to `Graph` (#50878)
      snake_case FX IR names (#50876)
      Document example of Proxy use (#50583)
      Clarify logic in `ir_emitter` (#51299)
      Handle error during dict expansion (#51374)
      Write FX Subgraph Rewriter tutorial (#51531)
      Correct subgraph rewriter pattern containment rules (#51529)
      Document FX debugging (#51530)
      Extend subgraph_rewriter logic (#51532)

Anthony Liu (3):
      Add benchmark for per channel tensor quantization (#46017)
      quantize_tensor_per_channel ARM implementation (#46018)
      Fix rounding error flakiness in quantized_test (#47468)

Anthony Shoumikhin (1):
      [qnnpack] Fix unused var warning when building for different archs. (#48730)

Antonio Cuni (7):
      Migrate `eig` from the TH to Aten (CUDA) (#44105)
      Migrate `eig` from the TH to Aten (CUDA) (#44105)
      Implement torch.linalg.qr (#47764)
      Implement torch.linalg.svd (#45562)
      Improve torch.linalg.qr (#50046)
      Fix MKL builds on Ubuntu (#50212)
      Add torch.eig complex forward (CPU, CUDA) (#49168)

Arindam Roy (4):
      Enable Skipped ROCM Tests in common_nn.py (#50753)
      Skip test_lc_1d for ROCM (#50964)
      Enable ROCM Skipped tests in test_ops.py (#50500)
      Enable rocm tests in common nn (#51227)

ArtistBanda (1):
      Spurious numpy writable warning (#47271)

Ashkan Aliabadi (27):
      Redo Vulkan command and descriptor pools. (#44496)
      Add image sampler. (#45037)
      Add fence. (#45148)
      Revert D24395956: [pytorch][PR] Replace flatten tensors with flatten loops.
      Add Vulkan job dispatch and flush. (#46008)
      Provide CMake option to enable Vulkan API. (#46503)
      Revert D24004795: [quant] Add FixedQParamsFakeQuantize module
      Add Vulkan Tensor. (#44015)
      Add Vulkan Tensor factory. (#44016)
      Add Vulkan tensor copy. (#46481)
      Add Vulkan op Add. (#44017)
      Vulkan tweaks (#47261)
      Add Vulkan op Conv2D. (#46900)
      Vulkan MobileNetv2 unit test. (#47616)
      Fix Vulkan empty (and family) breakage as a result of API update. (#47937)
      Update VMA. (#47727)
      Tweak Vulkan memory use. (#47728)
      Vulkan linear memory allocator. (#48569)
      Force a sync on non-CPU tensors for the benchmark to reflect the timing accurately. (#48856)
      Remove incorrect usage of layout(std430) on uniform buffers, correctly now treated as error in the latest release of Vulkan SDK. (#49572)
      Add android.permission.INTERNET permission to Android test_app. (#49996)
      Optimize Vulkan command buffer submission rate. (#49112)
      Vulkan convolution touchups. (#50329)
      Define static constexpr variable in at::native::vulkan:::api::Handle. (#51006)
      Remove debug-only assertion from vulkan::api::Command::Command as the buffer can legitimately be null. (#51160)
      [Vulkan] Improve error handling in a few places. (#51423)
      [Vulkan] Remove redundant qualifiers on writeonly images. (#51425)

Ayush Saraf (1):
      [pytorch][quantization] adding jit state for QuantizedLeakyReLU (#47660)

Ayush Sharma (1):
      grammatically update index.rst (#45801)

Basil Hosmer (9):
      a few more comments on dispatch key computation methods (#46128)
      [dispatcher] avoid autograd fixup step on non-backend keys (#46135)
      reland fast TypeMeta/ScalarType conversion (#45544)
      pass TypeMeta by value (#45026)
      track Half/ComplexHalf default dtype (#45043)
      faster TensorOptions merging (#45046)
      use bitfield to shrink TensorImpl (#45263)
      faithful signature support in BoxedKernelWrapper (#47267)
      remove unused params in scalar_tensor_static (#48550)

Ben Koopman (2):
      Add Sigmoid operator from Caffe2 (#46286)
      fp16 -> fp32 EmbeddingBag moved into CPU impl (#47076)

Benjamin Lefaudeux (2):
      ZeroRedundancyOptimizer: an implementation of a standalone sharded optimizer wrapper (#46750)
      [ZeroRedundancyOptimizer] Elastic and pytorch compatible checkpoints (#50956)

Bert Maher (47):
      [te] Get llvm codegen to compile with llvm9 and llvm-fb (#45726)
      [te] Add a 2D convolution example test (#45514)
      [te] Add a benchmark harness (#45875)
      [te] Tiled (m=32 x n=32) gemm benchmark (#45905)
      [te][llvm] Enable fused multiply-add (fma) in code generation (#45906)
      [pytorch][te] Add compilation time benchmark (#46124)
      [pytorch][tensorexpr] Promote integer arguments to sin/cos/tan to float (#46776)
      [pytorch] Improve/fix heuristics for using mkldnn vs native conv (#46675)
      [tensorexpr] Fix registration of intrinsics on llvm-fb (#47540)
      [pytorch][te] Don't start TE fusion groups with an unknown-typed result (#47884)
      [pytorch][te][easy] Remove KernelScope from fusion pass tests (#47952)
      [pytorch][te] Do not merge Tensor[] variant of aten::where into fusion group (#48063)
      [pytorch][te] Handle negative axis in chunk (#48084)
      [torch][te] aten::type_as is unary, not binary (#48085)
      [te] Make BUILD_TENSOREXPR_BENCHMARK a real CMake option (#48158)
      [te][benchmark] Add more optimized versions of gemm (#48159)
      [tensorexpr] Switch cpp tests to pure gtest (#48160)
      [te] Fix pow (#48213)
      Disable fast sigmoid since it causes divergence (#48623)
      [te] Fix comparison ops on booleans (#48384)
      [te] Don't fuse integer fmod or remainder (#48700)
      [te] Fix spacing in graph dump (#48829)
      [te] Remove vestigial __init__.py from test/cpp/tensorexpr (#49061)
      [pe] Add gflags for num_profiled_runs and bailout_depth, laint (#49059)
      [te] Add gflag for fast intrinsic expansion (#49060)
      [te] Don't throw when re-registering a CodeGen factory (#49174)
      [te] Fix clamp with uint8 args (#49143)
      [te] Use Dtype::is_signed instead of an ad hoc local predicate. (#49147)
      [te] Use c10::ScalarType utility functions in te::Dtype (#49148)
      [te] Ban uint8 tensors from fusion groups (#49247)
      [te] Fix bugs with shift operators (#49396)
      [te] Create TargetMachine only once with correct options to fix perf (#50406)
      [te] Benchmark comparing fused overhead to unfused (#50305)
      [te] Optimize allocation of kernel outputs (#50318)
      Reapply D25856891: [te] Benchmark comparing fused overhead to unfused (#50543)
      Reapply D25859132: [te] Optimize allocation of kernel outputs (#50546)
      [te] Speed up relu on cpu
      [static runtime] Shortcut resize_({0})
      [nnc] Per-operator benchmarks (#51093)
      [nnc] Refactor generation of intrinsics to reduce the amount of macro-hell (#51125)
      [nnc][trivial] Refactor llvm_jit so the wrapper class doesn't depend on ifdefs (#51186)
      [nnc] Use sleef if its symbols are available (#51187)
      [nnc] Expose vectorized math functions to jit fuser. (#51190)
      [nnc] Don't use sleef where it's slower (#51246)
      [nnc] Tweak log_nnc_sleef so vectorization kicks in (#51491)
      [nnc] Vectorize bitwise ops (#51492)
      Optimize relu on cpu using clamp_min (#50924)

Bharat123rox (1):
      [DOCS]Correct docs for torch.lu_solve (#47762)

Blaise Sanouillet (1):
      [caffe2/FC DNNLOWP] Shrink Y_int32_ vector capacity when appropriate

Boris Valkov (1):
      Benchmark combining Distributed Data Parallel and Distributed RPC (#46993)

Bowen Bao (3):
      [ONNX] Fix scripting rand/randn/where (#45793)
      [ONNX] Fix dtype for log_softmax export (#46627)
      [1.8] Fix onnx mixed precision export for layernorm & fuseLogSoftmaxNllLoss (#52510)

BowenBao (35):
      [ONNX] Add dim_param support in export with onnx shape inference (#44920)
      [ONNX] Update squeeze test for opset 9 (#45369)
      [ONNX] Update ONNX doc for indexing export (#46349)
      [ONNX] Support nd mask index in opset >= 11 (#45252)
      [ONNX] bump CI ort to 1.5.2 rel for stability (#46595)
      [ONNX] Slightly improve indexing with ellipsis under scripting (#46571)
      [ONNX] Fix eye export (#47016)
      [ONNX] Improve stability of gemm export (#46570)
      [ONNX] Enable onnx shape inference in export by default (#46629)
      [ONNX] Support nonzero(*, as_tuple=True) export (#47421)
      [ONNX] Remove usage of isCompleteTensor() in symbolic functions (#48162)
      [ONNX] Use parameter values in onnx shape inference (#49706) (#50905)
      [ONNX] Support opset13 Squeeze and Unsqueeze (#50150) (#50906)
      [ONNX] Update Reducesum operator for opset 13 (#50532) (#50907)
      [ONNX] Add binary_cross_entropy_with_logits op to ONNX opset version 12 (#49675) (#50908)
      [ONNX] Add logical_and, logical_or, logical_xor torch op support in pytorch exporter (#50570) (#50909)
      [ONNX] Enable _jit_pass_onnx_fold_if only when dynamic_axes is None (#50582) (#50910)
      [ONNX] Support gelu for fp16 export (#50487) (#50911)
      [ONNX] Replace optional parameters of Resize with placeholder for ops13. (#50574) (#50954)
      [ONNX] Fix param names (#50764) (#50955)
      [ONNX] Update constant-folding of Gather op (#50554) (#51514)
      [ONNX] Fix bug in unfold symbolic (#50504) (#51515)
      [ONNX] Improve error message for parse_arg in symbolic functions (#50512) (#51516)
      [ONNX] Export get/set attribute nodes (#50768) (#51517)
      [ONNX] Enable remaining failed tests in opset13 (#50806) (#51518)
      [ONNX] Add silu operator support for onnx (#51193) (#51519)
      [ONNX] Fix graph position to insert clone node for inplace op removal (#50123) (#51520)
      [ONNX] Fix graph sequence output from loop node (#51305) (#51521)
      Update error message that displays when encountering an op unsupported for ONNX export. (#51387) (#51522)
      [ONNX] Enable Constant Folding for ONNX Opset 13 (#51096) (#51523)
      [ONNX] Update unsafe_chunk() method to support new version 13 of Split operator. (#51415) (#51524)
      [ONNX] Fix opset 11 ConstantChunk with negative dim (#51396) (#51525)
      [ONNX] Support list remove for onnx export (#51373) (#51526)
      fix bug (#51222) (#51527)
      [ONNX] Modifications in remove inplace ops passes to better handle binary inplace ops (#51318) (#51572)

Bradley Davis (1):
      [jit][tracer] allow traced modules to return dicts with tuple values when strict=False (#49568)

Bram Wasti (18):
      [jit] Prevent caching of `graph` attribute. (#46960)
      [TorchScript] Support user defined classes as constants (#5062)
      [static runtime] Initial memonger (#47759)
      [static runtime] add more _out variants (#48260)
      [static runtime] add static registry (#48258)
      [static runtime] Add Internal Ops to the registry (#48616)
      [te] Add BitCast to the IR
      Revert D25441716: [te] Add BitCast to the IR
      [static runtime] add static subgraph fusion pass (#49185)
      [te] Add BitCast to the IR (#49184)
      [static runtime] refine fusion group (#49340)
      [te] Add fast log approximation based on sleef
      [te][reapply] Add fast log approximation based on sleef (#49575)
      [static runtime] Remove register concept by giving ownership to the nodes (#50050)
      [nnc] Expose fast tanh/sigmoid (#50736)
      [torch vitals] Initial implementation (#51047)
      [nnc] Add benchmarks
      [torch vitals] move into namespace and fix windows tests

Brandon Lin (2):
      Implement LengthsToOffsets operator in Caffe2 (#46590)
      [FX] fix Graph python_code return type annotation (#49931)

Brian Hirsh (39):
      Some fixes to smooth_l1_loss (#45532)
      Revert D24002415: Some fixes to smooth_l1_loss
      migrating the take() fn from TH to ATen (#45283)
      migrate cuda implementation of take() from TH to ATen (#45430)
      make broadcasting explanation clearer in matmul doc: #22763 (#45699)
      some documentation and style fixes to smooth_l1_loss (#45587)
      remove beta defaulting in smooth_l1_loss_backward. added to the bc whitelist (#45588)
      adding complex support for distributed functions and . fix #45760 (#45879)
      fix #45552 - adding add_done_callback(fn) to torch.futures.Future (#45675)
      adding BAND/BOR/BXOR reduce ops to unsupported list for complex numbers. added tests (#46270)
      detect inplace modifications of views earlier (fix #21875) (#46204)
      first cut of adding a dangling impl test. fix #45165 (#46484)
      small doc fix (#46599)
      explicitly error out in comparison ops when the types don't match (#46399)
      Revert D24335982: explicitly error out in comparison ops when the types don't match
      Codegen - error when an argument that looks like an out argument isn't a kwarg (fix #43273) (#47284)
      Revert "Revert D24335982: explicitly error out in comparison ops when the types don't match" (#47288)
      remove ops in the __caffe2 namespace (#47318)
      update legacy dispatcher registration API tests to avoid duplicate def() calls (#47319)
      rename macro. TORCH_LIBRARY_FRAGMENT_THIS_API_IS_FOR_PER_OP_REGISTRATION_ONLY to TORCH_LIBRARY_FRAGMENT (#47320)
      migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API (#47321)
      make duplicate def() calls an error in the dispatcher. Updating all fb operators to use the new dispatcher registration API (#47322)
      Revert D24714803: make duplicate def() calls an error in the dispatcher. Updating all fb operators to use the new dispatcher registration API
      fix BC test, after removing __caffe2 ops (#48099)
      migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API (#48097)
      Revert D25056091: migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API
      migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API, update code_analyzer regex (#48308)
      Updating all call-sites of the legacy dispatcher registration API in fbcode to the new API. (#48178)
      pyi codegen update - remove Declarations.yaml (#48754)
      make validate debug-only in Device copy ctr (#47854)
      fix clang-tidy warning - make global TorchLibraryInit objects const (#48956)
      migrating some straggler pytorch ops in fbcode to the new registration API (#48954)
      make duplicate def() calls an error in the dispatcher (#48098)
      pyi cleanup (#49054)
      pyi codegen - removing byte-for-byte compatibility hacks (#49055)
      pyi codegen - removing byte-for-byte-compatibility hacks (sorting overloads) (#49056)
      pyi codegen refactor - no need to group python signatures by overload name (#49057)
      Revert "Revert D25003113: make validate debug-only in Device copy ctr" (#49123)
      fix test_dispatch tests to error on duplicate def (#49254)

Brian Johnson (1):
      Update index.rst (#47282)

Brian Skinn (1):
      Update _torch_docs.py (#51212)

Brian Vaughan (1):
      Revert D24924236: [pytorch][PR] [ONNX] Handle sequence output shape and type inference

Bugra Akyildiz (5):
      Refactor gather_ranges_to_dense from Python to C++ (#46021)
      Add operator benchmark for 4bit/8bit embedding lookups
      Add last_n_window_collector
      Add sub operator
      Replace `GatherRangesToDense` operator in Dper from c2 to pt.

Cameron Burnett (1):
      [Bootcamp] add CUDA kernel checks to ATen/native/cuda (#47466)

CedricPicron (1):
      Fix incorrect warnings in ParameterList/Dict (#48315)

Charles Coulombe (1):
      Conditional requirement for py3.6 only (#46932)

Chen Lai (5):
      [Pytorch][Annotation] Update inlined callstack with module instance info (#46729)
      [Pytorch][Annotation] Update inlined callstack with module instance info (#47416)
      Remove unused reconstruct_scopes function (#48822)
      reuse consant from jit (#49916)
      Back out "reuse consant from jit" (#50521)

Cheng Chang (2):
      [NNC] Implement Cond in LLVM codegen (#47256)
      [NNC] Generate C++ code for Allocate and Free (#51070)

Chester Liu (6):
      Cleanup unused code for Python < 3.6 (#47822)
      Reorganize and refine the Windows.h import in C++ files (#48009)
      Use Unicode friendly API on Win32 in THAllocator (#47905)
      Fix cl.exe detection in cpu/fused_kernel.cpp (#50085)
      Use Unicode friendly API in fused kernel related code (#49781)
      Fix warning when running scripts/build_ios.sh (#49457)

Christian Hudon (1):
      Add nvtx.range() context manager (#42925)

Christian Puhrsch (1):
      Add NestedTensor specific dispatch key to PyTorch (#44668)

Chunli Fu (2):
      [shape inference] fix ConstantFill
      [script] Validator for unsupported ops on accelerator

Daily, Jeff (3):
      passing all arguments to sccache wrapper script should be quoted as "$@" (#45582)
      [ROCm] update GPG key URL in circleci Dockerfile (#46256)
      [ROCm] update debug flags (#46717)

Dan Fan (1):
      add and adjust kernel launch checks under fbcode/caffe2/caffe2/utils (#50862)

Daniel Balchev (1):
      Set caffe2::pthreadpool() size in ParallelOpenMP (#45566)

Daniil Osokin (1):
      Allow zero annealing epochs (#47579)

Danny Huang (3):
      [caffe2] plan executor error propagation test with blocking cancellable op (#45319)
      [caffe2] temp remove ErrorPlanWithCancellableStuckNet (#46080)
      [caffe2] add PlanExecutorTest ErrorPlanWithCancellableStuckNet (#46110)

David (5):
      [ONNX] Convert _len based on the first dimension length (#47538)
      [ONNX] Cast Gather index to Long if needed (#47653)
      [ONNX] Cast Gather index to Long if needed (#47653)
      [ONNX] Handle dynamic input axes for prim_ConstantChunk (#48176)
      [ONNX] Handle Sub-block index_put in _jit_pass_onnx_remove_inplace_ops_for_onnx (#48734)

David Clissold (1):
      add missing return statement to inlined vec_signed (#51116)

David Fan (2):
      [ONNX] Reimplement _var_mean to ensure non-negative (#47240)
      [ONNX] Update ONNX doc for writing pytorch model (#46961)

David Reiss (5):
      Revert D24024606: [FX] Shape propagation example
      Update default ouput extension in optimize_for_mobile.cc (#45598)
      Add inputsSize to Python IR, like outputsSize (#46779)
      Add a command-line flag for overriding pthreadpool size (#46781)
      PyTorch NNAPI integration prototype (#46780)

Daya Khudia (2):
      [pt][quant] Support either min or max in qclamp (#45937)
      [caffe2][dnnlowp] Remove openmp usage in quantize dnnlowp op

Dhruv Matani (17):
      [RFC] Generate generated_unboxing_wrappers_everything.cpp for unboxing wrappers codegen to aid debugging (#45872)
      [PyTorch] Stringize kernel tag names consistently during macro expansion and require all tag names to be a compile time character array (#46074)
      [RFC] Switch PyTorch Selective Build (Custom Build) to use the SelectiveBuilder abstraction (#45722)
      [PyTorch] [BUCK] Replace pt_deps.bzl with a YAML operator dependency file which is generated by the code analyser (#46057)
      [RFC] Add OperatorHandle overload to the RecordFunction::before() method (#46401)
      [RFC] Better error message in case operator could not be run (#46885)
      [PyTorch Mobile] Add continuous build config for xplat/caffe2
      [PyTorch Mobile] Record dtypes for tensors used in kernel function implementations (#48826)
      [PyTorch Mobile] Export Operator List from Mobile CompilationUnit instead of from TorchScript Model (#49385)
      [PyTorch Mobile] Generate Kernel dtype selection code in selected_mobile_ops.h during the build (#49279)
      [PyTorch Mobile] Skip signature check when converting to typed operator handle (#49469)
      [Pytorch Mobile] Remove caching (in code) of interned strings (#50390)
      [PyTorch Mobile] Eliminate static default_extra_files_mobile from header import.h (#50795)
      [PyTorch] Eliminate static default_extra_files_mobile from header import.h (#50832)
      [PyTorch Mobile] Add an overload for deserialize() that doesn't accept the extra_files map. (#50932)
      [PyTorch Mobile] Enable partial loading of GPU models on linux CPU machines (#51236)
      [PyTorch Mobile] Skip inferring function schema from the C++ function type (#50457)

Dmytro Dzhulgakov (3):
      Revert D24042344: [C2] Add string equality operator
      Revert D23398534: [pytorch][PR] [ONNX] Improve error handling for adaptive_pool
      [caffe2] Disable running full grad check in tests by default

Donny Greenberg (1):
      [JIT] Add `__prepare_scriptable__` duck typing to allow replacing nn.modules with scriptable preparations (#45645)

Edson Romero (1):
      Add support for torch.tensor_split to accept a tensor for `indices` argument (#49169)

Edvard Ghazaryan (2):
      Optimize torch zeros (#45636)
      added fuse_op and list_construct - list_unpack pass

Edward Yang (68):
      Switch all Sequences in tools.codegen.model to Tuple (#45127)
      Add NativeFunction.signature and kind. (#45131)
      Revert D24027761: Update backward definition for more operators and reenable tests in test_ops.py
      Rewrite implementation of faithful cpp signatures (#45890)
      Add NativeFunctionGroup (#45918)
      Reorder dispatcher/legacy_dispatcher types (#45973)
      Rename legacy_dispatcher to native. (#45974)
      Remove unnecessary byte-for-byte compatibility code that is not needed. (#45975)
      Refactor dispatcher and native to use Signature structure. (#45990)
      Add some more docs to expecttest. (#46263)
      Delete Vulkan from code generator. (#46938)
      Some miscellaneous cleanup in codegen (#46940)
      Desugar missing dispatch field into singleton Math entry (#46970)
      Delete SchemaRegister.cpp, make flag operate on TypeDefault.cpp (#46991)
      Delete TypeDefault call code generation logic in VariableType (#47000)
      Delete TypeDefault.h and TypeDerived.h codegen entirely. (#47002)
      Revert D24649817: [pytorch][PR] Fix pickling for Tensor subclasses.
      Revert D24730264: [pytorch][PR] Added CUDA support for complex input for torch.inverse
      Stop including TypeDefault.h from MPSCNNTests.mm (#46998)
      Convert from higher order functions to classes in tools.codegen.gen (#47008)
      ATen DerivedType is dead, long live ATen RegisterDispatchKey (#47011)
      Structured kernel definitions (#45277)
      Get TestTorch.test_empty_meta working again (#48113)
      Pruning codeowners who don't actual do code review. (#48109)
      Pin the rest of flake8 dependencies. (#48590)
      ret is never reassigned, return 0 directly (#48609)
      TensorIteratorConfig is not used by reorder_dimensions (#48613)
      Structured kernels generate Meta registrations (#48116)
      Refactor argument fields in FunctionSchema to Arguments (#48182)
      Move argument grouping into FunctionSchema (#48195)
      Refactor TensorIterator to do allocations via MetaBase::set_output (#48659)
      Move var and std overloads to Functions.cpp and remove native:: reference (#48683)
      Delete NativeFunctions.h include from Functions.h (#48687)
      Generalize some TensorIterator consumers to take TensorIteratorBase (#48727)
      Fix code review from #48659 and #48116 (#48731)
      Header cleanup (#48728)
      Revert D25277886: [pytorch][PR] Replace constexpr with CONSTEXPR_EXCEPT_WIN_CUDA
      Revert D25304229: [pytorch][PR] Add type annotations to torch.onnx.* modules
      Class-based structured kernels, with migration of add to framework (#48718)
      Revert D25416620: [pytorch][PR] Add version_info tuple
      Delete some dead functions from tools.codegen.api.meta (#49041)
      Rename positional and kwarg_only to have flat prefix (#49042)
      Delete cpp.group_arguments (#49043)
      Add manual_cpp_binding to native_functions.yaml (#49092)
      Revert D25489030: [PyTorch] Make tls_local_dispatch_key_set inlineable
      Introduce tools.codegen.api.translate (#49122)
      Revert D25105217: [pytorch][PR] Fix bad error message when int overflow
      Revert D25445815: [te] Add fast log approximation based on sleef
      Construct CppSignatureGroup from NativeFunction (#49245)
      Tighten up error checking on manual_kernel_registration (#49341)
      codegen: Resolve overload ambiguities created by defaulted arguments (#49348)
      Move default or no default logic into native.argument (#49489)
      Make use_c10_dispatcher: full mandatory for structured kernels (#49490)
      Push anonymous namespace into codegen, not template (#49498)
      Back out "Revert D25757721: [pytorch][PR] Run mypy on more test files" (#50142)
      Add at::cpu namespace of functions for structured kernels (#49505)
      Fix #48903 (#50817)
      Back out "Revert D25903846: [pytorch][PR] Structured kernel definition for upsample_nearest2d" (#50794)
      Make empty_cpu sanity test CPU only in DEBUG mode (#51358)
      Test if allocator is set only in DEBUG mode. (#51360)
      Relax type signature for tools.codegen.api.translate (#51477)
      Add api.structured; switch structured kernels to use const Tensor& everywhere (#51490)
      Add support for generating faithful at::cpu signatures (#51499)
      Use Literal to model targets. (#51500)
      Split out RegisterDispatchKey to its own file (#51508)
      Factor out structured generation into its own subclass. (#51583)
      Split anonymous and namespaced definitions in RegisterDispatchKey (#51585)
      Support at::cpu on non-structured kernels (#51590)

Eli Uriegas (29):
      setup: Only include dataclasses for py < 3.8 (#45611)
      Bump nightlies to 1.8.0 (#45696)
      setup: Dataclasses only when < 3.7 (#45844)
      .circleci: Fix android publish snapshot job (#46266)
      .circleci: Add python 3.9 to linux binary build matrix (#47235)
      docker: Fix PYTHON_VERSION not propagating (#47877)
      .circleci: Add python 3.9 builds for macOS (#47689)
      docker: Make CUDA_VERSION configurable (#48199)
      third_party: Update pybind to point to fork (#48117)
      .circleci: Add python 3.9 builds for windows (#48138)
      torch: Stop using _nt_quote_args from distutils (#48618)
      docker: Add make variable to add docker build args (#48942)
      .circleci: downgrade conda-package-handling to 1.6.0 (#49434)
      .circleci: Only downgrade if we have conda (#49519)
      .github: Add action workflow to update S3 HTMLS (#49509)
      .circleci: Ignore unbound variables for conda (#50053)
      .circleci: Add option to not run build workflow (#50162)
      .circleci: Remove CUDA 9.2 binary build jobs (#50388)
      .circleci: Set +u for all conda install commands (#50505)
      tools: Move sha check to else statement (#50773)
      .github: Add GitHub Actions workflow to build wheels (#50633)
      .github: Add workflow to stale pull requests (#51237)
      .github: Update stale messaging add newlines (#51298)
      .github: Remove title from stale alert (#51306)
      .github: Up frequency of stale checks (#51365)
      [v1.8.0] [wip] doc_fix (#52006)
      .jenkins: Release branch specific updates (#51982)
      [v1.8.0] .circleci: Downgrade CUDA 11.2 -> 11.1 for binaries (#52151) (#52406)
      [v1.8.0] Various CUDA 11.1 with BUILD_SPLIT_CUDA_FIXES (#52518)

Elias Ellison (46):
      [JIT] fix dict update (#45857)
      [JIT] Fix Dict bug in constant hashing (#45929)
      [1/3] [JIT] Make sure fusion occurs in test_tensorexpr file (#45788)
      [2/3] [JIT] Make sure fusion occurs in test_tensorexpr (#45789)
      [JIT] [3/3] Make sure fusion occurs in test_tensorexpr (#45790)
      [JIT] Bind log1p and lgamma (#45791)
      [JIT] Revert Freezing shared type PR (#46285)
      [JIT] add freeze to docs (#47120)
      [JIT] fix documentation typo (#46926)
      [JIT][Reland] add list() support (#42382)
      Force LLVM Compilation for CPU Tests (#46949)
      Small changes/cleanup (#46950)
      [TensorExpr][CPU] Fix bool -> int casting (#46951)
      Add more CPU tests (#47369)
      [NNC] Fix llvm min lowering for int inputs (#47370)
      [NNC] Add more CPU Tests (#47371)
      [NNC] More cpu tests (#47372)
      [NNC] refactor cuda half support to more general file (#47373)
      [NNC] Enable unary op cpu testing (#47374)
      [JIT] Dont use specialized tensor type (#46130)
      [JIT] Metacompile boolean constants (#46721)
      Disable old fuser internally (#48322)
      [NNC] Add cpu fusion gflag (#48682)
      [TensorExpr Fuser] Handle fusing values with un-profiled uses (#48689)
      Dont use symbolic shapes check (#47810)
      [TensorExpr] Cache use of fallback in kernel invocation (#47812)
      [NNC] Compute Tensor Output Properties in ininitialization (#47813)
      [TensorExpr Fuser] Add support for nodes which have tensor constant inputs (#47814)
      Remove inferred from tensor type ctors (#48263)
      [NNC] Preserve strided output (#48264)
      [NNC] Dont inline outputs buffers on cpu (#49488)
      [NNC] Add Support For is_nan (#48973)
      [NNC] add support for masked_fill (#48974)
      Add fusion support of aten::to (#48976)
      Add more list peephole idioms (#48268)
      [NNC] Disable masked fill (#49622)
      [NNC] masked fill (#49627)
      [JIT] Constant prop getattr (#49806)
      Dont inlinine intermediates on cpu (#49565)
      fix fork formatting (#49436)
      [JIT] disable masked fill (#50147)
      [JIT] Frozen Graph Conv-BN fusion (#50074)
      [JIT] Add Frozen Conv-> Add/Sub/Mul/Div fusion (#50075)
      [JIT] Factor out peephole to own test file (#50220)
      Peephole Optimize out conv(x).dim(), which prevents BN fusion (#50221)
      Add Post Freezing Optimizations, turn on by default in torch.jit.freeze (#50222)

Elijah Rippeth (2):
      Add windows JNI support (#44257)
      fix issue by which pytorch_jni is not bundled in libtorch (#46466)

Emile van Krieken (1):
      Fix broadcast_all crashing on Tensor-likes (#48169)

Emilio Castillo (2):
      `torch.nn.modules.LazyModuleMixin` and `torch.nn.LazyLinear` (Shape Inference II) (#44538)
      Implement autograd functions for c10d communication operations (#40762)

Eric Cotner (1):
      Correct `Categorical` docstring (#45804)

Erjia Guan (24):
      Add ShuffleDataset with buffer (#45290)
      Document fix for logspace and linspace (#46056)
      Implement ravel (#46098)
      Format error message for unmatched signature between _out and base functions (#47087)
      Optimize backward for torch.repeat (#46726)
      Implement copysign (#46396)
      Revert D24481801: Optimize backward for torch.repeat
      Optimize backward for torch.repeat (#46726)
      Revert D24859919: [pytorch][PR] Grammatically updated the tech docs
      Implement C++ ModuleDict (#47707)
      Migrate `fmod` and `fmod_` from TH to ATen (CUDA) (#47323)
      [WIP][DataLoader] CollateIterableDataset prototype (#48933)
      [WIP][DataLoader] Prototype of BatchIterableDataset (#49186)
      [WIP][DataLoader] Prototype of SamplerIterableDataset (#49363)
      Fix return type Any for Ternary ops (#49165)
      Add trace batching forward/backward rule (#49979)
      Fix doc for vmap levels (#50099)
      Fix mypy typing check for test_dataset (#50108)
      Fix `fmod` type promotion (#48278)
      Fix remainder type promotion (#48668)
      Remove optional for veiw_fn during View Tracking (#50067)
      [WIP][DataLoader] Implement CallableIterableDataset (#50045)
      [WIP][DataLoader] Implement BucketBatchIterableDataset (#51126)
      [DataLoader] Rename Functional DataSet to DataPipe (#51488)

FNSTER (1):
      fix INTERNAL ASSERT FAILED for maximum (#48446)

Facebook Community Bot (24):
      Automated submodule update: FBGEMM (#45713)
      Automated submodule update: FBGEMM (#46079)
      Automated submodule update: FBGEMM (#46125)
      Automated submodule update: FBGEMM (#46151)
      Automated submodule update: FBGEMM (#46271)
      Automated submodule update: FBGEMM (#46395)
      Automated submodule update: FBGEMM (#46443)
      Automated submodule update: FBGEMM (#46578)
      Automated submodule update: FBGEMM (#47071)
      Automated submodule update: FBGEMM (#47190)
      Automated submodule update: FBGEMM (#47263)
      Automated submodule update: FBGEMM (#47605)
      Automated submodule update: FBGEMM (#47929)
      Automated submodule update: tensorpipe (#50267)
      Automated submodule update: tensorpipe (#50369)
      Automated submodule update: tensorpipe (#50441)
      Automated submodule update: tensorpipe (#50572)
      Automated submodule update: tensorpipe (#50684)
      Automated submodule update: tensorpipe (#50765)
      Automated submodule update: tensorpipe (#50807)
      Automated submodule update: tensorpipe (#50895)
      Automated submodule update: tensorpipe (#50946)
      Automated submodule update: tensorpipe (#51203)
      Automated submodule update: tensorpipe (#51469)

Fayçal Arbai (1):
      An implementation of torch.tile as requested in pytorch/pytorch#38349 (#47974)

Felix Abecassis (1):
      docker: add environment variable PYTORCH_VERSION (#50154)

Feynman Liang (1):
      HalfCauchy should ValueError if _validate_args (#50403)

Francisco Javier Ponce (1):
      Check CUDA kernel launches (fbcode/caffe2/aten/src/ATen/native/cuda/) (#47207)

Frank Seide (3):
      Fused8BitRowwiseQuantizedToFloat operator support (#48407)
      T66557700 Support default argument values of a method (#48863)
      T66557700 Support default argument values of a method (#48863)

Fritz Obermeyer (6):
      Add broadcast_shapes() function and use it in MultivariateNormal (#43935)
      Enable distribution validation if __debug__ (#48743)
      Validate args in HalfCauchy and HalfNormal (#50492)
      Independent constraint (#50547)
      Fix TransformedDistribution shaping logic (#50581)
      Fix Dirichlet.arg_constraints event_dim (#51369)

Gao, Xiang (12):
      Delete CUDAUnaryOps.cpp (#46280)
      Fix bug in toComplexWithDefault (#43841)
      Fix some flaky tests in test_torch.py and test_nn.py (#46941)
      Test TORCH_LIBRARY in CUDA extension (#47524)
      Bump up the CUDA OOM test memory size (#48029)
      Install magma on CUDA 11.1 (#48164)
      Ignore MSVC's pdb file (#47963)
      CUDA BF16 norm (#48806)
      Refactor cudnn convolution (#49109)
      Reland "Add test for empty tensors for batch matmuls" (#48797)
      Fix CUDA extension ninja build (#49344)
      Followup of kron PR (#51045)

Gaoxiang Liu (1):
      [DI] Allow explicit taskLauncher for torchscript interpreter (#46865)

Garret Catron (5):
      Add AcceleratedGraphModule and serialzie GraphModule to JSON (#47233)
      Add serialize GraphModule to JSON support (#47612)
      Added serialization of parameters for leaf modules (#47729)
      Move AcceleratedGraphModule out of graph_manipulation.
      Move AcceleratedGraphModule out of graph_manipulation. (#51220)

Gary Zheng (5):
      [caffe2] Add unittests for schema.Field init (#47512)
      [caffe2] Fix duplicate name bug in Net.AddExternalInput (#47530)
      [caffe2] Properly call super init in schema.py (#47542)
      [caffe2] Add __slots__ to all classes in schema.py (#47541)
      [caffe2] Fix ListWithEvicted _pprint_impl wrongly printing _evicted_values (#47881)

Gemfield (4):
      fix BUILD_MOBILE_BENCHMARK typo (#48515)
      Fix TORCH_LIBRARIES variables when do static build (#49458)
      Fix quantization doc issue (#50187)
      Fix the missing parameter in get_sha function (#51290)

Georgia Hong (1):
      [p2c2] Add support for Int8FCPackWeight in model transformation

Greg Tarr (1):
      Missing curly bracket. (#47855)

Gregory Chanan (15):
      Improve error checking of Storage._writeFile. (#46036)
      Stop running clang-tidy on torch/csrc/generic/*.cpp. (#46335)
      Split up BinaryMiscOpKernels.cu because it's slow to compile. (#47362)
      Clean up some imports in cuda kernel code. (#47400)
      Split IGamma cuda kernel into it's own file to speed up compilation times. (#47401)
      Move igamma cuda specific code to kernel file. (#47410)
      __noinline__ the top level igamma cuda kernel. (#47414)
      Fix type promotion for trace on CPU. (#47305)
      Revert "Fixed einsum compatibility/performance issues (#46398)" (#47821)
      Update gather documentation to allow index.shape[k] <= input.shape[k] rather than ==. (#41887)
      torch.xlogy: Use wrapped_scalar_tensor / gpu_with_scalars to speed up GPU kernel. (#49926)
      Stop using c10::scalar_to_tensor in float_power. (#50105)
      Move scalar_to_tensor_default_dtype out of ScalarOps.h because it's only useful for torch.where. (#50111)
      Stop using an unnecessary scalar_to_tensor(..., device) call. (#50114)
      Stop moving scalars to GPU for one computation in leaky_rrelu_backward. (#50115)

Guanheng Zhang (1):
      Enable the faster combined weight branch in MHA when query/key/value is same object with nan (#48126)

Guilherme Leobas (28):
      annotate torch.autograd.* modules (#45004)
      Annotate torch.nn.cpp (#46490)
      add type annotations to comm.py (#46736)
      annotate a few torch.nn.modules.* modules (#45772)
      Add type informations to torch/storage.py (#46876)
      Add type informations to torch.cuda (#47134)
      add type annotations to multiprocessing module (#47756)
      Add type annotations for a few torch.nn.modules (#46013)
      annotate torch._tensor_str (#48463)
      Add type annotations to torch.onnx.* modules (#45258)
      add type annotations to common_nn.py (#48190)
      Add type annotations to torch.onnx.* modules (#48782)
      Torch onnx (#48980)
      Annotate torch._tensor_str (#48584)
      Add type annotations to conv-relu (#47680)
      add type annotations to torch.nn.parallel._functions (#49687)
      Add type annotations to _tensorboard_vis.py and hipify_python.py (#49834)
      add type annotations to torch._utils (#49705)
      add type annotations to torch.nn.quantized.modules.conv (#49702)
      add type annotations to torch.nn.modules.fold (#49479)
      add type annotations to torch.nn.modules.module (#49045)
      add type annotations to torch.nn.modules.normalization (#49035)
      Add type annotations to torch.nn.modules.padding (#49494)
      add type annotations to torch.nn.modules.conv (#49564)
      add type annotations to torch.nn.modules.container (#48969)
      Add type annotations to torch.overrides (#48493)
      Add type annotations to torch.overrides (#50824)
      add type annotations to conv_fused/blas_compare/blas_compare_setup (#51235)

Haichuan Yang (1):
      mem-efficient learnable fake quantization (#49315)

Hameer Abbasi (11):
      Allow Tensor-likes in torch.autograd.gradcheck (#45732)
      Fix incorrect signatures in get_testing_overrides, and add test for incorrect signatures (#45983)
      Add torch.overrides checks for submodules. (#47285)
      Fix pickling for Tensor subclasses. (#47115)
      Fix output type of torch.max for Tensor subclasses. (#47110)
      Fix classmethod override argument passing. (#47114)
      Fix documentation to point to torch.overrides instead of _overrides. (#47842)
      Add documentation for torch.overrides submodule. (#48170)
      Fix indexing for overrides. (#49324)
      Clarify wording around overrides subclasses. (#51031)
      Fix pickling for Tensor subclasses (redo) (#47732)

Hao Lu (29):
      [StaticRuntime] Integrate Static Runtime into PyTorchPredictor (#45640)
      [StaticRuntime] Fix broken tests (#45813)
      [StaticRuntime] Implement StaticRuntime::benchmark (#45639)
      [caffe2] Do not run RemoveOpsByType on recurrent networks (#45986)
      [StaticRuntime] Replace hashtable based workspace with vector<IValue> (#45892)
      [caffe2] Bypass memonger for in-place ops (#46378)
      [StaticRuntime] Threading model (#46219)
      [caffe2] Allow memonger to optimize nets with inplace(enforced) ops (#46560)
      [caffe2] Fix inplace ops in onnx::SsaRewrite (#46134)
      [pt][static_runtime] Add option enable_out_variant (#46690)
      [PT] optional -> c10::optional (#47144)
      [pt][static_runtime] Memory model (#46896)
      [static runtime] Add out_ variant for aten::stack and aten::nan_to_num (#48150)
      [PT][StaticRuntime] Move prim op impl to ops.cpp (#48210)
      [caffe2] Register BlackBoxPredictor AllocationArenaPool as CPUCachingAllocator (#48161)
      [StaticRuntime] Add aten::narrow (#48991)
      [pt][quant] Remove contiguous calls in qembeddingbag (#48993)
      [caffe2] DeserializeToNDArray (#49135)
      [pt] Replace size(dim) with sizes()[dim] (#49255)
      [StaticRuntime] Permute_out (#49447)
      [StaticRuntime] Fusion pass for ClipRanges/GatherRanges/LengthsToOffsets (#49113)
      [StaticRuntime][ATen] Add out variant for narrow_copy (#49449)
      Revert D25554109: [StaticRuntime][ATen] Add out variant for narrow_copy
      [pt][ATen] Optimize bmm (#49506)
      [StaticRuntime][ATen] Add out variant for narrow_copy (#49502)
      [aten] Make aten::flatten call native::reshape (#50859)
      [atem] Fix type check bug in bmm_out_or_baddbmm_ (#51248)
      [StaticRuntime] Add out variant for reshape and flatten (#51249)
      [StaticRuntime] Fix bug in MemoryPlanner (#51342)

Hebo Yang (1):
      Minor Fix: Double ";" typo in transformerlayer.h (#50300)

Hector Yuen (8):
      flush the buffer when printing the IR (#45585)
      remove having no deadline for the test (#48226)
      reintroduce deadline removal (#48481)
      change global_fp16_constants for test_fc_nnpi_fp16 (#48663)
      quantize bias of the quantization parameters (#48749)
      sls + layernorm test (#43799)
      unit test for fc parallelization aot (#50056)
      replace silufp16 with cubic interpolation (#51645)

Heitor Schueroff (27):
      Add GradMode::enabled check to max_pool1d (#46767)
      Fix median bug on discontigous tensors (#46917)
      Fix max_pool2d with ceil_mode bug (#46558)
      Fix max_pool1d on discontiguous tensor (#47065)
      Updated docs/test for dot and vdot (#47242)
      Fix kthvalue error for scalar input (#47600)
      Fixed einsum compatibility/performance issues (#46398)
      Move kthvalue scalar test to separate method for XLA (#48042)
      Implemented torch.inner (#46716)
      Fixed einsum compatibility/performance issues (#46398) (#47860)
      Revert D24923679: Fixed einsum compatibility/performance issues (#46398)
      Revert D25428587: [pytorch][PR] add additional interpolation modes for torch.quantile
      Revert "Revert D24923679: Fixed einsum compatibility/performance issues (#46398)" (#49189)
      Revert D25763758: [pytorch][PR] introduce a flag to disable aten::cat in TE
      Revert D25717504: Clean up some type annotations in test/jit
      Workaround for MAGMA accessing illegal memory in batched cholesky (#50957)
      [doc] Fix linalg.cholesky doc consistency issues (#51459)
      [doc] Deprecate torch.cholesky in favor of torch.linalg.cholesky (#51460)
      [doc] Fix linalg.slogdet doc consistency issues (#51353)
      [doc] Add deprecation message to torch.slogdet in favor of torch.linalg.slogdet (#51354)
      [doc] Fix inconsistencies with torch.linalg.cond doc (#51641)
      [doc] Fix inconsistencies with torch.linalg.det docs (#51651)
      [doc] Fix inconsistencies with torch.linalg.eigh (#51658)
      [doc] Fix inconsistencies with torch.linalg.eigvalsh (#51659)
      doc] Fix inconsistencies with torch.linalg.matrix_rank doc (#51660)
      [doc] Fix inconsistencies with linalg.pinv docs and deprecate pinverse (#51671)
      [doc] Fix inconsistencies with torch.linalg.inv and deprecate torch.inverse (#51672)

Heitor Schueroff de Souza (1):
      Fixed median nan propagation and implemented nanmedian (#45847)

Helene (1):
      Removed typographical error from tech docs (#51286)

Himangshu (2):
      Creation of test framework for Sparse Operators (#48488)
      Add Sparse support for torch.sqrt (#50088)

Hong Xu (7):
      Add a warning message that torch.sign would not support complex numbers (#43280)
      Clarify, make consistent, and test the behavior of logspace when dtype is integral (#47647)
      Use quiet_NaN() in calc_digamma, not NAN (#50412)
      Replace all AT_ASSERTM under c10/ (except Exception.h) (#50843)
      Replace all AT_ASSERTM in RNN_miopen.cpp (#51072)
      Replace all AT_ASSERTM in ATen/native (#51147)
      Replace AT_ASSERTM in ATen/core (#51579)

Hongtao Yu (1):
      Fix caffee2 for LLVM trunk

Horace He (13):
      [FX] Fix handling of attributes (#47030)
      [FX] get the correct error message (#47108)
      [FX] Prototype Conv/BN fuser in FX (#47657)
      Fix mypy error (#48359)
      [fx]added prototype of to_folder (#47544)
      Finished fleshing out the tensor expr bindings in expr.cpp (#50643)
      Adds per-op microbenchmarks for NNC (#50845)
      Added cuda bindings for NNC (#51046)
      [FX] Added invert example (#51478)
      [FX] Added how to write transformations section (#51278)
      [FX] Added partial concrete values for symbolic tracing (#51609)
      [FX] Fix mypy error in FX for rewriter (#51740)
      Fixed slight bug in FX docs (#51779)

Howard Huang (12):
      Add Batch-Updating Parameter Server Example to CI Tests (#46510)
      Provide 'out' parameter for 'tensordot' (#47278)
      Add bool tensor support for where (#47454)
      Enable diag for bool Tensors (#47455)
      Small documentation changes for RRef and Dist Autograd (#48123)
      Add MessageTypeFlags enum for RPC Messages (#48143)
      Refactor request_callback_no_python.cpp processRpc function (#47816)
      Update torch.randint documentation to include missing note (#48787)
      Fix TCPStore type coercion (#49685)
      Fix elu backward operation for negative alpha (#49272)
      Add compare_set operation and test to TCPStore (#51593)
      Revert D26237328: Add compare_set operation and test to TCPStore

Huamin Li (1):
      Add deadline to fakelowp tests (#48823)

Huan Gui (1):
      [uhm][0/n] add cuda Mod Op (#46732)

Hugo van Kemenade (2):
      Fix version comparisons for Python 3.6, 3.10 and 4 (#32389)
      Remove redundant code for unsupported Python versions (#49486)

Hui Guo (4):
      #48733 added logging statements to LLVM codegen using JIT logging (#48758)
      [Issue #46210] added torch.fx.len() to provide support for len(); added a test case for torch.fx.len() (#49532)
      converted current debugging statements in LLVM codegen to jit-logging statements #48771 (#49040)
      added macros in jit logging to check whether loggings are enabled; replaced similar checks in LLVM codegen with such macros (#49121)

HyunJun (1):
      Fix possible padding length overflow in DistributedSampler (#45329)

Igor Gitman (1):
      Adding support for CuDNN-based LSTM with projections (#47725)

Ilia Cherniavskii (23):
      Profiler benchmark fix (#47713)
      Add Kineto submodule (separate PR) (#48332)
      Add USE_KINETO build option (#45888)
      Use libkineto in profiler (#46470)
      Extra sampling of record function events (#48289)
      Update Kineto revision (#49200)
      Extra sampling of record function events [resend] (#49114)
      Reduce kineto logging (#49216)
      Set USE_KINETO=1 (#49201)
      Revert D25480770: Set USE_KINETO=1
      New profiler API (#48280)
      Output stacks (support for SVG visualization) (#48438)
      Disable test on windows (#49636)
      Remove flops warnings from the default profiler use case (#49896)
      Update Kineto revision (#50855)
      Set USE_KINETO=1 (#49897)
      Trim profiler file paths (#51192)
      Rewrite "ProfilerStep#<num>" in profiler output (#51194)
      Add convenience import (#51195)
      Multi-GPU Kineto profiler test (#51391)
      [profiler] Default activities value (#51561)
      Fix attribution of some CUDA events to CPU events (#51632)
      [profiler] Support top-level memory events (#51421)

Ilqar Ramazanli (1):
      Update the error message for retain_grad (#47084)

Iurii Zdebskyi (22):
      Update test_multi_tensor_optimizers test (#45510)
      Add torch._foreach_maximum(TensorList, TensorList) & torch._foreach_minimum(TensorList, TensorList) APIs (#45692)
      Added scalar lists APIs for addcdiv and addcmul (#45932)
      Refactor scalar list APIs to use overloads (#45673)
      Push rocm to slow path (#46216)
      Revert "Push rocm to slow path (#46216)" (#46728)
      [WIP] Push rocm to slow path for foreach APIs (#46733)
      Refactored ForeachFunctors.cuh (#46660)
      Optimize arguments checks (#46661)
      Renamed a TensorListMetaData property. Cleaned up a test (#46662)
      [WIP] Adding bunch of unary foreach APIs (#47383)
      Revert D24737050: [WIP] Adding bunch of unary foreach APIs
      Adding bunch of unary foreach APIs (#47875)
      Added foreach_frac API (#47384)
      [WIP] Added foreach_trunc, foreahc_reciprocal, foreach_sigmoid APIs (#47385)
      Enabled Scalar lists (#48222)
      Add torch._foreach_zero_ API (#47286)
      Move device guard from MultiTensorApply.cuh (#46664)
      Revert D26018916: [pytorch][PR] Automated submodule update: tensorpipe
      Revert D26070147: [Gradient Compression] Refactor default_hooks.py and powerSGD_hook.py by creating a util function that make a vanilla allreduce future
      [WIP] Update foreach APIs to use scalar lists (#48223)
      Refactor ForeachUnaryOps.cu (#49248)

Ivan Kobzarev (11):
      [py][vulkan] Add is_vulkan to py api, add vulkan to device type parsing (#46511)
      [py][vulkan][reland] Add is_vulkan to py api, add vulkan to device type parsing (#46655)
      [vulkan_api][ops] Mm, Pool, Upsample (#47063)
      [vulkan] Apply new changes to vulkan api v1 (#47721)
      [vulkan] convolution old prepacking via cpu-shader (#48330)
      [vulkan][test] Not use non 1 dilation for conv2d (#48800)
      [vulkan] test_app for mobilenetV2 on vulkan api (#48924)
      [torchscript] Fix constant propagation schemas (#49605)
      [android] Fix YUV camera image to tensor (#50871)
      [android] fix yuv conversion - remove define (#50951)
      [android] turn on USE_VULKAN for android builds by default (#51291)

Ivan Murashko (2):
      [HTE @ clang-tidy] Enable clang-tidy configs inheretence for caffe2 project
      Config inheritance was added for pytorch project (#46584)

Ivan Yashchuk (35):
      Added support for complex torch.symeig (#45121)
      Added CUDA support for complex input for QR decomposition (#45032)
      Updated derivatives for complex mm, mv, ger, bmm, triangular_solve (#45737)
      Added support for complex torch.pinverse (#45819)
      Allow converting parameters of nn.Module to complex dtypes (#44788)
      Added torch.linalg.tensorsolve (#46142)
      Added Kronecker product of tensors (torch.kron) (#45358)
      Added CUDA support for complex input for torch.inverse (#45034)
      Added CUDA support for complex input for torch.inverse #2 (#47595)
      Added CUDA support for complex input for torch.triangular_solve (#46916)
      Added CUDA support for complex input for torch.solve (#47045)
      Added support for complex input for torch.lu_solve (#46862)
      Added linalg.cholesky (#46083)
      Enable complex tests that depend on batched matmul on CUDA (#47910)
      Added linalg.tensorinv (#45969)
      Added linalg.eigh, linalg.eigvalsh (#45526)
      Added linalg.matrix_rank (#48206)
      Added support for complex input for torch.lu_solve #2 (#48028)
      Added computing matrix condition numbers (linalg.cond) (#45832)
      Added CUDA support for complex input for torch.cholesky_solve (#47047)
      Added entry for torch.linalg.cond to linalg.rst (#48941)
      Updated derivative rules for complex QR decomposition (#48489)
      Updated derivative rules for complex svd and pinverse (#47761)
      Updated derivative rules for complex svd and pinverse (#47761)
      Added linalg.solve (#48456)
      Added linalg.inv (#48261)
      Added linalg.inv (#48261)
      Added linalg.pinv (#48399)
      Added linalg.slogdet (#49194)
      Port CPU torch.orgqr to ATen (#50502)
      Make torch.svd return V, not V.conj() for complex inputs (#51012)
      Port cholesky_inverse to ATen (#50269)
      Added OpInfo-based testing of triangular_solve (#50948)
      Fixed SVD ignoring "some/full_matrices" flag for empty inputs (#51109)
      Added missing VSX dispatch for cholesky_inverse (#51562)

Jack Montgomery (1):
      Add single element tuple output from to_backend/to_glow (#5029)

Jacob Szwejbka (3):
      [Pytorch Mobile] Expose _export_operator_list to python (#51312)
      [Pytorch] Expanded Bundled Inputs To Any Public Function (#51153)
      [Pytorch Mobile] Preserved all functions generated by bundled inputs (#51496)

Jagadish Krishnamoorthy (6):
      distributed_test: Map rank to GPU accordingly (#47898)
      [ROCm] Enable skipped distributed global tests (#48023)
      [distributed] Provide parameter to pass GPU ID in barrier function (#49069)
      [distributed_test_c10d]Enable disabled ROCm tests. (#50629)
      [distributed_test]Enable disabled ROCm tests. (#50421)
      [ROCm] disable tests for ROCm 4.0.1 (#51510)

James Donald (4):
      [caffe2][torch] Clean up unused variable 'device' (#48600)
      [caffe2][a10] Remove unreferenced local variable e (#48601)
      [caffe2][autograd] Avoid extensive -Wunused-variable warnings on _any_requires_grad (#49167)
      [caffe2][a10] Move down pragma pop to properly suppress warning 4522 (#49233)

James Reed (62):
      [FX] Shape propagation example (#45589)
      Revert "Revert D24024606: [FX] Shape propagation example" (#45637)
      [FX] Make output a non-special Node (#45599)
      [FX] Make Tracer.trace() just return a Graph (#45704)
      [FX][WIP] Mutable Graph APIs (#45227)
      [FX] Track use nodes in Node (#45775)
      [JIT] Make objects throw Python AttributeError on nonexistant attr access (#45911)
      [FX] Preserve type annotations on generated code in Graph (#45880)
      [FX] Make `graph_copy` examine existing values in val_map (#46104)
      [FX] Allow tracing free functions (#46268)
      [FX] Fix recursion depth issue on Graph deepcopy (#46669)
      [FX] Make wrapped functions traceable (#46692)
      [FX] Fix handling of `inf` and `nan` literals (#46894)
      [FX] Fix corner case in name sanitization (#46958)
      [FX] Kill functional transforms name (#47004)
      [FX] Put inf and nan in globals instead of with an import string (#47035)
      [WIP] Move torch.fx into its own target (#46658)
      Fix lint (#47095)
      [FX] Speed up non-parameter tensor lookup (#47325)
      [FX] Add a bunch of docstrings (#47719)
      [FX] Fix uses not updating when erasing a node (#47720)
      [FX] Fix __tensor_constants not scriptable (#47817)
      [FX] Fix submodule naming for subgraph split (#47869)
      [FX] Refactor unique name handling (#48205)
      [FX] Add Node.all_input_nodes (#48270)
      [FX] Delete values after their last use (#48631)
      Add scary comment in cpp_custom_type_hack.h (#48737)
      Revert D23898398: [Mask R-CNN]Add Int8 AABB Generate proposals Op
      [FX][1/2] Make docstrings pretty when rendered (#48738)
      [FX][2/2] Make docstrings pretty when rendered (#48871)
      [FX] Fix create_arg for NamedTuple (#48986)
      [FX] Move none assignments to same line (#49209)
      [JIT] Fix toIValue handling of AttributeError when casting ClassType (#49188)
      [WIP][FX] Add FX page to docs (#48814)
      [FX] Rename Node._uses and refactor Node.all_input_nodes (#49415)
      [FX] Enforce args is tuple and kwargs is dict (#49526)
      [FX] Emit named tuple construction node when NamedTuple appears as an arg (#49553)
      [FX] Fix python code having spurious newlines from placeholders (#49720)
      [FX] Try to make it more clear that _update_args_kwargs should not be called (#49745)
      [FX] Remove extraneous newlines at end of code (#50117)
      [FX} Implement wrap() by patching module globals during symtrace (#50182)
      [FX] Make graph target printouts more user-friendly (#50296)
      [FX] Make FX stability warning reference beta (#50394)
      [FX] Update docstring code/graph printout (#50396)
      [FX] Add wrap() docstring to docs and add decorator example (#50555)
      [WIP][FX] new sections in docs (#50562)
      [FX] Make len traceable and scriptable with wrap (#50184)
      [FX] Fix tracing a free function with embedded constant (#50639)
      [FX] Fix NoneType annotation in generated code (#50777)
      [FX][docs] Add limitations of symbolic tracing (#50638)
      [FX] Update overview docstring (#50896)
      [FX] Minor docs changes (#50966)
      [WIP][FX] Add Interpreter and Transformer (#50420)
      [FX] Support ellipsis as arg (#51502)
      [FX] Move some heavily used passes out of experimental (#51392)
      [FX] Add note about more use cases of FX (#51576)
      [FX] Move examples to pytorch/examples (#51686)
      [FX] Edits after comprehensive pass over docs (#51705)
      Revert "Revert D26246231: [FX] Edits after comprehensive pass over docs" (#51728)
      [FX][docs] Indent forward (#51802)
      [FX] Hide experimental folder (#51987)
      [FX][1.8] Cherrypick three FX fixes to 1.8 (#52021)

Jan (2):
      Fix MultiheadAttention docstring latex (#50430)
      Fix small typo (#51542)

Jane (Yuan) Xu (6):
      adding sharding option to run_test.py (#45583)
      reorganizing tests so that test1 and test2 are balanced in timing (#45778)
      Removing caffe2 and third_party from our code coverage (#47310)
      Add anoth…
facebook-github-bot pushed a commit that referenced this issue Mar 15, 2021
Summary:
Close #51108
Related #38349

This PR implements `cpu_kernel_multiple_outputs` to support returning multiple values from a CPU kernel.
```c++
// Configure a TensorIterator with two outputs and two inputs.
auto iter = at::TensorIteratorConfig()
  .add_output(out1)
  .add_output(out2)
  .add_input(in1)
  .add_input(in2)
  .build();

// The lambda returns a std::tuple; each element is written to the
// corresponding output registered above.
at::native::cpu_kernel_multiple_outputs(iter,
  [=](float a, float b) -> std::tuple<float, float> {
    float add = a + b;
    float mul = a * b;
    return std::tuple<float, float>(add, mul);
  }
);
```

Here `out1` will equal `torch.add(in1, in2)`, while `out2` will equal `torch.mul(in1, in2)`.
This makes it more convenient to implement new torch functions that return two tensors, such as the NumPy-like functions [divmod](https://numpy.org/doc/1.18/reference/generated/numpy.divmod.html?highlight=divmod#numpy.divmod) and [frexp](https://numpy.org/doc/stable/reference/generated/numpy.frexp.html#numpy.frexp).

This PR adds the `torch.frexp` function to exercise the new functionality provided by `cpu_kernel_multiple_outputs`.
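As a quick illustration of what this enables, here is a minimal Python sketch of the resulting `torch.frexp` (assuming a build that includes this PR; `torch.ldexp` is used only to check the round trip):

```python
import torch

x = torch.tensor([0.5, 1.0, 3.0, 10.0])

# Decompose x into mantissa * 2**exponent, with |mantissa| in [0.5, 1) for nonzero inputs
mantissa, exponent = torch.frexp(x)
print(mantissa)  # 0.5000, 0.5000, 0.7500, 0.6250
print(exponent)  # 0, 1, 2, 4

# Recombine the two outputs to verify the decomposition
assert torch.allclose(torch.ldexp(mantissa, exponent), x)
```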

Pull Request resolved: #51097

Reviewed By: albanD

Differential Revision: D26982619

Pulled By: heitorschueroff

fbshipit-source-id: cb61c7f2c79873ab72ab5a61cbdb9203531ad469
facebook-github-bot pushed a commit that referenced this issue Mar 28, 2021
Summary:
Reference: #38349

Wrapper around the existing `torch.gather` with broadcasting logic.
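For illustration, a minimal sketch of how such a gather-with-broadcasting wrapper behaves, assuming it landed as `torch.take_along_dim` (the analogue of NumPy's `np.take_along_axis`):

```python
import torch

x = torch.tensor([[10, 30, 20],
                  [60, 40, 50]])

# Per-row argmax as a (2, 1) index tensor; the index only needs to match x
# in the dimensions other than dim
max_idx = torch.argmax(x, dim=1, keepdim=True)
row_max = torch.take_along_dim(x, max_idx, dim=1)

# Equivalent to plain torch.gather here, since the index already matches x's batch dims
assert torch.equal(row_max, torch.gather(x, 1, max_idx))
```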

TODO:
* [x] Add Doc entry (see if phrasing can be improved)
* [x] Add OpInfo
* [x] Add test against numpy
* [x] Handle broadcasting behaviour and the case where dim is not given.

Pull Request resolved: #52833

Reviewed By: malfet

Differential Revision: D27319038

Pulled By: mruberry

fbshipit-source-id: 00f307825f92c679d96e264997aa5509172f5ed1
@rgommers
Collaborator

One thing I never noticed before, and that I don't find discussed in issues, is that PyTorch doesn't have `nan`, `inf`, `pi` and `e` constants. At least the first three are used a lot. The test suite is full of `float('nan')` and `float('inf')`, which is ugly. Doing `import math; nan = math.nan` would be the alternative (that'd be reasonable). Anyone know if not implementing these constants was a conscious decision?
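For context, a minimal sketch of the workaround described above versus the module-level constants being proposed (the `torch.*` spellings are the proposal here, not something assumed to exist yet):

```python
import math
import torch

# Today: spell the constants via float() or the math module
x = torch.full((3,), float('nan'))
y = torch.tensor([math.inf, -math.inf, math.pi])

# Proposed: module-level constants mirroring NumPy's np.nan, np.inf, np.pi, np.e
# x = torch.full((3,), torch.nan)
# y = torch.tensor([torch.inf, -torch.inf, torch.pi])
```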

@mruberry
Collaborator Author

mruberry commented Apr 5, 2021

> One thing I never noticed before, and that I don't find discussed in issues, is that PyTorch doesn't have `nan`, `inf`, `pi` and `e` constants. At least the first three are used a lot. The test suite is full of `float('nan')` and `float('inf')`, which is ugly. Doing `import math; nan = math.nan` would be the alternative (that'd be reasonable). Anyone know if not implementing these constants was a conscious decision?

Per offline discussion, no it wasn't an especially conscious decision. We can probably just add these.

@vadimkantorov
Contributor

torch functions for torch.add / torch.delete

@Kiyosora
Contributor

Kiyosora commented Dec 10, 2021 via email

@mruberry
Collaborator Author

Closing this issue. We've added the great majority of missing NumPy functions, and future requests can be handled as individual issues.

@vadimkantorov We have torch.add(), but would you like to file an issue requesting torch.delete()? delete is a tough name to reuse, however.

@vadimkantorov
Contributor

Oh, I think I mistyped; I meant append / delete. Yeah, I'll create separate issues for discussing these.
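For reference, a minimal sketch of how `np.append` / `np.delete` can be emulated with existing torch ops while dedicated functions are discussed (nothing new in torch is assumed here):

```python
import torch

x = torch.arange(6)

# np.append(x, values) for 1-D inputs is just torch.cat
appended = torch.cat([x, torch.tensor([10, 11])])

# np.delete(x, indices): drop entries via a boolean mask
drop = torch.tensor([1, 4])
mask = torch.ones_like(x, dtype=torch.bool)
mask[drop] = False
deleted = x[mask]  # tensor([0, 2, 3, 5])
```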

@danijar

danijar commented Jul 3, 2022

@mruberry I just found this issue and was quite surprised that it's closed. It seems like even basic NumPy functions like `np.concatenate()` are not in the PyTorch namespace yet? Are there any plans to support a NumPy-like API? That's the default in JAX, and TF2 has `tf.experimental.numpy` with fairly complete coverage. I'm happy to open a separate issue if this one was just to provide NumPy-like functionality rather than a NumPy-like API.

@mruberry
Collaborator Author

mruberry commented Jul 8, 2022

> @mruberry I just found this issue and was quite surprised that it's closed. It seems like even basic NumPy functions like `np.concatenate()` are not in the PyTorch namespace yet? Are there any plans to support a NumPy-like API? That's the default in JAX, and TF2 has `tf.experimental.numpy` with fairly complete coverage. I'm happy to open a separate issue if this one was just to provide NumPy-like functionality rather than a NumPy-like API.

Hey @danijar! It's true there are some NumPy operations and aliases still missing from PyTorch, but we thought it'd be easier to track them as individual issues than through this rollup, that's all.

In PyTorch we have `torch.cat` and `torch.concat`, and we'd just want to add an alias for `torch.concatenate`, too. If you'd like to file an issue for that alias in particular, that'd be great; it should be straightforward to add.
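For context, a small sketch of the existing spellings next to the NumPy one (`torch.concatenate` is only the proposed alias at this point, so it's shown commented out):

```python
import numpy as np
import torch

a, b = torch.arange(3), torch.arange(3, 6)

torch.cat([a, b])                        # original PyTorch name
torch.concat([a, b])                     # existing alias
np.concatenate([a.numpy(), b.numpy()])   # the NumPy spelling the new alias would mirror
# torch.concatenate([a, b])              # proposed alias
```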

@danijar

danijar commented Jul 9, 2022

That's great to hear! I'm happy to file an issue. It would just be useful to see an overview of what's available and what's still missing; it could even be a spreadsheet. That way, I can see what I need to special-case when converting a given code base from JAX to PT.
