[Feature] Make tensordict not incompatible with torch.compile #629

Open
wants to merge 6 commits into
base: main

@vmoens vmoens commented Jan 18, 2024

Description

The goal of this PR is to make tensordict not incompatible with torch.compile, i.e. to remove the points where compilation breaks by letting torch.compile know that these functions should be ignored.

This way, we will be able to use tensordict with torch.compile and speed things up at a later stage.

We also deprecate the old functional API by default. It can be reinstated via the _set_auto_make_functional(True) decorator / context manager.
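
For readers unfamiliar with the "decorator / context manager" pattern, the sketch below shows one common way to build a flag setter that works in both forms, using `contextlib.ContextDecorator` from the standard library. All names here (`set_legacy_functional`, `legacy_functional_enabled`, `_LEGACY_FUNCTIONAL`) are hypothetical; this is not the implementation of `_set_auto_make_functional`.

```python
from contextlib import ContextDecorator

# Hypothetical module-level flag; the feature is off by default.
_LEGACY_FUNCTIONAL = False


def legacy_functional_enabled() -> bool:
    """Return the current value of the flag."""
    return _LEGACY_FUNCTIONAL


class set_legacy_functional(ContextDecorator):
    """Set the flag for the duration of a `with` block or a decorated call."""

    def __init__(self, mode: bool):
        self.mode = mode

    def __enter__(self):
        global _LEGACY_FUNCTIONAL
        self._prev = _LEGACY_FUNCTIONAL  # remember the value to restore
        _LEGACY_FUNCTIONAL = self.mode
        return self

    def __exit__(self, *exc):
        global _LEGACY_FUNCTIONAL
        _LEGACY_FUNCTIONAL = self._prev
        return False  # do not swallow exceptions


# Used as a context manager:
with set_legacy_functional(True):
    inside_block = legacy_functional_enabled()


# Used as a decorator:
@set_legacy_functional(True)
def run_old_style_code() -> bool:
    return legacy_functional_enabled()


inside_call = run_old_style_code()
after_all = legacy_functional_enabled()
```

Restoring the previous value on exit, rather than hard-coding `False`, keeps nested uses well behaved.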

torch.compile crashes somewhere in the @dispatch wrapper. Since @dispatch isn't intended for performance, I think having the option to disable it isn't a bad idea, so I created a context manager for that too and made it private for now.

There are still a bunch of errors to fix:

  • with cudagraphs, even deactivating compile around the get operations mixes fake and real tensors and results in

    File "/home/vmoens/.conda/envs/torchrl/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1586, in validate
        raise AssertionError("Mixing fake modes NYI")
    torch._dynamo.exc.BackendCompilerFailed: backend='cudagraphs' raised:
    AssertionError: Mixing fake modes NYI
    

    I guess that registering get and set in "[RFC] Tensordict integration" (pytorch#112441) will solve a great many bugs in one shot, since other key-based tensordict operations almost always rely on these two methods.

  • with inductor, some modules (e.g. building distributions in tensordict.nn.ProbabilisticTensorDictModule) result in a cryptic error:

      File "/home/vmoens/.conda/envs/torchrl/lib/python3.10/site-packages/torch/_dynamo/guards.py", line 235, in get
      return eval(name, self.scope, CLOSURE_VARS)
      File "<string>", line 1, in <module>
    torch._dynamo.exc.InternalTorchDynamoError: super(type, obj): obj must be an instance or subtype of type
    

cc @ezyang

@facebook-github-bot added the CLA Signed label Jan 18, 2024
@vmoens added the enhancement and Refactor labels Jan 18, 2024

github-actions bot commented Jan 18, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}78$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_plain_set_nested 0.9624ms 27.7038μs 36.0962 KOps/s 57.0677 KOps/s $\textbf{\color{#d91a1a}-36.75\%}$
test_plain_set_stack_nested 4.8919ms 0.1887ms 5.2989 KOps/s 6.7445 KOps/s $\textbf{\color{#d91a1a}-21.43\%}$
test_plain_set_nested_inplace 7.2839ms 35.0713μs 28.5133 KOps/s 49.4704 KOps/s $\textbf{\color{#d91a1a}-42.36\%}$
test_plain_set_stack_nested_inplace 0.4255ms 0.2028ms 4.9298 KOps/s 5.3305 KOps/s $\textbf{\color{#d91a1a}-7.52\%}$
test_items 39.7840μs 2.6627μs 375.5537 KOps/s 391.3444 KOps/s $\color{#d91a1a}-4.03\%$
test_items_nested 0.4524ms 0.2713ms 3.6863 KOps/s 3.7077 KOps/s $\color{#d91a1a}-0.58\%$
test_items_nested_locked 0.8873ms 0.2742ms 3.6464 KOps/s 3.6917 KOps/s $\color{#d91a1a}-1.23\%$
test_items_nested_leaf 0.3207ms 0.1671ms 5.9859 KOps/s 6.0085 KOps/s $\color{#d91a1a}-0.37\%$
test_items_stack_nested 2.1264ms 1.3592ms 735.7403 Ops/s 743.7110 Ops/s $\color{#d91a1a}-1.07\%$
test_items_stack_nested_leaf 1.5982ms 1.2197ms 819.8906 Ops/s 832.9116 Ops/s $\color{#d91a1a}-1.56\%$
test_items_stack_nested_locked 1.5737ms 0.8880ms 1.1262 KOps/s 1.1282 KOps/s $\color{#d91a1a}-0.18\%$
test_keys 45.4350μs 3.9877μs 250.7681 KOps/s 230.5345 KOps/s $\textbf{\color{#35bf28}+8.78\%}$
test_keys_nested 1.6000ms 0.1518ms 6.5883 KOps/s 6.6549 KOps/s $\color{#d91a1a}-1.00\%$
test_keys_nested_locked 0.3124ms 0.1561ms 6.4080 KOps/s 6.5560 KOps/s $\color{#d91a1a}-2.26\%$
test_keys_nested_leaf 0.2964ms 0.1332ms 7.5066 KOps/s 7.6654 KOps/s $\color{#d91a1a}-2.07\%$
test_keys_stack_nested 2.2205ms 1.7202ms 581.3283 Ops/s 778.3391 Ops/s $\textbf{\color{#d91a1a}-25.31\%}$
test_keys_stack_nested_leaf 1.9988ms 1.7096ms 584.9332 Ops/s 781.3187 Ops/s $\textbf{\color{#d91a1a}-25.14\%}$
test_keys_stack_nested_locked 1.5874ms 1.1823ms 845.8211 Ops/s 1.1666 KOps/s $\textbf{\color{#d91a1a}-27.50\%}$
test_values 11.1107μs 1.1553μs 865.5752 KOps/s 848.0163 KOps/s $\color{#35bf28}+2.07\%$
test_values_nested 0.1005ms 52.3577μs 19.0994 KOps/s 19.1742 KOps/s $\color{#d91a1a}-0.39\%$
test_values_nested_locked 0.1160ms 52.4049μs 19.0822 KOps/s 19.1326 KOps/s $\color{#d91a1a}-0.26\%$
test_values_nested_leaf 0.1312ms 46.9922μs 21.2801 KOps/s 21.2326 KOps/s $\color{#35bf28}+0.22\%$
test_values_stack_nested 1.7360ms 1.0465ms 955.5801 Ops/s 962.2342 Ops/s $\color{#d91a1a}-0.69\%$
test_values_stack_nested_leaf 1.2081ms 1.0343ms 966.8352 Ops/s 851.6466 Ops/s $\textbf{\color{#35bf28}+13.53\%}$
test_values_stack_nested_locked 1.0860ms 0.6152ms 1.6255 KOps/s 1.6216 KOps/s $\color{#35bf28}+0.24\%$
test_membership 41.4580μs 1.3472μs 742.2952 KOps/s 740.5575 KOps/s $\color{#35bf28}+0.23\%$
test_membership_nested 37.2100μs 3.5322μs 283.1068 KOps/s 289.6690 KOps/s $\color{#d91a1a}-2.27\%$
test_membership_nested_leaf 50.8450μs 3.5707μs 280.0580 KOps/s 286.1696 KOps/s $\color{#d91a1a}-2.14\%$
test_membership_stacked_nested 37.5500μs 12.0509μs 82.9811 KOps/s 82.7993 KOps/s $\color{#35bf28}+0.22\%$
test_membership_stacked_nested_leaf 48.9720μs 11.9517μs 83.6703 KOps/s 83.4408 KOps/s $\color{#35bf28}+0.28\%$
test_membership_nested_last 49.1220μs 12.1961μs 81.9932 KOps/s 149.3905 KOps/s $\textbf{\color{#d91a1a}-45.11\%}$
test_membership_nested_leaf_last 69.5100μs 12.1688μs 82.1771 KOps/s 148.5243 KOps/s $\textbf{\color{#d91a1a}-44.67\%}$
test_membership_stacked_nested_last 0.4390ms 0.3001ms 3.3319 KOps/s 5.5994 KOps/s $\textbf{\color{#d91a1a}-40.49\%}$
test_membership_stacked_nested_leaf_last 61.8060μs 19.5629μs 51.1172 KOps/s 69.6782 KOps/s $\textbf{\color{#d91a1a}-26.64\%}$
test_nested_getleaf 57.7890μs 15.8113μs 63.2461 KOps/s 85.4221 KOps/s $\textbf{\color{#d91a1a}-25.96\%}$
test_nested_get 56.6760μs 15.2527μs 65.5621 KOps/s 93.1050 KOps/s $\textbf{\color{#d91a1a}-29.58\%}$
test_stacked_getleaf 0.8113ms 0.4260ms 2.3473 KOps/s 2.4712 KOps/s $\textbf{\color{#d91a1a}-5.01\%}$
test_stacked_get 1.5193ms 0.4024ms 2.4848 KOps/s 2.7141 KOps/s $\textbf{\color{#d91a1a}-8.45\%}$
test_nested_getitemleaf 47.8800μs 17.4118μs 57.4324 KOps/s 81.6028 KOps/s $\textbf{\color{#d91a1a}-29.62\%}$
test_nested_getitem 69.6400μs 16.8466μs 59.3593 KOps/s 85.5372 KOps/s $\textbf{\color{#d91a1a}-30.60\%}$
test_stacked_getitemleaf 0.7077ms 0.4120ms 2.4269 KOps/s 2.4606 KOps/s $\color{#d91a1a}-1.37\%$
test_stacked_getitem 0.5207ms 0.3769ms 2.6531 KOps/s 2.6654 KOps/s $\color{#d91a1a}-0.46\%$
test_lock_nested 2.4800ms 0.7109ms 1.4067 KOps/s 3.0126 KOps/s $\textbf{\color{#d91a1a}-53.30\%}$
test_lock_stack_nested 94.3379ms 9.5755ms 104.4329 Ops/s 162.7397 Ops/s $\textbf{\color{#d91a1a}-35.83\%}$
test_unlock_nested 76.9607ms 0.7832ms 1.2769 KOps/s 3.0029 KOps/s $\textbf{\color{#d91a1a}-57.48\%}$
test_unlock_stack_nested 96.5545ms 9.7599ms 102.4605 Ops/s 160.2095 Ops/s $\textbf{\color{#d91a1a}-36.05\%}$
test_flatten_speed 0.8307ms 0.4852ms 2.0609 KOps/s 2.7229 KOps/s $\textbf{\color{#d91a1a}-24.31\%}$
test_unflatten_speed 5.7637ms 0.9402ms 1.0636 KOps/s 2.1539 KOps/s $\textbf{\color{#d91a1a}-50.62\%}$
test_common_ops 7.8030ms 1.6750ms 597.0204 Ops/s 1.4523 KOps/s $\textbf{\color{#d91a1a}-58.89\%}$
test_creation 3.7145ms 7.4896μs 133.5192 KOps/s 538.3542 KOps/s $\textbf{\color{#d91a1a}-75.20\%}$
test_creation_empty 3.0530ms 29.7440μs 33.6202 KOps/s 90.3874 KOps/s $\textbf{\color{#d91a1a}-62.80\%}$
test_creation_nested_1 88.5660μs 37.8311μs 26.4333 KOps/s 73.2492 KOps/s $\textbf{\color{#d91a1a}-63.91\%}$
test_creation_nested_2 87.5440μs 46.9494μs 21.2995 KOps/s 58.5265 KOps/s $\textbf{\color{#d91a1a}-63.61\%}$
test_clone 97.1720μs 25.8067μs 38.7497 KOps/s 79.0003 KOps/s $\textbf{\color{#d91a1a}-50.95\%}$
test_getitem[int] 69.4900μs 28.1394μs 35.5373 KOps/s 85.9274 KOps/s $\textbf{\color{#d91a1a}-58.64\%}$
test_getitem[slice_int] 0.1089ms 44.8612μs 22.2910 KOps/s 32.6422 KOps/s $\textbf{\color{#d91a1a}-31.71\%}$
test_getitem[range] 0.2658ms 70.3316μs 14.2184 KOps/s 18.6310 KOps/s $\textbf{\color{#d91a1a}-23.68\%}$
test_getitem[tuple] 77.7360μs 38.0278μs 26.2965 KOps/s 51.0508 KOps/s $\textbf{\color{#d91a1a}-48.49\%}$
test_getitem[list] 0.1849ms 61.6060μs 16.2322 KOps/s 27.9649 KOps/s $\textbf{\color{#d91a1a}-41.96\%}$
test_setitem_dim[int] 0.1118ms 36.0163μs 27.7652 KOps/s 31.2554 KOps/s $\textbf{\color{#d91a1a}-11.17\%}$
test_setitem_dim[slice_int] 0.1005ms 62.4641μs 16.0092 KOps/s 17.0424 KOps/s $\textbf{\color{#d91a1a}-6.06\%}$
test_setitem_dim[range] 0.1391ms 83.2274μs 12.0153 KOps/s 12.6266 KOps/s $\color{#d91a1a}-4.84\%$
test_setitem_dim[tuple] 82.5140μs 51.4259μs 19.4454 KOps/s 21.0705 KOps/s $\textbf{\color{#d91a1a}-7.71\%}$
test_setitem 0.1110ms 38.4544μs 26.0048 KOps/s 51.2988 KOps/s $\textbf{\color{#d91a1a}-49.31\%}$
test_set 0.1018ms 42.5675μs 23.4921 KOps/s 52.7104 KOps/s $\textbf{\color{#d91a1a}-55.43\%}$
test_set_shared 4.4933ms 0.1749ms 5.7184 KOps/s 7.0290 KOps/s $\textbf{\color{#d91a1a}-18.65\%}$
test_update 4.1515ms 37.8643μs 26.4101 KOps/s 33.4717 KOps/s $\textbf{\color{#d91a1a}-21.10\%}$
test_update_nested 0.1173ms 51.3947μs 19.4573 KOps/s 32.6822 KOps/s $\textbf{\color{#d91a1a}-40.47\%}$
test_set_nested 0.1100ms 39.9198μs 25.0502 KOps/s 47.7715 KOps/s $\textbf{\color{#d91a1a}-47.56\%}$
test_set_nested_new 0.1222ms 56.2474μs 17.7786 KOps/s 40.0586 KOps/s $\textbf{\color{#d91a1a}-55.62\%}$
test_select 0.1975ms 87.0000μs 11.4943 KOps/s 26.9100 KOps/s $\textbf{\color{#d91a1a}-57.29\%}$
test_select_nested 0.2986ms 0.1594ms 6.2732 KOps/s 17.4199 KOps/s $\textbf{\color{#d91a1a}-63.99\%}$
test_exclude_nested 0.3584ms 0.2231ms 4.4824 KOps/s 8.5904 KOps/s $\textbf{\color{#d91a1a}-47.82\%}$
test_empty[True] 0.6211ms 0.5152ms 1.9411 KOps/s 2.4676 KOps/s $\textbf{\color{#d91a1a}-21.34\%}$
test_empty[False] 58.6830μs 5.9061μs 169.3155 KOps/s 966.9026 KOps/s $\textbf{\color{#d91a1a}-82.49\%}$
test_unbind_speed 0.6811ms 0.5834ms 1.7140 KOps/s 4.1296 KOps/s $\textbf{\color{#d91a1a}-58.49\%}$
test_unbind_speed_stack0 86.7416ms 6.6785ms 149.7347 Ops/s 327.7127 Ops/s $\textbf{\color{#d91a1a}-54.31\%}$
test_unbind_speed_stack1 36.7190μs 1.9928μs 501.7968 KOps/s 502.3872 KOps/s $\color{#d91a1a}-0.12\%$
test_split 85.5221ms 2.7732ms 360.5945 Ops/s 599.1784 Ops/s $\textbf{\color{#d91a1a}-39.82\%}$
test_chunk 2.7161ms 2.4786ms 403.4611 Ops/s 645.0124 Ops/s $\textbf{\color{#d91a1a}-37.45\%}$
test_creation[device0] 3.7239ms 0.1077ms 9.2814 KOps/s 9.8038 KOps/s $\textbf{\color{#d91a1a}-5.33\%}$
test_creation_from_tensor 0.2622ms 82.2736μs 12.1546 KOps/s 11.8515 KOps/s $\color{#35bf28}+2.56\%$
test_add_one[memmap_tensor0] 0.1553ms 5.6606μs 176.6589 KOps/s 116.9611 KOps/s $\textbf{\color{#35bf28}+51.04\%}$
test_contiguous[memmap_tensor0] 21.8910μs 0.6355μs 1.5735 MOps/s 1.3583 MOps/s $\textbf{\color{#35bf28}+15.84\%}$
test_stack[memmap_tensor0] 63.7100μs 3.7438μs 267.1051 KOps/s 244.7345 KOps/s $\textbf{\color{#35bf28}+9.14\%}$
test_memmaptd_index 1.0136ms 0.2624ms 3.8110 KOps/s 3.9751 KOps/s $\color{#d91a1a}-4.13\%$
test_memmaptd_index_astensor 0.7541ms 0.3375ms 2.9626 KOps/s 2.9627 KOps/s $-0.00\%$
test_memmaptd_index_op 1.2319ms 0.6792ms 1.4723 KOps/s 1.3749 KOps/s $\textbf{\color{#35bf28}+7.08\%}$
test_serialize_model 0.2308s 0.1567s 6.3835 Ops/s 6.3008 Ops/s $\color{#35bf28}+1.31\%$
test_serialize_model_pickle 0.4705s 0.3722s 2.6864 Ops/s 2.5960 Ops/s $\color{#35bf28}+3.48\%$
test_serialize_weights 0.2232s 0.1547s 6.4648 Ops/s 8.0144 Ops/s $\textbf{\color{#d91a1a}-19.33\%}$
test_serialize_weights_returnearly 0.2005s 0.1587s 6.3008 Ops/s 7.2512 Ops/s $\textbf{\color{#d91a1a}-13.11\%}$
test_serialize_weights_pickle 0.5443s 0.4275s 2.3394 Ops/s 1.1985 Ops/s $\textbf{\color{#35bf28}+95.19\%}$
test_serialize_weights_filesystem 0.2001s 0.1492s 6.7039 Ops/s 9.3415 Ops/s $\textbf{\color{#d91a1a}-28.24\%}$
test_serialize_model_filesystem 0.1603s 0.1402s 7.1351 Ops/s 7.7424 Ops/s $\textbf{\color{#d91a1a}-7.84\%}$
test_reshape_pytree 0.1456ms 20.4039μs 49.0102 KOps/s 45.9310 KOps/s $\textbf{\color{#35bf28}+6.70\%}$
test_reshape_td 0.1212ms 55.4909μs 18.0210 KOps/s 32.8221 KOps/s $\textbf{\color{#d91a1a}-45.10\%}$
test_view_pytree 64.4930μs 20.8608μs 47.9367 KOps/s 46.7769 KOps/s $\color{#35bf28}+2.48\%$
test_view_td 81.6270ms 11.4419μs 87.3984 KOps/s 86.3245 KOps/s $\color{#35bf28}+1.24\%$
test_unbind_pytree 3.2863ms 24.5896μs 40.6676 KOps/s 41.1459 KOps/s $\color{#d91a1a}-1.16\%$
test_unbind_td 5.6980ms 0.1076ms 9.2940 KOps/s 28.5040 KOps/s $\textbf{\color{#d91a1a}-67.39\%}$
test_split_pytree 0.1713ms 23.9277μs 41.7926 KOps/s 42.3314 KOps/s $\color{#d91a1a}-1.27\%$
test_split_td 0.3543ms 79.3469μs 12.6029 KOps/s 25.6395 KOps/s $\textbf{\color{#d91a1a}-50.85\%}$
test_add_pytree 97.3920μs 29.5815μs 33.8050 KOps/s 34.1063 KOps/s $\color{#d91a1a}-0.88\%$
test_add_td 0.2568ms 91.4332μs 10.9369 KOps/s 18.6799 KOps/s $\textbf{\color{#d91a1a}-41.45\%}$
test_distributed 0.2583ms 0.1001ms 9.9931 KOps/s 9.9880 KOps/s $\color{#35bf28}+0.05\%$
test_tdmodule 0.1860ms 37.8707μs 26.4056 KOps/s 43.0295 KOps/s $\textbf{\color{#d91a1a}-38.63\%}$
test_tdmodule_dispatch 0.2354ms 83.1036μs 12.0332 KOps/s 21.6407 KOps/s $\textbf{\color{#d91a1a}-44.40\%}$
test_tdseq 0.1258ms 40.5654μs 24.6516 KOps/s 38.8925 KOps/s $\textbf{\color{#d91a1a}-36.62\%}$
test_tdseq_dispatch 0.4679ms 88.4263μs 11.3088 KOps/s 20.6500 KOps/s $\textbf{\color{#d91a1a}-45.24\%}$
test_instantiation_functorch 2.1114ms 1.3206ms 757.2514 Ops/s 765.5532 Ops/s $\color{#d91a1a}-1.08\%$
test_instantiation_td 1.5931ms 1.0745ms 930.6414 Ops/s 991.8016 Ops/s $\textbf{\color{#d91a1a}-6.17\%}$
test_exec_functorch 0.2405ms 0.1631ms 6.1323 KOps/s 6.4271 KOps/s $\color{#d91a1a}-4.59\%$
test_exec_functional_call 0.3406ms 0.1554ms 6.4364 KOps/s 6.8642 KOps/s $\textbf{\color{#d91a1a}-6.23\%}$
test_exec_td 0.3133ms 0.1511ms 6.6165 KOps/s 7.0415 KOps/s $\textbf{\color{#d91a1a}-6.04\%}$
test_exec_td_decorator 1.1510ms 0.3290ms 3.0397 KOps/s 5.7056 KOps/s $\textbf{\color{#d91a1a}-46.72\%}$
test_vmap_mlp_speed[True-True] 1.4440ms 0.9690ms 1.0320 KOps/s 1.1258 KOps/s $\textbf{\color{#d91a1a}-8.33\%}$
test_vmap_mlp_speed[True-False] 0.8025ms 0.5364ms 1.8643 KOps/s 2.1271 KOps/s $\textbf{\color{#d91a1a}-12.36\%}$
test_vmap_mlp_speed[False-True] 0.9616ms 0.7788ms 1.2840 KOps/s 1.2894 KOps/s $\color{#d91a1a}-0.42\%$
test_vmap_mlp_speed[False-False] 2.4685ms 0.4057ms 2.4646 KOps/s 2.5928 KOps/s $\color{#d91a1a}-4.95\%$
test_vmap_mlp_speed_decorator[True-True] 2.6295ms 1.9648ms 508.9462 Ops/s 640.4225 Ops/s $\textbf{\color{#d91a1a}-20.53\%}$
test_vmap_mlp_speed_decorator[True-False] 0.8805ms 0.6907ms 1.4478 KOps/s 1.9432 KOps/s $\textbf{\color{#d91a1a}-25.50\%}$
test_vmap_mlp_speed_decorator[False-True] 2.1900ms 1.5798ms 632.9859 Ops/s 770.5046 Ops/s $\textbf{\color{#d91a1a}-17.85\%}$
test_vmap_mlp_speed_decorator[False-False] 0.6166ms 0.5061ms 1.9759 KOps/s 2.5368 KOps/s $\textbf{\color{#d91a1a}-22.11\%}$
test_to_module_speed[True] 4.2223ms 2.8899ms 346.0289 Ops/s 738.3985 Ops/s $\textbf{\color{#d91a1a}-53.14\%}$
test_to_module_speed[False] 2.9729ms 2.8410ms 351.9886 Ops/s 902.2590 Ops/s $\textbf{\color{#d91a1a}-60.99\%}$
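
To make the columns easier to read: the figures above are consistent with Ops being the reciprocal of Mean, and Change being the relative difference between this PR's Ops and the Ops on Repo HEAD. This is inferred from the numbers, not from the benchmark bot's source. A small sketch checking the test_plain_set_nested row:

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean time per call."""
    return 1.0 / mean_seconds


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative change of the PR throughput versus the HEAD throughput."""
    return (ops_pr - ops_head) / ops_head * 100.0


# Values copied from the test_plain_set_nested row of the CPU table.
mean_s = 27.7038e-6      # Mean: 27.7038 microseconds
ops_pr = 36.0962e3       # Ops: 36.0962 KOps/s
ops_head = 57.0677e3     # Ops on Repo HEAD: 57.0677 KOps/s

implied_ops = ops_per_second(mean_s)        # roughly 36.1 KOps/s
change = percent_change(ops_pr, ops_head)   # roughly -36.75 %

print(f"{implied_ops / 1e3:.4f} KOps/s, {change:+.2f}%")
```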

github-actions bot commented Jan 18, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 134. Improved: $\large\color{#35bf28}2$. Worsened: $\large\color{#d91a1a}64$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_plain_set_nested 0.1263ms 14.7147μs 67.9592 KOps/s 78.7848 KOps/s $\textbf{\color{#d91a1a}-13.74\%}$
test_plain_set_stack_nested 0.1435ms 0.1243ms 8.0473 KOps/s 8.4014 KOps/s $\color{#d91a1a}-4.21\%$
test_plain_set_nested_inplace 41.4320μs 16.1901μs 61.7661 KOps/s 71.9761 KOps/s $\textbf{\color{#d91a1a}-14.19\%}$
test_plain_set_stack_nested_inplace 0.2005ms 0.1525ms 6.5563 KOps/s 6.7538 KOps/s $\color{#d91a1a}-2.92\%$
test_items 20.3310μs 4.7343μs 211.2237 KOps/s 212.1939 KOps/s $\color{#d91a1a}-0.46\%$
test_items_nested 0.3919ms 0.3366ms 2.9706 KOps/s 2.9250 KOps/s $\color{#35bf28}+1.56\%$
test_items_nested_locked 0.4186ms 0.3389ms 2.9511 KOps/s 2.8995 KOps/s $\color{#35bf28}+1.78\%$
test_items_nested_leaf 0.2414ms 0.1996ms 5.0104 KOps/s 4.9575 KOps/s $\color{#35bf28}+1.07\%$
test_items_stack_nested 1.3775ms 1.3274ms 753.3418 Ops/s 751.5390 Ops/s $\color{#35bf28}+0.24\%$
test_items_stack_nested_leaf 1.2056ms 1.1658ms 857.8055 Ops/s 853.3090 Ops/s $\color{#35bf28}+0.53\%$
test_items_stack_nested_locked 0.9581ms 0.9132ms 1.0951 KOps/s 1.0874 KOps/s $\color{#35bf28}+0.71\%$
test_keys 24.6320μs 4.5489μs 219.8310 KOps/s 219.2486 KOps/s $\color{#35bf28}+0.27\%$
test_keys_nested 1.9141ms 94.0694μs 10.6304 KOps/s 10.6077 KOps/s $\color{#35bf28}+0.21\%$
test_keys_nested_locked 0.1317ms 97.6320μs 10.2425 KOps/s 10.2018 KOps/s $\color{#35bf28}+0.40\%$
test_keys_nested_leaf 0.2030ms 78.0215μs 12.8170 KOps/s 12.8437 KOps/s $\color{#d91a1a}-0.21\%$
test_keys_stack_nested 1.3328ms 1.2802ms 781.1039 Ops/s 862.2872 Ops/s $\textbf{\color{#d91a1a}-9.41\%}$
test_keys_stack_nested_leaf 1.3673ms 1.2667ms 789.4677 Ops/s 873.0781 Ops/s $\textbf{\color{#d91a1a}-9.58\%}$
test_keys_stack_nested_locked 1.0189ms 0.8290ms 1.2063 KOps/s 1.3666 KOps/s $\textbf{\color{#d91a1a}-11.73\%}$
test_values 8.7237μs 1.8979μs 526.8897 KOps/s 523.7966 KOps/s $\color{#35bf28}+0.59\%$
test_values_nested 85.0640μs 44.7182μs 22.3623 KOps/s 21.9914 KOps/s $\color{#35bf28}+1.69\%$
test_values_nested_locked 69.1930μs 46.9928μs 21.2798 KOps/s 20.9743 KOps/s $\color{#35bf28}+1.46\%$
test_values_nested_leaf 56.6630μs 39.3643μs 25.4037 KOps/s 25.0336 KOps/s $\color{#35bf28}+1.48\%$
test_values_stack_nested 1.0172ms 0.9737ms 1.0270 KOps/s 1.0247 KOps/s $\color{#35bf28}+0.23\%$
test_values_stack_nested_leaf 1.0314ms 0.9708ms 1.0301 KOps/s 1.0332 KOps/s $\color{#d91a1a}-0.30\%$
test_values_stack_nested_locked 0.6305ms 0.5835ms 1.7138 KOps/s 1.7019 KOps/s $\color{#35bf28}+0.69\%$
test_membership 8.2804μs 0.9322μs 1.0727 MOps/s 1.0526 MOps/s $\color{#35bf28}+1.91\%$
test_membership_nested 35.0820μs 2.8686μs 348.6037 KOps/s 348.2273 KOps/s $\color{#35bf28}+0.11\%$
test_membership_nested_leaf 18.8010μs 2.8488μs 351.0243 KOps/s 348.5938 KOps/s $\color{#35bf28}+0.70\%$
test_membership_stacked_nested 60.2830μs 11.1415μs 89.7547 KOps/s 87.5673 KOps/s $\color{#35bf28}+2.50\%$
test_membership_stacked_nested_leaf 44.6820μs 11.2261μs 89.0782 KOps/s 88.8085 KOps/s $\color{#35bf28}+0.30\%$
test_membership_nested_last 35.0420μs 6.6873μs 149.5363 KOps/s 188.0393 KOps/s $\textbf{\color{#d91a1a}-20.48\%}$
test_membership_nested_leaf_last 26.6510μs 6.6491μs 150.3972 KOps/s 187.6984 KOps/s $\textbf{\color{#d91a1a}-19.87\%}$
test_membership_stacked_nested_last 0.8528ms 0.1891ms 5.2884 KOps/s 6.3520 KOps/s $\textbf{\color{#d91a1a}-16.74\%}$
test_membership_stacked_nested_leaf_last 40.2920μs 14.4104μs 69.3941 KOps/s 76.2392 KOps/s $\textbf{\color{#d91a1a}-8.98\%}$
test_nested_getleaf 45.2220μs 9.8231μs 101.8013 KOps/s 118.4363 KOps/s $\textbf{\color{#d91a1a}-14.05\%}$
test_nested_get 31.5110μs 9.4046μs 106.3308 KOps/s 125.7140 KOps/s $\textbf{\color{#d91a1a}-15.42\%}$
test_stacked_getleaf 0.5336ms 0.3364ms 2.9723 KOps/s 3.0332 KOps/s $\color{#d91a1a}-2.01\%$
test_stacked_get 0.3396ms 0.2992ms 3.3424 KOps/s 3.3760 KOps/s $\color{#d91a1a}-1.00\%$
test_nested_getitemleaf 26.8620μs 11.2773μs 88.6740 KOps/s 101.9730 KOps/s $\textbf{\color{#d91a1a}-13.04\%}$
test_nested_getitem 26.2010μs 10.8069μs 92.5334 KOps/s 107.0889 KOps/s $\textbf{\color{#d91a1a}-13.59\%}$
test_stacked_getitemleaf 0.3735ms 0.3337ms 2.9968 KOps/s 2.9996 KOps/s $\color{#d91a1a}-0.10\%$
test_stacked_getitem 0.3619ms 0.2984ms 3.3508 KOps/s 3.3642 KOps/s $\color{#d91a1a}-0.40\%$
test_lock_nested 2.5156ms 0.4659ms 2.1466 KOps/s 2.2571 KOps/s $\color{#d91a1a}-4.90\%$
test_lock_stack_nested 0.1255s 8.2715ms 120.8977 Ops/s 138.9049 Ops/s $\textbf{\color{#d91a1a}-12.96\%}$
test_unlock_nested 0.8605ms 0.4658ms 2.1470 KOps/s 2.8141 KOps/s $\textbf{\color{#d91a1a}-23.70\%}$
test_unlock_stack_nested 0.1230s 8.3088ms 120.3548 Ops/s 138.6609 Ops/s $\textbf{\color{#d91a1a}-13.20\%}$
test_flatten_speed 0.3761ms 0.2924ms 3.4205 KOps/s 3.8942 KOps/s $\textbf{\color{#d91a1a}-12.17\%}$
test_unflatten_speed 0.4918ms 0.4601ms 2.1736 KOps/s 2.7735 KOps/s $\textbf{\color{#d91a1a}-21.63\%}$
test_common_ops 1.2059ms 0.7671ms 1.3036 KOps/s 1.7324 KOps/s $\textbf{\color{#d91a1a}-24.75\%}$
test_creation 15.5010μs 2.9183μs 342.6632 KOps/s 636.9403 KOps/s $\textbf{\color{#d91a1a}-46.20\%}$
test_creation_empty 31.5320μs 11.8583μs 84.3293 KOps/s 153.3222 KOps/s $\textbf{\color{#d91a1a}-45.00\%}$
test_creation_nested_1 87.6850μs 15.4947μs 64.5382 KOps/s 121.2998 KOps/s $\textbf{\color{#d91a1a}-46.79\%}$
test_creation_nested_2 36.0920μs 19.2767μs 51.8762 KOps/s 93.8628 KOps/s $\textbf{\color{#d91a1a}-44.73\%}$
test_clone 51.6420μs 18.3249μs 54.5706 KOps/s 70.3135 KOps/s $\textbf{\color{#d91a1a}-22.39\%}$
test_getitem[int] 32.3020μs 15.8893μs 62.9356 KOps/s 93.9130 KOps/s $\textbf{\color{#d91a1a}-32.99\%}$
test_getitem[slice_int] 62.3430μs 29.2411μs 34.1984 KOps/s 46.6872 KOps/s $\textbf{\color{#d91a1a}-26.75\%}$
test_getitem[range] 0.1097ms 50.4532μs 19.8204 KOps/s 24.3878 KOps/s $\textbf{\color{#d91a1a}-18.73\%}$
test_getitem[tuple] 50.2120μs 25.8424μs 38.6961 KOps/s 53.7044 KOps/s $\textbf{\color{#d91a1a}-27.95\%}$
test_getitem[list] 0.1283ms 47.6073μs 21.0052 KOps/s 27.1916 KOps/s $\textbf{\color{#d91a1a}-22.75\%}$
test_setitem_dim[int] 43.9620μs 27.5148μs 36.3440 KOps/s 39.8890 KOps/s $\textbf{\color{#d91a1a}-8.89\%}$
test_setitem_dim[slice_int] 71.2140μs 48.5076μs 20.6153 KOps/s 21.7259 KOps/s $\textbf{\color{#d91a1a}-5.11\%}$
test_setitem_dim[range] 87.7250μs 68.0283μs 14.6998 KOps/s 15.1242 KOps/s $\color{#d91a1a}-2.81\%$
test_setitem_dim[tuple] 58.5840μs 41.8933μs 23.8702 KOps/s 25.4816 KOps/s $\textbf{\color{#d91a1a}-6.32\%}$
test_setitem 55.6630μs 22.8832μs 43.7002 KOps/s 55.2844 KOps/s $\textbf{\color{#d91a1a}-20.95\%}$
test_set 59.2330μs 24.0336μs 41.6084 KOps/s 57.3862 KOps/s $\textbf{\color{#d91a1a}-27.49\%}$
test_set_shared 2.7017ms 0.1114ms 8.9754 KOps/s 9.6544 KOps/s $\textbf{\color{#d91a1a}-7.03\%}$
test_update 65.2230μs 22.9028μs 43.6629 KOps/s 52.8258 KOps/s $\textbf{\color{#d91a1a}-17.35\%}$
test_update_nested 75.0540μs 32.2267μs 31.0301 KOps/s 38.7974 KOps/s $\textbf{\color{#d91a1a}-20.02\%}$
test_set_nested 63.8130μs 24.1263μs 41.4485 KOps/s 53.1756 KOps/s $\textbf{\color{#d91a1a}-22.05\%}$
test_set_nested_new 69.8540μs 31.3771μs 31.8703 KOps/s 46.8744 KOps/s $\textbf{\color{#d91a1a}-32.01\%}$
test_select 0.1408ms 51.3143μs 19.4877 KOps/s 29.5633 KOps/s $\textbf{\color{#d91a1a}-34.08\%}$
test_select_nested 0.1102ms 84.8400μs 11.7869 KOps/s 19.0239 KOps/s $\textbf{\color{#d91a1a}-38.04\%}$
test_exclude_nested 1.0801ms 0.1488ms 6.7221 KOps/s 8.9772 KOps/s $\textbf{\color{#d91a1a}-25.12\%}$
test_empty[True] 0.4858ms 0.4200ms 2.3808 KOps/s 2.6244 KOps/s $\textbf{\color{#d91a1a}-9.28\%}$
test_empty[False] 10.7755μs 2.5061μs 399.0204 KOps/s 1.1588 MOps/s $\textbf{\color{#d91a1a}-65.57\%}$
test_to 84.0040μs 61.6872μs 16.2108 KOps/s 17.5085 KOps/s $\textbf{\color{#d91a1a}-7.41\%}$
test_to_nonblocking 76.5240μs 43.0093μs 23.2508 KOps/s 27.9278 KOps/s $\textbf{\color{#d91a1a}-16.75\%}$
test_unbind_speed 0.4658ms 0.3765ms 2.6561 KOps/s 3.6875 KOps/s $\textbf{\color{#d91a1a}-27.97\%}$
test_unbind_speed_stack0 0.1160s 5.4816ms 182.4271 Ops/s 280.7590 Ops/s $\textbf{\color{#d91a1a}-35.02\%}$
test_unbind_speed_stack1 16.5310μs 1.8274μs 547.2356 KOps/s 551.6424 KOps/s $\color{#d91a1a}-0.80\%$
test_split 2.5196ms 1.8994ms 526.4772 Ops/s 552.6727 Ops/s $\color{#d91a1a}-4.74\%$
test_chunk 0.1097s 2.1139ms 473.0547 Ops/s 581.4345 Ops/s $\textbf{\color{#d91a1a}-18.64\%}$
test_creation[device0] 0.1467ms 73.8029μs 13.5496 KOps/s 13.5857 KOps/s $\color{#d91a1a}-0.27\%$
test_creation_from_tensor 0.1984ms 55.0737μs 18.1575 KOps/s 18.4950 KOps/s $\color{#d91a1a}-1.82\%$
test_add_one[memmap_tensor0] 0.2837ms 7.5673μs 132.1474 KOps/s 133.4053 KOps/s $\color{#d91a1a}-0.94\%$
test_contiguous[memmap_tensor0] 26.4410μs 0.6591μs 1.5172 MOps/s 1.5100 MOps/s $\color{#35bf28}+0.48\%$
test_stack[memmap_tensor0] 46.8120μs 4.7862μs 208.9327 KOps/s 206.4735 KOps/s $\color{#35bf28}+1.19\%$
test_memmaptd_index 0.9982ms 0.2737ms 3.6542 KOps/s 3.7436 KOps/s $\color{#d91a1a}-2.39\%$
test_memmaptd_index_astensor 0.6161ms 0.3321ms 3.0109 KOps/s 3.1012 KOps/s $\color{#d91a1a}-2.91\%$
test_memmaptd_index_op 0.9580ms 0.6416ms 1.5586 KOps/s 1.6356 KOps/s $\color{#d91a1a}-4.71\%$
test_serialize_model 0.2003s 0.1042s 9.5940 Ops/s 9.1770 Ops/s $\color{#35bf28}+4.54\%$
test_serialize_model_pickle 1.3520s 1.2364s 0.8088 Ops/s 0.8077 Ops/s $\color{#35bf28}+0.14\%$
test_serialize_weights 0.1966s 99.6231ms 10.0378 Ops/s 10.7331 Ops/s $\textbf{\color{#d91a1a}-6.48\%}$
test_serialize_weights_returnearly 0.3060s 74.6917ms 13.3884 Ops/s 13.2852 Ops/s $\color{#35bf28}+0.78\%$
test_serialize_weights_pickle 1.4157s 1.2454s 0.8030 Ops/s 0.8029 Ops/s $+0.01\%$
test_reshape_pytree 0.1643ms 25.8797μs 38.6404 KOps/s 39.6402 KOps/s $\color{#d91a1a}-2.52\%$
test_reshape_td 0.1861ms 37.7917μs 26.4609 KOps/s 33.0647 KOps/s $\textbf{\color{#d91a1a}-19.97\%}$
test_view_pytree 0.1172ms 24.6338μs 40.5946 KOps/s 40.6030 KOps/s $\color{#d91a1a}-0.02\%$
test_view_td 0.5117ms 6.8395μs 146.2101 KOps/s 80.2544 KOps/s $\textbf{\color{#35bf28}+82.18\%}$
test_unbind_pytree 77.8140μs 30.7977μs 32.4700 KOps/s 32.8183 KOps/s $\color{#d91a1a}-1.06\%$
test_unbind_td 0.3567ms 57.2645μs 17.4628 KOps/s 24.6070 KOps/s $\textbf{\color{#d91a1a}-29.03\%}$
test_split_pytree 59.2030μs 28.8265μs 34.6904 KOps/s 34.9333 KOps/s $\color{#d91a1a}-0.70\%$
test_split_td 0.1439ms 53.1638μs 18.8098 KOps/s 25.3942 KOps/s $\textbf{\color{#d91a1a}-25.93\%}$
test_add_pytree 65.4040μs 38.1026μs 26.2450 KOps/s 26.0797 KOps/s $\color{#35bf28}+0.63\%$
test_add_td 0.1455ms 59.3009μs 16.8632 KOps/s 20.1829 KOps/s $\textbf{\color{#d91a1a}-16.45\%}$
test_distributed 2.7618ms 72.4127μs 13.8097 KOps/s 11.2975 KOps/s $\textbf{\color{#35bf28}+22.24\%}$
test_tdmodule 0.1214ms 21.2544μs 47.0490 KOps/s 60.2772 KOps/s $\textbf{\color{#d91a1a}-21.95\%}$
test_tdmodule_dispatch 0.2410ms 47.6678μs 20.9785 KOps/s 29.9459 KOps/s $\textbf{\color{#d91a1a}-29.95\%}$
test_tdseq 39.9120μs 24.0091μs 41.6509 KOps/s 52.2607 KOps/s $\textbf{\color{#d91a1a}-20.30\%}$
test_tdseq_dispatch 67.4330μs 48.9563μs 20.4264 KOps/s 27.7832 KOps/s $\textbf{\color{#d91a1a}-26.48\%}$
test_instantiation_functorch 1.7846ms 1.6650ms 600.6180 Ops/s 605.4883 Ops/s $\color{#d91a1a}-0.80\%$
test_instantiation_td 1.7283ms 1.1795ms 847.7846 Ops/s 873.5983 Ops/s $\color{#d91a1a}-2.95\%$
test_exec_functorch 0.2351ms 0.1671ms 5.9827 KOps/s 6.2237 KOps/s $\color{#d91a1a}-3.87\%$
test_exec_functional_call 0.2497ms 0.1632ms 6.1288 KOps/s 6.2451 KOps/s $\color{#d91a1a}-1.86\%$
test_exec_td 0.2289ms 0.1586ms 6.3063 KOps/s 6.4035 KOps/s $\color{#d91a1a}-1.52\%$
test_exec_td_decorator 0.3500ms 0.2370ms 4.2198 KOps/s 5.4939 KOps/s $\textbf{\color{#d91a1a}-23.19\%}$
test_vmap_mlp_speed[True-True] 1.1815ms 1.0638ms 940.0241 Ops/s 959.8104 Ops/s $\color{#d91a1a}-2.06\%$
test_vmap_mlp_speed[True-False] 0.7249ms 0.6207ms 1.6111 KOps/s 1.6624 KOps/s $\color{#d91a1a}-3.09\%$
test_vmap_mlp_speed[False-True] 1.7832ms 0.9620ms 1.0395 KOps/s 1.0284 KOps/s $\color{#35bf28}+1.08\%$
test_vmap_mlp_speed[False-False] 0.7138ms 0.5426ms 1.8431 KOps/s 1.7870 KOps/s $\color{#35bf28}+3.14\%$
test_vmap_mlp_speed_decorator[True-True] 2.0934ms 1.9374ms 516.1489 Ops/s 551.7953 Ops/s $\textbf{\color{#d91a1a}-6.46\%}$
test_vmap_mlp_speed_decorator[True-False] 0.8003ms 0.6959ms 1.4370 KOps/s 1.5810 KOps/s $\textbf{\color{#d91a1a}-9.11\%}$
test_vmap_mlp_speed_decorator[False-True] 2.1018ms 1.6772ms 596.2402 Ops/s 601.6582 Ops/s $\color{#d91a1a}-0.90\%$
test_vmap_mlp_speed_decorator[False-False] 0.7127ms 0.5783ms 1.7292 KOps/s 1.7467 KOps/s $\color{#d91a1a}-1.00\%$
test_vmap_transformer_speed[True-True] 12.7279ms 12.3499ms 80.9726 Ops/s 78.9750 Ops/s $\color{#35bf28}+2.53\%$
test_vmap_transformer_speed[True-False] 8.4336ms 8.1660ms 122.4595 Ops/s 120.5806 Ops/s $\color{#35bf28}+1.56\%$
test_vmap_transformer_speed[False-True] 12.6124ms 12.2129ms 81.8804 Ops/s 81.0988 Ops/s $\color{#35bf28}+0.96\%$
test_vmap_transformer_speed[False-False] 8.4355ms 8.1005ms 123.4498 Ops/s 122.8750 Ops/s $\color{#35bf28}+0.47\%$
test_vmap_transformer_speed_decorator[True-True] 0.1958s 71.7096ms 13.9451 Ops/s 16.0113 Ops/s $\textbf{\color{#d91a1a}-12.90\%}$
test_vmap_transformer_speed_decorator[True-False] 20.2362ms 19.8779ms 50.3070 Ops/s 51.9889 Ops/s $\color{#d91a1a}-3.24\%$
test_vmap_transformer_speed_decorator[False-True] 57.2127ms 56.7394ms 17.6244 Ops/s 18.4176 Ops/s $\color{#d91a1a}-4.31\%$
test_vmap_transformer_speed_decorator[False-False] 20.1594ms 19.6446ms 50.9046 Ops/s 52.9211 Ops/s $\color{#d91a1a}-3.81\%$
test_to_module_speed[True] 1.7050ms 1.5850ms 630.9057 Ops/s 996.8444 Ops/s $\textbf{\color{#d91a1a}-36.71\%}$
test_to_module_speed[False] 2.9613ms 1.5574ms 642.0801 Ops/s 1.0264 KOps/s $\textbf{\color{#d91a1a}-37.44\%}$

ezyang commented Jan 23, 2024

I'd suggest filing issues with repros for these problems; they're probably just PT2 bugs.

# Conflicts:
#	tensordict/_td.py
#	tensordict/base.py
#	tensordict/nn/common.py
#	tensordict/nn/utils.py
#	test/test_nn.py