[Feature] Multithreaded apply #659

Open. Wants to merge 3 commits into base: main


vmoens
Contributor

@vmoens vmoens commented Feb 5, 2024

  • init
  • init
  • empty

Description

Describe your changes in detail.

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 5, 2024

github-actions bot commented Feb 5, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 124. Improved: $\large\color{#35bf28}2$. Worsened: $\large\color{#d91a1a}33$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 47.0980μs 18.1910μs 54.9724 KOps/s 64.2065 KOps/s $\textbf{\color{#d91a1a}-14.38\%}$
test_plain_set_stack_nested 0.2508ms 0.1504ms 6.6482 KOps/s 7.0767 KOps/s $\textbf{\color{#d91a1a}-6.05\%}$
test_plain_set_nested_inplace 51.9470μs 20.4591μs 48.8780 KOps/s 56.6287 KOps/s $\textbf{\color{#d91a1a}-13.69\%}$
test_plain_set_stack_nested_inplace 0.3327ms 0.1840ms 5.4337 KOps/s 5.7539 KOps/s $\textbf{\color{#d91a1a}-5.57\%}$
test_items 20.3280μs 2.4475μs 408.5733 KOps/s 416.8002 KOps/s $\color{#d91a1a}-1.97\%$
test_items_nested 0.3706ms 0.2789ms 3.5851 KOps/s 3.7314 KOps/s $\color{#d91a1a}-3.92\%$
test_items_nested_locked 0.7844ms 0.2794ms 3.5785 KOps/s 3.7325 KOps/s $\color{#d91a1a}-4.13\%$
test_items_nested_leaf 0.4271ms 0.1686ms 5.9329 KOps/s 6.0228 KOps/s $\color{#d91a1a}-1.49\%$
test_items_stack_nested 1.5924ms 1.3388ms 746.9222 Ops/s 762.5609 Ops/s $\color{#d91a1a}-2.05\%$
test_items_stack_nested_leaf 1.8481ms 1.2077ms 828.0508 Ops/s 846.4837 Ops/s $\color{#d91a1a}-2.18\%$
test_items_stack_nested_locked 2.1952ms 0.9136ms 1.0946 KOps/s 1.1505 KOps/s $\color{#d91a1a}-4.86\%$
test_keys 23.8350μs 3.8773μs 257.9108 KOps/s 259.6568 KOps/s $\color{#d91a1a}-0.67\%$
test_keys_nested 1.8206ms 0.1538ms 6.5028 KOps/s 6.7937 KOps/s $\color{#d91a1a}-4.28\%$
test_keys_nested_locked 0.2758ms 0.1518ms 6.5886 KOps/s 6.6301 KOps/s $\color{#d91a1a}-0.63\%$
test_keys_nested_leaf 0.2574ms 0.1318ms 7.5884 KOps/s 7.6772 KOps/s $\color{#d91a1a}-1.16\%$
test_keys_stack_nested 1.9520ms 1.2870ms 777.0089 Ops/s 795.9758 Ops/s $\color{#d91a1a}-2.38\%$
test_keys_stack_nested_leaf 1.6659ms 1.3120ms 762.1947 Ops/s 795.2712 Ops/s $\color{#d91a1a}-4.16\%$
test_keys_stack_nested_locked 1.2586ms 0.8307ms 1.2037 KOps/s 1.2454 KOps/s $\color{#d91a1a}-3.35\%$
test_values 5.7132μs 1.1640μs 859.0929 KOps/s 840.6376 KOps/s $\color{#35bf28}+2.20\%$
test_values_nested 0.1187ms 52.4750μs 19.0567 KOps/s 19.3610 KOps/s $\color{#d91a1a}-1.57\%$
test_values_nested_locked 93.7750μs 52.8070μs 18.9369 KOps/s 19.2318 KOps/s $\color{#d91a1a}-1.53\%$
test_values_nested_leaf 2.4768ms 46.5327μs 21.4903 KOps/s 21.8781 KOps/s $\color{#d91a1a}-1.77\%$
test_values_stack_nested 1.7277ms 1.0400ms 961.5527 Ops/s 986.7335 Ops/s $\color{#d91a1a}-2.55\%$
test_values_stack_nested_leaf 1.2781ms 1.0235ms 977.0280 Ops/s 993.4582 Ops/s $\color{#d91a1a}-1.65\%$
test_values_stack_nested_locked 1.1245ms 0.6233ms 1.6043 KOps/s 1.6970 KOps/s $\textbf{\color{#d91a1a}-5.46\%}$
test_membership 17.0820μs 1.3478μs 741.9228 KOps/s 744.3943 KOps/s $\color{#d91a1a}-0.33\%$
test_membership_nested 20.9990μs 3.4312μs 291.4416 KOps/s 294.8889 KOps/s $\color{#d91a1a}-1.17\%$
test_membership_nested_leaf 27.3710μs 3.5044μs 285.3552 KOps/s 294.4852 KOps/s $\color{#d91a1a}-3.10\%$
test_membership_stacked_nested 37.1790μs 11.7247μs 85.2903 KOps/s 83.9845 KOps/s $\color{#35bf28}+1.55\%$
test_membership_stacked_nested_leaf 28.5430μs 11.7816μs 84.8782 KOps/s 83.9183 KOps/s $\color{#35bf28}+1.14\%$
test_membership_nested_last 30.1260μs 6.5431μs 152.8326 KOps/s 153.5801 KOps/s $\color{#d91a1a}-0.49\%$
test_membership_nested_leaf_last 29.8760μs 6.5973μs 151.5763 KOps/s 148.4216 KOps/s $\color{#35bf28}+2.13\%$
test_membership_stacked_nested_last 0.2831ms 0.1760ms 5.6827 KOps/s 5.5436 KOps/s $\color{#35bf28}+2.51\%$
test_membership_stacked_nested_leaf_last 47.6390μs 13.7560μs 72.6956 KOps/s 72.2827 KOps/s $\color{#35bf28}+0.57\%$
test_nested_getleaf 36.5390μs 10.5606μs 94.6916 KOps/s 98.3436 KOps/s $\color{#d91a1a}-3.71\%$
test_nested_get 40.8260μs 10.1190μs 98.8244 KOps/s 101.2175 KOps/s $\color{#d91a1a}-2.36\%$
test_stacked_getleaf 0.6385ms 0.3957ms 2.5269 KOps/s 2.5652 KOps/s $\color{#d91a1a}-1.49\%$
test_stacked_get 0.5461ms 0.3628ms 2.7566 KOps/s 2.7714 KOps/s $\color{#d91a1a}-0.54\%$
test_nested_getitemleaf 37.8510μs 12.1011μs 82.6374 KOps/s 82.7753 KOps/s $\color{#d91a1a}-0.17\%$
test_nested_getitem 38.0910μs 11.5176μs 86.8233 KOps/s 88.1714 KOps/s $\color{#d91a1a}-1.53\%$
test_stacked_getitemleaf 0.6307ms 0.3994ms 2.5037 KOps/s 2.5173 KOps/s $\color{#d91a1a}-0.54\%$
test_stacked_getitem 0.6414ms 0.3663ms 2.7302 KOps/s 2.7345 KOps/s $\color{#d91a1a}-0.16\%$
test_lock_nested 0.7306ms 0.3417ms 2.9261 KOps/s 3.0161 KOps/s $\color{#d91a1a}-2.98\%$
test_lock_stack_nested 82.6207ms 6.0428ms 165.4869 Ops/s 166.3912 Ops/s $\color{#d91a1a}-0.54\%$
test_unlock_nested 73.1178ms 0.4133ms 2.4193 KOps/s 3.0181 KOps/s $\textbf{\color{#d91a1a}-19.84\%}$
test_unlock_stack_nested 88.5247ms 6.0644ms 164.8958 Ops/s 165.1874 Ops/s $\color{#d91a1a}-0.18\%$
test_flatten_speed 0.6205ms 0.3723ms 2.6860 KOps/s 2.7271 KOps/s $\color{#d91a1a}-1.51\%$
test_unflatten_speed 0.6083ms 0.4714ms 2.1214 KOps/s 2.2023 KOps/s $\color{#d91a1a}-3.67\%$
test_common_ops 5.8052ms 0.7319ms 1.3664 KOps/s 1.5392 KOps/s $\textbf{\color{#d91a1a}-11.23\%}$
test_creation 23.3240μs 1.8380μs 544.0699 KOps/s 531.7344 KOps/s $\color{#35bf28}+2.32\%$
test_creation_empty 29.5050μs 11.5751μs 86.3920 KOps/s 130.4109 KOps/s $\textbf{\color{#d91a1a}-33.75\%}$
test_creation_nested_1 85.0190μs 14.3219μs 69.8232 KOps/s 98.2437 KOps/s $\textbf{\color{#d91a1a}-28.93\%}$
test_creation_nested_2 86.2210μs 17.4690μs 57.2442 KOps/s 74.8800 KOps/s $\textbf{\color{#d91a1a}-23.55\%}$
test_clone 55.3530μs 13.0262μs 76.7686 KOps/s 75.5560 KOps/s $\color{#35bf28}+1.60\%$
test_getitem[int] 52.4470μs 11.1324μs 89.8282 KOps/s 92.3192 KOps/s $\color{#d91a1a}-2.70\%$
test_getitem[slice_int] 64.3400μs 21.9383μs 45.5824 KOps/s 46.8406 KOps/s $\color{#d91a1a}-2.69\%$
test_getitem[range] 92.0810μs 42.4539μs 23.5550 KOps/s 24.3613 KOps/s $\color{#d91a1a}-3.31\%$
test_getitem[tuple] 63.5590μs 18.0828μs 55.3011 KOps/s 55.0468 KOps/s $\color{#35bf28}+0.46\%$
test_getitem[list] 0.2185ms 37.9397μs 26.3576 KOps/s 27.7806 KOps/s $\textbf{\color{#d91a1a}-5.12\%}$
test_setitem_dim[int] 48.8210μs 31.8263μs 31.4205 KOps/s 36.9746 KOps/s $\textbf{\color{#d91a1a}-15.02\%}$
test_setitem_dim[slice_int] 97.7620μs 57.7296μs 17.3221 KOps/s 19.1647 KOps/s $\textbf{\color{#d91a1a}-9.61\%}$
test_setitem_dim[range] 0.1341ms 79.8031μs 12.5308 KOps/s 13.8635 KOps/s $\textbf{\color{#d91a1a}-9.61\%}$
test_setitem_dim[tuple] 0.1153ms 48.5780μs 20.5854 KOps/s 24.1575 KOps/s $\textbf{\color{#d91a1a}-14.79\%}$
test_setitem 77.8650μs 20.3958μs 49.0297 KOps/s 54.8744 KOps/s $\textbf{\color{#d91a1a}-10.65\%}$
test_set 91.5010μs 19.7622μs 50.6016 KOps/s 57.4855 KOps/s $\textbf{\color{#d91a1a}-11.98\%}$
test_set_shared 3.0249ms 0.1438ms 6.9532 KOps/s 7.1228 KOps/s $\color{#d91a1a}-2.38\%$
test_update 0.1004ms 23.3687μs 42.7923 KOps/s 53.1650 KOps/s $\textbf{\color{#d91a1a}-19.51\%}$
test_update_nested 0.1022ms 31.4553μs 31.7911 KOps/s 37.6013 KOps/s $\textbf{\color{#d91a1a}-15.45\%}$
test_set_nested 0.1231ms 22.0398μs 45.3724 KOps/s 50.9967 KOps/s $\textbf{\color{#d91a1a}-11.03\%}$
test_set_nested_new 83.6860μs 26.0708μs 38.3571 KOps/s 42.7657 KOps/s $\textbf{\color{#d91a1a}-10.31\%}$
test_select 0.1269ms 40.2733μs 24.8303 KOps/s 27.5185 KOps/s $\textbf{\color{#d91a1a}-9.77\%}$
test_select_nested 0.1117ms 58.9386μs 16.9668 KOps/s 17.1882 KOps/s $\color{#d91a1a}-1.29\%$
test_exclude_nested 0.2493ms 0.1188ms 8.4178 KOps/s 8.5063 KOps/s $\color{#d91a1a}-1.04\%$
test_empty[True] 0.6481ms 0.4063ms 2.4615 KOps/s 2.4469 KOps/s $\color{#35bf28}+0.60\%$
test_empty[False] 5.3120μs 1.0627μs 940.9570 KOps/s 959.9024 KOps/s $\color{#d91a1a}-1.97\%$
test_unbind_speed 0.4422ms 0.2447ms 4.0875 KOps/s 4.0461 KOps/s $\color{#35bf28}+1.02\%$
test_unbind_speed_stack0 77.2960ms 3.7899ms 263.8581 Ops/s 334.7944 Ops/s $\textbf{\color{#d91a1a}-21.19\%}$
test_unbind_speed_stack1 18.7350μs 1.9766μs 505.9117 KOps/s 513.3127 KOps/s $\color{#d91a1a}-1.44\%$
test_split 2.2308ms 1.4738ms 678.4993 Ops/s 622.9586 Ops/s $\textbf{\color{#35bf28}+8.92\%}$
test_chunk 72.4091ms 1.5743ms 635.2016 Ops/s 650.6211 Ops/s $\color{#d91a1a}-2.37\%$
test_creation[device0] 3.6348ms 0.1047ms 9.5512 KOps/s 9.6773 KOps/s $\color{#d91a1a}-1.30\%$
test_creation_from_tensor 0.2027ms 82.1869μs 12.1674 KOps/s 12.2086 KOps/s $\color{#d91a1a}-0.34\%$
test_add_one[memmap_tensor0] 0.2847ms 5.4513μs 183.4426 KOps/s 190.6795 KOps/s $\color{#d91a1a}-3.80\%$
test_contiguous[memmap_tensor0] 17.5820μs 0.6540μs 1.5290 MOps/s 1.5366 MOps/s $\color{#d91a1a}-0.50\%$
test_stack[memmap_tensor0] 53.7200μs 3.6303μs 275.4563 KOps/s 279.3559 KOps/s $\color{#d91a1a}-1.40\%$
test_memmaptd_index 0.9631ms 0.2475ms 4.0403 KOps/s 4.2369 KOps/s $\color{#d91a1a}-4.64\%$
test_memmaptd_index_astensor 0.6910ms 0.3038ms 3.2920 KOps/s 3.3614 KOps/s $\color{#d91a1a}-2.06\%$
test_memmaptd_index_op 1.0197ms 0.6242ms 1.6019 KOps/s 1.8318 KOps/s $\textbf{\color{#d91a1a}-12.55\%}$
test_serialize_model 0.1874s 0.1096s 9.1224 Ops/s 8.9766 Ops/s $\color{#35bf28}+1.62\%$
test_serialize_model_pickle 0.5073s 0.3742s 2.6722 Ops/s 2.6192 Ops/s $\color{#35bf28}+2.02\%$
test_serialize_weights 0.1039s 98.4964ms 10.1527 Ops/s 8.9315 Ops/s $\textbf{\color{#35bf28}+13.67\%}$
test_serialize_weights_returnearly 0.1241s 0.1214s 8.2389 Ops/s 8.0140 Ops/s $\color{#35bf28}+2.81\%$
test_serialize_weights_pickle 1.1499s 0.6853s 1.4591 Ops/s 1.5442 Ops/s $\textbf{\color{#d91a1a}-5.51\%}$
test_serialize_weights_filesystem 0.1677s 99.0331ms 10.0976 Ops/s 10.9480 Ops/s $\textbf{\color{#d91a1a}-7.77\%}$
test_serialize_model_filesystem 97.4125ms 92.0321ms 10.8658 Ops/s 10.8487 Ops/s $\color{#35bf28}+0.16\%$
test_reshape_pytree 66.8850μs 21.0770μs 47.4452 KOps/s 47.4168 KOps/s $\color{#35bf28}+0.06\%$
test_reshape_td 75.0700μs 31.5260μs 31.7198 KOps/s 33.5654 KOps/s $\textbf{\color{#d91a1a}-5.50\%}$
test_view_pytree 68.2270μs 20.8861μs 47.8788 KOps/s 47.8432 KOps/s $\color{#35bf28}+0.07\%$
test_view_td 77.7226ms 11.1221μs 89.9109 KOps/s 128.8276 KOps/s $\textbf{\color{#d91a1a}-30.21\%}$
test_unbind_pytree 69.8800μs 24.2293μs 41.2724 KOps/s 41.7992 KOps/s $\color{#d91a1a}-1.26\%$
test_unbind_td 95.4070μs 35.6136μs 28.0791 KOps/s 28.3385 KOps/s $\color{#d91a1a}-0.92\%$
test_split_pytree 52.3770μs 24.0891μs 41.5125 KOps/s 42.5747 KOps/s $\color{#d91a1a}-2.49\%$
test_split_td 0.4558ms 39.5351μs 25.2940 KOps/s 25.5896 KOps/s $\color{#d91a1a}-1.15\%$
test_add_pytree 74.2280μs 29.8908μs 33.4551 KOps/s 34.2418 KOps/s $\color{#d91a1a}-2.30\%$
test_add_td 0.1081ms 54.8448μs 18.2333 KOps/s 21.3953 KOps/s $\textbf{\color{#d91a1a}-14.78\%}$
test_distributed 0.2405ms 98.8689μs 10.1144 KOps/s 9.7613 KOps/s $\color{#35bf28}+3.62\%$
test_tdmodule 0.1819ms 23.3049μs 42.9094 KOps/s 47.3295 KOps/s $\textbf{\color{#d91a1a}-9.34\%}$
test_tdmodule_dispatch 0.1938ms 45.4327μs 22.0106 KOps/s 23.6455 KOps/s $\textbf{\color{#d91a1a}-6.91\%}$
test_tdseq 0.3675ms 26.6660μs 37.5010 KOps/s 41.5156 KOps/s $\textbf{\color{#d91a1a}-9.67\%}$
test_tdseq_dispatch 0.1557ms 49.3599μs 20.2594 KOps/s 22.4736 KOps/s $\textbf{\color{#d91a1a}-9.85\%}$
test_instantiation_functorch 1.7922ms 1.3144ms 760.8080 Ops/s 750.8512 Ops/s $\color{#35bf28}+1.33\%$
test_instantiation_td 5.1163ms 1.0271ms 973.6312 Ops/s 967.7159 Ops/s $\color{#35bf28}+0.61\%$
test_exec_functorch 0.2946ms 0.1577ms 6.3412 KOps/s 6.2996 KOps/s $\color{#35bf28}+0.66\%$
test_exec_functional_call 0.2933ms 0.1489ms 6.7168 KOps/s 6.8870 KOps/s $\color{#d91a1a}-2.47\%$
test_exec_td 0.2727ms 0.1474ms 6.7863 KOps/s 6.9129 KOps/s $\color{#d91a1a}-1.83\%$
test_exec_td_decorator 0.8381ms 0.2003ms 4.9933 KOps/s 4.9508 KOps/s $\color{#35bf28}+0.86\%$
test_vmap_mlp_speed[True-True] 1.4389ms 0.8967ms 1.1152 KOps/s 1.0862 KOps/s $\color{#35bf28}+2.67\%$
test_vmap_mlp_speed[True-False] 0.7087ms 0.4747ms 2.1066 KOps/s 2.0670 KOps/s $\color{#35bf28}+1.92\%$
test_vmap_mlp_speed[False-True] 1.3323ms 0.8242ms 1.2133 KOps/s 1.2652 KOps/s $\color{#d91a1a}-4.11\%$
test_vmap_mlp_speed[False-False] 0.5722ms 0.3860ms 2.5906 KOps/s 2.5661 KOps/s $\color{#35bf28}+0.95\%$
test_vmap_mlp_speed_decorator[True-True] 3.1730ms 2.3638ms 423.0486 Ops/s 439.9217 Ops/s $\color{#d91a1a}-3.84\%$
test_vmap_mlp_speed_decorator[True-False] 1.0315ms 0.5446ms 1.8361 KOps/s 1.8353 KOps/s $\color{#35bf28}+0.04\%$
test_vmap_mlp_speed_decorator[False-True] 2.3310ms 1.9270ms 518.9361 Ops/s 538.3904 Ops/s $\color{#d91a1a}-3.61\%$
test_vmap_mlp_speed_decorator[False-False] 0.8375ms 0.4150ms 2.4099 KOps/s 2.3625 KOps/s $\color{#35bf28}+2.01\%$
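
For readers checking the table, the Change column appears to be the relative difference between the Ops column (this PR) and Ops on Repo HEAD. The summary counts also look consistent with a 5% cutoff, matching the bold entries. The sketch below reproduces a few rows under those assumptions; `relative_change`, `classify`, and the 5% threshold are illustrative names and values inferred from the numbers, not taken from the benchmark workflow itself.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percentage change in throughput of the PR relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "within threshold"


# (name, Ops, Ops on Repo HEAD), copied from the CPU table above.
# Both values in a row share the same unit (KOps/s or Ops/s).
rows = [
    ("test_plain_set_nested", 54.9724, 64.2065),
    ("test_values", 859.0929, 840.6376),
    ("test_split", 678.4993, 622.9586),
]

for name, ops, ops_head in rows:
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Whether the real cutoff is inclusive or exclusive at exactly 5% cannot be determined from these rows, so the comparison operators here are a guess.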


github-actions bot commented Feb 5, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 132. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}12$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 86.6410μs 13.8459μs 72.2236 KOps/s 74.0058 KOps/s $\color{#d91a1a}-2.41\%$
test_plain_set_stack_nested 0.1404ms 0.1196ms 8.3583 KOps/s 8.3128 KOps/s $\color{#35bf28}+0.55\%$
test_plain_set_nested_inplace 42.7310μs 15.2248μs 65.6824 KOps/s 67.0575 KOps/s $\color{#d91a1a}-2.05\%$
test_plain_set_stack_nested_inplace 0.1682ms 0.1473ms 6.7878 KOps/s 6.7102 KOps/s $\color{#35bf28}+1.16\%$
test_items 19.8210μs 4.7298μs 211.4251 KOps/s 209.5206 KOps/s $\color{#35bf28}+0.91\%$
test_items_nested 0.3752ms 0.3410ms 2.9324 KOps/s 2.9514 KOps/s $\color{#d91a1a}-0.64\%$
test_items_nested_locked 0.3964ms 0.3453ms 2.8960 KOps/s 2.9147 KOps/s $\color{#d91a1a}-0.64\%$
test_items_nested_leaf 0.2242ms 0.2022ms 4.9460 KOps/s 4.9937 KOps/s $\color{#d91a1a}-0.96\%$
test_items_stack_nested 1.3550ms 1.3030ms 767.4888 Ops/s 764.3396 Ops/s $\color{#35bf28}+0.41\%$
test_items_stack_nested_leaf 1.2357ms 1.1475ms 871.4636 Ops/s 883.8255 Ops/s $\color{#d91a1a}-1.40\%$
test_items_stack_nested_locked 0.9599ms 0.9016ms 1.1092 KOps/s 1.1029 KOps/s $\color{#35bf28}+0.57\%$
test_keys 22.7600μs 4.5972μs 217.5216 KOps/s 217.7742 KOps/s $\color{#d91a1a}-0.12\%$
test_keys_nested 0.7855ms 95.0281μs 10.5232 KOps/s 10.5310 KOps/s $\color{#d91a1a}-0.07\%$
test_keys_nested_locked 0.1243ms 98.1797μs 10.1854 KOps/s 10.1388 KOps/s $\color{#35bf28}+0.46\%$
test_keys_nested_leaf 0.1788ms 78.3888μs 12.7569 KOps/s 12.7863 KOps/s $\color{#d91a1a}-0.23\%$
test_keys_stack_nested 1.1867ms 1.1365ms 879.9211 Ops/s 864.1858 Ops/s $\color{#35bf28}+1.82\%$
test_keys_stack_nested_leaf 1.1650ms 1.1138ms 897.8166 Ops/s 876.0605 Ops/s $\color{#35bf28}+2.48\%$
test_keys_stack_nested_locked 0.8238ms 0.7162ms 1.3963 KOps/s 1.3646 KOps/s $\color{#35bf28}+2.32\%$
test_values 9.0303μs 1.8774μs 532.6420 KOps/s 529.2650 KOps/s $\color{#35bf28}+0.64\%$
test_values_nested 65.5610μs 45.1867μs 22.1304 KOps/s 22.0623 KOps/s $\color{#35bf28}+0.31\%$
test_values_nested_locked 66.9910μs 47.1066μs 21.2284 KOps/s 20.8377 KOps/s $\color{#35bf28}+1.88\%$
test_values_nested_leaf 57.3410μs 39.6767μs 25.2037 KOps/s 25.1423 KOps/s $\color{#35bf28}+0.24\%$
test_values_stack_nested 1.0035ms 0.9473ms 1.0556 KOps/s 1.0492 KOps/s $\color{#35bf28}+0.61\%$
test_values_stack_nested_leaf 1.1099ms 0.9541ms 1.0481 KOps/s 1.0496 KOps/s $\color{#d91a1a}-0.14\%$
test_values_stack_nested_locked 0.6201ms 0.5687ms 1.7585 KOps/s 1.7457 KOps/s $\color{#35bf28}+0.74\%$
test_membership 3.6180μs 0.9240μs 1.0822 MOps/s 1.0648 MOps/s $\color{#35bf28}+1.64\%$
test_membership_nested 21.8700μs 2.9004μs 344.7825 KOps/s 342.8454 KOps/s $\color{#35bf28}+0.56\%$
test_membership_nested_leaf 19.5600μs 2.9045μs 344.2962 KOps/s 341.3316 KOps/s $\color{#35bf28}+0.87\%$
test_membership_stacked_nested 33.8210μs 11.1839μs 89.4140 KOps/s 90.3387 KOps/s $\color{#d91a1a}-1.02\%$
test_membership_stacked_nested_leaf 33.1810μs 11.1978μs 89.3036 KOps/s 89.1276 KOps/s $\color{#35bf28}+0.20\%$
test_membership_nested_last 24.5010μs 5.3518μs 186.8545 KOps/s 187.6120 KOps/s $\color{#d91a1a}-0.40\%$
test_membership_nested_leaf_last 24.3700μs 5.3100μs 188.3240 KOps/s 188.5834 KOps/s $\color{#d91a1a}-0.14\%$
test_membership_stacked_nested_last 0.1886ms 0.1548ms 6.4615 KOps/s 6.3681 KOps/s $\color{#35bf28}+1.47\%$
test_membership_stacked_nested_leaf_last 32.9000μs 13.0332μs 76.7269 KOps/s 76.9027 KOps/s $\color{#d91a1a}-0.23\%$
test_nested_getleaf 30.3400μs 8.4017μs 119.0234 KOps/s 118.9454 KOps/s $\color{#35bf28}+0.07\%$
test_nested_get 22.7200μs 7.9252μs 126.1795 KOps/s 125.8830 KOps/s $\color{#35bf28}+0.24\%$
test_stacked_getleaf 0.3734ms 0.3253ms 3.0741 KOps/s 3.0534 KOps/s $\color{#35bf28}+0.68\%$
test_stacked_get 0.3441ms 0.2937ms 3.4049 KOps/s 3.3573 KOps/s $\color{#35bf28}+1.42\%$
test_nested_getitemleaf 24.4800μs 9.7189μs 102.8926 KOps/s 101.6266 KOps/s $\color{#35bf28}+1.25\%$
test_nested_getitem 65.5110μs 9.2936μs 107.6006 KOps/s 106.8200 KOps/s $\color{#35bf28}+0.73\%$
test_stacked_getitemleaf 0.3927ms 0.3293ms 3.0365 KOps/s 3.0442 KOps/s $\color{#d91a1a}-0.25\%$
test_stacked_getitem 0.3465ms 0.2950ms 3.3894 KOps/s 3.3603 KOps/s $\color{#35bf28}+0.86\%$
test_lock_nested 1.2403ms 0.3495ms 2.8608 KOps/s 2.7692 KOps/s $\color{#35bf28}+3.31\%$
test_lock_stack_nested 88.9161ms 6.3718ms 156.9427 Ops/s 157.2947 Ops/s $\color{#d91a1a}-0.22\%$
test_unlock_nested 79.8311ms 0.4290ms 2.3309 KOps/s 2.8416 KOps/s $\textbf{\color{#d91a1a}-17.97\%}$
test_unlock_stack_nested 90.0823ms 6.4607ms 154.7815 Ops/s 153.4693 Ops/s $\color{#35bf28}+0.85\%$
test_flatten_speed 0.3485ms 0.2607ms 3.8355 KOps/s 3.8338 KOps/s $\color{#35bf28}+0.05\%$
test_unflatten_speed 0.4108ms 0.3589ms 2.7865 KOps/s 2.8384 KOps/s $\color{#d91a1a}-1.83\%$
test_common_ops 1.0892ms 0.5987ms 1.6702 KOps/s 1.6839 KOps/s $\color{#d91a1a}-0.82\%$
test_creation 18.0300μs 1.5254μs 655.5736 KOps/s 639.4044 KOps/s $\color{#35bf28}+2.53\%$
test_creation_empty 23.8810μs 8.8106μs 113.5002 KOps/s 122.6293 KOps/s $\textbf{\color{#d91a1a}-7.44\%}$
test_creation_nested_1 54.5710μs 10.5999μs 94.3406 KOps/s 100.7186 KOps/s $\textbf{\color{#d91a1a}-6.33\%}$
test_creation_nested_2 28.7000μs 12.9337μs 77.3177 KOps/s 80.9867 KOps/s $\color{#d91a1a}-4.53\%$
test_clone 70.9900μs 13.4712μs 74.2325 KOps/s 73.3878 KOps/s $\color{#35bf28}+1.15\%$
test_getitem[int] 29.5200μs 10.5896μs 94.4322 KOps/s 93.2666 KOps/s $\color{#35bf28}+1.25\%$
test_getitem[slice_int] 51.6000μs 20.5251μs 48.7208 KOps/s 47.0921 KOps/s $\color{#35bf28}+3.46\%$
test_getitem[range] 68.4210μs 34.2497μs 29.1973 KOps/s 28.2888 KOps/s $\color{#35bf28}+3.21\%$
test_getitem[tuple] 45.8010μs 18.5439μs 53.9259 KOps/s 53.0955 KOps/s $\color{#35bf28}+1.56\%$
test_getitem[list] 0.1831ms 32.5084μs 30.7613 KOps/s 30.9409 KOps/s $\color{#d91a1a}-0.58\%$
test_setitem_dim[int] 42.8890μs 27.4511μs 36.4284 KOps/s 36.3813 KOps/s $\color{#35bf28}+0.13\%$
test_setitem_dim[slice_int] 69.7410μs 47.4226μs 21.0870 KOps/s 20.3931 KOps/s $\color{#35bf28}+3.40\%$
test_setitem_dim[range] 91.7200μs 61.3839μs 16.2909 KOps/s 15.4505 KOps/s $\textbf{\color{#35bf28}+5.44\%}$
test_setitem_dim[tuple] 58.4010μs 41.7572μs 23.9480 KOps/s 23.4219 KOps/s $\color{#35bf28}+2.25\%$
test_setitem 58.6810μs 18.3772μs 54.4154 KOps/s 54.4001 KOps/s $\color{#35bf28}+0.03\%$
test_set 68.6390μs 18.1554μs 55.0802 KOps/s 55.7778 KOps/s $\color{#d91a1a}-1.25\%$
test_set_shared 2.8498ms 0.1010ms 9.9002 KOps/s 9.7364 KOps/s $\color{#35bf28}+1.68\%$
test_update 85.6500μs 20.7743μs 48.1363 KOps/s 49.3633 KOps/s $\color{#d91a1a}-2.49\%$
test_update_nested 75.1510μs 27.2306μs 36.7233 KOps/s 37.1656 KOps/s $\color{#d91a1a}-1.19\%$
test_set_nested 59.3710μs 19.4069μs 51.5280 KOps/s 52.9609 KOps/s $\color{#d91a1a}-2.71\%$
test_set_nested_new 76.2210μs 22.0665μs 45.3176 KOps/s 45.3736 KOps/s $\color{#d91a1a}-0.12\%$
test_select 89.7210μs 34.7912μs 28.7429 KOps/s 28.8874 KOps/s $\color{#d91a1a}-0.50\%$
test_select_nested 77.6710μs 52.7421μs 18.9602 KOps/s 18.9371 KOps/s $\color{#35bf28}+0.12\%$
test_exclude_nested 0.1543ms 0.1137ms 8.7915 KOps/s 8.6851 KOps/s $\color{#35bf28}+1.23\%$
test_empty[True] 0.4405ms 0.3859ms 2.5911 KOps/s 2.5723 KOps/s $\color{#35bf28}+0.73\%$
test_empty[False] 2.8650μs 0.8352μs 1.1973 MOps/s 1.1955 MOps/s $\color{#35bf28}+0.15\%$
test_to 75.7900μs 55.1282μs 18.1395 KOps/s 18.5631 KOps/s $\color{#d91a1a}-2.28\%$
test_to_nonblocking 0.2964ms 34.0091μs 29.4039 KOps/s 27.5048 KOps/s $\textbf{\color{#35bf28}+6.90\%}$
test_unbind_speed 0.3024ms 0.2660ms 3.7593 KOps/s 3.7274 KOps/s $\color{#35bf28}+0.86\%$
test_unbind_speed_stack0 88.2494ms 3.7413ms 267.2864 Ops/s 283.3561 Ops/s $\textbf{\color{#d91a1a}-5.67\%}$
test_unbind_speed_stack1 24.1710μs 1.8225μs 548.6827 KOps/s 584.0655 KOps/s $\textbf{\color{#d91a1a}-6.06\%}$
test_split 81.3609ms 1.7075ms 585.6614 Ops/s 579.6770 Ops/s $\color{#35bf28}+1.03\%$
test_chunk 2.0535ms 1.5345ms 651.6827 Ops/s 604.0437 Ops/s $\textbf{\color{#35bf28}+7.89\%}$
test_creation[device0] 0.3329ms 75.4370μs 13.2561 KOps/s 14.0480 KOps/s $\textbf{\color{#d91a1a}-5.64\%}$
test_creation_from_tensor 0.1323ms 56.6065μs 17.6658 KOps/s 17.7972 KOps/s $\color{#d91a1a}-0.74\%$
test_add_one[memmap_tensor0] 0.1778ms 6.4867μs 154.1607 KOps/s 151.0886 KOps/s $\color{#35bf28}+2.03\%$
test_contiguous[memmap_tensor0] 0.2028ms 0.6355μs 1.5736 MOps/s 1.5669 MOps/s $\color{#35bf28}+0.43\%$
test_stack[memmap_tensor0] 38.5200μs 4.3461μs 230.0906 KOps/s 229.9716 KOps/s $\color{#35bf28}+0.05\%$
test_memmaptd_index 81.1743ms 0.3000ms 3.3334 KOps/s 3.8180 KOps/s $\textbf{\color{#d91a1a}-12.69\%}$
test_memmaptd_index_astensor 0.5968ms 0.3225ms 3.1004 KOps/s 3.1050 KOps/s $\color{#d91a1a}-0.15\%$
test_memmaptd_index_op 0.9724ms 0.6179ms 1.6184 KOps/s 1.6435 KOps/s $\color{#d91a1a}-1.53\%$
test_serialize_model 91.8884ms 87.8920ms 11.3776 Ops/s 9.6837 Ops/s $\textbf{\color{#35bf28}+17.49\%}$
test_serialize_model_pickle 1.3490s 1.2375s 0.8080 Ops/s 0.8086 Ops/s $\color{#d91a1a}-0.07\%$
test_serialize_weights 0.1733s 95.5593ms 10.4647 Ops/s 10.7360 Ops/s $\color{#d91a1a}-2.53\%$
test_serialize_weights_returnearly 0.2686s 71.5704ms 13.9722 Ops/s 17.1825 Ops/s $\textbf{\color{#d91a1a}-18.68\%}$
test_serialize_weights_pickle 1.4201s 1.2457s 0.8028 Ops/s 0.8091 Ops/s $\color{#d91a1a}-0.78\%$
test_reshape_pytree 41.7600μs 24.8628μs 40.2207 KOps/s 39.9248 KOps/s $\color{#35bf28}+0.74\%$
test_reshape_td 53.1900μs 31.4177μs 31.8292 KOps/s 33.4583 KOps/s $\color{#d91a1a}-4.87\%$
test_view_pytree 46.5210μs 24.4935μs 40.8272 KOps/s 40.4175 KOps/s $\color{#35bf28}+1.01\%$
test_view_td 85.1936ms 10.0951μs 99.0584 KOps/s 96.0400 KOps/s $\color{#35bf28}+3.14\%$
test_unbind_pytree 0.1579ms 31.2955μs 31.9535 KOps/s 31.7740 KOps/s $\color{#35bf28}+0.56\%$
test_unbind_td 0.1687ms 40.1675μs 24.8958 KOps/s 24.3410 KOps/s $\color{#35bf28}+2.28\%$
test_split_pytree 54.4810μs 28.6776μs 34.8704 KOps/s 34.9389 KOps/s $\color{#d91a1a}-0.20\%$
test_split_td 0.1084ms 38.1685μs 26.1996 KOps/s 26.2930 KOps/s $\color{#d91a1a}-0.36\%$
test_add_pytree 59.9210μs 35.5393μs 28.1379 KOps/s 27.7985 KOps/s $\color{#35bf28}+1.22\%$
test_add_td 80.6500μs 49.4432μs 20.2252 KOps/s 19.8869 KOps/s $\color{#35bf28}+1.70\%$
test_distributed 1.8138ms 71.8941μs 13.9093 KOps/s 13.8718 KOps/s $\color{#35bf28}+0.27\%$
test_tdmodule 35.4200μs 18.8136μs 53.1530 KOps/s 56.9096 KOps/s $\textbf{\color{#d91a1a}-6.60\%}$
test_tdmodule_dispatch 0.2148ms 39.4503μs 25.3483 KOps/s 27.7936 KOps/s $\textbf{\color{#d91a1a}-8.80\%}$
test_tdseq 38.6410μs 21.2574μs 47.0425 KOps/s 48.6294 KOps/s $\color{#d91a1a}-3.26\%$
test_tdseq_dispatch 61.5810μs 40.6209μs 24.6178 KOps/s 26.0953 KOps/s $\textbf{\color{#d91a1a}-5.66\%}$
test_instantiation_functorch 1.7778ms 1.6854ms 593.3199 Ops/s 591.1084 Ops/s $\color{#35bf28}+0.37\%$
test_instantiation_td 1.6982ms 1.1661ms 857.5592 Ops/s 847.5204 Ops/s $\color{#35bf28}+1.18\%$
test_exec_functorch 0.2097ms 0.1585ms 6.3090 KOps/s 6.2889 KOps/s $\color{#35bf28}+0.32\%$
test_exec_functional_call 0.2175ms 0.1589ms 6.2923 KOps/s 6.2458 KOps/s $\color{#35bf28}+0.75\%$
test_exec_td 0.1825ms 0.1507ms 6.6352 KOps/s 6.4940 KOps/s $\color{#35bf28}+2.17\%$
test_exec_td_decorator 0.8208ms 0.2066ms 4.8407 KOps/s 4.7906 KOps/s $\color{#35bf28}+1.05\%$
test_vmap_mlp_speed[True-True] 1.4860ms 1.0896ms 917.7393 Ops/s 962.6522 Ops/s $\color{#d91a1a}-4.67\%$
test_vmap_mlp_speed[True-False] 0.7067ms 0.6211ms 1.6099 KOps/s 1.6275 KOps/s $\color{#d91a1a}-1.08\%$
test_vmap_mlp_speed[False-True] 1.0512ms 0.9824ms 1.0180 KOps/s 1.0225 KOps/s $\color{#d91a1a}-0.45\%$
test_vmap_mlp_speed[False-False] 0.6745ms 0.5555ms 1.8002 KOps/s 1.8571 KOps/s $\color{#d91a1a}-3.07\%$
test_vmap_mlp_speed_decorator[True-True] 2.8795ms 2.3939ms 417.7238 Ops/s 423.9077 Ops/s $\color{#d91a1a}-1.46\%$
test_vmap_mlp_speed_decorator[True-False] 0.1197s 0.7733ms 1.2932 KOps/s 1.4902 KOps/s $\textbf{\color{#d91a1a}-13.22\%}$
test_vmap_mlp_speed_decorator[False-True] 2.4506ms 2.0365ms 491.0332 Ops/s 509.5288 Ops/s $\color{#d91a1a}-3.63\%$
test_vmap_mlp_speed_decorator[False-False] 1.0388ms 0.5961ms 1.6776 KOps/s 1.7610 KOps/s $\color{#d91a1a}-4.73\%$
test_vmap_transformer_speed[True-True] 12.9589ms 12.4588ms 80.2649 Ops/s 80.4281 Ops/s $\color{#d91a1a}-0.20\%$
test_vmap_transformer_speed[True-False] 8.6047ms 8.2846ms 120.7057 Ops/s 122.1436 Ops/s $\color{#d91a1a}-1.18\%$
test_vmap_transformer_speed[False-True] 12.7947ms 12.3237ms 81.1446 Ops/s 81.4610 Ops/s $\color{#d91a1a}-0.39\%$
test_vmap_transformer_speed[False-False] 8.4144ms 8.1354ms 122.9195 Ops/s 122.6297 Ops/s $\color{#35bf28}+0.24\%$
test_vmap_transformer_speed_decorator[True-True] 75.1872ms 73.5298ms 13.5999 Ops/s 12.2391 Ops/s $\textbf{\color{#35bf28}+11.12\%}$
test_vmap_transformer_speed_decorator[True-False] 21.4564ms 19.8852ms 50.2886 Ops/s 49.9820 Ops/s $\color{#35bf28}+0.61\%$
test_vmap_transformer_speed_decorator[False-True] 66.4983ms 65.7073ms 15.2190 Ops/s 15.0853 Ops/s $\color{#35bf28}+0.89\%$
test_vmap_transformer_speed_decorator[False-False] 21.1792ms 19.4424ms 51.4340 Ops/s 51.1883 Ops/s $\color{#35bf28}+0.48\%$

batch_size=batch_size,
device=device,
names=names,
*oth,

Suggested change
- *oth,
+ *others,

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

3 participants