
[JIT] linear scan memory planning strategy #64348

Closed
wants to merge 20 commits

Conversation

Linear scan (a heuristic based on https://www.usenix.org/legacy/events/vee05/full_papers/p132-wimmer.pdf) memory planning strategy.

[ghstack-poisoned]
@facebook-github-bot facebook-github-bot added oncall: jit Add this issue/PR to JIT oncall triage queue cla signed labels Sep 1, 2021
Contributor
facebook-github-bot commented Sep 1, 2021


💊 CI failures summary and remediations

As of commit 6d69dcd (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/2)

Step: "Test"

Nov 03 21:06:43 FAIL [0.005s]: test_forward_mod...D_corrcoef_cpu_float64 (__main__.TestGradientsCPU)
Nov 03 21:06:43     result = test(self, **param_kwargs)
Nov 03 21:06:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 03 21:06:43     return test(*args, **kwargs)
Nov 03 21:06:43   File "test_ops.py", line 729, in test_forward_mode_AD
Nov 03 21:06:43     self._forward_grad_helper(device, dtype, op, op.get_op())
Nov 03 21:06:43   File "test_ops.py", line 723, in _forward_grad_helper
Nov 03 21:06:43     check_undefined_grad=False, check_batched_grad=False)
Nov 03 21:06:43 AssertionError: NotImplementedError not raised : Running forward AD for an OP that has does not support it did not raise any error. If your op supports forward AD, you should set supports_forward_ad=True
Nov 03 21:06:43 
Nov 03 21:06:43 ======================================================================
Nov 03 21:06:43 FAIL [0.005s]: test_forward_mode_AD_corrcoef_cpu_float64 (__main__.TestGradientsCPU)
Nov 03 21:06:43 ----------------------------------------------------------------------
Nov 03 21:06:43 Traceback (most recent call last):
Nov 03 21:06:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 03 21:06:43     result = test(self, **param_kwargs)
Nov 03 21:06:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 03 21:06:43     return test(*args, **kwargs)
Nov 03 21:06:43   File "test_ops.py", line 729, in test_forward_mode_AD
Nov 03 21:06:43     self._forward_grad_helper(device, dtype, op, op.get_op())
Nov 03 21:06:43   File "test_ops.py", line 723, in _forward_grad_helper
Nov 03 21:06:43     check_undefined_grad=False, check_batched_grad=False)

See GitHub Actions build Lint / quick-checks (2/2)

Step: "Ensure correct trailing newlines"

2021-11-03T19:09:31.0086571Z python: can't open..._launches.py': [Errno 2] No such file or directory
2021-11-03T19:09:30.9759835Z ##[group]Run set -eux
2021-11-03T19:09:30.9798885Z shell: /bin/bash -e {0}
2021-11-03T19:09:30.9799240Z env:
2021-11-03T19:09:30.9799754Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.0/x64
2021-11-03T19:09:30.9800418Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.0/x64/lib
2021-11-03T19:09:30.9800920Z ##[endgroup]
2021-11-03T19:09:30.9890446Z + python torch/testing/_check_kernel_launches.py
2021-11-03T19:09:30.9896955Z + tee /home/runner/work/pytorch/pytorch/cuda_kernel_launch_checks.txt
2021-11-03T19:09:31.0086571Z python: can't open file '/home/runner/work/pytorch/pytorch/torch/testing/_check_kernel_launches.py': [Errno 2] No such file or directory
2021-11-03T19:09:31.0159465Z ##[group]Run (! git --no-pager grep -I -no $'#include <cub/' --  ./aten  ':(exclude)aten/src/ATen/cuda/cub*.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
2021-11-03T19:09:31.0199626Z shell: /bin/bash -e {0}
2021-11-03T19:09:31.0200006Z env:
2021-11-03T19:09:31.0200611Z   pythonLocation: /opt/hostedtoolcache/Python/3.10.0/x64
2021-11-03T19:09:31.0201381Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.10.0/x64/lib
2021-11-03T19:09:31.0201935Z ##[endgroup]
2021-11-03T19:09:31.0579068Z ##[group]Run (! git --no-pager grep -I -no $'cudaStreamSynchronize' --  ./aten ./c10 ':(exclude)aten/src/ATen/test' ':(exclude)c10/cuda/CUDAFunctions.h' || (echo "The above files call raw cuda APIs directly; please use at::cuda wrappers instead"; false))
2021-11-03T19:09:31.0620064Z shell: /bin/bash -e {0}

1 failure not recognized by patterns:

Job: GitHub Actions Lint / clang-format
Step: Run clang-format

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


Maksim Levental and others added 8 commits September 1, 2021 04:00
Linear scan (a heuristic based on https://www.usenix.org/legacy/events/vee05/full_papers/p132-wimmer.pdf) memory planning strategy.

[ghstack-poisoned]

@makslevental has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Maksim Levental added 2 commits September 6, 2021 21:51
Linear scan (a heuristic based on https://www.usenix.org/legacy/events/vee05/full_papers/p132-wimmer.pdf) memory planning strategy.

Differential Revision: [D30769099](https://our.internmc.facebook.com/intern/diff/D30769099)

[ghstack-poisoned]
Maksim Levental and others added 4 commits September 12, 2021 17:32
Linear scan (a heuristic based on https://www.usenix.org/legacy/events/vee05/full_papers/p132-wimmer.pdf) memory planning strategy.

Differential Revision: [D30769099](https://our.internmc.facebook.com/intern/diff/D30769099)

[ghstack-poisoned]
This was referenced Sep 20, 2021
Contributor

@Krovatkin Krovatkin left a comment

:shipit: Could you please take a look at #64348 (comment) before landing this?

// find the "right" region; in order of preference:
// 1. tightest fit free region i.e. smallest i.e. first match since
// avail_regions is sorted by size
// 2. swap with latest ending active live range that is big enough (spilling
Contributor

could you please elaborate on how 2. helps?

Contributor Author

i mean it probably doesn't, if the data is indicative, but this is the heuristic from the original linear scan paper

[screenshot of the spilling heuristic from the linear scan paper]

http://web.cs.ucla.edu/~palsberg/course/cs132/linearscan.pdf bottom page 5

namespace jit {

// join regions that are adjacent and free
void coalesce_avail(std::multiset<MemRegion, regionSizeCmp>& avail_regions) {
Contributor

let's make this helper static, so it stays private to this file.

SortedLiveRangeMap<MemRegion> allocated_ranges;

size_t curr_end_offset = 0;
auto allocate_inactive_ranges = [&](UniqueLiveRange curr_range) {
Contributor

I wonder if refactoring this helper out of the mainline code would add to the readability?
a) we would immediately see which structures the helper updates, uses.
b) there would be less noise in the mainline code.

Contributor Author

agreed. will do

continue;
}

TORCH_INTERNAL_ASSERT(
Contributor

lmao nuclear shelter proof 😸

Contributor Author

is this the wrong thing? should i use something else?

size_t candidate_size = candidate_reg.size;
active[curr_range] = {candidate_offset, aligned_curr_size};
// split region (potentially)
if (candidate_size > aligned_curr_size) {
Contributor

looks like duplication of the logic above? Maybe we should refactor it into a helper?

Contributor Author

it's true but it's mostly duplication of unpacking structs and such. only duplicated logic is the check which is more intelligible in context i think. not sure - could go either way (we should flip a coin)

curr_end_offset += aligned_curr_size;
}

// expire any remaining intervals
Contributor

I feel like this comment is very helpful, but I struggle to come up with something better. The only side effect you care about here is moving remaining intervals to allocated_ranges .

Contributor Author

i think you meant *not very helpful? agreed. will add a couple more words

std::vector<MemAllocation> allocations;
allocations.reserve(allocated_ranges.size());
for (const auto& item : allocated_ranges) {
allocations.push_back({item.first, item.second});
Contributor

err this looks fishy... you already reserved enough elements for allocated_ranges, but then you are pushing back on top of the reserved ones?

Contributor Author

discussed offline

namespace torch {
namespace jit {

using EndSortedLiveRangeMap =
Contributor

it's only used in the cpp file, why don't we move it there?

Contributor Author

i can't remember if it's used in any of the other diffs but it's a vaguely useful typedef? not sure. another coin flip.

{448, 448, TTP(((Vec{10, 10})), ((Vec{10, 1})))},
{448, 896, TTP(((Vec{10, 10})), ((Vec{10, 1})))},
};
testSmall(expected_storage, expected_allocs, Strategy::LINEAR_SCAN);
Contributor

I take it all strategies are tested with only these two?

Contributor Author

correct. longer/more complicated models would've inflated the verification spec data too much i think

Linear scan (a heuristic based on http://web.cs.ucla.edu/~palsberg/course/cs132/linearscan.pdf) memory planning strategy.

Differential Revision: [D30769099](https://our.internmc.facebook.com/intern/diff/D30769099)

[ghstack-poisoned]

pytorch-probot bot commented Nov 3, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/6d69dcd34dda3fd49a0db1d93a5e3dc11cd60729/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-dynamic ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
docker-builds ciflow/all 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow 🚫 skipped
linux-xenial-py3-clang5-mobile-code-analysis ciflow/all, ciflow/linux, ciflow/mobile 🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.


@makslevental has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

Hi @makslevental!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@github-actions

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label May 21, 2022
@github-actions github-actions bot closed this Jun 20, 2022
@facebook-github-bot facebook-github-bot deleted the gh/makslevental/29/head branch July 21, 2022 14:22
Labels
cla signed oncall: jit Add this issue/PR to JIT oncall triage queue open source Stale