Update base for Update on "[NVFuser] Upstream push 0907"
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvements:
i. improved view support in the pointwise and transpose schedulers
ii. grouped grid Welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean, arange (see the sketch below)
ii. fixed a misaligned-address error in the transpose scheduler
iii. separated the compilation API from the execution API to prepare for async compilation
iv. double-type support in the expression evaluator
v. refactored PYTORCH_NVFUSER_DUMP to support saving PTX and CUBIN
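
For context, a minimal sketch of how the new composite ops surface through the TorchScript nvfuser path. The enablement call, shapes, and dump-option names are assumptions for illustration, not code from this PR:

```python
import torch

# Assumed TorchScript-era switch for enabling nvfuser; not part of this PR.
torch._C._jit_set_nvfuser_enabled(True)

@torch.jit.script
def fused(x: torch.Tensor) -> torch.Tensor:
    # variance_mean corresponds to torch.var_mean; arange is fusible as well.
    var, mean = torch.var_mean(x, dim=-1, keepdim=True)
    idx = torch.arange(x.size(-1), device=x.device, dtype=x.dtype)
    return (x - mean) / torch.sqrt(var + 1e-5) + idx

x = torch.randn(8, 1024, device="cuda")
for _ in range(3):  # warm-up iterations let the fusion compile and kick in
    out = fused(x)

# Per item v, running with e.g. PYTORCH_NVFUSER_DUMP=ptx,cubin (option names
# assumed) should save the generated PTX/CUBIN artifacts.
```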

Commits in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 committed Sep 19, 2022 · 2 parents 6c113ba + 9024015 · commit bfb7b15
Showing 683 changed files with 23,353 additions and 17,635 deletions.
2 changes: 1 addition & 1 deletion .circleci/docker/build.sh
```diff
@@ -379,7 +379,7 @@ docker build \
   --build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
   --build-arg "KATEX=${KATEX:-}" \
   --build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
-  --build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx900;gfx906}" \
+  --build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx906}" \
   --build-arg "IMAGE_NAME=${IMAGE_NAME}" \
   --build-arg "UCX_COMMIT=${UCX_COMMIT}" \
   --build-arg "UCC_COMMIT=${UCC_COMMIT}" \
```
8 changes: 7 additions & 1 deletion .circleci/docker/common/install_cudnn.sh
```diff
@@ -4,7 +4,13 @@ if [[ ${CUDNN_VERSION} == 8 ]]; then
     # cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
     mkdir tmp_cudnn && cd tmp_cudnn
     CUDNN_NAME="cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive"
-    curl -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz
+    if [[ ${CUDA_VERSION:0:4} == "11.7" ]]; then
+        CUDNN_NAME="cudnn-linux-x86_64-8.5.0.96_cuda11-archive"
+        curl -OLs https://ossci-linux.s3.amazonaws.com/${CUDNN_NAME}.tar.xz
+    else
+        curl -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz
+    fi
+
     tar xf ${CUDNN_NAME}.tar.xz
     cp -a ${CUDNN_NAME}/include/* /usr/include/
     cp -a ${CUDNN_NAME}/include/* /usr/local/cuda/include/
```
2 changes: 1 addition & 1 deletion .circleci/docker/common/install_ucc.sh
```diff
@@ -36,7 +36,7 @@ function install_ucc() {
   git submodule update --init --recursive

   ./autogen.sh
-  ./configure --prefix=$UCC_HOME --with-ucx=$UCX_HOME --with-nccl=no --with-cuda=$with_cuda
+  ./configure --prefix=$UCC_HOME --with-ucx=$UCX_HOME --with-cuda=$with_cuda
   time make -j
   sudo make install

```
1 change: 1 addition & 0 deletions .circleci/docker/ubuntu-cuda/Dockerfile
```diff
@@ -118,6 +118,7 @@ COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm

 # Install CUDNN
 ARG CUDNN_VERSION
+ARG CUDA_VERSION
 COPY ./common/install_cudnn.sh install_cudnn.sh
 RUN if [ "${CUDNN_VERSION}" -eq 8 ]; then bash install_cudnn.sh; fi
 RUN rm install_cudnn.sh
```
2 changes: 1 addition & 1 deletion .circleci/scripts/windows_cudnn_install.sh
```diff
@@ -18,7 +18,7 @@ case ${CUDA_VERSION} in
         ;;
     11.7)
         # Use cudnn8.3 with hard-coded cuda11.5 version
-        cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive"
+        cudnn_file_name="cudnn-windows-x86_64-8.5.0.96_cuda11-archive"
         ;;
     *)
         echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet"
```
2 changes: 1 addition & 1 deletion .github/ci_commit_pins/torchdynamo.txt
```diff
@@ -1 +1 @@
-fe3173f7e6c804e6330ac187ea8e4101f45ff9a2
+41c44bc1d080d6cf063419a4166732b983b84eef
```
2 changes: 1 addition & 1 deletion .github/ci_commit_pins/vision.txt
```diff
@@ -1 +1 @@
-84dcf695d64c15f8a0be845ac65901bdde845429
+a4f53308b2d0f1aa9191686e326f45c26053f686
```
2 changes: 1 addition & 1 deletion .github/ci_commit_pins/xla.txt
```diff
@@ -1 +1 @@
-b8688ee3c03120a15978844db6c4fa73eceb6594
+4dec902617aea14ca4013e402eea56e92701cac9
```
4 changes: 4 additions & 0 deletions .github/merge_rules.yaml
```diff
@@ -3,6 +3,7 @@
   - .jenkins/caffe2/*
   - aten/src/ATen/core/interned_strings.h
   - docs/source/onnx.rst
+  - docs/source/onnx*
   - docs/source/scripts/onnx/**
   - scripts/onnx/**
   - test/jit/test_export_modes.py
@@ -15,6 +16,8 @@
   - torch/csrc/jit/serialization/onnx.*
   - torch/csrc/onnx/**
   - torch/onnx/**
+  - third_party/onnx
+  - caffe2/python/onnx/**
   approved_by:
   - BowenBao
   - abock
@@ -323,6 +326,7 @@
   - '*'
   approved_by:
   - pytorch/metamates
+  - mruberry
   mandatory_checks_name:
   - Facebook CLA Check
   - Lint
```
69 changes: 0 additions & 69 deletions .github/scale-config.yml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/scripts/generate_binary_build_matrix.py
```diff
@@ -13,7 +13,7 @@
 from typing import Dict, List, Tuple, Optional


-CUDA_ARCHES = ["10.2", "11.3", "11.6", "11.7"]
+CUDA_ARCHES = ["10.2", "11.6", "11.7"]


 ROCM_ARCHES = ["5.1.1", "5.2"]
```
9 changes: 0 additions & 9 deletions .github/scripts/generate_ci_workflows.py
```diff
@@ -207,15 +207,6 @@ class OperatingSystem:
     ),
 ]
 WINDOWS_BINARY_SMOKE_WORKFLOWS = [
-    BinaryBuildWorkflow(
-        os=OperatingSystem.WINDOWS,
-        package_type="wheel",
-        build_configs=generate_binary_build_matrix.generate_wheels_matrix(
-            OperatingSystem.WINDOWS,
-            arches=["11.3"],
-            python_versions=["3.7"]),
-        branches="master",
-    ),
     BinaryBuildWorkflow(
         os=OperatingSystem.WINDOWS,
         package_type="libtorch",
```
40 changes: 39 additions & 1 deletion .github/scripts/run_torchbench.py
```diff
@@ -13,10 +13,12 @@
 # 1. Does not reuse the build artifact in other CI workflows
 # 2. CI jobs are serialized because there is only one worker
 import os
+import boto3  # type: ignore[import]
 import git  # type: ignore[import]
 import pathlib
 import argparse
 import subprocess
+from pathlib import Path

 from typing import List, Tuple

@@ -31,6 +33,25 @@
 direction: decrease
 timeout: 720
 tests:"""
+S3_BUCKET = "ossci-metrics"
+S3_PREFIX = "torchbench-pr-test"
+S3_URL_BASE = f"https://{S3_BUCKET}.s3.amazonaws.com/"
+
+class S3Client:
+    def __init__(self, bucket: str = S3_BUCKET, prefix: str = S3_PREFIX):
+        self.s3 = boto3.client('s3')
+        self.resource = boto3.resource('s3')
+        self.bucket = bucket
+        self.prefix = prefix
+
+    def upload_file(self, file_path: Path, filekey_prefix: str) -> None:
+        assert file_path.is_file(), f"Specified file path {file_path} does not exist or not file."
+        file_name = file_path.name
+        s3_key = f"{self.prefix}/{filekey_prefix}/{file_name}"
+        print(f"Uploading file {file_name} to S3 with key: {s3_key}")
+        self.s3.upload_file(str(file_path), self.bucket, s3_key)
+        # output the result URL
+        print(f"Uploaded the result file {file_name} to {S3_URL_BASE}{s3_key}")

 def gen_abtest_config(control: str, treatment: str, models: List[str]) -> str:
     d = {}
@@ -137,9 +158,21 @@ def run_userbenchmarks(pytorch_path: str, torchbench_path: str, base_sha: str, h
     print(f"Running torchbench userbenchmark command: {command}")
     subprocess.check_call(command, cwd=torchbench_path, env=env)

+def process_upload_s3(result_dir: str) -> None:
+    # validate result directory
+    result_dir_path = Path(result_dir)
+    assert result_dir_path.exists(), f"Specified result directory {result_dir} doesn't exist."
+    # upload all files to S3 bucket oss-ci-metrics
+    files = [x for x in result_dir_path.iterdir() if x.is_file()]
+    # upload file to S3 bucket
+    s3_client: S3Client = S3Client()
+    filekey_prefix = result_dir_path.name
+    for f in files:
+        s3_client.upload_file(f, filekey_prefix)
+
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description='Run TorchBench tests based on PR')
-    parser.add_argument('--pr-body', required=True, help="The file that contains body of a Pull Request")
+    parser.add_argument('--pr-body', help="The file that contains body of a Pull Request")

     subparsers = parser.add_subparsers(dest='command')
     # parser for setup the torchbench branch name env
@@ -151,6 +184,9 @@ def run_userbenchmarks(pytorch_path: str, torchbench_path: str, base_sha: str, h
     run_parser.add_argument('--pr-head-sha', required=True, type=str, help="The Pull Request head hash")
     run_parser.add_argument('--pytorch-path', required=True, type=str, help="Path to pytorch repository")
     run_parser.add_argument('--torchbench-path', required=True, type=str, help="Path to TorchBench repository")
+    # parser to upload results to S3
+    upload_parser = subparsers.add_parser("upload-s3")
+    upload_parser.add_argument('--result-dir', required=True, type=str, help="Path to benchmark output")
     args = parser.parse_args()

     if args.command == 'set-torchbench-branch':
@@ -181,6 +217,8 @@ def run_userbenchmarks(pytorch_path: str, torchbench_path: str, base_sha: str, h
         if not models and not userbenchmarks:
             print("Can't parse valid models or userbenchmarks from the pr body. Quit.")
             exit(-1)
+    elif args.command == 'upload-s3':
+        process_upload_s3(args.result_dir)
     else:
         print(f"The command {args.command} is not supported.")
         exit(-1)
```
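
For reference, a hypothetical invocation of the new `upload-s3` subcommand added above; the result-directory path is made up for illustration:

```python
# Hypothetical driver for the new subcommand; CI would normally run the script
# directly. The result directory's basename becomes the S3 key prefix under
# torchbench-pr-test/ in the ossci-metrics bucket.
import subprocess

subprocess.check_call([
    "python", ".github/scripts/run_torchbench.py",
    "upload-s3",
    "--result-dir", "/tmp/torchbench-results/pr12345",  # hypothetical path
])
```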
19 changes: 18 additions & 1 deletion .github/scripts/trymerge.py
```diff
@@ -912,6 +912,8 @@ def merge_into(self, repo: GitRepo, *,

         repo.push(self.default_branch(), dry_run)
         if not dry_run:
+            if land_check_commit:
+                self.delete_land_time_check_branch(repo)
             gh_add_labels(self.org, self.project, self.pr_num, ["merged"])

     def merge_changes(self,
@@ -962,6 +964,11 @@ def create_land_time_check_branch(self,
         repo.checkout(orig_branch)
         return commit

+    def delete_land_time_check_branch(self,
+                                      repo: GitRepo) -> None:
+        land_check_branch = f'landchecks/{self.pr_num}'
+        repo._run_git('push', 'origin', '-d', land_check_branch)
+

 class MandatoryChecksMissingError(Exception):
     pass
@@ -1344,7 +1351,7 @@ def merge(pr_num: int, repo: GitRepo,
     # here to stop the merge process right away
     find_matching_merge_rule(pr, repo, skip_mandatory_checks=True)

-    if land_checks:
+    if land_checks and not dry_run:
         land_check_commit = pr.create_land_time_check_branch(
             repo,
             'viable/strict',
@@ -1354,6 +1361,8 @@

     gh_post_pr_comment(org, project, pr.pr_num, explainer.get_merge_message(land_check_commit))
     if (datetime.utcnow() - pr.last_pushed_at()).days > stale_pr_days:
+        if land_checks and not dry_run:
+            pr.delete_land_time_check_branch(repo)
         raise RuntimeError("This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again.")

     start_time = time.time()
@@ -1366,6 +1375,8 @@
         print(f"Attempting merge of https://github.com/{org}/{project}/pull/{pr_num} ({elapsed_time / 60} minutes elapsed)")
         pr = GitHubPR(org, project, pr_num)
         if initial_commit_sha != pr.last_commit()['oid']:
+            if land_checks and not dry_run:
+                pr.delete_land_time_check_branch(repo)
             raise RuntimeError("New commits were pushed while merging. Please rerun the merge command.")
         try:
             find_matching_merge_rule(pr, repo)
@@ -1400,10 +1411,16 @@ def merge(pr_num: int, repo: GitRepo,
             last_exception = str(ex)
             print(f"Merge of https://github.com/{org}/{project}/pull/{pr_num} failed due to: {ex}. Retrying in 5 min")
             time.sleep(5 * 60)
+        except RuntimeError:
+            if land_checks and not dry_run:
+                pr.delete_land_time_check_branch(repo)
+            raise

     # Finally report timeout back
     msg = f"Merged timed out after {timeout_minutes} minutes. Please contact the pytorch_dev_infra team."
     msg += f"The last exception was: {last_exception}"
     if not dry_run:
+        if land_checks:
+            pr.delete_land_time_check_branch(repo)
         gh_add_labels(org, project, pr_num, ["land-failed"])
     raise RuntimeError(msg)
```
2 changes: 2 additions & 0 deletions .github/workflows/_linux-test.yml
```diff
@@ -117,6 +117,7 @@ jobs:
           NUM_TEST_SHARDS: ${{ matrix.num_shards }}
           PR_BODY: ${{ github.event.pull_request.body }}
           SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
+          SCCACHE_S3_KEY_PREFIX: ${{ github.workflow }}
           SHM_SIZE: ${{ contains(inputs.build-environment, 'cuda') && '2g' || '1g' }}
           DOCKER_IMAGE: ${{ inputs.docker-image }}
           XLA_CUDA: ${{ contains(inputs.build-environment, 'xla') && '0' || '' }}
@@ -171,6 +172,7 @@ jobs:
           -e PR_LABELS \
           -e MAX_JOBS="$(nproc --ignore=2)" \
           -e SCCACHE_BUCKET \
+          -e SCCACHE_S3_KEY_PREFIX \
           -e XLA_CUDA \
           -e XLA_CLANG_CACHE_S3_BUCKET_NAME \
           --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
```
