
Support GPU CI #389

Merged — 23 commits, merged May 23, 2024

Conversation

xuxinyi389 (Contributor) commented May 6, 2024

PR Docs

Support GPU CI and fix the CUDA-related unit tests.

PR APIs

Notes that need to be added:

1. torch.device currently has no direct mapping in Paddle, but depending on the situation it can be mapped to CPUPlace or CUDAPlace. Paddle APIs that accept a device argument are inconsistent about its type, accepting one or several of str, int, CPUPlace, and CUDAPlace. Previously torch.device was uniformly mapped to str, a strategy that does not work for APIs such as current_stream. The device-related conversion strategy has now been strengthened; Paddle should also systematically unify its conventions for the device argument.
2. For the cuda_memory-related APIs, the frameworks' memory-allocation mechanisms differ, so comparing sizes is meaningless; compare was rewritten.
3. linalg_lstsq decides case by case whether to compute rank; the corresponding cases were fixed.
4. The precision tolerance for nn.Module needs to be further relaxed.
5. grid_sample forces stop_gradient to False when the GPU kernel is used; the unit test was fixed accordingly.
6. test_distributed_all_gather_object.py was moved into the distributed unit-test directory.
7. On the CI V100 environment torch.cuda.is_bf16_supported() returns True, but locally it returns False, so torch.cuda.get_device_properties is used instead.
8. See the PR for more fixed cases.
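Point 1 above can be illustrated with a minimal sketch of normalizing a torch.device-like value into the device-string form many Paddle APIs accept. The helper name and the exact accepted input forms are assumptions for illustration, not paconvert's actual implementation.

```python
# Hypothetical helper sketching the torch.device -> Paddle device mapping
# described in point 1. Names and accepted forms are illustrative only.

def map_device(device):
    """Map an int index, a "cuda[:N]" string, or "cpu" to a Paddle device string."""
    if isinstance(device, int):
        return f"gpu:{device}"                 # torch.device(0) -> "gpu:0"
    if isinstance(device, str):
        return device.replace("cuda", "gpu")   # "cuda:1" -> "gpu:1", "cpu" -> "cpu"
    raise TypeError(f"unsupported device spec: {device!r}")

print(map_device(0))         # gpu:0
print(map_device("cuda:1"))  # gpu:1
print(map_device("cpu"))     # cpu
```

A real matcher also has to handle APIs that want an int index or a Place object instead of a string, which is exactly why a single uniform mapping was not enough.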

paddle-bot commented May 6, 2024

Thanks for your contribution!

@xuxinyi389 xuxinyi389 marked this pull request as draft May 7, 2024 06:39
@xuxinyi389 xuxinyi389 marked this pull request as ready for review May 7, 2024 06:39
@xuxinyi389 xuxinyi389 marked this pull request as draft May 7, 2024 06:43
@xuxinyi389 xuxinyi389 marked this pull request as ready for review May 7, 2024 06:43
@xuxinyi389 xuxinyi389 marked this pull request as draft May 7, 2024 06:59
@xuxinyi389 xuxinyi389 marked this pull request as ready for review May 7, 2024 06:59
@xuxinyi389 xuxinyi389 changed the title from "poolish" to "Support GPU CI" May 7, 2024
zhwesky2010 (Collaborator) left a comment:

  1. Add a comment to each matcher branch noting which case it handles.
  2. Keep two test pipelines: leave the original CPU pipeline in place (it does not cost much), and iterate on the new GPU pipeline until it passes.
  3. For the tests that can only run on GPU, see whether there is a cleaner way to write them; the current result=None approach is rather hacky.

kwargs["stream"] = "paddle.device.Stream()"
API_TEMPLATE = textwrap.dedent(
"""
{}(stream = paddle.device.Stream(stream_base={}) if isinstance ({},(paddle.base.core.CUDAStream, paddle.base.core.CustomDeviceStream)) else {})
Collaborator:

torch.cuda.Stream should first be converted to paddle.device.cuda.Stream. Is there a bug in the Matcher for torch.cuda.Stream itself that pushes the problem up to the enclosing API to solve?

xuxinyi389 (Contributor, Author) commented May 9, 2024:

torch.cuda.Stream is indeed converted to paddle.device.cuda.Stream; this has nothing to do with those two. The reason for doing it here is that paddle.device.set_stream(stream=...) requires the stream object to have a stream_base attribute, which paddle.base.core.CUDAStream and paddle.base.core.CustomDeviceStream lack, so the stream argument must be a paddle.device.Stream.
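The re-wrapping the template performs can be sketched without any framework: if the incoming object is a low-level stream type, wrap it so the result has the attribute set_stream() expects. The class names below are stand-ins for paddle.base.core.CUDAStream and paddle.device.Stream, not the real types.

```python
# Framework-free sketch of the conditional wrap generated by the template:
# `paddle.device.Stream(stream_base=s) if isinstance(s, <raw types>) else s`

class RawStream:               # stand-in for paddle.base.core.CUDAStream
    pass

class Stream:                  # stand-in for paddle.device.Stream
    def __init__(self, stream_base=None):
        self.stream_base = stream_base

def ensure_wrapped(s):
    # Re-wrap only raw streams; already-wrapped streams pass through unchanged.
    return Stream(stream_base=s) if isinstance(s, RawStream) else s

raw = RawStream()
wrapped = ensure_wrapped(raw)
assert wrapped.stream_base is raw
assert ensure_wrapped(wrapped) is wrapped
```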

Collaborator:

> torch.cuda.Stream is indeed converted to paddle.device.cuda.Stream; this has nothing to do with those two. The reason for doing it here is that paddle.device.set_stream(stream=...) requires the stream object to have a stream_base attribute, which paddle.base.core.CUDAStream and paddle.base.core.CustomDeviceStream lack, so the stream argument must be a paddle.device.Stream.

Why would a paddle.base.core.CUDAStream be passed in here at all? That is a very low-level concept; the problem should be fixed at its source.

@@ -29,7 +29,17 @@ def compare(
rtol=1.0e-6,
atol=0.0,
):
assert str(pytorch_result).replace("cuda", "gpu") == str(paddle_result)
if isinstance(paddle_result, bool):
Collaborator:

Are there really this many possibilities?

xuxinyi389 (Contributor, Author) commented May 9, 2024:

Yes. The root cause is that Paddle has no API fully equivalent to torch.device; depending on the case it may correspond to paddle.CUDAPlace, an int, or a str, so the equality check has many branches. The device issue has also been recorded in the tracking sheet.
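The branching being discussed can be sketched in isolation: a converted device result may arrive as a str, an int index, or a Place-like object, and each shape needs its own comparison. The Place classes below are stand-ins for paddle.CPUPlace / paddle.CUDAPlace; the function is a hypothetical illustration, not the test suite's actual compare.

```python
# Hypothetical sketch of the multi-branch device comparison.

class CPUPlace:                      # stand-in for paddle.CPUPlace
    pass

class CUDAPlace:                     # stand-in for paddle.CUDAPlace
    def __init__(self, idx):
        self.idx = idx

def compare_device(pytorch_result, paddle_result):
    torch_str = str(pytorch_result)
    if isinstance(paddle_result, str):
        return torch_str.replace("cuda", "gpu") == paddle_result
    if isinstance(paddle_result, int):
        # e.g. torch.device("cuda:1") compared against a bare index
        return torch_str.rsplit(":", 1)[-1] == str(paddle_result)
    if isinstance(paddle_result, CUDAPlace):
        return torch_str.replace("cuda", "gpu") == f"gpu:{paddle_result.idx}"
    if isinstance(paddle_result, CPUPlace):
        return "cpu" in torch_str
    return False

assert compare_device("cuda:0", "gpu:0")
assert compare_device("cuda:1", 1)
assert compare_device("cuda:0", CUDAPlace(0))
```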

Collaborator:

Then just test directly:

if isinstance(paddle_result, str): ...
if isinstance(paddle_result, paddle.CPUPlace): ...
if isinstance(paddle_result, CUDAPlace): ...

The branching here is hard to follow.

xuxinyi389 (Contributor, Author):

Fixed.

@@ -31,7 +31,7 @@ def test_case_1():
result = F.grid_sample(x, grid)
"""
)
obj.run(pytorch_code, ["result"], check_value=False)
obj.run(pytorch_code, ["result"], check_value=False, check_stop_gradient=False)
Collaborator:

Is this an API bug?

xuxinyi389 (Contributor, Author):

It is an implementation difference, not a bug. When Paddle detects that the cuDNN implementation is used, it forcibly sets stop_gradient to False.

Collaborator:

> It is an implementation difference, not a bug. When Paddle detects that the cuDNN implementation is used, it forcibly sets stop_gradient to False.

The question is whether setting it to False is a reasonable implementation. Wherever there is a difference, first reason from the API's functional semantics about whether it is justified; the "implementation difference" claimed here needs a more rigorous definition.

xuxinyi389 (Contributor, Author):

Both Paddle's and PyTorch's GPU implementations of grid_sample call the cuDNN interface. cuDNN's backward computes the gradients of all inputs at once and cannot be masked to compute gradients for only some inputs. Paddle therefore chose, when it detects the cuDNN implementation (PaddlePaddle/Paddle@7de2db4), to set stop_gradient to False on both inputs x and grid. PyTorch's grid_sample backward likewise notes that it computes gradients for all inputs, but PyTorch does not make this explicit setting in the forward pass:
https://github.com/pytorch/pytorch/blob/2ed17e0b1ec0ca2a5dea41f81a63f582f2792d22/aten/src/ATen/native/cudnn/GridSampler.cpp#L125 . Given this property of the cuDNN implementation, Paddle's choice to set stop_gradient to False manually is also reasonable. The difference between Paddle and PyTorch on this point is not a matter of right or wrong, better or worse.


new_kwargs.update(kwargs)
return GenericMatcher.generate_code(self, new_kwargs)
if "device" in kwargs:
if ":" in kwargs["device"] and "if" not in kwargs["device"]:
Collaborator:

The main case is: tensor.cuda(device="cuda:0" if cond else "cuda:1")

Shouldn't this be if "cuda:" in ?
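The check being discussed operates on the argument's source text, so it must distinguish a literal "cuda:0" from a conditional expression like '"cuda:0" if cond else "cuda:1"'. A minimal sketch of that string-level case analysis (function and case names are illustrative, not paconvert's):

```python
# Sketch of the device-argument case analysis: checking for "cuda:"
# rather than a bare ":" avoids false positives, and the "if" check
# routes conditional expressions to a different rewrite template.

def classify_device_arg(src: str) -> str:
    if "cuda:" in src and "if" not in src:
        return "literal"        # e.g. '"cuda:0"' -> rewrite to an int index
    if "if" in src:
        return "conditional"    # e.g. '"cuda:0" if cond else "cuda:1"'
    return "other"              # e.g. a bare int or a variable name

assert classify_device_arg('"cuda:0"') == "literal"
assert classify_device_arg('"cuda:0" if cond else "cuda:1"') == "conditional"
assert classify_device_arg('0') == "other"
```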

zhwesky2010 (Collaborator) left a comment:

Roll the @skipif style out everywhere, and delete the historical tricks such as if paddle.is_compiled_with_cuda and result=None.

Adopt the new style fully; don't keep the old tricks around alongside it.
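The pattern the reviewer asks to generalize marks the whole test as skipped instead of returning result=None inside it. A stdlib analogue (the project's own @skipif decorator and its exact spelling are not shown in this thread; the condition below is a stand-in for paddle.is_compiled_with_cuda()):

```python
# Stdlib sketch of the GPU-only skip pattern: the test is reported as
# skipped, rather than silently passing with result=None.
import unittest

HAS_CUDA = False  # stand-in for paddle.is_compiled_with_cuda()

class TestCudaOnly(unittest.TestCase):
    @unittest.skipIf(not HAS_CUDA, "requires a CUDA build")
    def test_stream_roundtrip(self):
        self.fail("would only run on a CUDA build")

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestCudaOnly)
)
assert result.testsRun == 1 and len(result.skipped) == 1
```

The advantage over result=None is visible in the report: a skipped test is counted and labeled, so a GPU pipeline that should run it but doesn't is noticed.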

if "device" in kwargs:
if "cuda:" in kwargs["device"] and "if" not in kwargs["device"]:
# case1: tensor.cuda(device="cuda:0")
new_kwargs["device"] = int(
Collaborator:

It seems a direct int("""1""") would also work.

(screenshot: infoflow 2024-05-10 18-20-54)

xuxinyi389 (Contributor, Author):

The actual argument here is """ '1' """ — the inner quotes are part of the string, so int(""" '1' """) raises an error.
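The contributor's point can be reproduced directly: the matcher receives the argument's source text, so the inner quotes survive into the string, and int() accepts "1" but not "'1'".

```python
# int() parses digits (with optional surrounding whitespace), but a
# string that still contains quote characters is not a valid integer.
assert int("1") == 1
assert int(" 1 ") == 1          # surrounding whitespace is fine

quoted_fails = False
try:
    int(" '1' ")                # quotes survive into the string -> ValueError
except ValueError:
    quoted_fails = True
assert quoted_fails

# A triple-quoted source fragment first has to be evaluated down to the
# inner literal before int() can parse it:
src = '"""1"""'
assert int(eval(src)) == 1      # eval yields "1", which int() accepts
```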


@@ -36,7 +36,8 @@ def compare(
obj = DownloadAPIBase("torch.hub.download_url_to_file")


def test_case_1():
# NOTE: Due to network limits, only test case 3.
Collaborator:

Is it slow download speed, or something else?

xuxinyi389 (Contributor, Author):

Yes, the download speed is very slow; the three cases can sometimes take more than 40 minutes.

Collaborator:

Can this be improved via NO_PROXY/PROXY settings? Is it currently becoming a bottleneck?

@@ -136,6 +142,9 @@ def test_case_7():
import torch
x = torch.tensor([[10, 2, 3], [3, 10, 5], [5, 6, 12.]])
y = torch.tensor([[4, 2, 9], [2, 0, 3], [2, 5, 3.]])
if torch.cuda.is_available():
Collaborator:

Is there a bug that makes copying to CUDA necessary here?

xuxinyi389 (Contributor, Author):

See https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/linalg/lstsq_cn.html . Among the return values, rank is the rank of the matrices in x, a Tensor of shape (*); it is computed only when driver is 'gelsy', 'gelsd', or 'gelss', and otherwise an empty Tensor is returned. For the driver input parameter: on CPU the legal values are 'gels', 'gelsy' (default), 'gelsd', 'gelss'; on CUDA the only legal value is 'gels' (default). So in GPU mode, when no driver is specified, the returned rank is empty. If no default device is set, torch falls back to CPU and therefore computes in CPU mode, where rank is computed; that produces a diff and the unit test fails.
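The driver/rank rule quoted from the docs can be captured in a small pure-Python sketch, which shows why the fixed test must only check rank when it is actually computed (function names are illustrative):

```python
# Sketch of the paddle.linalg.lstsq driver rules cited above:
# CPU default driver is "gelsy" (rank computed); on GPU only "gels"
# is legal (and default), and "gels" returns an empty rank.

def default_driver(device: str) -> str:
    return "gels" if device == "gpu" else "gelsy"

def rank_is_computed(driver: str) -> bool:
    return driver in ("gelsy", "gelsd", "gelss")

assert rank_is_computed(default_driver("cpu"))      # CPU: rank is available
assert not rank_is_computed(default_driver("gpu"))  # GPU: rank is empty
```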

# So kwargs["stream"] must be paddle.device.Stream, not paddle.base.core.CUDAStream.
API_TEMPLATE = textwrap.dedent(
"""
{}(stream = paddle.device.Stream(stream_base={}) if isinstance ({},(paddle.base.core.CUDAStream, paddle.base.core.CustomDeviceStream)) else {})
Collaborator:

It looks like torch.cuda.Stream was simply not converted to paddle.device.Stream; a proper paddle.device.Stream can always be passed to set_stream. Fix it at the source.

(screenshot: infoflow 2024-05-10 19-11-23)

xuxinyi389 (Contributor, Author):

The Paddle API corresponding to torch.cuda.Stream has been changed to paddle.device.Stream. The previous mapping, paddle.device.cuda.Stream, is an API slated for deprecation and had not been adapted for some usage scenarios, which caused the incompatibility in this conversion scenario.

zhwesky2010 (Collaborator) left a comment:

The CI does not seem to pass at the moment; also keep the generated strings as short as possible.

# case 5: device = 0 if cond else 1
kwargs[
"device"
] = f'"gpu:"+str({kwargs["device"]}) if isinstance({kwargs["device"]}, int) else str({kwargs["device"]}).replace("cuda", "gpu")'
Collaborator:

Try this formulation instead:
f'"gpu:{kwargs["device"]}" if isinstance({kwargs["device"]}, int) else "{kwargs["device"]}".replace("cuda", "gpu")'

xuxinyi389 (Contributor, Author):

For the case num = 2; torch.cuda.Stream(device=num), that formulation converts to paddle.device.Stream(device='gpu:num' if isinstance(num, int) else 'num'.replace('cuda', 'gpu'), priority=-1 + 2), which is wrong. It should convert to paddle.device.Stream(device='gpu:' + str(num) if isinstance(num, int) else str(num).replace('cuda', 'gpu'), priority=-1 + 2).
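The counter-example can be checked directly: the proposed f-string bakes the variable *name* "num" into string literals, while the accepted form evaluates num at runtime.

```python
# Evaluating the two generated expressions with num = 2 shows the
# difference: the first leaks the literal name into the device string.
num = 2

wrong = '"gpu:num" if isinstance(num, int) else "num".replace("cuda", "gpu")'
right = '"gpu:" + str(num) if isinstance(num, int) else str(num).replace("cuda", "gpu")'

assert eval(wrong) == "gpu:num"  # the name "num" leaks into the string
assert eval(right) == "gpu:2"    # the variable's value is used
```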


zhwesky2010 (Collaborator) commented May 22, 2024:

@xuxinyi389 All unit tests except test_median need to be fixed, on both CPU and GPU.

xuxinyi389 (Contributor, Author):
All unit tests are fixed, the proxy has been optimized, and the tests in tests/test_hub_download_url_to_file.py have been re-enabled.

zhwesky2010 (Collaborator):
@xuxinyi389 Later, please optimize the GPU CI time; it is currently too long. For example, installing PyTorch alone took more than 10 minutes.

@zhwesky2010 zhwesky2010 merged commit 834a688 into PaddlePaddle:master May 23, 2024
6 of 9 checks passed