I ran the oneDNN code on an i7-10700 @ 2.9 GHz with an HD 630 GPU. The reorder on the GPU is much slower than on the CPU; the data type is f32 in both cases.
Performance differences are expected on different platforms. One thing to note, though, is that oneDNN verbose mode has non-trivial performance overhead, in particular on GPUs, and cannot be reliably used to measure performance. You can use benchdnn in performance validation mode to get accurate performance measurements.
@vpirogov I timed the reorders of the convolution src and weights on the CPU and on the GPU, using code from the convolution.cpp primitive example. The CPU and GPU took 986 microseconds and 999604 microseconds respectively, so the GPU is many times slower than the CPU. Is there a way to improve the performance of the GPU reorder?
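One thing worth checking in a hand-rolled timing like this is whether the measured interval includes one-time costs (such as primitive creation or first-run setup) and whether the work has actually finished before the clock stops. The sketch below is a minimal, library-agnostic timing helper in standard C++. `time_avg_us` and its parameters are hypothetical names, not part of oneDNN, and the callable is assumed to block until the work is done. Whether one-time costs explain the gap reported here is an assumption, not something confirmed in this thread.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper (not a oneDNN API): runs `op` `warmup` times untimed,
// then returns the average wall-clock time in microseconds over `iters` runs.
// `op` is assumed to block until its work is finished (e.g. execute + wait).
double time_avg_us(const std::function<void()> &op,
                   std::size_t warmup, std::size_t iters) {
    // Untimed runs: any one-time setup cost is paid here, not in the average.
    for (std::size_t i = 0; i < warmup; ++i) op();

    if (iters == 0) return 0.0; // nothing to average

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i) op();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / static_cast<double>(iters);
}
```

With oneDNN, `op` would typically be a lambda that calls the reorder primitive's `execute` and then `wait()` on the stream; that wiring is left out so the block compiles without the library.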
@feixuedudiao First of all, it doesn't make sense to compare the performance of primitives on CPU and GPU without considering the GPU hardware capabilities and configurations.
For your case, if you insist on comparing them, please try the performance testing mode of benchdnn. Here is an example command line to check the 32x32x1x1 reorder:
I tested this command line on a new laptop with the latest Intel integrated GPU, and it shows that the GPU performs better than the CPU:
Avg. time on CPU: 0.00714332 ms
Avg. time on GPU: 0.00160363 ms
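As a quick worked comparison using only the numbers quoted in this thread: the benchdnn result favors the GPU, while the earlier hand-timed result favors the CPU by a much larger factor. The two measurements come from different machines and different methods, so the ratios are only indicative.

$$
\frac{t_{\text{CPU}}}{t_{\text{GPU}}}\bigg|_{\text{benchdnn}} = \frac{0.00714332\ \text{ms}}{0.00160363\ \text{ms}} \approx 4.45,
\qquad
\frac{t_{\text{GPU}}}{t_{\text{CPU}}}\bigg|_{\text{hand-timed}} = \frac{999604\ \mu\text{s}}{986\ \mu\text{s}} \approx 1014
$$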
Verbose log converter output for CPU:
prim_kind shapes ncalls time(ms) overall% agg_ncalls(avg) agg_time(ms) agg_overall%
reorder 192x192x1x1 5 0.19 30.82 5.00 0.19 30.82
reorder 256x256x1x1 2 0.15 24.73 3.50 0.35 55.55
reorder 256x192x1x1 1 0.05 8.10 2.67 0.40 63.65
reorder 48x48x3x3 2 0.04 7.20 2.50 0.44 70.85
reorder 64x64x3x3 1 0.04 6.62 2.20 0.48 77.48
reorder 32x32x3x3 3 0.03 4.84 2.33 0.51 82.32
reorder 192x96x1x1 1 0.03 4.49 2.14 0.54 86.80
reorder 96x96x1x1 2 0.03 4.45 2.12 0.57 91.25
reorder 32x32x1x1 9 0.01 1.51 2.89 0.58 92.77
reorder 192x1x1x3x3 6 0.01 1.38 3.20 0.59 94.15
reorder 64x256x1x1 2 0.01 1.37 3.09 0.59 95.51
reorder 256x1x1x3x3 2 0.00 0.72 3.00 0.60 96.24
reorder 32x4x3x3 1 0.00 0.66 2.85 0.60 96.90
reorder 48x192x1x1 1 0.00 0.50 2.71 0.61 97.40
reorder 48x64x1x1 2 0.00 0.45 2.67 0.61 97.85
reorder 48x48x1x1 2 0.00 0.37 2.63 0.61 98.22
reorder 96x1x1x3x3 3 0.00 0.34 2.65 0.61 98.55
reorder 32x1x1x3x3 3 0.00 0.32 2.67 0.61 98.87
reorder 64x64x1x1 1 0.00 0.29 2.58 0.62 99.16
reorder 32x48x1x1 2 0.00 0.27 2.55 0.62 99.44
reorder 96x32x1x1 1 0.00 0.24 2.48 0.62 99.68
reorder 32x96x1x1 1 0.00 0.18 2.41 0.62 99.86
reorder 1x32x3x3 1 0.00 0.14 2.35 0.62 100.00
Verbose log converter output for GPU:
prim_kind shapes ncalls time(ms) overall% agg_ncalls(avg) agg_time(ms) agg_overall%
reorder 32x32x1x1 9 3.57 16.12 9.00 3.57 16.12
reorder 32x32x3x3 3 1.60 7.22 6.00 5.17 23.34
reorder 192x192x1x1 5 1.42 6.41 5.67 6.60 29.75
reorder 192x1x1x3x3 6 1.37 6.17 5.75 7.96 35.91
reorder 32x1x1x3x3 3 1.26 5.70 5.20 9.23 41.61
reorder 96x1x1x3x3 3 1.17 5.26 4.83 10.39 46.87
reorder 32x48x1x1 2 0.97 4.37 4.43 11.36 51.24
reorder 64x256x1x1 2 0.91 4.09 4.12 12.27 55.33
reorder 48x64x1x1 2 0.87 3.94 3.89 13.14 59.27
reorder 256x256x1x1 2 0.87 3.90 3.70 14.01 63.17
reorder 256x1x1x3x3 2 0.86 3.88 3.55 14.87 67.05
reorder 48x48x3x3 2 0.86 3.88 3.42 15.73 70.93
reorder 48x48x1x1 2 0.81 3.66 3.31 16.54 74.59
reorder 48x192x1x1 1 0.75 3.37 3.14 17.29 77.96
reorder 32x4x3x3 1 0.71 3.20 3.00 18.00 81.16
reorder 96x96x1x1 2 0.70 3.17 2.94 18.70 84.33
reorder 96x32x1x1 1 0.58 2.62 2.82 19.28 86.95
reorder 192x96x1x1 1 0.57 2.56 2.72 19.85 89.50
reorder 1x32x3x3 1 0.55 2.47 2.63 20.39 91.98
reorder 64x64x3x3 1 0.51 2.32 2.55 20.91 94.30
reorder 32x96x1x1 1 0.47 2.11 2.48 21.38 96.41
reorder 256x192x1x1 1 0.42 1.90 2.41 21.80 98.31
reorder 64x64x1x1 1 0.37 1.69 2.35 22.17 100.00
init_cpu.log
init_gpu.log
Thanks, best wishes.
Fei.