Hello @zjin-lcf
I am investigating the performance impact of using the texture cache on NVIDIA GPUs in SYCL and CUDA. I noticed that some benchmarks (clink, convolution1D, convolutionsSeparable, page-rank, swish, all-pairs-distance) already have explicit ldg instructions added to them. How were these benchmarks selected? Did you test other benchmarks and find no performance benefit from using ldg there? If so, which other benchmarks did you test, and on which GPU architectures?
Yes, I tried adding "ldg" to a few SYCL programs that showed a performance drop on a V100 GPU. I did not notice performance benefits from "ldg" in other SYCL programs, but I have not evaluated it for all benchmarks. I will add "ldg" to other SYCL programs when they benefit from it. If compiler analysis can determine that a program would benefit from "ldg", please let me know. Thanks.