Minor improvements in Orochi 2.00 for Zen4 Raphael APU support and other suggestions.. #83

Open
oscarbg opened this issue Apr 5, 2024 · 0 comments

oscarbg commented Apr 5, 2024

Hi,
First, congrats on getting the Orochi 2.0 release out!
Some comments:

  1. I tested the new Orochi 2.0 release on a Zen4 Raphael iGPU and hit some errors.

1a) The first RadixSort test fails on Zen4 because number_of_blocks becomes 0, as the APU has only 2 compute units.
Fix (ParallelPrimitives/RadixSort.cpp):

258c258,259
< 		number_of_blocks = ( number_of_blocks / base ) * base;
---
> 		number_of_blocks = ( number_of_blocks / base ) * base;
> 		if( number_of_blocks == 0 ) number_of_blocks = 4;
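The rounding step and its zero guard can be sketched in isolation as follows. This is only an illustration under assumptions: `roundDownBlocks` and `checkRoundDownBlocks` are hypothetical names that do not exist in Orochi, the input values are made up, and only the fallback of 4 comes from the patch above.

```cpp
#include <cassert>

// Hypothetical helper, not part of Orochi: round `blocks` down to a multiple
// of `base` and fall back to `fallback` if the rounding would yield zero.
// `base` is assumed to be non-zero.
constexpr int roundDownBlocks( int blocks, int base, int fallback )
{
	const int rounded = ( blocks / base ) * base;
	return rounded == 0 ? fallback : rounded;
}

// Sanity checks with made-up numbers (not measured on real hardware).
inline void checkRoundDownBlocks()
{
	assert( roundDownBlocks( 2, 4, 4 ) == 4 );  // fewer blocks than base: fallback is used
	assert( roundDownBlocks( 10, 4, 4 ) == 8 ); // regular case: rounded down to a multiple
	assert( roundDownBlocks( 8, 4, 4 ) == 8 );  // already a multiple: unchanged
}
```

Whether 4 is the best fallback on every small APU is an open question; passing it as a parameter just makes that choice explicit.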

1b) The pciBusID check in VulkanComputeSimple is incorrect for APUs, as Vulkan reports 0 and ROCm 0x6b, or vice versa. I fixed it in main.cpp with:

< 			if (physicalDevicePCIBusInfoProperties.pciDomain == props.pciDomainID &&
< 				physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID &&
< 				physicalDevicePCIBusInfoProperties.pciDevice == props.pciDeviceID) {
< 				physicalDeviceIndex = i;
< 				break;
---
> 			if( physicalDevicePCIBusInfoProperties.pciDomain == props.pciDomainID &&
> 				// physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID &&
> 				physicalDevicePCIBusInfoProperties.pciDevice == props.pciDeviceID )
> 			{
> 				if( physicalDevicePCIBusInfoProperties.pciBus > 16 ) // integrated!!
> 				{
> 					if( props.pciBusID == 0 )
> 					{
> 						physicalDeviceIndex = i;
> 						break;
> 					}
> 				}
> 				else
> 				{
> 					if( physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID )
> 					{
> 						physicalDeviceIndex = i;
> 						break;
> 					}
> 				}
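The matching rule could also be pulled out into a small predicate that is testable without Vulkan or HIP. The sketch below makes several assumptions: `PciAddress`, `isSameGpu` and `kIntegratedBusThreshold` are hypothetical names, and the plain struct stands in for the real Vulkan and Orochi property structs. It also differs from the patch above in one respect: it tries the exact bus comparison first, so a discrete GPU on a bus number above 16 would still match. That reordering is an assumption of this sketch, not part of the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for the PCI fields of the Vulkan and Orochi device properties.
struct PciAddress
{
	std::uint32_t domain;
	std::uint32_t bus;
	std::uint32_t device;
};

// Heuristic threshold taken from the patch above ("pciBus > 16" means integrated).
constexpr std::uint32_t kIntegratedBusThreshold = 16;

// Returns true if the Vulkan device `vk` and the Orochi device `oro` are
// considered the same GPU.
inline bool isSameGpu( const PciAddress& vk, const PciAddress& oro )
{
	if( vk.domain != oro.domain || vk.device != oro.device ) return false;

	// Exact match first (assumption of this sketch, see lead-in).
	if( vk.bus == oro.bus ) return true;

	// APU special case from the patch: Vulkan reports a high bus number
	// while the other API reports bus 0.
	return vk.bus > kIntegratedBusThreshold && oro.bus == 0;
}
```

For example, `isSameGpu( { 0, 0x6b, 0 }, { 0, 0, 0 } )` accepts the APU case described above, while two devices on different low bus numbers are still rejected.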
  2. Can you make the samples run correctly by default after a premake5 build?
    I mean running the executables directly from the Orochi-2.00/dist/bin/Debug or Orochi-2.00/dist/bin/Release folders.
    They seem to expect kernels and sample data such as textures relative to the "../" folder, so currently you basically have two options:

2a) copy or move the Debug or Release folder into the Orochi-2.0/ folder (out of dist/bin) so the samples find the required data, or

2b) recreate or copy the needed files from the Orochi-2.0 folder into Orochi-2.00/dist/bin, for example copying:

Orochi-2.00/Test/Texture/texture_test_kernel.hpp
to:
Orochi-2.00/dist/bin/Test/Texture/texture_test_kernel.hpp
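One possible way to avoid both workarounds would be for the samples to search a few parent directories for their data instead of assuming a fixed "../". The following is a minimal C++17 sketch of that idea, not how Orochi currently resolves paths; `findUpwards` and its `maxLevels` parameter are hypothetical.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

// Hypothetical helper, not part of Orochi: starting at `start`, look for
// `relative` in that directory and then in up to `maxLevels` parent
// directories. Returns the first existing candidate, or std::nullopt.
inline std::optional<std::filesystem::path> findUpwards(
	const std::filesystem::path& start,
	const std::filesystem::path& relative,
	int maxLevels = 4 )
{
	namespace fs = std::filesystem;
	fs::path dir = fs::absolute( start );
	for( int level = 0; level <= maxLevels; ++level )
	{
		const fs::path candidate = dir / relative;
		std::error_code ec; // non-throwing overload: errors count as "not found"
		if( fs::exists( candidate, ec ) ) return candidate;

		const fs::path parent = dir.parent_path();
		if( parent.empty() || parent == dir ) break; // reached the filesystem root
		dir = parent;
	}
	return std::nullopt;
}
```

Called as `findUpwards( std::filesystem::current_path(), "Test/Texture/texture_test_kernel.hpp" )` from dist/bin/Debug, this would be expected to locate the file in the repository root three levels up, assuming the layout described above.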

3) Add Linux support to the VulkanComputeSimple and RadixSort tests (Orochi-2.00/Test/VulkanComputeSimple and Orochi-2.00/Test/RadixSort).
In fact the support is already there; it only needs the minor fixes I added, which are very simple.
For the Vulkan premake5.lua:

--    buildoptions { "/wd4244" }
      buildoptions { "--std=c++14" }
--      links{ "Pop" }
--      links{ "kernel32", "user32", "gdi32", "winspool", "comdlg32", "advapi32", "shell32", "ole32", "oleaut32", "uuid", "odbc32", "odbccp32", "version" }

  4. Finally, it would be nice if the WMMA sample gained NVIDIA support; it seems to be AMD-only at the moment.
    I started by adding "#ifdef __CUDA_ARCH__" code to the kernel:
#ifdef __CUDA_ARCH__
#include <mma.h>
//https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/
using namespace nvcuda;
...
	wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::col_major> a_frag2;
	wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag2;
	wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag2;
	wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag2;
..
#ifdef __CUDA_ARCH__
	nvcuda::wmma::mma_sync( c_frag2, a_frag2, b_frag2, c_frag2 );
#else
	c_frag = __builtin_amdgcn_wmma_f16_16x16x16_f16_w32( a_frag, b_frag, c_frag, false );
#endif

#endif
