
support for processing strings/byte arrays within the gpu #25

Open
omac777 opened this issue Oct 25, 2019 · 8 comments


@omac777

omac777 commented Oct 25, 2019

Are there any plans to provide string or byte array handling capability from within emu kernels?
I believe it would be feasible if there was more support for integer types within emu.
I understand both cuda/opencl provide integer support within kernels.

Thank you for listening.

@calebwin
Owner

The main thing stopping string handling is supporting bytes/ints/chars, right? This is blocked right now until we figure out better type inference.

I'm not sure about regex'ing, but slicing will probably never be supported in the Rust subset because it is practically impossible on GPUs. GPUs have different sections of memory - "global", "local", and "private". "global" is the most expensive to allocate and to move data into and out of. "private" is the cheapest but consists only of registers. Registers are like slots that can only hold primitive types like int or float. So a slice would have to be placed in "global" memory, which would be really, really inefficient.

And I don't think slices are really necessary. Once you have a slice you want to do one of two things.

  • Index into the slice
  • Iterate over the slice

The first is already possible (you can index directly into the data you are trying to take a slice of), and once we support for loops inside the "kernel"/for-loop body, the second will be possible too.
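To make the first point concrete, here is a rough sketch of operating on what would otherwise be a slice by indexing into the loaded vector directly. It reuses the gpu_do!(load(...))/gpu_do!(launch()) pattern used elsewhere in this thread; treating an arbitrary index range like 200..300 this way is an assumption for illustration, not something Emu guarantees today.

let mut x = vec![0.0; 1000];
gpu_do!(load(x));
gpu_do!(launch());
for i in 200..300 {
    // operate on the region you would otherwise take as the slice x[200..300]
    x[i] = x[i] * 2.0;
}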

Sorting could be implemented by hand as a parallel bubble sort once we have support for if statements, the modulo operator, and variables (to add that support we need to modify this traversing code and make sure type safety is not broken). Sorting like this could also maybe be implemented at some point:

let mut x = vec![0.0; 1000];
// ...
// ...store random numbers in x...
// ...
gpu_do!(load(x));
gpu_do!(launch());
x.sort();
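For reference, a hand-written parallel bubble sort would essentially be repeated odd-even transposition passes over the loaded data. The loop body below is only a sketch of one such pass, written against the not-yet-supported features listed above (variables, if, %); phase is a hypothetical value that alternates between 0 and 1 across passes.

for i in 0..999 {
    if i % 2 == phase {
        // compare-and-swap adjacent elements; needs variables and if support
        let a = x[i];
        let b = x[i + 1];
        if a > b {
            x[i] = b;
            x[i + 1] = a;
        }
    }
}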

@calebwin
Owner

calebwin commented Oct 25, 2019

You can read the linked comment above about figuring out type inference. But basically, the challenge is that OpenACC, which does what Emu does but for C/C++, gets code like this.

int z = x + y;

And they know the type is int, so they can produce the OpenCL int z = x + y; and maintain type safety.

But we have Rust code like this.

let z = x + y;

And somehow, we need to figure out that this z is an int.
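In plain Rust, rustc resolves this from surrounding context or from an explicit annotation, and that is exactly the information the procedural macro does not get to see, since it only works on tokens. A minimal illustration (the annotation is just one way a user could make the type obvious):

let x: i32 = 1;
let y: i32 = 2;
let z = x + y;      // rustc infers i32 from x and y; the macro only sees the syntax
let w: i32 = x + y; // an explicit annotation would make the OpenCL type trivial to emit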

@calebwin
Owner

calebwin commented Oct 25, 2019

Wait, actually, sorting shouldn't be built in. It should be defined in some separate crate for GPU-accelerated sorting.

let mut x = vec![0.0; 1000];
// ...
// ...store random numbers in x...
// ...
gpu_do!(load(x));
x = sorting::sort(x);

Regex'ing and slicing also won't be built in. All of these should be implemented manually. However, for these to be implementable, the above things (variables, if/else, type inference, etc.) do still need to be supported.

@omac777
Author

omac777 commented Oct 28, 2019

I did read a bit more into the "CUDA C PROGRAMMING GUIDE PG-02829-001_v10.1 | August 2019".

In theory, the emu vectors could contain any of these types (the number after each type is its alignment requirement in bytes, from the CUDA guide's built-in vector types table; a possible Rust mapping is sketched after the list):

char1, uchar1: 1
char2, uchar2: 2
char3, uchar3: 1
char4, uchar4: 4
short1, ushort1: 2
short2, ushort2: 4
short3, ushort3: 2
short4, ushort4: 8
int1, uint1: 4
int2, uint2: 8
int3, uint3: 4
int4, uint4: 16
long1, ulong1: 4 if sizeof(long) is equal to sizeof(int), 8 otherwise
long2, ulong2: 8 if sizeof(long) is equal to sizeof(int), 16 otherwise
long3, ulong3: 4 if sizeof(long) is equal to sizeof(int), 8 otherwise
long4, ulong4: 16
longlong1, ulonglong1: 8
longlong2, ulonglong2: 16
longlong3, ulonglong3: 8
longlong4, ulonglong4: 16
float1: 4
float2: 8
float3: 4
float4: 16
double1: 8
double2: 16
double3: 8
double4: 16
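A possible mapping of the element types above onto Rust primitives, purely as a sketch of what Emu vectors might eventually need to carry (the enum and its name are hypothetical, not part of Emu):

#[allow(dead_code)]
enum KernelScalar {
    U8(u8), I8(i8),     // uchar / char
    U16(u16), I16(i16), // ushort / short
    U32(u32), I32(i32), // uint / int
    U64(u64), I64(i64), // ulonglong / longlong
    F32(f32),           // float
    F64(f64),           // double
}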

The "if" conditional is supported within cuda kernels.
It's also supported within OpenACC.
https://www.openacc.org/sites/default/files/inline-files/API%20Guide%202.7.pdf

Although outside the scope of Emu, it could be interesting to see support for GPUDirect RDMA within Emu as well:
https://www.sc-asia.org/2018/wp-content/uploads/2018/03/1_1500_Ido_Shamay.pdf
https://www.mellanox.com/related-docs/prod_software/RDMA_Aware_Programming_user_manual.pdf

@omac777
Author

omac777 commented Oct 28, 2019

Wait, actually, sorting shouldn't be built in. It should be defined in some separate crate for GPU-accelerated sorting.

let mut x = vec![0.0; 1000];
// ...
// ...store random numbers in x...
// ...
gpu_do!(load(x));
x = sorting::sort(x);

Regex'ing and slicing also won't be built in. All of these should be implemented manually. However, for these to be implementable, the above things (variables, if/else, type inference, etc.) do still need to be supported.

Actually, my intent was not to mutate the input request vector itself. I would pass along a second response vector, structured differently but with a similar element type, something like the 8-bit unsigned integer "u8" (a byte), which is what you would find in a typical memory location or file. If all goes well, the response reference passed in is a direct mapping to an intended response file, which could be local or remote.

@calebwin
Owner

In theory, the emu vectors could contain any of these types (alignment in bytes):

char1, uchar1: 1
char2, uchar2: 2
char3, uchar3: 1
char4, uchar4: 4
short1, ushort1: 2
short2, ushort2: 4
short3, ushort3: 2
short4, ushort4: 8
int1, uint1: 4
int2, uint2: 8
int3, uint3: 4
int4, uint4: 16
long1, ulong1: 4 if sizeof(long) is equal to sizeof(int), 8 otherwise
long2, ulong2: 8 if sizeof(long) is equal to sizeof(int), 16 otherwise
long3, ulong3: 4 if sizeof(long) is equal to sizeof(int), 8 otherwise
long4, ulong4: 16
longlong1, ulonglong1: 8
longlong2, ulonglong2: 16
longlong3, ulonglong3: 8
longlong4, ulonglong4: 16
float1: 4
float2: 8
float3: 4
float4: 16
double1: 8
double2: 16
double3: 8
double4: 16

Yes. While f32 is what GPUs are optimized for, other primitive types can have support added for them easily. The reason I haven't just gone ahead and added them is that I'm trying to think carefully about types and type safety.

The "if" conditional is supported within cuda kernels.
It's also supported within OpenACC.
https://www.openacc.org/sites/default/files/inline-files/API%20Guide%202.7.pdf

I also haven't added if statements because that would require adding bool to the type system. And I'm not entirely convinced that just adding these types can be done without breaking the type safety guarantee. I'm certain there is a way to do it; I just don't know if the "easy way" is the right way, or if there is a harder way that would guarantee type safety with even more certainty.

Actually, my intent was not to mutate the input request vector itself. I would pass along a second response vector, structured differently but with a similar element type, something like the 8-bit unsigned integer "u8" (a byte), which is what you would find in a typical memory location or file. If all goes well, the response reference passed in is a direct mapping to an intended response file, which could be local or remote.

You can create a separate vector and mutate that instead. Emu lets you do that. The only big complication is adding the u8 type. Again, it's probably safe to add, but I'm not yet convinced you can do it easily.
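A rough sketch of that request/response pattern, assuming hypothetical u8 support and assuming gpu_do!(load(...)) can be used for more than one vector; the file names and the per-byte operation are placeholders:

let request: Vec<u8> = std::fs::read("request.bin").unwrap(); // bytes from a local file
let mut response: Vec<u8> = vec![0u8; request.len()];         // separate output vector
gpu_do!(load(request));
gpu_do!(load(response));
gpu_do!(launch());
for i in 0..response.len() {
    response[i] = request[i]; // placeholder per-byte transformation
}
std::fs::write("response.bin", &response).unwrap(); // could just as well go to a remote target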

@McSpidey

Is there any way to do value clamping without supporting if or bool?

@calebwin
Owner

calebwin commented Jan 18, 2020

Not at the moment. I had plans for a rewrite system that would replace expressions with appropriate builtin functions in OpenCL (so an if statement would be replaced with a clamp). Nothing has materialized yet, unfortunately.
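For what it's worth, the idea was to recognize a pattern like the if/else below in the loop body and emit OpenCL's built-in clamp for it during translation; this is only a sketch of the pattern, not an existing Emu feature.

for i in 0..1000 {
    // written with if/else on the Rust side...
    if x[i] > 1.0 {
        x[i] = 1.0;
    } else if x[i] < 0.0 {
        x[i] = 0.0;
    }
    // ...but a rewrite pass could replace it with the OpenCL builtin:
    // x[i] = clamp(x[i], 0.0f, 1.0f);
}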
