
Seeking advice on implementing certain operators on GPU #446

Open
zjzjwang opened this issue Apr 17, 2024 · 0 comments

zjzjwang commented Apr 17, 2024

I am considering implementing some of bottleneck's operators on the GPU using libraries such as PyTorch and CuPy, and perhaps CUDA or Triton.

Specifically, for the "move" series of operators, PyTorch on the GPU can significantly accelerate computation at large data sizes. (I implemented the sliding window using unfold; a sketch follows.)
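For reference, here is a minimal sketch of the unfold approach for one member of the move family (move_mean, chosen just for illustration). It assumes a floating-point tensor, computes along the last axis, and NaN-pads the first window-1 positions to match bottleneck's default min_count:

```python
import torch

def move_mean(a: torch.Tensor, window: int) -> torch.Tensor:
    """Moving mean over the last axis; the first window-1 slots are NaN,
    mirroring bottleneck.move_mean with the default min_count."""
    # unfold yields a strided view of shape (..., n - window + 1, window)
    windows = a.unfold(-1, window, 1)
    out = windows.mean(dim=-1)
    # pad incomplete windows with NaN (assumes a floating dtype)
    pad = torch.full((*a.shape[:-1], window - 1), float("nan"),
                     dtype=a.dtype, device=a.device)
    return torch.cat([pad, out], dim=-1)

x = torch.randn(1_000_000, device="cuda" if torch.cuda.is_available() else "cpu")
print(move_mean(x, 5)[:10])
```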

However, I've run into difficulties implementing the rankdata, nanrankdata, and push operators. Performance is not as good as expected (in fact, much slower than bottleneck's implementation), and I suspect the Python-level for-loops in my implementations are causing the slowdown. (A loop-free sketch of what I have in mind follows.)
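For concreteness, a minimal loop-free sketch of push (a running maximum over valid indices followed by gather) and an ordinal rankdata (a double argsort). Note the assumptions: this push ignores bottleneck's n argument, and the ranking assigns ordinal ranks rather than averaging ties the way bottleneck.rankdata does:

```python
import torch

def push(a: torch.Tensor) -> torch.Tensor:
    """Forward-fill NaNs along the last axis (no n limit, unlike bottleneck.push)."""
    idx = torch.arange(a.shape[-1], device=a.device).expand(a.shape)
    # zero out the index wherever the value is NaN, then take a running max,
    # so each position holds the index of the most recent valid element
    idx = torch.where(torch.isnan(a), torch.zeros_like(idx), idx)
    idx = torch.cummax(idx, dim=-1).values
    # leading NaNs gather index 0, which is itself NaN, so they stay NaN
    return torch.gather(a, -1, idx)

def rankdata_ordinal(a: torch.Tensor) -> torch.Tensor:
    """Ordinal ranks via a double argsort; bottleneck.rankdata averages ties."""
    return a.argsort(dim=-1).argsort(dim=-1) + 1

x = torch.tensor([float("nan"), 1.0, float("nan"), 3.0])
print(push(x))                                    # tensor([nan, 1., 1., 3.])
print(rankdata_ordinal(torch.tensor([3., 1., 2.])))  # tensor([3, 1, 2])
```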

Do you have any suggestions or recommendations on how to efficiently implement these operators on the GPU?

