Replies: 2 comments 6 replies
-
Further, if I implement certain functions as Spark native methods that are optimized through Spark, can those be offloaded to GPUs for computation for further performance enhancements?
-
We have ways to enable and disable classes of operations so they do or do not go on the GPU, but we don't have a way to enable/disable them for specific stages in a plan. The stages of a plan show up mostly during physical planning, and with AQE enabled those stages can even change while the plan is running. Because of that there is no good way to address or tag a specific stage of processing. We don't even have a good way to tag a specific operation to run on the GPU or not. I would love to have the ability to do what you are asking, but Spark just does not currently have the tools for it. If you could give some specific examples of what you are trying to do or understand, we can work with you on getting your questions answered.

Also, I don't know what you mean by "Spark native method". Are you referring to processing using an RDD directly? Are you talking about using the Dataset APIs instead of the DataFrame APIs? Currently we only support DataFrame. The Dataset APIs usually involve a lot of reflection and would require some form of introspection into the JVM byte code to have any hope of supporting them. We have thought about ways to try to support it, but it is rather complicated and has just not been the highest priority.
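To make the first point concrete, here is a rough sketch of how enabling/disabling classes of operations on the GPU looks in practice with the spark-rapids plugin. This is a hedged example: the exact config keys available depend on your plugin version, and `FilterExec`/`Abs` are just illustrative choices of an exec and an expression; check the plugin's configuration docs for the full list.

```shell
# Sketch: launching with the RAPIDS Accelerator and turning specific
# operation classes off. The pattern is per-exec and per-expression
# configs, NOT per-stage -- there is no config that targets a stage.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.rapids.sql.exec.FilterExec=false \
  --conf spark.rapids.sql.expression.Abs=false \
  --conf spark.rapids.sql.explain=NOT_ON_GPU \
  your_app.py
```

The `spark.rapids.sql.explain` setting is useful here: it logs why pieces of the plan did or did not end up on the GPU, which is the closest current tool for understanding the CPU/GPU placement of your pipeline.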
-
Suppose I have a certain number of stages in a Spark pipeline. How can I ensure that those stages will run on GPUs while the other ones will not? This would help me fine-tune the data movement between CPU and GPU so it happens only when I need to use the GPUs.