
Support Rust UDF #993

Open
yongda-fan opened this issue Mar 12, 2024 · 0 comments
Labels
enhancement New feature or request


yongda-fan commented Mar 12, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Ballista currently does not support Rust UDFs, which makes it hard to process data with custom functions or external libraries.

Describe the solution you'd like

There are two possible solutions:

Register the UDF directly

Load a Rust dynamic library into the executor (similar to apache/datafusion#1881; partial code already exists at https://github.com/apache/arrow-ballista/tree/main/ballista/core/src/plugin) and register the UDF directly with DataFusion.

Issues:

  1. Rust has never guaranteed a stable ABI (i.e. memory layout), so the fields of UDF-related types in the plugin may be interpreted incorrectly, e.g. ColumnarValue or Signature.
  2. In practice, the same rustc version with the same optimization level yields the same ABI (i.e. the same memory layout for a type). This suggests the plugin must be compiled with exactly the same rustc version and compiler flags as the executor.
  3. Alternatively, we could use a stable-ABI library such as abi_stable or stabby and mark all UDF-related types with it.
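A minimal sketch of the stable-ABI idea in point 3, done by hand with only the standard library rather than with abi_stable or stabby. Everything that crosses the plugin boundary is #[repr(C)] and every function pointer is extern "C", so, unlike default-representation types such as ColumnarValue or Signature, the layout does not depend on the rustc version or flags. UdfVTable, add_one, udf_plugin_entry and call_udf are hypothetical names, and the plugin and executor sides live in one file here only for illustration; a real plugin would be a separate cdylib loaded at runtime.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical UDF descriptor a plugin could hand to the executor.
/// #[repr(C)] fixes field order and layout independently of the rustc version.
#[repr(C)]
pub struct UdfVTable {
    /// NUL-terminated UDF name with static lifetime, owned by the plugin.
    pub name: *const c_char,
    /// Evaluates the UDF over `len` i64 inputs, writing `len` outputs.
    pub invoke: unsafe extern "C" fn(input: *const i64, output: *mut i64, len: usize),
}

// ---- Plugin side (would be compiled into the cdylib) ----

/// Example UDF: adds one to every input value.
unsafe extern "C" fn add_one(input: *const i64, output: *mut i64, len: usize) {
    // SAFETY: the caller guarantees both pointers reference `len` valid elements.
    let (inp, out) = unsafe {
        (
            std::slice::from_raw_parts(input, len),
            std::slice::from_raw_parts_mut(output, len),
        )
    };
    for (o, i) in out.iter_mut().zip(inp) {
        *o = i + 1;
    }
}

/// Entry point the executor would look up by symbol name. A real cdylib would
/// also mark it #[no_mangle] (#[unsafe(no_mangle)] in edition 2024).
pub extern "C" fn udf_plugin_entry() -> UdfVTable {
    UdfVTable {
        name: b"add_one\0".as_ptr().cast::<c_char>(),
        invoke: add_one,
    }
}

// ---- Executor side: relies only on the C-layout types above ----

pub fn call_udf(vtable: &UdfVTable, input: &[i64]) -> (String, Vec<i64>) {
    // SAFETY: the plugin promises `name` is a valid, static, NUL-terminated string.
    let name = unsafe { CStr::from_ptr(vtable.name) }
        .to_string_lossy()
        .into_owned();
    let mut output = vec![0i64; input.len()];
    // SAFETY: both buffers hold exactly `input.len()` elements.
    unsafe { (vtable.invoke)(input.as_ptr(), output.as_mut_ptr(), input.len()) };
    (name, output)
}

fn main() {
    let vtable = udf_plugin_entry();
    let (name, out) = call_udf(&vtable, &[1, 2, 3]);
    println!("{name}: {out:?}"); // add_one: [2, 3, 4]
}
```

The trade-off is visible even in this toy: the boundary only carries C-compatible types, so any richer DataFusion type would have to be wrapped or re-described in the same way, which is the work abi_stable or stabby aim to automate.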

Reconstruct the UDF from a function that takes Arrow data as parameters and returns Arrow data

We can only load functions that take Arrow data as parameters and return Arrow data, since Arrow's FFI structures have a stable memory layout (e.g. https://arrow.apache.org/rust/arrow/ffi/struct.FFI_ArrowArray.html). The signature could be passed as a serialized string or a similar encoding.

This is similar to https://github.com/apache/arrow-datafusion-python/blob/main/src/udf.rs.
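A minimal sketch of passing the signature as a serialized string, assuming a hypothetical text format name(arg, ...) -> ret and a small hypothetical subset of Arrow types. ArgType and UdfSignature are illustrative names, not part of Ballista or DataFusion; the executor would still have to map the parsed result onto a real DataFusion Signature, while the data itself travels through the Arrow FFI structs.

```rust
use std::str::FromStr;

/// Hypothetical, simplified subset of Arrow types a plugin may declare.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArgType {
    Int64,
    Float64,
    Utf8,
}

impl FromStr for ArgType {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "int64" => Ok(ArgType::Int64),
            "float64" => Ok(ArgType::Float64),
            "utf8" => Ok(ArgType::Utf8),
            other => Err(format!("unsupported type: {other}")),
        }
    }
}

/// Parsed form of a serialized signature such as "my_udf(int64, float64) -> float64".
#[derive(Debug, PartialEq, Eq)]
pub struct UdfSignature {
    pub name: String,
    pub args: Vec<ArgType>,
    pub ret: ArgType,
}

impl FromStr for UdfSignature {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split "name(args)" from the return type.
        let (head, ret) = s.split_once("->").ok_or("missing '->'")?;
        let (name, rest) = head.split_once('(').ok_or("missing '('")?;
        let args_str = rest.trim().strip_suffix(')').ok_or("missing ')'")?;
        // An empty argument list means a zero-argument UDF.
        let args = if args_str.trim().is_empty() {
            Vec::new()
        } else {
            args_str
                .split(',')
                .map(ArgType::from_str)
                .collect::<Result<Vec<ArgType>, String>>()?
        };
        Ok(UdfSignature {
            name: name.trim().to_string(),
            args,
            ret: ret.parse()?,
        })
    }
}

fn main() {
    let sig: UdfSignature = "my_udf(int64, float64) -> float64".parse().unwrap();
    println!("{sig:?}");
}
```

A plain string keeps the plugin boundary trivial (a single NUL-terminated char pointer), at the cost of inventing and versioning a small format; a serialized protobuf or JSON schema would be a heavier but more extensible alternative.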

Issues:

  1. We lose much of the flexibility that Rust's ScalarUDFImpl provides, such as changing the signature based on argument values, or providing a specialized code path for ColumnarValue::Scalar.
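To illustrate the scalar fast path that would be lost, here is a sketch using a hypothetical, simplified Value enum standing in for DataFusion's ColumnarValue (the real type wraps Arrow arrays and ScalarValue). It assumes the Arrow-only FFI function always receives full-length arrays: a native UDF can match on the scalar case and compute once, whereas the FFI-style function must first expand the scalar into one copy per row.

```rust
/// Hypothetical, simplified stand-in for DataFusion's ColumnarValue.
#[derive(Debug, PartialEq)]
pub enum Value {
    Array(Vec<f64>),
    Scalar(f64),
}

/// Native-Rust style UDF: can take a cheap path when the argument is a scalar.
pub fn sqrt_native(arg: &Value) -> Value {
    match arg {
        // Computed once; the result stays a scalar.
        Value::Scalar(x) => Value::Scalar(x.sqrt()),
        Value::Array(xs) => Value::Array(xs.iter().map(|x| x.sqrt()).collect()),
    }
}

/// Arrow-only FFI style UDF: it only sees arrays, so a scalar argument is
/// first expanded to `num_rows` copies and the function runs once per row.
pub fn sqrt_ffi_style(arg: &Value, num_rows: usize) -> Vec<f64> {
    let array = match arg {
        Value::Scalar(x) => vec![*x; num_rows],
        Value::Array(xs) => xs.clone(),
    };
    array.iter().map(|x| x.sqrt()).collect()
}

fn main() {
    println!("{:?}", sqrt_native(&Value::Scalar(4.0))); // Scalar(2.0)
    println!("{:?}", sqrt_ffi_style(&Value::Scalar(4.0), 3)); // [2.0, 2.0, 2.0]
}
```

For cheap functions the difference is small, but for expensive per-value work (e.g. compiling a regex pattern passed as a literal) the native scalar path avoids repeating it for every row.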

Describe alternatives you've considered

Alternatively, we could build and deploy a custom Ballista executor each time we want to add or modify a UDF.

Additional context

@yongda-fan yongda-fan added the enhancement New feature or request label Mar 12, 2024