
Bring your own Spark model #2080

Open
brandongreenwell-8451 opened this issue Sep 21, 2023 · 3 comments
@brandongreenwell-8451

Is your feature request related to a problem? Please describe.
I'm wondering if it would be possible to "bring your own" Spark model for use with the interpretability functions, like ICETransformer(). For instance, we have several tools that let us run inference on Spark data frames by calling a .predict() or .transform() method. Is it possible to wrap such a non-PySpark MLlib model so that we could still use this package for generating explanations and ICE plots in Spark?

Describe the solution you'd like
Suppose I have a custom model (e.g., some kind of scoring code that operates on Spark data frames by adding prediction columns). I'd like to wrap that model in a way that lets me pass it to ICETransformer(). E.g.,

pdp = ICETransformer(
    model=custom_model,  # custom_model.transform(spark_data_frame) 
    targetCol="prediction_col_name",
    kind="average",
    targetClasses=[1],
    categoricalFeatures=categorical_features,
    numericFeatures=numeric_features,
)

where custom_model has a .transform() method, similar to PySpark MLlib models, that returns the input data frame with additional prediction columns.
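To make the request concrete, here is a minimal sketch of the kind of wrapper I have in mind. The class name CustomModelAdapter and the score_method parameter are purely illustrative, not SynapseML API, and whether SynapseML's explainers would accept such an object is exactly the question of this issue:

```python
# Hypothetical sketch only: a thin duck-typed adapter that exposes the
# .transform(df) interface ICETransformer expects, delegating to whatever
# scoring method the wrapped model actually provides (e.g. .predict()).
class CustomModelAdapter:
    """Adapts any object with a DataFrame-in/DataFrame-out scoring
    method to a Transformer-style .transform() interface."""

    def __init__(self, model, score_method="predict"):
        self._model = model
        self._score_method = score_method

    def transform(self, df):
        # Delegate scoring; the wrapped method is expected to return the
        # input data frame with additional prediction columns appended.
        return getattr(self._model, self._score_method)(df)
```

Note this adapter lives entirely on the Python side; it would not by itself make the model visible to any JVM-side code.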

@github-actions

Hey @brandongreenwell-8451 👋!
Thank you so much for reporting the issue/feature request 🚨.
Someone from SynapseML Team will be looking to triage this issue soon.
We appreciate your patience.

@memoryz
Contributor

memoryz commented Oct 30, 2023

@brandongreenwell-8451 sorry for the delayed response - I just saw this question.

I assume you have a custom model that is not implemented as a Spark Transformer object. If that's the case, I'm afraid the current implementation of the explainers does not support such a scenario. The explainer logic is implemented entirely in Scala, and I don't see a way to bring a Python model to the JVM side for interpretation.

@brandongreenwell-8451
Author

brandongreenwell-8451 commented Oct 31, 2023

Hi @memoryz, thanks for the reply, and that makes sense. I was asking more about models that do operate on Spark data frames via a .predict() or .transform() method. For example, DataRobot and H2O both provide scoring code that can be used to make predictions on Spark data frames (e.g., in Python/PySpark). Is it possible to create a wrapper of some sort that would let us use it with some of SynapseML's RAI functions, like ICE curves?

Example code: https://datarobot.github.io/datarobot-predict/1.5/scoring_code_spark/
