Feature: Support for Ibis expressions #659

Open
cpcloud opened this issue Jan 26, 2023 · 9 comments

@cpcloud

cpcloud commented Jan 26, 2023

Hi! 👋🏻

I'm the lead developer of the Ibis project.

I recently did a write-up of some analysis of Ibis's CI data using Ibis and plotnine.

I think plotnine is just the bee's knees, and I think ibis and plotnine could be even better together than they are right now.

What would it take to support ibis expressions? I am happy to take this all the way through myself, PR-ing to this repo as well as making any fixes to ibis that might be needed, but I want to check in to make sure this work would have a chance at getting in!

Here's an example from the above notebook where it would be great if I could pass in my ibis objects without having to first turn them into pandas DataFrames:

    from plotnine import aes, ggplot

    t = ...  # an ibis table expression
    df = t.execute()
    ggplot(
        df.loc[df.entity == "job"].reset_index(drop=True),
        aes(x="started_date", y="duration", color="factor(improvements)"),
    )
    ...  # more plotting code follows

Instead, I'd like to write the following:

    from ibis import _
    from plotnine import aes, ggplot

    t = ...  # an ibis table expression
    # _ is a placeholder that means "the child table"; inside aes it refers to the t.filter(...) call
    ggplot(
        t.filter(_.entity == "job"),
        aes(x=_.started_date, y=_.duration, color="factor(improvements)"),
    )
    ...  # more plotting code follows

@cpcloud
Author

cpcloud commented Jan 26, 2023

After perusing the repo, it looks like ibis might be able to implement a to_pandas() method (https://github.com/has2k1/plotnine/blob/main/plotnine/utils.py#L1036)
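The idea is duck typing: plotnine would not need to import ibis, only check whether the object it receives exposes a `to_pandas()` method. Below is a minimal, dependency-free sketch of such a check. It is not plotnine's actual code at the linked line; `ensure_pandas`, `SupportsToPandas` and `FakeIbisTable` are hypothetical names, and a plain dict stands in for a `pd.DataFrame` so the example runs without pandas.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsToPandas(Protocol):
    """Anything with a zero-argument to_pandas() method (hypothetical protocol)."""

    def to_pandas(self) -> Any: ...


def ensure_pandas(data: Any, frame_type: type = dict) -> Any:
    """Hypothetical EnsurePandas step; frame_type stands in for pd.DataFrame."""
    if isinstance(data, frame_type):
        return data  # already the native frame type: use it as-is
    if isinstance(data, SupportsToPandas):
        return data.to_pandas()  # duck-typed conversion, no ibis import needed
    raise TypeError(f"Cannot convert {type(data).__name__} to a frame")


class FakeIbisTable:
    """Stand-in for an ibis table expression; only materializes when converted."""

    def __init__(self, rows: list) -> None:
        self._rows = rows

    def to_pandas(self) -> dict:
        # A real ibis table would execute its query here.
        columns: dict = {}
        for row in self._rows:
            for key, value in row.items():
                columns.setdefault(key, []).append(value)
        return columns


frame = ensure_pandas(FakeIbisTable([{"entity": "job", "duration": 3.5}]))
print(frame)  # {'entity': ['job'], 'duration': [3.5]}
```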

@cpcloud
Author

cpcloud commented Jan 26, 2023

After adding a to_pandas method to ibis expressions, this seems to just work, which is great!

@has2k1
Owner

has2k1 commented Jan 26, 2023

to_pandas works (nice), but it is just a first step toward first-class internal support for a different dataframe type (or even more than one). As it stands, plotnine cannot support ibis and its expressions; plotnine and pandas are rather meshed together.

I have had discussions (cc @machow) about making the internals agnostic to the type of dataframe through some kind of adapter API, and ibis came up among the options. That approach seems more practical, though the timeline is still long.
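One way to picture "internals that are agnostic about the type of dataframe" is a small adapter interface: the internals talk only to the adapter, and each dataframe library registers an implementation. The sketch below is purely illustrative; `FrameAdapter`, `DictFrameAdapter`, `register_adapter` and `adapter_for` are hypothetical names, not an existing plotnine or ibis API, and a dict of columns stands in for a real dataframe.

```python
from typing import Any, Callable, Dict, Protocol, Sequence


class FrameAdapter(Protocol):
    """Hypothetical minimal interface that plotnine internals could code against."""

    def columns(self) -> Sequence[str]: ...

    def column(self, name: str) -> list: ...


class DictFrameAdapter:
    """Adapter for a dict-of-lists 'frame' (stand-in for a real backend)."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def columns(self) -> Sequence[str]:
        return list(self._data)

    def column(self, name: str) -> list:
        return list(self._data[name])


# Registry mapping a frame type to a factory that wraps it in an adapter.
_ADAPTERS: Dict[type, Callable[[Any], FrameAdapter]] = {}


def register_adapter(frame_type: type, factory: Callable[[Any], FrameAdapter]) -> None:
    _ADAPTERS[frame_type] = factory


def adapter_for(data: Any) -> FrameAdapter:
    # Walk the MRO so subclasses of a registered type are handled too.
    for cls in type(data).__mro__:
        if cls in _ADAPTERS:
            return _ADAPTERS[cls](data)
    raise TypeError(f"No adapter registered for {type(data).__name__}")


register_adapter(dict, DictFrameAdapter)
adapter = adapter_for({"mpg": [21.0, 22.8], "cyl": [6, 4]})
print(adapter.columns(), adapter.column("cyl"))  # ['mpg', 'cyl'] [6, 4]
```

Under this design, supporting another library would mean registering one more adapter; the hard part is moving plotnine's internals off direct pandas calls in the first place.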

@machow
Contributor

machow commented Jan 26, 2023

Hey @cpcloud, were you able to get the ibis lazy expressions to evaluate inside aes? (For example, aes(x = _.some_col - _.some_col.mean(), ...))

It seems like a pretty nice feature, and I'm happy to help dig around (especially since selfishly it'd be useful to chuck in lazy expressions from siuba 😅)

@has2k1
Owner

has2k1 commented Jan 26, 2023

> Hey @cpcloud, were you able to get the ibis lazy expressions to evaluate inside aes? (For example, aes(x = _.some_col - _.some_col.mean(), ...))

The expressions will not work.

has2k1 added the Feature label Jan 26, 2023
@machow
Contributor

machow commented Jan 26, 2023

Since both libraries can translate expressions into pandas code, it seems like plotnine could expose some kind of generic function (which they could import, or register with via an entry point) that would let them plug in their transformations.

For example, in siuba...

    # generic to_transform function, with siuba implementation ----

    from functools import singledispatch
    from siuba.siu import Call, Symbolic, strip_symbolic

    # define generic

    @singledispatch
    def to_transform(expr):
        raise NotImplementedError(f"Unsupported type: {type(expr)}")


    # register siuba implementation

    @to_transform.register(Symbolic)
    @to_transform.register(Call)
    def _to_transform_siuba(expr: "Symbolic | Call") -> Call:
        return strip_symbolic(expr)


    # Apply a siuba symbolic (lazy expression) to a pandas DataFrame ----

    from plotnine.data import mtcars
    from siuba import _

    f_transform = to_transform(_.mpg - _.mpg.mean())
    f_transform(mtcars)

In the same way, ibis could translate its expression using its pandas backend and return a callable.

It seems like this approach would be similar to allowing lambdas in aes...

    from plotnine import *
    from plotnine.data import mtcars

    # note the x argument is a lambda
    ggplot(mtcars, aes(x=lambda d: d["cyl"], y="mpg"))

I don't think that works, though, since plotnine needs to know one extra piece:

  • What is the "name" of the transform? E.g. if you pass "mpg + 1", plotnine knows the name is "mpg + 1". There would need to be a generic that lets a lazy expression tell plotnine its "name".
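The missing piece in the bullet above could be a second generic next to `to_transform`, one that returns a display name for a lazy expression. The sketch below uses the standard library's `functools.singledispatch`; `to_label`, this `to_transform` and the `NamedTransform` wrapper are hypothetical, and a plain function stands in for a siuba or ibis expression so it runs without either library.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Any, Callable


@dataclass(frozen=True)
class NamedTransform:
    """Hypothetical wrapper pairing a callable with the label plotnine would show."""

    label: str
    func: Callable[[Any], Any]


@singledispatch
def to_label(expr: Any) -> str:
    raise NotImplementedError(f"No label for type: {type(expr)}")


@to_label.register
def _(expr: str) -> str:
    # String aesthetics are already their own name, e.g. "mpg + 1".
    return expr


@to_label.register
def _(expr: NamedTransform) -> str:
    return expr.label


@singledispatch
def to_transform(expr: Any) -> Callable[[Any], Any]:
    raise NotImplementedError(f"Unsupported type: {type(expr)}")


@to_transform.register
def _(expr: NamedTransform) -> Callable[[Any], Any]:
    return expr.func


def center_mpg(d: dict) -> list:
    # Plays the role of the lazy expression _.mpg - _.mpg.mean().
    mean = sum(d["mpg"]) / len(d["mpg"])
    return [x - mean for x in d["mpg"]]


centered = NamedTransform("mpg - mean(mpg)", center_mpg)
data = {"mpg": [20.0, 22.0, 24.0]}
print(to_label(centered))  # mpg - mean(mpg)
print(to_transform(centered)(data))  # [-2.0, 0.0, 2.0]
```

A bare lambda has no useful name, which is why the wrapper carries one explicitly; a library could instead register its own expression type with both generics.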

@has2k1
Owner

has2k1 commented Jan 26, 2023

The main issue is that, past a certain point, plotnine only wants to deal with plain pandas DataFrames, and that point comes before the evaluations and cannot be pushed after them.

    InputFrame -> ... EnsurePandas(pd.DataFrame) ... -> Evaluate(pd.DataFrame) -> ...
                      ^
                      This point cannot move to after the Evaluate()

The solution has to be more comprehensive.

@machow
Contributor

machow commented Jan 26, 2023

If I understand correctly, both libraries should be able to supply DataFrames (e.g. via to_pandas()). Once they do that, I wonder if their expressions inside aes(...) could be translated using an approach like in my code above? Are there any other pieces to be aware of? I can dig into this a bit (and also totally understand if it ends up that plotnine prefers to stick with the string expression syntax :).
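Under this proposal the EnsurePandas step in the diagram above would stay where it is; only the aesthetic values would be translated into callables (like `f_transform` in the siuba example) that run on the already-converted frame. A stdlib-only sketch of that ordering, where `build_layer_data` is a hypothetical name, a dict of columns stands in for a pandas DataFrame, and string aesthetics are limited to bare column names for brevity:

```python
from typing import Any, Callable, Union

Frame = dict  # stand-in for pd.DataFrame: column name -> list of values
AesValue = Union[str, Callable[[Frame], list]]


def build_layer_data(data: Any, mapping: dict) -> Frame:
    """Hypothetical sketch of the ordering: convert first, then evaluate."""
    # 1. EnsurePandas: convert once, before anything is evaluated.
    frame: Frame = data.to_pandas() if hasattr(data, "to_pandas") else data
    # 2. Evaluate: every aesthetic only ever sees the converted frame.
    out: Frame = {}
    for aesthetic, value in mapping.items():
        if isinstance(value, str):
            out[aesthetic] = list(frame[value])  # bare column names only
        else:
            out[aesthetic] = list(value(frame))  # expression already translated to a callable
    return out


mtcars_like = {"mpg": [21.0, 22.8, 21.4], "cyl": [6, 4, 6]}
layer = build_layer_data(mtcars_like, {"x": "mpg", "y": lambda d: [c * 10 for c in d["cyl"]]})
print(layer)  # {'x': [21.0, 22.8, 21.4], 'y': [60, 40, 60]}
```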

@has2k1
Owner

has2k1 commented Jan 26, 2023

Yes, the expressions have to match. I think it may be doable, e.g. if the expressions know how to convert themselves:

    expr: Symbolic = _.mpg - _.mpg.mean()
    expr_str: str = expr.to_p9_aes_expr()

Then plotnine can simply check for to_p9_aes_expr() on any funky object!
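A minimal sketch of that duck-typed check, assuming the method name `to_p9_aes_expr()` proposed above (it does not exist in siuba or ibis today). `resolve_aes` and the toy `Col` expression class are hypothetical; `Col` just records the text of the operations applied to it, producing a string in the style of plotnine's existing aesthetic expressions.

```python
from typing import Any


class Col:
    """Toy lazy expression (hypothetical) that records its source text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __sub__(self, other: "Col") -> "Col":
        return Col(f"{self.text} - {other.text}")

    def mean(self) -> "Col":
        return Col(f"{self.text}.mean()")

    def to_p9_aes_expr(self) -> str:
        # The hook plotnine would look for: render as a string expression.
        return self.text


def resolve_aes(value: Any) -> Any:
    """Hypothetical plotnine-side check for 'funky objects'."""
    hook = getattr(value, "to_p9_aes_expr", None)
    if callable(hook):
        return hook()
    return value  # strings and other values pass through unchanged


mpg = Col("mpg")
print(resolve_aes(mpg - mpg.mean()))  # mpg - mpg.mean()
print(resolve_aes("cyl"))  # cyl
```

Because the check uses `getattr` rather than an `isinstance` test, plotnine would not need to import siuba or ibis to support them.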
