Add a way to refer to multiple joined frames' columns #3421
Comments
The issue here is that you are doing two joins at once. While technically this is going to work, as we allow multiple join nodes internally, this is not something we ever guaranteed to work. If you look at the [i, j, ...] documentation you will notice that there is only one
While we eventually could add official support for multiple joins and multiple
As a workaround, I would propose to split your logic into several steps, i.e.
>>> DT = df[g[0] != None, :, join(jf)]
>>> DT[:, [f["A"], f["B"], g["D"].alias("C")], join(jf2)]
   | A      B          C
   | str32  str32  int32
-- + -----  -----  -----
 0 | b      e          3
 1 | b      f          4
 2 | c      f          4
[3 rows x 3 columns]
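For readers who want to see the logic of the two-step workaround spelled out, here is a minimal plain-Python sketch. It does not use datatable at all. The contents of `df`, `jf` and `jf2`, the key columns, and the extra column `X` are all assumptions chosen only so that the result matches the rows printed above; the thread does not show the real frames.

```python
# Hypothetical stand-ins for the frames in this thread (the real data is not shown).
df = [
    {"A": "a", "B": "d", "C": 0},
    {"A": "b", "B": "e", "C": 0},
    {"A": "b", "B": "f", "C": 0},
    {"A": "c", "B": "f", "C": 0},
]
jf = {"b": {"X": 1}, "c": {"X": 2}}    # assumed to be keyed on column A
jf2 = {"e": {"D": 3}, "f": {"D": 4}}   # assumed to be keyed on column B

# Step 1: keep only the rows of df that have a match in jf
# (the role played by the g[0] != None filter in the first statement).
step1 = [row for row in df if row["A"] in jf]

# Step 2: look up D in jf2 by B and expose it under the name C.
# A left join yields a missing value when there is no match, hence .get().
result = [
    {"A": row["A"], "B": row["B"], "C": jf2.get(row["B"], {}).get("D")}
    for row in step1
]

for row in result:
    print(row["A"], row["B"], row["C"])
```

Because each step touches only one joined table, there is never more than one "other" frame to refer to at a time, which is why splitting the expression sidesteps the missing namespace.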
Okay, thank you for your answer. In the documentation of the
Yes, you are right. But from the signature it is not obvious that one can do multiple joins, and we probably didn’t think it through with respect to addressing the other joined frames. I also do not see that we have even one test for multiple-join functionality. My feeling is that if we allow multiple joins, we must have a way to address those frames’ columns. The problem is what the new namespaces should look like:
@hallmeier do you mind pointing me to the link with the quote you referenced about calling the parameter multiple times?
It's right in the
Wow, I like the fact that you can join multiple frames... Keeping track of namespaces might be complex 🤷‍♂️. At any rate, I don't think it should be deprecated; probably update the docs to say that at the moment only two namespaces are supported, with an example.
Yeah, we definitely need to address this issue at some point, though it is not obvious to me how. The way we are doing it now with
The "list of namespaces" idea sounds good to me. An alternative idea is to have a
Yes, probably a
I want to index df on column A with jf and then join with jf2 to update column C with jf2's column D (also naming it C wouldn't help here).
So after updating, df would be:
Joining works perfectly:
But I can't update in the same step because column D cannot be accessed. I'd like to do something like this:
The columns of df are in f and the columns of jf are in g, but the columns of jf2 cannot be accessed in the j-statement.
While this is a feature request, I'd also appreciate good ideas for workarounds.