Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document the correct way to imput only with in a group #44

Open
oxinabox opened this issue Aug 8, 2019 · 8 comments
Open

Document the correct way to imput only with in a group #44

oxinabox opened this issue Aug 8, 2019 · 8 comments

Comments

@oxinabox
Copy link
Member

oxinabox commented Aug 8, 2019

Say I have a dataframe with say a column of :times,
a column of :units
and a column of values.
And I want to impute values via locf
But I don't want to imput from outside a unit

What is the correct way to do it?
It thought it would be:

by(Impute.locf, df, :unit)

But that makes my computer turn into a space heater

@oxinabox
Copy link
Member Author

oxinabox commented Aug 8, 2019

Answer:

by(Impute.locf!, deepcopy(df), :unit)

One does not want to individually deepcopy each group (which is what lcof does), because each group has a reference to the partent dataframe, and so you make huge copies every time.

So copy before hand, then use the mutating form

@rofinn
Copy link
Member

rofinn commented Aug 8, 2019

Another issue stems from the fact that we can't guarantee that Impute.locf! won't copy the data. This works for dataframes, but that isn't a consistent behaviour. I'm not sure if this kind of thing makes sense in the current documentation, but we might want to add a "How do I do X?" section that has more complicated examples with different table and array types. Might also be good to add a performance section which goes over some of the copying gotchas.

@oxinabox
Copy link
Member Author

oxinabox commented Aug 8, 2019

In this case that by expression does not depend on whether Impute.locf! is inplace or not.

@rofinn
Copy link
Member

rofinn commented Aug 8, 2019

How so? Isn't the point that you don't want to copy each group?

@oxinabox
Copy link
Member Author

oxinabox commented Aug 8, 2019

My point was: by appends all AbstractDataFrames that are returned.
So it is certain to work.
Regardless of if mutating or not.

As you say, the reason to use lcof! is not to make it inplace.
It is to avoid calling locf from internally calling deepcopy(GroupedDataFrame) as that results in massive slow, because it will deepcopy the parent too, for each group.

I know from the fact my code completes in seconds (where as it was sitting there for over a minute before) that this does indeed avoid the deepcopies.

@rofinn
Copy link
Member

rofinn commented Aug 9, 2019

Ahh, oops, I missed the impact of the append. Yeah, that make sense now.

@oxinabox oxinabox changed the title Document the correct way to import only with in a group Document the correct way to imput only with in a group Oct 24, 2019
@oxinabox
Copy link
Member Author

oxinabox commented Jul 15, 2020

maybe we should do less deepcopy and more copy
I suspect everything we want to apply this on has copy defined.

Since the whole problem was deepcopy on a GroupedDataFrame copying the parent dataframe

@rofinn
Copy link
Member

rofinn commented Sep 18, 2020

Okay, I'll try giving that a shot... though I'm guessing we'll need more tests cases for different types of tables then.

@rofinn rofinn mentioned this issue Oct 1, 2020
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants