New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make it easy to use dataframes with NestedSelect #6608
base: main
Are you sure you want to change the base?
Conversation
return NestedSelect(options=options, **params) | ||
|
||
@classmethod | ||
def filter_dataframe(cls, data: pd.DataFrame, value: Dict, columns: List|None=None, **params)->pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this instance method instead? And use self.value
instead of the provided value
?
It would make it a bit easier to use but generally less useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think NestedSelect should have a method for this. If this functionality is added, it should be under the class method's keyword args (e.g. subset
kwarg) rather than a separate method. In my opinion, class methods are commonly used for instantiating objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @ahuang11 here, I do not think filter_dataframe
belongs here. There's at least 20 components that would have equal claim to implementing such a method. I get that it's useful, but we can't provide utilities for everything that's useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please reconsider. You have groupby
in hvplot for a reason. This is there for the same reason. Being able to easily create nested selections from a dataset. This is really hard to do correctly and takes time to implement. But its a very common use case.
return NestedSelect(options=options, **params) | ||
|
||
@classmethod | ||
def filter_dataframe(cls, data: pd.DataFrame, value: Dict, columns: List|None=None, **params)->pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be a more appropriate name than filter_dataframe
?
get_dataframe_subset
, create_subset
, query_dataframe
, ...?
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6608 +/- ##
==========================================
+ Coverage 82.94% 82.98% +0.03%
==========================================
Files 313 313
Lines 46090 46182 +92
==========================================
+ Hits 38231 38324 +93
+ Misses 7859 7858 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
I'm confused on the difference between the two methods; I think there should only be one method: Also, I personally like my shorter implementation :P def from_pandas(cls, df, **kwargs):
cols = list(df.columns)
grouped = df.groupby(cols[:-1])
nested = grouped[cols[-1]].apply(lambda x: x.tolist()).to_dict()
create_nested_defaultdict = lambda depth: defaultdict(
lambda: create_nested_defaultdict(depth - 1)
)
nested_data = create_nested_defaultdict(len(cols) - 1)
for keys, values in nested.items():
if isinstance(keys, str):
keys = (keys,)
current_dict = nested_data
for i, key in enumerate(keys):
if i != len(keys) - 1:
current_dict = current_dict[key]
else:
current_dict[key] = values
return cls(options=nested_data, **kwargs) |
@@ -654,6 +673,73 @@ def _update_options_programmatically(self): | |||
self.value = original_values | |||
raise | |||
|
|||
@classmethod | |||
def create_options_from_dataframe(cls, data: pd.DataFrame, columns: List|None=None)->Dict|List: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this should be public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benefit of making it public is that users have more options for controlling performance. With the create_from_dataframe
option there is no caching and the call will potentially be quite costly if the dataframe is large. And be repeated hundreds or thousands of times.
I would argue it should be public.
def create_options_from_dataframe(cls, data: pd.DataFrame, columns: List|None=None)->Dict|List: | ||
"""Returns the options to use with a NestedSelect | ||
|
||
Args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These docstrings do not follow the style we use throughout Panel, i.e. they're Google style not NumPy style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix when rest of feedback has been resolved fixed as I currently don't know if this PR will ever make it.
return options | ||
|
||
@classmethod | ||
def create_from_dataframe(cls, data: pd.DataFrame, columns: List|None=None, **params)->'NestedSelect': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, if we're going to add something like this we should generalize it and add it to other widgets. Even without having a greater plan though, creating the options from columns is just one approach, creating it from an index is just as valid I would think.
Overall I'm -1 on merging this as is. Before merging something like this I'd like to have an overall plan for generalizing this for widgets other than |
Its fine. Would be ok to add it the code to the reference until we have that plan? The users I interact with would not know how to use a DataFrame with I don't use all the other use cases you mention, so I would not have the imagination to make up that plan. But is not just doing something which works similarly to hvplot |
Closes #6604