Make it easy to use dataframes with NestedSelect #6608

MarcSkovMadsen · 2024-03-29T07:18:49Z

panel/widgets/select.py

MarcSkovMadsen · 2024-03-29T07:21:19Z

panel/widgets/select.py

+        return NestedSelect(options=options, **params)
+
+    @classmethod
+    def filter_dataframe(cls, data: pd.DataFrame, value: Dict, columns: List|None=None, **params)->pd.DataFrame:


Should we make this instance method instead? And use self.value instead of the provided value?

It would make it a bit easier to use but generally less useful.

I don't think NestedSelect should have a method for this. If this functionality is added, it should be under the class method's keyword args (e.g. subset kwarg) rather than a separate method. In my opinion, class methods are commonly used for instantiating objects.

Agree with @ahuang11 here, I do not think filter_dataframe belongs here. There's at least 20 components that would have equal claim to implementing such a method. I get that it's useful, but we can't provide utilities for everything that's useful.

Please reconsider. You have groupby in hvplot for a reason. This is there for the same reason. Being able to easily create nested selections from a dataset. This is really hard to do correctly and takes time to implement. But its a very common use case.

MarcSkovMadsen · 2024-03-29T07:23:25Z

panel/widgets/select.py

+        return NestedSelect(options=options, **params)
+
+    @classmethod
+    def filter_dataframe(cls, data: pd.DataFrame, value: Dict, columns: List|None=None, **params)->pd.DataFrame:


Would there be a more appropriate name than filter_dataframe?

get_dataframe_subset, create_subset, query_dataframe, ...?

codecov · 2024-03-29T07:33:33Z

Codecov Report

Attention: Patch coverage is 98.91304% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 82.98%. Comparing base (d59b23b) to head (3d6037b).
Report is 6 commits behind head on main.

Files	Patch %	Lines
panel/widgets/select.py	97.43%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6608      +/-   ##
==========================================
+ Coverage   82.94%   82.98%   +0.03%     
==========================================
  Files         313      313              
  Lines       46090    46182      +92     
==========================================
+ Hits        38231    38324      +93     
+ Misses       7859     7858       -1

Flag	Coverage Δ
ui-tests	`40.00% <23.91%> (-0.03%)`	⬇️
unitexamples-tests	`71.28% <98.91%> (+0.05%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ahuang11 · 2024-03-29T19:12:48Z

I'm confused on the difference between the two methods; I think there should only be one method: from_dataframe or from_pandas to differentiate from polars.

Also, I personally like my shorter implementation :P

def from_pandas(cls, df, **kwargs):
    cols = list(df.columns)
    grouped = df.groupby(cols[:-1])
    nested = grouped[cols[-1]].apply(lambda x: x.tolist()).to_dict()
    create_nested_defaultdict = lambda depth: defaultdict(
        lambda: create_nested_defaultdict(depth - 1)
    )
    nested_data = create_nested_defaultdict(len(cols) - 1)
    for keys, values in nested.items():
        if isinstance(keys, str):
            keys = (keys,)
        current_dict = nested_data
        for i, key in enumerate(keys):
            if i != len(keys) - 1:
                current_dict = current_dict[key]
            else:
                current_dict[key] = values
    return cls(options=nested_data, **kwargs)

philippjfr · 2024-04-01T12:30:45Z

panel/widgets/select.py

@@ -654,6 +673,73 @@ def _update_options_programmatically(self):
                self.value = original_values
                raise

+    @classmethod
+    def create_options_from_dataframe(cls, data: pd.DataFrame, columns: List|None=None)->Dict|List:


I do not think this should be public.

The benefit of making it public is that users have more options for controlling performance. With the create_from_dataframe option there is no caching and the call will potentially be quite costly if the dataframe is large. And be repeated hundreds or thousands of times.

I would argue it should be public.

philippjfr · 2024-04-01T12:33:20Z

panel/widgets/select.py

+    def create_options_from_dataframe(cls, data: pd.DataFrame, columns: List|None=None)->Dict|List:
+        """Returns the options to use with a NestedSelect
+
+        Args:


These docstrings do not follow the style we use throughout Panel, i.e. they're Google style not NumPy style.

I will fix when rest of feedback has been resolved fixed as I currently don't know if this PR will ever make it.

philippjfr · 2024-04-01T12:34:30Z

panel/widgets/select.py

+        return options
+
+    @classmethod
+    def create_from_dataframe(cls, data: pd.DataFrame, columns: List|None=None, **params)->'NestedSelect':


Again, if we're going to add something like this we should generalize it and add it to other widgets. Even without having a greater plan though, creating the options from columns is just one approach, creating it from an index is just as valid I would think.

philippjfr · 2024-04-01T12:37:56Z

Overall I'm -1 on merging this as is. Before merging something like this I'd like to have an overall plan for generalizing this for widgets other than NestedSelect and also not be limited to operating on DataFrame columns, i.e. from_data should support array-like objects (ndarrays and Series), DataFrames, indexes and more and we should define an API that can be consistently supported for all widgets.

MarcSkovMadsen · 2024-04-01T15:12:37Z

Overall I'm -1 on merging this as is. Before merging something like this I'd like to have an overall plan for generalizing this for widgets other than NestedSelect and also not be limited to operating on DataFrame columns, i.e. from_data should support array-like objects (ndarrays and Series), DataFrames, indexes and more and we should define an API that can be consistently supported for all widgets.

Its fine. Would be ok to add it the code to the reference until we have that plan? The users I interact with would not know how to use a DataFrame with NestedSelect without either some guidance, easy to use functionality or spending a huge amount of time re-inventing the wheel. And I believe lots of users are like that.

I don't use all the other use cases you mention, so I would not have the imagination to make up that plan. But is not just doing something which works similarly to hvplot groupby?

MarcSkovMadsen added 2 commits March 29, 2024 06:10

add create method

4120521

add nestedselect dataframe docs

3d6037b

MarcSkovMadsen requested a review from philippjfr March 29, 2024 07:18

MarcSkovMadsen added the ready for review label Mar 29, 2024

MarcSkovMadsen added this to the v1.4.1 milestone Mar 29, 2024

MarcSkovMadsen requested a review from ahuang11 March 29, 2024 07:19

MarcSkovMadsen commented Mar 29, 2024

View reviewed changes

panel/widgets/select.py Show resolved Hide resolved

MarcSkovMadsen commented Mar 29, 2024

View reviewed changes

philippjfr reviewed Apr 1, 2024

View reviewed changes

philippjfr removed this from the v1.4.1 milestone Apr 1, 2024

philippjfr added the status: declined label Apr 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make it easy to use dataframes with NestedSelect #6608

Make it easy to use dataframes with NestedSelect #6608

MarcSkovMadsen commented Mar 29, 2024

MarcSkovMadsen Mar 29, 2024

ahuang11 Mar 29, 2024 •

edited

philippjfr Apr 1, 2024

MarcSkovMadsen Apr 1, 2024

MarcSkovMadsen Mar 29, 2024

codecov bot commented Mar 29, 2024 •

edited

ahuang11 commented Mar 29, 2024

philippjfr Apr 1, 2024

MarcSkovMadsen Apr 1, 2024 •

edited

philippjfr Apr 1, 2024

MarcSkovMadsen Apr 1, 2024

philippjfr Apr 1, 2024

philippjfr commented Apr 1, 2024

MarcSkovMadsen commented Apr 1, 2024 •

edited

Make it easy to use dataframes with NestedSelect #6608

Are you sure you want to change the base?

Make it easy to use dataframes with NestedSelect #6608

Conversation

MarcSkovMadsen commented Mar 29, 2024

Choose a reason for hiding this comment

ahuang11 Mar 29, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Mar 29, 2024 • edited

Codecov Report

ahuang11 commented Mar 29, 2024

Choose a reason for hiding this comment

MarcSkovMadsen Apr 1, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

philippjfr commented Apr 1, 2024

MarcSkovMadsen commented Apr 1, 2024 • edited

ahuang11 Mar 29, 2024 •

edited

codecov bot commented Mar 29, 2024 •

edited

MarcSkovMadsen Apr 1, 2024 •

edited

MarcSkovMadsen commented Apr 1, 2024 •

edited