Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it easy to use dataframes with NestedSelect #6608

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

MarcSkovMadsen
Copy link
Collaborator

Closes #6604

panel/widgets/select.py Show resolved Hide resolved
return NestedSelect(options=options, **params)

@classmethod
def filter_dataframe(cls, data: pd.DataFrame, value: Dict, columns: List|None=None, **params)->pd.DataFrame:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this instance method instead? And use self.value instead of the provided value?

It would make it a bit easier to use but generally less useful.

Copy link
Contributor

@ahuang11 ahuang11 Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think NestedSelect should have a method for this. If this functionality is added, it should be under the class method's keyword args (e.g. subset kwarg) rather than a separate method. In my opinion, class methods are commonly used for instantiating objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @ahuang11 here, I do not think filter_dataframe belongs here. There's at least 20 components that would have equal claim to implementing such a method. I get that it's useful, but we can't provide utilities for everything that's useful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reconsider. You have groupby in hvplot for a reason. This is there for the same reason. Being able to easily create nested selections from a dataset. This is really hard to do correctly and takes time to implement. But its a very common use case.

return NestedSelect(options=options, **params)

@classmethod
def filter_dataframe(cls, data: pd.DataFrame, value: Dict, columns: List|None=None, **params)->pd.DataFrame:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a more appropriate name than filter_dataframe?

get_dataframe_subset, create_subset, query_dataframe, ...?

Copy link

codecov bot commented Mar 29, 2024

Codecov Report

Attention: Patch coverage is 98.91304% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 82.98%. Comparing base (d59b23b) to head (3d6037b).
Report is 6 commits behind head on main.

Files Patch % Lines
panel/widgets/select.py 97.43% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6608      +/-   ##
==========================================
+ Coverage   82.94%   82.98%   +0.03%     
==========================================
  Files         313      313              
  Lines       46090    46182      +92     
==========================================
+ Hits        38231    38324      +93     
+ Misses       7859     7858       -1     
Flag Coverage Δ
ui-tests 40.00% <23.91%> (-0.03%) ⬇️
unitexamples-tests 71.28% <98.91%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ahuang11
Copy link
Contributor

I'm confused on the difference between the two methods; I think there should only be one method: from_dataframe or from_pandas to differentiate from polars.

Also, I personally like my shorter implementation :P

def from_pandas(cls, df, **kwargs):
    cols = list(df.columns)
    grouped = df.groupby(cols[:-1])
    nested = grouped[cols[-1]].apply(lambda x: x.tolist()).to_dict()
    create_nested_defaultdict = lambda depth: defaultdict(
        lambda: create_nested_defaultdict(depth - 1)
    )
    nested_data = create_nested_defaultdict(len(cols) - 1)
    for keys, values in nested.items():
        if isinstance(keys, str):
            keys = (keys,)
        current_dict = nested_data
        for i, key in enumerate(keys):
            if i != len(keys) - 1:
                current_dict = current_dict[key]
            else:
                current_dict[key] = values
    return cls(options=nested_data, **kwargs)

@@ -654,6 +673,73 @@ def _update_options_programmatically(self):
self.value = original_values
raise

@classmethod
def create_options_from_dataframe(cls, data: pd.DataFrame, columns: List|None=None)->Dict|List:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this should be public.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit of making it public is that users have more options for controlling performance. With the create_from_dataframe option there is no caching and the call will potentially be quite costly if the dataframe is large. And be repeated hundreds or thousands of times.

I would argue it should be public.

def create_options_from_dataframe(cls, data: pd.DataFrame, columns: List|None=None)->Dict|List:
"""Returns the options to use with a NestedSelect

Args:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These docstrings do not follow the style we use throughout Panel, i.e. they're Google style not NumPy style.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix when rest of feedback has been resolved fixed as I currently don't know if this PR will ever make it.

return options

@classmethod
def create_from_dataframe(cls, data: pd.DataFrame, columns: List|None=None, **params)->'NestedSelect':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, if we're going to add something like this we should generalize it and add it to other widgets. Even without having a greater plan though, creating the options from columns is just one approach, creating it from an index is just as valid I would think.

@philippjfr
Copy link
Member

Overall I'm -1 on merging this as is. Before merging something like this I'd like to have an overall plan for generalizing this for widgets other than NestedSelect and also not be limited to operating on DataFrame columns, i.e. from_data should support array-like objects (ndarrays and Series), DataFrames, indexes and more and we should define an API that can be consistently supported for all widgets.

@philippjfr philippjfr removed this from the v1.4.1 milestone Apr 1, 2024
@MarcSkovMadsen
Copy link
Collaborator Author

MarcSkovMadsen commented Apr 1, 2024

Overall I'm -1 on merging this as is. Before merging something like this I'd like to have an overall plan for generalizing this for widgets other than NestedSelect and also not be limited to operating on DataFrame columns, i.e. from_data should support array-like objects (ndarrays and Series), DataFrames, indexes and more and we should define an API that can be consistently supported for all widgets.

Its fine. Would be ok to add it the code to the reference until we have that plan? The users I interact with would not know how to use a DataFrame with NestedSelect without either some guidance, easy to use functionality or spending a huge amount of time re-inventing the wheel. And I believe lots of users are like that.

I don't use all the other use cases you mention, so I would not have the imagination to make up that plan. But is not just doing something which works similarly to hvplot groupby?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make it easy to use DataFrame with NestedSelect
4 participants