Make it easy to use DataFrame with NestedSelect #6604

Open
MarcSkovMadsen opened this issue Mar 28, 2024 · 2 comments · May be fixed by #6608
Labels
need input from Philipp type: enhancement Minor feature or improvement to an existing feature
Milestone

Comments

@MarcSkovMadsen
Collaborator

MarcSkovMadsen commented Mar 28, 2024

The new NestedSelect will be really useful. But 100% of my use cases start with a pandas DataFrame, and it's currently not very clear how to use one with NestedSelect.

I would recommend either

  • Documenting how to convert categorical columns of a DataFrame to options, or
  • Providing one or more methods to easily create a NestedSelect from a DataFrame.

Personally I would strongly recommend the second option. I would suggest adding class methods similar to get_options_from_dataframe and create_from_dataframe to the NestedSelect.
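
For context, the target structure is a nested dict with one dict level per select and a list of choices at the innermost level, which is the shape the snippets in this thread pass as `options`. A minimal sketch with made-up data; `count_levels` is a hypothetical helper used only for illustration, not part of Panel:

```python
# Made-up data in the shape passed as `options`:
# nested dicts, with a list of choices at the innermost level.
options = {
    "Europe": {"France": ["Peugeot", "Renault"], "Italy": ["Fiat"]},
    "Asia": {"Japan": ["Toyota", "Nissan"]},
}


def count_levels(node):
    """Return how many select levels a nested options structure describes."""
    if isinstance(node, dict):
        # One level for this dict plus the deepest level below it.
        return 1 + max((count_levels(child) for child in node.values()), default=0)
    return 1  # a list of leaf choices is the last level


# One label per level, as would be passed via `levels`.
levels = ["Continent", "Country", "Manufacturer"]
assert count_levels(options) == len(levels)
```

The check at the end only illustrates that the number of `levels` labels should match the nesting depth of the options.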

Example Code

import panel as pn
from bokeh.sampledata.autompg import autompg_clean
import pandas as pd

def _build_nested_dict(df, depth=0, max_depth=None):
    if max_depth is None:
        max_depth = len(df.columns)
    
    # Base case: if depth reaches the last column before values
    if depth == max_depth - 1:
        return df[df.columns[depth]].tolist()
    
    # Recursive case: build dictionary at current depth
    nested_dict = {}
    for value in df[df.columns[depth]].unique():
        filtered_df = df[df[df.columns[depth]] == value]
        nested_dict[value] = _build_nested_dict(filtered_df, depth + 1, max_depth)
    return nested_dict

def get_options_from_dataframe(df, cols=None):
    if not cols:
        cols = list(df.columns)

    df = df[cols].drop_duplicates().sort_values(cols).reset_index(drop=True)
    options = _build_nested_dict(df)
    return options

def test_get_options_from_dataframe():
    data = {
        'continent': ['Europe', 'Europe', 'Asia', 'Asia', 'North America'],
        'country': ['France', 'France', 'Japan', 'Japan', 'USA'],
        'manufacturer': ['Fiat', 'Peugeot', 'Toyota', 'Nissan', 'Ford'],
        'model': ['500', '208', 'Corolla', 'Sentra', 'Mustang']
    }
    df = pd.DataFrame(data)
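    options = get_options_from_dataframe(df)
    print(options)
    # Rows are de-duplicated and sorted, so the expected nesting is deterministic
    assert options == {
        'Asia': {'Japan': {'Nissan': ['Sentra'], 'Toyota': ['Corolla']}},
        'Europe': {'France': {'Fiat': ['500'], 'Peugeot': ['208']}},
        'North America': {'USA': {'Ford': ['Mustang']}},
    }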

test_get_options_from_dataframe()

def create_from_dataframe(df, cols=None, **params):
    if not cols:
        cols = list(df.columns)

    options = get_options_from_dataframe(df, cols)
    params.setdefault("levels", cols)  # default the level labels to the column names
    return pn.widgets.NestedSelect(options=options, **params)


cols = ["origin", "mfr", "name"]

pn.extension()

select = create_from_dataframe(autompg_clean, cols=cols, levels=["Origin", "Manufacturer", "Name"])
select.servable()

(Attached video: nested-select.mp4)

Additional Question

Is there some relation to the hvPlot/HoloViews widgets? When you use the groupby option in hvPlot, it must do something similar.

[x] Yes. I would be willing to provide a PR if the proposal is accepted by Philipp.

@MarcSkovMadsen MarcSkovMadsen added TRIAGE Default label for untriaged issues type: enhancement Minor feature or improvement to an existing feature and removed TRIAGE Default label for untriaged issues labels Mar 28, 2024
@MarcSkovMadsen MarcSkovMadsen added this to the Wishlist milestone Mar 28, 2024
@ahuang11
Contributor

ahuang11 commented Mar 29, 2024

I think this code also works (easier to copy/paste this one if anyone is looking for this).

import panel as pn
import pandas as pd
from collections import defaultdict
pn.extension()


data = {
    "world": ["Earth", "Earth", "Earth", "Earth", "Earth", "Earth"],
    "continent": ["Europe", "Europe", "Asia", "Asia", "North America", "North America"],
    "country": ["France", "France", "Japan", "Japan", "USA", "USA"],
    "manufacturer": ["Fiat", "Peugeot", "Toyota", "Nissan", "Ford", "Ford"],
    "model": ["500", "208", "Corolla", "Sentra", "Mustang", "Mustang"],
}
df = pd.DataFrame(data)

cols = list(df.columns)
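grouped = df.groupby(cols[:-1])
# unique() keeps repeated rows (e.g. the two Mustang rows) from producing duplicate options
nested = grouped[cols[-1]].apply(lambda x: x.unique().tolist()).to_dict()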
create_nested_defaultdict = lambda depth: defaultdict(
    lambda: create_nested_defaultdict(depth - 1)
)
nested_data = create_nested_defaultdict(len(cols) - 1)
for keys, values in nested.items():
    # With a single grouping column the key is a scalar (not necessarily a str), so wrap it
    if not isinstance(keys, tuple):
        keys = (keys,)
    current_dict = nested_data
    for i, key in enumerate(keys):
        if i != len(keys) - 1:
            current_dict = current_dict[key]
        else:
            current_dict[key] = values
pn.widgets.NestedSelect(options=nested_data)

Other than that, I would say a class method would be preferable:
pn.widgets.NestedSelect.from_dataframe(df)
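
A rough sketch of what such a helper could do, assuming the columns are given in hierarchy order. `from_dataframe` is only proposed in this thread; `options_from_dataframe` and `nested_select_from_dataframe` are hypothetical names, and the data is made up:

```python
import pandas as pd


def options_from_dataframe(df, cols=None):
    """Build nested options: one dict level per column, a list for the last column."""
    cols = list(cols) if cols is not None else list(df.columns)
    if len(cols) == 1:
        # Innermost level: unique values in order of first appearance.
        return df[cols[0]].drop_duplicates().tolist()
    first, rest = cols[0], cols[1:]
    # sort=False keeps groups in order of first appearance.
    # Note that groupby drops rows whose key is NaN by default.
    return {
        key: options_from_dataframe(group, rest)
        for key, group in df.groupby(first, sort=False)
    }


def nested_select_from_dataframe(df, cols=None, **params):
    """Hypothetical stand-in for the proposed NestedSelect.from_dataframe."""
    import panel as pn  # imported lazily so the helper above needs only pandas

    cols = list(cols) if cols is not None else list(df.columns)
    params.setdefault("levels", cols)
    return pn.widgets.NestedSelect(options=options_from_dataframe(df, cols), **params)


df = pd.DataFrame(
    {
        "continent": ["Europe", "Europe", "Asia", "Asia", "Asia"],
        "country": ["France", "Italy", "Japan", "Japan", "Japan"],
        "manufacturer": ["Peugeot", "Fiat", "Toyota", "Nissan", "Toyota"],
    }
)
options = options_from_dataframe(df)
```

Here `options` nests continent, then country, with a de-duplicated list of manufacturers at the innermost level. Whether a real class method should sort the values or keep first-seen order is a design choice this sketch leaves open.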

@MarcSkovMadsen
Collaborator Author

One part of the answer to https://discourse.holoviz.org/t/overwhelmed-by-with-holoviews-hvplot-panel-workflow-permutations-concepts/7141 is to convert the MultiIndex of a DataFrame to a nested dict and use it with NestedSelect.

This is not trivial to do. I still hope I can convince the core devs that users need helper functions to convert a DataFrame, MultiIndex, etc. to a nested dict.

The code is below.

from __future__ import annotations  # lets the `X | Y` hint below parse on Python < 3.10

import pandas as pd


def multiindex2dict(p: pd.MultiIndex | dict) -> dict:
    """
    Convert a pandas MultiIndex to a nested dict.

    :param p: On the first call p is a pd.MultiIndex. Because the function is recursive, later calls
    receive the internal_dict built by the previous pass, so p is then a dictionary.
    """
    internal_dict = {}
    end = False
    for x in p:
        # Since multi-indexes have a descending hierarchical structure, it is convenient to start from the last
        # element of each tuple. That is, we start by generating the lower level to the upper one. See the example
        if isinstance(p, pd.MultiIndex):
            # This checks if the tuple x without the last element has len = 1. If so, the unique value of the
            # remaining tuple works as key in the new dict, otherwise the remaining tuple is used. Only for 2 levels
            # pd.MultiIndex
            if len(x[:-1]) == 1:
                t = x[:-1][0]
                end = True
            else:
                t = x[:-1]
            if t not in internal_dict:
                internal_dict[t] = [x[-1]]
            else:
                internal_dict[t].append(x[-1])
        elif isinstance(x, tuple):
            # This checks if the tuple x without the last element has len = 1. If so, the unique value of the
            # remaining tuple works as key in the new dict, otherwise the remaining tuple is used
            if len(x[:-1]) == 1:
                t = x[:-1][0]
                end = True
            else:
                t = x[:-1]
            if t not in internal_dict:
                internal_dict[t] = {x[-1]: p[x]}
            else:
                internal_dict[t][x[-1]] = p[x]
    
    # Uncomment this line to know how the dictionary is generated starting from the lowest level
    # print(internal_dict)
    if end:
        return internal_dict
    return multiindex2dict(internal_dict)
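
As a possible shorter alternative: iterating a `pd.MultiIndex` yields plain tuples, so the same conversion can also be sketched top-down, without recursion and without any pandas-specific calls. `tuples_to_nested` is a hypothetical name and the rows are made-up data:

```python
def tuples_to_nested(rows):
    """Fold an iterable of equal-length tuples into nested dicts ending in lists."""
    nested = {}
    for row in rows:
        if len(row) < 2:
            raise ValueError("each row needs at least two levels")
        # All but the last two entries are dict keys on the way down;
        # `parent` keys the list that collects the `leaf` values.
        *path, parent, leaf = row
        node = nested
        for key in path:
            node = node.setdefault(key, {})
        leaves = node.setdefault(parent, [])
        if leaf not in leaves:  # skip repeated rows
            leaves.append(leaf)
    return nested


rows = [
    ("Europe", "France", "Peugeot", "208"),
    ("Asia", "Japan", "Toyota", "Corolla"),
    ("Asia", "Japan", "Nissan", "Sentra"),
    ("Asia", "Japan", "Nissan", "Sentra"),  # duplicate row, ignored
]
options = tuples_to_nested(rows)
```

With pandas, the rows could come from a MultiIndex directly (`tuples_to_nested(df.index)`) or from selected columns (`tuples_to_nested(df[cols].itertuples(index=False, name=None))`).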
