Make it easy to use DataFrame with NestedSelect #6604

MarcSkovMadsen · 2024-03-28T16:07:54Z

The new NestedSelect will be really useful, but 100% of my use cases start with a pandas DataFrame, and it's currently not very clear how to use one with NestedSelect.

I would recommend either

1. Documenting how to convert categorical columns of a DataFrame to options.
2. Providing one or more methods to easily create a NestedSelect from a DataFrame.

Personally, I would strongly recommend the second option. I would suggest adding class methods to NestedSelect similar to the get_options_from_dataframe and create_from_dataframe functions in the example code.

Example Code

import panel as pn
from bokeh.sampledata.autompg import autompg_clean
import pandas as pd

def _build_nested_dict(df, depth=0, max_depth=None):
    if max_depth is None:
        max_depth = len(df.columns)
    
    # Base case: if depth reaches the last column before values
    if depth == max_depth - 1:
        return df[df.columns[depth]].tolist()
    
    # Recursive case: build dictionary at current depth
    nested_dict = {}
    for value in df[df.columns[depth]].unique():
        filtered_df = df[df[df.columns[depth]] == value]
        nested_dict[value] = _build_nested_dict(filtered_df, depth + 1, max_depth)
    return nested_dict

def get_options_from_dataframe(df, cols=None):
    if not cols:
        cols = list(df.columns)

    df = df[cols].drop_duplicates().sort_values(cols).reset_index(drop=True)
    options = _build_nested_dict(df)
    return options

def test_get_options_from_dataframe():
    data = {
        'continent': ['Europe', 'Europe', 'Asia', 'Asia', 'North America'],
        'country': ['France', 'France', 'Japan', 'Japan', 'USA'],
        'manufacturer': ['Fiat', 'Peugeot', 'Toyota', 'Nissan', 'Ford'],
        'model': ['500', '208', 'Corolla', 'Sentra', 'Mustang']
    }
    df = pd.DataFrame(data)
    options = get_options_from_dataframe(df)
    print(options)

test_get_options_from_dataframe()

def create_from_dataframe(df, cols=None, **params):
    if not cols:
        cols = list(df.columns)

    options = get_options_from_dataframe(df, cols)
    params["levels"]=params.get("levels", cols)
    return pn.widgets.NestedSelect(options=options, **params)


cols = ["origin", "mfr", "name"]

pn.extension()

select = create_from_dataframe(autompg_clean, cols=cols, levels=["Origin", "Manufacturer", "Name"])
select.servable()

[Video attachment: nested-select.mp4]

Additional Question

Is there some relation to the hvPlot/HoloViews widgets? When you use the groupby option in hvPlot, it must do something similar.

[x] Yes. I would be willing to provide a PR if the proposal is accepted by Philipp.


ahuang11 · 2024-03-29T00:16:25Z

I think this code also works (easier to copy/paste this one if anyone is looking for this).

import panel as pn
import pandas as pd
from collections import defaultdict
pn.extension()


data = {
    "world": ["Earth", "Earth", "Earth", "Earth", "Earth", "Earth"],
    "continent": ["Europe", "Europe", "Asia", "Asia", "North America", "North America"],
    "country": ["France", "France", "Japan", "Japan", "USA", "USA"],
    "manufacturer": ["Fiat", "Peugeot", "Toyota", "Nissan", "Ford", "Ford"],
    "model": ["500", "208", "Corolla", "Sentra", "Mustang", "Mustang"],
}
df = pd.DataFrame(data)

cols = list(df.columns)
grouped = df.groupby(cols[:-1])
nested = grouped[cols[-1]].apply(lambda x: x.tolist()).to_dict()
create_nested_defaultdict = lambda depth: defaultdict(
    lambda: create_nested_defaultdict(depth - 1)
)
nested_data = create_nested_defaultdict(len(cols) - 1)
for keys, values in nested.items():
    if isinstance(keys, str):
        keys = (keys,)
    current_dict = nested_data
    for i, key in enumerate(keys):
        if i != len(keys) - 1:
            current_dict = current_dict[key]
        else:
            current_dict[key] = values
pn.widgets.NestedSelect(options=nested_data)

Other than that, I would say a class method would be preferable
pn.widgets.NestedSelect.from_dataframe(df)
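
One property of the nested defaultdict built in the snippet above is that reading a missing key silently creates it. As an optional precaution (not something this thread says is required), the structure could be copied into plain dicts before it is handed to the widget. The following is a minimal sketch that uses only the standard library; `to_plain_dict` and `make_tree` are hypothetical helper names, not part of Panel.

```python
from collections import defaultdict


def to_plain_dict(obj):
    """Recursively copy nested dict-like objects into plain dicts.

    Lists and other leaf values are returned unchanged.
    """
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj


def make_tree():
    # A defaultdict whose missing keys become further defaultdicts.
    return defaultdict(make_tree)


tree = make_tree()
tree["Europe"]["France"]["Peugeot"] = ["208"]
tree["Asia"]["Japan"]["Toyota"] = ["Corolla"]

plain = to_plain_dict(tree)
print(plain)
```

After the conversion, looking up a key that does not exist raises a KeyError instead of adding an empty branch.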

MarcSkovMadsen · 2024-04-22T04:56:48Z

One part of the answer to https://discourse.holoviz.org/t/overwhelmed-by-with-holoviews-hvplot-panel-workflow-permutations-concepts/7141 is to convert the MultiIndex of a DataFrame to a nested dict and use it with NestedSelect.

This is not trivial to do. I still hope I can convince the core devs that users need helper functions to convert a DataFrame, MultiIndex, etc. to a nested dict.

The code is below.

import pandas as pd

def multiindex2dict(p: pd.MultiIndex|dict) -> dict:
    """
    Converts a pandas MultiIndex to a nested dict.
    :param p: As this is a recursive function, p is initially a pd.MultiIndex; on each recursive call it is
    the internal_dict built by the previous call, so it becomes a dictionary.
    """
    internal_dict = {}
    end = False
    for x in p:
        # Since multi-indexes have a descending hierarchical structure, it is convenient to start from the last
        # element of each tuple. That is, we start by generating the lower level to the upper one. See the example
        if isinstance(p, pd.MultiIndex):
            # This checks if the tuple x without the last element has len = 1. If so, the unique value of the
            # remaining tuple works as key in the new dict, otherwise the remaining tuple is used. Only for 2 levels
            # pd.MultiIndex
            if len(x[:-1]) == 1:
                t = x[:-1][0]
                end = True
            else:
                t = x[:-1]
            if t not in internal_dict:
                internal_dict[t] = [x[-1]]
            else:
                internal_dict[t].append(x[-1])
        elif isinstance(x, tuple):
            # This checks if the tuple x without the last element has len = 1. If so, the unique value of the
            # remaining tuple works as key in the new dict, otherwise the remaining tuple is used
            if len(x[:-1]) == 1:
                t = x[:-1][0]
                end = True
            else:
                t = x[:-1]
            if t not in internal_dict:
                internal_dict[t] = {x[-1]: p[x]}
            else:
                internal_dict[t][x[-1]] = p[x]
    
    # Uncomment this line to know how the dictionary is generated starting from the lowest level
    # print(internal_dict)
    if end:
        return internal_dict
    return multiindex2dict(internal_dict)
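
For comparison with the recursive function above, here is a minimal iterative sketch of the same idea. It assumes the index has already been turned into a list of equal-length tuples with at least two items each (for a pandas MultiIndex, `tolist()` returns such tuples). `tuples_to_nested` is a hypothetical helper name, and unlike the function above it skips duplicate leaf values.

```python
def tuples_to_nested(index_tuples):
    """Build nested options from equal-length tuples (two or more items each).

    All items except the last two become nested dict keys; the
    second-to-last item maps to a list of the last items.
    """
    nested = {}
    for *parents, leaf_key, leaf in index_tuples:
        current = nested
        # Walk (and create) one dict per upper level.
        for key in parents:
            current = current.setdefault(key, {})
        # The deepest level is a list of values; duplicates are skipped.
        leaves = current.setdefault(leaf_key, [])
        if leaf not in leaves:
            leaves.append(leaf)
    return nested


# Hypothetical sample data standing in for a three-level index.
index_tuples = [
    ("Europe", "France", "Peugeot"),
    ("Europe", "France", "Renault"),
    ("Asia", "Japan", "Toyota"),
    ("Asia", "Japan", "Nissan"),
]
options = tuples_to_nested(index_tuples)
print(options)
```

The result keeps the order in which keys are first seen, so sorting the tuples beforehand gives sorted options.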

MarcSkovMadsen added the TRIAGE and type: enhancement labels and removed the TRIAGE label Mar 28, 2024

MarcSkovMadsen added this to the Wishlist milestone Mar 28, 2024

MarcSkovMadsen added the need input from Philipp label Mar 28, 2024

MarcSkovMadsen linked a pull request Mar 29, 2024 that will close this issue: Make it easy to use dataframes with NestedSelect #6608 (Open)
