
Custom AG Grid function overwrites pandas.DataFrame provided as input with an empty pandas.DataFrame #435

Closed
1 task done
pablo-fence opened this issue Apr 24, 2024 · 2 comments · Fixed by #439
Labels
Community Issue/PR opened by the open-source community Issue: Bug Report 🐛 Issue/PR that report/fix a bug

@pablo-fence

Description

data_frame is an empty pandas.DataFrame within the scope of the custom AG Grid function matrix_aggrid(): the debug print print("Original DataFrame size:", data_frame.shape) at the top of the function reports (0, 0).

data_frame is not empty just before the function call: the debug print in matrix_analysis_page() reports (10, 5).
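To check that the transformation itself is not at fault, the pandas part of matrix_aggrid() from the reproduction code below can be run on its own against the same sample data. This is a trimmed-down sketch, not the reporter's exact function: the default column names are hard-coded, cell_values="numerator_column" is assumed, no Dash objects are created, and build_pivot is a hypothetical helper name.

```python
import pandas as pd

# Same sample data as generate_sample_data() in the report
df = pd.DataFrame({
    "client_id": range(1, 11),
    "cohort": ["2020-Q1"] * 5 + ["2020-Q2"] * 5,
    "relative_month": [0, 1, 2, 3, 4] * 2,
    "amount_paid": [100, 150, 200, 250, 300] * 2,
    "initial_balance": [500] * 10,
})


def build_pivot(data_frame, periods_to_show=(0, 12)):
    """Hypothetical helper: the pandas steps of matrix_aggrid() with the
    default column names hard-coded and cell_values="numerator_column"."""
    # Work on a copy so the caller's frame is not modified
    data_frame = data_frame.copy()
    data_frame["relative_month"] = data_frame["relative_month"].astype(float)
    low, high = periods_to_show
    # between() is inclusive on both ends, like the >= / <= mask in the report
    data_frame = data_frame[data_frame["relative_month"].between(low, high)]

    # Total initial balance per cohort
    initials = (
        data_frame[["client_id", "cohort", "initial_balance"]]
        .drop_duplicates()
        .groupby("cohort")["initial_balance"].sum()
        .reset_index()
    )
    # Last payment per client and month, then summed per cohort and month
    per_client = (
        data_frame.groupby(["client_id", "cohort", "relative_month"])["amount_paid"]
        .last()
        .reset_index()
    )
    per_cohort = (
        per_client.groupby(["cohort", "relative_month"])["amount_paid"]
        .sum()
        .reset_index()
    )
    merged = initials.merge(per_cohort, on="cohort")
    # One row per (cohort, initial balance), one column per relative month
    return pd.pivot_table(
        data=merged,
        index=["cohort", "initial_balance"],
        columns="relative_month",
        values="amount_paid",
    ).reset_index()


pt = build_pivot(df)
print(pt)
```

With the real 10-row frame this yields two cohort rows and one column per relative month (0.0 to 4.0), so the logic works when it actually receives the data.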

Expected behavior

No response

Which package?

vizro

Package version

0.1.15

Python version

3.12.1

OS

macOS Sonoma 14.4

How to Reproduce

  1. Install the package dependencies from the provided requirements.txt.

  2. Run the following code:

import pandas as pd
import vizro.models as vm
from vizro import Vizro
from vizro.models.types import capture
from dash_ag_grid import AgGrid
from typing import List

# Sample data generator
def generate_sample_data():
    # Creating a DataFrame with the necessary structure
    data = {
        'client_id': range(1, 11),
        'cohort': ['2020-Q1'] * 5 + ['2020-Q2'] * 5,
        'relative_month': [0, 1, 2, 3, 4] * 2,
        'amount_paid': [100, 150, 200, 250, 300] * 2,
        'initial_balance': [500, 500, 500, 500, 500] * 2
    }
    return pd.DataFrame(data)

@capture('ag_grid')
def matrix_aggrid(data_frame, id_column="client_id", row_dimension_column="cohort",
                  column_dimension_column="relative_month", numerator_column="amount_paid",
                  denominator_column="initial_balance", cell_values="numerator_column", periods_to_show=[0, 12]):
    print("Original DataFrame size:", data_frame.shape)
    min_cohort, max_cohort = periods_to_show
    data_frame[column_dimension_column] = data_frame[column_dimension_column].astype(float)
    mask = (data_frame[column_dimension_column] >= min_cohort) & (data_frame[column_dimension_column] <= max_cohort)
    data_frame = data_frame[mask].copy()

    initials = data_frame[[id_column, row_dimension_column, denominator_column]].drop_duplicates().groupby(row_dimension_column)[denominator_column].sum().reset_index()
    gpd = data_frame.groupby([id_column, row_dimension_column, column_dimension_column])[numerator_column].last().reset_index()
    gpd2 = gpd.groupby([row_dimension_column, column_dimension_column])[numerator_column].sum().reset_index()
    gpd3 = initials.merge(gpd2, on=row_dimension_column)

    gpd3['numerator_column'] = gpd3[numerator_column]
    gpd3['denominator_column'] = gpd3[denominator_column]
    gpd3['pct'] = gpd3[numerator_column] / gpd3[denominator_column]

    pt = pd.pivot_table(data=gpd3, index=[row_dimension_column, denominator_column],
                        columns=column_dimension_column, values=cell_values).reset_index()

    columnDefs = [{"headerName": "Cohort Info", "field": row_dimension_column, "filter": True, "sortable": True},
                  {"headerName": "Initial Balance", "field": denominator_column, "type": "numericColumn", "filter": "agNumberColumnFilter"}] + \
                 [{"headerName": str(col), "field": str(col), "type": "numericColumn", "filter": "agNumberColumnFilter"}
                  for col in pt.columns if isinstance(col, (int, float))]

    defaults = {"className": "ag-theme-alpine", "defaultColDef": {"flex": 1, "minWidth": 100, "filter": True,
                                                                 "sortable": True, "resizable": True},
                "style": {"height": "500px", "width": "100%"}}

    return AgGrid(columnDefs=columnDefs, rowData=pt.to_dict("records"), **defaults)

def matrix_analysis_page(
    data_frame: pd.DataFrame,
    id_columns: List[str] = ['client_id'],
    row_dimension_columns:List[str] = ['cohort'],
    column_dimension_columns:List[str] = ['relative_month'],
    numerator_columns: List[str] = ['amount_paid'],
    denominator_columns: List[str] = ['initial_balance'],
    filter_columns: List[str] = [],  # e.g. ['cohort']; each column must exist in data_frame
    ) -> vm.Page:
    
    layout = [
        [0]
    ]

    print("DataFrame size before matrix_datatable call:", data_frame.shape)  # Debug print

    components = [
        vm.AgGrid(
            id="matrix",
            figure=matrix_aggrid(data_frame=data_frame)
        )
    ]
    
    controls = [
        vm.Parameter(
                    targets=['matrix.periods_to_show'],
                    selector=vm.RangeSlider(
                          min=0, max=25, step=10, value=[0,15],
                          title="Select relative cohorts to use")
        ),
        vm.Parameter(
                    targets=['matrix.id_column'],
                    selector=vm.Dropdown(
                        options=id_columns,
                        multi=False,
                        value=id_columns[0],
                        title="Choose id column:"
                    )
        ),
        vm.Parameter( 
                    targets=['matrix.row_dimension_column'],
                    selector=vm.Dropdown(
                        options=row_dimension_columns,
                        multi=False,
                        value=row_dimension_columns[0],
                        title="Choose dimension to use as rows:"
                    )
        ),
        vm.Parameter( 
                    targets=['matrix.column_dimension_column'],
                    selector=vm.Dropdown(
                        options=column_dimension_columns,
                        multi=False,
                        value=column_dimension_columns[0],
                        title="Choose dimension to use as columns:"
                    )
        ),
        vm.Parameter( 
                    targets=['matrix.numerator_column'],
                    selector=vm.Dropdown(
                        options=numerator_columns,
                        multi=False,
                        value=numerator_columns[0],
                        title="Choose dimension to use as numerator_column for percentage calculation:"
                    )
        ),
        vm.Parameter( 
                    targets=['matrix.denominator_column'],
                    selector=vm.Dropdown(
                        options=denominator_columns,
                        multi=False,
                        value=denominator_columns[0],
                        title="Choose dimension to use as denominator_column for percentage calculation:"
                    )
        ),
        vm.Parameter(targets=['matrix.cell_values'],
                selector=vm.RadioItems(
                        options=[
                            {'label': 'Numerator', 'value': "numerator_column"},
                            {'label': 'Denominator', 'value': "denominator_column"},
                            {'label': 'Percentage', 'value': "pct"},
                            ],
                        value='numerator_column',
                        title="Select value to show in cells:"
                        )
        )
            ]
    
    for c in filter_columns:
        controls = controls + [vm.Filter(column=c)]

    # Build page
    page = vm.Page(
        title="Cohort Recovery Analysis",
        layout=vm.Layout(grid=layout),
        components=components,
        controls=controls
    )

    return page

if __name__ == '__main__':
    # Generating a sample DataFrame
    df = generate_sample_data()

    page = matrix_analysis_page(data_frame=df)

    dashboard = vm.Dashboard(
        title="Debug example",
        pages=[page],
        theme='vizro_dark'
    )

    vizro_app = Vizro()
    vizro_app.build(dashboard).run(debug=True)

Output

DataFrame size before matrix_datatable call: (10, 5)
Original DataFrame size: (0, 0)
Traceback (most recent call last):
  File "/Users/pablocebral/repos/fence-prd-dashboard/source/viz_modules/debug.py", line 164, in <module>
    vizro_app.build(dashboard).run(debug=True)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pablocebral/.virtualenvs/fence_prd/lib/python3.12/site-packages/vizro/_vizro.py", line 84, in build
    self.dash.layout = dashboard.build()
                       ^^^^^^^^^^^^^^^^^
  File "/Users/pablocebral/.virtualenvs/fence_prd/lib/python3.12/site-packages/vizro/models/_models_utils.py", line 11, in _wrapper
    return_value = method(self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pablocebral/.virtualenvs/fence_prd/lib/python3.12/site-packages/vizro/models/_dashboard.py", line 122, in build
    page.build()  # TODO: ideally remove, but necessary to register slider callbacks
    ^^^^^^^^^^^^
  File "/Users/pablocebral/.virtualenvs/fence_prd/lib/python3.12/site-packages/vizro/models/_models_utils.py", line 11, in _wrapper
    return_value = method(self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pablocebral/.virtualenvs/fence_prd/lib/python3.12/site-packages/vizro/models/_page.py", line 117, in build
    components_container[f"{self.layout.id}_{component_idx}"].children = component.build()
                                                                         ^^^^^^^^^^^^^^^^^
  File "/Users/pablocebral/.virtualenvs/fence_prd/lib/python3.12/site-packages/vizro/models/_components/ag_grid.py", line 114, in build
    html.Div(self.__call__(data_frame=pd.DataFrame()), id=self.id, className="table-container"),
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pablocebral/.virtualenvs/fence_prd/lib/python3.12/site-packages/vizro/models/_components/ag_grid.py", line 55, in __call__
    figure = self.figure(**kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pablocebral/.virtualenvs/fence_prd/lib/python3.12/site-packages/vizro/models/types.py", line 135, in __call__
    return self.__function(**{**self.__bound_arguments, **kwargs})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pablocebral/repos/fence-prd-dashboard/source/viz_modules/debug.py", line 27, in matrix_aggrid
    data_frame[column_dimension_column] = data_frame[column_dimension_column].astype(float)
                                          ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pablocebral/.virtualenvs/fence_prd/lib/python3.12/site-packages/pandas/core/frame.py", line 3893, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pablocebral/.virtualenvs/fence_prd/lib/python3.12/site-packages/pandas/core/indexes/range.py", line 418, in get_loc
    raise KeyError(key)
KeyError: 'relative_month'
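The traceback points at the cause: at build time ag_grid.py calls the figure with data_frame=pd.DataFrame(), and types.py merges the call-time keyword arguments over the bound ones with {**self.__bound_arguments, **kwargs}. Because the later mapping wins in a dict merge, the empty placeholder replaces the frame passed to matrix_aggrid(data_frame=data_frame). The sketch below reproduces that precedence with a hypothetical CapturedCallableSketch class; it is a simplified stand-in, not Vizro's actual CapturedCallable.

```python
import pandas as pd


class CapturedCallableSketch:
    """Hypothetical stand-in for a captured figure function: it stores the
    keyword arguments given at definition time and merges call-time keyword
    arguments on top, mirroring {**self.__bound_arguments, **kwargs}."""

    def __init__(self, function, **bound_arguments):
        self._function = function
        self._bound_arguments = bound_arguments

    def __call__(self, **kwargs):
        # In a dict merge the later mapping wins, so a data_frame passed
        # here replaces the one bound in __init__.
        return self._function(**{**self._bound_arguments, **kwargs})


seen_shapes = []


def needs_relative_month(data_frame):
    # Record what the function actually received, like the debug print
    seen_shapes.append(data_frame.shape)
    return data_frame["relative_month"].astype(float)


real_df = pd.DataFrame({"relative_month": [0, 1, 2]})
figure = CapturedCallableSketch(needs_relative_month, data_frame=real_df)

figure()  # uses the bound 3-row frame and succeeds

caught_key = None
try:
    # Same call shape as ag_grid.py line 114 in the traceback
    figure(data_frame=pd.DataFrame())
except KeyError as exc:
    caught_key = exc.args[0]

print(seen_shapes, caught_key)
```

The first call sees the bound 3-row frame; the second sees a (0, 0) frame with no columns, so indexing "relative_month" raises the same KeyError as in the report.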

@pablo-fence pablo-fence added Issue: Bug Report 🐛 Issue/PR that report/fix a bug Status: Needs triage 🔍 Issue/PR needs triaging labels Apr 24, 2024
@petar-qb
Contributor

Hi @pablo-fence, and thanks for the great question. We will prioritise and investigate this issue in detail next sprint and will then let you know about the results.

In the meantime, could adding the following two lines at the beginning of the matrix_aggrid(...) function solve your problem?

    if data_frame.empty:
        return AgGrid()
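A runnable version of that guard, assuming a simplified function that returns a plain dict in place of dash_ag_grid.AgGrid so it needs only pandas (matrix_grid_sketch is a hypothetical name, not part of the report or of Vizro):

```python
import pandas as pd


def matrix_grid_sketch(data_frame, column_dimension_column="relative_month"):
    """Hypothetical, simplified matrix_aggrid(): returns a plain dict in place
    of dash_ag_grid.AgGrid so the example only needs pandas."""
    # Guard from the thread: return an empty grid for the placeholder frame
    if data_frame.empty:
        return {"columnDefs": [], "rowData": []}  # stands in for AgGrid()
    values = data_frame[column_dimension_column].astype(float)
    return {
        "columnDefs": [{"field": column_dimension_column}],
        "rowData": [{column_dimension_column: v} for v in values],
    }


placeholder = matrix_grid_sketch(pd.DataFrame())  # build-time call: no KeyError
populated = matrix_grid_sketch(pd.DataFrame({"relative_month": [0, 1]}))
print(placeholder, populated)
```

DataFrame.empty is also True for a frame that has columns but no rows, so the same guard returns an empty grid if, for example, a filter removes every row before the function is called.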

@maxschulz-COL maxschulz-COL added Community Issue/PR opened by the open-source community and removed Status: Needs triage 🔍 Issue/PR needs triaging labels Apr 25, 2024
@pablo-fence
Author

Thank you for the prompt response, @petar-qb. Adding what you suggested solved it once the dashboard reloaded a second time.

I'll keep following this to see if there's more to it once you look into it.
