create_report crashes for almost empty string columns #963

Open
AndrejIring opened this issue May 24, 2023 · 1 comment
Labels
type: bug Something isn't working

@AndrejIring

Describe the bug
create_report crashes on large DataFrames that contain almost empty string columns, with the error 'Series' object has no attribute 'len'.

After investigating, I found that the error occurs in _calc_nom_stats, during the calculation of the mean length of the elements. It is caused by a partition that contains only NaN values: nom_comps drops those values, which leaves an empty Dask partition behind. When compute is called afterwards, the error is raised.
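To make the mechanism concrete, here is a minimal pure-pandas model. It assumes that a list of contiguous pandas Series can stand in for Dask partitions; split_into_partitions is a hypothetical helper, not part of dataprep or Dask. It shows that calling dropna() on each partition of an almost empty column leaves most partitions with no rows at all.

```python
from typing import List

import numpy as np
import pandas as pd


def split_into_partitions(srs: pd.Series, npartitions: int) -> List[pd.Series]:
    """Hypothetical stand-in for Dask partitioning: contiguous row chunks."""
    bounds = np.linspace(0, len(srs), npartitions + 1, dtype=int)
    return [srs.iloc[bounds[i]:bounds[i + 1]] for i in range(npartitions)]


# A column that is NaN everywhere except row 0, like "almost_empty_col".
srs = pd.Series(["single value"] + [np.nan] * 9)

# Drop NaN values per partition, as nom_comps does before the length statistics.
parts = [p.dropna() for p in split_into_partitions(srs, 5)]
print([len(p) for p in parts])  # [1, 0, 0, 0, 0]
```

Only the first partition keeps a value; the other four are empty, which is the situation the report identifies as the trigger for the crash in the Dask computation.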

To Reproduce

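import pandas as pd
import numpy as np
from dataprep.eda import create_report

# 300,000 rows x 100 integer columns, so the data spans many partitions
df = pd.DataFrame(np.random.randint(-100, 100, (300000, 100)))
# New string column: "single value" in row 0, NaN in every other row
df.loc[0, "almost_empty_col"] = "single value"
report = create_report(df)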

Expected behavior

Empty partitions should be handled during the create_report calculations. In particular, nom_comps should drop empty partitions after calling srs = srs.dropna().
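The proposed fix can be sketched in the same pure-pandas model, again assuming a list of Series stands in for Dask partitions. drop_empty_partitions and mean_length are hypothetical helpers for illustration, not dataprep's or Dask's API; in Dask itself the filter would have to be driven by partition lengths computed from the Dask series, which is not shown here.

```python
from typing import List

import pandas as pd


def drop_empty_partitions(parts: List[pd.Series]) -> List[pd.Series]:
    """Hypothetical helper: keep only partitions that still hold data after dropna()."""
    return [p for p in parts if not p.empty]


def mean_length(parts: List[pd.Series]) -> float:
    """Hypothetical helper: mean string length from per-partition sums and counts."""
    total = sum(int(p.str.len().sum()) for p in parts)
    count = sum(len(p) for p in parts)
    # A column with no non-null values has no defined mean length.
    return total / count if count else float("nan")


# Partitions after dropna(): one holds a value, the other four are empty.
parts = [pd.Series(["single value"])] + [pd.Series([], dtype=object) for _ in range(4)]
kept = drop_empty_partitions(parts)
print(len(kept), mean_length(kept))  # 1 12.0
```

Combining per-partition sums and counts, rather than averaging per-partition means, keeps the result correct when partitions hold different numbers of values.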

Desktop:

  • OS: Linux Ubuntu 22.04 LTS
  • Platform: Python script
  • Python version: 3.8.16
  • Dataprep version: 0.4.5
AndrejIring added the type: bug label on May 24, 2023
@dovahcrow (Member)

Hi @AndrejIring, thanks for the bug report and the detailed analysis of the cause! I'll look into a fix.
