
Drop duplicate rows before returning output df #2

Open
jcweaver opened this issue Dec 14, 2023 · 1 comment

Comments

@jcweaver

I'm not sure whether this is expected, but when I run pred_fl_full_name, pred_census_last_name, and pred_fl_last_name from ethnicolr2, the returned DataFrames contain duplicate rows. I don't think these duplicate rows should be added.
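One way to tell whether a prediction function *adds* rows, as opposed to passing through duplicate names that were already in the input, is to compare per-name row counts before and after the call. The sketch below is a minimal pandas version of that check. It assumes the output keeps the input's `last`/`first` columns; `rows_added` is a hypothetical helper, not part of ethnicolr2, and the toy output frame simply repeats one name to simulate the reported symptom.

```python
import pandas as pd


def rows_added(input_df, output_df, key_cols):
    """Hypothetical helper: per-key count of rows output_df has beyond input_df.

    An empty result means the call did not add rows. Duplicate names that were
    already present in the input are not flagged, because they show up in both
    counts.
    """
    in_counts = input_df.groupby(key_cols).size()
    out_counts = output_df.groupby(key_cols).size()
    # Align on the name keys; keys missing from one side count as 0.
    diff = out_counts.sub(in_counts, fill_value=0)
    return diff[diff > 0]


# Toy frames: the output repeats "zhang simon" to simulate the reported duplicates.
input_df = pd.DataFrame({"last": ["hernandez", "zhang"], "first": ["hector", "simon"]})
output_df = pd.DataFrame(
    {"last": ["hernandez", "zhang", "zhang"], "first": ["hector", "simon", "simon"]}
)

extra = rows_added(input_df, output_df, ["last", "first"])
print(extra)
```

Comparing against the input (rather than just calling `duplicated()` on the output) avoids flagging names that legitimately appear more than once in the caller's data.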

@rajashekar
Member

@jcweaver I checked with the code below and am not seeing duplicate rows. Can you please share a minimal example that reproduces the issue?

import pandas as pd
from ethnicolr2 import pred_fl_full_name

names = [
    {"last": "hernandez", "first": "hector"},
    {"last": "zhang", "first": "simon"},
]
df = pd.DataFrame(names)
# Build the "last first" full-name column the full-name model reads
df["fullname"] = df["last"] + " " + df["first"]
odf = pred_fl_full_name(df, full_name_col="fullname")
pd.set_option("display.max_colwidth", None)
print(odf)

Output:

        last   first  ...     preds                                                                                                                    probs
0  hernandez  hector  ...  hispanic  {'asian': 0.0003889082, 'hispanic': 0.987706, 'nh_black': 0.0012073584, 'nh_white': 0.008855153, 'other': 0.0018425211}
1      zhang   simon  ...     asian   {'asian': 0.936533, 'hispanic': 0.0016647797, 'nh_black': 0.0012903506, 'nh_white': 0.011221227, 'other': 0.049290624}
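If a caller wants to follow the issue title's suggestion and drop duplicates before using a result like the one above, a plain `odf.drop_duplicates()` will not work: the `probs` column holds dicts, and pandas needs hashable values to detect duplicates, so it raises a `TypeError`. The sketch below dedupes on hashable columns only. `drop_duplicate_predictions` and the choice of key columns are assumptions for illustration, not ethnicolr2 behavior; the rows are copied from the output above, with "zhang simon" repeated to simulate the reported duplicate.

```python
import pandas as pd


def drop_duplicate_predictions(odf, key_cols):
    """Hypothetical helper: drop rows that repeat the same values in key_cols.

    key_cols must hold hashable values (names, labels). Dict-valued columns such
    as `probs` are left out of the comparison; the first occurrence is kept.
    """
    return odf.drop_duplicates(subset=key_cols, keep="first").reset_index(drop=True)


hernandez_probs = {"asian": 0.0003889082, "hispanic": 0.987706, "nh_black": 0.0012073584,
                   "nh_white": 0.008855153, "other": 0.0018425211}
zhang_probs = {"asian": 0.936533, "hispanic": 0.0016647797, "nh_black": 0.0012903506,
               "nh_white": 0.011221227, "other": 0.049290624}

# Output-like frame with "zhang simon" repeated to simulate the reported symptom.
odf = pd.DataFrame(
    {
        "last": ["hernandez", "zhang", "zhang"],
        "first": ["hector", "simon", "simon"],
        "preds": ["hispanic", "asian", "asian"],
        "probs": [hernandez_probs, zhang_probs, zhang_probs],
    }
)

plain_failed = False
try:
    odf.drop_duplicates()  # compares every column, including dict-valued `probs`
except TypeError as exc:
    plain_failed = True
    print("plain drop_duplicates failed:", exc)

clean = drop_duplicate_predictions(odf, ["last", "first", "preds"])
print(len(odf), len(clean))  # 3 2
```

Deduping on the name columns assumes identical names always receive identical predictions, which holds for a deterministic model but is worth confirming before relying on it.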
