Drop duplicate rows before returning output df #2

jcweaver · 2023-12-14T23:52:14Z

I'm not sure whether this is expected, but when I run pred_fl_full_name, pred_census_last_name, and pred_fl_last_name from ethnicolr2, the returned DataFrames contain duplicate rows. I don't think these duplicate rows should be added.
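One way to narrow this down is to check whether the extra rows are introduced by the prediction call or were already duplicated in the input. Below is a minimal sketch that uses only standard pandas calls. `duplicate_report` is a hypothetical helper, not part of ethnicolr2, and it assumes the key columns (e.g. the name columns) hold hashable values such as strings:

```python
import pandas as pd


def duplicate_report(input_df: pd.DataFrame, output_df: pd.DataFrame, key_cols: list) -> dict:
    """Summarize duplicate rows before and after a prediction call.

    Hypothetical diagnostic helper; key_cols must be hashable columns
    (e.g. the name columns), not dict-valued ones.
    """
    return {
        # Rows the function added (a negative value means rows were dropped).
        "rows_added": len(output_df) - len(input_df),
        # Rows that repeat an earlier row on the key columns.
        "dup_in_input": int(input_df.duplicated(subset=key_cols).sum()),
        "dup_in_output": int(output_df.duplicated(subset=key_cols).sum()),
    }


inp = pd.DataFrame({"last": ["hernandez", "zhang"], "first": ["hector", "simon"]})
# Simulated output in which the first input row was repeated.
out = pd.concat([inp, inp.iloc[[0]]], ignore_index=True)
print(duplicate_report(inp, out, ["last", "first"]))
# {'rows_added': 1, 'dup_in_input': 0, 'dup_in_output': 1}
```

If `dup_in_input` is already non-zero, the repeated rows come from the input names rather than from the prediction function.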

rajashekar · 2023-12-15T00:48:51Z

@jcweaver - I verified with the code below and am not seeing duplicate rows. Can you please share minimal code to reproduce the issue?

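import pandas as pd
from ethnicolr2 import pred_fl_full_name

names = [
    {"last": "hernandez", "first": "hector"},
    {"last": "zhang", "first": "simon"},
]
df = pd.DataFrame(names)
# Build the full-name column as "last first".
df["fullname"] = df["last"] + " " + df["first"]
odf = pred_fl_full_name(df, full_name_col="fullname")
# Show the whole probs dict instead of truncating it.
pd.set_option("display.max_colwidth", None)
print(odf)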

Output:

        last   first  ...     preds                                                                                                                    probs
0  hernandez  hector  ...  hispanic  {'asian': 0.0003889082, 'hispanic': 0.987706, 'nh_black': 0.0012073584, 'nh_white': 0.008855153, 'other': 0.0018425211}
1      zhang   simon  ...     asian   {'asian': 0.936533, 'hispanic': 0.0016647797, 'nh_black': 0.0012903506, 'nh_white': 0.011221227, 'other': 0.049290624}
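If duplicates are dropped on the caller's side, as the issue title suggests, note that the `probs` column in the output above holds Python dicts. Dicts are unhashable, so a bare `DataFrame.drop_duplicates()` on such a frame raises `TypeError: unhashable type: 'dict'`. A minimal sketch, assuming an output frame shaped like the one above: deduplicate on every column whose values are all hashable. `drop_duplicate_predictions` is a hypothetical name, not an ethnicolr2 function, and the `probs` values in the usage example are toy numbers, not real model output.

```python
from collections.abc import Hashable

import pandas as pd


def drop_duplicate_predictions(df: pd.DataFrame) -> pd.DataFrame:
    """Drop repeated rows from a prediction output frame (hypothetical helper).

    Columns containing unhashable values (e.g. the dict-valued "probs")
    are left out of the duplicate check.
    """
    hashable_cols = [
        col for col in df.columns
        if df[col].map(lambda v: isinstance(v, Hashable)).all()
    ]
    if not hashable_cols:
        # Nothing safe to compare on; return the frame unchanged.
        return df.reset_index(drop=True)
    # Keep the first occurrence and renumber the index 0..n-1.
    return df.drop_duplicates(subset=hashable_cols).reset_index(drop=True)


out = pd.DataFrame({
    "last": ["hernandez", "zhang", "hernandez"],
    "first": ["hector", "simon", "hector"],
    "preds": ["hispanic", "asian", "hispanic"],
    # Toy values, not real model output.
    "probs": [{"hispanic": 0.9}, {"asian": 0.9}, {"hispanic": 0.9}],
})
print(len(drop_duplicate_predictions(out)))  # 2
```

Leaving `probs` out of the comparison assumes identical names always receive identical probabilities; an alternative is to convert each dict to a sorted tuple of its items before calling `drop_duplicates`.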
