Merging/Joining dataframes, and dropping columns leaves the key column. #48

villerantanen opened this issue May 16, 2023 · 0 comments


  • Operating System: Linux
  • Python Version: Python 3.9.16
  • How did you install bamboolib: pip
Description of Issue

I'm joining two dataframes, person and track. Their key columns are named trackingid and tracking_id, respectively.

By default, the merge operation keeps both key columns in the output, even though their contents are identical after the join. If I select "Drop some columns" and pick the key column, Bamboolib silently keeps it, since it's required for the merge. The generated code then drops nothing:

# Step: Inner Join with track where trackingid=tracking_id
person2 = pd.merge(person, track.drop(columns=[]), how='inner', left_on=['trackingid'], right_on=['tracking_id'])

This behavior is counterintuitive. The transformation should perform the join first and drop the selected columns afterwards.
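A minimal sketch of the output I'd expect, written in plain pandas. The sample rows and the name/title columns are made up for illustration; only the dataframe names and key column names come from my actual case. It is not meant as the way Bamboolib has to generate the code, just one way to get the result.

```python
import pandas as pd

# Hypothetical sample data; only the key column names match the report.
person = pd.DataFrame({"trackingid": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
track = pd.DataFrame({"tracking_id": [2, 3, 4], "title": ["x", "y", "z"]})

# Current generated step: drop(columns=[]) removes nothing,
# so both key columns end up in the result.
current = pd.merge(person, track.drop(columns=[]), how="inner",
                   left_on=["trackingid"], right_on=["tracking_id"])

# Expected: join first, then drop the redundant right-hand key.
person2 = pd.merge(person, track, how="inner",
                   left_on=["trackingid"], right_on=["tracking_id"])
person2 = person2.drop(columns=["tracking_id"])

print(list(current.columns))
print(list(person2.columns))
```

Dropping after the merge keeps the key available for the join itself, which seems to be the reason the key is currently protected from removal.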

Reproduction Steps

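  1. Join any two tables whose key columns have different names.
  2. In the join, select "Drop some columns" and pick the key column.
  3. Check the generated code and the result: the drop list is empty and the key column is still present.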

What steps have you taken to resolve this already?

