Merging/Joining dataframes, and dropping columns leaves the key column. #48

Open
villerantanen opened this issue May 16, 2023 · 0 comments

Comments

@villerantanen

Environment

  • Operating System: Linux
  • Python Version: Python 3.9.16
  • How did you install bamboolib: pip
  • Python packages:
Package                   Version
------------------------- ---------
aiofiles                  22.1.0
aiosqlite                 0.18.0
ansi                      0.3.6
anyio                     3.6.2
argon2-cffi               21.3.0
argon2-cffi-bindings      21.2.0
arrow                     1.2.3
astroid                   2.13.5
asttokens                 2.2.1
astunparse                1.6.3
attrs                     22.2.0
Babel                     2.11.0
backcall                  0.2.0
bamboolib                 1.30.19
beautifulsoup4            4.11.2
binaryornot               0.4.4
bleach                    6.0.0
boto3                     1.26.47
botocore                  1.29.63
certifi                   2022.12.7
cffi                      1.15.1
chardet                   5.1.0
charset-normalizer        3.1.0
cleverdict                1.9.2
click                     8.1.3
click-plugins             1.1.1
cligj                     0.7.2
colorama                  0.4.6
comm                      0.1.2
conda-pack                0.7.0
cookiecutter              2.1.1
cryptography              39.0.0
cx-Oracle                 8.3.0
cycler                    0.11.0
damenu                    0.3.1
DAutils                   0.3
debugpy                   1.6.6
decorator                 5.1.1
defusedxml                0.7.1
descartes                 1.1.0
dill                      0.3.6
et-xmlfile                1.1.0
executing                 1.2.0
fastjsonschema            2.16.2
Fiona                     1.9.0
flake8                    6.0.0
fonttools                 4.38.0
fqdn                      1.5.1
fsspec                    2023.1.0
future                    0.18.3
geopandas                 0.12.2
gitdb                     4.0.10
GitPython                 3.1.30
graphviz                  0.20.1
greenlet                  2.0.2
idna                      3.4
importlib-metadata        6.0.0
ipykernel                 6.21.1
ipyslickgrid              0.0.3
ipython                   8.9.0
ipython-genutils          0.2.0
ipywidgets                7.7.2
isoduration               20.11.0
isort                     5.12.0
jedi                      0.18.2
Jinja2                    3.1.2
jinja2-time               0.2.0
jmespath                  1.0.1
joblib                    1.2.0
json5                     0.9.11
jsonpointer               2.3
jsonschema                4.17.3
jupyter_client            8.0.3
jupyter_core              5.2.0
jupyter-events            0.5.0
jupyter_server            2.0.6
jupyter_server_fileid     0.6.0
jupyter-server-mathjax    0.2.6
jupyter_server_terminals  0.4.4
jupyter_server_ydoc       0.6.1
jupyter-ydoc              0.2.2
jupyterlab                3.5.2
jupyterlab-git            0.41.0
jupyterlab-pygments       0.2.2
jupyterlab_server         2.19.0
jupyterlab-widgets        1.1.1
kiwisolver                1.4.4
lazy-object-proxy         1.9.0
MarkupSafe                2.1.2
matplotlib                3.5.3
matplotlib-inline         0.1.6
mccabe                    0.7.0
mistune                   2.0.5
munch                     2.5.0
nbclassic                 0.5.1
nbclient                  0.7.2
nbconvert                 7.2.9
nbdime                    3.1.1
nbformat                  5.7.3
nest-asyncio              1.5.6
notebook                  6.5.2
notebook_shim             0.2.2
numpy                     1.24.1
openpyxl                  3.0.10
packaging                 23.0
pandarallel               1.6.4
pandas                    1.5.2
pandocfilters             1.5.0
parso                     0.8.3
patsy                     0.5.3
pexpect                   4.8.0
pickleshare               0.7.5
Pillow                    9.4.0
pip                       22.3.1
platformdirs              3.0.0
plotly                    5.11.0
ppscore                   1.3.0
prometheus-client         0.16.0
prompt-toolkit            3.0.36
psutil                    5.9.4
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   10.0.1
pycodestyle               2.10.0
pycparser                 2.21
pyflakes                  3.0.1
pyflowchart               0.2.3
Pygments                  2.14.0
pylint                    2.15.10
pyparsing                 3.0.9
pyproj                    3.4.1
pyrsistent                0.19.3
python-dateutil           2.8.2
python-dotenv             0.21.0
python-highcharts         0.4.2
python-json-logger        2.0.4
python-slugify            8.0.1
pytz                      2022.7.1
PyYAML                    6.0
pyzmq                     25.0.0
requests                  2.28.2
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
s3fs                      0.4.2
s3transfer                0.6.0
scikit-learn              1.2.1
scipy                     1.10.0
Send2Trash                1.8.0
setuptools                65.6.3
shapely                   2.0.1
simple-term-menu          1.5.2
six                       1.16.0
smmap                     5.0.0
sniffio                   1.3.0
soupsieve                 2.4
SQLAlchemy                1.4.46
sqlalchemy-vertica-python 0.5.10
stack-data                0.6.2
statsmodels               0.13.5
tenacity                  8.1.0
terminado                 0.17.1
text-unidecode            1.3
threadpoolctl             3.1.0
tinycss2                  1.2.1
toml                      0.10.2
tomli                     2.0.1
tomlkit                   0.11.6
tornado                   6.2
tqdm                      4.64.1
traitlets                 5.9.0
typing_extensions         4.4.0
uri-template              1.2.0
urllib3                   1.26.14
vertica-python            1.2.0
verticapy                 0.12.0
wcwidth                   0.2.6
webcolors                 1.12
webencodings              0.5.1
websocket-client          1.5.0
wheel                     0.37.1
widgetsnbextension        3.6.1
wrapt                     1.14.1
xlrd                      2.0.1
xlwt                      1.3.0
y-py                      0.5.5
ypy-websocket             0.8.2
zipp                      3.12.0
  • If bamboolib is used with JupyterLab:
        jupyterlab-plotly v5.11.0 enabled OK
        jupyterlab_pygments v0.2.2 enabled OK (python, jupyterlab_pygments)
        nbdime-jupyterlab v2.1.1 enabled OK
        @jupyterlab/git v0.41.0 enabled OK (python, jupyterlab-git)
        @jupyter-widgets/jupyterlab-manager v3.1.1 enabled OK (python, jupyterlab_widgets)

Description of Issue

I'm joining two dataframes, person and track. Their key columns are named trackingid and tracking_id, respectively.

By default, the merge keeps both key columns in the output, even though their contents are duplicates. If I select "Drop some columns" and choose the key column, bamboolib keeps the key anyway, since it is required for the merge. The generated code ends up with an empty drop list:

# Step: Inner Join with track where trackingid=tracking_id
person2 = pd.merge(person, track.drop(columns=[]), how='inner', left_on=['trackingid'], right_on=['tracking_id'])

This behavior is counterintuitive. The transformation should instead drop the selected columns after the join.
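
For illustration, here is a minimal sketch of the output I would expect: merge first, then drop the redundant key from the result. The sample rows and the name and distance columns are made up for this example. Only the dataframe names and the two key column names come from my actual case, and this is not necessarily how bamboolib would need to generate it.

```python
import pandas as pd

# Hypothetical sample data; only the key column names match the report.
person = pd.DataFrame({"trackingid": [1, 2, 3], "name": ["a", "b", "c"]})
track = pd.DataFrame({"tracking_id": [2, 3, 4], "distance": [10.0, 20.0, 30.0]})

# Join on the differently named keys; both key columns are still present here.
person2 = pd.merge(
    person,
    track,
    how="inner",
    left_on=["trackingid"],
    right_on=["tracking_id"],
)

# Drop the duplicate key only after the join, when it is no longer needed.
person2 = person2.drop(columns=["tracking_id"])

print(list(person2.columns))
```

Dropping after the merge avoids the conflict entirely, because the key is only required while the join runs.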

Reproduction Steps

  1. Join any two tables.
  2. Try to drop the key column.

What steps have you taken to resolve this already?

Anything else?

...
