Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suffixes ignored on second merge #9458

Closed
MatzeB opened this issue Feb 10, 2015 · 3 comments
Closed

Suffixes ignored on second merge #9458

MatzeB opened this issue Feb 10, 2015 · 3 comments
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Usage Question

Comments

@MatzeB
Copy link

MatzeB commented Feb 10, 2015

I am trying to merge multiple dataframe with consecutive merge operations, I want to add a suffix to the name of the newly merged column names. A simplified version of my code looks like this:

from pandas import *
f0 = DataFrame(columns=['data'], data=[1,2,3], index=['a','b','c'])
f1 = DataFrame(columns=['data'], data=[4,5,6], index=['c','b','a'])
f2 = DataFrame(columns=['data'], data=[7,8,9], index=['a','c','b'])
merged = f0
merged = merged.merge(f1, left_index=True, right_index=True, suffixes=("_0", "_1"))
merged = merged.merge(f2, left_index=True, right_index=True, suffixes=("", "_2"))
print merged.columns

With pandas 0.15.2 on python 2.7 this returns:

Index([u'data_0', u'data_1', u'data'], dtype='object')

while I would have expected

Index([u'data_0', u'data_1', u'data_2'], dtype='object')
@jreback
Copy link
Contributor

jreback commented Feb 10, 2015

seems you just want this?

In [45]: pd.concat([f0,f1,f2],axis=1,ignore_index=True)
Out[45]: 
   0  1  2
a  1  6  7
b  2  5  9
c  3  4  8

suffixes only apply if there are duplicate columns which after the first merge there are not (when compared to the 3rd)

In [46]: merged1 = merged.merge(f1, left_index=True, right_index=True, suffixes=("_0", "_1"))

In [47]: merged1
Out[47]: 
   data_0  data_1
c       3       4
b       2       5
a       1       6

In [48]: merged1.merge(f2, left_index=True, right_index=True, suffixes=("", "_2"))
Out[48]: 
   data_0  data_1  data
a       1       6     7
c       3       4     8
b       2       5     9

@jreback jreback closed this as completed Feb 10, 2015
@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Usage Question labels Feb 10, 2015
@MatzeB
Copy link
Author

MatzeB commented Feb 10, 2015

Ah thanks for the explanation, I missed the fact that suffixes only get applied to duplicate column names. And indeed the concat solution is simpler. And for the record: concat(... ignore_index=True) is exactly the opposite of what I want, but with concat(... ignore_index=False) it works nicely.

@jreback
Copy link
Contributor

jreback commented Feb 10, 2015

gr8

keep in mind that you generally don't want to have duplicate columns

you might want a multi level result - use the keys argument to concat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Usage Question
Projects
None yet
Development

No branches or pull requests

2 participants