Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

frame _apply_standard error when operating on 0 or NaN values #6246

Closed
venuktan opened this issue Feb 4, 2014 · 3 comments
Closed

frame _apply_standard error when operating on 0 or NaN values #6246

venuktan opened this issue Feb 4, 2014 · 3 comments

Comments

@venuktan
Copy link

venuktan commented Feb 4, 2014

Here are the steps I followed. I am using pandas (0.12.0) .

In [1]: import pandas as pd
In [4]: dataFrame = pd.read_csv('./test.csv')
In [7]: dataFrame
Out[7]:
    r1   r2  r3  r4  r5
0  NaN  3.5 NaN NaN   5
1  4.5  NaN   4 NaN NaN
2  1.5  NaN NaN NaN NaN
3  NaN  NaN NaN NaN NaN
4  NaN  NaN NaN NaN NaN
5  4.5  NaN   4 NaN NaN
6  NaN  NaN NaN NaN NaN

In [8]: dataFrame['mean'] = dataFrame.mean(axis=1)
In [9]: dataFrame
Out[9]:
    r1   r2  r3  r4  r5  mean
0  NaN  3.5 NaN NaN   5  4.25
1  4.5  NaN   4 NaN NaN  4.25
2  1.5  NaN NaN NaN NaN  1.50
3  NaN  NaN NaN NaN NaN   NaN
4  NaN  NaN NaN NaN NaN   NaN
5  4.5  NaN   4 NaN NaN  4.25
6  NaN  NaN NaN NaN NaN   NaN
In [10]: dataFrame.dtypes
Out[10]:
r1      float64
r2      float64
r3      float64
r4      float64
r5      float64
mean    float64
dtype: object

In [11]: meanCenteredDataFrame = dataFrame.apply(lambda x: x -x['mean'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-e6cc746e933b> in <module>()
----> 1 meanCenteredDataFrame = dataFrame.apply(lambda x: x -x['mean'])

//anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds)
   4414                     return self._apply_raw(f, axis)
   4415                 else:
-> 4416                     return self._apply_standard(f, axis)
   4417             else:
   4418                 return self._apply_broadcast(f, axis)

//anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures)
   4489                     # no k defined yet
   4490                     pass
-> 4491                 raise e
   4492
   4493

KeyError: ('mean', u'occurred at index r1')

In [12]: dataFrame.fillna(0,inplace=True)

In [13]: meanCenteredDataFrame = dataFrame.apply(lambda x: x -x['mean'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-13-e6cc746e933b> in <module>()
----> 1 meanCenteredDataFrame = dataFrame.apply(lambda x: x -x['mean'])

Type:       DataFrame
String Form:
r1   r2  r3  r4  r5  mean
           0  0.0  3.5   0   0   5  4.25
//anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds)
   4414                     return self._apply_raw(f, axis)
   4415                 else:
-> 4416                     return self._apply_standard(f, axis)
   4417             else:
   4418                 return self._apply_broadcast(f, axis)

//anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures)
   4489                     # no k defined yet
   4490                     pass
-> 4491                 raise e
   4492
   4493

KeyError: ('mean', u'occurred at index r1')

@TomAugspurger
Copy link
Contributor

I think you want to use the axis=1 keyword argument in the apply:

In [21]: df
Out[21]: 
    r1   r2  r3  r4  r5  mean
0  NaN  3.5 NaN NaN   5  4.25
1  4.5  NaN   4 NaN NaN  4.25
2  1.5  NaN NaN NaN NaN  1.50
3  NaN  NaN NaN NaN NaN   NaN
4  NaN  NaN NaN NaN NaN   NaN
5  4.5  NaN   4 NaN NaN  4.25
6  NaN  NaN NaN NaN NaN   NaN

[7 rows x 6 columns]

In [22]: df.apply(lambda x: x - x['mean'], axis=1)
Out[22]: 
     r1    r2    r3  r4    r5  mean
0   NaN -0.75   NaN NaN  0.75     0
1  0.25   NaN -0.25 NaN   NaN     0
2  0.00   NaN   NaN NaN   NaN     0
3   NaN   NaN   NaN NaN   NaN   NaN
4   NaN   NaN   NaN NaN   NaN   NaN
5  0.25   NaN -0.25 NaN   NaN     0
6   NaN   NaN   NaN NaN   NaN   NaN

[7 rows x 6 columns]

df.apply operates column wise be default (axis=0), so when starts on column r1 it select mean.

@jreback
Copy link
Contributor

jreback commented Feb 4, 2014

even better is to do

df.sub(df['mean'],axis='index')

see http://pandas.pydata.org/pandas-docs/dev/basics.html#matching-broadcasting-behavior

@venuktan
Copy link
Author

venuktan commented Feb 4, 2014

@jreback
thats working . Thank you

@venuktan venuktan closed this as completed Feb 4, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants