BUG: groupby.pct_change() does not work properly in Pandas 0.23.0. Grouping is ignored. #21200

Closed
Pferdow30 opened this issue May 25, 2018 · 4 comments · Fixed by #21235

@Pferdow30

Code Sample

>>> import pandas as pd
>>> import numpy as np

>>> df = pd.DataFrame(data=np.random.rand(8, 1), columns=['a'])
>>> df['grp'] = 1
>>> df.loc[::2, 'grp'] = 2
>>> df['%_groupby'] = df.groupby('grp')['a'].pct_change()
>>> df['%_shift'] = df.groupby('grp')['a'].shift(0) / df.groupby('grp')['a'].shift(1) - 1
>>> print(df)

Problem description

When a DataFrame contains several groups, calling pct_change on a groupby object is expected to compute the percentage change within each group. In pandas 0.23.0 it does not: the grouping is ignored and the change is computed between consecutive rows of the whole column.

Output:

     a  grp  %_groupby   %_shift
0  1.0    2        NaN       NaN
1  1.1    1   0.100000       NaN
2  1.2    2   0.090909  0.200000
3  1.3    1   0.083333  0.181818
4  1.4    2   0.076923  0.166667
5  1.5    1   0.071429  0.153846
6  1.6    2   0.066667  0.142857
7  1.7    1   0.062500  0.133333

Expected Output

     a  grp  %_groupby   %_shift
0  1.0    2        NaN       NaN
1  1.1    1        NaN       NaN
2  1.2    2   0.200000  0.200000
3  1.3    1   0.181818  0.181818
4  1.4    2   0.166667  0.166667
5  1.5    1   0.153846  0.153846
6  1.6    2   0.142857  0.142857
7  1.7    1   0.133333  0.133333
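
The expected values above can be checked with a deterministic version of the code sample. This is only a sketch: the fixed values 1.0 to 1.7 are taken from the printed tables rather than from np.random.rand, and the column name %_ungrouped is introduced here for illustration. It suggests that the reported %_groupby column is what a plain, ungrouped pct_change would give.

```python
import pandas as pd

# Fixed data copied from the tables in the report (assumption: the
# original run happened to use these values).
df = pd.DataFrame({"a": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]})
df["grp"] = 1
df.loc[::2, "grp"] = 2

# Per-group percentage change built from a grouped shift, as in the report.
df["%_shift"] = df["a"] / df.groupby("grp")["a"].shift(1) - 1

# Ungrouped percentage change: consecutive rows, grouping ignored.
df["%_ungrouped"] = df["a"].pct_change()

print(df)
```

Here %_shift reproduces the Expected Output column (two leading NaN values, then 0.200000, 0.181818, ...), while %_ungrouped reproduces the values reported under %_groupby (0.100000, 0.090909, ...).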

Output of pd.show_versions()

INSTALLED VERSIONS


commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.14.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@simonariddell
Contributor

I can see that the pct_change function in groupby.py (around line 3944) does not implement this properly, whereas the method it overrides implements it properly for a DataFrame. I'd like to think this should be relatively straightforward to remedy.
I'll take a crack at a PR for this, although I haven't contributed to pandas before, so we'll see if I am able to complete it in a timely manner.

@jreback
Contributor

jreback commented May 25, 2018

maybe related to #11811

@jreback changed the title from "groupby followed by pct_change does not work properly in Pandas 0.23.0. Grouping is ignored." to "BUG: groupby.pct_change() does not work properly in Pandas 0.23.0. Grouping is ignored." on May 25, 2018
@jreback added this to the Next Major Release milestone on May 25, 2018
@ZenW00kie

I found something along these lines when shifting in reverse, i.e. when passing a negative periods value:

import pandas_datareader.data as web
import pandas as pd

tickers = ['F','AAPL','NFLX','AMZN','GOOG']

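# Collect one frame per ticker and concatenate once
# (DataFrame.append was removed in later pandas releases).
frames = []
for ticker in tickers:
    data = web.DataReader(ticker, 'iex', '2018-01-01', '2018-06-01')
    data['ticker'] = ticker
    frames.append(data)
df = pd.concat(frames)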

df = df.reset_index()
df['5_day_growth'] = df.groupby('ticker').close.pct_change(periods=-5)
df['5_day_growth_alt'] = df.groupby('ticker').close.pct_change(periods=5).shift(-5)

The alternate method, which computes a forward-looking 5-period change and then shifts it back, gives the output I expected; passing periods=-5 directly to pct_change does not.

print(df[['date','ticker','close','5_day_growth', '5_day_growth_alt']].head(6))

          date ticker    close  5_day_growth  5_day_growth_alt
0  2018-01-02      F  12.1939     -0.032115          0.033181
1  2018-01-03      F  12.2903     -0.020717          0.021155
2  2018-01-04      F  12.5022     -0.013672          0.013862
3  2018-01-05      F  12.7141     -0.002268          0.002273
4  2018-01-08      F  12.6659      0.003820         -0.003805
5  2018-01-09      F  12.5985      0.073894         -0.068810
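
The two columns can be compared on a single series to see what each one computes. The sketch below uses only the six close prices printed above and plain Series.pct_change, with no groupby; the names backward and forward are illustrative. With a negative period, pct_change divides the current value by the later one, so the two columns measure different ratios and are not simply sign-flipped copies of each other.

```python
import pandas as pd

# Close prices for F copied from the table above.
close = pd.Series([12.1939, 12.2903, 12.5022, 12.7141, 12.6659, 12.5985])

# periods=-5: close[t] / close[t + 5] - 1
backward = close.pct_change(periods=-5)

# periods=5 then shift(-5): close[t + 5] / close[t] - 1
forward = close.pct_change(periods=5).shift(-5)

print(backward.iloc[0], close.iloc[0] / close.iloc[5] - 1)
print(forward.iloc[0], close.iloc[5] / close.iloc[0] - 1)
```

For row 0 this gives about -0.032115 and 0.033181, matching the 5_day_growth and 5_day_growth_alt columns. Note that in the grouped version the trailing .shift(-5) is applied to the whole column rather than per ticker, so values near the boundary between two tickers may deserve a second look.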

@WillKoehrsen

WillKoehrsen commented Jun 28, 2018

A workaround for this is using apply. This should produce the desired result:

df['%_groupby'] = df.groupby('grp')['a'].apply(lambda x: x.pct_change())
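
A small self-contained check of this workaround, under a few assumptions: the four-row frame is made up from the values in the report, and transform is shown alongside apply as an alternative that always returns a result aligned with the original index. In newer pandas versions apply may prepend the group keys to the result index, so group_keys=False is passed here as a precaution.

```python
import pandas as pd

# Small illustrative frame (values borrowed from the report's tables).
df = pd.DataFrame({"a": [1.0, 1.1, 1.2, 1.3], "grp": [2, 1, 2, 1]})

# transform keeps the original index, so the result can be assigned directly.
via_transform = df.groupby("grp")["a"].transform(lambda s: s.pct_change())

# apply-based workaround from the comment above; group_keys=False is a
# precaution so the result keeps the original row labels.
via_apply = df.groupby("grp", group_keys=False)["a"].apply(lambda s: s.pct_change())

df["%_groupby"] = via_transform
print(df)
print(via_apply.sort_index())
```

Both variants should give NaN for the first row of each group and the within-group change (0.2 for row 2, about 0.181818 for row 3) afterwards.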

matthewgilbert added a commit to matthewgilbert/strategy that referenced this issue Aug 22, 2018
There was a bug introduced in pandas 0.23.* using pct_change() on a
groupby. Details at pandas-dev/pandas#21200
@jreback modified the milestones: Contributions Welcome, 0.24.0 on Dec 12, 2018