Chain function produces incorrect indices if period missing #39

Open
pricemg opened this issue Oct 27, 2021 · 0 comments

pricemg commented Oct 27, 2021

The chain function does not handle missing periods correctly, but it still produces a result.

import pandas as pd
from pandas import Timestamp
import precon

df_all_periods = pd.DataFrame.from_records([
        (Timestamp('2018-01-01'), 100.000000),
        (Timestamp('2018-02-01'), 100.527400),
        (Timestamp('2018-03-01'), 100.894000),
        (Timestamp('2018-04-01'), 100.689100),
        (Timestamp('2018-05-01'), 102.670400),
        (Timestamp('2018-06-01'), 100.811000),
        (Timestamp('2018-07-01'), 102.632500),
        (Timestamp('2018-08-01'), 103.133200),
        (Timestamp('2018-09-01'), 103.111400),
        (Timestamp('2018-10-01'), 103.417700),
        (Timestamp('2018-11-01'), 103.155800),
        (Timestamp('2018-12-01'), 103.616800),
        (Timestamp('2019-01-01'), 104.246480),
        (Timestamp('2019-02-01'), 101.093900),
        (Timestamp('2019-03-01'), 101.726900),
        (Timestamp('2019-04-01'), 100.478600),  # April 2019 value present
        (Timestamp('2019-05-01'), 100.647800),
        (Timestamp('2019-06-01'), 100.439100),
        (Timestamp('2019-07-01'), 102.181900),
        (Timestamp('2019-08-01'), 100.608800),
        (Timestamp('2019-09-01'), 102.067000),
        (Timestamp('2019-10-01'), 102.418300),
        (Timestamp('2019-11-01'), 102.769600),
        (Timestamp('2019-12-01'), 103.120900),
        (Timestamp('2020-01-01'), 103.519414),
        (Timestamp('2020-02-01'), 100.710500),
    ],
    columns=('period', 'index_value'),
).set_index('period')

df_period_missing = pd.DataFrame.from_records([
        (Timestamp('2018-01-01'), 100.000000),
        (Timestamp('2018-02-01'), 100.527400),
        (Timestamp('2018-03-01'), 100.894000),
        (Timestamp('2018-04-01'), 100.689100),
        (Timestamp('2018-05-01'), 102.670400),
        (Timestamp('2018-06-01'), 100.811000),
        (Timestamp('2018-07-01'), 102.632500),
        (Timestamp('2018-08-01'), 103.133200),
        (Timestamp('2018-09-01'), 103.111400),
        (Timestamp('2018-10-01'), 103.417700),
        (Timestamp('2018-11-01'), 103.155800),
        (Timestamp('2018-12-01'), 103.616800),
        (Timestamp('2019-01-01'), 104.246480),
        (Timestamp('2019-02-01'), 101.093900),
        (Timestamp('2019-03-01'), 101.726900),
        (Timestamp('2019-04-01'), None),  # April 2019 value missing
        (Timestamp('2019-05-01'), 100.647800),
        (Timestamp('2019-06-01'), 100.439100),
        (Timestamp('2019-07-01'), 102.181900),
        (Timestamp('2019-08-01'), 100.608800),
        (Timestamp('2019-09-01'), 102.067000),
        (Timestamp('2019-10-01'), 102.418300),
        (Timestamp('2019-11-01'), 102.769600),
        (Timestamp('2019-12-01'), 103.120900),
        (Timestamp('2020-01-01'), 103.519414),
        (Timestamp('2020-02-01'), 100.710500),
    ],
    columns=('period', 'index_value'),
).set_index('period')

expected = pd.DataFrame.from_records([
        (Timestamp('2018-01-01'), 100.000000),
        (Timestamp('2018-02-01'), 100.527400),
        (Timestamp('2018-03-01'), 100.894000),
        (Timestamp('2018-04-01'), 100.689100),
        (Timestamp('2018-05-01'), 102.670400),
        (Timestamp('2018-06-01'), 100.811000),
        (Timestamp('2018-07-01'), 102.632500),
        (Timestamp('2018-08-01'), 103.133200),
        (Timestamp('2018-09-01'), 103.111400),
        (Timestamp('2018-10-01'), 103.417700),
        (Timestamp('2018-11-01'), 103.155800),
        (Timestamp('2018-12-01'), 103.616800),
        (Timestamp('2019-01-01'), 104.246480),
        (Timestamp('2019-02-01'), 105.386833),
        (Timestamp('2019-03-01'), 106.046713),
        (Timestamp('2019-04-01'), 104.745404),
        (Timestamp('2019-05-01'), 104.921789),
        (Timestamp('2019-06-01'), 104.704227),
        (Timestamp('2019-07-01'), 106.521034),
        (Timestamp('2019-08-01'), 104.881133),
        (Timestamp('2019-09-01'), 106.401255),
        (Timestamp('2019-10-01'), 106.767473),
        (Timestamp('2019-11-01'), 107.133691),
        (Timestamp('2019-12-01'), 107.499909),
        (Timestamp('2020-01-01'), 107.915346),
        (Timestamp('2020-02-01'), 108.682084),
    ],
    columns=('period', 'index_value'),
).set_index('period')

df_all_periods['chained'] = precon.chain(df_all_periods)

df_period_missing['chained'] = precon.chain(df_period_missing)

pd.concat([df_all_periods, df_period_missing, expected], keys=['all_periods', 'period_missing', 'expected'], axis=1)

In the example above, expected is calculated as if all periods were present, but using the equation unlinked index * linked base / 100, so the chained indices after the missing period are not affected. precon.chain does not raise an error on the missing period, because it backfills after shifting the indices by one period, which is how it fills in the first month.
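
For reference, here is a minimal pure-Python sketch of the unlinked index * linked base / 100 calculation described above. It is not how precon.chain is implemented. The function name chain_unaffected_by_gaps is hypothetical, and the sketch assumes that each January value is expressed relative to the previous January = 100 and therefore becomes the linked base for the rest of that year. A missing value stays missing and does not change later periods. If a January value is missing, the base for that year is unknown, so this sketch leaves the following periods as None.

```python
from datetime import date


def chain_unaffected_by_gaps(unlinked):
    """Chain (period, value) pairs as unlinked index * linked base / 100.

    Hypothetical helper for illustration only, not part of precon.
    Assumes January is the link month and the first base is 100.
    """
    chained = []
    base = 100.0  # linked base for the first year (assumption)
    for period, value in sorted(unlinked, key=lambda row: row[0]):
        if value is None or base is None:
            linked = None  # missing input, or no known base for this year
        else:
            linked = value * base / 100
        chained.append((period, linked))
        if period.month == 1:
            # The chained January value is the base for the months after it.
            base = linked
    return chained


# A few rows taken from df_period_missing above.
sample = [
    (date(2018, 1, 1), 100.000000),
    (date(2018, 12, 1), 103.616800),
    (date(2019, 1, 1), 104.246480),
    (date(2019, 2, 1), 101.093900),
    (date(2019, 4, 1), None),  # April 2019 value missing
    (date(2019, 5, 1), 100.647800),
    (date(2020, 1, 1), 103.519414),
    (date(2020, 2, 1), 100.710500),
]

for period, value in chain_unaffected_by_gaps(sample):
    print(period, value)
```

With this approach the April 2019 entry stays None, while May 2019 and later periods match the values in expected, because each month depends only on its own unlinked value and the January base.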

pricemg changed the title from "Function produces incorrect indices if period missing" to "Chain function produces incorrect indices if period missing" on Oct 27, 2021