Confirm expected behaviour of aggregate with nested hierarchy #737

Open
willu47 opened this issue Mar 22, 2023 · 4 comments

willu47 commented Mar 22, 2023

Hi, following a question I asked in the openmod session today, please could you confirm the expected behaviour of the .aggregate function when presented with missing levels in the data hierarchy. For example, the following test fails because the two coal sub-categories Primary Energy|Fossil|Coal|Lignite and Primary Energy|Fossil|Coal|Brown are ignored.

Am I missing something?

import pandas as pd
from pyam import IamDataFrame, IAMC_IDX

LONG_IDX = IAMC_IDX + ["year"]

PRICE_NESTED_DF = pd.DataFrame(
    [
        ["model_a", "scen_a", "World", "Primary Energy|Fossil|Coal|Lignite", "EJ/yr", 2010, 10.0],
        ["model_a", "scen_a", "World", "Primary Energy|Fossil|Coal|Brown", "EJ/yr", 2010, 30.0],
        ["model_a", "scen_a", "World", "Primary Energy|Fossil|Gas", "EJ/yr", 2010, 45.0],
    ],
    columns=LONG_IDX + ["value"],
)


def test_nested_aggregate():
    actual = IamDataFrame(PRICE_NESTED_DF).aggregate(variable='Primary Energy|Fossil').data
    data = [
        ["model_a", "scen_a", "World", "Primary Energy|Fossil", "EJ/yr", 2010, 85.0]
    ]
    expected = pd.DataFrame(data, columns=LONG_IDX + ["value"])
    print(actual)
    print(expected)
    # assert_frame_equal raises on mismatch and returns None, so it must not be wrapped in `assert`
    pd.testing.assert_frame_equal(actual, expected)

willu47 commented Mar 22, 2023

If I add the recursive=True argument to the aggregate() call, I get this:

     model scenario region                    variable   unit  year  value
0  model_a   scen_a  World       Primary Energy|Fossil  EJ/yr  2010   85.0
1  model_a   scen_a  World  Primary Energy|Fossil|Coal  EJ/yr  2010   40.0


willu47 commented Mar 22, 2023

(
    IamDataFrame(PRICE_NESTED_DF)
    .aggregate(variable='Primary Energy|Fossil', recursive=True)
    .aggregate(variable='Primary Energy|Fossil')
)

returns

     model scenario region               variable   unit  year  value
0  model_a   scen_a  World  Primary Energy|Fossil  EJ/yr  2010   40.0

danielhuppmann (Member) commented

Thanks @willu47 - indeed, I'd say that this is behaving as expected.

  • If aggregate() is called without further arguments, it uses all variables that are directly below variable
    (equivalent to filter(variable=f"{variable}|*", level=0), see this utility method)
  • If components is given explicitly, these are used instead
  • If recursive=True, pyam works its way up the variable tree to variable and returns only the computed data
  • If append=True, it will append the computed data to the object and return None.
    Maybe this is what you're looking for?
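
To make the rules above concrete for the example data in this issue, here is a minimal plain-Python sketch. It is not pyam's implementation: it works on a simple dict of variable name to value, the helper names direct_components, aggregate and aggregate_recursive are made up for illustration, and it makes the simplifying assumption that an intermediate variable already present in the data is kept as-is. It only models the "direct components" rule and the bottom-up recursive rule.

```python
# Illustrative model only; pyam operates on timeseries data, not a flat dict.
data = {
    "Primary Energy|Fossil|Coal|Lignite": 10.0,
    "Primary Energy|Fossil|Coal|Brown": 30.0,
    "Primary Energy|Fossil|Gas": 45.0,
}


def direct_components(values, variable):
    """Return the variables exactly one level below `variable`."""
    prefix = variable + "|"
    return [
        v for v in values
        if v.startswith(prefix) and "|" not in v[len(prefix):]
    ]


def aggregate(values, variable):
    """Sum only the direct components that are present in `values`."""
    return sum(values[v] for v in direct_components(values, variable))


def aggregate_recursive(values, variable):
    """Compute missing intermediate levels bottom-up, return only computed rows."""
    work = dict(values)
    prefix = variable + "|"
    # Collect every ancestor between the reported variables and `variable`.
    ancestors = set()
    for v in values:
        if not v.startswith(prefix):
            continue
        parent = v.rsplit("|", 1)[0]
        while True:
            ancestors.add(parent)
            if parent == variable:
                break
            parent = parent.rsplit("|", 1)[0]
    computed = {}
    # Deepest ancestors first, so each level can use the level below it.
    for parent in sorted(ancestors, key=lambda p: p.count("|"), reverse=True):
        if parent not in work:  # assumption: existing values are left untouched
            computed[parent] = work[parent] = aggregate(work, parent)
    return computed


fossil = "Primary Energy|Fossil"
default_result = aggregate(data, fossil)
recursive_result = aggregate_recursive(data, fossil)
chained_result = aggregate(recursive_result, fossil)

print(default_result)    # 45.0: only Gas sits directly below Fossil
print(recursive_result)  # Coal is computed as 40.0, then Fossil as 85.0
print(chained_result)    # 40.0: among the computed rows, only Coal is below Fossil
```

In this model the default call misses the two coal sub-categories because Primary Energy|Fossil|Coal itself is not in the data, which matches the outputs reported earlier in this thread.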

Question back to you: which other behavior would you find more intuitive? Or how could we improve the docs?

Sidenote:

df.aggregate("<variable>", append=True)

has the same behavior as

df.append(df.aggregate("<variable>"))

but the first option has better performance.

danielhuppmann (Member) commented

And FYI: pyam has a testing module with a function pyam.testing.assert_iamframe_equal (see the docs). This may be more appropriate for your use case because you don't have to worry about the order of the columns and rows, and it operates on an indexed pd.Series, so it's faster.
