Confirm expected behaviour of aggregate with nested hierarchy #737

willu47 · 2023-03-22T17:40:10Z

Hi, following a question I asked in the openmod session today, please could you confirm the expected behaviour of the .aggregate function when presented with missing levels in the data hierarchy. For example, the following test fails because the two coal sub-categories Primary Energy|Fossil|Coal|Lignite and Primary Energy|Fossil|Coal|Brown are ignored.

Am I missing something?

import pandas as pd
from pyam import IamDataFrame, IAMC_IDX

LONG_IDX = IAMC_IDX + ["year"]

PRICE_NESTED_DF = pd.DataFrame(
    [
        ["model_a", "scen_a", "World", "Primary Energy|Fossil|Coal|Lignite", "EJ/yr", 2010, 10.0],
        ["model_a", "scen_a", "World", "Primary Energy|Fossil|Coal|Brown", "EJ/yr", 2010, 30.0],
        ["model_a", "scen_a", "World", "Primary Energy|Fossil|Gas", "EJ/yr", 2010, 45.0],
    ],
    columns=LONG_IDX + ["value"],
)


def test_nested_aggregate():
    actual = IamDataFrame(PRICE_NESTED_DF).aggregate(variable='Primary Energy|Fossil').data
    data = [
        ["model_a", "scen_a", "World", "Primary Energy|Fossil", "EJ/yr", 2010, 85.0]
    ]
    expected = pd.DataFrame(data, columns=LONG_IDX + ["value"])
    print(actual)
    print(expected)
    # assert_frame_equal raises on mismatch and returns None, so no `assert` is needed
    pd.testing.assert_frame_equal(actual, expected)


willu47 · 2023-03-22T17:42:24Z

If I add recursive=True argument to the aggregate() function I get this:

     model scenario region                    variable   unit  year  value
0  model_a   scen_a  World       Primary Energy|Fossil  EJ/yr  2010   85.0
1  model_a   scen_a  World  Primary Energy|Fossil|Coal  EJ/yr  2010   40.0

willu47 · 2023-03-22T22:25:19Z

(
    IamDataFrame(PRICE_NESTED_DF)
    .aggregate(variable='Primary Energy|Fossil', recursive=True)
    .aggregate(variable='Primary Energy|Fossil')
)

returns

     model scenario region               variable   unit  year  value
0  model_a   scen_a  World  Primary Energy|Fossil  EJ/yr  2010   40.0

danielhuppmann · 2023-03-23T05:23:30Z

Thanks @willu47 - indeed, I'd say that this is behaving as expected.

If aggregate() is called without further arguments, it uses all variables that are directly below variable
(equivalent to filter(variable=f"{variable}|*", level=0), see this utility method)
If components is given explicitly, these are used instead
If recursive=True, pyam works its way up the variable tree to variable and returns only the computed data
If append=True, it appends the computed data to the object and returns None.
Maybe this is what you're looking for?
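
To make the rules above concrete, here is a rough plain-Python sketch of the selection logic applied to the three rows from the example. It is only an illustration of the behaviour described in this thread, not pyam's implementation: the helper names (direct_components, aggregate, aggregate_recursive) and the dict-based data layout are made up for this sketch, and it ignores the validation that pyam performs when intermediate variables already exist.

```python
# Illustrative stand-in for the rules described above; NOT pyam code.
# All function names here are hypothetical and exist only in this sketch.
SEP = "|"

data = {
    "Primary Energy|Fossil|Coal|Lignite": 10.0,
    "Primary Energy|Fossil|Coal|Brown": 30.0,
    "Primary Energy|Fossil|Gas": 45.0,
}


def direct_components(variables, variable):
    """Return the variables exactly one level below `variable`."""
    prefix = variable + SEP
    return [
        v for v in variables
        if v.startswith(prefix) and SEP not in v[len(prefix):]
    ]


def aggregate(data, variable):
    """Default rule: sum only the direct components of `variable`."""
    return sum(data[v] for v in direct_components(data, variable))


def aggregate_recursive(data, variable):
    """Bottom-up rule: compute missing intermediate levels, then `variable`."""
    prefix = variable + SEP
    depth = variable.count(SEP) + 1  # number of parts in `variable`
    # Collect every node between the leaves and `variable` (inclusive).
    parents = set()
    for v in data:
        if v.startswith(prefix):
            parts = v.split(SEP)
            for i in range(depth, len(parts)):
                parents.add(SEP.join(parts[:i]))
    work = dict(data)
    computed = {}
    # Deepest nodes first, so each parent sees its already-computed children.
    for parent in sorted(parents, key=lambda p: p.count(SEP), reverse=True):
        if parent not in work:
            work[parent] = aggregate(work, parent)
            computed[parent] = work[parent]
    return computed  # only the computed data, as with recursive=True


print(aggregate(data, "Primary Energy|Fossil"))
# 45.0 (only Gas is directly below; Lignite and Brown sit two levels down)

result = aggregate_recursive(data, "Primary Energy|Fossil")
print(result["Primary Energy|Fossil|Coal"], result["Primary Energy|Fossil"])
# 40.0 85.0

print(aggregate(result, "Primary Energy|Fossil"))
# 40.0 (the recursive result contains only Coal below Fossil)
```

Under this reading, the last call also explains the chained example above: the recursive step returns only the computed rows, so Gas is no longer present when the second aggregate() runs.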

Question back to you: which other behavior would you find more intuitive? Or how could we improve the docs?

Sidenote:
df.aggregate("<variable>", append=True)
has the same behavior as
df.append(df.aggregate("<variable>"))
but the first option has better performance.

danielhuppmann · 2023-03-23T05:25:48Z

And FYI: pyam has a testing module with a function pyam.testing.assert_iamframe_equal, see the docs - this is maybe more appropriate for your use case because you don't have to worry about the order of the columns and rows (and it operates on an indexed pd.Series, so it's faster).
