New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
init sankey #770
base: main
Are you sure you want to change the base?
init sankey #770
Conversation
Thanks for this PR :) |
Thanks @daymontas1 for stepping up here, much appreciated :) You should be able to test the PR as is and see how you like it's usefulness, but there are a few things that I would like to see happen before we merge it:
Please feel free to ask about any of these steps if you need help :) |
Hi all, to simplify here, I would simply limit this functionality to pyam > 2 - we can do a version check for this on the fly and raise an error. |
@@ -141,6 +141,7 @@ | |||
"message::costs", | |||
"message::emissions", | |||
), | |||
("message::sankey", "concat", "out::pyam", "in::pyam"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would get rid of this, and simply do the concatenation outside of reporter (e.g., in the jupyter notebook here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept this for now as I don't know what the Reporter actually does and it seems useful to just get the dataframe format we need from the Reporter. But please elaborate :)
message_ix/util/__init__.py
Outdated
df_filtered = df.filter(region=region + "*", year=year) | ||
|
||
mapping = {} | ||
for var in df_filtered["variable"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this whole function could be cleaned up as follows:
mapping = {}
for var in df.filter(region=region + "*", year=year).variable:
if var in variables_not_needed or var in flows_not_needed:
continue
is_input = gvc(var, 0) == "in"
start_idx, end_idx = [1, 2], [3, 4] if is_input else [3, 4], [1, 2]
mapping[var] = (gvc(var, start_idx, join=True), gvc(var, end_idx, join=True))
return mapping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this suggestion, it almost worked out of the box :)
Maybe most interesting to you: you need
(start_idx, end_idx) = ([1, 2], [3, 4]) if is_input else ([3, 4], [1, 2])
to do the combined check you had in mind.
message_ix/util/__init__.py
Outdated
if gvc(var, 0) == "out": | ||
mapping[var] = (gvc(var, [3, 4], join=True), gvc(var, [1, 2], join=True)) | ||
|
||
for k in mapping.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If above works, then this full block can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure my edits support the same range of exclusion as the original did, but I'm not quite sure what the difference between "flow" and "variable" is and you can now exclude things like "final|electricity", which seems to be what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes fully sense to merge both as you did. I used an older piece of code for that. And the reason for distinguishing "flow" and "variable" was, if I remember correctly, that some reports could have something like emissions or costs included. But even in this case one "exclude" list would have been sufficient ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming :)
message_ix/util/__init__.py
Outdated
@@ -232,3 +233,23 @@ def expand_dims(scenario: Scenario, name, **data): | |||
|
|||
# Add the expanded data | |||
scenario.add_par(name, new_data) | |||
|
|||
|
|||
def sankey_mapper(df, year, region, flows_not_needed=[], variables_not_needed=[]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the top of this function we can check that pyam is > 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can make this work without requiring pyam > 2 :)
try:
from pyam.str import get_variable_components as gvc
except ImportError: # Python < 3.10, pandas < 2.0
from pyam.utils import get_variable_components as gvc
Should do the trick :)
* In recent pyam versions, get_variable_components moved to pyam.str
Rebased onto current main and expanded the PR a bit :) The missing type hints (from the failing code quality check) probably indicate that we need to add |
The other main error I'm seeing just indicates that by adding this reporter step, we need to adapt the expected value of reporter steps in another test, which is no problem :) |
Create sankey diagram and add tutorial.
The approach here creates the sankey diagram by using the sankey plot function of the
pyam
package. This function requires a mapping dictionary. The mapping dict is created by using the newly created automatic reportmessage::sankey
and a utility functionsankey_mapper()
.This pull request is related to the draft pull request in message_data , which can be removed, since the sankey diagram has been implemented in the
pyam
package.How to review
General approach should be reviewed. Also, I am not sure if the util function
sankey_mapper()
is at the right place.PR checklist