init sankey #770

mabudz · 2024-01-07T21:19:52Z

Create sankey diagram and add tutorial.

The approach here creates the sankey diagram by using the sankey plot function of the pyam package. This function requires a mapping dictionary. The mapping dict is created by using the newly created automatic report message::sankey and a utility function sankey_mapper().

This pull request is related to the draft pull request in message_data , which can be removed, since the sankey diagram has been implemented in the pyam package.

How to review

General approach should be reviewed. Also, I am not sure if the util function sankey_mapper() is at the right place.

PR checklist

Continuous integration checks all ✅
Add or expand tests; coverage checks both ✅
Add, expand, or update documentation.
Update release notes.

CLAassistant · 2024-01-07T21:19:57Z

All committers have signed the CLA.

glatterf42 · 2024-01-10T12:29:09Z

Thanks for this PR :)
Getting the tests to pass is a bit tricky since the API for pyam changed when they went from version 1.9.0 to 2.0.0 (as you might expect) and our code needs to satisfy both versions since we test on Python <= 3.9, which is not supported for pyam 2.0.0. So let me take a closer look, maybe we'd want to use the function that pyam's sankey function wraps around directly, after all.

glatterf42 · 2024-03-07T09:58:36Z

Thanks @daymontas1 for stepping up here, much appreciated :)

You should be able to test the PR as is and see how you like it's usefulness, but there are a few things that I would like to see happen before we merge it:

Please rebase on main to keep it up to date with current developments.
Please migrate the main functionality from util/__init__.py to somewhere more appropriate. E.g. report/sankey.py if it's a reporting tool or util/sankey.py if there are more general use cases.
Please check if we can use plotly directly instead of pyam's wrapper functionality. This should allow us to utilize more design options and avoids some dependency constraints (see above). See some examples and their docs.
Please confirm that we want to integrate the sankey functionality in the reporting as it currently stands. I'd probably ask at least @khaeru for his opinion.
Please write a test if possible to confirm that sankey_mapper maps the correct values to the correct keys. We might also need to adapt one or two reporting tests that check the number of reporting steps if this stays in the default reporting workflow.

Please feel free to ask about any of these steps if you need help :)

gidden · 2024-03-13T12:34:51Z

Hi all, to simplify here, I would simply limit this functionality to pyam > 2 - we can do a version check for this on the fly and raise an error.

gidden · 2024-03-13T12:36:38Z

message_ix/report/__init__.py

@@ -141,6 +141,7 @@
        "message::costs",
        "message::emissions",
    ),
+    ("message::sankey", "concat", "out::pyam", "in::pyam"),


I would get rid of this, and simply do the concatenation outside of reporter (e.g., in the jupyter notebook here)

I kept this for now as I don't know what the Reporter actually does and it seems useful to just get the dataframe format we need from the Reporter. But please elaborate :)

gidden · 2024-03-13T12:40:15Z

message_ix/util/__init__.py

+    df_filtered = df.filter(region=region + "*", year=year)
+
+    mapping = {}
+    for var in df_filtered["variable"]:


this whole function could be cleaned up as follows:

mapping = {} for var in df.filter(region=region + "*", year=year).variable: if var in variables_not_needed or var in flows_not_needed: continue is_input = gvc(var, 0) == "in" start_idx, end_idx = [1, 2], [3, 4] if is_input else [3, 4], [1, 2] mapping[var] = (gvc(var, start_idx, join=True), gvc(var, end_idx, join=True)) return mapping

Thanks for this suggestion, it almost worked out of the box :)
Maybe most interesting to you: you need

(start_idx, end_idx) = ([1, 2], [3, 4]) if is_input else ([3, 4], [1, 2])

to do the combined check you had in mind.

gidden · 2024-03-13T12:43:22Z

message_ix/util/__init__.py

+        if gvc(var, 0) == "out":
+            mapping[var] = (gvc(var, [3, 4], join=True), gvc(var, [1, 2], join=True))
+
+    for k in mapping.keys():


If above works, then this full block can be removed

Not quite sure my edits support the same range of exclusion as the original did, but I'm not quite sure what the difference between "flow" and "variable" is and you can now exclude things like "final|electricity", which seems to be what we want?

I think it makes fully sense to merge both as you did. I used an older piece of code for that. And the reason for distinguishing "flow" and "variable" was, if I remember correctly, that some reports could have something like emissions or costs included. But even in this case one "exclude" list would have been sufficient ;-)

Thanks for confirming :)

gidden · 2024-03-13T12:45:44Z

message_ix/util/__init__.py

@@ -232,3 +233,23 @@ def expand_dims(scenario: Scenario, name, **data):

        # Add the expanded data
        scenario.add_par(name, new_data)
+
+
+def sankey_mapper(df, year, region, flows_not_needed=[], variables_not_needed=[]):


at the top of this function we can check that pyam is > 2

I think we can make this work without requiring pyam > 2 :)

try: from pyam.str import get_variable_components as gvc except ImportError: # Python < 3.10, pandas < 2.0 from pyam.utils import get_variable_components as gvc

Should do the trick :)

* In recent pyam versions, get_variable_components moved to pyam.str

glatterf42 · 2024-03-14T13:48:50Z

Rebased onto current main and expanded the PR a bit :)
Thanks for jumping in here, @gidden :)
@daymontas1, please pull these latest changes before making further edits. If you already have some edits locally, you can git stash them, git pull these changes and git stash pop your work on top of this.

The missing type hints (from the failing code quality check) probably indicate that we need to add pyam.* to our mypy config exclude list.

glatterf42 · 2024-03-14T13:54:33Z

The other main error I'm seeing just indicates that by adding this reporter step, we need to adapt the expected value of reporter steps in another test, which is no problem :)

glatterf42 added this to the 3.9 milestone Jan 11, 2024

glatterf42 self-assigned this Jan 11, 2024

glatterf42 added the enh New features & functionality label Jan 11, 2024

gidden reviewed Mar 13, 2024

View reviewed changes

mabudz and others added 7 commits March 14, 2024 11:39

init sankey

fbdc852

Apply formatting and fix import from pyam

a71ff88

* In recent pyam versions, get_variable_components moved to pyam.str

Move sankey_mapper to own file

a519027

Add plotly as optional dependency

801ec77

Update tutorial

8887d29

Remove forgotten temp output

487158e

Add test for sankey_mapper

bf156ee

glatterf42 force-pushed the sankey branch from 748a889 to bf156ee Compare March 14, 2024 13:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

init sankey #770

init sankey #770

mabudz commented Jan 7, 2024

CLAassistant commented Jan 7, 2024 •

edited

glatterf42 commented Jan 10, 2024

glatterf42 commented Mar 7, 2024

gidden commented Mar 13, 2024

gidden Mar 13, 2024

glatterf42 Mar 14, 2024

gidden Mar 13, 2024 •

edited

glatterf42 Mar 14, 2024

gidden Mar 13, 2024

glatterf42 Mar 14, 2024

mabudz Mar 16, 2024

glatterf42 Mar 18, 2024

gidden Mar 13, 2024

glatterf42 Mar 14, 2024

glatterf42 commented Mar 14, 2024

glatterf42 commented Mar 14, 2024

init sankey #770

Are you sure you want to change the base?

init sankey #770

Conversation

mabudz commented Jan 7, 2024

How to review

PR checklist

CLAassistant commented Jan 7, 2024 • edited

glatterf42 commented Jan 10, 2024

glatterf42 commented Mar 7, 2024

gidden commented Mar 13, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gidden Mar 13, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

glatterf42 commented Mar 14, 2024

glatterf42 commented Mar 14, 2024

CLAassistant commented Jan 7, 2024 •

edited

gidden Mar 13, 2024 •

edited