Backwards-compatibility of variable names #46

khaeru · 2023-12-06T13:16:10Z

At the SWG meeting on 2023-12-06, Masa Sugiyama and others raised the question of how to support backward compatibility if it becomes necessary to change a variable name.

This issue is to discuss/collect ideas.

khaeru · 2023-12-06T13:48:43Z

My suggestion:

In the NAVIGATE project, we made use of the fact that the nomenclature package (which is used with this repo) tolerates (or reads and stores?) extra entries in code lists. For example, we had:
```
- NAV_Dem-20C-all_u:
    navigate_task: T3.5
    navigate_climate_policy: 20C
    navigate_T35_policy: act+ele+tec
```
Here, navigate_T35_policy is an extra attribute that sits alongside description, units, or other attributes.
This is analogous to (and imitates) the SDMX concept of an Annotation.
We should simply specify a common annotation ID whose value contains one or more older/superseded/alias variable names. For instance:
```
- Final Energy|Foo|Bar:
    iamc-variable-superseded: |
      Final Energy|Bar|Foo
      Final Energy|Foo Bar
```
It could be iamc-variable-synonym, iamc-variable-old, or anything—I don't have any strong preference here.
Code that needs to handle older data could then access these annotations for info on the correspondence of old and current names, for instance to construct a "mapping" or "table", perform replacement, or whatever makes sense in a particular implementation.
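
A minimal sketch of such a "mapping" and replacement step, assuming the code list has already been parsed from YAML into plain Python objects (a list of single-key dicts). The annotation ID iamc-variable-superseded is the one from the example above; the function names and the second code ("Final Energy|Baz") are hypothetical and only for illustration:

```python
# Parsed form of a YAML code list, roughly what a YAML loader would return.
# The "|" block scalar in the YAML example yields one old name per line.
CODES = [
    {
        "Final Energy|Foo|Bar": {
            "iamc-variable-superseded": "Final Energy|Bar|Foo\nFinal Energy|Foo Bar\n",
        }
    },
    {"Final Energy|Baz": {"unit": "EJ/yr"}},  # hypothetical code without old names
]


def superseded_names(codes, annotation_id="iamc-variable-superseded"):
    """Return a dict mapping each old variable name to its current name."""
    mapping = {}
    for entry in codes:
        for current, attrs in entry.items():
            # The annotation may be absent; then there is nothing to map
            for old in (attrs or {}).get(annotation_id, "").splitlines():
                old = old.strip()
                if not old:
                    continue
                if old in mapping and mapping[old] != current:
                    raise ValueError(f"{old!r} is listed for two current names")
                mapping[old] = current
    return mapping


def rename(variables, mapping):
    """Replace old names with current ones; leave all other names untouched."""
    return [mapping.get(v, v) for v in variables]


MAPPING = superseded_names(CODES)
RENAMED = rename(["Final Energy|Bar|Foo", "Final Energy|Baz"], MAPPING)
```

Raising an error when one old name is claimed by two current names is a design choice of this sketch, not something the annotation itself requires.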

A minimal working example (MWE) using SDMX:

```
import sdmx
import sdmx.model.v21 as m

# Create a Code whose ID is a current variable name
c = m.Code(id="Final Energy|Foo|Bar")

# Create an annotation containing old/superseded variable names
ann = m.Annotation(
    id="iamc-variable-old",
    text="\n".join(
        ["Final Energy|Bar|Foo", "Final Energy|Foo Bar"],
    )
)
c.annotations.append(ann)

# Write to file
cl = m.Codelist(id="VARIABLE", name="IAMC variable name")
cl.append(c)
msg = sdmx.message.StructureMessage()
msg.add(cl)
with open("example.xml", "wb") as f:
    f.write(sdmx.to_xml(msg, pretty_print=True))
```

This gives output like:

```
…
  <str:Code id="Final Energy|Foo|Bar">
    <com:Annotations>
      <com:Annotation id="iamc-variable-old">
        <com:AnnotationText xml:lang="en">Final Energy|Bar|Foo
Final Energy|Foo Bar</com:AnnotationText>
      </com:Annotation>
    </com:Annotations>
  </str:Code>
```

And can be read and used like:

```
# Read the file, retrieve the codelist
>>> msg = sdmx.read_sdmx("example.xml")
>>> cl = msg.codelist["VARIABLE"]

# Retrieve a specific variable name
>>> c = cl["Final Energy|Foo|Bar"]
>>> c
<Code Final Energy|Foo|Bar>

# Retrieve the list of old names from the annotation
>>> c.eval_annotation("iamc-variable-old").split("\n")
['Final Energy|Bar|Foo', 'Final Energy|Foo Bar']
```

christophbertram · 2023-12-06T21:50:21Z

Do I understand it right that you say we can in principle add as many entries as we want? The old examples of the ENGAGE and NAVIGATE template only seem to have the entries "description" and "unit", but you say we could also add extra entries for storing the 'old' name.
And then similarly, we could also create extra entries to denote maximum and minimum allowed per-capita values, and aliases with other data structures (e.g. the iTEM transport variable names or similar).

khaeru · 2023-12-07T08:58:27Z

@christophbertram I say we should agree on as many common annotations as we need, and that doing so is a feature of the SDMX standard (and supported by tools that implement it). What I don't know is whether the nomenclature tool that @phackstock and @danielhuppmann have developed supports access and use of such annotations: I only know we can put such entries in YAML files such as appear in this repo and they will be tolerated by nomenclature, i.e. it won't error when trying to read the files.

Per full-resolution keys: yes, exactly. I hope we can provide a proof-of-concept when linking the iTEM structure info to this repo.

Per "minimum and maximum allowed values per capita"—I think that is actually data, not structure. You can imagine an IAMC-structured table (or with fewer or more dimensions, e.g. possibly without YEAR or REGION) in which the numbers are not "actual observed historical values" nor "model-projection values" but "expected {minimum,maximum} per capita values". One could imagine having different sets of such values for different purposes, even when the same variable names are used.
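
To illustrate the "data, not structure" view, here is a minimal sketch in which expected bounds live in their own small table keyed by the same variable and region names, with a separate set of rows per purpose. Everything in it is hypothetical: the Bound type, the set names "strict" and "loose", and the numbers, which are placeholders rather than real thresholds.

```python
from typing import NamedTuple, Optional


class Bound(NamedTuple):
    """One row of an IAMC-like table holding expected values, not observations."""

    variable: str
    region: str
    minimum: Optional[float]
    maximum: Optional[float]


# Two hypothetical sets of bounds for the same variable name (placeholder numbers)
BOUNDS = {
    "strict": [Bound("Final Energy|Foo|Bar", "World", 0.0, 100.0)],
    "loose": [Bound("Final Energy|Foo|Bar", "World", 0.0, 500.0)],
}


def out_of_bounds(observations, bounds):
    """Return the (variable, region, value) rows that fall outside `bounds`."""
    limits = {(b.variable, b.region): b for b in bounds}
    flagged = []
    for variable, region, value in observations:
        b = limits.get((variable, region))
        if b is None:
            continue  # no expectation recorded for this key
        too_low = b.minimum is not None and value < b.minimum
        too_high = b.maximum is not None and value > b.maximum
        if too_low or too_high:
            flagged.append((variable, region, value))
    return flagged


DATA = [("Final Energy|Foo|Bar", "World", 250.0)]
FLAGGED_STRICT = out_of_bounds(DATA, BOUNDS["strict"])
FLAGGED_LOOSE = out_of_bounds(DATA, BOUNDS["loose"])
```

The same observation is flagged under one set of bounds and accepted under the other, while the variable list itself stays unchanged.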

danielhuppmann · 2023-12-11T08:20:05Z

Thanks for raising this issue, see a few comments below. Let's please try to keep issues and discussions narrow and start new issues where possible.

Cross-reference to legacy variables/regions or other standards: this is already implemented in a simple example here, see common-definitions/definitions/variable/energy/final-energy.yaml, line 115 in 3f530e2:

```
navigate: Final Energy|Carbon Removal|Electricity|{Carbon Removal Option}
```

and the value can be accessed from the nomenclature.DataStructureDefinition as

```
dsd.variable["Final Energy|Carbon Removal|Direct Air Capture|Electricity"].navigate
```

If you have specific suggestions for feature-support in nomenclature, e.g. as a "known" attribute with dedicated documentation, please start an issue there.

Validation of values should indeed be handled as a separate use-case and will be implemented similarly to the required-data feature in nomenclature, see here. PR IAMconsortium/pyam#804 is a step towards support for that feature.
The main reason for keeping this separate is that different projects may want to use different reference data or validation thresholds.

FlorianLeblancDr · 2024-02-16T09:39:18Z

I think this is partly fixed by Daniel's commit from yesterday, #PR61.

khaeru added the question (Further information is requested) label Dec 6, 2023

khaeru added the discuss (Gather ideas and consensus on specific topics) label and removed the question (Further information is requested) label Dec 6, 2023

danielhuppmann mentioned this issue Apr 11, 2024

Guard against methods-terms as sub-categories #89

Merged
