
Unclear error message for concat and IamDataFrame.append and potentially undesired behaviour #707

Open
jkikstra opened this issue Sep 29, 2022 · 2 comments


jkikstra commented Sep 29, 2022

Short error description:

When using either pyam.concat() or IamDataFrame.append() on two dataframes that come from files which have a column of row indices, I get an unexpected error message saying Incompatible timeseries data index dimensions.

Moreover, both files can be read in separately as a pyam.IamDataFrame, so I am not entirely sure this current behaviour is desired: one would expect to be able to concatenate two valid pyam.IamDataFrames.

Deleting the column with the indices resolves the issue.
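A minimal sketch to reproduce (file names and values are illustrative, not the actual SHAPE data):

```python
import pandas as pd
import pyam

# Minimal IAMC-style table; names and values are illustrative.
data = pd.DataFrame(
    {
        "model": ["model_a"],
        "scenario": ["scen_a"],
        "region": ["World"],
        "variable": ["Primary Energy"],
        "unit": ["EJ/yr"],
        "2020": [1.0],
    }
)

data.to_csv("file_a.csv", index=False)  # plain IAMC file
data.to_csv("file_b.csv", index=True)   # also writes the pandas row index as a column

df_a = pyam.IamDataFrame("file_a.csv")  # reads fine
df_b = pyam.IamDataFrame("file_b.csv")  # also reads fine, but with an extra dimension

pyam.concat([df_a, df_b])  # raises the "incompatible dimensions" error
```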

[screenshot: traceback ending in the Incompatible timeseries data index dimensions error]

Detail on the data:

The files look like this:
[screenshot of the input files]

Note:
Somehow, the first two files merge fine, while the third throws the error.

How to reproduce:
For iiasa colleagues, run the notebook: "I:\kikstra\shape-data\Combine files with pyam.ipynb"

danielhuppmann (Member) commented

Sorry @jkikstra, but I'm afraid that this is expected behavior.

pyam is quite "generous" when it comes to additional data dimensions - it automatically assumes that any column with a name other than model, scenario, ... or a time dimension is an additional relevant identifier (aka "extra-column", see here). This is intended to make it user-friendly to work with, e.g., subannual timeslices, or with additional columns to distinguish between different climate models (this is used in open-scm, I believe).

pandas reads the first "index" column as unnamed: 0, and pyam then interprets that as an extra-column. So the dimensions of this IamDataFrame are indeed incompatible with the dimensions of the other instances, and a simple concatenation is not possible.
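For illustration, a small sketch of that mechanism (file name is made up):

```python
import pandas as pd
import pyam

# Write a one-row IAMC table including the pandas row index as a column.
pd.DataFrame(
    {"model": ["m"], "scenario": ["s"], "region": ["World"],
     "variable": ["Primary Energy"], "unit": ["EJ/yr"], "2020": [1.0]}
).to_csv("file_b.csv", index=True)

# pandas labels the header-less first column "Unnamed: 0" ...
print(pd.read_csv("file_b.csv").columns[0])  # Unnamed: 0

# ... and pyam picks it up as an extra dimension (lower-cased, per the
# error message quoted below).
df = pyam.IamDataFrame("file_b.csv")
print(df.dimensions)
```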

Possible solutions:

  1. Improve the error message to show the names of the incompatible dimensions, in this case
    ValueError: Items have incompatible timeseries data index dimensions: 'unnamed: 0'
    Not sure if that would be really helpful, though...
  2. Add a drop_dimensions() method which could be applied to the first IamDataFrame (so that you don't have to delete the column manually - the current manual workaround is sketched after this list).
  3. Come up with some elegant function arguments to drop incompatible dimensions or "fill" missing values during concatenation...
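For reference, the current manual workaround looks roughly like this - a sketch only; the column name "Unnamed: 0" is what pandas assigns to a header-less first column and may differ in other files:

```python
import pandas as pd
import pyam

# Drop the stray index column in pandas before constructing the IamDataFrame.
fixed = pd.read_csv("file_b.csv").drop(columns=["Unnamed: 0"])

df_a = pyam.IamDataFrame("file_a.csv")
df_b = pyam.IamDataFrame(fixed)

pyam.concat([df_a, df_b])  # dimensions now match, so this succeeds
```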

jkikstra (Collaborator, Author) commented

Okay great!

If we expect that this occurs relatively often, given the prevalence of pandas, I like the suggestion to improve the error message first.
The suggested small change (1.) helps a bit. Maybe you also want to consider adding a line like "Please check your data file: does your data perhaps have an unnecessary index column which you are passing as an 'extra column' here?" if the column is unnamed, potentially with a reference to the extra-columns documentation.
I think that would be very helpful, as it would make it clear that users should go to their data and fix the files.

(2.) Indeed, a step further would be to add a function that does something similar. Maybe rather drop_extra_dimensions() or drop_extra_columns(), or keep_only_pyam_dimensions()? And add it to the error message, as in "Consider using drop_extra_columns() before concatenating". (Might then need a warning if users do this and one of the columns is subannual?) A rough sketch of what such a helper could look like is below.
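Sketch of the proposed helper, assuming it would be built on the long-format data property and pyam.IAMC_IDX (a proposal, not existing pyam API):

```python
import pyam

# Hypothetical helper, as proposed here - NOT part of pyam's current API.
def drop_extra_columns(df: pyam.IamDataFrame) -> pyam.IamDataFrame:
    """Return a copy of `df` keeping only the standard IAMC dimensions."""
    standard = pyam.IAMC_IDX + ["year", "time", "value"]
    extra = [c for c in df.data.columns if c not in standard]
    # Note: dropping an extra column creates duplicate rows if that column
    # actually distinguished data points; pyam would then raise on
    # construction, which is probably the safeguard we'd want here.
    return pyam.IamDataFrame(df.data.drop(columns=extra))
```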

(3.) Hmm, yes, so concat or append with an option like join_by could be a nice option. But if users are generally still not using extra columns much, that may be lower priority. Especially if you already do (the potentially simpler) 1. and 2., users can deal with things in code without having to touch the data files. A sketch of the idea follows.
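Purely as a sketch of the idea (no join_by argument exists in pyam.concat today), the behaviour could amount to concatenating on the shared dimensions only:

```python
import pyam

# Hypothetical behaviour of a join_by-style option: keep only the columns
# that all IamDataFrames have in common, then concatenate.
def concat_on_shared(dfs):
    shared = set.intersection(*(set(df.data.columns) for df in dfs))
    trimmed = [pyam.IamDataFrame(df.data[sorted(shared)]) for df in dfs]
    return pyam.concat(trimmed)
```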
