
Unclear error message for concat and IamDataFrame.append and potentially undesired behaviour #707

Open
jkikstra opened this issue Sep 29, 2022 · 2 comments


jkikstra commented Sep 29, 2022

Short error description:

When using either pyam.concat() or IamDataFrame.append() on two dataframes that come from files which have a column of row indices, I get an unexpected error message saying Incompatible timeseries data index dimensions.

Moreover, both files can be read in separately as a pyam.IamDataFrame, so I am not entirely sure this current behaviour is desired: one would expect to be able to concatenate two valid pyam.IamDataFrames.

Deleting the column with the indices resolves the issue.
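A minimal sketch to reproduce (file names and values are illustrative, not the actual SHAPE data):

```python
import pandas as pd
import pyam

# Minimal IAMC-style table; names and values are illustrative.
data = pd.DataFrame(
    {
        "model": ["model_a"],
        "scenario": ["scen_a"],
        "region": ["World"],
        "variable": ["Primary Energy"],
        "unit": ["EJ/yr"],
        "2020": [1.0],
    }
)

data.to_csv("file_a.csv", index=False)  # plain IAMC file
data.to_csv("file_b.csv", index=True)   # also writes the pandas row index as a column

df_a = pyam.IamDataFrame("file_a.csv")  # reads fine
df_b = pyam.IamDataFrame("file_b.csv")  # also reads fine, but with an extra dimension

pyam.concat([df_a, df_b])  # raises the "incompatible dimensions" error
```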

[screenshot: traceback ending in the Incompatible timeseries data index dimensions error]

Detail on the data:

The files look like this:
[screenshot of the input files]

Note:
Somehow, the first two files merge fine, while the third throws the error.

How to reproduce:
For iiasa colleagues, run the notebook: "I:\kikstra\shape-data\Combine files with pyam.ipynb"

danielhuppmann (Member) commented

Sorry @jkikstra, but I'm afraid that this is expected behavior.

pyam is quite "generous" when it comes to additional data dimensions - it automatically assumes that any column with a name other than model, scenario, ... or a time dimension is an additional relevant identifier (aka "extra-column", see here). This is intended to make it user-friendly to work with, e.g., subannual timeslices, or with additional columns to distinguish between different climate models (this is used in open-scm, I believe).

pandas reads the first "index" column as unnamed: 0, and pyam then interprets that as an extra-column. So the dimensions of this IamDataFrame are indeed incompatible with the dimensions of the other instances, and a simple concatenation is not possible.
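For illustration, a small sketch of that mechanism (file name is made up):

```python
import pandas as pd
import pyam

# Write a one-row IAMC table including the pandas row index as a column.
pd.DataFrame(
    {"model": ["m"], "scenario": ["s"], "region": ["World"],
     "variable": ["Primary Energy"], "unit": ["EJ/yr"], "2020": [1.0]}
).to_csv("file_b.csv", index=True)

# pandas labels the header-less first column "Unnamed: 0" ...
print(pd.read_csv("file_b.csv").columns[0])  # Unnamed: 0

# ... and pyam picks it up as an extra dimension (lower-cased, per the
# error message quoted below).
df = pyam.IamDataFrame("file_b.csv")
print(df.dimensions)
```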

Possible solutions:

  1. Improve the error message to show the names of the incompatible dimensions, in this case
    ValueError: Items have incompatible timeseries data index dimensions: 'unnamed: 0'
    Not sure if that would be really helpful, though...
  2. Add a drop_dimensions() method which could be applied to the first IamDataFrame (so that you don't have to delete the column manually - the current manual workaround is sketched after this list).
  3. Come up with some elegant function arguments to drop incompatible dimensions or "fill" missing values during concatenation...
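For reference, the current manual workaround looks roughly like this - a sketch only; the column name "Unnamed: 0" is what pandas assigns to a header-less first column and may differ in other files:

```python
import pandas as pd
import pyam

# Drop the stray index column in pandas before constructing the IamDataFrame.
fixed = pd.read_csv("file_b.csv").drop(columns=["Unnamed: 0"])

df_a = pyam.IamDataFrame("file_a.csv")
df_b = pyam.IamDataFrame(fixed)

pyam.concat([df_a, df_b])  # dimensions now match, so this succeeds
```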

jkikstra (Collaborator, Author) commented

Okay great!

If we expect that this occurs relatively often, given the prevalence of pandas, I like the suggestion to improve the error message first.
The suggested small change (1.) helps a bit. Maybe you also want to consider adding a line like "Please check your data file: does your data perhaps have an unnecessary index column which you are passing as an 'extra column' here?" if the column is unnamed, potentially with a reference to the extra-columns documentation.
I think that would be very helpful, as it would make it clear that users should go to their data and fix the files.

(2.) Indeed, a step further would be to add a function that does something similar. Maybe rather drop_extra_dimensions() or drop_extra_columns(), or keep_only_pyam_dimensions()? And add it to the error message, as in "Consider using drop_extra_columns() before concatenating". (Might then need a warning if users do this and one of the columns is subannual?) A rough sketch of what such a helper could look like is below.
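Sketch of the proposed helper, assuming it would be built on the long-format data property and pyam.IAMC_IDX (a proposal, not existing pyam API):

```python
import pyam

# Hypothetical helper, as proposed here - NOT part of pyam's current API.
def drop_extra_columns(df: pyam.IamDataFrame) -> pyam.IamDataFrame:
    """Return a copy of `df` keeping only the standard IAMC dimensions."""
    standard = pyam.IAMC_IDX + ["year", "time", "value"]
    extra = [c for c in df.data.columns if c not in standard]
    # Note: dropping an extra column creates duplicate rows if that column
    # actually distinguished data points; pyam would then raise on
    # construction, which is probably the safeguard we'd want here.
    return pyam.IamDataFrame(df.data.drop(columns=extra))
```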

(3.) Hmm, yes, so concat or append with an option like join_by could be a nice option. But if users are generally still not using extra columns much, that may be lower priority. Especially if you already do (the potentially simpler) 1. and 2., users can deal with things in code without having to touch the data files. A sketch of the idea follows.
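Purely as a sketch of the idea (no join_by argument exists in pyam.concat today), the behaviour could amount to concatenating on the shared dimensions only:

```python
import pyam

# Hypothetical behaviour of a join_by-style option: keep only the columns
# that all IamDataFrames have in common, then concatenate.
def concat_on_shared(dfs):
    shared = set.intersection(*(set(df.data.columns) for df in dfs))
    trimmed = [pyam.IamDataFrame(df.data[sorted(shared)]) for df in dfs]
    return pyam.concat(trimmed)
```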
