Question: Is there a multivariate filtering feature? #848

Closed
pkufubo opened this issue Apr 18, 2024 · 5 comments

Comments

@pkufubo

pkufubo commented Apr 18, 2024

Hello pyam community,

First of all, thank you for your ongoing efforts in developing and maintaining the pyam package.

I'd like to discuss the potential for an enhanced feature related to multivariate filtering. In my recent projects, I've encountered cases where it would be beneficial to select data that meets multiple variable criteria simultaneously. For instance, I often need to select scenarios that include both 'Emissions|CH4' and 'Emissions|NOx'.

Currently, I manually filter for each variable and then compute the intersection of these filters to find scenarios that include all specified variables. This process can be quite cumbersome and error-prone, especially with a large number of variables.

Is there an existing feature that simplifies this process? If not, I believe adding a multivariate filtering feature that allows users to specify multiple variables and returns scenarios containing all these variables would be extremely helpful. Such a feature would enhance the usability and efficiency of data handling within the pyam framework.

Thank you for considering this suggestion. I look forward to your comments and any potential updates.

Best regards,
Bo Fu

@danielhuppmann
Member

Thank you, @pkufubo, for reaching out.

If I understand your question correctly, then the simple answer is that you can pass a list as the filter argument, i.e.

df.filter(variable=["Emissions|CH4", "Emissions|NOx"])

This is documented under the slice() method, which is used by the filter() method.

@danielhuppmann
Member

@pkufubo, did my suggestion answer your question? If yes, please close this issue, or clarify.

@pkufubo
Author

pkufubo commented Apr 29, 2024

@danielhuppmann Thank you for your response, and my apologies for the delay in getting back to you.
I'm afraid I haven't made my requirements clear. I want to select the scenarios where both "Emissions|CH4" and "Emissions|NOx" are available. If a scenario provides only one or neither of these variables, I would like to exclude it from consideration.

The code you gave, df.filter(variable=["Emissions|CH4", "Emissions|NOx"]), seems to return the union of df.filter(variable=["Emissions|CH4"]) and df.filter(variable=["Emissions|NOx"]).
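To make the distinction concrete, here is a minimal plain-Python sketch on hypothetical toy records (no pyam involved; the model and scenario names are made up for illustration). It contrasts "reports at least one of the variables" with "reports all of the variables" for each model-scenario pair.

```python
# Toy long-format records: (model, scenario, variable). Hypothetical data.
rows = [
    ("model_a", "scen_1", "Emissions|CH4"),
    ("model_a", "scen_1", "Emissions|NOx"),
    ("model_a", "scen_2", "Emissions|CH4"),
    ("model_b", "scen_1", "Emissions|CO2"),
]
wanted = {"Emissions|CH4", "Emissions|NOx"}

# Collect the set of variables reported by each (model, scenario) pair.
reported = {}
for model, scenario, variable in rows:
    reported.setdefault((model, scenario), set()).add(variable)

# "Any" semantics: the pair reports at least one wanted variable.
has_any = sorted(key for key, found in reported.items() if found & wanted)

# "All" semantics: the pair reports every wanted variable.
has_all = sorted(key for key, found in reported.items() if wanted <= found)

print(has_any)
print(has_all)
```

In this toy data, scen_2 of model_a passes the "any" check but not the "all" check, which is the kind of scenario that should be dropped.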

I wrote a script to meet such requirements, but it's long.

import pyam

variable_list = ['Emissions|CH4', 'Emissions|NOx']
model_scenario_list = []
for model, scenario in df.index:
    df_filter = df.filter(model=model, scenario=scenario)
    # if any variable is missing, drop the model-scenario
    if 0 in [len(df_filter.filter(variable=var)) for var in variable_list]:
        continue
    model_scenario_list.append(df_filter)  # the model-scenario selected
data_sel = pyam.concat(model_scenario_list)

Does pyam have an API for a selection like this? Thank you again.

@danielhuppmann
Member

Thanks for the clarification. Indeed, there is an easier option:

variable_list = ['Emissions|CH4', 'Emissions|NOx']
df.require_data(variable=variable_list, exclude_on_fail=True)
df_sel = df.filter(exclude=False)

See the docs of df.require_data() for more info.

Also, your code could be simplified like this:

  if all(v in df_filter.variable for v in variable_list):
      model_scenario_list.append(df_filter)
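For readers who prefer to work on a plain long-format table (such as the one pyam exposes as df.data), the same "all variables present" selection can be sketched with pandas alone. This is only an illustration on hypothetical data, not how require_data() is implemented; the column subset and the values are assumptions.

```python
import pandas as pd

# Hypothetical long-format table with only the columns needed here.
data = pd.DataFrame(
    [
        ["model_a", "scen_1", "Emissions|CH4", 2020, 380.0],
        ["model_a", "scen_1", "Emissions|NOx", 2020, 110.0],
        ["model_a", "scen_2", "Emissions|CH4", 2020, 375.0],
        ["model_b", "scen_1", "Emissions|NOx", 2020, 105.0],
    ],
    columns=["model", "scenario", "variable", "year", "value"],
)
variable_list = ["Emissions|CH4", "Emissions|NOx"]

# Count how many of the required variables each model-scenario reports.
required = data[data["variable"].isin(variable_list)]
n_found = required.groupby(["model", "scenario"])["variable"].nunique()

# Keep only the pairs that report every required variable.
complete = n_found[n_found == len(set(variable_list))].index
keys = pd.MultiIndex.from_frame(data[["model", "scenario"]])
data_sel = data[keys.isin(complete)]

print(data_sel)
```

The grouping avoids the explicit loop over model-scenario pairs, which may matter for large scenario ensembles.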

@pkufubo
Author

pkufubo commented Apr 30, 2024

Thank you for your kind reply. My question has been perfectly answered, and I will close this issue.

pkufubo closed this as completed Apr 30, 2024