Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Equality constraints involving params values not always caught correctly #440

Open
hmgaudecker opened this issue Feb 28, 2023 · 11 comments
Open
Assignees
Labels
bug Something isn't working

Comments

@hmgaudecker
Copy link
Member

hmgaudecker commented Feb 28, 2023

Bug description

The following constraint:

{
    'query': "category == 'transition' & name2 == 'x' & name1 == 'y' & not (-1e-10 < value < 1e-10)",
    'type': 'equality'
}

is not caught correctly because in line 62 of process_selectors.py, we have:

selected = evaluator(helper)

with helper (after applying the query):

                                    value  lower_bound
category   period name1      name2                    
transition 0      y          x      200    -inf
           1      y          x      216    -inf
           2      y          x      232    -inf
           3      y          x      248    -inf

That is, the value fields of helper contain the indices in internal_params (I suppose) instead of the value field of the original params DataFrame.

What I am actually trying to achieve is to exclude the rows with values 200 and 248 in the above DataFrame because they are fixed to zero. The corresponding constraints may change as I try different specifications, so I want to do so in a programmatic way -- I'd be curious to know if there were better options.

To Reproduce

I'd hope the above is clear enough, but I can try to distill an example.

Expected behavior

Rows values 200 and 248 should not appear in the result of evaluator(helper) because their value in the DataFrame passed to estimate_ml as the params argument is 0.0.

System

  • Version 0.4.3

Potential solution

Around line 40 of process_selectors, define helper as:

helper = params.copy()
helper["internal_index"] = tree_converter.params_unflatten(np.arange(n_params))["value"]

and adjust _get_selection_evaluator accordingly. This would work out-of-the box for field == "query" with an appropriate adjustment of the key (value -> internal_index in line 181). However, I do not see through what it would imply for other types of fields.

@hmgaudecker hmgaudecker added the bug Something isn't working label Feb 28, 2023
@janosg
Copy link
Member

janosg commented Feb 28, 2023

Thanks for the detailed issue and debugging!

I think this is a general problem that occurs if a query or selector uses parameter values.

The reason why we replace params with the helper is that the selected helper values will be the indices of a flattened params vector.

Changing this in general is actually a non-trivial problem and it will take a while to fix this.

@janosg janosg self-assigned this Feb 28, 2023
@hmgaudecker
Copy link
Member Author

Thanks, I understand that. But why would you require the "value" field in the helper and can't just use an arbitrary name?

@janosg
Copy link
Member

janosg commented Mar 1, 2023

We use the "value" field because it avoids rewriting tree flattening rules. But we might be able to avoid it.

The whole problem would be trivial to fix if we treated your case of 1 params DataFrame as a special case. Since users can also use DataFrames deep inside nested dictionaries, our solution needs to be more general.

For your case, a simple workaround could be to evaluate your queries on the start params and replace them by locs. i.e.

new_constraints = []
for constr in constraints:
    if "query" in constr:
        new_constr = constr.copy()
        q = new_constr.pop("query")
        new_constr["loc"] = start_params.query(q).index
        new_constraints.append(new_constr)
    else:
        new_constraints.append(constr)

@hmgaudecker
Copy link
Member Author

Almost. This works well for the entire start_params, but helper only has the parameters that actually vary in the optimization problem. Hence, the index of parameters that are fixed does not exist and the estimation errors out with an IndexError.

I'll try to think through some further workaround...

@janosg
Copy link
Member

janosg commented Mar 1, 2023

Yes, I would run this on your start params before you pass the constraints to estimagic.

@hmgaudecker
Copy link
Member Author

That is what I did. But different .loc-constraints converted this way fail because the corresponding index is not present anymore in helper.

@hmgaudecker
Copy link
Member Author

Put differently,

>>> len(start_params)
360
>>> len(helper)
80

A query just returns an empty object, a loc-constraint yields an IndexError.

@janosg
Copy link
Member

janosg commented Mar 1, 2023

I am very surprised.

We define helper as:

n_params = len(tree_converter.params_flatten(params))
helper = tree_converter.params_unflatten(np.arange(n_params))

This is before any constraints handling is done, i.e. at this point we do not even know which parameters will be fixed later. helper should look exactly like params, except for replacing parameter values with integers ranging from 0 to n_params - 1.

@janosg
Copy link
Member

janosg commented Mar 1, 2023

To explain a bit more: We always treat parameters as a general pytree that can be flattened and unflattened. Even if params is just one DataFrame.

A solution like:

helper = params.copy()
helper["internal_index"] = tree_converter.params_unflatten(np.arange(n_params))["value"]

Assumes that params is something that has a value column and can be assigned new columns. We cannot make this assumption in general.

@hmgaudecker
Copy link
Member Author

Thanks, I'll run it through the debugger again asap and will report back with more details. Probably my mistake somewhere.

Thinking of it: Most likely explanation is that this error occurs now during the generation of sensible starting values for a full skillmodels run, i.e. the reduction in parameters is because of that. So will have to do the conversion from query to loc before setting each different model to run!

@hmgaudecker
Copy link
Member Author

That was it.

I'd think the use case is sufficiently rare to not support it. The way to go would be to exclude if a regex like string_to_error_out here:

delimit = r"[ \n&|><=()]"
string_to_error_out = f"{delimit}value{delimit}"

shows up in a query.

Error message should include an explanation and your workaround, I am using this function now:

def convert_queries_to_locs(start_params, constraints):
    new_constraints = []
    for constr in constraints:
        if "query" in constr:
            new_constr = constr.copy()
            q = new_constr.pop("query")
            q_result = start_params.query(q)
            if not q_result.empty:
                new_constr["loc"] = q_result.index
                new_constraints.append(new_constr)
        else:
            new_constraints.append(constr)
    return new_constraints

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants