Equality constraints involving params values not always caught correctly #440

hmgaudecker · 2023-02-28T05:39:17Z

Bug description

The following constraint:

{
    'query': "category == 'transition' & name2 == 'x' & name1 == 'y' & not (-1e-10 < value < 1e-10)",
    'type': 'equality'
}

is not caught correctly because in line 62 of process_selectors.py, we have:

selected = evaluator(helper)

with helper (after applying the query):

                                    value  lower_bound
category   period name1      name2                    
transition 0      y          x      200    -inf
           1      y          x      216    -inf
           2      y          x      232    -inf
           3      y          x      248    -inf

That is, the value fields of helper contain the indices in internal_params (I suppose) instead of the value field of the original params DataFrame.

What I am actually trying to achieve is to exclude the rows with values 200 and 248 in the above DataFrame because they are fixed to zero. The corresponding constraints may change as I try different specifications, so I want to do so in a programmatic way -- I'd be curious to know if there were better options.

To Reproduce

I'd hope the above is clear enough, but I can try to distill an example.

Expected behavior

Rows values 200 and 248 should not appear in the result of evaluator(helper) because their value in the DataFrame passed to estimate_ml as the params argument is 0.0.

System

Version 0.4.3

Potential solution

Around line 40 of process_selectors, define helper as:

helper = params.copy()
helper["internal_index"] = tree_converter.params_unflatten(np.arange(n_params))["value"]

and adjust _get_selection_evaluator accordingly. This would work out-of-the box for field == "query" with an appropriate adjustment of the key (value -> internal_index in line 181). However, I do not see through what it would imply for other types of fields.

The text was updated successfully, but these errors were encountered:

janosg · 2023-02-28T09:59:40Z

Thanks for the detailed issue and debugging!

I think this is a general problem that occurs if a query or selector uses parameter values.

The reason why we replace params with the helper is that the selected helper values will be the indices of a flattened params vector.

Changing this in general is actually a non-trivial problem and it will take a while to fix this.

hmgaudecker · 2023-02-28T10:12:22Z

Thanks, I understand that. But why would you require the "value" field in the helper and can't just use an arbitrary name?

janosg · 2023-03-01T09:54:33Z

We use the "value" field because it avoids rewriting tree flattening rules. But we might be able to avoid it.

The whole problem would be trivial to fix if we treated your case of 1 params DataFrame as a special case. Since users can also use DataFrames deep inside nested dictionaries, our solution needs to be more general.

For your case, a simple workaround could be to evaluate your queries on the start params and replace them by locs. i.e.

new_constraints = []
for constr in constraints:
    if "query" in constr:
        new_constr = constr.copy()
        q = new_constr.pop("query")
        new_constr["loc"] = start_params.query(q).index
        new_constraints.append(new_constr)
    else:
        new_constraints.append(constr)

hmgaudecker · 2023-03-01T11:22:43Z

Almost. This works well for the entire start_params, but helper only has the parameters that actually vary in the optimization problem. Hence, the index of parameters that are fixed does not exist and the estimation errors out with an IndexError.

I'll try to think through some further workaround...

janosg · 2023-03-01T12:45:25Z

Yes, I would run this on your start params before you pass the constraints to estimagic.

hmgaudecker · 2023-03-01T13:39:31Z

That is what I did. But different .loc-constraints converted this way fail because the corresponding index is not present anymore in helper.

hmgaudecker · 2023-03-01T13:40:56Z

Put differently,

>>> len(start_params)
360
>>> len(helper)
80

A query just returns an empty object, a loc-constraint yields an IndexError.

janosg · 2023-03-01T13:51:44Z

I am very surprised.

We define helper as:

n_params = len(tree_converter.params_flatten(params))
helper = tree_converter.params_unflatten(np.arange(n_params))

This is before any constraints handling is done, i.e. at this point we do not even know which parameters will be fixed later. helper should look exactly like params, except for replacing parameter values with integers ranging from 0 to n_params - 1.

janosg · 2023-03-01T13:55:46Z

To explain a bit more: We always treat parameters as a general pytree that can be flattened and unflattened. Even if params is just one DataFrame.

A solution like:

helper = params.copy()
helper["internal_index"] = tree_converter.params_unflatten(np.arange(n_params))["value"]

Assumes that params is something that has a value column and can be assigned new columns. We cannot make this assumption in general.

hmgaudecker · 2023-03-01T14:18:46Z

Thanks, I'll run it through the debugger again asap and will report back with more details. Probably my mistake somewhere.

Thinking of it: Most likely explanation is that this error occurs now during the generation of sensible starting values for a full skillmodels run, i.e. the reduction in parameters is because of that. So will have to do the conversion from query to loc before setting each different model to run!

hmgaudecker · 2023-03-02T05:08:30Z

That was it.

I'd think the use case is sufficiently rare to not support it. The way to go would be to exclude if a regex like string_to_error_out here:

delimit = r"[ \n&|><=()]"
string_to_error_out = f"{delimit}value{delimit}"

shows up in a query.

Error message should include an explanation and your workaround, I am using this function now:

def convert_queries_to_locs(start_params, constraints):
    new_constraints = []
    for constr in constraints:
        if "query" in constr:
            new_constr = constr.copy()
            q = new_constr.pop("query")
            q_result = start_params.query(q)
            if not q_result.empty:
                new_constr["loc"] = q_result.index
                new_constraints.append(new_constr)
        else:
            new_constraints.append(constr)
    return new_constraints

hmgaudecker added the bug Something isn't working label Feb 28, 2023

janosg self-assigned this Feb 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Equality constraints involving params values not always caught correctly #440

Equality constraints involving params values not always caught correctly #440

hmgaudecker commented Feb 28, 2023 •

edited

janosg commented Feb 28, 2023

hmgaudecker commented Feb 28, 2023

janosg commented Mar 1, 2023 •

edited

hmgaudecker commented Mar 1, 2023

janosg commented Mar 1, 2023

hmgaudecker commented Mar 1, 2023

hmgaudecker commented Mar 1, 2023

janosg commented Mar 1, 2023 •

edited

janosg commented Mar 1, 2023

hmgaudecker commented Mar 1, 2023

hmgaudecker commented Mar 2, 2023

Equality constraints involving params values not always caught correctly #440

Equality constraints involving params values not always caught correctly #440

Comments

hmgaudecker commented Feb 28, 2023 • edited

Bug description

To Reproduce

Expected behavior

System

Potential solution

janosg commented Feb 28, 2023

hmgaudecker commented Feb 28, 2023

janosg commented Mar 1, 2023 • edited

hmgaudecker commented Mar 1, 2023

janosg commented Mar 1, 2023

hmgaudecker commented Mar 1, 2023

hmgaudecker commented Mar 1, 2023

janosg commented Mar 1, 2023 • edited

janosg commented Mar 1, 2023

hmgaudecker commented Mar 1, 2023

hmgaudecker commented Mar 2, 2023

hmgaudecker commented Feb 28, 2023 •

edited

janosg commented Mar 1, 2023 •

edited

janosg commented Mar 1, 2023 •

edited