-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Equality constraints involving params values not always caught correctly #440
Comments
Thanks for the detailed issue and debugging! I think this is a general problem that occurs if a query or selector uses parameter values. The reason why we replace params with the helper is that the selected helper values will be the indices of a flattened params vector. Changing this in general is actually a non-trivial problem and it will take a while to fix this. |
Thanks, I understand that. But why would you require the "value" field in the helper and can't just use an arbitrary name? |
We use the "value" field because it avoids rewriting tree flattening rules. But we might be able to avoid it. The whole problem would be trivial to fix if we treated your case of 1 params DataFrame as a special case. Since users can also use DataFrames deep inside nested dictionaries, our solution needs to be more general. For your case, a simple workaround could be to evaluate your queries on the start params and replace them by new_constraints = []
for constr in constraints:
if "query" in constr:
new_constr = constr.copy()
q = new_constr.pop("query")
new_constr["loc"] = start_params.query(q).index
new_constraints.append(new_constr)
else:
new_constraints.append(constr) |
Almost. This works well for the entire I'll try to think through some further workaround... |
Yes, I would run this on your start params before you pass the constraints to estimagic. |
That is what I did. But different |
Put differently, >>> len(start_params)
360
>>> len(helper)
80 A query just returns an empty object, a loc-constraint yields an |
I am very surprised. We define helper as: n_params = len(tree_converter.params_flatten(params))
helper = tree_converter.params_unflatten(np.arange(n_params)) This is before any constraints handling is done, i.e. at this point we do not even know which parameters will be fixed later. |
To explain a bit more: We always treat parameters as a general pytree that can be flattened and unflattened. Even if params is just one DataFrame. A solution like: helper = params.copy()
helper["internal_index"] = tree_converter.params_unflatten(np.arange(n_params))["value"] Assumes that params is something that has a value column and can be assigned new columns. We cannot make this assumption in general. |
Thanks, I'll run it through the debugger again asap and will report back with more details. Probably my mistake somewhere. Thinking of it: Most likely explanation is that this error occurs now during the generation of sensible starting values for a full skillmodels run, i.e. the reduction in parameters is because of that. So will have to do the conversion from query to loc before setting each different model to run! |
That was it. I'd think the use case is sufficiently rare to not support it. The way to go would be to exclude if a regex like delimit = r"[ \n&|><=()]"
string_to_error_out = f"{delimit}value{delimit}" shows up in a query. Error message should include an explanation and your workaround, I am using this function now: def convert_queries_to_locs(start_params, constraints):
new_constraints = []
for constr in constraints:
if "query" in constr:
new_constr = constr.copy()
q = new_constr.pop("query")
q_result = start_params.query(q)
if not q_result.empty:
new_constr["loc"] = q_result.index
new_constraints.append(new_constr)
else:
new_constraints.append(constr)
return new_constraints |
Bug description
The following constraint:
is not caught correctly because in line 62 of
process_selectors.py
, we have:with
helper
(after applying the query):That is, the
value
fields ofhelper
contain the indices ininternal_params
(I suppose) instead of thevalue
field of the original params DataFrame.What I am actually trying to achieve is to exclude the rows with
value
s 200 and 248 in the above DataFrame because they are fixed to zero. The corresponding constraints may change as I try different specifications, so I want to do so in a programmatic way -- I'd be curious to know if there were better options.To Reproduce
I'd hope the above is clear enough, but I can try to distill an example.
Expected behavior
Rows
value
s 200 and 248 should not appear in the result ofevaluator(helper)
because theirvalue
in the DataFrame passed toestimate_ml
as theparams
argument is 0.0.System
Potential solution
Around line 40 of
process_selectors
, definehelper
as:and adjust
_get_selection_evaluator
accordingly. This would work out-of-the box forfield == "query"
with an appropriate adjustment of the key (value
->internal_index
in line 181). However, I do not see through what it would imply for other types offield
s.The text was updated successfully, but these errors were encountered: