Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested pd.Series (and more) dont get unraveled properly some calls (i think it is numerical jacobeans) #476

Open
mbkjaer opened this issue Feb 2, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@mbkjaer
Copy link

mbkjaer commented Feb 2, 2024

Bug description

Surface Level: When the parameters are a nested series, (using numerical differensiation and "scipy_slsqp"). The function calls works as intended. But after a few calls, the code stops. (See full error message below)

Lower levels:
Firstly this can be traced to when the jacobian(p) function (from nonlinear_constraints.py) is called.

A few layers down, i have identified that the problems lies with the input to the function tree_flatten() (from tree_utils.py), it happens when registry=None. In turn, this leads to get_registry() (from registry.py), beeing called with types=None. This function now returns a list of only the default_types, this list does NOT include "pandas.Series".

Thoughts for fixing

The place when it works (in the function calls before the jacobean) the get_tree_converter() calls tree_flatten() in the following way:
_registry = get_registry(extended=True) #NOTE that this get_registry function is from tree_registry.py
.... = tree_flatten(params, registry=_registry)

i made a week attempt to use this code everywhere: registry = get_registry(extended=True), but did not get it to work.
Altough I do not have more than a beginner level undertanding of the estimagic code, it seems to me that the more general fix is to always use exteded list, right now there a are two "default-like" registeres. And it seems like the solution used upuntill now is to use get_registry(extended=True)

My quick-fix in get_registry() from from registry.py

#types = [] if types is None else types
types = ["pandas.Series"] if types is None else types

NB! (that would have saved me some confusion!)

get_registry() from rom tree_utils.py is not the same as get_registry() from from registry.py

To Reproduce

import numpy as np
import estimagic as em
from pandas import Series

def constraint_function(x,y):
return np.array((x[0]**2 - Series([1,2,3])))

def criterion(x):
return np.sum((x[0] - 1)**2)

y_values = Series([Series([1,2,3]),3])

constraint_dict = {
"type": "nonlinear",
"selector": lambda x: x,
"func": lambda x: constraint_function(x,y_values),
"value": 0.0,
}

res = em.minimize(
criterion=criterion,
params=Series([-np.ones(3),1]),
algorithm="scipy_slsqp",
constraints=constraint_dict
)

Expected behavior

Surface: No crash would be expected.

Mid level depth: I think tree_flatten() should NOT be called with registry=None
(alternatively registry==None should make the extended registrer)

Screenshots/Error messages

File "....\estimagic_nonlinconstraints_nested_series_failed.py", line 30, in
res = em.minimize(
File "C:\Python_Stuff\Project\lib\site-packages\estimagic\optimization\optimize.py", line 407, in minimize
return _optimize(
File "C:\Python_Stuff\Project\lib\site-packages\estimagic\optimization\optimize.py", line 754, in _optimize
raw_res = internal_algorithm(problem_functions, x=x, step_id=step_ids[0])
File "C:\Python_Stuff\Project\lib\site-packages\estimagic\optimization\get_algorithm.py", line 212, in algorithm_with_history_collection
out = algorithm(
_kwargs)
File "C:\Python_Stuff\Project\lib\site-packages\estimagic\optimization\get_algorithm.py", line 139, in wrapper_add_logging_algorithm
res = algorithm(**_kwargs)
File "C:\Python_Stuff\Project\lib\site-packages\estimagic\decorators.py", line 241, in wrapper_mark_minimizer
return func(*args, **kwargs)
File "C:\Python_Stuff\Project\lib\site-packages\estimagic\optimization\scipy_optimizers.py", line 128, in scipy_slsqp
res = scipy.optimize.minimize(
File "C:\Python_Stuff\Project\lib\site-packages\scipy\optimize_minimize.py", line 719, in minimize
res = _minimize_slsqp(fun, x0, args, jac, bounds,
File "C:\Python_Stuff\Project\lib\site-packages\scipy\optimize_slsqp_py.py", line 418, in _minimize_slsqp
a = _eval_con_normals(x, cons, la, n, m, meq, mieq)
File "C:\Python_Stuff\Project\lib\site-packages\scipy\optimize_slsqp_py.py", line 486, in _eval_con_normals
a_eq = vstack([con['jac'](x, *con['args'])
File "C:\Python_Stuff\Project\lib\site-packages\scipy\optimize_slsqp_py.py", line 486, in
a_eq = vstack([con['jac'](x, *con['args'])
File "C:\Python_Stuff\Project\lib\site-packages\estimagic\parameters\nonlinear_constraints.py", line 124, in _internal_jacobian
jac = jacobian(selected)
File "C:\Python_Stuff\Project\lib\site-packages\estimagic\parameters\nonlinear_constraints.py", line 101, in jacobian
return first_derivative(
File "C:\Python_Stuff\Project\lib\site-packages\estimagic\differentiation\derivatives.py", line 270, in first_derivative
derivative = matrix_to_block_tree(jac, f0_tree, params)
File "C:\Python_Stuff\Project\lib\site-packages\estimagic\parameters\block_trees.py", line 50, in matrix_to_block_tree
raw_block = block_values.reshape((*s1, *s2))
ValueError: cannot reshape array of size 12 into shape (3,2)

System

  • OS: [Windows 10 ]
  • Version [22H2]

Feel free to give me an email, and to arrage a call if you feel that could be helpfull!

@mbkjaer mbkjaer added the bug Something isn't working label Feb 2, 2024
@timmens
Copy link
Member

timmens commented Feb 2, 2024

Hello!

Thanks for opening an issue and for digging so deep yourself. This is indeed a bug, and
we need to fix it.

We will try to fix it with the next release, but this can take some time. In the mean
time, I'd like to propose a workaround.

The problem is that block trees for parameters that consist of nested pandas.Series,
pandas.DataFrame, or numpy.ndarray are not properly handled in estimagic. As you
already noted, these are required whenever estimagic computes Jacobians or Hessians. One way around this
problem is to rewrite the parameter object so it does not contain nested
pandas.Series. For example, in the minimal example you posted, we could replace the
outer series with a dictionary:

import estimagic as em
import pandas as pd

params = {
    "series": pd.Series([1, 2, 3]),
    "scalar": 3,
}

def criterion(params):
    return np.sum((params["series"] - 1) ** 2) + params["scalar"] ** 2

def constraint_function(ser):
    return ser ** 2

constraint_dict = {
    "type": "nonlinear",
    "selector": lambda params: params["series"],
    "func": constraint_function,
    "value": pd.Series([1, 2, 3]),
}

res = em.minimize(
    criterion=criterion,
    params=params,
    algorithm="scipy_slsqp",
    constraints=constraint_dict
)
res.params
{
    'series': 0    1.000000
              1    1.414214
              2    1.732051
              dtype: float64,

    'scalar': 1.5444802223500015e-06
}

I hope this helps!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants