
C/S update error #188

Open
hdoupe opened this issue Nov 8, 2021 · 3 comments
Comments

@hdoupe
Collaborator

hdoupe commented Nov 8, 2021

I'm getting this error when trying to update Tax-Brain for Compute Studio:

ValueError: Must have equal len keys and value when setting with an iterable
Full stack trace
_______________________________________________________________________________ TestFunctions1.test_run_model ________________________________________________________________________________

self = <cs_config.tests.test_functions.TestFunctions1 object at 0x7fefc584fb50>

    def test_run_model(self):
        self.test_all_data_specified()
        inputs = self.get_inputs({})
        check_get_inputs(inputs)
    
        class MetaParams(Parameters):
            defaults = inputs["meta_parameters"]
    
        mp_spec = MetaParams().specification(serializable=True)
    
>       result = self.run_model(mp_spec, self.ok_adjustment)

../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/cs_kit/validate.py:221: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
cs-config/cs_config/functions.py:150: in run_model
    results = compute(*delayed_list)
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/base.py:568: in compute
    results = schedule(dsk, keys, **kwargs)
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/threaded.py:79: in get
    results = get_async(
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/local.py:517: in get_async
    raise_exception(exc, tb)
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/local.py:325: in reraise
    raise exc
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/local.py:223: in execute_task
    result = _execute_task(task, data)
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/core.py:121: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
cs-config/cs_config/helpers.py:187: in nth_year_results
    agg1, agg2 = fuzzed(dv1, dv2, reform_affected, 'aggr')
cs-config/cs_config/helpers.py:158: in fuzzed
    group2.iloc[idx] = group1.iloc[idx]
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/pandas/core/indexing.py:723: in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/pandas/core/indexing.py:1730: in _setitem_with_indexer
    self._setitem_with_indexer_split_path(indexer, value, name)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.core.indexing._iLocIndexer object at 0x7fef9101f7c0>, indexer = (223671, slice(None, None, None))
value = c62100                  9009.470531
aftertax_income        10535.143557
payrolltax              1378.448991
benefit_va...          1.000000
benefit_cost_total         0.000000
s006                     231.800000
Name: 223671, dtype: float64
name = 'iloc'

    def _setitem_with_indexer_split_path(self, indexer, value, name: str):
        """
        Setitem column-wise.
        """
        # Above we only set take_split_path to True for 2D cases
        assert self.ndim == 2
    
        if not isinstance(indexer, tuple):
            indexer = _tuplify(self.ndim, indexer)
        if len(indexer) > self.ndim:
            raise IndexError("too many indices for array")
        if isinstance(indexer[0], np.ndarray) and indexer[0].ndim > 2:
            raise ValueError(r"Cannot set values with ndim > 2")
    
        if (isinstance(value, ABCSeries) and name != "iloc") or isinstance(value, dict):
            from pandas import Series
    
            value = self._align_series(indexer, Series(value))
    
        # Ensure we have something we can iterate over
        info_axis = indexer[1]
        ilocs = self._ensure_iterable_column_indexer(info_axis)
    
        pi = indexer[0]
        lplane_indexer = length_of_indexer(pi, self.obj.index)
        # lplane_indexer gives the expected length of obj[indexer[0]]
    
        # we need an iterable, with a ndim of at least 1
        # eg. don't pass through np.array(0)
        if is_list_like_indexer(value) and getattr(value, "ndim", 1) > 0:
    
            if isinstance(value, ABCDataFrame):
                self._setitem_with_indexer_frame_value(indexer, value, name)
    
            elif np.ndim(value) == 2:
                self._setitem_with_indexer_2d_value(indexer, value)
    
            elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
                # We are setting multiple rows in a single column.
                self._setitem_single_column(ilocs[0], value, pi)
    
            elif len(ilocs) == 1 and 0 != lplane_indexer != len(value):
                # We are trying to set N values into M entries of a single
                #  column, which is invalid for N != M
                # Exclude zero-len for e.g. boolean masking that is all-false
    
                if len(value) == 1 and not is_integer(info_axis):
                    # This is a case like df.iloc[:3, [1]] = [0]
                    #  where we treat as df.iloc[:3, 1] = 0
                    return self._setitem_with_indexer((pi, info_axis[0]), value[0])
    
                raise ValueError(
                    "Must have equal len keys and value "
                    "when setting with an iterable"
                )
    
            elif lplane_indexer == 0 and len(value) == len(self.obj.index):
                # We get here in one case via .loc with a all-False mask
                pass
    
            elif len(ilocs) == len(value):
                # We are setting multiple columns in a single row.
                for loc, v in zip(ilocs, value):
                    self._setitem_single_column(loc, v, pi)
    
            elif len(ilocs) == 1 and com.is_null_slice(pi) and len(self.obj) == 0:
                # This is a setitem-with-expansion, see
                #  test_loc_setitem_empty_append_expands_rows_mixed_dtype
                # e.g. df = DataFrame(columns=["x", "y"])
                #  df["x"] = df["x"].astype(np.int64)
                #  df.loc[:, "x"] = [1, 2, 3]
                self._setitem_single_column(ilocs[0], value, pi)
    
            else:
>               raise ValueError(
                    "Must have equal len keys and value "
                    "when setting with an iterable"
                )
E               ValueError: Must have equal len keys and value when setting with an iterable

../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/pandas/core/indexing.py:1808: ValueError

It looks like the error is thrown when assigning the values of one dataframe to another (helpers.py, line 158):

```python
for name, group2 in gdf2:
    group2 = copy.deepcopy(group2)
    indices = np.where(group2['reform_affected'])
    num = min(len(indices[0]), NUM_TO_FUZZ)
    if num > 0:
        choices = np.random.choice(indices[0], size=num, replace=False)
        group1 = gdf1.get_group(name)
        for idx in choices:
            group2.iloc[idx] = group1.iloc[idx]
    group_list.append(group2)
```
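For what it's worth, the mismatch is straightforward to reproduce outside of Tax-Brain. A minimal sketch (made-up miniature frames, not the real PUF data): assigning a row from a frame with fewer columns into a frame that has one extra column via `.iloc` raises the same ValueError:

```python
import pandas as pd

# hypothetical miniature stand-ins for group1 and group2
group1 = pd.DataFrame({"c62100": [1.0], "s006": [2.0]})
group2 = group1.copy()
group2["reform_affected"] = [True]  # extra column present only in group2

try:
    # a row of length 2 assigned into a row slot of length 3
    group2.iloc[0] = group1.iloc[0]
except ValueError as err:
    print(err)  # "Must have equal len keys and value when setting with an iterable"
```

Note that `.loc` would align the assigned Series on labels and fill the missing `reform_affected` entry with NaN; `.iloc` is purely positional, so the lengths must match exactly.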

I verified that all other tests are passing locally, too.

I verified that the correct PUF file is being downloaded by downloading a new copy of the PUF and then testing it against the copy in the S3 bucket:

[screenshot]

@hdoupe
Collaborator Author

hdoupe commented Nov 8, 2021

It looks like the group2 dataframe has an extra column that's not in group1: reform_affected. If we drop this, I think we should be good to go. @andersonfrailey thoughts?

[screenshot]

@andersonfrailey
Collaborator

> @hdoupe: If we drop this, I think we should be good to go

I agree. I went back a bit and it looks like that column was added three years ago. It's unclear why it's only now throwing an error, but dropping it should fix the problem.

@hdoupe
Collaborator Author

hdoupe commented Nov 9, 2021

Ok, cool. Thanks for taking a look. I'll open a PR.
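For reference, a minimal sketch of what the fix might look like (illustrative column names, not the actual helpers.py code): record the affected indices first, then drop `reform_affected` so both rows have equal length before the `.iloc` assignment:

```python
import numpy as np
import pandas as pd

# illustrative stand-ins for group1 and group2 (not the real Tax-Brain data)
group1 = pd.DataFrame({"c62100": [1.0, 2.0], "s006": [3.0, 4.0]})
group2 = pd.DataFrame({"c62100": [9.0, 9.0], "s006": [9.0, 9.0],
                       "reform_affected": [True, False]})

# grab the affected indices first, then drop the extra column so the
# row shapes match during the row-by-row assignment
indices = np.where(group2["reform_affected"])[0]
group2 = group2.drop(columns=["reform_affected"])
for idx in indices:
    group2.iloc[idx] = group1.iloc[idx]  # equal-length rows now

print(group2.iloc[0].tolist())  # [1.0, 3.0]
```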
