MergeError when executing "woebin" function #70

Open
kendalvictor opened this issue Jan 13, 2021 · 9 comments

@kendalvictor

Hi,
A few days ago, after updating pandas to version 1.2.0, the "woebin" function of scorecardpy version 0.1.9.2 stopped working.

When I try to execute it, the following error is raised:


MergeError Traceback (most recent call last)
in
----> 1 cortes = sc.woebin(
2 data[
3 (data[col_target].notnull())
4 ].drop(
5 [col for col in data.columns if 'target' in col and col != col_target] + col_no_review,

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin(dt, y, x, var_skip, breaks_list, special_values, stop_limit, count_distr_limit, bin_num_limit, positive, no_cores, print_step, method, ignore_const_cols, ignore_datetime_cols, check_cate_num, replace_blank, save_breaks_list, **kwargs)
956 print(('{:'+str(len(str(xs_len)))+'.0f}/{} {}').format(i, xs_len, x_i), flush=True)
957 # woebining on one variable
--> 958 bins[x_i] = woebin2(
959 dtm = pd.DataFrame({'y':dt[y], 'variable':x_i, 'value':dt[x_i]}),
960 breaks=breaks_list[x_i] if (breaks_list is not None) and (x_i in breaks_list.keys()) else None,

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin2(dtm, breaks, spl_val, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, method)
720 if method == 'tree':
721 # 2.tree-like optimal binning
--> 722 bin_list = woebin2_tree(
723 dtm, init_count_distr=init_count_distr, count_distr_limit=count_distr_limit,
724 stop_limit=stop_limit, bin_num_limit=bin_num_limit, breaks=breaks, spl_val=spl_val)

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin2_tree(dtm, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, breaks, spl_val)
482 '''
483 # initial binning
--> 484 bin_list = woebin2_init_bin(dtm, init_count_distr=init_count_distr, breaks=breaks, spl_val=spl_val)
485 initial_binning = bin_list['initial_binning']
486 binning_sv = bin_list['binning_sv']

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin2_init_bin(dtm, init_count_distr, breaks, spl_val)
274
275 # dtm $ binning_sv
--> 276 dtm_binsv_list = dtm_binning_sv(dtm, breaks, spl_val)
277 dtm = dtm_binsv_list['dtm']
278 binning_sv = dtm_binsv_list['binning_sv']

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in dtm_binning_sv(dtm, breaks, spl_val)
113 # sv_df = sv_df.assign(value = lambda x: x.value.astype(dtm['value'].dtypes))
114 # dtm_sv & dtm
--> 115 dtm_sv = pd.merge(dtm.fillna("missing"), sv_df[['value']].fillna("missing"), how='inner', on='value', right_index=True)
116 dtm = dtm[~dtm.index.isin(dtm_sv.index)].reset_index() if len(dtm_sv.index) < len(dtm.index) else None
117 # dtm_sv = dtm.query('value in {}'.format(sv_df['value'].tolist()))

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
72 validate=None,
73 ) -> "DataFrame":
---> 74 op = _MergeOperation(
75 left,
76 right,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
648 warnings.warn(msg, UserWarning)
649
--> 650 self._validate_specification()
651
652 cross_col = None

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in _validate_specification(self)
1301 )
1302 if self.left_index or self.right_index:
-> 1303 raise MergeError(
1304 'Can only pass argument "on" OR "left_index" '
1305 'and "right_index", not a combination of both.'

MergeError: Can only pass argument "on" OR "left_index" and "right_index", not a combination of both.


@Okroshiashvili

@kendalvictor I think you have to downgrade pandas to 0.25.0. But before you downgrade, note that the pandas merge() method accepts either the on argument or left_index and right_index, not a combination of both. Here, the call tries to merge on the column value and on the index simultaneously. I hope this helps.
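
To make the point about `on` versus the index flags concrete, here is a minimal sketch with toy data. The frames and function names are invented for illustration, and this is not the library's actual fix. The membership-based variant assumes the intent was to pick the rows whose value is a special value, which is what the commented-out `dtm.query(...)` line in the traceback hints at.

```python
import pandas as pd

# Toy stand-ins for the frames seen in the traceback (hypothetical data).
dtm = pd.DataFrame({"y": [0, 1, 0, 1], "value": ["a", "b", "missing", "c"]})
sv_df = pd.DataFrame({"value": ["missing", "c"]})


def try_mixed_merge(left, right):
    """Call merge with both `on` and `right_index`.

    Returns the error text if pandas rejects the call, or None if it does not
    (older pandas releases may accept this combination).
    """
    try:
        pd.merge(left, right, how="inner", on="value", right_index=True)
    except ValueError as exc:  # MergeError is a subclass of ValueError
        return str(exc)
    return None


def merge_on_column_only(left, right):
    """Merge on the shared column only; the result gets a fresh integer index."""
    return pd.merge(left, right, how="inner", on="value")


def select_rows_by_membership(left, right):
    """Keep rows of `left` whose value appears in `right`, preserving left's index."""
    return left[left["value"].isin(right["value"])]


print(try_mixed_merge(dtm, sv_df))
print(merge_on_column_only(dtm, sv_df))
print(select_rows_by_membership(dtm, sv_df))
```

The membership variant keeps the original row labels, which matters if later code filters the original frame by index, as line 116 of the traceback appears to do.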

@kendalvictor
Author

Hi @Okroshiashvili, the solution was to downgrade pandas to 1.1.3. Ideally, though, this error should be addressed in a future release of this library, since its "woebin" function currently does not work with pandas 1.2.0.
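
For anyone who wants to check an environment before calling "woebin", a small guard like the one below may help. It is only a sketch: the helper names are hypothetical, and the threshold is an assumption drawn from the reports in this thread (1.1.3 works; 1.2.0, 1.3.4 and 1.5.3 fail with scorecardpy 0.1.9.2), not from any statement by the maintainers.

```python
def parse_version(text):
    """Turn '1.2.0' into (1, 2, 0); a non-numeric suffix such as 'rc1' is ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


# Assumption: 1.2.0 is the first affected pandas release, based only on the
# versions reported in this thread.
FIRST_AFFECTED = (1, 2, 0)


def is_reported_affected(pandas_version):
    """True if this pandas version is at or above the first reported failing one."""
    return parse_version(pandas_version) >= FIRST_AFFECTED


for version in ("1.1.3", "1.2.0", "1.5.3"):
    print(version, is_reported_affected(version))
```

In practice the version string would come from `pandas.__version__`; the guard itself uses only the standard library.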

@Okroshiashvili

I think a version incompatibility is not surprising. I hope the maintainers will solve this problem, but until then, if your problem is solved, please close this issue :)

@kendalvictor
Author

Solved by changing the pandas version from 1.2.0 to 1.1.3.

@ShichenXie
Owner

The bug should be fixed. Please check the latest version on GitHub.

@chenz1hao

But the problem still persists.

@FairmoneyKunal

I am still having this problem with pandas 1.3.4. Is there a new workaround?

ShichenXie reopened this Jun 10, 2022
@ShichenXie
Owner

Please install the latest version on GitHub and try again. It should be fixed.
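
After reinstalling, it can be worth confirming which versions the interpreter actually sees, since a stale copy may still be picked up. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper names are hypothetical, and it assumes both packages are installed under the distribution names `scorecardpy` and `pandas`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("scorecardpy", "pandas")):
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in packages}


print(environment_report())
```

Pasting this output into a follow-up comment makes it easier to tell whether a report refers to the released package or to the code on GitHub.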

@VladOnMyOwn

I have the same problem with pandas 1.5.3.
