Index is out of bounds for axis 0 #982

Open
taimoorhussain1259 opened this issue Apr 11, 2023 · 2 comments

taimoorhussain1259 commented Apr 11, 2023

Hello everyone,

I used this library and it worked very well. Due to some conflicts in conda, I had to recreate my environment. I reinstalled imbalanced-learn==0.10, but now I am facing this issue. Please guide me.
Thanks.

IndexError                                Traceback (most recent call last)
Cell In[8], line 72
     69 valid_index = train_valid_index[1]
     71 train_data_df = data_train_valid.iloc[train_index, :]
---> 72 train_data_df = generate_synthetic_samples(train_data_df)
     74 #train_data_df = [train_data_df, model_old.sample(num_rows=200), model_new.sample(num_rows=200)]
     75 #train_data_df = pd.concat(train_data_df)        
     77 X_train=train_data_df.iloc[:, :-1]

Cell In[7], line 11, in generate_synthetic_samples(data_df)

---> 11 X_smote, Y_smote = SMOTE(random_state=42).fit_resample(temp_df.iloc[:, :-1], temp_df.iloc[:, -1])

     12 X_border, Y_border = BorderlineSMOTE(random_state=42).fit_resample(temp_df.iloc[:, :-1], temp_df.iloc[:, -1])

File ~/anaconda3/envs/p39/lib/python3.9/site-packages/imblearn/base.py:203, in BaseSampler.fit_resample(self, X, y)
    182 """Resample the dataset.
    183 
    184 Parameters
   (...)
    200     The corresponding label of `X_resampled`.
    201 """
    202 self._validate_params()
--> 203 return super().fit_resample(X, y)

File ~/anaconda3/envs/p39/lib/python3.9/site-packages/imblearn/base.py:88, in SamplerMixin.fit_resample(self, X, y)
     82 X, y, binarize_y = self._check_X_y(X, y)
     84 self.sampling_strategy_ = check_sampling_strategy(
     85     self.sampling_strategy, y, self._sampling_type
     86 )
---> 88 output = self._fit_resample(X, y)
     90 y_ = (
     91     label_binarize(output[1], classes=np.unique(y)) if binarize_y else output[1]
     92 )
     94 X_, y_ = arrays_transformer.transform(output[0], y_)

File ~/anaconda3/envs/p39/lib/python3.9/site-packages/imblearn/over_sampling/_smote/base.py:356, in SMOTE._fit_resample(self, X, y)
    354 self.nn_k_.fit(X_class)
    355 nns = self.nn_k_.kneighbors(X_class, return_distance=False)[:, 1:]
--> 356 X_new, y_new = self._make_samples(
    357     X_class, y.dtype, class_sample, X_class, nns, n_samples, 1.0
    358 )
    359 X_resampled.append(X_new)
    360 y_resampled.append(y_new)

File ~/anaconda3/envs/p39/lib/python3.9/site-packages/imblearn/over_sampling/_smote/base.py:110, in BaseSMOTE._make_samples(self, X, y_dtype, y_type, nn_data, nn_num, n_samples, step_size)
    107 rows = np.floor_divide(samples_indices, nn_num.shape[1])
    108 cols = np.mod(samples_indices, nn_num.shape[1])
--> 110 X_new = self._generate_samples(X, nn_data, nn_num, rows, cols, steps)
    111 y_new = np.full(n_samples, fill_value=y_type, dtype=y_dtype)
    112 return X_new, y_new

File ~/anaconda3/envs/p39/lib/python3.9/site-packages/imblearn/over_sampling/_smote/base.py:154, in BaseSMOTE._generate_samples(self, X, nn_data, nn_num, rows, cols, steps)
    114 def _generate_samples(self, X, nn_data, nn_num, rows, cols, steps):
    115     r"""Generate a synthetic sample.
    116 
    117     The rule for the generation is:
   (...)
    152         Synthetically generated samples.
    153     """
--> 154     diffs = nn_data[nn_num[rows, cols]] - X[rows]
    156     if sparse.issparse(X):
    157         sparse_func = type(X).__name__

IndexError: index 94092224477536 is out of bounds for axis 0 with size 348
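
The failing line indexes `nn_data` with values read from the neighbor table `nn_num`, so one possible reading of the error is that the table (or `rows`) holds a value far outside the 348 available rows. Below is a minimal pure-Python sketch of that lookup and of a range check on the table. It is not the library's implementation: `make_pairs` and `find_bad_neighbors` are hypothetical helpers, and the toy table is made up for illustration.

```python
def make_pairs(sample_indices, n_neighbors):
    """Map flat sample indices to (row, col) positions in the neighbor table.

    Mirrors the floor_divide / mod pair shown in the traceback.
    """
    return [divmod(i, n_neighbors) for i in sample_indices]


def find_bad_neighbors(nn_num, n_rows):
    """Return (row, col, value) for every neighbor index outside [0, n_rows).

    Negative values are flagged too: NumPy would index from the end instead
    of raising, but they are still not valid neighbor indices.
    """
    return [
        (r, c, idx)
        for r, row in enumerate(nn_num)
        for c, idx in enumerate(row)
        if not 0 <= idx < n_rows
    ]


# Toy neighbor table: 4 samples with 2 neighbors each (made-up values).
nn_num = [[1, 2], [0, 3], [3, 0], [2, 1]]
pairs = make_pairs([0, 3, 5, 6], n_neighbors=2)

# Corrupt one entry with the value from the error message above.
corrupted = [row[:] for row in nn_num]
corrupted[2][1] = 94092224477536
bad = find_bad_neighbors(corrupted, n_rows=4)
```

With a real NumPy array, the same check could be run on `nn_num.tolist()`. The sketch inspects only the neighbor table; the same range check would apply to `rows`.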

taimoorhussain1259 changed the title [BUG] Index is out of bounds for axis 0 Apr 11, 2023

glemaitre (Member) commented

We will need a minimal reproducer to be able to check whether this is a bug or a misuse.

jkelin commented Sep 23, 2023

I am also seeing this with SMOTE, BorderlineSMOTE, and ADASYN. Version 0.10 works fine; 0.11 breaks.

Repro:

from imblearn.over_sampling import BorderlineSMOTE
from sklearn.datasets import make_classification

X, y = make_classification(
    n_classes=2,
    class_sep=2,
    weights=[0.1, 0.9],
    n_informative=3,
    n_redundant=1,
    flip_y=0,
    n_features=20,
    n_clusters_per_class=1,
    n_samples=1000,
    random_state=10,
)

BorderlineSMOTE().fit_resample(
    X,
    y,
)

Throws IndexError: index 22117 is out of bounds for axis 0 with size 1000

Setting n_features=10 fixes the issue.

Python 3.11.4, NumPy 1.24.4, Linux
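
Since the behavior seems to differ between releases, a report is easier to compare when it lists every relevant package version. The following is a minimal sketch that uses only the standard library. The helper names `collect_versions` and `format_report` are hypothetical, and the package list is an assumption about which distributions matter here.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def format_report(versions):
    """Render Python, OS, and package versions as one line each."""
    lines = [
        f"Python {platform.python_version()}",
        f"Platform {platform.system()}",
    ]
    for name, version in versions.items():
        lines.append(f"{name} {version or 'not installed'}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed list of distributions relevant to this issue; adjust as needed.
    names = ["imbalanced-learn", "scikit-learn", "numpy", "scipy"]
    print(format_report(collect_versions(names)))
```

Running the script in the affected environment and in a working one gives two reports that can be compared line by line.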
