This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

IndexError: index 5000 is out of bounds for axis 0 with size 5000 #474

Open
venkidevictor opened this issue Mar 14, 2020 · 0 comments

Hi,
I am working on my capstone project. When I try to clean my dataset by removing the outliers, I get the error shown below.
The code I am using is attached below.

#Removing Outliers
#Tukey Method

# import required libraries
import numpy as np
from collections import Counter

# Outlier detection
def detect_outliers(df, n, features):
    outlier_indices = []

    # iterate over features (columns)
    for col in features:
        # 1st quartile (25%)
        Q1 = np.percentile(df[col], 25)
        # 3rd quartile (75%)
        Q3 = np.percentile(df[col], 75)
        # Interquartile range (IQR)
        IQR = Q3 - Q1

        # outlier step
        outlier_step = 1.5 * IQR

        # Determine the index labels of outliers for feature col
        outlier_list_col = df[(df[col] < Q1 - outlier_step) | (df[col] > Q3 + outlier_step)].index

        # append the found outlier index labels for col to the list of outlier indices
        outlier_indices.extend(outlier_list_col)

    # select observations that are outliers in more than n features
    outlier_indices = Counter(outlier_indices)
    multiple_outliers = list(k for k, v in outlier_indices.items() if v > n)

    return multiple_outliers

# List of Outliers

Outliers_to_drop = detect_outliers(data1.drop('Class',axis=1),0,list(data1.drop('Class',axis=1)))
data1.drop('Class',axis=1).loc[Outliers_to_drop]

#Create New Dataset without Outliers
good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop = True)
good_data.info()


IndexError Traceback (most recent call last)
in
1 #Create New Dataset without Outliers
----> 2 good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop = True)
3 good_data.info()

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __getitem__(self, key)
4289
4290 key = com.values_from_object(key)
-> 4291 result = getitem(key)
4292 if not is_scalar(result):
4293 return promote(result)

IndexError: index 5000 is out of bounds for axis 0 with size 5000

Can anyone help me fix this and code it properly?
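
One possible reading of the traceback, offered as a sketch and not a confirmed diagnosis: `detect_outliers` returns index labels (they come from `.index`), while `data1.index[Outliers_to_drop]` looks them up as positions. If the index of `data1` is not the default 0..4999 (the post does not show it, so this is an assumption), a label such as 5000 has no matching position in 5000 rows. The toy frame `toy` and the list `labels_to_drop` below are hypothetical stand-ins for `data1` and `Outliers_to_drop`.

```python
import pandas as pd

# Hypothetical stand-in for data1: 5 rows whose index labels run 1..5, not 0..4.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 100.0]}, index=[1, 2, 3, 4, 5])

# Index labels of the kind detect_outliers returns (taken from `.index`).
labels_to_drop = [5]

# Positional lookup, as in the failing line: position 5 does not exist in 5 rows.
try:
    toy.index[labels_to_drop]
    positional_error = None
except IndexError as exc:
    positional_error = exc

# Label-based drop: pass the labels straight to DataFrame.drop.
good_toy = toy.drop(labels_to_drop).reset_index(drop=True)
```

Under that assumption, the failing line could be written as `good_data = data1.drop(Outliers_to_drop).reset_index(drop=True)`, which drops rows by label and never converts the labels to positions.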
