TuRF value error #52

Open
J-Bleker opened this issue Jun 15, 2018 · 8 comments

J-Bleker commented Jun 15, 2018

Hello,

I am currently trying to use TuRF to get my feature importance scores, and my code is almost the same as the example code in the docs:

from skrebate.turf import TuRF

# Take x & y from dataframes
X = x.values
Y = y.values

# Take feature names as header
header = x.columns

# Implement TuRF with ReliefF as algorithm
tf = TuRF(core_algorithm="ReliefF", n_features_to_select=2, pct=0.5, verbose=True)
tf.fit(X, Y, header)

# Output

Created distance array in 0.03900003433227539 seconds.
Feature scoring under way ...
Completed scoring in 12.943000078201294 seconds.
Created distance array in 0.02700018882751465 seconds.
Feature scoring under way ...
Completed scoring in 6.190999984741211 seconds.
Created distance array in 0.004999876022338867 seconds.
Feature scoring under way ...
Completed scoring in 3.2160000801086426 seconds.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-63-e796aad72373> in <module>()
      1 tf = TuRF(core_algorithm="ReliefF", n_features_to_select=2, pct=0.5,verbose=True)
----> 2 tf.fit(X, Y, header)

C:\ProgramData\Anaconda3\lib\site-packages\skrebate\turf.py in fit(self, X, y, headers)
    164                 self.feature_importances_.append(low_score - reduction * self._lost[i]) #append discounted score as a marker of when the feature was removed.
    165             else: #Feature made final cut
--> 166                 score_index = self.headers.index(i)
    167                 self.feature_importances_.append(core_fit.feature_importances_[score_index])
    168 

ValueError: 'mean_surface_score' is not in list

This is a very odd error in my opinion, since I am certain all feature names are in the header. Does anyone know the solution to this error? Unfortunately I cannot supply my data, but it is just a dataframe with about 150 samples, a certain number of features as columns, and one column with the labels (X does not contain this column).

Thanks!

@ryanurbs
Member

Thanks for the issue report. I'll check this out and get back to you ASAP.

@swatisaini

Hi @BBeuker,
Did you find a solution for this?

@ryanurbs
Member

ryanurbs commented Aug 9, 2018 via email

@ryanurbs
Member

ryanurbs commented Aug 9, 2018

My intuition is that you are using a dataset with a small number of features, and TuRF is removing too many of them, which would explain the missing variable label.

@swatisaini

swatisaini commented Aug 9, 2018 via email

@ryanurbs
Member

ryanurbs commented Aug 9, 2018 via email

@swatisaini

swatisaini commented Aug 10, 2018 via email

@Tipulidae

I'm having the same problem, and I'm fairly sure it is also what was reported in #54. On line 133 of turf.py, non_select is not the complement of select when pct = 0.5 and the number of features is odd. If pct != 0.5, I think you would get the crash regardless.
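
To illustrate the kind of mismatch described above, here is a minimal sketch in plain Python. It is not the actual code from turf.py: the function names (split_buggy, split_fixed, is_partition) and the exact rounding expressions are assumptions, chosen only because they reproduce the reported pattern (fine for pct = 0.5 with an even feature count, broken otherwise). The idea is that the two slices use cut points that are rounded independently, so they do not always divide the ranked feature list cleanly.

```python
import math


def split_buggy(ranked, pct):
    """Hypothetical reconstruction: each slice computes its own cut point."""
    n = len(ranked)
    select = ranked[:math.ceil(n * (1 - pct))]  # features kept for the next pass
    non_select = ranked[int(n * pct):]          # features recorded as removed
    return select, non_select


def split_fixed(ranked, pct):
    """Compute the cut once so the two slices are complements by construction."""
    n = len(ranked)
    cut = math.ceil(n * (1 - pct))
    return ranked[:cut], ranked[cut:]


def is_partition(ranked, select, non_select):
    """True if every feature lands in exactly one of the two lists."""
    return sorted(select + non_select) == sorted(ranked)


if __name__ == "__main__":
    cases = [(6, 0.5), (7, 0.5), (8, 0.25)]  # (number of features, pct)
    for n, pct in cases:
        ranked = [f"f{i}" for i in range(n)]
        buggy_ok = is_partition(ranked, *split_buggy(ranked, pct))
        fixed_ok = is_partition(ranked, *split_fixed(ranked, pct))
        print(n, pct, "buggy partition:", buggy_ok, "fixed partition:", fixed_ok)
```

Under these assumptions, a feature that ends up in both lists is treated as kept and as removed at the same time, which is the sort of inconsistency that could plausibly lead to a header lookup failing later on. Deriving both slices from a single cut index avoids the problem regardless of pct or the parity of the feature count.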
