Random subsampling for feature selection #47 #650

Open · wants to merge 2 commits into master
Conversation

@xliu833 commented Dec 21, 2019

Description

Related issues or pull requests

Fixes #47

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./mlxtend -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./mlxtend

@pep8speaks commented Dec 21, 2019

Hello @lareinaxy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-21 22:28:28 UTC

@xliu833 changed the title from "15th attempt" to "Random subsampling for feature selection #47" on Dec 21, 2019
@@ -382,7 +382,8 @@ def fit(self, X, y, custom_feature_names=None, groups=None, **fit_params):

         else:
             select_in_range = False
-            k_to_select = self.k_features
+            k_to_select = int(len(X[1])**.5)
+            np.take(X, np.random.permutation(X.shape[1]), axis=1, out=X)
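
For context, here is a minimal standalone sketch of what the two added lines do, outside of the fit() method (the array X is made-up illustrative data): len(X[1]) is the length of the second row, which for a 2-D NumPy array equals the number of columns, so k_to_select becomes the floor of the square root of the number of features; the np.take call then shuffles the feature columns of X in place.

    import numpy as np

    X = np.arange(20).reshape(4, 5)     # 4 samples, 5 features (illustrative data)

    # floor of the square root of the number of features;
    # len(X[1]) == X.shape[1] for a 2-D array
    k_to_select = int(len(X[1]) ** .5)  # -> 2

    # shuffle the feature columns of X in place
    np.take(X, np.random.permutation(X.shape[1]), axis=1, out=X)

    print(k_to_select)                  # 2
    print(X.shape)                      # (4, 5): same shape, columns permuted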
Owner commented:

Hi Lareina, thanks for the PR. I don't quite understand how the value k_to_select = int(len(X[1])**.5) is chosen. Does this mean that the random subset is always the square root of the original number of features?

There are two problems with that.

  1. We want to add the random feature subset size as an optional setting
  2. I think we should allow the user to choose the subset size

There could be a new parameter

use_random_feature_subset for the SequentialFeatureSelector class, which accepts either a function like f = lambda x: int(np.sqrt(x)) or None.

Let me know if you have questions.
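
A rough sketch of how such a parameter could work is below. It assumes the parameter name use_random_feature_subset from the comment above; the helper function select_feature_subset and its integration point are hypothetical and not part of mlxtend's actual API.

    import numpy as np

    def select_feature_subset(X, use_random_feature_subset=None, random_state=None):
        """Hypothetical helper: return the column indices to consider at a step.

        use_random_feature_subset is either None (use all features) or a
        callable mapping the total number of features to the subset size,
        e.g. lambda x: int(np.sqrt(x)).
        """
        n_features = X.shape[1]
        if use_random_feature_subset is None:
            return np.arange(n_features)
        subset_size = int(use_random_feature_subset(n_features))
        rng = np.random.RandomState(random_state)
        return rng.choice(n_features, size=subset_size, replace=False)

    # Example usage:
    X = np.random.rand(10, 16)
    cols = select_feature_subset(
        X, use_random_feature_subset=lambda x: int(np.sqrt(x)), random_state=0)
    print(cols)  # 4 randomly chosen feature indices out of 16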


Successfully merging this pull request may close these issues.

Feature request for sequential feature selector: random subsets at each step
3 participants