
min_weight_fraction_leaf suggested improvements #6945

Closed
ben519 opened this issue Jun 28, 2016 · 3 comments · Fixed by #7301

Comments

@ben519

ben519 commented Jun 28, 2016

Description

I've been using the min_weight_fraction_leaf parameter of DecisionTreeClassifier and RandomForestClassifier incorrectly, and I suspect other people are making the same mistake.

For example, the documentation for min_weight_fraction_leaf in DecisionTreeClassifier says:

The minimum weighted fraction of the input samples required to be at a leaf node.

It was really unclear to me what the docs meant by "weighted fraction of the input samples". Initially I thought the weighting was based on the class sizes or on the values given by class_weight. I think a slight change to the parameter description could clear up this confusion. Perhaps something like:

The minimum weighted fraction of the input samples required to be at a leaf node, where weights are determined by sample_weight in the fit() method.
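
Under that reading, "weighted fraction" means the share of the total sample_weight that lands in a leaf, regardless of how many samples or classes are involved. The sketch below illustrates this; leaf_weight_fraction and satisfies_min_weight_fraction are hypothetical helpers, not scikit-learn functions, and they only model the constraint, not the tree builder.

```python
from typing import Sequence


def leaf_weight_fraction(leaf_indices: Sequence[int],
                         sample_weight: Sequence[float]) -> float:
    """Share of the total sample weight that falls in one leaf (hypothetical helper)."""
    total = sum(sample_weight)
    in_leaf = sum(sample_weight[i] for i in leaf_indices)
    return in_leaf / total


def satisfies_min_weight_fraction(leaf_indices: Sequence[int],
                                  sample_weight: Sequence[float],
                                  min_weight_fraction_leaf: float) -> bool:
    """True if the leaf carries at least the required share of the total weight."""
    return leaf_weight_fraction(leaf_indices, sample_weight) >= min_weight_fraction_leaf


if __name__ == "__main__":
    weights = [1.0, 1.0, 1.0, 1.0, 6.0]  # total weight = 10
    # One heavily weighted sample holds 6/10 of the weight on its own ...
    print(satisfies_min_weight_fraction([4], weights, 0.5))        # True
    # ... while three lightly weighted samples hold only 3/10.
    print(satisfies_min_weight_fraction([0, 1, 2], weights, 0.5))  # False
```

The point of the example is that a single sample can satisfy the constraint while three samples cannot, which is exactly why the per-sample weights need to be named in the docs.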

Furthermore, it appears that min_weight_fraction_leaf only applies if sample_weight is provided in the call to fit(). If sample_weight is not provided, min_weight_fraction_leaf is silently ignored. In that case, I think min_weight_fraction_leaf should either still apply, under the assumption that all samples are equally weighted, or a warning should be raised that min_weight_fraction_leaf will not be used because sample_weight was not provided.
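
The two options suggested above can be sketched as a single hypothetical helper that turns the fraction into an absolute minimum leaf weight. With uniform_fallback=True, a missing sample_weight is treated as a weight of 1 per sample; with uniform_fallback=False, the threshold drops to 0 as reported for 0.17.1, but a warning is emitted instead of failing silently. The function name and flag are illustrative assumptions, not scikit-learn API.

```python
import warnings
from typing import Optional, Sequence


def min_weight_leaf(min_weight_fraction_leaf: float,
                    n_samples: int,
                    sample_weight: Optional[Sequence[float]] = None,
                    uniform_fallback: bool = True) -> float:
    """Absolute minimum weight a leaf must carry (hypothetical helper)."""
    if sample_weight is not None:
        # Explicit weights: the fraction is taken of their sum.
        return min_weight_fraction_leaf * sum(sample_weight)
    if uniform_fallback:
        # Option 1: every sample has weight 1, so the total weight is n_samples.
        return min_weight_fraction_leaf * n_samples
    # Option 2: keep ignoring the parameter, but say so.
    if min_weight_fraction_leaf > 0:
        warnings.warn("min_weight_fraction_leaf is ignored because "
                      "sample_weight was not provided")
    return 0.0


print(min_weight_leaf(0.25, 200))                      # 50.0
print(min_weight_leaf(0.25, 4, [1.0, 1.0, 1.0, 5.0]))  # 2.0
```

The uniform fallback has the advantage that the parameter behaves the same whether or not the caller passes all-ones weights explicitly.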

Versions

Darwin-15.5.0-x86_64-i386-64bit
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:24:55)
[GCC 4.2.1 (Apple Inc. build 5577)]
NumPy 1.11.0
SciPy 0.17.1
Scikit-Learn 0.17.1

Also, I would love to make the changes I suggested (if they're deemed worthy), but I have little experience contributing to open-source libraries. I might need a bit of hand-holding if someone is willing to help me out.

@jnothman
Member

Please submit a PR

@amueller
Member

I think that if min_weight_fraction_leaf is set and no sample_weight is provided, it should either raise an error or assume uniform weights. With uniform weights it's somewhat redundant with min_samples_leaf, but I think assuming uniform weights would still be better.
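
The redundancy follows from simple arithmetic: with every weight equal to 1, a leaf of k samples carries weight k, so the constraint becomes k >= min_weight_fraction_leaf * n_samples, a plain sample-count bound. The hypothetical helper below computes the smallest leaf size that bound allows; it is a sketch of the arithmetic, not of scikit-learn's code.

```python
import math


def equivalent_min_samples_leaf(min_weight_fraction_leaf: float,
                                n_samples: int) -> int:
    """Smallest allowed leaf size when every sample has weight 1 (hypothetical helper).

    Floating-point rounding of the product can matter for fractions such
    as 0.07; this sketch ignores that.
    """
    return math.ceil(min_weight_fraction_leaf * n_samples)


print(equivalent_min_samples_leaf(0.25, 10))  # 3 (2.5 rounded up)
print(equivalent_min_samples_leaf(0.5, 7))    # 4 (3.5 rounded up)
```

One difference remains even under uniform weights: the fraction scales with the size of the training set, whereas an integer min_samples_leaf does not.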

@gg2572

gg2572 commented Jun 20, 2017

I think this is similar to min_samples_leaf. Instead of requiring an absolute number of samples in each leaf node, min_weight_fraction_leaf requires a fraction of the samples (or of the total weight) in each leaf. Whether the model weights the samples depends on class_weight and on whether sample_weight is passed to fit().
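
A minimal sketch of how class_weight can feed into the per-sample weights, assuming (as I understand DecisionTreeClassifier to behave) that class weights are multiplied into sample_weight before min_weight_fraction_leaf is applied. The "balanced" formula n_samples / (n_classes * count[class]) is the one described in the scikit-learn docs; expand_class_weight itself is a hypothetical helper, and treating missing dict entries as 1.0 is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Union


def expand_class_weight(y: Sequence[int],
                        class_weight: Union[str, Dict[int, float]],
                        sample_weight: Optional[Sequence[float]] = None) -> List[float]:
    """Per-sample weights after folding in class_weight (hypothetical helper)."""
    counts = Counter(y)
    if class_weight == "balanced":
        # Weights inversely proportional to class frequency.
        n_samples, n_classes = len(y), len(counts)
        per_class = {c: n_samples / (n_classes * k) for c, k in counts.items()}
    else:
        # Explicit mapping; labels not listed keep a multiplier of 1.0.
        per_class = {c: class_weight.get(c, 1.0) for c in counts}
    # Without explicit sample weights, start from uniform weights of 1.
    base = sample_weight if sample_weight is not None else [1.0] * len(y)
    return [w * per_class[label] for w, label in zip(base, y)]


weights = expand_class_weight([0, 0, 0, 1], "balanced")
print(weights)               # the single class-1 sample gets weight 2.0
print(0.25 * sum(weights))   # minimum leaf weight for min_weight_fraction_leaf=0.25
```

Under this assumption, setting class_weight changes what min_weight_fraction_leaf measures even when sample_weight is never passed, which may explain why the parameter can seem to depend on class_weight.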
