DOC improve documentation of NCR #1017

Open: wants to merge 8 commits into master


solegalli (Contributor)

Reword documentation and docstrings for the NCR.

Related to #854

@solegalli (Contributor, Author)

@glemaitre, ready for review.


The :class:`NeighbourhoodCleaningRule` is another "cleaning" algorithm. It removes
samples from the majority class that are closest to the boundary with the minority
Member

Suggested change
samples from the majority class that are closest to the boundary with the minority
samples from the majority class that are the closest to the boundary formed by the samples of the minority class

Contributor Author

I don't totally understand this sentence. Let me try a modification in a new commit.
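A minimal, dependency-free sketch of the ENN-style cleaning described in the excerpt above, assuming Euclidean distance and a majority vote over the `k` nearest neighbours: a sample outside the minority class is flagged when the most common label among its neighbours differs from its own. `k_nearest`, `enn_removals` and the toy data are hypothetical; this is not the imbalanced-learn code, whose `EditedNearestNeighbours` also offers a stricter "all neighbours must agree" rule through `kind_sel`.

```python
import math
from collections import Counter


def k_nearest(X, i, k):
    # Indices of the k nearest neighbours of X[i] (Euclidean distance),
    # excluding X[i] itself; ties are broken by the lower index.
    ranked = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in ranked[:k]]


def enn_removals(X, y, minority, k=3):
    # ENN-style cleaning (majority-vote rule): flag every non-minority sample
    # whose neighbours' most common label differs from its own label.
    removed = set()
    for i, label in enumerate(y):
        if label == minority:
            continue  # the minority class is never under-sampled
        votes = Counter(y[j] for j in k_nearest(X, i, k))
        if votes.most_common(1)[0][0] != label:
            removed.add(i)
    return removed


# Hypothetical 1-D toy data: "a" is the majority class, "b" the minority class.
X = [(float(v),) for v in range(8)]
y = ["a", "a", "a", "a", "b", "a", "b", "b"]
print(enn_removals(X, y, minority="b", k=3))  # {5}
```

Only the majority sample sitting between minority samples (index 5) is flagged; samples deep inside the majority region are kept.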


The :class:`NeighbourhoodCleaningRule` expands on the cleaning performed by
:class:`EditedNearestNeighbours` by eliminating additional majority class samples if
they are among the 3 closest neighbours of a sample from the minority class.
Member

We have a parameter controlling the 3-NN.

Suggested change
they are among the 3 closest neighbours of a sample from the minority class.
they are among the :math:`N` closest neighbours (i.e. using the parameter `n_neighbours`) of a sample from the minority class.

Contributor Author

Throughout the docs we are using K as the number of neighbours, not N. I guess the n in n_neighbours comes from n = number. I'd rather stick to K for consistency, if that's alright with you. I'll fix this in a separate commit.

Contributor Author

Actually, I removed this sentence altogether, as per the suggestion below.
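Since the neighbourhood size is a parameter rather than a fixed 3, here is a sketch of the additional removal with `k` standing in for `n_neighbors`. It assumes the rule as we read it in the original article: the neighbours of a minority sample that belong to other classes are flagged only when that minority sample is itself out-voted by its `k` nearest neighbours. The function names and data are hypothetical, and the class-size condition (`threshold_cleaning`) is deliberately left out.

```python
import math
from collections import Counter


def k_nearest(X, i, k):
    # Indices of the k nearest neighbours of X[i] (Euclidean), excluding i itself.
    ranked = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in ranked[:k]]


def extra_removals(X, y, minority, k=3):
    # For each minority sample that its k nearest neighbours would misclassify
    # (majority vote), flag those neighbours that belong to other classes.
    removed = set()
    for i, label in enumerate(y):
        if label != minority:
            continue
        neighbours = k_nearest(X, i, k)
        predicted = Counter(y[j] for j in neighbours).most_common(1)[0][0]
        if predicted != minority:
            removed.update(j for j in neighbours if y[j] != minority)
    return removed


X = [(float(v),) for v in range(8)]
y = ["a", "a", "a", "a", "b", "a", "b", "b"]
print(sorted(extra_removals(X, y, "b", k=3)))  # [2, 3, 5]
print(sorted(extra_removals(X, y, "b", k=1)))  # [3, 5]
```

Changing `k` changes which samples are flagged, which supports describing the neighbourhood size through the parameter rather than as a fixed number.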

The procedure for the :class:`NeighbourhoodCleaningRule` is as follows:

1. Remove observations from the majority class with edited nearest neighbors (ENN).
2. Remove additional samples from the majority class if they are one of the k closest
Member

Since we are repeating the same sentence as above, I would remove the paragraph above and keep only the bullet-point sequence.
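The two numbered steps, together with the class-size condition set by `threshold_cleaning`, can be sketched end to end. This is a simplified, dependency-free illustration under stated assumptions: Euclidean distance, majority-vote neighbourhoods in both steps, the minority class taken to be the smallest class, and both sets of removals computed on the original data before anything is dropped. `neighbourhood_cleaning` and `vote` are hypothetical names, not the `NeighbourhoodCleaningRule` API.

```python
import math
from collections import Counter


def k_nearest(X, i, k):
    # Indices of the k nearest neighbours of X[i] (Euclidean), excluding i itself.
    ranked = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in ranked[:k]]


def vote(y, neighbours):
    # Most common label among the given neighbours.
    return Counter(y[j] for j in neighbours).most_common(1)[0][0]


def neighbourhood_cleaning(X, y, k=3, threshold_cleaning=0.5):
    counts = Counter(y)
    minority = min(counts, key=counts.get)  # assumption: the smallest class

    # Step 1: ENN-style cleaning of samples outside the minority class.
    step1 = {
        i
        for i, label in enumerate(y)
        if label != minority and vote(y, k_nearest(X, i, k)) != label
    }

    # Only classes larger than threshold_cleaning * |minority| take part in step 2.
    eligible = {
        c
        for c, n in counts.items()
        if c != minority and n > threshold_cleaning * counts[minority]
    }

    # Step 2: neighbours (from eligible classes) of misclassified minority samples.
    step2 = set()
    for i, label in enumerate(y):
        if label == minority:
            neighbours = k_nearest(X, i, k)
            if vote(y, neighbours) != minority:
                step2.update(j for j in neighbours if y[j] in eligible)

    removed = step1 | step2
    keep = [i for i in range(len(y)) if i not in removed]
    return [X[i] for i in keep], [y[i] for i in keep]


X = [(float(v),) for v in range(8)]
y = ["a", "a", "a", "a", "b", "a", "b", "b"]
X_res, y_res = neighbourhood_cleaning(X, y, k=3)
print(y_res)  # ['a', 'a', 'b', 'b', 'b']
```

Computing both removal sets on the original data reflects our reading of the article (the sets are merged, then removed at once); an implementation that recomputes neighbours after step 1 could give different results.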

To carry out step 2 there is one condition: a sample will only be removed if its class
has a minimum number of observations. The minimum number of observations is regulated
by the `threshold_cleaning` parameter. In the original article
:cite:`laurikkala2001improving`, samples would be removed if the class had at
Member

I would not go into details about the original paper. Instead, just state that we check that the number of samples in the class to under-sample is above the threshold times the number of samples in the minority class.
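The check suggested above reduces to one comparison per class. A minimal sketch, assuming the minority class is the smallest class; `classes_to_clean` is a hypothetical helper name.

```python
from collections import Counter


def classes_to_clean(y, threshold_cleaning=0.5):
    # A non-minority class takes part in the extra cleaning step only if it has
    # more than threshold_cleaning times as many samples as the minority class.
    counts = Counter(y)
    minority = min(counts, key=counts.get)  # assumption: the smallest class
    limit = threshold_cleaning * counts[minority]
    return {c for c, n in counts.items() if c != minority and n > limit}


y = ["a"] * 10 + ["b"] * 3 + ["c"] * 2  # "c" is the minority class
print(sorted(classes_to_clean(y, threshold_cleaning=0.5)))  # ['a', 'b']
print(sorted(classes_to_clean(y, threshold_cleaning=2.0)))  # ['a']
```

Under that assumption no class is smaller than the minority class, so any `threshold_cleaning` below 1 lets every other class through; only values of 1 or more can exclude a class.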

glemaitre changed the title from "re-word explanation and docstrings of NCR" to "DOC improve documentation of NCR" on Jul 10, 2023
solegalli and others added 3 commits July 11, 2023 13:37
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@solegalli (Contributor, Author)

How can I check the linting error message?

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@solegalli (Contributor, Author)

Thank you!

Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: None yet

2 participants