[MRG] ENH: safe-level SMOTE #626

laurallu · 2019-11-04T03:21:31Z

This is an implementation of the safe-level SMOTE proposed in the following paper:

C. Bunkhumpornpat, K. Sinapiromsaran, C. Lursinsap, "Safe-level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem," In: Theeramunkong T., Kijsirikul B., Cercone N., Ho TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg, 475-482, 2009.
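
As a rough illustration of the interpolation rule described in the paper above, the sketch below picks the gap from the safe levels of a minority sample `p` and one of its minority neighbours `n`. It is not the code in this PR: the names `safe_level_gap` and `interpolate` are hypothetical, and the case analysis reflects a reading of the paper that should be checked against it.

```python
import random


def safe_level_gap(safe_level_p, safe_level_n, rng=random):
    """Return an interpolation gap in [0, 1], or None to skip the pair.

    Hypothetical helper: safe_level_p and safe_level_n are the number of
    minority samples among the nearest neighbours of p and n respectively.
    """
    if safe_level_n == 0:
        if safe_level_p == 0:
            # Both points look like noise: generate nothing.
            return None
        # n looks like noise: stay on p (the synthetic sample duplicates p).
        return 0.0
    ratio = safe_level_p / safe_level_n
    if ratio == 1:
        # Equally safe: anywhere on the segment between p and n.
        return rng.uniform(0.0, 1.0)
    if ratio > 1:
        # p is safer: stay closer to p.
        return rng.uniform(0.0, 1.0 / ratio)
    # n is safer: stay closer to n.
    return rng.uniform(1.0 - ratio, 1.0)


def interpolate(p, n, gap):
    """Return the point p + gap * (n - p), feature by feature."""
    return [p_i + gap * (n_i - p_i) for p_i, n_i in zip(p, n)]
```

A gap of 0 reproduces `p` and a gap of 1 reproduces `n`, so the synthetic sample is pulled toward whichever endpoint has the higher safe level.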

Todo list:

add unit tests

lgtm-com · 2019-11-04T03:46:11Z

This pull request introduces 1 alert when merging bcc3069 into 321b751 - view on LGTM.com

new alerts:

1 for Redundant comparison

codecov · 2019-11-04T03:54:58Z

Codecov Report

Merging #626 into master will increase coverage by 0.05%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #626      +/-   ##
==========================================
+ Coverage   97.93%   97.98%   +0.05%     
==========================================
  Files          83       84       +1     
  Lines        4784     4911     +127     
==========================================
+ Hits         4685     4812     +127     
  Misses         99       99

| Impacted Files | Coverage Δ | |
|---|---|---|
| imblearn/over_sampling/_smote.py | `97.73% <100%> (+0.52%)` | ⬆️ |
| imblearn/over_sampling/__init__.py | `100% <100%> (ø)` | ⬆️ |
| ...blearn/over_sampling/tests/test_safelevel_smote.py | `100% <100%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update afbf781...866a04f.

glemaitre · 2019-11-04T15:38:10Z

Thanks for the contribution.

You will need to add tests to check that the new function is giving expected results.

glemaitre · 2019-11-04T15:38:42Z

Oh I see that you mentioned it now :)

laurallu · 2019-11-06T05:10:07Z

I just added some tests. Any suggestions?

lgtm-com · 2019-11-06T05:25:00Z

This pull request introduces 1 alert when merging 394d686 into 321b751 - view on LGTM.com

new alerts:

1 for Redundant comparison

chkoar · 2019-11-06T15:13:53Z

imblearn/over_sampling/_smote.py

+    sampling_strategy=BaseOverSampler._sampling_strategy_docstring,
+    random_state=_random_state_docstring,
+)
+class SLSMOTE(BaseSMOTE):


@glemaitre which name do you prefer: `SafeLevelSMOTE` or `SLSMOTE`?

chkoar · 2019-11-06T15:17:53Z

imblearn/over_sampling/_smote.py

+
+        self.m_neighbors = m_neighbors
+
+    def _assign_sl(self, nn_estimator, samples, target_class, y):


I would use the name `_assign_safe_levels`, unless it hurts readability in the calling code.
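
For context, a minimal standalone sketch of what a safe-level assignment step could look like under the longer name. This is not the method from the PR: the function `assign_safe_levels`, its signature, and the brute-force neighbour search are assumptions made for illustration (the PR's method takes a fitted `nn_estimator` instead).

```python
import math


def assign_safe_levels(X, y, target_class, m_neighbors=3):
    """Map the index of each target_class sample to its safe level.

    Hypothetical sketch: the safe level of a sample is the number of
    target_class samples among its m_neighbors nearest neighbours in X,
    the sample itself excluded.
    """
    safe_levels = {}
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        if y_i != target_class:
            continue
        # Brute-force search: sort all other samples by Euclidean distance.
        others = [j for j in range(len(X)) if j != i]
        others.sort(key=lambda j: math.dist(x_i, X[j]))
        nearest = others[:m_neighbors]
        safe_levels[i] = sum(1 for j in nearest if y[j] == target_class)
    return safe_levels


# Three minority points clustered together, one isolated among the majority.
X = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (5.05,)]
y = [1, 1, 1, 0, 0, 0, 1]
levels = assign_safe_levels(X, y, target_class=1, m_neighbors=2)
```

In this toy data the clustered minority points get the maximum safe level, while the isolated one gets a safe level of 0.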

chkoar · 2019-11-06T15:23:40Z

Thanks!

> I just added some tests. Any suggestions?

1. Please use full names for variables where you can, e.g. `sl` should be `safe_levels`, unless it hurts readability.
2. Could you add a section in the documentation?

lgtm-com · 2019-11-11T19:40:16Z

This pull request introduces 1 alert when merging fd11e32 into afbf781 - view on LGTM.com

new alerts:

1 for Redundant comparison

laurallu · 2019-11-12T22:46:52Z

> 1. Please use full names for variables where you can, e.g. [`sl`](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/394d686364725763de8ea2cc3f504d8c08fe111a/imblearn/over_sampling/_smote.py#L1469) should be `safe_levels`, unless it hurts readability.
>
> 2. Could you add a section in the [documentation](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/doc/over_sampling.rst)?

I've made changes accordingly. I think it's probably ready to go through a detailed review.

glemaitre · 2019-11-17T11:28:48Z

I would suggest moving this implementation into smote_variants. The idea behind this move is to benchmark the smote variants on a common benchmark on a large number of datasets and include in imbalanced-learn only the versions that show an advantage. You can see the discussion and contribute to it: https://github.com/gykovacs/smote_variants/issues/14

@laurallu would this strategy be fine with you?

chkoar · 2019-11-17T13:28:43Z

Since Safe Level SMOTE already exists there, IMHO we should review @laurallu's PR and merge it into imblearn.

glemaitre · 2019-11-17T18:56:17Z

OK, let's do that. Let's open an issue to discuss the inclusion criteria, so that we can explain what we expect in the future. I will review this PR in the near future.

laurallu · 2019-11-21T03:38:50Z

Thanks for pointing me to smote_variants. I will check it out. I would love to see the inclusion criteria too, since I might code up more methods.

added safe-level-smote method

bcc3069

unit tests added for safe-level SMOTE

394d686

chkoar reviewed Nov 6, 2019

glemaitre force-pushed the master branch from 65132db to 68123d0 Compare November 8, 2019 22:54

laurallu added 2 commits November 11, 2019 13:53

fixed variable name, added doc and test

609c4fc

Merge remote-tracking branch 'upstream/master' into safe-level

fd11e32

=removed redundant lines

866a04f

laurallu changed the title ~~[WIP] ENH: safe-level SMOTE~~ [MRG] ENH: safe-level SMOTE Nov 12, 2019

chkoar force-pushed the master branch from 4a201cd to 0eb9033 Compare June 20, 2020 02:58

glemaitre force-pushed the master branch from f8347ad to 56eefdf Compare September 29, 2021 16:10

glemaitre force-pushed the master branch from 3228f8a to 7e94390 Compare October 21, 2021 20:41
