[MRG] Fix LocalOutlierFactor's output for data with duplicated samples #28773

Open: wants to merge 8 commits into main

Commits on Apr 5, 2024

  1. Fix scikit-learn#27839: Adjust LocalOutlierFactor for data with duplicated samples
    
    Previously, when the dataset contained a value repeated more times than the algorithm's number of neighbors, the outliers were miscalculated.
    Because the distance between duplicated samples is 0, their local reachability density is 1e10. As a result, samples close to the duplicated values get a very low negative outlier factor (below -1e7) and are labeled as outliers.
    This fix checks whether the minimum negative outlier factor is below -1e7 and, if so, raises the number of neighbors to the number of occurrences of the most frequent value + 1 and emits a warning.
    Notes: Added a handle_duplicates variable, which allows developers to handle duplicate values manually, if desired. Also added a memory_limit variable, which developers can also change manually, to avoid memory errors on very large datasets.
    HenriqueProj committed Apr 5, 2024
    e754830
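The failure mode described in the first commit can be reproduced with a small pure-Python sketch. It assumes 1-D data, brute-force neighbor search with ties broken by index, and a density of `1 / (mean reachability distance + 1e-10)`, which is consistent with the 1e10 value quoted above. The helpers `knn`, `lof_scores` and `suggest_n_neighbors` are hypothetical illustrations, not scikit-learn's implementation. The last part mirrors this commit's original heuristic (raise `n_neighbors` to the count of the most frequent value + 1), which a later commit in this PR replaced with a warning only.

```python
from collections import Counter

EPS = 1e-10  # small constant added to the mean reachability distance to avoid 1/0


def knn(X, i, k):
    """Brute-force k nearest neighbors of X[i] (1-D data), excluding i itself.

    Returns a list of (distance, index) pairs; ties are broken by index.
    """
    others = sorted((abs(X[i] - X[j]), j) for j in range(len(X)) if j != i)
    return others[:k]


def lof_scores(X, k):
    """Return (lrd, negative_outlier_factor) lists for every sample in X."""
    neigh = [knn(X, i, k) for i in range(len(X))]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbor
    lrd = []
    for nb in neigh:
        # reachability distance to neighbor j: max(d(i, j), k_distance(j))
        reach = [max(d, k_dist[j]) for d, j in nb]
        lrd.append(1.0 / (sum(reach) / k + EPS))
    scores = []
    for i, nb in enumerate(neigh):
        lof = sum(lrd[j] for _, j in nb) / k / lrd[i]
        scores.append(-lof)  # negated, so lower means "more outlying"
    return lrd, scores


def suggest_n_neighbors(X, scores, threshold=-1e7):
    """Heuristic from the first commit: if any score is below the threshold,
    propose (occurrences of the most frequent value) + 1 neighbors."""
    if min(scores) < threshold:
        return Counter(X).most_common(1)[0][1] + 1
    return None


# 0.0 appears 5 times, more than k = 3, so each copy's 3 neighbors are all at distance 0.
X = [0.0] * 5 + [0.1, 1.0, 1.1, 1.2]
lrd, before = lof_scores(X, k=3)
print(lrd[0])      # about 1e10: mean reachability distance is 0
print(before[5])   # about -1e9: 0.1 is flagged despite sitting next to the cluster

k_new = suggest_n_neighbors(X, before)
print(k_new)       # 6
_, after = lof_scores(X, k=k_new)
print(min(after))  # roughly -1.1; nothing is anywhere near -1e7 any more
```

Using brute-force search keeps the example dependency-free; on real data the same effect appears whenever a duplicated value occurs at least `n_neighbors + 1` times, because every copy's neighborhood then collapses to distance 0.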

Commits on Apr 8, 2024

  1. 19cb411
  2. bc069b6
  3. Fix changelog error

    HenriqueProj committed Apr 8, 2024
    c6470c6

Commits on Apr 10, 2024

  1. Fix: Changed approach according to review

    Removed the automatic change to the number of neighbors and changed the warning.
    Also changed the associated test to catch the warning.
    HenriqueProj committed Apr 10, 2024
    909b25c
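With the automatic adjustment removed, the check reduces to emitting a warning, and the test only needs to assert that the warning fires. The following is a stdlib-only sketch assuming the same -1e7 threshold; the helper names and the warning text are hypothetical, not the code or message in this PR. With pytest, `pytest.warns(UserWarning)` expresses the same assertion.

```python
import warnings

THRESHOLD = -1e7  # same cut-off as in the first commit


def warn_on_duplicate_artifacts(scores, threshold=THRESHOLD):
    """Hypothetical helper: warn instead of silently changing n_neighbors."""
    if min(scores) < threshold:
        warnings.warn(
            "Duplicate values may be distorting the outlier scores; "
            "consider increasing n_neighbors.",
            UserWarning,
        )


def caught_warnings(scores):
    """Run the check and return the list of warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no filter hides the warning
        warn_on_duplicate_artifacts(scores)
    return caught


def test_warns_when_scores_explode():
    caught = caught_warnings([-1.0, -1.1, -1e9])
    assert len(caught) == 1
    assert issubclass(caught[0].category, UserWarning)


def test_silent_for_normal_scores():
    assert caught_warnings([-0.99, -1.01, -1.12]) == []


test_warns_when_scores_explode()
test_silent_for_normal_scores()
```

Leaving `n_neighbors` untouched keeps the estimator's behavior predictable; the user decides whether to raise it.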

Commits on Apr 22, 2024

  1. Update sklearn/neighbors/_lof.py

    Changed comment according to review
    
    Co-authored-by: Tim Head <betatim@gmail.com>
    HenriqueProj and betatim committed Apr 22, 2024
    de442f0

Commits on May 20, 2024

  1. 50eb839

Commits on May 27, 2024

  1. b2f79c5