Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please start following semver guidelines #27363

Closed
redbmk opened this issue Sep 13, 2023 · 6 comments
Closed

Please start following semver guidelines #27363

redbmk opened this issue Sep 13, 2023 · 6 comments
Labels
Needs Decision Requires decision

Comments

@redbmk
Copy link

redbmk commented Sep 13, 2023

Describe the bug

Hey all, I'm a bit new to skikit-learn, but have built and used many tools that follow semver. I didn't really see anything in the documentation saying that this isn't using semver, but I've noticed breaking changes without a major version bump, which goes against the philosophy of semver.

This can cause issues for example when:

  1. Installing packages without pinning dependencies (e.g. using requirements.txt instead of a package manager like pipenv or poetry)
  2. Using tools like dependabot that might scan for bug and security fixes, assuming compatible versions
  3. Trying to update packages that use this as a dependency but may use features that have been removed (e.g. mlflow==2.3.1 depended on skikit-learn<2)

According to semver, something that's a breaking change (changing behavior, removing a feature, changing a function signature in an incompatible way, removing support for a version of python, etc) should be a major version bump. While documenting the change is great, it's not enough to adhere to semver and should be a major version bump (e.g. dropping python 3.7 support in v1.1 should have actually waited until v2.0

In general, once you hit version 1, the idea is:

  • Breaking changes: new major version
  • New features: new minor version
  • Bug and security fixes: new patch version

Searching the issues and documentation for semver didn't come up with a lot, but I did see it mentioned in this ticket which prompted the push to v1.

Steps/Code to Reproduce

❯ pip install eli5==0.13.0
❯ python -c 'import eli5'

Expected Results

❯ pip install eli5==0.13.0 scikit-learn==1.2.2
❯ python -c 'import eli5'echo $?
0

Actual Results

❯ pip install eli5==0.13.0 scikit-learn==1.3.0
❯ python -c 'import eli5'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/me/.venv/lib/python3.11/site-packages/eli5/__init__.py", line 13, in <module>
    from .sklearn import explain_weights_sklearn, explain_prediction_sklearn
  File "/Users/me/.venv/lib/python3.11/site-packages/eli5/sklearn/__init__.py", line 3, in <module>
    from .explain_weights import (
  File "/Users/me/.venv/lib/python3.11/site-packages/eli5/sklearn/explain_weights.py", line 78, in <module>
    from .permutation_importance import PermutationImportance
  File "/Users/me/.venv/lib/python3.11/site-packages/eli5/sklearn/permutation_importance.py", line 7, in <module>
    from sklearn.utils.metaestimators import if_delegate_has_method
ImportError: cannot import name 'if_delegate_has_method' from 'sklearn.utils.metaestimators' (/Users/me/.venv/lib/python3.11/site-packages/sklearn/utils/metaestimators.py)

Versions

Passing version:

❯ pip install eli5==0.13.0 scikit-learn==1.2.2
...
❯ python -c 'import sklearn; sklearn.show_versions()'

System:
    python: 3.11.4 (main, Jun 21 2023, 17:21:11) [Clang 14.0.3 (clang-1403.0.22.14.1)]
executable: /Users/me/.venv/bin/python
   machine: macOS-13.5.1-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.2.2
          pip: 23.1.2
   setuptools: 65.5.0
        numpy: 1.25.2
        scipy: 1.11.2
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 12
         prefix: libomp
       filepath: /Users/me/.venv/lib/python3.11/site-packages/sklearn/.dylibs/libomp.dylib
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 6
         prefix: libopenblas
       filepath: /Users/me/.venv/lib/python3.11/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 6
         prefix: libopenblas
       filepath: /Users/me/.venv/lib/python3.11/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: Haswell

Failing version:

❯ pip install eli5==0.13.0 scikit-learn==1.3.0
...
❯ python -c 'import sklearn; sklearn.show_versions()'

System:
    python: 3.11.4 (main, Jun 21 2023, 17:21:11) [Clang 14.0.3 (clang-1403.0.22.14.1)]
executable: /Users/me/.venv/bin/python
   machine: macOS-13.5.1-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.3.0
          pip: 23.1.2
   setuptools: 65.5.0
        numpy: 1.25.2
        scipy: 1.11.2
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 12
         prefix: libomp
       filepath: /Users/me/.venv/lib/python3.11/site-packages/sklearn/.dylibs/libomp.dylib
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 6
         prefix: libopenblas
       filepath: /Users/me/.venv/lib/python3.11/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 6
         prefix: libopenblas
       filepath: /Users/me/.venv/lib/python3.11/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: Haswell
@redbmk redbmk added Bug Needs Triage Issue requires triage labels Sep 13, 2023
@glemaitre
Copy link
Member

As far as I know, we are as bad as the scientific ecosystem.

Our deprecation policy is documented here: https://scikit-learn.org/dev/developers/contributing.html#deprecation.

I see previous discussion in NumPy from which we depends: numpy/numpy#10156. I also see that it brought the proposal of the NEP23: https://numpy.org/neps/nep-0023-backwards-compatibility.html.

IMO, if we choose a different versioning and deprecation policy than packages that we are depending on, we shoot ourselves in the foot.

@glemaitre glemaitre added Needs Decision Requires decision and removed Bug Needs Triage Issue requires triage labels Sep 15, 2023
@ogrisel
Copy link
Member

ogrisel commented Sep 18, 2023

Also we don't really pretend to follow SemVer anywhere in our doc, as far as I know.

But for the case you report there is indeed a problem: we did not deprecate if_delegate_has_method properly, despite this function not being marked as private. I tried the reproducer and indeed, no deprecation warning is raised with scikit-learn 1.2.

We should be more careful about the not removing utility functions that do not have a leading _ in their name or in their ancestor module names.

So +1 for keeping our rolling deprecation approach but we should enforce it more strictly in cases such as the one reported in this issue.

@glemaitre
Copy link
Member

glemaitre commented Sep 18, 2023

This is weird because we got a deprecation cycle as well for if_delegate_has_method.

https://github.com/scikit-learn/scikit-learn/blob/1.2.X/sklearn/utils/metaestimators.py#L122-L124

I assume that there is a side-effect on the warning raised that we didn't foresee at the time.

@adrinjalali
Copy link
Member

adrinjalali commented Sep 18, 2023

Yeah we follow our deprecation policy, unless we forget or unless for some reason the warning is not raised, but we don't follow SemVer the same was as the rest of the scipy/pydata ecosystem. So I think we can close this issue. And eli5 can use our new @available_if decorator.

@redbmk
Copy link
Author

redbmk commented Sep 18, 2023

IMO, if we choose a different versioning and deprecation policy than packages that we are depending on, we shoot ourselves in the foot.

If you depend on a package that doesn't follow semver, wouldn't you want to pin to an exact version that is guaranteed not to break your tools? Any of the package managers I've ever seen use semver to determine compatibility. If you use anything other than == to require a package that looks like semver but isn't, then you're shooting yourself in the foot.

Also we don't really pretend to follow SemVer anywhere in our doc, as far as I know.

Your docs may not mention semver, but package managers think you look like semver. Tools like MLFlow that require < 2 have to be very careful to not use anything that you may or may not deprecate in the future.

Is there anything stopping you from using semver? It sounds like the only difference from what you're doing now to something that would follow semver is dropping 1. from your versions (e.g. 1.2.x => 2.x, 1.3.x => 3.x, etc). I'm not sure what harm would come to the scientific community as a result of this but you would be doing the rest of the python community a favor by adhering to well-established standards.

@adrinjalali
Copy link
Member

adrinjalali commented Sep 20, 2023

The thing is, we like to think that when changes happen with a one year FutureWarning notice, then people should have the time to fix them, and we don't consider them breaking changes. Of course it can be debated whether they're "breaking" or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Decision Requires decision
Projects
None yet
Development

No branches or pull requests

4 participants