Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review existing dependencies #2007

Open
valentynbez opened this issue Apr 10, 2024 · 3 comments
Open

Review existing dependencies #2007

valentynbez opened this issue Apr 10, 2024 · 3 comments

Comments

@valentynbez
Copy link

valentynbez commented Apr 10, 2024

Adding a new functionality comes at the cost of complexity and more abstractions. I hereby propose to do a basic review of the current project dependencies.
I.e. a quick search through a codebase reveals:

  • only single function realsort from natsort package is used
  • munkres is an outdated and not supported package (last release in 2020)
  • are pluggy and pytest needed in the deployment version? They can be separated into scikit-bio and scikit-bio[dev]
  • requests is only used in single file and functionality can be completely moved to urllib3

The code that's is dependent on outdated packages should be highlighted, so that if possible replacement for the functionality appears it can be quickly reimplemented. These changes, although cumbersome, will reduce amount of dependencies and ensure software's longevity.

@qiyunzhu
Copy link
Contributor

qiyunzhu commented Apr 10, 2024

Hello @valentynbez Thank you for reviewing the scikit-bio codebase and making these suggestions!

Reducing dependencies is a development goal of ours. In the past several months (from 0.5.x to 0.6.0), we have already removed multiple heavy dependencies, including scikit-learn, matplotlib, jupyter, and small, less actively maintained dependencies, like hdmedians. However, our opinions diverge on the priority of minimizing dependencies. I personally am very interested in the potential to reducing dependencies. My thoughts on your specific suggestions are:

  1. scikit-bio is now dependent on biom-format, which in turn depends on natsort. If we decide to remove this dependency, we will need a joined effort. @wasade
  • A side note for @wasade : biom-format is dependent on anndata. @mortonjt @valentynbez and I had a discussion on if we should further exploit anndata.
  1. munkres: I found that the dependency hierarchy is: scikit-bio <- matplotlib-base <- fonttools <- munkres. We already removed Matplotlib as a dependency of scikit-bio. But the conda-forge recipe still has it. Let me delete this line.

  2. requests was added recently to replace an outdated dependency. See Replaced cachecontrol with requests #1863 and Fixing aarch64 float issue #1859. Together with responses in the test pipeline. I am open to using the built-in urllib3, if the latter solution is simple and reliable. Do you have a suggestion?

  3. pytest lets the user test whether the installed version works correctly, by typing python -m skbio.test. This may be a good thing to have. Currently it is broken in the PyPI release, which does not have pytest as a dependency. I am open to the idea of leaving it to the development version.

@wasade
Copy link
Collaborator

wasade commented Apr 10, 2024

I'd be in favor or a PR to remove natsort for biom

@qiyunzhu
Copy link
Contributor

@wasade Looks like biom-format depends on natsort but doesn't actually use it. Instead, biom-format has its own natsort function. So a suggested solution would be: 1) remove natsort dependency from biom-format, 2) let scikit-bio use biom-format's built-in natsort function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants