Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace _subsample with biom's optimized subsample #2016

Merged
merged 11 commits into from
May 31, 2024

Conversation

wasade
Copy link
Collaborator

@wasade wasade commented May 1, 2024

This PR uses the subsample method from biom. Subsample was recently extensively optimized by @sfiligoi. And, biom as a package was made Windows compatible by @mataton. However, the new version is not yet released. This PR sets the stage for using these improvements. I'll shift to a biom release at the next opportunity.

A few important items:

  • some existing tests made I think slightly conservative assumptions
  • this PR is a stepping stone to providing a 2D matrix or Table to subsample, but in my opinion we should defer those changes as a separate PR

Please complete the following checklist:

  • I have read the contribution guidelines.

  • I have documented all public-facing changes in the changelog.

  • This pull request includes code, documentation, or other content derived from external source(s). If this is the case, ensure the external source's license is compatible with scikit-bio's license. Include the license in the licenses directory and add a comment in the code giving proper attribution. Ensure any other requirements set forth by the license and/or author are satisfied.

    • It is your responsibility to disclose code, documentation, or other content derived from external source(s). If you have questions about whether something can be included in the project or how to give proper attribution, include those questions in your pull request and a reviewer will assist you.
  • This pull request does not include code, documentation, or other content derived from external source(s).

Note: This document may also be helpful to see some of the things code reviewers will be verifying when reviewing your pull request.

Copy link

codecov bot commented May 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.47%. Comparing base (c1d6c18) to head (0b12fd0).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2016      +/-   ##
==========================================
- Coverage   98.47%   98.47%   -0.01%     
==========================================
  Files         184      184              
  Lines       31502    31498       -4     
  Branches     7673     7667       -6     
==========================================
- Hits        31021    31017       -4     
  Misses        464      464              
  Partials       17       17              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

skbio/stats/_subsample.py Outdated Show resolved Hide resolved
@qiyunzhu qiyunzhu mentioned this pull request May 1, 2024
4 tasks
@qiyunzhu
Copy link
Contributor

qiyunzhu commented May 12, 2024

@wasade Since biom-format 2.1.16 is online, would it be the time to finish and merge this PR? I was able to setup a numpy 2.0 dev environment for scikit-bio, and found that the current bottleneck is the old _subsample code. Thank you!

  File "skbio/stats/__subsample.pyx", line 1, in init skbio.stats.__subsample
    # ----------------------------------------------------------------------------
ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use '<void>numpy._import_array' to disable if you are certain you don't need it).

@wasade wasade changed the title WIP: Replace _subsample with biom's optimized subsample Replace _subsample with biom's optimized subsample May 30, 2024
@wasade
Copy link
Collaborator Author

wasade commented May 30, 2024

@qiyunzhu, if green then I think we're good for final review

@wasade
Copy link
Collaborator Author

wasade commented May 30, 2024

I don't observe the same failure on x86_64, updating CI. There may be a wider variance on the failing test than what was originally tested for.

@wasade
Copy link
Collaborator Author

wasade commented May 30, 2024

Okay, we're green

Copy link
Contributor

@qiyunzhu qiyunzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me. Thanks! Just curious: What's the rationale of specifying "macos-13", "macos-14" instead of "macos-latest"?

@wasade
Copy link
Collaborator Author

wasade commented May 31, 2024 via email

@qiyunzhu
Copy link
Contributor

@wasade Interesting! I did a quick search and found this comment. If there is an official source, it will be useful to comment in the yml file. But for now, let's just merge.

@qiyunzhu qiyunzhu merged commit 094b166 into scikit-bio:main May 31, 2024
30 checks passed
@wasade
Copy link
Collaborator Author

wasade commented May 31, 2024

@qiyunzhu, here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants