Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Improving alpha diversity calculation #2024

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

qiyunzhu
Copy link
Contributor

@qiyunzhu qiyunzhu commented May 12, 2024

@ebolyen This addresses the relevant issue #2014 . Now the alpha diversity metrics return np.nan instead of zero when the community is empty.

Exceptions are metrics that are solely based on observed counts, including sobs, singles, doubles and osd. They will return zero.

One relevant question is how to deal with one-taxon communities when calculating species evenness metrics. At present, pielou_e and heip_e return np.nan, whereas simpson_e returns 1. I tend to think 1 makes sense. Evenness should range between 0 and 1, in which 1 means all species have the same abundance (i.e., completely even). When there is only one species, it is guaranteed that evenness is maximized. On the other hand, it will not be ideal if a non-empty community returns a NaN value. What do you think?

@wasade This also improves how counts are validated. Specifically, it will remove zeros from the counts before calculating alpha diversity. This will save troubles in the downstream implementation of alpha diversity metrics. This will also permit efficiency improvement by directly feeding sparse matrices from BIOM into the calculation. However, the latter is not done yet. This needs further discussion. That being said, the current PR should be safe to use before the final solution.

Also cc'ing @mortonjt

Please complete the following checklist:

  • I have read the contribution guidelines.

  • I have documented all public-facing changes in the changelog.

  • This pull request includes code, documentation, or other content derived from external source(s). If this is the case, ensure the external source's license is compatible with scikit-bio's license. Include the license in the licenses directory and add a comment in the code giving proper attribution. Ensure any other requirements set forth by the license and/or author are satisfied.

    • It is your responsibility to disclose code, documentation, or other content derived from external source(s). If you have questions about whether something can be included in the project or how to give proper attribution, include those questions in your pull request and a reviewer will assist you.
  • This pull request does not include code, documentation, or other content derived from external source(s).

Note: This document may also be helpful to see some of the things code reviewers will be verifying when reviewing your pull request.

Copy link

codecov bot commented May 12, 2024

Codecov Report

Attention: Patch coverage is 96.47059% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 98.58%. Comparing base (e3a27f5) to head (4bf73ef).

Files Patch % Lines
skbio/diversity/_util.py 87.50% 3 Missing ⚠️
skbio/diversity/alpha/_base.py 96.66% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2024      +/-   ##
==========================================
- Coverage   98.59%   98.58%   -0.01%     
==========================================
  Files         182      182              
  Lines       30885    30909      +24     
  Branches     7512     7541      +29     
==========================================
+ Hits        30452    30473      +21     
- Misses        418      421       +3     
  Partials       15       15              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@qiyunzhu
Copy link
Contributor Author

Hello @ebolyen would you like to comment on whether the consideration of edge cases (empty community, one-species community) is sane in this PR (see my comments above)? Also cc'ing @gregcaporaso

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant