WIP Improving alpha diversity calculation #2024
Open
@ebolyen This addresses the relevant issue, #2014. Alpha diversity metrics now return np.nan instead of zero when the community is empty. The exceptions are metrics based solely on observed counts, namely sobs, singles, doubles and osd, which return zero.
One open question is how to handle single-taxon communities when calculating species evenness metrics. At present, pielou_e and heip_e return np.nan, whereas simpson_e returns 1. I tend to think 1 makes sense. Evenness should range between 0 and 1, where 1 means all species have the same abundance (i.e., the community is completely even). When there is only one species, evenness is trivially at its maximum. Moreover, it is not ideal for a non-empty community to return a NaN value. What do you think?
@wasade This also improves how counts are validated. Specifically, zeros are now removed from the counts before alpha diversity is calculated, which saves trouble in the downstream implementation of the individual metrics. It also opens the door to an efficiency improvement: feeding sparse matrices from BIOM directly into the calculation. The latter is not done yet and needs further discussion. That said, the current PR should be safe to use until the final solution is in place.
Also cc'ing @mortonjt
Please complete the following checklist:
I have read the contribution guidelines.
I have documented all public-facing changes in the changelog.
This pull request includes code, documentation, or other content derived from external source(s). If this is the case, ensure the external source's license is compatible with scikit-bio's license. Include the license in the licenses directory and add a comment in the code giving proper attribution. Ensure any other requirements set forth by the license and/or author are satisfied.
This pull request does not include code, documentation, or other content derived from external source(s).
Note: This document may also be helpful to see some of the things code reviewers will be verifying when reviewing your pull request.