Calculating SAC on metagenome clusters #36

Open
nmb85 opened this issue Sep 4, 2020 · 3 comments

Comments

@nmb85

nmb85 commented Sep 4, 2020

@luizirber, one more thing for today (not intending to distract you): it would be really interesting to calculate the species accumulation curve (SAC) for hash sets in clusters of metagenomes in your monster wort database. For example, taking soil metagenomes as a cluster, you could build a matrix of hashes (such as here), calculate the different orders of intersection between the hash sets from the soil metagenomes, and then plot an SAC from the hashes.

This might be impossible with k-mers, and species tallies are skewed by incomplete annotation due to incomplete databases, but hashes might give you a chance to get an accurate SAC by incrementally adding hash sets and plotting how the intersection sets change. See equation 3 in this paper for a definitive explanation.

Then you could efficiently use all the data in the SRA and JGI databases to estimate whether the species count based on current soil metagenomes is "open" (the SAC fits a power-law function) or "closed" (the SAC fits an exponential function), that is, whether or not we've collected enough data to estimate an asymptote for the number of species (in this case using hashes as a proxy) in soil metagenomes (or some other interesting biome). Although I'm not a soil biologist, I think that's a major question in their field. Other biomes might be interesting too. I'm not sure if anyone has tried this with raw k-mers, but that seems like too gargantuan a task. Hashes might make this problem tractable?
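As a rough illustration of the "incrementally adding hash sets" idea, here is a minimal sketch that averages the number of distinct hashes seen as samples are added in random order. It assumes each metagenome's hashes are already loaded as a Python set of integers (loading signatures is not shown), the function name and toy data are hypothetical, and it tracks the cumulative union rather than reproducing equation 3 from the linked paper.

```python
import random

def accumulation_curve(hash_sets, n_permutations=100, seed=0):
    """Mean number of distinct hashes after adding 1..N samples.

    hash_sets: list of sets of integer hashes, one set per metagenome
    (assumed to be loaded elsewhere).
    """
    rng = random.Random(seed)
    n = len(hash_sets)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(n_permutations):
        # Random sample order, so the curve does not depend on input order.
        rng.shuffle(order)
        seen = set()
        for step, idx in enumerate(order):
            seen |= hash_sets[idx]      # add this sample's hashes
            totals[step] += len(seen)   # distinct hashes seen so far
    return [t / n_permutations for t in totals]

# Toy data standing in for real hash sets (hypothetical values).
toy_sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {5, 6}]
curve = accumulation_curve(toy_sets)
print(curve)
```

Whether the resulting curve looks "open" or "closed" would then be judged by fitting it against candidate functions, which this sketch leaves out.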

@luizirber
Member

That is a really good idea... and a monstrous matrix 🤣

I'll work on sharing all the sigs in a couple of weeks, but it is not something I can tackle at the moment 😢

@ctb

ctb commented Sep 5, 2020

Yes! We explored this quite a bit a while back for Tara; see https://github.com/ctb/2017-sourmash-rarefy/blob/master/tara-rarefy.ipynb for an example. Haven't looked at the code in a while, though ;).

@nmb85
Author

nmb85 commented Nov 19, 2020

Have you already seen this?
https://ieeexplore.ieee.org/abstract/document/9139876
