After combining two datasets and reclustering, the abundance of some high-abundance OTUs dropped dramatically. #514

Open
peiyaohu opened this issue Feb 28, 2023 · 2 comments

Comments

@peiyaohu

peiyaohu commented Feb 28, 2023

I have two datasets, A and B, and dataset A contains a high-abundance OTU (id: OTU_54). To compare the abundance of OTU_54 between the two datasets, I pooled the raw sequencing data of A and B (=> A+B), followed the example steps provided on the website to cluster (with the same parameters as when A and B were analyzed separately), and found that the OTU_54 in the original A dataset had very low abundance in the otutab(A+B) produced by the new clustering.

So I blasted all.nonchimeras.fasta (the file produced before clustering at 97% similarity) of A, B and A+B against OTU_54, filtered the blast results by identity > 97% and alignment length > 300, and checked the number of matches, and found that A+B lost a lot of OTU_54.

wc -l filt_nonchim*               # filtered blast results. 
   76966 filt_nonchim18.txt   #generated from datasetB
  157240 filt_nonchim19.txt  #generated from datasetA
   12369 filt_nonchim.txt       #generated from A+B
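
For reference, the filtering step described above could be sketched as follows. This is only an illustration, assuming the blast results are in the standard 12-column tabular format (`-outfmt 6`, with percent identity in column 3 and alignment length in column 4); the function name `summarize_hits` and the example rows are made up. It also assumes that query headers may carry vsearch-style `;size=N` abundance annotations, in which case counting lines counts unique sequences rather than reads, so the sketch reports both numbers.

```python
import csv
import io
import re

# vsearch-style abundance annotation in a sequence header, e.g. "seq1;size=120"
SIZE_RE = re.compile(r";size=(\d+)")

def summarize_hits(handle, min_identity=97.0, min_length=300):
    """Return (number of hits, number of reads) passing both filters.

    Assumes BLAST tabular output (-outfmt 6):
    column 3 = percent identity, column 4 = alignment length.
    A query without a ';size=N' annotation is counted as one read.
    """
    n_hits = 0
    n_reads = 0
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        pident = float(row[2])
        length = int(row[3])
        if pident > min_identity and length > min_length:
            n_hits += 1
            match = SIZE_RE.search(row[0])
            n_reads += int(match.group(1)) if match else 1
    return n_hits, n_reads

# Made-up rows for illustration only (not real data)
example = io.StringIO(
    "seq1;size=120\tOTU_54\t99.5\t350\n"  # passes both filters
    "seq2;size=3\tOTU_54\t96.0\t350\n"    # identity too low
    "seq3;size=40\tOTU_54\t98.2\t250\n"   # alignment too short
    "seq4\tOTU_54\t97.8\t310\n"           # passes, no size annotation
)
n_hits, n_reads = summarize_hits(example)
print(n_hits, n_reads)  # 2 121
```

If the files being compared were dereplicated differently (per dataset versus pooled), the number of matching lines and the number of matching reads may tell different stories, so comparing the read totals is probably the safer check.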

How can I address or optimize the analysis process? Thanks!

@frederic-mahe
Collaborator

and found that the OTU_54 in the original A dataset had very low abundance in the otutab(A+B) produced by the new clustering.

This is a known downside of using a centroid-based, fixed-threshold clustering approach: some clusters shrink or disappear when more data is added.

A given centroid1 can be abundant in a sample A, but close to a more abundant centroid2 present in a sample B. If you cluster A+B, then centroid2 captures some or all of the reads initially captured by centroid1.
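
The effect described above can be reproduced with a toy example. The sketch below is not vsearch's algorithm: it is a simplified greedy clustering that processes sequences in decreasing abundance order, measures similarity by counting mismatches between equal-length strings, and assigns each sequence to the first (most abundant) centroid within the threshold. The sequences, abundances and the one-mismatch threshold are all made up for illustration.

```python
def mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def greedy_cluster(abundances, max_mismatches=1):
    """Toy centroid-based, fixed-threshold clustering.

    Sequences are visited from most to least abundant. Each sequence joins
    the first existing centroid within max_mismatches, otherwise it becomes
    a new centroid. Returns {centroid: total reads in its cluster}.
    """
    clusters = {}  # insertion order = decreasing centroid abundance
    for seq, count in sorted(abundances.items(), key=lambda kv: -kv[1]):
        for centroid in clusters:
            if mismatches(seq, centroid) <= max_mismatches:
                clusters[centroid] += count
                break
        else:
            clusters[seq] = count
    return clusters

# Sample A: centroid1 plus a variant one mismatch away from it
sample_a = {"AAAAAAAAAA": 100, "AAAAAAAAAT": 50}
# Sample B: a more abundant centroid2, also one mismatch away from the variant
sample_b = {"AAAAAAAATT": 500}

pooled = dict(sample_a)
for seq, count in sample_b.items():
    pooled[seq] = pooled.get(seq, 0) + count

clusters_a = greedy_cluster(sample_a)
clusters_ab = greedy_cluster(pooled)

print(clusters_a)   # {'AAAAAAAAAA': 150}
print(clusters_ab)  # {'AAAAAAAATT': 550, 'AAAAAAAAAA': 100}
```

In sample A alone, centroid1 collects 150 reads. In the pooled data its cluster drops to 100 reads because the variant is now captured by centroid2, while the total number of reads (650) is unchanged.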

and checked the number of matches, and found that A+B lost a lot of OTU_54.

If I understand correctly, reads from OTU_54 are not lost, but have been redistributed among other OTUs. There is not much that can be done to mitigate that downside.

@peiyaohu
Author

peiyaohu commented Mar 1, 2023

Thanks so much!
