Clustering dada2-obtained ASVs into OTUs with DECIPHER #1944

Open
ConstanceBtd opened this issue May 2, 2024 · 7 comments

Comments

@ConstanceBtd

Hello,

I have a phyloseq object that I obtained with the dada2 pipeline. I would like to cluster those ASVs into OTUs. For that, I have the following code (just the start of it):

clusters<-DistanceMatrix(refseq(physeq_asv), includeTerminalGaps = T, processors=1)
clusters<-TreeLine(clusters,method="single", cutoff=0.03, processors=NULL)

with physeq_asv being my phyloseq object obtained with dada2.
For the second line I have the following error:

Error in DECIPHER::TreeLine(clusters_matrix, method = "single", cutoff = 0.03,  :
  myDistMatrix must be a matrix for method 'single'.

Does anyone know what might be causing it?

Thanks!

@benjjneb
Owner

benjjneb commented May 3, 2024

I'm not sure. This error comes from the DECIPHER package; its developer, @digitalwright, may be able to shed more light.

One note: the code block you posted and the code causing the error don't quite match. One uses the variable name clusters, while the other uses clusters_matrix.

@ConstanceBtd
Author

Thanks for your answer.
Sorry for the mix-up: I changed the variable name afterwards and forgot to copy the new error message (it is the same as the one I pasted, but with clusters instead of clusters_matrix).

@digitalwright

You need to specify myDistMatrix=clusters.
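
To illustrate why naming the argument matters, here is a minimal base-R sketch. It assumes, as in recent DECIPHER releases, that the first formal argument of TreeLine is the sequence set (myXStringSet) and that myDistMatrix comes second, so a matrix passed positionally never reaches myDistMatrix. The function tree_line_like is a hypothetical stand-in written for this example, not DECIPHER code.

```r
# Hypothetical stand-in that mimics the order of TreeLine()'s first two
# arguments (an assumption about the installed DECIPHER version).
tree_line_like <- function(myXStringSet = NULL, myDistMatrix = NULL,
                           method = "single") {
  if (method == "single" && !is.matrix(myDistMatrix)) {
    stop("myDistMatrix must be a matrix for method 'single'.")
  }
  dim(myDistMatrix)
}

# Toy symmetric distance matrix for three sequences.
d <- matrix(c(0,    0.01, 0.2,
              0.01, 0,    0.2,
              0.2,  0.2,  0), nrow = 3)

# Positional call: d is bound to myXStringSet, so myDistMatrix stays NULL
# and the check fails. The error message is captured instead of aborting.
positional <- tryCatch(tree_line_like(d, method = "single"),
                       error = function(e) conditionMessage(e))

# Named call: d reaches the intended argument and the check passes.
named <- tree_line_like(myDistMatrix = d, method = "single")
```

Applied to the original snippet, the equivalent change would be TreeLine(myDistMatrix = clusters, method = "single", cutoff = 0.03).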

@ConstanceBtd
Author

Thank you, it works now!

@ConstanceBtd
Author

@digitalwright, I'm sorry to bother you again but I have another question regarding ASV clustering with DECIPHER.
I am using the code I found here to transform the ASV phyloseq that I already have into a phyloseq object clustered into OTUs:

d <- DECIPHER::DistanceMatrix(refseq(physeq_asv),type = "matrix", includeTerminalGaps = T, processors=1)

clusters <- DECIPHER::TreeLine(
  myDistMatrix=d, 
  method = "single",
  cutoff = 0.03, # use `cutoff = 0.03` for a 97% OTU 
  processors = 1)

## Use dplyr to merge the columns of the seqtab matrix for ASVs in the same OTU
# prep by adding sequences to the `clusters` data frame
clusters <- clusters %>%
  add_column(sequence = asv_sequences)

merged_seqtab <- seqtab %>% 
  t %>%
  rowsum(clusters$cluster) %>%
  t

With this code I encounter the following error:

> clusters <- clusters %>%
+   add_column(sequence = asv_sequences)
Error:
! The `.data` argument of `add_column()` must be a data frame as of tibble 2.1.1.
Backtrace:
 1. clusters %>% add_column(sequence = asv_sequences)
 2. tibble::add_column(., sequence = asv_sequences)
 3. lifecycle::deprecate_stop("2.1.1", "add_column(.data = 'must be a data frame')")
 4. lifecycle:::deprecate_stop0(msg)

The problem arises because I updated the code to use the TreeLine function (replacing IdClusters). TreeLine returns a dendrogram, and add_column needs a data frame, but I have trouble converting the dendrogram into one. Do you have any idea how I could make this code work?

Thanks again!

@digitalwright

You need to add TreeLine(..., type="clusters").
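
As a rough sketch of the downstream step, the base-R example below uses toy data in place of real DECIPHER output. It assumes that TreeLine(..., type = "clusters") returns a data frame with one row per sequence and a cluster column, which is what clusters$cluster in the question relies on. The seqtab and clusters objects here are made up for illustration.

```r
# Toy sequence table: 2 samples (rows) x 4 ASVs (columns).
seqtab <- matrix(c(10, 0,
                    5, 2,
                    0, 7,
                    3, 3),
                 nrow = 2,
                 dimnames = list(c("S1", "S2"),
                                 c("ASV1", "ASV2", "ASV3", "ASV4")))

# Stand-in for the cluster assignments (assumed shape: a data frame with a
# `cluster` column, one row per ASV, in the same order as seqtab's columns).
clusters <- data.frame(cluster = c(1, 1, 2, 3),
                       row.names = colnames(seqtab))

# A data frame is what tibble::add_column() expects as its first argument.
stopifnot(is.data.frame(clusters))

# Sum the counts of ASVs that share a cluster: transpose so ASVs are rows,
# group-sum with rowsum(), then transpose back to samples x OTUs.
merged_seqtab <- t(rowsum(t(seqtab), clusters$cluster))
colnames(merged_seqtab) <- paste0("OTU", colnames(merged_seqtab))
```

Here ASV1 and ASV2 share cluster 1, so their counts are added together per sample, and the total read count of the table is unchanged by the merge.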

@ConstanceBtd
Author

Thank you so much! It worked.
