Running TransDecoder at a higher -m parameter #191

Open
mollylRivers opened this issue Nov 16, 2023 · 7 comments

Comments

@mollylRivers

Hi Brian,

This isn't so much an issue as a question about how TransDecoder works. I have received a transcriptome assembled with rnaSPAdes from a combination of short-read and long-read sequencing. CD-HIT and TransDecoder have already been applied to it. However, TransDecoder was run with -m 50, which has resulted in a very large transcriptome (~400,000 transcripts) that is very unlikely to be real in this case. I have used the default -m 100 on my other transcriptomes of closely related species, and those are much smaller (~30,000 - 90,000 transcripts). When I ran TransDecoder again on the large processed transcriptome (the one that already had CD-HIT and TransDecoder at -m 50 applied), the transcriptome shrank drastically to ~17,000 transcripts. Unfortunately, I don't have access to the unprocessed transcriptome, so I can't test how the second run of TransDecoder affects the transcriptome size.

I am wondering what is happening with this second run of TransDecoder. Is it sound to run it a second time with a higher -m cutoff? I assumed that, because I am changing the minimum amino acid length of the ORFs, it would just remove those between 50 and 100 amino acids in length. But it is unlikely that there are over 350,000 sequences of this length in my transcriptome. Could this be related to the increased rate of false-positive ORF predictions at the reduced length parameter?

Many thanks for your help,
Molly
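
One way to check the assumption in the question above (that raising -m from 50 to 100 should only drop ORFs of 50 to 99 amino acids) is to count the predicted peptides in each length band directly. The following is a minimal Python sketch, not part of TransDecoder: the function names are hypothetical, and it assumes the input is a peptide FASTA in which a trailing `*` marks a stop codon and should not count toward the length.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_by_length_band(lines: Iterable[str], low: int = 50, high: int = 100) -> Counter:
    """Count peptides shorter than `low`, in [low, high), and >= `high` residues."""
    counts: Counter = Counter()
    for _, seq in read_fasta(lines):
        # Assumption: a trailing '*' is a stop-codon marker, not a residue.
        n = len(seq.rstrip("*"))
        if n < low:
            counts["below_low"] += 1
        elif n < high:
            counts["between"] += 1
        else:
            counts["at_or_above_high"] += 1
    return counts
```

Passing an open file handle for the peptide FASTA from the -m 50 run (for example `count_by_length_band(open("orfs_m50.pep"))`, where the file name is a placeholder) would show how many predictions actually fall in the 50 to 99 residue band. If that count is far below the number of sequences lost in the second run, something other than the length cutoff is likely involved.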

@brianjohnhaas
Contributor

brianjohnhaas commented Nov 17, 2023 via email

@mollylRivers
Author

Hi Brian,

Thanks so much for this information. When I looked at the transcript length distribution of the transcriptome before and after applying TransDecoder again (at -m 100), I found that when TransDecoder had been applied at -m 50 there were ~400,000 transcripts less than 1,000 bp in length. But after reapplying TransDecoder at -m 100, there were 0 transcripts less than 1,000 bp in length. This would explain the large decrease in transcriptome size. I wonder why this happens and whether there is a way to prevent it, as there will likely be transcripts of interest that are less than 1,000 bp in length.

Many thanks,
Molly
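
The before/after length comparison described above can be reproduced with a short script. This is a rough Python sketch under stated assumptions: the function names are hypothetical, the inputs are nucleotide FASTA files, and the record ID is taken to be the first whitespace-delimited word of each header.

```python
from typing import Dict, Iterable


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first word of the header) to its sequence length."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif line and current is not None:
            # Sequence may be wrapped over several lines; sum them up.
            lengths[current] += len(line)
    return lengths


def summarize_short(lengths: Dict[str, int], threshold: int = 1000) -> Dict[str, int]:
    """Report the total record count and how many are shorter than `threshold` bp."""
    short = sum(1 for n in lengths.values() if n < threshold)
    return {"total": len(lengths), "below_threshold": short}
```

Running `summarize_short(sequence_lengths(open(path)))` on the FASTA from each run (the paths are placeholders) gives the two counts side by side. Comparing the ID sets of the two dictionaries would additionally show which records disappeared between runs.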

@brianjohnhaas
Contributor

brianjohnhaas commented Nov 20, 2023 via email

@mollylRivers
Author

Hi Brian,

Thank you for your response. I did retry running in a new directory, but there was no change in the number of transcripts produced. I have attached a txt file with the error message I received. I am not really sure what the problem might be; hopefully you can help me decipher it.

Many thanks,
Molly

TransDecoder_m100_error_message.txt

@brianjohnhaas
Contributor

brianjohnhaas commented Nov 21, 2023 via email

@brianjohnhaas
Contributor

brianjohnhaas commented Nov 23, 2023 via email

@mollylRivers
Author

Hi Brian, Thanks for this, it seems that that may have been the problem all along. Sorry for wasting your time. Thanks, Molly
