"winnow" transcripts to filter by coverage #1

Open
AlexGaithuma opened this issue Dec 24, 2020 · 0 comments

AlexGaithuma commented Dec 24, 2020

Hi fishercera,

I read your description of your de novo transcriptome assembly approach and am interested in it. I am not a bioinformatics expert, though, but a DIY, "learn while doing it" kind of guy.

I want to follow the process and use it on my data.
Could you be kind enough to provide an outline of the commands you used to reach the end result of ~20,000 transcripts? This would be very helpful. Thanks in advance. My email is akiariegaithuma@gmail.com

Your words are as follows:

I started with >100,000 transcripts in a de-novo transcriptome made from
pooled siblings' tissues.
What I have done to "winnow" transcripts is to filter by coverage, as here:
https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification#filtering-transcripts
Then I take the remaining transcripts that passed that filter and I predict
ORFs with something like Transdecoder (I used GeneMarkS-T).
THEN I cluster the predicted proteome at a 70% identity threshold using
USEARCH: https://www.drive5.com/usearch/
The centroid sequences you get from that are the ones that are most
representative of each cluster. I take the headers for the centroid
proteins and use them to pull the matching nucleotide transcripts from my
assembly.
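As I understand the last step (taking the centroid protein headers and pulling the matching nucleotide transcripts), it could be sketched roughly like this in Python. This is only my guess at the idea, not the commands actually used. The function names are made up for illustration, and I am assuming that the first whitespace-delimited token of each protein header is identical to the transcript ID in the assembly; if the ORF predictor adds a suffix or prefix to the ID, `transcript_id` would need to be adapted.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def transcript_id(header: str) -> str:
    """ASSUMPTION: the ID is the first whitespace-delimited token of the header."""
    return header.split()[0]


def pull_transcripts(
    centroid_lines: Iterable[str], assembly_lines: Iterable[str]
) -> List[Tuple[str, str]]:
    """Keep only assembly records whose ID matches a centroid protein ID."""
    wanted = {transcript_id(h) for h, _ in read_fasta(centroid_lines)}
    return [
        (h, s) for h, s in read_fasta(assembly_lines) if transcript_id(h) in wanted
    ]
```

With real data, the two arguments would be open file handles for the centroid protein FASTA and the nucleotide assembly FASTA, and the returned records would be written back out as FASTA.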

This has generally ended up with a nice manageable transcriptome of ~20,000
transcripts. The N50 goes up considerably. And my BUSCO results are quite
good!
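To check for myself whether the N50 goes up after winnowing, I assume something like the sketch below would do. It uses the usual definition of N50 (the length of the transcript at which the running total of lengths, sorted longest first, reaches half of the total assembly length). The function name is mine, and the input is just a list of transcript lengths.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total covers half the assembly.
        if running >= half_total:
            return length
    return 0
```

Running this on the transcript lengths before and after filtering would give the two N50 values to compare.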

@AlexGaithuma AlexGaithuma changed the title "winnow" transcripts is to filter by coverage "winnow" transcripts to filter by coverage Dec 24, 2020