Transcriptome Assembly Quality Assessment
Once your assembly is complete, you'll want to know how 'good' it is. You might also want to compare its quality to that of similar assemblies generated by alternative assemblers, or by the same assembler run with different parameters.
There are some general ways to characterize the quality of your assembly:
- Examine the RNA-Seq read representation of the assembly. Ideally, at least ~80% of your input RNA-Seq reads are represented by your transcriptome assembly. The remaining unassembled reads likely correspond to lowly expressed transcripts with insufficient coverage to enable assembly, or are low-quality or aberrant reads.
- Examine the representation of full-length reconstructed protein-coding genes by searching the assembled transcripts against a database of known protein sequences.
- Use BUSCO to explore completeness according to conserved ortholog content.
- Compute the E90N50 transcript contig length: the contig N50 value based on the most highly expressed transcripts that together represent 90% of the expression data.
- Compute DETONATE scores. DETONATE provides a rigorous computational assessment of the quality of a transcriptome assembly, and is useful if you want to compare several assemblies generated with different parameter settings or with altogether different tools. The assembly with the highest DETONATE score is considered the best one.
- Try using TransRate. TransRate generates a number of useful statistics for evaluating your transcriptome assembly. Read about TransRate here: http://genome.cshlp.org/content/26/8/1134. Note that certain statistics may be biased against the large number of very lowly expressed transcripts. Consider generating TransRate statistics for your transcriptome both before and after applying a minimum expression-based filter.
- Explore rnaQUAST, a quality assessment tool for de novo transcriptome assemblies.
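A minimal sketch of the read-representation check from the first item above, assuming you have already aligned your input reads back to the assembly with an aligner of your choice and counted how many of them aligned. The helper name `read_representation`, the counts, and the default 0.80 target (taken from the ~80% guideline) are illustrative assumptions, not part of any Trinity tool.

```python
def read_representation(aligned_reads: int, total_reads: int,
                        target: float = 0.80) -> tuple[float, bool]:
    """Return (fraction of input reads aligned to the assembly, meets target?).

    Hypothetical helper: `target` defaults to the ~80% guideline and can be
    adjusted for your data set.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be a positive count")
    if not 0 <= aligned_reads <= total_reads:
        raise ValueError("aligned_reads must be between 0 and total_reads")
    fraction = aligned_reads / total_reads
    return fraction, fraction >= target


# Hypothetical counts, e.g. tallied after aligning reads back to the assembly.
fraction, meets_target = read_representation(aligned_reads=8_700_000,
                                             total_reads=10_000_000)
print(f"{fraction:.1%} of reads represented; meets ~80% guideline: {meets_target}")
```

Running the same check on each candidate assembly gives a simple first comparison before moving on to the more detailed metrics.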
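BUSCO summarizes completeness as one line of percentages: Complete (C, split into Single-copy S and Duplicated D), Fragmented (F), and Missing (M), out of n searched orthologs. The sketch below extracts those values so that BUSCO results for several assemblies can be tabulated side by side. It assumes the `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout; the function name and the example line are hypothetical.

```python
import re

# Assumed layout of BUSCO's one-line summary, e.g.
# C:95.2%[S:60.1%,D:35.1%],F:2.0%,M:2.8%,n:255
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text: str) -> dict[str, float]:
    """Extract C/S/D/F/M percentages and the ortholog count n from BUSCO output."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    values = {key: float(val) for key, val in match.groupdict().items()}
    values["n"] = int(values["n"])  # n is a count of orthologs, not a percentage
    return values


# Hypothetical summary line for one assembly.
line = "C:95.2%[S:60.1%,D:35.1%],F:2.0%,M:2.8%,n:255"
summary = parse_busco_summary(line)
print(summary)
```

Because `search` is used rather than a full-line match, the function also works when handed the whole text of a summary file, as long as the line follows the assumed layout.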
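The E90N50 statistic listed above can be sketched in a few lines: rank transcripts by expression, keep the most highly expressed ones until they account for 90% of total expression, then compute the ordinary N50 of their lengths. This is an illustrative re-implementation, not Trinity's own script. It assumes you already have a length and an expression value (for example TPM) for each transcript; the function names and example values are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def ex_n50(transcripts: list[tuple[int, float]], fraction: float = 0.90) -> int:
    """N50 of the most highly expressed transcripts that together account for
    `fraction` of total expression. Each item is a (length, expression) pair."""
    ranked = sorted(transcripts, key=lambda t: t[1], reverse=True)
    threshold = fraction * sum(expr for _, expr in ranked)
    selected: list[int] = []
    running = 0.0
    for length, expr in ranked:
        if running >= threshold:
            break  # the top transcripts already cover the requested fraction
        selected.append(length)
        running += expr
    return n50(selected)


# Hypothetical (length, TPM) pairs; the long 4000 bp transcript is lowly expressed.
example = [(2000, 500.0), (1500, 300.0), (800, 150.0), (4000, 30.0), (300, 20.0)]
print("N50:", n50([length for length, _ in example]))   # N50: 2000
print("E90N50:", ex_n50(example))                        # E90N50: 1500
```

In this toy example the lowly expressed 4000 bp transcript raises the plain N50 but is excluded from the E90N50 calculation, which is why the expression-weighted value differs.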
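Comparing TransRate statistics before and after a minimum expression-based filter, as suggested above, requires a filtered copy of the assembly. The sketch below keeps only transcripts whose expression meets a cutoff, assuming you have a per-transcript expression table (for example TPM values from an abundance estimation step) loaded into a dictionary keyed by transcript ID. The 1.0 TPM cutoff, the helper names, and the example IDs and sequences are hypothetical; choose a threshold appropriate to your data.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record's ID (first word of the header) to its sequence."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def filter_by_expression(records: dict[str, str], tpm: dict[str, float],
                         min_tpm: float = 1.0) -> dict[str, str]:
    """Keep transcripts with expression >= min_tpm; IDs missing from `tpm` count as 0."""
    return {tid: seq for tid, seq in records.items() if tpm.get(tid, 0.0) >= min_tpm}


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Serialize records back to FASTA, wrapping sequences at `width` characters."""
    lines: list[str] = []
    for tid, seq in records.items():
        lines.append(f">{tid}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + ("\n" if lines else "")


# Hypothetical assembly and expression values.
fasta_text = """>TRINITY_DN1_c0_g1_i1 len=12
ACGTACGTACGT
>TRINITY_DN2_c0_g1_i1 len=8
GGGGCCCC
"""
tpm = {"TRINITY_DN1_c0_g1_i1": 25.3, "TRINITY_DN2_c0_g1_i1": 0.2}
kept = filter_by_expression(parse_fasta(fasta_text), tpm, min_tpm=1.0)
print(to_fasta(kept), end="")
```

Writing the filtered records to a separate file and running TransRate on both the original and the filtered assembly then shows how strongly the lowly expressed transcripts influence each statistic.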