assemblies of 1k random SRAs from novel_id90.fa #246
All contigs having a motifator hit:
hmmsearch results of those contigs versus Pfam-A:
CheckV on those contigs:
|
Virsorter2 results on those contigs:
added the FASTA files of the viruses found by VirSorter2:
Palmscan on the above virsorter2 sequences + usearch_global (id=0.7) to palmdb using https://github.com/rcedgar/palmdb/blob/main/2021-03-14/uniques.fa.gz (as palmdb.fa in the scripts below). results:
Diamond search of the VirSorter2 sequences against nr. I took the "full" and "less than 2 genes" FASTA contigs produced by VirSorter2 and discarded the "partial" ones, since that category is tiny (20 sequences out of 161,536, too few to analyze meaningfully; let me know if you'd still like results for them).
diamond cmdline:
results for full:
results for lt2genes:
An analysis of the above raw results:
59,488 contigs annotated as full viruses:
93,728 contigs annotated as having less than 2 genes (lt2genes):
What I mean by 'nr entry' is an ENA accession, e.g.
My criteria for 'similar to known entry' are:
Criteria for 'novel':
Notebook of the analysis: https://gitlab.pasteur.fr/rchikhi_pasteur/serratus-rdrp-analysis/-/blob/master/1000_random/virsorter_fasta_analysis/notebook.ipynb
TSV results:
Preview:
Preview:
And finally, the aggregation of the two previous palmdb+diamond analyses:
Aggregated table for both full and lt2genes contigs:
Preview:
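For readers who want to reproduce this kind of aggregation, here is a minimal sketch of joining two per-contig result sets into one table. It is not the notebook's actual code: the function name `aggregate_by_contig`, the dictionary inputs, and the output keys `contig`, `palmdb`, and `diamond` are all hypothetical, and the real aggregated table may carry different columns.

```python
def aggregate_by_contig(palmdb_hits, diamond_hits):
    """Outer-join two {contig_id: annotation} mappings into one row per contig.

    Both inputs and the output column names are illustrative assumptions,
    not the layout of the TSV files referenced in this issue.
    """
    rows = []
    # Take the union of contig IDs so contigs with only one kind of hit are kept.
    for contig in sorted(set(palmdb_hits) | set(diamond_hits)):
        rows.append({
            "contig": contig,
            "palmdb": palmdb_hits.get(contig),    # None when there is no palmdb hit
            "diamond": diamond_hits.get(contig),  # None when there is no diamond hit
        })
    return rows


# Toy usage with made-up contig names and annotations.
example = aggregate_by_contig(
    {"contig_1": "palmdb_hit_A"},
    {"contig_1": "nr_hit_X", "contig_2": "nr_hit_Y"},
)
```

An outer join is used here so that a contig with only a palmdb hit or only a diamond hit still appears in the combined table.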
Comparison between the above TSV file and the one @rcedgar produced in https://github.com/ababaian/serratus/wiki/Viral-contigs-containing-RdRP for a different set of contigs:
Set statistics:
Curiously, 5,307 full contigs in 1000_random_sra are not in
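A comparison like this comes down to set operations on contig identifiers. The following is a minimal sketch, assuming each TSV has a header row and an identifier column named `contig`; that column name and the helper names `contig_ids` and `set_statistics` are hypothetical, since the real column layout of either file is not shown here.

```python
import csv
import io


def contig_ids(tsv_text, id_column="contig"):
    """Collect the distinct identifiers found in one column of a TSV string."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_column] for row in reader}


def set_statistics(ids_a, ids_b):
    """Count identifiers unique to each set and shared by both."""
    return {
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
        "both": len(ids_a & ids_b),
    }


# Toy usage with made-up contig names.
table_a = "contig\tlabel\nc1\tfull\nc2\tfull\nc3\tlt2genes\n"
table_b = "contig\tlabel\nc2\tfull\nc4\tfull\n"
stats = set_statistics(contig_ids(table_a), contig_ids(table_b))
```

Reporting the three counts together makes it easy to see whether one set is nearly contained in the other or whether they mostly diverge.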
List of the 1,000 random SRA accessions:
s3://serratus-rayan/1krandom-assembly/1000_random_sra.txt
(obtained through:
cat novel_id90.fa | grep ">" | cut -c 2- | sort | uniq | shuf -n 1000
)

952 SPAdes log files were produced (meaning SPAdes at least started running the assembly). I could investigate why the remaining 48 failed if you'd like (but it doesn't seem so important). For 70 of the 952 SRAs, SPAdes didn't produce an output, likely because it ran out of memory (for the few cases I eyeballed). So, as a result:
882 SRAs were assembled into a scaffolds.fasta file: s3://serratus-rayan/1krandom-assembly/1000_random_sra.scaffolds.txt
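The shell pipeline used above to draw the 1,000 accessions (keep the FASTA header lines, drop the leading ">", deduplicate, then pick a random subset) can be sketched in Python as follows. This is an illustrative equivalent, not what was actually run: the function name `sample_accessions` and the optional `seed` parameter are assumptions added for reproducibility, and the sketch assumes ">" only appears at the start of header lines.

```python
import random


def sample_accessions(fasta_lines, n=1000, seed=None):
    """Return up to n distinct FASTA header strings chosen uniformly at random."""
    # Equivalent of: grep ">" | cut -c 2-   (header lines without the leading ">")
    headers = {line.rstrip("\n")[1:] for line in fasta_lines if line.startswith(">")}
    # Equivalent of: sort | uniq   (sorting makes a seeded draw reproducible)
    pool = sorted(headers)
    # Equivalent of: shuf -n 1000   (returns fewer items if the pool is smaller than n)
    rng = random.Random(seed)
    return rng.sample(pool, min(n, len(pool)))


# Toy usage with made-up accession names.
toy_lines = [">SRR1\n", "ACGT\n", ">SRR2\n", "GG\n", ">SRR1\n", "TT\n", ">SRR3\n"]
picked = sample_accessions(toy_lines, n=2, seed=0)
```

Passing a fixed `seed` would let the same accession list be regenerated later, which the plain `shuf` call does not guarantee.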