all RdRP+ macro/micro contigs #252

Open
rchikhi opened this issue Mar 27, 2021 · 1 comment

Collaborator

rchikhi commented Mar 27, 2021

This issue describes the procedure used to search all of our contigs against RdRP and presents the results.
(Slack thread: https://hackseq-rna.slack.com/archives/C012H9SDQCA/p1615948152031200)

Input: FASTA files of contigs from either the micro assembly (all SRA .pro DIAMOND hits, assembled with rnaviralspades) or the macro assembly (everything under s3://lovelywater/assembly/contigs/, i.e. all CoV + dicistro + quenya + satellite + 1k random subset, assembled with either coronaSPAdes or rnaviralspades).

Output: FASTA of all contigs that hit RdRP with the HMMs, with palmscan, or with both, i.e. the RdRP+ contigs:
s3://serratus-rayan/pro-assembly/rdrpplus.micro.fa
s3://serratus-rayan/pro-assembly/rdrpplus.macro.fa
total size: 8.2 GB
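
For readers who want to reproduce the selection step, here is a minimal Python sketch of the "HMM and/or palmscan" rule: take the union of the two hit sets and keep every contig whose ID is in it. The function names, and the assumption that hit IDs match the first whitespace-delimited token of each FASTA header, are illustrative and not taken from the actual pipeline.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def select_rdrp_plus(
    lines: Iterable[str], hmm_hits: Set[str], palmscan_hits: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Keep contigs hit by the HMMs, by palmscan, or by both (set union)."""
    keep = hmm_hits | palmscan_hits
    for header, seq in read_fasta(lines):
        fields = header.split()
        # Assumption: the contig ID is the first token of the header.
        contig_id = fields[0] if fields else ""
        if contig_id in keep:
            yield header, seq
```

Because the input is consumed as a stream, the same function should work on multi-gigabyte files by passing an open file handle instead of a list of lines.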

hmmsearch was run using an exhaustive collection of RdRP HMMs:
https://gitlab.pasteur.fr/rchikhi_pasteur/serratus-rdrp-analysis/-/blob/master/hmm_macro_micro/RdRP_all.v2.hmm

alignments were made using this script:
https://gitlab.pasteur.fr/rchikhi_pasteur/serratus-rdrp-analysis/-/blob/master/hmm_macro_micro/align_hmm_to_contigs.sh
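
The linked script is the reference for how the alignments were actually produced. As a rough illustration of how hits could be collected afterwards, the sketch below parses hmmsearch's tabular per-target output (the `--tblout` format) and returns the set of target names passing an E-value cutoff. The cutoff of 1e-5 and the function name are assumptions for illustration only; the issue does not state which threshold, if any, was applied, and target names may need mapping back to contig IDs if translated sequences were searched.

```python
from typing import Iterable, Set


def hits_from_tblout(lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Collect target names from hmmsearch --tblout lines.

    Column 1 is the target name and column 5 is the full-sequence E-value.
    max_evalue is a hypothetical cutoff, not the one used in this issue.
    """
    hits = set()
    for line in lines:
        # Skip blank lines and the '#' comment/header lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits.add(target)
    return hits
```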

rchikhi changed the title from "all macro/micro assemblies having a RdRP (HMM or palm) hit" to "all RdRP+ macro/micro contigs" on Mar 27, 2021
Collaborator Author

rchikhi commented Mar 27, 2021

Some stats:

  • number of macro RdRP+ contigs: 6,822,262
    total size: 5,828,043,099 bp
    longest macro RdRP+ contig: 1,086,412 bp
    N50: 1,870 bp

  • number of micro RdRP+ contigs: 4,631,850
    total size: 2,158,973,751 bp
    longest micro RdRP+ contig: 16,630 bp
    N50: 622 bp
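
For reference, N50 is conventionally the length L such that contigs of length at least L together cover at least half of the total assembled bases. The issue does not say which tool produced the figures above, so the following is only a sketch of that standard definition, with a hypothetical function name.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50 of an empty collection is undefined")
```

Feeding it the per-contig lengths of rdrpplus.macro.fa or rdrpplus.micro.fa should give values comparable to those listed, provided the same definition was used.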
