Quenya assemblies #238

Open
rchikhi opened this issue Dec 11, 2020 · 5 comments

Comments

Collaborator

rchikhi commented Dec 11, 2020

All AA-guided assemblies: s3://serratus-public/assemblies/quenya/gene_clusters/
Those are all the coronaspades assemblies matching the input RdRp file (quenya.protref.aa)
Could be assembled: 452 out of 497 (list: s3://serratus-public/assemblies/quenya/rdrps_analysis/list_assembled_quenya.txt)

All coronaspades output files: s3://serratus-public/assemblies/quenya/other/

rchikhi commented Dec 11, 2020

all RdRPs present in the above gene_clusters files: s3://serratus-public/assemblies/quenya/rdrps/

Extracted using this script. In a nutshell: take all tblastn hits of quenya contigs (gene_clusters) to quenya.protref.aa with length above 550 100 (somewhat arbitrary, but most RdRps seem to be above 550), regardless of identity; then group them with bedtools to extract unique regions from contigs, because the same contig region may match several hits from quenya.protref.aa.
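
The filter-then-group step described above can be sketched roughly as follows. This is not the linked script: it is a pure-Python stand-in that assumes the tblastn hits are in the standard 12-column tabular format (outfmt 6), that "length" means the alignment length column, and that the contigs are the subject sequences. The function name `merge_hit_regions` and the `min_length` parameter are hypothetical, and the interval merging only approximates what bedtools would do.

```python
from collections import defaultdict


def merge_hit_regions(fmt6_lines, min_length):
    """Return {contig: [(start, end), ...]} of merged hit regions.

    Assumes standard 12-column tabular hits where the subject (sseqid)
    is the contig and sstart/send are 1-based contig coordinates.
    """
    regions = defaultdict(list)
    for line in fmt6_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        contig = fields[1]                      # sseqid
        aln_length = int(fields[3])             # alignment length
        sstart, send = int(fields[8]), int(fields[9])
        # Keep hits strictly longer than the cutoff, regardless of identity.
        if aln_length <= min_length:
            continue
        # Reverse-strand hits have sstart > send, so normalise the order.
        regions[contig].append((min(sstart, send), max(sstart, send)))

    merged = {}
    for contig, spans in regions.items():
        spans.sort()
        out = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlaps the previous region: extend it.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[contig] = out
    return merged
```

The cutoff is passed in rather than hard-coded, so the same sketch works for whichever threshold the script actually uses. Each merged region then corresponds to one candidate RdRp sequence to cut out of the contig.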

rchikhi commented Dec 13, 2020

All the above RdRps in a single FASTA file: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_rdrps.fa

Diamond of all_rdrps.fa file against nr: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_rdrps.fa.diamond_vs_nr.fmt6-custom
cmdline: \time diamond blastx --db ~/diamonddb/nr --query all_rdrps.fa -p 48 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qseq sseq
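
Because the output uses a custom column list, downstream tools that expect the default 12 columns need to be told about the 4 extra ones. A minimal sketch of a reader, assuming the file is tab-separated with exactly the 16 fields named on the command line above, in that order; the names `parse_custom_fmt6` and `best_hit_per_query` are hypothetical helpers, not part of Diamond:

```python
# Column order taken from the --outfmt 6 field list in the command line.
COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "qlen", "slen", "qseq", "sseq",
)
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend",
               "sstart", "send", "qlen", "slen"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_custom_fmt6(lines):
    """Yield one dict per hit, with numeric fields converted."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} fields, got {len(fields)}"
            )
        record = {}
        for name, value in zip(COLUMNS, fields):
            if name in INT_COLUMNS:
                record[name] = int(value)
            elif name in FLOAT_COLUMNS:
                record[name] = float(value)
            else:
                record[name] = value
        yield record


def best_hit_per_query(records):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for rec in records:
        current = best.get(rec["qseqid"])
        if current is None or rec["bitscore"] > current["bitscore"]:
            best[rec["qseqid"]] = rec
    return best
```

With a local copy of the results file, usage would look like `best_hit_per_query(parse_custom_fmt6(open(path)))`, where `path` points at wherever the file was downloaded.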

rchikhi commented Dec 14, 2020

A different analysis direction:

all gene_clusters.fa files are concatenated here: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_gene_clusters.fa

Diamond blastx of this file against rdrp0_Q_D.fa: s3://serratus-public/assemblies/quenya/rdrps_analysis/all_gene_clusters.fa.diamond_vs_rdrp0_q_d.fmt6

rchikhi commented Dec 20, 2020

Pathracer-seq-fs --max-fs 0 applied to RdRP_[X].hmm (with X=1,2,3,4,q) versus all_gene_clusters.fa:

s3://serratus-public/assemblies/quenya/rdrps_analysis/pathracer_seq_fs/

rchikhi commented Dec 20, 2020

Pathracer applied to RdRP_q.hmm and all assembly graphs:

s3://serratus-public/assemblies/quenya/rdrps_analysis/pathracer/
