I would probably not extend the set of HMMs so rapidly and would instead go for a more "iterative" approach:
Take all Epsy assemblies
We expect that there should be a missed part that contains a (quite weak) match to the Spike_torovirin model
Take the assembly graphs and try to find the subgraph with the missed match. Check whether there is some evidence for a poly-A tail (e.g. a high-coverage tail in the case of trimmed assemblies)
Extract the missed part and try to improve the Spike model, probably also looking for other more or less conservative matches
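The poly-A check in the steps above could start with something as simple as the sketch below. It only looks at the contig sequence itself (a run of A at the 3' end, or a run of T at the 5' end if the contig came out reverse-complemented); it does not inspect coverage or the assembly graph. The function name and the `min_run=10` threshold are assumptions made for illustration, not anything coronaSPAdes does.

```python
def has_polya_evidence(contig: str, min_run: int = 10) -> bool:
    """Return True if the contig ends in a poly-A run or starts with a poly-T run.

    min_run is an arbitrary illustrative threshold (assumption), not a tuned value.
    """
    seq = contig.upper()
    # Length of the uninterrupted run of A at the 3' end.
    tail_a = len(seq) - len(seq.rstrip("A"))
    # Length of the uninterrupted run of T at the 5' end
    # (a poly-A tail seen on the reverse-complement strand).
    head_t = len(seq) - len(seq.lstrip("T"))
    return max(tail_a, head_t) >= min_run
```

A real check would probably also tolerate a few mismatches inside the run and combine this with the coverage signal mentioned above.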
Note that 1. above (set of CoV genomes vs Pfam) is already effectively done by https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6295324/, which provides a set of 93 HMMs. We just need to take the newer versions of them, if available (it was based on Pfam release 31, and we're at 33 these days).
@asl @rcedgar
To help coronaSPAdes identify CoV-associated contigs from the full assembly graph, we need to expand our set of target HMMs.
Versions:
The next version could run the full Pfam HMM library against the following sequences:
cov3ma + other Nidovirales genomes
Also, we should figure out whether to run `hmmsearch` with max sensitivity (`--max -E 0.01`), or be conservative and use `--cut_ga`.
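To make the two options concrete, here is a minimal sketch that builds the corresponding `hmmsearch` command lines. The helper name, the mode labels and the file names in the usage note are hypothetical; the flags themselves (`--max`, `-E`, `--cut_ga`, `--tblout`) are standard HMMER options.

```python
from typing import List


def hmmsearch_cmd(hmm_db: str, seq_file: str, tblout: str,
                  mode: str = "sensitive") -> List[str]:
    """Build an hmmsearch argument list for one of the two modes discussed above.

    mode="sensitive":    --max -E 0.01 (all heuristic filters off, E-value cutoff 0.01)
    mode="conservative": --cut_ga (use each model's curated gathering threshold)
    """
    cmd = ["hmmsearch", "--tblout", tblout]
    if mode == "sensitive":
        cmd += ["--max", "-E", "0.01"]
    elif mode == "conservative":
        cmd += ["--cut_ga"]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # hmmsearch expects the HMM file first, then the sequence database.
    cmd += [hmm_db, seq_file]
    return cmd
```

For example, `subprocess.run(hmmsearch_cmd("Pfam-A.hmm", "proteins.faa", "hits.tbl", mode="conservative"), check=True)` would run the conservative variant, assuming HMMER is installed and the (hypothetical) input files exist. Running both modes on the same input and comparing the `--tblout` tables is one way to see how many extra hits `--max` actually buys.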