all .pro assemblies #242
Comments
Assemblies done (measured by 3,378,813 (no idea why in ~300 cases, no
Number of empty assemblies: 2,890,521
Thus, non-empty assemblies (i.e. both 488,292 (14.4%)
For reference, 19% of the rVert assemblies were non-empty.
Likely the assembly failed. Can you collect a few logs from those cases?
Can do, let me just finish with the bulk of the results first.
Number of non-empty 168,460
Hi @rchikhi, a minor feature request/suggestion for future runs: can you combine all micro-assemblies into one FASTA file? This file should not be too big, only around 1 Gb or so. It would be easier to process on Linux than millions of small FASTAs or millions of directories, each with a small or empty FASTA. This would require embedding the SRA identifier in the sequence label, a.k.a. the FASTA defline, e.g. as a prefix
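The suggestion above could be sketched roughly as follows. This is only an illustration, not the script used for these runs: the function names, the `_` separator, and the assumption that each file name starts with its SRA accession (e.g. `SRR123.fasta.gz`) are all hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterable, Iterator, TextIO


def open_text(path: Path) -> TextIO:
    """Open a plain or gzip-compressed text file for reading."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def prefixed_records(lines: Iterable[str], accession: str, sep: str = "_") -> Iterator[str]:
    """Yield FASTA lines, prepending the accession to every defline."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Hypothetical label scheme: >ACCESSION_originallabel
            yield f">{accession}{sep}{line[1:]}"
        else:
            yield line


def merge_fastas(paths: Iterable[Path], out_path: Path) -> int:
    """Concatenate FASTA files into out_path; return the number of sequences written."""
    n_sequences = 0
    with open(out_path, "w") as out:
        for path in paths:
            # Assumption: the file name starts with the SRA accession.
            accession = path.name.split(".")[0]
            with open_text(path) as handle:
                for line in prefixed_records(handle, accession):
                    n_sequences += line.startswith(">")
                    out.write(line + "\n")
    return n_sequences
```

Empty input files simply contribute nothing to the merged output, so they do not need to be filtered out beforehand.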
Data availability
Individual assemblies (excluding empty files):
Individual motifator analyses of the above assemblies:
For download convenience, the above two folders (assemblies and motifator analyses) are packaged into a tar.gz file each:
All these folders are relatively small (~10GB) but contain on the order of millions of files.
In addition, for @rcedgar, here are all the motifator outputs (just the LHF files) concatenated into a single file: s3://serratus-rayan/pro-assembly/all.before_rr.LHF.fasta
SRR id is added as follows:
And concatenated unitigs/contigs: s3://serratus-rayan/pro-assembly/all.before_rr.fasta
For reference, these assemblies were performed using that script: and motifator was run using that script:
Here's an exhaustive list of "reads" that are above 600 bp among the single-end libraries: https://serratus-rayan.s3.amazonaws.com/rdrp-pan-assembly/prelim/all_se.above_600bp.txt
From that list I extracted the set of 719 accessions that are deemed not to be Illumina short reads: https://serratus-rayan.s3.amazonaws.com/rdrp-pan-assembly/prelim/nonILMN.txt
Coverage analysis of the motifator hits within the .pro assemblies
s3://serratus-rayan/pro-assembly/depth_summary.csv
schema:
where p_cvgX is the percentage of bases of the region where coverage is >= X
code used to generate those results
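To make the p_cvgX definition concrete, here is a minimal sketch that computes it from a list of per-base depths for one region. It is not the code used to generate depth_summary.csv; the function name and the threshold values (1, 5, 10) are assumptions chosen for illustration.

```python
from typing import Dict, Sequence


def coverage_percentages(
    depths: Sequence[int],
    thresholds: Sequence[int] = (1, 5, 10),  # hypothetical values of X
) -> Dict[str, float]:
    """Return {'p_cvgX': percentage of positions with depth >= X} for each X."""
    n_bases = len(depths)
    result: Dict[str, float] = {}
    for x in thresholds:
        covered = sum(1 for d in depths if d >= x)
        # An empty region is reported as 0% rather than raising an error.
        result[f"p_cvg{x}"] = 100.0 * covered / n_bases if n_bases else 0.0
    return result
```

For example, a four-base region with depths `[0, 1, 5, 10]` has three bases with coverage >= 1, so its `p_cvg1` is 75.0.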
This thread will be for updates on the .pro assemblies.
Number of .pro.gz files analyzed (all of s3://serratus-public/out/21* except *r1p*): 5,726,283
Number of .fasta.gz files obtained after converting .pro to FASTA and discarding empty files: 3,379,127