Is there a way to separate genomes from .genomic.fna.gz files? #20

Open
jmwhitha opened this issue Aug 31, 2020 · 2 comments

@jmwhitha

Hi Vitor,

Thanks for making this application.

I was wondering if there is a way to use it to separate the genomes once I've downloaded the genomic.fna.gz files? I have tried using awk, but the formatting of the descriptions varies quite a bit between genomes. As you probably know, sometimes the descriptions have "sp." or "strain", sometimes they have "Scaffolds" or "contigs", etc., which makes it hard, though not impossible, to separate individual genomes.

If your application cannot separate the genomes either, are you familiar with any applications or scripts that can?

Thank you,
Jason

@pirovc
Owner

pirovc commented Sep 1, 2020

Hi Jason,

There's currently no way to do that with genome_updater.

I believe you could parse the assembly_summary.txt file of the current version to get the information you need to separate the files. Check fields 9 (infraspecific_name) and 12 (assembly_level); more info here: ftp://ftp.ncbi.nlm.nih.gov/genomes/README_assembly_summary.txt
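As a rough illustration of that suggestion, the sketch below reads a tab-separated assembly_summary.txt and pulls out columns 1, 9 and 12 for each assembly. The function names `read_assembly_summary` and `group_by_field_12` are made up for this example, and it assumes that any header lines start with `#`; it is not part of genome_updater.

```python
from collections import defaultdict


def read_assembly_summary(handle):
    """Yield (column 1, column 9, column 12) for each assembly row.

    Assumes a tab-separated file in which header/comment lines, if any,
    start with '#'.
    """
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 12:
            continue  # skip rows too short to contain field 12
        # Fields are 1-based in the README, so field 9 is cols[8], etc.
        yield cols[0], cols[8], cols[11]


def group_by_field_12(rows):
    """Group assembly accessions by the value found in field 12."""
    groups = defaultdict(list)
    for accession, _field_9, field_12 in rows:
        groups[field_12].append(accession)
    return dict(groups)


if __name__ == "__main__":
    import sys

    # Usage (path is supplied by the caller):
    #   python this_script.py path/to/assembly_summary.txt
    with open(sys.argv[1], encoding="utf-8") as fh:
        for level, accessions in group_by_field_12(read_assembly_summary(fh)).items():
            print(level, len(accessions), sep="\t")
```

Grouping on field 12 is only one option; the same rows could be grouped on field 9 or on any other column described in the README.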

In assembly_summary.txt, the first column is the assembly accession, which points you to the file downloaded with genome_updater if you use the pattern: {output_dir}/{version}/files/{assembly_accession}*genomic.fna.gz
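A minimal sketch of how that pattern could be resolved to actual files, assuming the directory layout described above. The helper name `find_genome_files` is hypothetical, and the values for `output_dir` and `version` must come from your own genome_updater run.

```python
import glob
import os


def find_genome_files(output_dir, version, assembly_accession):
    """Return files matching
    {output_dir}/{version}/files/{assembly_accession}*genomic.fna.gz
    """
    # Escape the literal part so characters such as '[' in a directory
    # name are not interpreted as glob wildcards.
    prefix = glob.escape(
        os.path.join(output_dir, version, "files", assembly_accession)
    )
    return sorted(glob.glob(prefix + "*genomic.fna.gz"))
```

The function returns a list because the wildcard can match more than one file: any other downloaded file whose name starts with the accession and ends in genomic.fna.gz would be included too.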

I hope that helps. I will leave this issue open and mark it as an enhancement, so I may include some of those features in the next release.

Best
Vitor

@jmwhitha
Author

jmwhitha commented Sep 2, 2020

Thank you so much for pointing me to the assembly_summary.txt. This seems like a good starting point for a solution.

Looking forward to the enhancement!
