[feature request] - add assembly accession support to bactopia search #476

Open
tfischer78 opened this issue Feb 1, 2024 · 6 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@tfischer78

Hello, I'm trying to use genomes from public databases and I'm having a problem. The accession I'm analyzing starts with "SRR", and the program gives me an error saying it needs "SRX*".

I've truncated the accessions below:

bactopia --accession SRR101**

I have a similar problem when I use an assembly and a BioProject.

bactopia-search -q PRJNA**

The bioproject gives a warning that there are no reads in ENA, but it is from NCBI.

bactopia --accession DALQ**

@tfischer78 tfischer78 added the question Further information is requested label Feb 1, 2024
@rpetit3
Member

rpetit3 commented Feb 1, 2024

Hi @tfischer78,

Thank you for letting me know about this issue. Could you by chance share the full accessions so I can get a fix in place for you?

Cheers,
Robert

@tfischer78
Author

tfischer78 commented Feb 1, 2024 via email

@rpetit3
Member

rpetit3 commented Feb 1, 2024

Alright, here's what I'm thinking.

For accession: SRR10177533

I made it require SRX accessions because of the way the accession hierarchy works: a single sample could have multiple runs (SRR), but only one experiment (SRX) to represent the sequencing.
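To make the hierarchy concrete, here is a minimal Python sketch that tells the levels apart by prefix. `accession_level` is a hypothetical helper and is not part of Bactopia. It assumes the usual SRA/ENA/DDBJ conventions: SRR/ERR/DRR for runs, SRX/ERX/DRX for experiments, SRS/ERS/DRS for samples, SRP/ERP/DRP for studies, and PRJNA/PRJEB/PRJDB for BioProjects.

```python
import re

# Hypothetical helper (not Bactopia code): the third letter of an SRA/ENA/DDBJ
# accession (SR*, ER*, DR*) encodes its level in the hierarchy.
_LEVELS = {
    "R": "run",         # SRR/ERR/DRR: a single sequencing run
    "X": "experiment",  # SRX/ERX/DRX: the level Bactopia asks for
    "S": "sample",      # SRS/ERS/DRS
    "P": "study",       # SRP/ERP/DRP
}

_SRA_PATTERN = re.compile(r"^([SED])R([RXSP])\d+$")
_BIOPROJECT_PATTERN = re.compile(r"^PRJ[NED][A-Z]\d+$")  # e.g. PRJNA, PRJEB, PRJDB


def accession_level(accession: str) -> str:
    """Return 'run', 'experiment', 'sample', 'study', 'bioproject', or 'unknown'."""
    acc = accession.strip().upper()
    match = _SRA_PATTERN.match(acc)
    if match:
        return _LEVELS[match.group(2)]
    if _BIOPROJECT_PATTERN.match(acc):
        return "bioproject"
    return "unknown"


if __name__ == "__main__":
    for acc in ["SRR10177533", "SRX6899308", "PRJNA514245"]:
        print(acc, accession_level(acc))
```

With the accessions from this thread, SRR10177533 is reported as a run, SRX6899308 as an experiment, and PRJNA514245 as a BioProject. Only the experiment level can be passed straight to `--accession`.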

So, first we'll want to use bactopia search to convert the SRR to an SRX.

bactopia search --query SRR10177533
bactopia --accessions bactopia-accessions.txt

Or, for a single sample, I would probably just look it up on NCBI manually: SRX6899308

bactopia --accession SRX6899308

Now for the assemblies. Unfortunately, at the moment there isn't a way to rapidly pull assembly accessions using Bactopia directly. I've been thinking this would be a nice feature to add (e.g. allow assembly accessions in bactopia search).

As for PRJNA514245, this might be a problem (unless you already know this): it has 300K+ assemblies associated with it, so I will avoid it for now.

Finally, for DALQDS000000000: Bactopia expects an NCBI Assembly accession (GCF_ or GCA_), so you will want to look that up first (spoiler: it is GCA_027253235).

Once you have the NCBI Assembly accession, you can pass it to Bactopia:

bactopia --accession GCA_027253235
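The difference between DALQDS000000000 and GCA_027253235 comes down to the accession format. The rough Python sketch below separates the two. `classify_genome_accession` is a hypothetical helper and not Bactopia code. It makes two assumptions:

* Assembly accessions are GCA_ or GCF_ followed by nine digits, with an optional ".N" version.
* WGS project accessions use the 4-letter + 8-digit or 6-letter + 9-digit layout.

The WGS pattern is only an approximation. It cannot work out the matching GCA_ accession, which still has to be looked up at NCBI.

```python
import re

# Hypothetical checks (not Bactopia code).
# NCBI Assembly: GCA_ (GenBank) or GCF_ (RefSeq) + 9 digits, optional ".N" version.
ASSEMBLY_RE = re.compile(r"^GC[AF]_\d{9}(\.\d+)?$")

# Approximate WGS project/master layout, e.g. DALQDS000000000 (6 letters + 9 digits)
# or the older 4 letters + 8 digits form.
WGS_RE = re.compile(r"^([A-Z]{4}\d{8}|[A-Z]{6}\d{9})(\.\d+)?$")


def classify_genome_accession(accession: str) -> str:
    """Return 'assembly', 'wgs', or 'unknown' based only on the accession's shape."""
    acc = accession.strip().upper()
    if ASSEMBLY_RE.match(acc):
        return "assembly"  # usable directly with `bactopia --accession`
    if WGS_RE.match(acc):
        return "wgs"       # look up its GCA_/GCF_ accession at NCBI first
    return "unknown"


if __name__ == "__main__":
    for acc in ["DALQDS000000000", "GCA_027253235"]:
        print(acc, classify_genome_accession(acc))
```

The check is purely syntactic. It can flag a WGS accession before it reaches Bactopia, but it cannot resolve the accession.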

Hope this helps!

@rpetit3 rpetit3 added the enhancement New feature or request label Feb 1, 2024
@rpetit3 rpetit3 changed the title using bactopia with SRA and assemblies from NCBI. [feature request] - add assembly accession support to bactopia search Feb 1, 2024
@rpetit3
Member

rpetit3 commented Feb 1, 2024

Haha, Tony, I hope you're OK with me hijacking this to turn it into a feature request!

I'll also improve the wording of the error saying it needs "SRX*" so that it suggests using "bactopia search".
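A friendlier check might look like the following minimal Python sketch. It is hypothetical and is not Bactopia's actual validation code or error text. `accession_error` is an illustrative name. It also assumes that experiment accessions (SRX/ERX/DRX) and NCBI Assembly accessions (GCA_/GCF_) are the accepted inputs.

```python
import re
from typing import Optional

# Hypothetical validation sketch (not Bactopia's code or wording).
EXPERIMENT_RE = re.compile(r"^[SED]RX\d+$")          # SRX/ERX/DRX
ASSEMBLY_RE = re.compile(r"^GC[AF]_\d{9}(\.\d+)?$")  # GCA_/GCF_
RUN_RE = re.compile(r"^[SED]RR\d+$")                 # SRR/ERR/DRR


def accession_error(accession: str) -> Optional[str]:
    """Return None if the accession looks usable, otherwise an error with a hint."""
    acc = accession.strip().upper()
    if EXPERIMENT_RE.match(acc) or ASSEMBLY_RE.match(acc):
        return None
    if RUN_RE.match(acc):
        # Run accessions are common; point the user at the conversion step.
        return (
            f"{acc} is a run accession, but an experiment accession (SRX/ERX/DRX) "
            f"is expected. Try: bactopia search --query {acc}"
        )
    return (
        f"{acc} is not a recognized experiment (SRX/ERX/DRX) "
        f"or assembly (GCA_/GCF_) accession."
    )


if __name__ == "__main__":
    print(accession_error("SRR10177533"))
```

Singling out run accessions fits the case in this thread, because they are the most likely mistake and have a known fix.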

@tfischer78
Author

tfischer78 commented Feb 1, 2024 via email

@rpetit3
Member

rpetit3 commented Feb 1, 2024

I noticed that disclaimer as well. I'm hoping that once it goes through, it will be an automatic change (e.g. an Assembly accession takes you straight to Datasets).
