Taxize always produces an API error after running for 1-2h+ #907

Open
GossypiumH opened this issue Jan 12, 2023 · 4 comments

Comments

@GossypiumH

Hi,

I have an issue with taxize. I am trying to retrieve the full taxonomy (from Kingdom to Order) for a dataset of 10k+ bacteria (10,182 to be exact).

My input is a data frame with a single column of species names (e.g. Xenorhabdus sp.), so my script is very simple:

library(taxize)
library(dplyr)
library(tidyr)

taxa = read.csv(file="/home/jbf/MEGA/KBS-MSU/G820/Abundance_matrixes/test_taxize.txt", sep="\t", header=T, check.names=F)

taxize_options(ncbi_sleep = 1.5)

test = dplyr::tbl_df(cbind(classification(taxa$specie_ID, db="ncbi", rows=1, verbose=TRUE, batch_size=5)))

I tried changing the value in "taxize_options(ncbi_sleep = 1.5)", but whatever value I use, I still get an API error like this:

Retrieving data for taxon 'Janthinobacterium sp.'

Error: {"error":"error forwarding request","api-key":"192.108.190.140","type":"ip",
"status":"ok"}

It happens at random after 1 or 2 hours of NCBI requests. I would very much like to know what is going on and whether I did something wrong.

Thank you in advance,
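
For a run this long, one generic way to avoid losing everything to a single failed request is to process the names in chunks, retry a failed chunk a few times, and keep whatever has already been resolved. The sketch below is plain base R and is not part of taxize: the function name `classify_in_chunks` and its arguments (`lookup`, `chunk_size`, `max_tries`, `wait`, `done`) are all hypothetical, and it does not address the cause of the error, which this thread does not identify.

```r
# Hypothetical helper, not part of taxize.
# `lookup` is any function that takes a character vector of names and
# returns a list with one element per name, in the same order.
classify_in_chunks <- function(taxa, lookup, chunk_size = 50,
                               max_tries = 3, wait = 5, done = list()) {
  # Skip names already resolved by an earlier, interrupted run.
  todo <- setdiff(unique(taxa), names(done))
  if (length(todo) == 0) return(done)

  chunks <- split(todo, ceiling(seq_along(todo) / chunk_size))
  for (chunk in chunks) {
    res <- NULL
    for (attempt in seq_len(max_tries)) {
      res <- tryCatch(lookup(chunk), error = function(e) {
        message("attempt ", attempt, " failed: ", conditionMessage(e))
        NULL
      })
      if (!is.null(res)) break
      # Wait longer after each failure before trying again.
      if (attempt < max_tries) Sys.sleep(wait * attempt)
    }
    if (is.null(res)) {
      warning("giving up on a chunk; returning partial results")
      return(done)
    }
    # Store the results under the names that were looked up.
    done[chunk] <- unclass(res)
  }
  done
}
```

With taxize, `lookup` could be something like `function(x) taxize::classification(x, db = "ncbi", rows = 1)`. Saving the returned list with `saveRDS()` and passing it back through `done` on the next run would let a restart continue from where the previous run stopped.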

@zachary-foster
Collaborator

Does the error always happen on the same taxon, or is it somewhat random? For such a large query, I recommend using taxizedb, which supports offline queries of downloaded databases.

@GossypiumH
Author

GossypiumH commented Jan 12, 2023

The error is totally random. It can happen after 30 minutes of running or after 2 hours. I have never gotten past the two-hour mark, though; it always fails before then.

My problem is that I can't use taxizedb because it only accepts taxon IDs as input, and I only have names.

@zachary-foster
Collaborator

Would this work for your purposes?

library(taxizedb)
classification(name2taxid(c('Arabidopsis thaliana', 'pig')))
#> $`3702`
#>                    name         rank      id
#> 1    cellular organisms      no rank  131567
#> 2             Eukaryota superkingdom    2759
#> 3         Viridiplantae      kingdom   33090
#> 4          Streptophyta       phylum   35493
#> 5        Streptophytina    subphylum  131221
#> 6           Embryophyta        clade    3193
#> 7          Tracheophyta        clade   58023
#> 8         Euphyllophyta        clade   78536
#> 9         Spermatophyta        clade   58024
#> 10        Magnoliopsida        class    3398
#> 11      Mesangiospermae        clade 1437183
#> 12       eudicotyledons        clade   71240
#> 13           Gunneridae        clade   91827
#> 14         Pentapetalae        clade 1437201
#> 15               rosids        clade   71275
#> 16              malvids        clade   91836
#> 17          Brassicales        order    3699
#> 18         Brassicaceae       family    3700
#> 19           Camelineae        tribe  980083
#> 20          Arabidopsis        genus    3701
#> 21 Arabidopsis thaliana      species    3702
#> 
#> $`9823`
#>                    name         rank      id
#> 1    cellular organisms      no rank  131567
#> 2             Eukaryota superkingdom    2759
#> 3          Opisthokonta        clade   33154
#> 4               Metazoa      kingdom   33208
#> 5             Eumetazoa        clade    6072
#> 6             Bilateria        clade   33213
#> 7         Deuterostomia        clade   33511
#> 8              Chordata       phylum    7711
#> 9              Craniata    subphylum   89593
#> 10           Vertebrata        clade    7742
#> 11        Gnathostomata        clade    7776
#> 12           Teleostomi        clade  117570
#> 13         Euteleostomi        clade  117571
#> 14        Sarcopterygii   superclass    8287
#> 15 Dipnotetrapodomorpha        clade 1338369
#> 16            Tetrapoda        clade   32523
#> 17              Amniota        clade   32524
#> 18             Mammalia        class   40674
#> 19               Theria        clade   32525
#> 20             Eutheria        clade    9347
#> 21        Boreoeutheria        clade 1437010
#> 22       Laurasiatheria   superorder  314145
#> 23         Artiodactyla        order   91561
#> 24                Suina     suborder   35497
#> 25               Suidae       family    9821
#> 26                  Sus        genus    9822
#> 27           Sus scrofa      species    9823
#> 
#> attr(,"class")
#> [1] "classification"
#> attr(,"db")
#> [1] "ncbi"

Created on 2023-01-12 with reprex v2.0.2
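
The result above is a list with one data frame per taxon ID, while the original question asked for the ranks from Kingdom to Order in a single table. A minimal base R sketch of that flattening step follows. The helper name `ranks_to_wide` and the default set of ranks are assumptions made for illustration, not part of taxize or taxizedb, and the example input is a hand-copied subset of the rows printed above.

```r
# Hypothetical helper, not part of taxize or taxizedb.
# Turns a classification-style list into one row per taxon ID,
# keeping only the requested ranks.
ranks_to_wide <- function(cl, ranks = c("superkingdom", "kingdom",
                                        "phylum", "class", "order")) {
  rows <- lapply(names(cl), function(id) {
    x <- cl[[id]]
    if (is.data.frame(x)) {
      # First entry for each requested rank; NA if the rank is absent.
      vals <- as.character(x$name)[match(ranks, x$rank)]
    } else {
      # Defensive: anything that is not a data frame (e.g. an NA) gives NAs.
      vals <- rep(NA_character_, length(ranks))
    }
    as.data.frame(as.list(setNames(c(id, vals), c("id", ranks))),
                  stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Subset of the rows shown in the output above.
example <- list(
  `3702` = data.frame(
    name = c("Eukaryota", "Viridiplantae", "Streptophyta",
             "Magnoliopsida", "Brassicales"),
    rank = c("superkingdom", "kingdom", "phylum", "class", "order"),
    id   = c("2759", "33090", "35493", "3398", "3699"),
    stringsAsFactors = FALSE
  ),
  `9823` = data.frame(
    name = c("Eukaryota", "Opisthokonta", "Metazoa", "Chordata",
             "Mammalia", "Artiodactyla"),
    rank = c("superkingdom", "clade", "kingdom", "phylum", "class", "order"),
    id   = c("2759", "33154", "33208", "7711", "40674", "91561"),
    stringsAsFactors = FALSE
  )
)

wide <- ranks_to_wide(example)
print(wide)
```

Ranks that a lineage does not have come out as `NA` in the corresponding column, so the table keeps the same shape for every taxon.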

@GossypiumH
Author

Hmm! Thank you, that should probably work!
