
taxize package, gnr_resolve(), Error: Request Entity Too Large (HTTP 413) #899

TimWOceanSciences opened this issue Sep 30, 2022 · 1 comment


@TimWOceanSciences

Hello

I have successfully run gnr_resolve() on a data set with a few hundred rows to get matched names. However, I get the following error (Error: Request Entity Too Large (HTTP 413)) when I run it on my bigger data set (79,298 rows), even with 'http="post"'. Is there an easy way to overcome this? My data set will keep growing, so not being able to process large data sets will cause me a real headache. I think the wormsbynames() function of the worms package, which I'm also using, processes the data in chunks to avoid this. Does taxize's gnr_resolve() have an equivalent? What is the maximum data set size gnr_resolve() can handle?
Here is what I run:

Taxize_test <- gnr_resolve(Formatted_Benthic_Biomass_Data_WW_TW_FINAL$Nomen, resolve_once = FALSE, best_match_only = TRUE, canonical = TRUE, http = "post", fields = "all", preferred_data_sources = 9)
Error: Request Entity Too Large (HTTP 413)

Any help greatly appreciated.

session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.2.0 (2022-04-22 ucrt)
os Windows 10 x64 (build 19044)
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United Kingdom.utf8
ctype English_United Kingdom.utf8
tz Europe/London
date 2022-09-30
rstudio 2022.02.2+485 Prairie Trillium (desktop)
pandoc NA

─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
ape 5.6-2 2022-03-02 [1] CRAN (R 4.2.1)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)
backports 1.4.1 2021-12-13 [1] CRAN (R 4.2.0)
bit 4.0.4 2020-08-04 [1] CRAN (R 4.2.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.0)
bold 1.2.0 2021-05-11 [1] CRAN (R 4.2.1)
broom 0.8.0 2022-04-13 [1] CRAN (R 4.2.0)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.2.0)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.2.0)
cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)
codetools 0.2-18 2020-11-04 [2] CRAN (R 4.2.0)
colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)
conditionz 0.1.0 2019-04-24 [1] CRAN (R 4.2.1)
crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)
crul 1.3 2022-09-03 [1] CRAN (R 4.2.1)
curl 4.3.2 2021-06-23 [1] CRAN (R 4.2.0)
data.table 1.14.2 2021-09-27 [1] CRAN (R 4.2.0)
DBI 1.1.2 2021-12-20 [1] CRAN (R 4.2.0)
dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.2.0)
devtools * 2.4.4 2022-07-20 [1] CRAN (R 4.2.1)
digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)
dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.2.0)
foreach 1.5.2 2022-02-02 [1] CRAN (R 4.2.1)
foreign 0.8-82 2022-01-16 [2] CRAN (R 4.2.0)
fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)
generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0)
ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.0)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.0)
haven 2.5.0 2022-04-15 [1] CRAN (R 4.2.0)
hms 1.1.1 2021-09-26 [1] CRAN (R 4.2.0)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.2.0)
htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.2.0)
httpcode 0.3.0 2020-04-10 [1] CRAN (R 4.2.1)
httpuv 1.6.5 2022-01-05 [1] CRAN (R 4.2.0)
httr * 1.4.3 2022-05-04 [1] CRAN (R 4.2.0)
iterators 1.0.14 2022-02-05 [1] CRAN (R 4.2.1)
jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.0)
later 1.3.0 2021-08-18 [1] CRAN (R 4.2.0)
lattice 0.20-45 2021-09-22 [2] CRAN (R 4.2.0)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)
lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.2.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
maptools * 1.1-4 2022-04-17 [1] CRAN (R 4.2.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.1)
mime 0.12 2021-09-28 [1] CRAN (R 4.2.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.2.1)
modelr 0.1.8 2020-05-19 [1] CRAN (R 4.2.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)
nlme 3.1-157 2022-03-25 [2] CRAN (R 4.2.0)
pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)
pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.2.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
pkgload 1.3.0 2022-06-27 [1] CRAN (R 4.2.1)
plyr * 1.8.7 2022-03-24 [1] CRAN (R 4.2.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.0)
processx 3.5.3 2022-03-25 [1] CRAN (R 4.2.0)
profvis 0.3.7 2020-11-02 [1] CRAN (R 4.2.1)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.2.0)
ps 1.7.0 2022-04-23 [1] CRAN (R 4.2.0)
purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
raster 3.5-15 2022-01-22 [1] CRAN (R 4.2.0)
Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.2.0)
readr * 2.1.2 2022-01-30 [1] CRAN (R 4.2.0)
readxl 1.4.0 2022-03-28 [1] CRAN (R 4.2.0)
remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.1)
reprex 2.0.1 2021-08-05 [1] CRAN (R 4.2.0)
reshape 0.8.9 2022-04-12 [1] CRAN (R 4.2.1)
rgdal * 1.5-32 2022-05-09 [1] CRAN (R 4.2.0)
rlang 1.0.4 2022-07-12 [1] CRAN (R 4.2.1)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)
rvest 1.0.2 2021-10-16 [1] CRAN (R 4.2.0)
scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
shiny 1.7.1 2021-10-02 [1] CRAN (R 4.2.0)
sp * 1.4-7 2022-04-20 [1] CRAN (R 4.2.0)
stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)
stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)
taxize * 0.9.100 2022-04-22 [1] CRAN (R 4.2.1)
terra 1.5-21 2022-02-17 [1] CRAN (R 4.2.0)
tibble * 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)
tidyr * 1.2.0 2022-02-01 [1] CRAN (R 4.2.0)
tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)
tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.2.0)
triebeard 0.3.0 2016-08-04 [1] CRAN (R 4.2.1)
tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.0)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.2.1)
urltools 1.7.3 2019-04-14 [1] CRAN (R 4.2.1)
usethis * 2.1.6 2022-05-25 [1] CRAN (R 4.2.1)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)
uuid 1.1-0 2022-04-19 [1] CRAN (R 4.2.0)
vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)
vroom 1.5.7 2021-11-30 [1] CRAN (R 4.2.0)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
worms * 0.2.2 2018-04-25 [1] CRAN (R 4.2.1)
worrms * 0.4.2 2020-07-08 [1] CRAN (R 4.2.1)
xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.0)
zoo 1.8-10 2022-04-15 [1] CRAN (R 4.2.1)

@TimWOceanSciences

I got round it with the code below.

The data table needs to be split into chunks for taxize because of its size.

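library(taxize)
library(dplyr)  # for bind_rows()

# Split the data table into chunks of 1000 rows
chunk <- 1000
n <- nrow(Formatted_Benthic_Biomass_Data_WW_TW_FINAL)
r <- rep(1:ceiling(n / chunk), each = chunk)[1:n]  # chunk id for each row
d <- split(Formatted_Benthic_Biomass_Data_WW_TW_FINAL, r)

# Resolve each chunk in its own request and keep the results
output <- vector("list", length(d))
for (i in seq_along(d)) {
  Taxize_test <- gnr_resolve(d[[i]]$Nomen, resolve_once = FALSE, best_match_only = TRUE,
                             canonical = TRUE, http = "post", fields = "all",
                             preferred_data_sources = 9)
  output[[i]] <- Taxize_test
}

# Rejoin the per-chunk gnr_resolve() tibbles into a single tibble
Taxize_test <- bind_rows(output)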
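The same workaround can be wrapped in a reusable function. The sketch below is a hypothetical helper, `resolve_in_chunks()`, which is not part of taxize. It also sends each distinct name only once, which should cut down the number of requests if the Nomen column repeats names across rows. The chunk size of 1000 is taken from the workaround above, not from any documented server limit; if you still get HTTP 413, try a smaller value. The resolver is passed in as a function so that the chunking logic can be checked without network access.

```r
# Hypothetical helper (not part of taxize): resolve a character vector of
# names in fixed-size chunks, sending each distinct non-NA name only once.
resolve_in_chunks <- function(names, resolver, chunk_size = 1000) {
  stopifnot(is.character(names), chunk_size >= 1)
  uniq <- unique(names[!is.na(names)])  # drop NAs and repeated names
  if (length(uniq) == 0) return(NULL)
  # Chunk id per name: 1 for the first chunk_size names, 2 for the next, ...
  ids <- ceiling(seq_along(uniq) / chunk_size)
  chunks <- split(uniq, ids)
  results <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    message(sprintf("Resolving chunk %d of %d (%d names)",
                    i, length(chunks), length(chunks[[i]])))
    results[[i]] <- resolver(chunks[[i]])
  }
  do.call(rbind, results)  # stack the per-chunk data frames
}

# Example use with taxize (requires network access):
# resolved <- resolve_in_chunks(
#   Formatted_Benthic_Biomass_Data_WW_TW_FINAL$Nomen,
#   resolver = function(x) gnr_resolve(x, resolve_once = FALSE, best_match_only = TRUE,
#                                      canonical = TRUE, http = "post", fields = "all",
#                                      preferred_data_sources = 9))
```

Because the result has at most one row per distinct name, it can be attached back to the original table with a join, for example `dplyr::left_join(Formatted_Benthic_Biomass_Data_WW_TW_FINAL, resolved, by = c("Nomen" = "user_supplied_name"))`, assuming the gnr_resolve() output keeps the submitted names in its `user_supplied_name` column.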
