
curl error in griddap function - struggling to download bulk data #114

Open
nhill917 opened this issue Nov 29, 2023 · 6 comments
@nhill917

Hi, I am attempting to download a range of environmental products (e.g. SST, chl-a, O2, SSHA) for the Pacific Ocean (latitude: -50, 50 and longitude: 130, 300) at monthly timesteps from ~1998-2022 to inform species distribution models (SDMs). Ideally, I'd like to write a loop or similar that can download all chl-a layers in one go and run it overnight, as an example. But currently, I am struggling to download a single layer without an error popping up. I was able to run the vignette fine, but when I try to do more I often get an error when I run the griddap() function.

The code I am running attempts to loop over downloads: it fetches a single nc file, saves it, and then deletes it before downloading the next, so as not to overfill R's memory. Below is an example where I have tried to download a year's worth (n=12) of SST layers. It runs temperamentally for a single file, but it seems very fiddly at the moment. Also, even when I assign a different URL path, it seems to go back to the default when run.

Any help on how to get this code running, or more broadly on how best to download the kind of data I list above, is appreciated.
Kind regards,
Nick.

######################## Code #######################

# Define out dir
out_dir <- './Dat/ExplanatoryVars/sst'

# Define params
lats <- c(-50, 50)
longs <- c(130, 300)
#times <- seq(as.Date("2019-01-01"), as.Date("2019-12-01"), by = "1 month")
times <- data.frame(t1 = seq(as.Date("2018-02-01"), as.Date("2018-12-01"), by = "1 month"),
                    t2 = seq(as.Date("2018-02-02"), as.Date("2018-12-02"), by = "1 month"))
info <- info('erdMH1sstdmdayR20190SQ_Lon0360')
field <- 'sstMasked'

outs <- tibble()
for (i in 1:nrow(times)) {
  tm <- c(paste0(times[i, 1]), paste0(times[i, 2])) # create time vector
  nm <- paste0('sst_', times[i, 1])                 # create file name
  x <- griddap(info, latitude = lats, longitude = longs,
               time = tm, fields = field, fmt = 'nc',
               url = 'https://apdrc.soest.hawaii.edu/erddap/') # read in data
  save(x, file = paste0(out_dir, '/', nm, '.nc')) # save to disk (note: save() writes an RData file despite the .nc extension)
  rm(x) # remove from memory before the next download
}
#################### Code ##############################

info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
Error in curl::curl_fetch_disk(x$url$url, x$disk, handle = x$url$handle) :
Failure when receiving data from the peer

@rmendels
Collaborator

@nhill917

replace:

x <- griddap(info, latitude = lats, longitude = longs,
             time = tm, fields = field, fmt = 'nc',
             url = 'https://apdrc.soest.hawaii.edu/erddap/')

with:

x <- griddap(info, latitude = lats, longitude = longs,
             time = tm, fields = field, fmt = 'nc',
             url = 'https://coastwatch.pfeg.noaa.gov/erddap/')

I just tried it for 'i <- 1' and it worked just fine. HTH, and good luck with your research.

@rmendels
Collaborator

@nhill917

BTW - if what you are doing is getting environmental data along a track, look at the function 'rxtracto()' in the 'rerddapXtracto' package. It is designed to do just that.
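
For example, a minimal sketch of rxtracto() (the track coordinates, dates, and box half-widths below are made up for illustration; the dataset and field are the ones from this thread):

library(rerddap)
library(rerddapXtracto)

# Dataset and field from the thread above
dataInfo <- rerddap::info('erdMH1sstdmdayR20190SQ_Lon0360')

# Hypothetical track: three positions with matching dates (illustration only)
track_lon  <- c(150.2, 155.7, 160.1)
track_lat  <- c(-20.5, -22.3, -25.0)
track_date <- c('2018-02-15', '2018-03-15', '2018-04-15')

# Extract the field in a small box (xlen/ylen degrees) around each track point
sst_track <- rxtracto(dataInfo, parameter = 'sstMasked',
                      xcoord = track_lon, ycoord = track_lat,
                      tcoord = track_date,
                      xlen = 0.2, ylen = 0.2)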

@nhill917
Author

Thank you for the quick reply. The code you provided ran successfully, but the loop still fell over after ~3 iterations with the same error. It also still seemed to default back to the upwell URL even though I used the same line of code you provided. I am looking to apply SDMs, so I need the full environmental layers, not just the environment at the presence points of my study species. I have explored the rerddapXtracto package and it looks great, but I was having similar time-out issues, so I am currently trying the rerddap package as an alternative.

I also had a more general question about the 'time' argument in the griddap function. If I give this argument a span of one month for a daily product, will it give me a single averaged layer or ~30 individual layers? Equally, when requesting data from a monthly composite product, do I just need to request a single day in January to get the January layer, or do I need to request the full month? I hope this makes sense.

See code and error below. Any help is much appreciated.
Kind regards,
Nick.

lats <- c(-50, 50)
longs <- c(130, 300)
times <- data.frame(t1 = seq(as.Date("2018-02-01"), as.Date("2018-12-01"), by = "1 month"),
                    t2 = seq(as.Date("2018-02-02"), as.Date("2018-12-02"), by = "1 month"))
info <- info('erdMH1sstdmdayR20190SQ_Lon0360')
field <- 'sstMasked'

outs <- tibble()
for (i in 1:nrow(times)) {
  tm <- c(paste0(times[i, 1]), paste0(times[i, 2])) # create time vector
  nm <- paste0('sst_', times[i, 1])                 # create file name
  x <- griddap(info, latitude = lats, longitude = longs,
               time = tm, fields = field, fmt = 'nc',
               url = 'https://coastwatch.pfeg.noaa.gov/erddap/') # read in data
  assign(nm, x)
  save(list = nm, file = paste0(out_dir, '/', nm, '.nc')) # save object under its own name (an RData file despite the .nc extension)
  rm(list = c('x', nm)) # remove from memory
}

info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
Error in curl::curl_fetch_disk(x$url$url, x$disk, handle = x$url$handle) :
Failure when receiving data from the peer

@rmendels
Collaborator

@nhill917 First, I am at home, so I have no special access; see below.

If you get rerddap::info() for a dataset, that is where the base URL is read from, so you don't need the url argument; in your case the url you are giving differs from the url in info. Second, rerddap works by bounding boxes, so you can give a time min and max, a latitude min and max, and a longitude min and max, and it will get that data in one chunk, assuming the time-lat-lon region isn't too big. Can I suggest you read the vignettes further; a lot of this is covered in them. A time span of a month for a daily product will return the 30 or 31 days of data, but in "tidy long-form". For a monthly dataset you just have to be careful because the times are centered and the closest one will be returned; usually using the 15th or 16th is a safe bet. What will be returned are the values for that one time that fall within the lat-lon box.
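
For example, both points in one sketch (it assumes this dataset ID is actually served by the alternate server you hand to info(); substitute as needed):

library(rerddap)

# To use a non-default ERDDAP server, set it when calling info();
# griddap() then reads the base URL from this object, so no url argument is needed
info_ap <- info('erdMH1sstdmdayR20190SQ_Lon0360',
                url = 'https://apdrc.soest.hawaii.edu/erddap/')

# For a monthly composite, request the centered mid-month time once;
# the closest time step (here the January layer) is returned for the whole box
jan <- griddap(info_ap,
               latitude = c(-50, 50), longitude = c(130, 300),
               time = c('2018-01-16', '2018-01-16'),
               fields = 'sstMasked', fmt = 'nc')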

I would also suggest before you run again that you run:

rerddap::cache_delete_all()

HTH.

> for (i in 1:10) {
+     tm <- c(paste0(times[i,1]), paste0(times[i,2])) # create time vector
+     nm <- paste0('sst_', paste0(times[i,1])) # create file name
+     x <- griddap(info, latitude = lats, longitude = longs,
+                  time = tm, fields = field, fmt = 'nc')
+ }
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap
info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap

@rmendels
Collaborator

@nhill917 If I have the right person, you are at the University of Tasmania, is that correct? If so, and you are accessing from the university, other people at universities in Australia and New Zealand have also had problems, because the networks sit behind things like Akamai or Cloudflare that act as proxies and can really mess up the interface. If you keep having problems, try running from outside the university, or talk to your IT department about how to get around Cloudflare or whichever one they are using; someone else knew how to do that and it completely solved similar problems. rerddap allows you to pass any settings for the "curl" command in the griddap() call, and you may need to set some of those. You might also try setting the verbose option in the griddap() call; see the sketches below.
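
A sketch of both suggestions, assuming your rerddap version exposes the callopts argument that forwards curl options (check ?griddap for your installed version; the option names below come from the R 'curl' package), with info, lats, longs, tm, and field as defined in the loop above:

library(rerddap)

# Forward curl settings through griddap(); 'callopts' is an assumption here,
# so confirm it against ?griddap before relying on it
x <- griddap(info, latitude = lats, longitude = longs,
             time = tm, fields = field, fmt = 'nc',
             callopts = list(verbose = TRUE,       # print the full curl exchange
                             timeout_ms = 300000)) # allow up to 5 minutes per request

Since the failures are intermittent, a plain tryCatch() retry around each download can also ride out transient network errors:

# Retry a flaky download up to 'tries' times, pausing between attempts
get_with_retry <- function(tm, tries = 3) {
  for (k in seq_len(tries)) {
    x <- tryCatch(
      griddap(info, latitude = lats, longitude = longs,
              time = tm, fields = field, fmt = 'nc'),
      error = function(e) NULL)
    if (!is.null(x)) return(x)
    Sys.sleep(30) # wait before retrying
  }
  stop('download failed after ', tries, ' attempts for time ', tm[1])
}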

@nhill917
Author

Thanks Roy, I'll keep digging and get back to you. I got a decent run going this afternoon, so that's a win. Thanks so much for looking into this so thoroughly for me.
