Asynchronous API requests #493

wlandau · 2022-03-17T16:02:49Z

Does paws have a way to send API requests asynchronously, particularly for uploading and downloading to/from S3? I have heard curl has async built in.

davidkretch · 2022-03-25T20:00:08Z

It does not. We did look into it a while back and it is kind of tricky in R because you'd need to let curl run in another process behind something like future, then return a future to the user instead of the result. We never got anything like a working example but I think it's probably technically possible.

davidkretch · 2022-03-25T20:52:27Z

One alternative might be to run Paws itself in another process using future.
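
A rough sketch of that idea, assuming the future package with a multisession plan. `slow_download()` is a hypothetical stand-in for a slow Paws call such as `svc$download_file(...)`, so the example runs without AWS credentials:

```r
library(future)

# Run futures in background R sessions.
plan(multisession, workers = 2)

# Hypothetical stand-in for a slow S3 call (not part of paws).
slow_download <- function(key) {
  Sys.sleep(1)
  paste0("downloaded:", key)
}

# future() returns immediately; the call itself runs in another process.
f <- future(slow_download("myfile.csv"))

# ... other work could happen here while the download runs ...

# value() blocks until the background process has finished.
result <- value(f)
print(result)

# Shut the background workers down again.
plan(sequential)
```

The caller gets a future back rather than the result, so it has to call `value()` (or poll with `resolved()`) at some point.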

davidkretch · 2022-03-25T20:59:01Z

There is also an open PR for downloading files from S3 directly to disk which I suppose would help when running it in another process.

DyfanJones · 2022-03-26T11:24:34Z

Hi all,

I have been thinking about this. I think it is possibly a limitation of the current httr package, as it doesn't call curl's async functions, i.e. multi_add and multi_run. The key function we would want to use is curl::curl_fetch_multi. To get it, we could extend the current httr package.

First step: extend the httr package to include curl::curl_fetch_multi.
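
For reference, here is a minimal sketch of how the curl multi interface behaves on its own, outside httr. It assumes a Unix-like system and fetches two temporary local files over `file://` so it needs no network access; the file names and the `results` list are made up for illustration:

```r
library(curl)

# Two local files, addressed via file:// so no network is required.
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines("first file", paths[1])
writeLines("second file", paths[2])
urls <- paste0("file://", normalizePath(paths))

pool <- new_pool()
results <- list()

for (u in urls) {
  local({
    url <- u
    # Only schedules the request on the pool; nothing is transferred yet.
    curl_fetch_multi(
      url,
      done = function(res) results[[url]] <<- rawToChar(res$content),
      fail = function(msg) results[[url]] <<- paste("failed:", msg),
      pool = pool
    )
  })
}

# multi_run() performs all scheduled requests and fires the callbacks.
out <- multi_run(pool = pool)
print(out$success)
```

Requests are only queued by `curl_fetch_multi()`, and the `done`/`fail` callbacks fire while `multi_run()` is driving the pool.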

library(httr)

# Create new multi
write_multi_disk <- function(path, overwrite = FALSE) {
  if (!overwrite && file.exists(path)) {
    stop("Path exists and overwrite is FALSE", call. = FALSE)
  }
  httr:::request(output = write_function("write_multi_disk", path = path, file = NULL))
}

# Add a method that calls curl::curl_fetch_multi
request_fetch <- function(x, url, handle) UseMethod("request_fetch")
request_fetch.write_multi_disk <- function(x, url, handle) {
  con <- file(x$path)
  curl::curl_fetch_multi(
    url, fail = failure, data = con, handle = handle
  )
  tryCatch({
    curl::multi_run()
  }, interrupt = function(cnd) {
    curl::multi_cancel(handle)
  })
  resp <- curl::handle_data(handle)
  resp$content <- httr:::path(x$path)
  resp
}

# TODO: better failure function to align with paws error handling
failure <- function(msg){
  stop(msg)
}

# Testing new method
r <- httr::VERB(
  "GET",
  url = "https://www.google.com",
  config = write_multi_disk("temp.txt", overwrite = TRUE)
)

httr::headers(r)
httr::status_code(r)
httr::content(r, as = "raw")

The big issue I see with this is the error handling; however, this method could be developed alongside the current PR #458.

Let me know your thoughts around this @wlandau @davidkretch 😄

wlandau · 2022-03-27T15:26:29Z

@davidkretch, thanks for confirming. I thought that might be the case. @DyfanJones, that's a great point. Seems like async would belong in a package like httr. Looks like async is discussed a bit in r-lib/httr2#1.

DyfanJones · 2022-04-20T15:01:06Z

I've been thinking about this, and I think we can get async S3 downloads using the promises package, similar to what @davidkretch mentioned here:

> One alternative might be to run Paws itself in another process using future.

Here is a basic example:

library(paws)
library(promises)

future::plan(future::multisession)

s3 = paws::s3()

s3_async_download = function(Bucket, Key, Filename, svc) {
  then({
    future_promise(svc$download_file(
      Bucket = Bucket,
      Key = Key,
      Filename = Filename
    ), seed = TRUE)
  }, onRejected = function(err) {
    stop(sprintf("Failed to download s3://%s/%s", Bucket, Key))
  })
}

system.time({
  s3$download_file(
    Bucket = "dummy",
    Key = "myfile.csv",
    Filename = "myfile1.csv"
  )
})
#>    user  system elapsed 
#>   0.873   1.348  33.800

system.time({
  s3_async_download(
    Bucket = "dummy",
    Key = "myfile.csv",
    Filename = "myfile2.csv",
    svc = s3
  )
})
#>    user  system elapsed 
#>   0.063   0.005   0.091

Created on 2022-04-20 by the reprex package (v2.0.1)

Seems to be really promising 😉
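
One caveat on the timing above: the async call returns as soon as the work is scheduled, so the short elapsed time most likely reflects scheduling rather than the download itself. Below is a minimal sketch of how a script might wait for the result, assuming the promises, future and later packages. `slow_download()` is again a hypothetical stand-in for `svc$download_file(...)`:

```r
library(promises)
library(future)

plan(multisession, workers = 2)

# Hypothetical stand-in for a slow S3 download; returns the file name.
slow_download <- function(filename) {
  Sys.sleep(1)
  filename
}

done <- NULL

# Returns a promise immediately; the work runs in a background session.
p <- future_promise(slow_download("myfile2.csv"), seed = TRUE)

then(
  p,
  onFulfilled = function(value) done <<- value,
  onRejected = function(err) done <<- conditionMessage(err)
)

# A plain script has no running event loop, so drive it by hand
# until every pending callback has fired.
while (!later::loop_empty()) {
  later::run_now(timeoutSecs = 0.1)
}

print(done)

plan(sequential)
```

Inside Shiny or plumber the event loop is already running, so the manual `later::run_now()` loop should not be needed there.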

davidkretch added the enhancement 💡 New feature or request label Mar 25, 2022
