Asynchronous API requests #493

Open
wlandau opened this issue Mar 17, 2022 · 6 comments
Labels
enhancement 💡 New feature or request

Comments

@wlandau

wlandau commented Mar 17, 2022

Does paws have a way to send API requests asynchronously, particularly for uploading and downloading to/from S3? I have heard curl has async built in.

@davidkretch davidkretch added the enhancement 💡 New feature or request label Mar 25, 2022
@davidkretch
Member

It does not. We did look into it a while back and it is kind of tricky in R because you'd need to let curl run in another process behind something like future, then return a future to the user instead of the result. We never got anything like a working example but I think it's probably technically possible.

@davidkretch
Member

One alternative might be to run Paws itself in another process using future.
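As a rough illustration of that idea, here is a minimal sketch that runs a single S3 GET inside a background worker with the future package. The helper name `s3_get_async` and its `make_client` argument are hypothetical and not part of paws; the sketch also assumes AWS credentials are visible to the worker processes.

```r
library(future)

# Hypothetical helper (not part of paws): run one S3 GET in a background worker.
# `make_client` is called inside the worker, so no client object needs to be
# shipped between processes. It defaults to creating a paws S3 client.
s3_get_async <- function(bucket, key, make_client = function() paws::s3()) {
  future::future({
    svc <- make_client()
    svc$get_object(Bucket = bucket, Key = key)
  }, seed = TRUE)
}

# Usage (bucket and key are placeholders):
# future::plan(future::multisession)
# f <- s3_get_async("my-bucket", "myfile.csv")
# ... do other work while the download runs ...
# obj <- future::value(f)            # blocks until the worker is done
# writeBin(obj$Body, "myfile.csv")
```

The caller gets a future back immediately and only blocks when it asks for the value, which is the trade-off described above: the user receives a future instead of the result.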

@davidkretch
Member

There is also an open PR for downloading files from S3 directly to disk which I suppose would help when running it in another process.

@DyfanJones
Member

Hi all,

I have been thinking about this. I think it is possibly a limitation of the current httr package, as it doesn't call curl's async functions, i.e. multi_add and multi_run. The key function we would want to use is curl::curl_fetch_multi. To get it, we could extend the current httr package.
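For context, this is roughly what curl's multi interface looks like on its own, outside httr. It is a minimal sketch only: the helper name `fetch_all` is hypothetical, and it records just a status code (or the error message) per URL rather than doing anything paws-specific.

```r
library(curl)

# Hypothetical helper: queue every URL on one pool, then run them concurrently.
fetch_all <- function(urls) {
  pool <- curl::new_pool()
  results <- list()
  for (u in urls) {
    local({
      url <- u  # capture the current URL for the callbacks
      curl::curl_fetch_multi(
        url,
        done = function(res) results[[url]] <<- res$status_code,
        fail = function(msg) results[[url]] <<- msg,
        pool = pool
      )
    })
  }
  # Nothing is sent until multi_run() is called; it blocks until all
  # queued requests have either completed or failed.
  curl::multi_run(pool = pool)
  results
}

# Usage:
# fetch_all(c("https://www.google.com", "https://www.r-project.org"))
```

Requests are only queued by curl_fetch_multi and are performed together by multi_run, which is the behaviour httr currently does not expose.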

First step: extend the httr package to include curl::curl_fetch_multi.

library(httr)

# Create new multi
write_multi_disk <- function(path, overwrite = FALSE) {
  if (!overwrite && file.exists(path)) {
    stop("Path exists and overwrite is FALSE", call. = FALSE)
  }
  httr:::request(output = write_function("write_multi_disk", path = path, file = NULL))
}

# add method to call curl::curl_fetch_multi
request_fetch <- function(x, url, handle) UseMethod("request_fetch")
request_fetch.write_multi_disk <- function(x, url, handle) {
  con <- file(x$path)
  curl::curl_fetch_multi(
    url, fail = failure, data = con, handle = handle
  )
  tryCatch({
    curl::multi_run()
  }, interrupt = function(cnd) {
    curl::multi_cancel(handle)
  })
  resp <- curl::handle_data(handle)
  resp$content <- httr:::path(x$path)
  resp
}

# TODO: better failure function to align with paws error handling
failure <- function(msg) {
  stop(msg)
}

# Testing new method
r <- httr::VERB(
  "GET",
  url = "https://www.google.com",
  config = write_multi_disk("temp.txt", overwrite = TRUE)
)

httr::headers(r)
httr::status_code(r)
httr::content(r, as = "raw")

The big issue I see with this is the error handling; however, this method could be added/developed alongside the current PR #458.

Let me know your thoughts around this @wlandau @davidkretch 😄

@wlandau
Author

wlandau commented Mar 27, 2022

@davidkretch, thanks for confirming. I thought that might be the case. @DyfanJones, that's a great point. Seems like async would belong in a package like httr. Looks like async is discussed a bit at r-lib/httr2#1.

@DyfanJones
Member

Been thinking about this, and I think we can get async S3 downloads using the promises package, similar to what @davidkretch mentioned here:

One alternative might be to run Paws itself in another process using future.

Here is a basic example:

library(paws)
library(promises)

future::plan(future::multisession)

s3 = paws::s3()

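s3_async_download = function(Bucket, Key, Filename, svc) {
  then({
    # run the download in a background worker and return a promise
    future_promise(svc$download_file(
      Bucket = Bucket,
      Key = Key,
      Filename = Filename
    ), seed = TRUE)
  }, onRejected = function(err) {
    # onRejected is called with the error, so it must accept one argument
    stop(sprintf("Failed to download s3://%s/%s", Bucket, Key))
  })
}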

system.time({
  s3$download_file(
    Bucket = "dummy",
    Key = "myfile.csv",
    Filename = "myfile1.csv"
  )
})
#>    user  system elapsed 
#>   0.873   1.348  33.800

system.time({
  s3_async_download(
    Bucket = "dummy",
    Key = "myfile.csv",
    Filename = "myfile2.csv",
    svc = s3
  )
})
#>    user  system elapsed 
#>   0.063   0.005   0.091

Created on 2022-04-20 by the reprex package (v2.0.1)

Seems to be really promising 😉
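Note that the second timing only measures how long it takes to create the promise; the download itself keeps running in the background. To act on the result, a caller would chain handlers onto the returned promise. A minimal sketch follows; `report_when_done` is a hypothetical helper, not something paws or promises provides.

```r
library(promises)

# Hypothetical helper: attach success and failure handlers to any promise.
report_when_done <- function(p, label) {
  promises::then(
    p,
    onFulfilled = function(value) {
      message(label, " finished")
      value  # pass the result on to later handlers
    },
    onRejected = function(err) {
      message(label, " failed: ", conditionMessage(err))
      NULL
    }
  )
}

# Usage with the function defined above:
# p <- s3_async_download("dummy", "myfile.csv", "myfile2.csv", svc = s3)
# report_when_done(p, "myfile.csv")
```

In a non-interactive script the handlers only fire when the later event loop runs, so a script may need to call later::run_now() until the promise has settled.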
