Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: replace {crul} with {httr2} #157

Draft
wants to merge 20 commits into
base: main
Choose a base branch
from
Draft

refactor: replace {crul} with {httr2} #157

wants to merge 20 commits into from

Conversation

maelle
Copy link
Member

@maelle maelle commented Feb 2, 2023

Fix #156

@maelle
Copy link
Member Author

maelle commented Feb 2, 2023

@dpprdan
Copy link
Member

dpprdan commented Feb 2, 2023

Yeah, watch out for https://github.com/ropensci/opencage/blob/main/R/zzz.R and

opencage/R/oc_config.R

Lines 122 to 126 in 9b57eef

# set rate limit
ratelimitr::UPDATE_RATE(
oc_get_limited,
ratelimitr::rate(n = rate_sec, period = 1L)
)

Though I am sure you've seen that.

R/oc_config.R Outdated
@@ -91,7 +91,6 @@
#' @export
oc_config <-
function(key = Sys.getenv("OPENCAGE_KEY"),
rate_sec = getOption("oc_rate_sec", default = 1L),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe needs actual deprecation 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather leave it in here (so all possible opencage session level options can be set with oc_config()) and just set options("oc_rate_sec" = rate_sec), like no_record and show_key.
Maybe check whether rate_sec is numeric beforehand.

R/oc_process.R Show resolved Hide resolved
),
endpoint = "json"
)
),
"HttpResponse"
"httr2_response"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests are actually failing with "HTTP 400 Bad Request". I'm not sure why as the same oc_forward() query works. I see the first test in this file, that doesn't fail, uses a key for always getting 200. I'm a bit lost.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All queries/tests in test-oc_get.R return HTTP 400 errors, also on main, because the query parameter placename ought to be q here 🙈. Well, the first one passes, because it uses the key_200 which always returns a HTTP 200 response, no matter what nonsense you throw at the API.
I don't know whether this has always been the case or whether there has been a related change in the OpenCage API. Regardless, it never surfaced, because all queries only test whether a HttpResponse object is returned by oc_get(), and the status wasn't checked until later. But the status check is now as part of oc_get() so, the error now surfaces here already.

I am not sure about the usefulness of the Namibia and vector countrycode tests here. I think that these should be tested with oc_forward() if they aren't already?

Here, I am more interested in whether memoisation and rate limiting works as expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted in another comment, I wouldn't test memoisation nor rate limiting as we outsource those.

What we should test is as you noted, whether the configuration function can configure the rate. BUT since in req_throttle() I call getOption("oc_rate_sec", default = 1L), it only means retrieving the option before and after calling the configuration function.

Would you agree?

@maelle maelle requested a review from dpprdan February 3, 2023 08:30
@@ -5,8 +5,8 @@ test_that("oc_clear_cache clears cache", {
# until a memoise >v.1.1 is released, we need to run oc_get_memoise() twice to
# have it really cache results
# https://github.com/ropensci/opencage/pull/87#issuecomment-573573183
replicate(2, oc_get_memoise("https://httpbin.org/get"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure these edits were the right ones. However now one cannot "get" stuff from an URL that's not the OpenCage API, with the refactoring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure either whether this actually tests what we want to.
Could we build oc_get() in such a way that it accepts a httr2 request object and just performs the request (i.e. just wrapper around httr2::req_perform())? Then we could just build a simple query request here (I'd rather not hit the OpenCage API for this).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now wondering why we are testing memoization at all, given it's "outsourced" to {memoise}?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not convinced we need to test oc_clear_cache() as it only wraps memoise that we trust?

By the way here's another caching method: https://deploy-preview-537--ropensci.netlify.app/blog/2023/04/21/ropensci-news-digest-april-2023/#caching-the-results-of-functions-of-your-r-package

Copy link
Member

@dpprdan dpprdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, quite some comments. 🙈 I hope they are at least comprehensible.

R/oc_config.R Outdated
@@ -91,7 +91,6 @@
#' @export
oc_config <-
function(key = Sys.getenv("OPENCAGE_KEY"),
rate_sec = getOption("oc_rate_sec", default = 1L),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather leave it in here (so all possible opencage session level options can be set with oc_config()) and just set options("oc_rate_sec" = rate_sec), like no_record and show_key.
Maybe check whether rate_sec is numeric beforehand.

),
endpoint = "json"
)
),
"HttpResponse"
"httr2_response"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All queries/tests in test-oc_get.R return HTTP 400 errors, also on main, because the query parameter placename ought to be q here 🙈. Well, the first one passes, because it uses the key_200 which always returns a HTTP 200 response, no matter what nonsense you throw at the API.
I don't know whether this has always been the case or whether there has been a related change in the OpenCage API. Regardless, it never surfaced, because all queries only test whether a HttpResponse object is returned by oc_get(), and the status wasn't checked until later. But the status check is now as part of oc_get() so, the error now surfaces here already.

I am not sure about the usefulness of the Namibia and vector countrycode tests here. I think that these should be tested with oc_forward() if they aren't already?

Here, I am more interested in whether memoisation and rate limiting works as expected.

@@ -48,25 +48,6 @@ test_that("oc_config throws error with faulty OpenCage key", {
)
})

# test rate_sec argument --------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a little test whether oc_config() can update the oc_rate_limit option?

DESCRIPTION Outdated
dplyr (>= 0.7.4),
httr2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is v0.1.1 sufficient or do we use a feature from a more recent version?
https://httr2.r-lib.org/news/index.html

(Didn't realise that httr2 is so recent 😮)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a dependency on >= 0.2.0 because of r-lib/httr2#101

@@ -130,7 +130,7 @@ oc_process <-
}

# build url
oc_url <- oc_build_url(
oc_url_parts <- oc_build_url(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current naming confused me a little, I am afraid.

I think there are two to three steps until the request is made

1a. format the query parameters, i.e. turn booleans into integers, delete unused parameters, concatenate bounds coordinates, etc. This is now done in oc_build_url() except this does not yet build the url, so maybe oc_format_query_parameters() and the resulting list is query_parameters? I would move the endpoint as a separate input to the next step/function.
1b. build the httr2 request object (or rather just its url part). Basically what build_query_with_req() does now (maybe just name it build_request()?). The inputs are the query_parameters and endpoint, the output is query_req.
2. Make the request with oc_get(). Pretty much like it is now, but with request or similar as an argument name. (Note to self: This has to be a separate step, so we can return the url_only before).

🤔 À propos naming: Maybe I overdid it with the oc_ prefix for internal functions... 😉

Comment on lines +264 to +279
build_query_with_req <- function(oc_url_parts) {
initial_req <- httr2::request("https://api.opencagedata.com")

req_with_url <- if (!is.null(oc_url_parts[["path"]])) {
httr2::req_url_path_append(initial_req, oc_url_parts[["path"]])
} else {
initial_req
}

query_req <- if (!is.null(oc_url_parts[["query"]])) {
httr2::req_url_query(req_with_url, !!!oc_url_parts[["query"]])
} else {
req_with_url
}

query_req
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only use this within oc_process() and tie it to the OpenCage API domain, I think the following should be just as safe. I.e. I cannot think of a scenario where we would make queries to api.opencagedata.com without an endpoint or query_parameters.

Suggested change
build_query_with_req <- function(oc_url_parts) {
initial_req <- httr2::request("https://api.opencagedata.com")
req_with_url <- if (!is.null(oc_url_parts[["path"]])) {
httr2::req_url_path_append(initial_req, oc_url_parts[["path"]])
} else {
initial_req
}
query_req <- if (!is.null(oc_url_parts[["query"]])) {
httr2::req_url_query(req_with_url, !!!oc_url_parts[["query"]])
} else {
req_with_url
}
query_req
build_request <- function(endpoint, query_parameters) {
httr2::request("https://api.opencagedata.com") %>%
httr2::req_url_path_append(endpoint) %>%
httr2::req_url_query(req_with_url, !!!query_parameters)

But if we want to keep it generic (e.g. for the oc_get_memoise() tests), we could instead do

Suggested change
build_query_with_req <- function(oc_url_parts) {
initial_req <- httr2::request("https://api.opencagedata.com")
req_with_url <- if (!is.null(oc_url_parts[["path"]])) {
httr2::req_url_path_append(initial_req, oc_url_parts[["path"]])
} else {
initial_req
}
query_req <- if (!is.null(oc_url_parts[["query"]])) {
httr2::req_url_query(req_with_url, !!!oc_url_parts[["query"]])
} else {
req_with_url
}
query_req
build_request <- function(base_url = "https://api.opencagedata.com", endpoint, query_parameters) {
query_req <- httr2::request(base_url)
query_req <- if (!is.null(endpoint)) {
httr2::req_url_path_append(query_req , endpoint)
}
query_req <- if (!is.null(query_parameters)) {
httr2::req_url_query(query_req, !!!query_parameters)
}
query_req

jsonlite (>= 1.5),
lifecycle,
magrittr,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might as well take the dependency on magrittr v2.0 then?
I.e. does it make sense to more generally bump all versions roughly to the one before the most recent necessary dependency (i.e. httr2) came out? Or is that a mistake? I wouldn't know what more ancient versions really mean practically?

Suggested change
magrittr,
magrittr (>= 2.0.0),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a specific reason for depending on that magrittr version?

tests/testthat/test-oc_clear_cache.R Outdated Show resolved Hide resolved
@@ -5,8 +5,8 @@ test_that("oc_clear_cache clears cache", {
# until a memoise >v.1.1 is released, we need to run oc_get_memoise() twice to
# have it really cache results
# https://github.com/ropensci/opencage/pull/87#issuecomment-573573183
replicate(2, oc_get_memoise("https://httpbin.org/get"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure either whether this actually tests what we want to.
Could we build oc_get() in such a way that it accepts a httr2 request object and just performs the request (i.e. just wrapper around httr2::req_perform())? Then we could just build a simple query request here (I'd rather not hit the OpenCage API for this).

@@ -5,8 +5,8 @@ test_that("oc_clear_cache clears cache", {
# until a memoise >v.1.1 is released, we need to run oc_get_memoise() twice to
# have it really cache results
# https://github.com/ropensci/opencage/pull/87#issuecomment-573573183
replicate(2, oc_get_memoise("https://httpbin.org/get"))
expect_true(memoise::has_cache(oc_get_memoise)("https://httpbin.org/get"))
replicate(2, oc_get_memoise())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
replicate(2, oc_get_memoise())
oc_get_memoise()

@maelle
Copy link
Member Author

maelle commented Apr 4, 2023

Oh: https://httr2.r-lib.org/reference/req_cache.html (not sure how to clear that cache)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

switch from crul to httr2
2 participants