Investigate integration with vetiver #163

Open
MarkEdmondson1234 opened this issue Jan 7, 2022 · 9 comments

Comments

@MarkEdmondson1234
Owner

https://vetiver.tidymodels.org/

@juliasilge

To get appropriate versioning support, I imagine this will require rstudio/pins-r#572 to be implemented.

The deployment piece on its own doesn't necessarily require the model object to be stored as a pin.

MarkEdmondson1234 added a commit that referenced this issue Jan 8, 2022
@MarkEdmondson1234
Owner Author

MarkEdmondson1234 commented Jan 8, 2022

A setup script is here:

library(parsnip)
library(workflows)
data(Sacramento, package = "modeldata")

rf_spec <- rand_forest(mode = "regression")
rf_form <- price ~ type + sqft + beds + baths

rf_fit <-
  workflow(rf_form, rf_spec) %>%
  fit(Sacramento)

library(vetiver)
v <- vetiver_model(rf_fit, "sacramento_rf")

root <- file.path("inst","vetiver")

library(pins)
model_board <- board_folder(file.path(root,"plumber/pins"))
model_board %>% vetiver_pin_write(v)

library(googleCloudRunner)

# the docker takes a long time to install arrow so build it first to cache
repo <- cr_buildtrigger_repo("MarkEdmondson1234/googleCloudRunner",
                             branch = "vetiver")

#cr_buildtrigger_delete("docker-vetiver")
cr_deploy_docker_trigger(repo, "vetiver",
                         location = "inst/vetiver/docker/",
                         includedFiles = "inst/vetiver/**",
                         projectId_target = "gcer-public",
                         timeout = 3600)

cr_deploy_plumber(file.path(root,"plumber"))

I changed the plumber deployment server.R to

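pr <- plumber::plumb("api.R")
# read the pinned model, then add the /predict endpoint to the existing router
v <- vetiver::vetiver_pin_read(pins::board_folder("pins"), name = "sacramento_rf")
pr <- vetiver::vetiver_pr_predict(pr, v)
pr$run(host = "0.0.0.0", port = as.numeric(Sys.getenv("PORT")), swagger = TRUE)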
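
When trying server.R outside Cloud Run, note that as.numeric(Sys.getenv("PORT")) evaluates to NA if PORT is unset. A minimal sketch of a fallback follows; resolve_port() and its default of 8080 are illustrative assumptions, not part of the deployed script.

```r
# Hypothetical helper: read the port from the PORT environment variable,
# falling back to a default when it is unset or not a valid port number.
resolve_port <- function(default = 8080L) {
  raw <- Sys.getenv("PORT", unset = "")
  # as.integer() gives NA for "" or non-numeric text; silence the warning
  port <- suppressWarnings(as.integer(raw))
  if (is.na(port) || port < 1L || port > 65535L) {
    return(as.integer(default))
  }
  port
}
```

It could then be used as pr$run(host = "0.0.0.0", port = resolve_port()), so the same script would start both locally and on Cloud Run.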

The main bottleneck at the moment is getting a Docker image with pins installed: the arrow dependency takes 40+ minutes and counting to install. I will look for a quicker method.

@MarkEdmondson1234
Owner Author

The arrow dependency install timed out after 60 minutes, so this needs a bigger build or, ideally, a pre-existing Docker image.

@juliasilge

There has been some discussion of making the arrow dependency optional. You might want to check out rstudio/pins-r#537 and see if anything in there helps.

FWIW arrow isn't really needed for the model publishing use case.

@MarkEdmondson1234
Owner Author

MarkEdmondson1234 commented Jan 8, 2022

Makes sense, yes, it seemed like a lot of installation for features that aren't used. I've left a comment there to see if there is a way, though, since it would be nice to have an arrow image available.

@MarkEdmondson1234
Owner Author

MarkEdmondson1234 commented Jan 8, 2022

The Docker image now builds in about 20 minutes and is available at gcr.io/gcer-public/vetiver

I haven't seen the plumber router itself being modified before, so I made a new script file to load it in. This would be fairly boilerplate though, I think:

#server.r
pr <- plumber::plumb("api.R")
v <- vetiver::vetiver_pin_read(pins::board_folder("pins"), name = "sacramento_rf")
pr <- vetiver::vetiver_pr_predict(pr, v, debug = TRUE)
pr$run(host = "0.0.0.0", port = as.numeric(Sys.getenv("PORT")), swagger = TRUE)

It's built on top of the example plumber script I have, so there are endpoints at /plot and /hello too. I think it would be nice to make a PubSub target for it.

How would vetiver work within an api.R script?

This deployed successfully with this simple Dockerfile. I guess in real life some more dependencies or renv lockfiles could be involved.

FROM gcr.io/gcer-public/vetiver
COPY ["./", "./"]
ENTRYPOINT ["Rscript", "server.R"]

An example endpoint is live at https://vetiver-ewjogewawq-ew.a.run.app/predict. It runs on Cloud Run (serverless), which can take 80 connections per instance and scales up to millions.

Runs the example from the vetiver docs:

library(dplyr)
data(Sacramento, package = "modeldata")
new_sac <- Sacramento %>%
  slice_sample(n = 20) %>%
  select(type, sqft, beds, baths)

endpoint <- vetiver::vetiver_endpoint("https://vetiver-ewjogewawq-ew.a.run.app/predict")
predict(endpoint, new_sac)
# A tibble: 20 x 1
     .pred
     <dbl>
 1 236325.
 2 427492.
 3 417112.
 4 258001.
 5 339775.
...
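
The model was fitted on four predictors, so it can help to check a data frame for missing columns before sending it to the endpoint. A minimal base R sketch follows; missing_predictors() is a hypothetical helper and not part of vetiver, and the formula is copied from the setup script.

```r
# Formula copied from the setup script in this thread
rf_form <- price ~ type + sqft + beds + baths

# Hypothetical helper: names of predictor columns absent from new_data
missing_predictors <- function(formula, new_data) {
  predictors <- all.vars(formula[[3]])  # right-hand side of the formula only
  setdiff(predictors, names(new_data))
}

# Toy data frame that deliberately leaves out `baths`
incomplete <- data.frame(type = "Residential", sqft = 1200, beds = 3)
missing_predictors(rf_form, incomplete)
# returns "baths"
```

An empty result means every predictor named in the formula is present in the data frame.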

In real life you could also add a build trigger that updates the deployment whenever the R script that fits the model changes. If the pins board lived in an outside service such as GCS, this would be needed less often.

The full setup script below:

library(parsnip)
library(workflows)
data(Sacramento, package = "modeldata")

rf_spec <- rand_forest(mode = "regression")
rf_form <- price ~ type + sqft + beds + baths

rf_fit <-
  workflow(rf_form, rf_spec) %>%
  fit(Sacramento)

library(vetiver)
v <- vetiver_model(rf_fit, "sacramento_rf")

root <- file.path("inst","vetiver")

library(pins)
model_board <- board_folder(file.path(root,"plumber/pins"))
model_board %>% vetiver_pin_write(v)

library(googleCloudRunner)

# the docker takes a long time to install arrow so build it first to cache
repo <- cr_buildtrigger_repo("MarkEdmondson1234/googleCloudRunner",
                             branch = "vetiver")

#cr_buildtrigger_delete("docker-vetiver")
cr_deploy_docker_trigger(repo, "vetiver",
                         location = "inst/vetiver/docker/",
                         includedFiles = "inst/vetiver/**",
                         projectId_target = "gcer-public",
                         timeout = 3600)

# use the vetiver docker image built above to deploy a Cloud Run instance of the model
# deploys folder with api.R, Dockerfile, pins/ and server.R contained
run <- cr_deploy_plumber(file.path(root,"plumber"), remote = "vetiver")

# on successful deployment
endpoint <- vetiver::vetiver_endpoint(paste0(run$status$url, "/predict"))
library(tidyverse)
data(Sacramento, package = "modeldata")
new_sac <- Sacramento %>%
  slice_sample(n = 20) %>%
  select(type, sqft, beds, baths)

predict(endpoint, new_sac)
# A tibble: 20 x 1
     .pred
     <dbl>
 1 236325.
 2 427492.
 3 417112.
 4 258001.
 5 339775.
...
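
The script builds the endpoint with paste0(run$status$url, "/predict"), which relies on the service URL having no trailing slash. A small sketch of a more defensive join follows; join_url() is a hypothetical helper and the example URL is a placeholder.

```r
# Hypothetical helper: join a base URL and a path with exactly one slash
join_url <- function(base, path) {
  paste0(sub("/+$", "", base), "/", sub("^/+", "", path))
}

join_url("https://example.run.app/", "/predict")
# returns "https://example.run.app/predict"
```

In the script above this would read vetiver::vetiver_endpoint(join_url(run$status$url, "predict")).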

MarkEdmondson1234 added a commit that referenced this issue Jan 8, 2022
@MarkEdmondson1234
Owner Author

The folder structure of the working deployment is here: https://github.com/MarkEdmondson1234/googleCloudRunner/tree/vetiver/inst/vetiver

@juliasilge

I've been working more on generating Docker containers lately, if you'd like to take a look and give any feedback. This demo might be helpful for seeing how I am setting things up.

@MarkEdmondson1234
Owner Author

Thanks very much, will take a look.
