adjust_for_inflation() takes a long time for large vector inputs #39

Open
stevecondylios opened this issue Sep 9, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@stevecondylios
Owner

This example (with 10 rows) runs very quickly:

library(priceR)

set.seed(123)
nominal_prices <- rnorm(10, mean=10, sd=3)
years <- round(rnorm(10, mean=2006, sd=5))
df <- data.frame(years, nominal_prices)

df$in_2008_dollars <- adjust_for_inflation(nominal_prices, years, "US", to_date = 2008)

However, when the same is attempted with 10000 rows, it takes a very long time:

set.seed(123)
nominal_prices <- rnorm(10000, mean=10, sd=3)
years <- round(rnorm(10000, mean=2006, sd=5))
df <- data.frame(years, nominal_prices)

df$in_2008_dollars <- adjust_for_inflation(nominal_prices, years, "US", to_date = 2008)

It is not clear why. At a minimum, the user should receive a message giving some idea of the expected runtime. Ideally, the function should be refactored to be more performant.

@stevecondylios
Owner Author

stevecondylios commented Sep 9, 2022

adjust_for_inflation() is slow because it does a fair bit of work for every element of the input vectors: it looks up the relevant rows in inflation_dataframe and multiplies them together.
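Because that work is repeated per element, and inputs like the example above contain far fewer distinct years than rows, one possible mitigation is to compute each distinct (from, to) multiplier only once and reuse it. The sketch below is a hypothetical helper, `adjust_unique()`, which is not part of priceR; `multiplier_fn` stands in for any function that returns the multiplier for a single (from, to) pair.

```r
# Hypothetical helper (not part of priceR): evaluate `multiplier_fn` once per
# distinct (from, to) pair, then reuse that value for every matching element.
adjust_unique <- function(price, from, to, multiplier_fn) {
  from <- rep_len(from, length(price))  # recycle scalar inputs to full length
  to <- rep_len(to, length(price))
  key <- paste(from, to, sep = "->")    # one key per (from, to) pair
  first <- !duplicated(key)             # first occurrence of each pair
  multipliers <- mapply(multiplier_fn, from[first], to[first], USE.NAMES = FALSE)
  price * multipliers[match(key, key[first])]
}
```

With roughly 10,000 rows but only a few dozen distinct years, the expensive lookup would run a few dozen times instead of 10,000. A standalone copy of a per-pair multiplier function (such as the `make_multiplier()` helper in the next comment) could be passed as `multiplier_fn`.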

Here are some approximate times:

start_time <- Sys.time()
number_of_rows <- 10000

nominal_prices <- rnorm(number_of_rows, mean=10, sd=3)
years <- round(rnorm(number_of_rows, mean=2006, sd=5))
df <- data.frame(years, nominal_prices)

df$in_2008_dollars <- adjust_for_inflation(nominal_prices, years, "US", to_date = 2008)

end_time <- Sys.time()
end_time - start_time

# number_of_rows   elapsed time
# 100              6.2 seconds
# 200              10.46 seconds
# 1000             36 seconds
# 2000             1.1 minutes
# 10000            6 minutes
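These timings grow roughly linearly with the number of rows (about 0.036 seconds per row at both 1,000 and 10,000 rows), which suggests one way to give users the runtime expectation mentioned in the issue. The sketch below fits a straight line to the five measurements above and prints a message when the estimate is large. `estimate_runtime_seconds()` and its `notify_above` threshold are hypothetical, and the fit only reflects the machine and conditions these timings came from.

```r
# Timings reported above for adjust_for_inflation(), converted to seconds.
timings <- data.frame(
  rows    = c(100, 200, 1000, 2000, 10000),
  seconds = c(6.2, 10.46, 36, 66, 360)  # 1.1 min = 66 s, 6 min = 360 s
)

# Rough straight-line model: elapsed seconds as a function of input length.
runtime_model <- lm(seconds ~ rows, data = timings)

# Hypothetical helper (not part of priceR): estimate the runtime for a single
# input length n and print a message when it is likely to exceed `notify_above` seconds.
estimate_runtime_seconds <- function(n, notify_above = 10) {
  est <- unname(predict(runtime_model, newdata = data.frame(rows = n)))
  if (est > notify_above) {
    message(sprintf("Adjusting %s prices; this may take around %.0f seconds.",
                    formatC(n, format = "d", big.mark = ","), est))
  }
  est
}

estimate_runtime_seconds(10000)  # prints a message; the estimate is roughly 6 minutes
```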

adjust_for_inflation() can be made roughly twice as fast when extrapolation isn't required. For example:

country <- "US"
inflation_dataframe <- retrieve_inflation_data(country)

inflation_dataframe

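library(dplyr)

# Multiply each price by the cumulative inflation between its `from` year and `to`.
# Uses the inflation_dataframe retrieved above (columns `date` and `value`).
fast_inflate <- function(price, from, to) {

  make_multiplier <- function(from_input, to_input) {
    inflation_dataframe %>%
      # Inflating: years in (from, to]; deflating: years in [to, from)
      filter(date > from_input & date <= to_input | date < from_input & date >= to_input) %>%
      .$value %>%
      { . / 100 } %>%  # percent -> proportion
      { . + 1 } %>%    # proportion -> growth factor
      { if (from_input < to_input) prod(.) else 1 / prod(.) }  # inflate forwards, deflate backwards
  }

  multipliers <- mapply(make_multiplier, from_input = from, to_input = to)

  price * multipliers
}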

# Gives the same results in ~3.25 seconds, about half the time
library(tictoc)
tic()
fast_inflate(df$nominal_prices, df$years, 2008)
toc()
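fast_inflate() still filters the inflation table once per element. A further option would be to build a cumulative growth index once and derive every multiplier from two index lookups, which is fully vectorised. The sketch below, `index_inflate()`, is hypothetical (not part of priceR) and is written to reproduce fast_inflate()'s year windows. It assumes one row per year, integer years in `date`, and no missing `value`s.

```r
# Vectorised sketch (hypothetical, not part of priceR): build a cumulative
# growth index once, then derive every multiplier from two index lookups
# instead of filtering the inflation table once per element.
# Assumes annual rows with integer years in `date` and no missing `value`s.
index_inflate <- function(price, from, to, inflation_dataframe) {
  years <- as.numeric(as.character(inflation_dataframe$date))
  ord <- order(years)
  years <- years[ord]
  growth <- 1 + inflation_dataframe$value[ord] / 100  # percent -> growth factor
  cum <- cumprod(growth)                              # cum[i] = prod(growth[1:i])

  # Product of growth factors for all years <= y (1 if y precedes the data).
  cum_at <- function(y) {
    idx <- findInterval(y, years)
    ifelse(idx == 0, 1, cum[pmax(idx, 1)])
  }

  from <- rep_len(from, length(price))
  to <- rep_len(to, length(price))

  # Same year windows as fast_inflate():
  # inflating uses years in (from, to]; deflating divides by years in [to, from).
  forward <- cum_at(to) / cum_at(from)
  backward <- cum_at(to - 1) / cum_at(from - 1)

  multipliers <- ifelse(from < to, forward, ifelse(from > to, backward, 1))
  price * multipliers
}
```

Under those assumptions, `index_inflate(df$nominal_prices, df$years, 2008, inflation_dataframe)` is intended to match `fast_inflate(df$nominal_prices, df$years, 2008)`. It has not been checked against adjust_for_inflation() itself, and gaps in the inflation data would need handling first, since cumprod() carries an NA into every later year.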

@stevecondylios stevecondylios added the enhancement New feature or request label Sep 9, 2022