Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significantly Different Results Based on R Version #9

Open
mkw87 opened this issue Feb 7, 2023 · 12 comments
Open

Significantly Different Results Based on R Version #9

mkw87 opened this issue Feb 7, 2023 · 12 comments
Labels
mystery 👻 a spooky mystery

Comments

@mkw87
Copy link

mkw87 commented Feb 7, 2023

First off I just want to say thank you for putting this out there, it's fantastic. Built a script to run in R 4.0.2 and it works great. I then put it into PowerBI and noticed that my answers were significantly different, like 30% lower and the matches weren't very accurate visually. I plopped it out of PowerBI and into R Studio and all of a sudden it was matching again without changing any code, which is when I realized PowerBI was using R 4.2.2 instead of 4.0.2. Now that I noticed this I can avoid issues going forward but do you have any idea why the version change would affect the output so significantly?

@derrickturk derrickturk added the mystery 👻 a spooky mystery label Feb 7, 2023
@derrickturk
Copy link
Owner

I'm not sure what might be causing that. The first place I'd check is probably to make 100% sure that the version of aRpsDCA is the same in both cases, and that the exact same data is being provided to the relevant functions.

It's possible that the functionality of nlminb was changed between R versions in a way that would alter fit results. I have been out of the R loop for a few years now, but I don't see any obvious smoking gun from a quick search.

@mkw87
Copy link
Author

mkw87 commented Feb 7, 2023

I'll look into the versions and see but the data is the exact same. When you pop it out of Power BI it creates the datasets for your in a temp file. At first I was dumbfounded but realized it had to be a version thing because everything else was identical. Thank you for the quick response!

@mkw87
Copy link
Author

mkw87 commented Feb 7, 2023

By the way if I changed R studio to run on 4.2.2 it matched the PowerBI results, so it's something in aRpsDCA or nlminb.

@derrickturk
Copy link
Owner

That is just super weird... nlminb isn't a randomized algorithm as far as I know, and I don't see any changes to that code in the R stats version control history in the last ~8 years. To track this down would probably require a minimized example that reproduces the error, ideally without any proprietary data.

@mkw87
Copy link
Author

mkw87 commented Feb 7, 2023

Version Data.xlsx
Here's anonymous public data as one example of where I observed the issue along with the output best fit gas rate and gas volume from each one. As I mentioned 4.0.2 is much more accurate.

@derrickturk
Copy link
Owner

derrickturk commented Feb 7, 2023

I'm assuming the different results are in a best-overall-fit based on raw rate/time data (i.e. the best.fit function)? Can you run the following script (call it "check.R", but GitHub doesn't allow .R attachments, because it's not like this is a PROGRAMMING SITE OR ANYTHING) and post the entire output, under both versions? This can be done e.g. by source('check.R', echo=TRUE).

sessionInfo()
data <- read.delim(sep='\t', text='t\tq\n0\t15609\n0.235616438\t15609\n0.487671233\t17002\n0.734246575\t10167\n0.969863014\t7344\n1.221917808\t5256\n1.473972603\t4582\n1.583561644\t3959\n1.739726027\t4482\n1.915068493\t3705\n2.126027397\t3452\n2.375342466\t3637\n2.62739726\t3239\n2.824657534\t3517\n3.032876712\t2496\n3.282191781\t1729\n3.534246575\t1655\n\n')
best <- best.fit(q=data$q, t=data$t)
print(best)
best.h2e <- best.hyp2exp(q=data$q, t=data$t)
print(best.h2e)

@mkw87
Copy link
Author

mkw87 commented Feb 7, 2023

Well this is interesting:

4.0.2:

`> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] magicaxis_2.0.10 rlang_0.4.6 sqldf_0.4-11 RSQLite_2.2.0 gsubfn_0.7
[6] proto_1.0.0 data.table_1.12.8 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.0
[11] purrr_0.3.4 readr_1.3.1 tidyr_1.1.0 tibble_3.0.2 ggplot2_3.3.2
[16] tidyverse_1.3.0 readxl_1.3.1 aRpsDCA_1.1.1 RevoUtils_11.0.2 RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
[1] httr_1.4.1 maps_3.3.0 bit64_0.9-7 jsonlite_1.7.0 viridisLite_0.3.0 modelr_0.1.8
[7] sm_2.2-5.6 assertthat_0.2.1 blob_1.2.1 cellranger_1.1.0 pillar_1.4.6 backports_1.1.7
[13] glue_1.4.1 chron_2.3-55 digest_0.6.25 RColorBrewer_1.1-2 rvest_0.3.5 colorspace_1.4-1
[19] htmltools_0.5.0 pkgconfig_2.0.3 broom_0.7.0 haven_2.3.1 scales_1.1.1 RANN_2.6.1
[25] pracma_2.2.9 generics_0.0.2 ellipsis_0.3.1 withr_2.2.0 lazyeval_0.2.2 cli_2.0.2
[31] magrittr_1.5 crayon_1.3.4 memoise_1.1.0 fs_1.4.2 fansi_0.4.1 MASS_7.3-51.6
[37] xml2_1.3.2 tools_4.0.2 hms_0.5.3 lifecycle_0.2.0 plotly_4.9.2.1 munsell_0.5.0
[43] reprex_0.3.0 plotrix_3.7-8 NISTunits_1.0.1 compiler_4.0.2 grid_4.0.2 rstudioapi_0.11
[49] htmlwidgets_1.5.1 celestial_1.4.6 tcltk_4.0.2 gtable_0.3.0 DBI_1.1.0 R6_2.3.0
[55] lubridate_1.7.9 bit_1.1-15.2 stringi_1.4.6 Rcpp_1.0.5 vctrs_0.3.1 mapproj_1.2.7
[61] dbplyr_1.4.4 tidyselect_1.1.0

data <- read.delim(sep='\t', text='t\tq\n0\t15609\n0.235616438\t15609\n0.487671233\t17002\n0.734246575\t10167\n0.969863014\t7344\n1.221917808\t5256\n1.473972603\t4582\n1.583561644\t3959\n1.739726027\t4482\n1.915068493\t3705\n2.126027397\t3452\n2.375342466\t3637\n2.62739726\t3239\n2.824657534\t3517\n3.032876712\t2496\n3.282191781\t1729\n3.534246575\t1655\n\n')
best <- best.fit(q=data$q, t=data$t)
print(best)
$decline
[1] "Arps hyperbolic decline: <qi = 17762.93, Di = 0.7742964, b = 0.04062814>"

$sse
[1] 39702258

best.h2e <- best.hyp2exp(q=data$q, t=data$t)
print(best.h2e)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 17762.93, Di = 0.7742965, b = 0.04062833, Df = 0.1>"

$sse
[1] 39702258`

4.2.2:

source('check.R', echo=TRUE)

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] magicaxis_2.2.14 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1 readr_2.1.3 tidyr_1.3.0
[8] tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2 rlang_1.0.6 sqldf_0.4-11 RSQLite_2.2.20 gsubfn_0.7
[15] proto_1.0.0 readxl_1.4.1 aRpsDCA_1.1.1

loaded via a namespace (and not attached):
[1] httr_1.4.4 maps_3.4.1 bit64_4.0.5 jsonlite_1.8.4 modelr_0.1.10 sm_2.2-5.7.1
[7] assertthat_0.2.1 blob_1.2.3 googlesheets4_1.0.1 cellranger_1.1.0 pillar_1.8.1 backports_1.4.1
[13] glue_1.6.2 chron_2.3-58 rvest_1.0.3 colorspace_2.1-0 pkgconfig_2.0.3 broom_1.0.3
[19] haven_2.5.1 scales_1.2.1 RANN_2.6.1 tzdb_0.3.0 pracma_2.4.2 timechange_0.2.0
[25] googledrive_2.0.0 generics_0.1.3 ellipsis_0.3.2 cachem_1.0.6 withr_2.5.0 cli_3.6.0
[31] magrittr_2.0.3 crayon_1.5.2 memoise_2.0.1 fs_1.6.0 fansi_1.0.4 MASS_7.3-58.1
[37] xml2_1.3.3 tools_4.2.2 hms_1.1.2 gargle_1.3.0 lifecycle_1.0.3 munsell_0.5.0
[43] reprex_2.0.2 plotrix_3.8-2 NISTunits_1.0.1 compiler_4.2.2 grid_4.2.2 rstudioapi_0.14
[49] celestial_1.4.6 tcltk_4.2.2 gtable_0.3.1 DBI_1.1.3 R6_2.5.1 lubridate_1.9.1
[55] fastmap_1.1.0 bit_4.0.5 utf8_1.2.2 stringi_1.7.12 Rcpp_1.0.10 vctrs_0.5.2
[61] mapproj_1.2.11 dbplyr_2.3.0 tidyselect_1.2.0

data <- read.delim(sep='\t', text='t\tq\n0\t15609\n0.235616438\t15609\n0.487671233\t17002\n0.734246575\t10167\n0.969863014\t7344\n1.221917808\t5256\ .... [TRUNCATED]

best <- best.fit(q=data$q, t=data$t)

print(best)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 17762.93, Di = 0.7742965, b = 0.04062814, Df = 0.1>"

$sse
[1] 39702258

best.h2e <- best.hyp2exp(q=data$q, t=data$t)

print(best.h2e)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 17762.93, Di = 0.7742965, b = 0.04062814, Df = 0.1>"

$sse
[1] 39702258

@mkw87
Copy link
Author

mkw87 commented Feb 7, 2023

Yes the difference is occurring using best.fit.from.Np.with.buildup but I'm plotting the outputs from all three methods to see the difference and it's not like there's a better option that it is ignoring.

@derrickturk
Copy link
Owner

derrickturk commented Feb 10, 2023

The "with buildup" part is suggestive. That part of the code has been something of a hotspot for errors. Would you mind running the example script above again, but edited as follows?

library(aRpsDCA)
sessionInfo()
data <- read.delim(sep='\t', text='t\tq\tnp\n0\t15609\t0\n0.235616438\t15609\t1.34\n0.487671233\t17002\t2.91\n0.734246575\t10167\t3.82\n0.969863014\t7344\t4.45\n1.221917808\t5256\t4.94\n1.473972603\t4582\t5.36\n1.583561644\t3959\t5.52\n1.739726027\t4482\t5.77\n1.915068493\t3705\t6.01\n2.126027397\t3452\t6.28\n2.375342466\t3637\t6.61\n2.62739726\t3239\t6.9\n2.824657534\t3517\t7.16\n3.032876712\t2496\t7.35\n3.282191781\t1729\t7.5\n3.534246575\t1655\t7.66\n\n')
best.q <- best.fit.with.buildup(q=data$q, t=data$t)
print(best.q)
best.Np <- best.fit.from.Np.with.buildup(Np=data$np, t=data$t)
print(best.Np)

@derrickturk
Copy link
Owner

For what it's worth, on my machine, I get:
4.0.2:

> library(aRpsDCA)

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] aRpsDCA_1.1.1

loaded via a namespace (and not attached):
[1] compiler_4.0.2

> data <- read.delim(sep='\t', text='t\tq\tnp\n0\t15609\t0\n0.235616438\t15609\t1.34\n0.487671233\t17002\t2.91\n0.734246575\t10167\t3.82\n0.969863014\ .... [TRUNCATED] 

> best.q <- best.fit.with.buildup(q=data$q, t=data$t)

> print(best.q)
$decline
[1] "Arps hyperbolic decline: <qi = 85010, Di = 7.013908, b = 0.7773333> with buildup: <initial rate = 15609, time to peak = 0.4876712, peak rate = 16023.42>"

$sse
[1] 6037814


> best.Np <- best.fit.from.Np.with.buildup(Np=data$np, t=data$t)

> print(best.Np)
$decline
[1] "Arps hyperbolic decline: <qi = 27.63629, Di = 10, b = 0.9849721> with buildup: <initial rate = 5.687209, time to peak = 0.3616438, peak rate = 5.919143>"

$sse
[1] 0.08332251

4.2.2:

> library(aRpsDCA)

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] aRpsDCA_1.1.1

loaded via a namespace (and not attached):
[1] compiler_4.2.2

> data <- read.delim(sep='\t', text='t\tq\tnp\n0\t15609\t0\n0.235616438\t15609\t1.34\n0.487671233\t17002\t2.91\n0.734246575\t10167\t3.82\n0.969863014\ .... [TRUNCATED] 

> best.q <- best.fit.with.buildup(q=data$q, t=data$t)

> print(best.q)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 85010, Di = 7.013882, b = 0.7773306, Df = 0.1> with buildup: <initial rate = 15609, time to peak = 0.4876712, peak rate = 16023.43>"

$sse
[1] 6037814


> best.Np <- best.fit.from.Np.with.buildup(Np=data$np, t=data$t)

> print(best.Np)
$decline
[1] "Arps hyperbolic-to-exponential decline: <qi = 27.63629, Di = 10, b = 0.9849721, Df = 0.1> with buildup: <initial rate = 5.687209, time to peak = 0.3616438, peak rate = 5.919143>"

$sse
[1] 0.08332251

I also get the same rate forecasts out of both environments for the resulting decline.

@mkw87
Copy link
Author

mkw87 commented Feb 10, 2023

I got the same results you posted.

@derrickturk
Copy link
Owner

That would seem to imply that the issue lies elsewhere. I think the next step would have to be to verify that the exact same data is being sent to the exact same functions in each environment, and arriving at a script that reliably reproduces the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mystery 👻 a spooky mystery
Projects
None yet
Development

No branches or pull requests

2 participants