Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R-devel change; R-API inside parallel regions #3165

Closed
11 tasks done
mattdowle opened this issue Nov 30, 2018 · 4 comments
Closed
11 tasks done

R-devel change; R-API inside parallel regions #3165

mattdowle opened this issue Nov 30, 2018 · 4 comments
Milestone

Comments

@mattdowle
Copy link
Member

mattdowle commented Nov 30, 2018

Luke Tierney contacted me about a memory issue in R-devel. Tests are passing both on CRAN and Travis/Appveyor, but memory usage is higher in some cases. As before the garbage collector turns off.
Not sure when exactly the R-devel change was made, but some time in the last month or so. It might be that the R-devel change causes the new memory issue, or that the R-devel change reveals the problem that exists already (possibly even with data.table-release on R-release). In any case, it needs to be fixed.

Luke wrote :

I distilled the issue in 'constellation' down to the attached file
from the examples. If I run this the memory usage is much bigger than
previously and the gc() output is garbled. It's a
multi-threading issue again; everything looks fine with
OMP_NUM_THREADS=1. With mutlipe threads you are
calling DATAPTR from threads other than the main one and that creates
a race on setting the R_GCEnabled flag, so eventually it is getting
stuck on off. I instrumented the places where the GC is disabled and
tracked this as the first one from a thread other than the main one:

#4  0x00007ffff78ba9d5 in DATAPTR (x=x@entry=0x1167458)
     at ../../../R/src/include/Rinlinedfuns.h:106
#5  0x00007fffea02c4c8 in subsetVectorRaw (target=0x4529590, source=0x1167458,
     idx=0x4175210, any0orNA=FALSE) at subset.c:44
#6  0x00007fffea02c7a6 in subsetDT (x=<optimized out>, rows=<optimized out>,
     cols=<optimized out>) at subset.c:272

Given Luke's detail, it's easy to see the problem in the code without needing to reproduce. All R API usage needs taking outside all parallel regions as Luke requested before. I delayed doing that last time due to time pressure, and now it needs tackling. Last time I just did the first step which was to ensure that DATAPTR inside parallel regions did not receive ALTREP.

This warrants accelerating release of 1.12.0, not least because it impacts Luke.

$ grep "omp.*parallel" *.c

  • subset.c
  • reorder.c
  • fwrite.c
  • fread.c
  • fsort.c
  • forder.c
  • between.c

$ grep ALTREP *.c

  • between.c
  • fsort.c
  • reorder.c
  • wrappers.c
@mattdowle mattdowle added this to the 1.12.0 milestone Nov 30, 2018
@mattdowle mattdowle added the bug label Nov 30, 2018
@MichaelChirico
Copy link
Member

MichaelChirico commented Nov 30, 2018

awesome of Luke to do all the hard work of identifying the cause! 💯

@mattdowle mattdowle changed the title R-devel change reveals DATAPTR inside parallel regions R-devel change; R-API inside parallel regions Nov 30, 2018
@mattdowle
Copy link
Member Author

mattdowle commented Dec 7, 2018

I'm in process of confirming with Luke that it's fixed before closing ...

@mattdowle
Copy link
Member Author

mattdowle commented Dec 10, 2018

Confirmed fixed. Thanks to @ltierney for his help.


For completeness in case we need to refer back...
I could not reproduce the problem with R-devel compiled as follows. (I needed to confirm I could reproduce the problem with 1.11.8 before I could confirm 1.11.9 fixes it.)

./configure --without-recommended-packages --disable-byte-compiled-packages --enable-strict-barrier CC="gcc -fsanitize=address -fno-sanitize=float-divide-by-zero -fno-omit-frame-pointer" CFLAGS="-O0 -g -Wall -pedantic"

even with the script that Luke provided :

library(data.table)
library(constellation)
temp <- as.data.table(vitals[VARIABLE == "TEMPERATURE"])
pulse <- as.data.table(vitals[VARIABLE == "PULSE"])
resp <- as.data.table(vitals[VARIABLE == "RESPIRATORY_RATE"])
temp[, RECORDED_TIME := as.POSIXct(RECORDED_TIME,
  format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")]
pulse[, RECORDED_TIME := as.POSIXct(RECORDED_TIME,
  format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")]
resp[, RECORDED_TIME := as.POSIXct(RECORDED_TIME,
  format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")]
gc()
for (i in 1:10) {
  cat("i=",i,"\n")
  b1 <- bundle(temp, pulse, resp,
    bundle_names = c("PLATELETS", "INR"), window_hours_pre = 24,
    window_hours_post = c(6, 6), join_key = "PAT_ID",
    time_var = "RECORDED_TIME", event_name = "CREATININE", mult = "all")
}
gc()

As a long shot, I recompiled R-devel more simply as follows, still with --enable-strict-barrier which is the important thing :
./configure --without-recommended-packages --enable-strict-barrier
reinstalled constellation and all its dependencies and data.table 1.11.8, and this time Luke's script correctly fails :

Fatal error: Wrong thread calling 'RunFinalizers'

Installing data.table-dev (1.11.9) into this R-devel and rerunning, works fine. Reinstalling data.table 1.11.8 again fails. So it's indeed fixed by 1.11.9. In my original R-devel compile, perhaps something there was making 1.11.8 work (disabling byte compiler, -00, ASAN, etc). Anyway, a simpler R-devel compile was needed and was worth trying.

@jangorecki If time allows, it would be great to add --enable-strict-barrier to the R-devel used in CI pipelines. I don't know if you compile R-devel or not for CI pipelines.

@jangorecki
Copy link
Member

@mattdowle yes I compile R-devel daily, will add this flag there. Added to not forget in #3147

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants