2.3 modify on copy (rules of copy seem to have changed) #1753

Open
alcor2019 opened this issue Jan 11, 2023 · 6 comments

Comments

@alcor2019

Hi,
it seems that the rules for copy-on-modify have changed.
Every time y is modified, it is copied to a new address, as shown below (cf. section 2.3.1, tracemem(), of the book).

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.utf8 LC_CTYPE=French_France.utf8
[3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C
[5] LC_TIME=French_France.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.2.2 tools_4.2.2

x <- c(1, 2, 3)
cat(tracemem(x), "\n")
<000001EE5EB32C58>
y <- x
cat(tracemem(y), "\n")
<000001EE5EB32C58>
y[[3]] <- 4L
tracemem[0x000001ee5eb32c58 -> 0x000001ee6078a568]:
y[[3]] <- 5L
tracemem[0x000001ee6078a568 -> 0x000001ee6078e178]:
y[[3]] <- 4L
tracemem[0x000001ee6078e178 -> 0x000001ee6077fe38]:

@jxu

jxu commented May 5, 2023

I get the same result when just modifying x. This doesn't match section 2.5 (Modify-in-place), which says v should stay bound to the same object.

> sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lobstr_1.1.2

loaded via a namespace (and not attached):
 [1] compiler_4.2.3  cli_3.6.1       tools_4.2.3     pillar_1.9.0    glue_1.6.2      rstudioapi_0.14
 [7] crayon_1.5.2    utf8_1.2.3      fansi_1.0.4     vctrs_0.6.1     lifecycle_1.0.3 rlang_1.1.0 
> v <- c(1, 2, 3)
> obj_addr(v)
[1] "0x1889ad7ffb8"
> v[[3]] <- 4
> obj_addr(v)
[1] "0x1889ad6dcf8"

@HyacinthMeng

This part of the content may not be correct: https://advanced-r-solutions.rbind.io/names-and-values.html#modify-in-place

@jean-baka

Hi everyone, section 2.3 actually gives a lot of information, but perhaps it doesn't put enough emphasis on the fact that the copy-on-replace behaviour is tightly linked to the number of references to the object.

To troubleshoot the above examples: first and foremost, please run "bare R", not through RStudio, which actually adds references to objects for the purpose of the GUI. See http://adv-r.had.co.nz/memory.html#modification from the 1st edition to read more about that. For me (R version 4.3.1 on x86_64-pc-linux-gnu (64-bit), running Debian GNU/Linux), the following example (same as @jxu and @HyacinthMeng above) works as expected, without a copy:

v <- c(1,2,3)
refs(v)
[1] 1
address(v)
[1] "0x55fbab440118"
v[[3]] <- 4
address(v)
[1] "0x55fbab440118"

Another interesting thing I discovered is that apparently, when we run something like v <- 1:3, we create some sort of a promise, and not the "actual", "final" object, contrary to what happens when we create the vector as v <- c(1L, 2L, 3L):

v <- 1:3
refs(v)
[1] 65535
v <- c(1L, 2L, 3L)
refs(v)
[1] 1

This entails the interesting behaviour that when using the shorthand "1:3", the very first replacement seems to make a copy, while the subsequent ones (even when extending the object, provided memory management allows for enough room at that particular place) do not create a copy:

v <- 1:3
c(address(v), refs(v))
[1] "0x55fbabb3dfe0" "65535"
v[2] <- 5L
c(address(v), refs(v))
[1] "0x55fbab43e2e8" "1"
v[1] <- 5L
c(address(v), refs(v))
[1] "0x55fbab43e2e8" "1"

But sometimes, when you extend the vector, the memory management will go find enough room somewhere else:

v[5] <- 10L
c(address(v), refs(v))
[1] "0x55fbab440c38" "1"

Perhaps @hadley could add something on this stuff in the book, and close this issue? I also read that the way R deals with references to objects is undergoing some work, so perhaps what we say here may be outdated soon...?

@hadley
Owner

hadley commented Aug 7, 2023

You mean like this? 😄

When exploring copy-on-modify behaviour interactively, be aware that you’ll get different results inside of RStudio. That’s because the environment pane must make a reference to each object in order to display information about it. This distorts your interactive exploration but doesn’t affect code inside of functions, and so doesn’t affect performance during data analysis. For experimentation, I recommend either running R directly from the terminal, or using RMarkdown (like this book).
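
A minimal sketch of that kind of experiment, meant to be saved to a file and run with Rscript from a terminal rather than inside RStudio. It uses tracemem(), as in the book, when the R build supports memory profiling. The script and its variable name are illustrative, not taken from the book, and whether a copy is reported depends on the R version and environment, so it only prints what it observes:

```r
v <- c(1, 2, 3)

# tracemem() needs an R build with memory profiling enabled.
if (capabilities("profmem")) {
  tracemem(v)    # from now on, any copy of v prints a "tracemem[...]" line
  v[[3]] <- 4    # single-element replacement under observation
  untracemem(v)  # stop tracing
} else {
  v[[3]] <- 4    # same replacement, without tracing
}

print(v)
```

If a "tracemem[...]" line shows up here but not in a plain terminal session, something in the environment is probably holding an extra reference to v.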

@jean-baka

Yes, thank you Hadley, I had read that, but perhaps the OP didn't... ;)

Still, IMHO there is something to explain about that number of references seemingly equal to 2^16 - 1 for a "promise"(?) like v <- 1:3, versus a neat refs(v) == 1 after v <- c(1L, 2L, 3L)...

@hadley
Owner

hadley commented Aug 7, 2023

@jean-baka there's a good reason that the second edition doesn't use refs(), which seems clearly buggy here. If you search for "ALTREP" on https://adv-r.hadley.nz/names-values.html#copy-on-modify, you can see why the : operator is a bit different.
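
A small, hedged illustration of the ALTREP point, assuming the behaviour described in the linked chapter (sequences built with : can be stored compactly). The variable names are made up for this sketch, and the size check is only printed, and only if lobstr happens to be installed:

```r
compact <- 1:3          # built with `:`, may be stored as a compact ALTREP sequence
plain <- c(1L, 2L, 3L)  # ordinary integer vector

# The values are the same whatever the internal representation.
same_values <- identical(compact, plain)
print(same_values)

if (requireNamespace("lobstr", quietly = TRUE)) {
  # Compare the reported sizes of a short and a long sequence.
  print(lobstr::obj_size(1:3))
  print(lobstr::obj_size(1:1e6))
}

# Replacing an element yields a regular integer vector with the new value.
compact[2] <- 5L
print(compact)
```

This would be consistent with the output above, where only the first replacement on v <- 1:3 changed the address.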
