Why assigning to names works #9

Open
russellpierce opened this issue Oct 15, 2013 · 7 comments

Comments

@russellpierce

You write that: "You can see a list of columns with names(frame). You rename columns by, spookily, assigning into names(frame). Do you know how and why this works? Please educate me."

It works because 'names' is an attribute of the data.frame that is being accessed by, e.g. names(iris). This yields the same value as attr(iris,"names")... and you can use either to retrieve or assign names to the columns in a data.frame.
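A quick way to check this equivalence, sketched with the built-in iris data set. The copy named df and the replacement column names are made up for illustration:

```r
df <- iris  # work on a copy so the built-in data set stays untouched

# Both calls read the same "names" attribute
identical(names(df), attr(df, "names"))

# Either form can also appear on the left-hand side of an assignment
names(df)[1] <- "sepal_len"
attr(df, "names")[2] <- "sepal_wid"
names(df)[1:2]
```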

@Protonk

Protonk commented Oct 16, 2013

> It works because 'names' is an attribute of the data.frame that is being accessed by, e.g. names(iris).

Yes, but I think that's part of the complaint, the 'aRrgh', if you will. "names" is an attribute of the object, but names(iris) is a function that returns that attribute and between that and attr(iris,"names") the only way you can get at the attribute is through accessor functions.

If we do recognize that 'names' is an attribute why do we set the value w/ names(iris) <- 'whatever'--you're assigning to the output of what should basically just be a getter. This pattern is pretty common in R, but that doesn't make it sensible.

@russellpierce
Author

Don't think of names() as a getter function to access the attribute "names" in an R object. Think of it as a convenience wrapper around attr(x,"names"). In fact, don't think of getter and putter functions at all... I think this is one of the sources of your aRrgh. You are thinking of R like a programmer. Not just any programmer, but a programmer from other languages that use other schemas, object orientation for example. R's object systems (S3, S4, proto, etc.) are afterthoughts. So R and R code are much more like a functional programming language than like an object-oriented one. From a non-programming mindset, or the mindset of a functional programmer, R makes a reasonable amount of sense (I come from a rudimentary functional programming background and have seen non-programmers pick up and excel at R).

Consider that you really have this thing over here and you want it to be over there. You shouldn't need to call that spot on your bookshelf by a different name depending on whether you are picking up a book or putting it back, should you? Of course you do in object-oriented programming, because you want to verb that object. Of course you don't in functional programming, because it is just one piece of memory moving over to another piece. So, when using base R, ask yourself: how would a functional programmer make something like R work?

(This almost certainly isn't how it works under the hood - it is just a thought experiment). Imagine, for example, that names is just an array that is index matched to an array of every other object in R's workspace. Many of the values in that array are null because they have no names attribute. However, you could easily poke values in and out of that array. If you poke a NULL in you erase the attribute. You poke in any other value and you set the attribute. If you are lucky, R will sanity check it for you, but maybe it won't, e.g. names(iris) <- 1:3. This might seem horrible, but you can add the attribute names to /any/ object in the workspace simply by (implicitly) erasing the null and providing an alternate value for that R object.

Consider:

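x <- "bob"
names(x) <- "hi"   # attach a names attribute
x                  # prints the value under its name "hi"
names(x) <- NULL   # assigning NULL removes the attribute
x                  # back to a plain character vector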

You didn't have to redefine the object or make a getter or putter function. You just did it. This flexibility is very nice when you are working with R in interactive mode trying to manipulate and play with data (really one of R's strengths). It is pretty common in R because that is R's schema. You can assign things to other things and overwrite many of the /basic/ operations in R. For example, if you thought being able to assign the value of F to T was bad, wait and try:

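`+` <- function(e1, e2) {e1 - e2}  # mask the built-in addition in the global environment
5 + 3                              # now 2
rm("+")                            # remove the mask to get real addition back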

Of course, all of this is the good and bad of functional programming. Writing small code is easy. Writing big code gets pretty difficult (near impossible)... and that is part of why programming languages started going to more object-based schemas. You'll note that some of the better maintained and developed packages in R make use of R's class system (or an alternate one like proto)... there are certainly reasons for that. The ease of writing small programs and the difficulty of writing large ones is also part of why R suffers from a bash mindset, i.e. lots of small simple tools that are powerful. For this reason, many experienced R programmers write code that has many functions nested inside many other functions.

Ease of use in interactive mode is also the nice part about dynamic typing. Again - from a programming standpoint, it may very well be a nightmare (it definitely comes up in the Inferno text). However, from a non-programmer playing with the data standpoint, it is easier to be sloppy about the typing and fix the places it turns out to be a problem than it is to always select the type in advance. Another example is vector/array indexing. Things like 0 indexed arrays make sense to programmers but not to non-programmers (another gripe is that you say base 1 when you mean 1 indexed... base 1 sounds like you are talking about a number system, e.g. base 2 or base 10).
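A small sketch of both points, the sloppy typing and the 1-indexed vectors. The variable names are arbitrary:

```r
v <- c(10, 20, 30)
v[1]   # first element; indexing starts at 1
v[0]   # a zero-length vector, not an error

# Mixing types silently coerces everything to the most general type
m <- c(1, "a", TRUE)
class(m)
```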

R is funky but I like it. I understand and do empathize with your aRrgh. I hope that the analogies I gave above help you think about it and that you will continue to revise aRrgh to help others who are suffering the pain you are suffering.

@Protonk

Protonk commented Oct 16, 2013

> I think this is one of the sources of your aRrgh.

I should note I'm just a guy with this repo watched, not the maintainer. :)

> So, R and R code is much more like a functional programming language than it is like an object oriented one.

Absolutely. But I think there are places where that functional abstraction breaks down pretty clearly and where it doesn't. Or points where it's just orthogonal to the language problems.

Here we're expecting that names(x) will return the names of x (we can look at names or attr, it doesn't change much). So if I'm teaching someone how to evaluate the code as they read it, we could work outward from the object x through normal operator precedence and make sense of what the code is doing. But that doesn't work here. Assignment doesn't operate on the output of the pure function, as other operations do (e.g. 'bob' %in% names(x)); it makes use of names's hidden status as a replacement function in order to perform the assignment on the input. There's nothing functional about that unless we want to break apart the <- abstraction, but at that point we've sorta lost the plot. We wanted to find out why names(x) <- 'foo' made sense and it doesn't, relative to what another language might do or what R should do, and we're no closer unless we unpack how assignment works.
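For anyone who does want to unpack it, here is a rough sketch of what the assignment form amounts to: a call to the separate function `names<-`, whose result is bound back to the original name. The vectors x and y are invented for the comparison:

```r
x <- c(a = 1, b = 2)
y <- x

# The usual spelling
names(x) <- c("first", "second")

# Roughly what R evaluates instead: call the function `names<-` on the
# object and the new value, then rebind the result to the same name
y <- `names<-`(y, value = c("first", "second"))

identical(x, y)
```

Seen this way, nothing is assigned to the output of names(x) at all; the getter is never called.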

If we were looking for a clear way to indicate that the properties of an object are updated, there are examples everywhere of bare object properties being updated by assignment. Either in a language w/ object literals like JS or within R. When we assign x$foo <- 'bar' we're directly updating a property of x. If that's not desirable, the language designers (or application designers) can create an API where x$foo is updated within some constraints. e.g. it has to be a string, then they can create a setter function. R has these in places (see options()) and avoids them in others, but I don't see how they're inferior (even in a functional context) to thingYouWant(x) <- 'bar'. That's functional only insofar as it makes the attributes accessible as the output of a function! :)
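The two styles side by side, as a sketch. The list x, the element foo, and the choice of the digits option are only for illustration. Note that element assignment is itself routed through a replacement function (here written out with `[[<-`), so the contrast with options() is mostly one of spelling:

```r
# Direct property update on a list
x <- list()
x$foo <- "bar"

# The same update written as an explicit call to the replacement function `[[<-`
y <- `[[<-`(list(), "foo", "bar")
identical(x, y)

# options() as a conventional setter: it returns the old values invisibly,
# so they can be restored afterwards
old <- options(digits = 4)
getOption("digits")
options(old)
```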

@russellpierce
Author

Oh, sorry about the misattribution. I guess I should say the original author/maintainer's aRrgh. I think that there are places where the functional abstraction breaks down, but that they are informative of R's proclivity for simplicity (for the most part). In a functional programming language, functions only return values.

In R for the most part functions do returns... but they also can lay bare the memory space for writing. You are right, that breaks the analogy to functional programming by violating our (programmers) expectations about how scoping should work. However, if you didn't know about scoping, it would make sense from the you "have this thing over here and you want it to be over there"/I just want to move this piece of memory over to another place approach. From a naive standpoint, if...

x <- LETTERS
x[x == "A"] <- "Z"
x

...works, then why wouldn't names(x) <- "foo"? Of course, in many places that logic won't work. Then again, I think that because I am at peace with the idea that functions have their own scope separate from global, I've never been tempted to assign something to a function result that didn't actually have a replacement function written for it. However, I'd never stopped to think about it before... that R has the ability to have names and names<- be different functions is kind of interesting. Very much in line with the Gödel, Escher, Bach way of thinking about Turing machines and the ability to redefine what + means and does.
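To make the two-function idea concrete, here is a minimal sketch of a user-defined pair. The name second is hypothetical and chosen only for illustration; any function named `something<-` with a final argument called value can be used on the left of an assignment:

```r
# Hypothetical getter: returns the second element
second <- function(x) x[2]

# Matching replacement function, a completely separate object
`second<-` <- function(x, value) {
  x[2] <- value
  x  # a replacement function must return the whole modified object
}

v <- c(10, 20, 30)
second(v) <- 99
v
second(v)
```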

I think with names and attr we are operating at kind of a meta-level inside R. There is a reason why names is not just a named list inside of each R object - we want the attribute to be able to operate outside of the structure of the object itself (e.g. class).
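A short sketch of that separation between an object's data and its attributes. The class name "myclass" is made up:

```r
x <- c(a = 1, b = 2, c = 3)
attributes(x)          # a list holding only the names

# Stripping the names leaves the underlying data untouched
y <- unname(x)
attributes(y)          # NULL
sum(x) == sum(y)

# class is stored the same way, as just another attribute
class(x) <- "myclass"
names(attributes(x))
```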

I do see your point, but I'm not sure what other approach could have been used in R to indicate that properties of objects were being updated rather than something else. Nearly every other special character on a standard 101 key US keyboard already has some meaning in R or is an escape character of some kind. I'm sure that is why some of the basic operations are multi-character, e.g. matrix multiplication, modulus, etc. I suppose that another multi-character operator could have been added for that purpose... and if someone really wanted to address attributes in R I don't see any reason why they wouldn't be able to. Although I haven't been able to quite crack it yet myself.

@Protonk

Protonk commented Oct 16, 2013

> From a naive standpoint, if... ...works then why wouldn't names(x) <- "foo"?

I can see the similarities if we're willing to think very hard about what x[x == 'Z'] and names(x) represent. x[x == 'Z'] by itself will return the subsetted elements and I could imagine that as being analogous to returning the name object but I think there's some distinction between accepting the magic of the language at a very low level (i.e. not thinking too hard about how that vector is being subsetted or how the replacement is happening) and being forced to think about it at a higher level.

It's possible that this is a consequence of the subset/assignment pattern as a whole. While I love it, there is some strangeness to this notion...

x <- LETTERS
z <- x[x == 'Z']
# these two lines are different in important ways
x[x == 'Z'] <- 'foo'
z <- 'foo'

The subsetting and extraction function will return (if in the REPL, assigned to a name or at the bottom of a function) the extracted vector as a new object in the environment. If we 'intercept' it, so to speak, it represents some sort of slot for mutable state, iff we do nothing else with it. We get used to it because the pattern is truly very, very handy. But it's weird. That same weirdness is apparent with names(x):

z <- names(x)
z <- 'foo'
names(x) <- 'foo'

From our paradigm within R, these differences can make sense. As you mentioned, it's natural to think about accessor functions like the extraction function and operate with them similarly. But there's enough to justify a complaint.
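The difference between the two patterns can be checked directly. This is only a sketch; the name n for the extracted names is arbitrary:

```r
x <- LETTERS
z <- x[x == "Z"]       # z is an independent copy of the extracted element
z <- "foo"             # rebinds z only
x[26]                  # still "Z"

x[x == "Z"] <- "foo"   # calls `[<-` and rebinds x to the modified vector
x[26]                  # now "foo"

n <- names(x)          # NULL, because LETTERS carries no names
n <- "foo"             # again only rebinds n
is.null(names(x))      # TRUE: x itself was never touched
```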

@russellpierce
Author

I now agree that there is enough to justify a complaint. I just think that there is enough clarity that the complaint can be addressed in a way that will help others who have the same complaint understand what R is doing.

I've probably been in the notation too long to see it. Although, if you had written your first code example with = instead of <-, it would maybe seem a bit more confusing. One nice thing about the <- notation is that it tells you what is going where. A novice can look at each side of that operator and determine what the 'what' is that is going to the 'where'. That is the mutable state/dynamic typing/dynamic sizing of data structures part of it again. Great stuff for playing with data... but potentially dangerous and confusing. I'd especially grant that what R does when data-structure sizes mismatch is confusing, e.g. the last line of your second code example here, in addition to what the original author mentioned in regard to recycling.
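A small sketch of the two mismatch behaviours, using short made-up vectors rather than the full LETTERS example:

```r
x <- LETTERS[1:4]

# names<- does not recycle: missing names are filled with NA
names(x) <- "foo"
names(x)

# Subset assignment does recycle the shorter right-hand side
y <- 1:6
y[1:4] <- c(0L, 9L)
y
```

Neither call raises an error or a warning here, which is part of what makes the behaviour easy to miss.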

@tdsmith
Owner

tdsmith commented Oct 16, 2013

This is an interesting conversation; thanks! I had worked out some of my personal angst on this point after someone explained names<-() but I hadn't become invested in the underlying mechanisms of how objects and attributes work. An explanation of R's object systems definitely belongs in aRrgh.

"You are thinking of R like a programmer. Not just any programmer, but a programmer from other languages that use other schemas." This is absolutely my audience, so that's an appropriate perspective from which to approach the guide, if not the language. :)
