Integration with magrittr #1208
Comments
DT[, a := some.function(a)] Works perfectly fine |
But imho
isn’t perfectly fine. I like the idea of adding this convenience function. But maybe |
You shouldn't have such strange names in your data set. It is both inconvenient and hardly maintainable. Other than that, you can store the column name in some variable and then do:
shortname <- "a_very_very_very_very_long_variable_name"
DT[, (shortname) := some.function(get(shortname))] |
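For anyone who wants to try the workaround above, here is a minimal, self-contained sketch. The table contents and the stand-in body of `some.function` are made up for illustration; only `:=`, the `(shortname)` form on the left-hand side, and `get()` come from the comment.

```r
library(data.table)

# Hypothetical stand-in for some.function from the thread
some.function <- function(x) x * 2

DT <- data.table(a_very_very_very_very_long_variable_name = c(1, 2, 3))

# Store the long column name once, then refer to it through the variable
shortname <- "a_very_very_very_very_long_variable_name"

# (shortname) on the left tells := to use the variable's value as the column name;
# get(shortname) on the right fetches that column's current values
DT[, (shortname) := some.function(get(shortname))]
```

The long name is typed once, at the cost of the extra `get()` call on the right-hand side.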
You're right, but even with variables that have intermediate length I still find the magrittr syntax much more convenient to read and write. Anyway, this is just my personal opinion. |
I find that it’s sometimes better to have long variable names in complex data sets to make it clear what is saved in a variable. It is a matter of personal preference. Convenience functions are by definition not required to perform a task; they just make it faster to code and often easier to understand. I have no doubt that this function would be of use to many users. But I also understand if the data.table devs don’t want to implement/maintain (too many) convenience functions, you have to draw the line somewhere ;) |
For those of you who are subscribed to this thread, please disregard the last comment (now deleted). It was silly. |
Building from @and3k's comment, I see some value in: DT[, a %:=>% some.function] Think that reads better (i.e. a |
The DT[, a %:=:% some.function] or |
What extra meaning does |
My understanding was that a major part of motivation for moving to |
I mean greater or equal operator. |
My vote is for The |
Hadn't considered the typing aspect, i.e. holding down Shift for all characters in the operator is easier, I assume. Makes sense. |
Thank, Just curious, why does |
@my-R-help it is valid syntax, see this: "Why is := allowed as an infix operator?" |
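A quick way to check this claim in base R, without data.table loaded, is to look at how the parser handles the expression. This is only an illustrative sketch; the symbols `a` and `b` are placeholders.

```r
# The parser accepts := as an infix operator even though base R defines no function for it
e <- quote(a := b)

# The parsed expression is an ordinary call whose function name is `:=`
op <- as.character(e[[1]])
is_call <- is.call(e)
```

Because the expression parses, a package such as data.table is free to give `:=` a meaning inside `[`.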
+1 I agree with the OP feature request and use of the magrittr syntax. It is the best and most obvious choice for several reasons.
I strongly encourage you not to overthink this FR by introducing a new operator whose choice is just as arbitrary as magrittr's choice was. |
@ctbrown The proposal is for a pipe operator that does something different from the vanilla |
I think the OP's FR request was sufficiently clear, i.e. Let me ask you, what do you hope to gain by introducing another operator On Thu, Oct 27, 2016 at 11:41 AM, franknarf1 notifications@github.com
|
Does |
Technically you are correct, magrittr's %<>% does not assign by-reference, On Thu, Oct 27, 2016 at 2:18 PM, Michael Chirico notifications@github.com
|
First, I do not think that is true and that you speak for all users; I am a user, for example. Second, if it is true, then these users should learn about the difference as they learn to use data.table. Your "reasons above" do not hold water with me. It's not somehow going against magrittr to implement a non-overlapping pipe operator to do a distinct but related thing. To me -- and this is just my impression, just as much as everything you've been saying is yours -- this seems perfectly consistent with the "established practice" of magrittr (which I use almost as often as I use data.table). It's perfectly possible that the use in this context would assign to multiple objects (columns) at once, which surely you would agree is quite distinct from
And, besides modifying by reference and potentially modifying several things at once, we have the fact that we are modifying part of a thing (the data.table), which is quite different from Anyways, since the developers have not shown any sign of doing this task any time soon (by marking this FR with a priority or milestone), how about revisiting this if it actually moves forward? |
@ctbrown by-reference is not just different in implementation, and needs to be differentiated from by-value functions in user interface. That's the whole point of |
First, there was no claim to speak for ALL R users. That is a ridiculous assertion. The reference to "users's expectations" was to my own, presumably the OP's, and those of several of my students who have already tried Second, as you point out, it is up to the individual to accept or reject arguments in favor of the OP's suggestion of following the magrittr syntax. Some arguments have been offered as to why this would be beneficial, but few cogent arguments as to why an alternative would be superior or even beneficial. There seems to be a minor argument that the implementation is different, but that is a rather weak argument. Additionally,
If there are arguments for/against the OP's suggestion, I would love to hear them. But the only thing I have heard is "because it is different". Maybe some will see this as valid, but weighed against the OP's original suggestion, the alternative does not seem better. |
Everyone that's ever used The answer to this is one of the first things anybody learns about using
http://stackoverflow.com/questions/7033106/why-has-data-table-defined-rather-than-overloading |
First, most users do not need to distinguish between by-reference and by-value. It is not a prerequisite of using DT that you know this. Presumably, this is why the DT syntax is so close to DF. @mattdowle could clearly have designed DT with a purely functional interface. He didn't. Presumably, one of the reasons was that DT could function as a drop-in replacement. With respect to the As to WRT, the standard in the R community -- notorious for its lack of standards -- magrittr is as good as it gets: ubiquitously used and discussed. The OP suggests interoperability with it would be a nice feature. I agree. If you have any doubts about this, take a look at its CRAN page. Developers are using magrittr in their own packages. Moreover, package writers are not the majority of R users. But this is really a digression from the topic. The argument you offer falls under: "DT is different from magrittr since the assignment is by reference, so the syntax is different". To which the response is still: the implementation is different, true, but the interface should be the same, since it is effectively the same operation for most users, conforms to user expectation, and its true operation can be inferred from context. |
I'm glad he didn't. Locking into "purely functional" simply translates to dropping some important features that users can now rely on to write faster and more memory-efficient code. I have projects (e.g. anchormodeling) which would basically be impractical to write in a "purely functional" framework. |
@jangorecki |
Thanks for bringing a sense of enlightenment to the discussion. The references stray a bit from the original proposal, but they help illustrate the points in favor of the OP's proposal, namely,
|
It is a counterproductive and distracting rhetorical device, I'd say, to refer to "users" when you really just mean yourself. You may also have noticed that the OP said "
It is not an appeal to authority since I am not arguing a point there. It is an appeal to you to calm down. This may never even be implemented, so can't you defer the fuss? I imagine it will be a trivial matter to switch the name of the function after it's implemented (if it ever is), and we'll have a better sense of what exact functionality we're looking at at that point. As far as the substantive arguments go:
I look forward to seeing your FRs for these features on https://github.com/tidyverse/magrittr/issues and hope they go through, because I would certainly use that functionality. |
Point taken; I had missed that the OP said that "%:>% looks good to me." Notwithstanding, it is not just me. The OP first suggested the magrittr syntax. Presumably, he thought it a good idea despite conceding to an alternative later. I had also thought it a good idea; that is what brought me here, and this was prompted by several students who have tried it. Presumably, there are others. Dismissing this as a lone viewpoint is kinda beside the point, anyhow. Second, the argument was, in fact, an appeal to authority. It may also be "an appeal for me to calm down", though I am perfectly calm. In any event, the point seems off topic; it does not address the merits of the OP's suggestion. Also, the fact that this is very unlikely to be implemented does not seem to be relevant to the merits of the proposal. I must further concede that you are correct: it will be trivial to change the name of the function once implemented. However, such a change could, and likely will, break any code that uses the feature. It makes perfect sense to spend time discussing the interface before implementing, rather than burdening the users with an incompatible change later. It is unclear what useful purpose shutting down discussion serves. As to the substantive arguments, they seem to advocate more for increased functionality of magrittr than address proposed |
In an effort to get back on topic, I thought it might be useful to summarize the relevant arguments. Argument in favor of
Arguments in favor of
|
You're absolutely missing how a modification by reference would badly break this kind of ported code. That's the same problem as when you copy a data.table vs. a data.frame (dt2 <- dt): suddenly you scratch your head about why your original dt has been updated when you only worked on the second. This exact precaution also invalidates your first point, as it calls for precise documentation of what the operator does; using a different one will make the correct documentation easier to find. |
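The `dt2 <- dt` pitfall mentioned above can be reproduced in a few lines. This is a small sketch with made-up data; `copy()` is data.table's own function for taking an explicit copy.

```r
library(data.table)

dt  <- data.table(a = c(1, 2, 3))
dt2 <- dt        # no copy is made: dt2 refers to the same table as dt
dt3 <- copy(dt)  # explicit copy, independent of dt

# := updates by reference, so the change is visible through dt as well
dt2[, a := a * 10]

# The data.frame equivalent follows R's usual copy-on-modify semantics
df  <- data.frame(a = c(1, 2, 3))
df2 <- df
df2$a <- df2$a * 10  # df itself is left untouched
```

After running this, `dt` holds the updated values even though only `dt2` was modified, while `dt3` and `df` keep the original ones.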
Understood. Thus the "may be" part of the assertion. On Wed, Nov 2, 2016 at 1:32 AM, Tensibai notifications@github.com wrote:
|
Thanks for all your comments and feedback. Just a minor thing (maybe I'm missing something): Does it in this specific case even make a difference whether it's assigning by reference or by value? What we want to do is to update a column (or several columns) inside the data.table. The user knows that the old column will get overwritten either way. There's no room for misunderstanding, is there? In contrast, this is very different than |
Your intuition is correct. It does not make a difference to the user how this operation is implemented. From the user's perspective the results are the same -- values in the column are reassigned. There have been some arguments stating that there should be some differentiation, but there hasn't been a cogent explanation as to why. Your proposal of adopting the ( As a side note, I was a little disheartened when you stated, "Thank (SIC), %:>% looks good to me." and did not advocate more forcefully for your initial intuition and proposal. In any event, thanks for the proposal. It is brilliant whether it is implemented in DT or not.) |
Thanks for your reply, Christopher. Just to clarify, I personally have a preference for Maybe I should have phrased it that way. Sorry if it caused any confusion. |
I still feel there's room for a foot-gun with joins. Having two operators with the same name that behave a little differently in their side effects is error-prone and will lead to confusion. I can't argue better than that, but there's a reason why R warns you when a package masks a base function, or when loading a package overrides another package's function. In my opinion, it does make a difference for at least some users to have specific operators when the side effects will be different. Bonus: searching for the operator, you'll end up on the DT page explaining its caveats/limitations with no doubt, instead of having two choices in the help. Here we're talking about a language, not a user interface. While I agree that the end user of a piece of software shouldn't care about the implementation behind X button, I strongly disagree that a programmer should not care about the implementation behind a function. Major objection being: someone thinking TL;DR: Programming is not a UX, you have to be specific about what you want, hence reusing well-known names should not happen. |
The claim that the side effects are somehow different is dubious. In each case, a variable reassignment is being performed. They are both side effects. The implementation (by-ref or by-value) doesn't truly distinguish them, since the end states of both systems have changed in analogous ways. Even if the side effects are different, the distinction is rather unimportant. This point has been raised repeatedly in the above discussion. If the distinction were important, it should be possible to provide an example where it would make a difference to the user. The lack of a counterfactual example, while not conclusive, is a strong indication that there is no distinction. With respect to:
This is just wrong. Reuse of common, well-known names not only should happen, it is very common and is considered good programming practice. This is called polymorphism. It is perfectly acceptable to have methods with the same name that are implemented differently:
The suggestion that:
is similarly flawed and runs counter to most users' experience. Most programmers probably use hundreds of functions/methods. They do so without knowing their implementation details. The user does need to know the inputs and the outputs/side effects for the functions to be useful, but how it gets there is most often irrelevant. Granted, users sometimes need to know details in order to tweak or debug a function, but it can be argued that this is a small minority of cases. Consider a world where users had to know how each and every function worked at all levels. The cognitive load would be immense; programming anything of complexity would be an impossible task. With respect to:
This is not a Bonus but a liability, by a) introducing confusion (how is this different from the very popular magrittr package, exactly?) and b) creating the need for additional, unneeded documentation in the first place. If the magrittr syntax is adopted, DT devs can say: "go there and read their docs and vignette; DT supports what they are doing there." This cooperation and cross-package borrowing raises the value of DT, magrittr and the R ecosystem. Lastly, it might be inferred from the comments about "user interface", "X button" and "UX" that a specific UI was implied. That is simply not the case. And, while it is abundantly clear we are speaking about a language, it is erroneous to say that the language lacks an interface. The interface is its syntax, and it is important. |
To summarize, so the issue can eventually be resolved: all we need is to handle the following translation.
DT[, a %<:>% fun] ## or "%:>%"
DT[, a := fun(a)]
Is that right? How should it behave if
DT[, "a" %<:>% fun]
DT[, "a" := fun(a)] ## this?
DT[, "a" := fun("a")] ## or this?
What if its length is not 1?
DT[, c("a","b") %<:>% fun]
DT[, c("a","b") %<:>% fun(a, b)]
DT[, c("a","b") %<:>% fun("a","b")]
DT[, c("a","b") %<:>% lapply(list(a, b), fun)]
DT[, c("a","b") %<:>% lapply(c("a", "b"), fun)]
Personally speaking, I would close it as won't fix, because it adds quite a lot of complexity and does not solve any new problem. |
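To make the multi-column question concrete, here is a sketch of just one possible reading of it: apply the function to each named column separately and write the result back by reference. The helper name `update_cols` and the sample data are hypothetical and not part of data.table; `set()`, `.SD` and `.SDcols` are existing data.table features.

```r
library(data.table)

# Hypothetical helper (not part of data.table): apply `fun` to each named
# column and overwrite that column in place.
update_cols <- function(DT, cols, fun, ...) {
  for (col in cols) {
    # set() assigns by reference, like := but usable outside of `[`
    set(DT, j = col, value = fun(DT[[col]], ...))
  }
  invisible(DT)
}

DT  <- data.table(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
fun <- function(x) x + 1

# First pass: a and b are each incremented by 1, c is untouched
update_cols(DT, c("a", "b"), fun)

# The same per-column reading written with existing data.table syntax;
# this second pass increments a and b once more
cols <- c("a", "b")
DT[, (cols) := lapply(.SD, fun), .SDcols = cols]
```

Under this reading the column names only need to be spelled out once, which covers much of what the proposed operator is meant to save; the other readings listed above (for example passing both columns to a single call of `fun`) would need a different helper.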
This is a feature request following the discussion on the mailing list.
I think it would be useful to have something like this as a short-hand form:
So far one has to type
or without magrittr
This is particularly important if a is replaced with a variable that has a long name, which is then difficult to type and read. I think there are significant savings in (programmer) efficiency to be made here, especially with longish variable names.