*apply functions #18

Open
xiongchiamiov opened this issue Jan 17, 2014 · 1 comment

Comments

@xiongchiamiov

There are a variety of apply-type functions available in R. Here's what I've figured out so far:

lapply and sapply both loop over a list; for each element in the list, they call the function with the element as a parameter.

R: sapply(list(c(1, 3, 5), c(2, 4, 2)), sum)
Python: list(map(sum, [[1, 3, 5], [2, 4, 2]]))

lapply always returns a list, while sapply tries to simplify the result to a vector or matrix when it can (since lists are not as concise as I'm used to in other languages).
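
For anyone reading this from the Python side, here is a rough sketch of the per-element loop that both functions perform. The variable names are mine. Python lists are already flat, so as far as I can tell there is no direct counterpart to sapply's simplification step; both R calls map onto the same thing here.

```python
# Toy input mirroring list(c(1, 3, 5), c(2, 4, 2)) from the R example.
groups = [[1, 3, 5], [2, 4, 2]]

# Explicit loop: call the function once per element and collect the results.
totals_loop = []
for group in groups:
    totals_loop.append(sum(group))

# Same thing with map; list() forces the lazy iterator in Python 3.
totals_map = list(map(sum, groups))

print(totals_loop)  # [9, 8]
print(totals_map)   # [9, 8]
```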


mapply is sapply for functions that take multiple arguments: it walks several lists in parallel and passes the corresponding element of each one to the function.

R: mapply(sum, list(c(1, 3, 5), c(2, 4, 2)), list(10, 100))
Python: list(map(sum, [[1, 3, 5], [2, 4, 2]], [10, 100]))

This means that mapply with only the function and one argument can be used as a replacement for sapply:

> sapply(list(c(1, 3, 5), c(2, 4, 2)), sum)
[1] 9 8
> mapply(sum, list(c(1, 3, 5), c(2, 4, 2)))
[1] 9 8

Note that the argument order is swapped: sapply takes the list first and the function second, while mapply takes the function first. As far as I can tell, this is intended to make the functions maximally confusing.
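
The pairing that mapply does can be sketched in Python like this. It is only an analogy: it leans on the fact that Python's sum accepts a start value as its second argument, and the variable names are made up for the example.

```python
# Toy inputs mirroring the two lists passed to mapply above.
groups = [[1, 3, 5], [2, 4, 2]]
starts = [10, 100]

# map with two iterables calls sum(group, start) pairwise, like mapply.
pairwise = list(map(sum, groups, starts))

# Equivalent spelling with zip, which makes the pairing explicit.
pairwise_zip = [sum(group, start) for group, start in zip(groups, starts)]

# With a single iterable this collapses to the sapply-style call.
single = list(map(sum, groups))

print(pairwise)      # [19, 108]
print(pairwise_zip)  # [19, 108]
print(single)        # [9, 8]
```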


Next, apply applies a function to what R refers to as an array; in the two-dimensional case that is a matrix, which is what it looks like to me (a data frame such as mtcars is coerced to a matrix first). It can apply the function to entire rows or entire columns; for some obscure reason, applying to rows requires MARGIN to be set to 1, while applying to columns requires it to be 2.

Given

library(datasets)
data(mtcars)

R: apply(mtcars, 2, mean)['mpg']
SQL: select avg(mpg) from mtcars;
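
To make the MARGIN argument a bit more concrete, here is a minimal Python sketch. The apply_margin helper is hypothetical (it is not a real library function), and the table holds made-up toy numbers rather than the actual mtcars values.

```python
from statistics import mean

# Made-up toy table, NOT the real mtcars values; columns are mpg and cyl.
columns = ["mpg", "cyl"]
rows = [
    [20.0, 4.0],
    [30.0, 4.0],
    [10.0, 8.0],
]

def apply_margin(table, margin, fn):
    """Hypothetical helper: margin 1 applies fn to each row, 2 to each column."""
    if margin == 1:
        return [fn(row) for row in table]
    if margin == 2:
        # zip(*table) transposes the rows into columns.
        return [fn(col) for col in zip(*table)]
    raise ValueError("margin must be 1 (rows) or 2 (columns)")

# Column means keyed by name, roughly like apply(mtcars, 2, mean)['mpg'].
col_means = dict(zip(columns, apply_margin(rows, 2, mean)))
row_means = apply_margin(rows, 1, mean)

print(col_means["mpg"])  # 20.0
print(row_means)         # [12.0, 17.0, 9.0]
```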


Finally, tapply groups one thing by another, then applies a function to the groups.

Again using mtcars,

R: tapply(mtcars[['mpg']], mtcars[['cyl']], mean)
SQL: select cyl, avg(mpg) from mtcars group by cyl;
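
The group-then-apply step can be sketched in plain Python as follows. The tapply function here is a hypothetical stand-in I wrote for illustration, and the values are made-up toy numbers, not the real mtcars columns.

```python
from collections import defaultdict
from statistics import mean

# Made-up toy values, NOT the real mtcars data.
mpg = [20.0, 30.0, 10.0, 14.0]
cyl = [4, 4, 8, 8]

def tapply(values, groups, fn):
    """Hypothetical helper: bucket values by group label, then apply fn per bucket."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    # Sort by group label so the result order is predictable.
    return {group: fn(bucket) for group, bucket in sorted(buckets.items())}

mean_mpg_by_cyl = tapply(mpg, cyl, mean)
print(mean_mpg_by_cyl)  # {4: 25.0, 8: 12.0}
```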

@tdsmith
Owner

tdsmith commented Jan 21, 2014

This deserves mention! I've been avoiding these because I hate them and I haven't figured them out yet; all the R I've been writing has involved processing data frames with plyr, which sidesteps the issue. :| Thanks for the primer.
