create general na functions for treedata #25

lukejharmon · 2014-09-18T13:24:35Z

I think we need three cases: single column (checks for NAs, removes from data and tree as needed); pairwise (removes any taxa not present in BOTH, for things like PGLS); and multivariate (removes any incomplete taxa, for things like phyloPCA).
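A minimal sketch of those three cases in base R, assuming the trait data is a data frame with taxa as row names. The helper name `complete_taxa` and the toy data are hypothetical and not part of aRbor. Pruning the tree to the returned taxa (for example with `drop.tip` from ape) is left out so the block stays self-contained.

```r
# Hypothetical helper: names of the taxa that have no NAs in the chosen columns.
complete_taxa <- function(dat, cols = names(dat)) {
  keep <- stats::complete.cases(dat[, cols, drop = FALSE])
  rownames(dat)[keep]
}

# Toy data, invented for illustration only.
dat <- data.frame(
  SVL  = c(5.1, NA, 4.8, 6.0),
  mass = c(10.2, 11.0, NA, 12.5),
  limb = c(2.0, 2.1, 1.9, NA),
  row.names = c("sp_a", "sp_b", "sp_c", "sp_d")
)

single   <- complete_taxa(dat, "SVL")             # single column
pairwise <- complete_taxa(dat, c("SVL", "mass"))  # present in BOTH columns
multi    <- complete_taxa(dat)                    # complete across all columns
```

Each call returns the taxa to keep; any tip not in that vector would be dropped from the tree before the analysis.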

lukejharmon · 2014-09-18T13:25:09Z

@uyedaj thoughts?

lukejharmon · 2014-09-18T13:26:07Z

Actually this approach won't work with the lapply approach that is used by aceArbor and others. Ergh.

uyedaj · 2014-09-18T13:34:40Z

If we want to feed in an entire data frame and run it for each column, then the functions need to take care of each column individually (as done in aceArbor), but the other possibilities can be filtered using treeplyFilter right now. We could automate this process with a friendlier wrapper that lets you filter selected columns for NAs and choose a Boolean operator ("or" or "and"). That way we could cover the possibilities you list.
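One way such a wrapper could look, sketched in base R on a plain data frame rather than on a treedata object. The name `filter_na_cols` is hypothetical, and the reading of the operators is an assumption: "and" keeps taxa with data in every selected column, "or" keeps taxa with data in at least one. This does not use or reproduce treeplyFilter.

```r
# Hypothetical wrapper: filter rows (taxa) by NA status of selected columns.
filter_na_cols <- function(dat, cols, op = c("and", "or")) {
  op <- match.arg(op)
  # Logical matrix: TRUE where a value is present.
  present <- !is.na(dat[, cols, drop = FALSE])
  keep <- if (op == "and") {
    rowSums(present) == length(cols)  # data in all selected columns
  } else {
    rowSums(present) > 0              # data in at least one selected column
  }
  dat[keep, , drop = FALSE]
}

# Toy data, invented for illustration only.
dat <- data.frame(
  SVL  = c(5.1, NA, 4.8, NA),
  mass = c(10.2, 11.0, NA, NA),
  row.names = c("sp_a", "sp_b", "sp_c", "sp_d")
)

both   <- filter_na_cols(dat, c("SVL", "mass"), op = "and")
either <- filter_na_cols(dat, c("SVL", "mass"), op = "or")

# Per-column handling, in the spirit of the lapply pattern mentioned above.
per_column <- lapply(c("SVL", "mass"), function(col) filter_na_cols(dat, col))
```

Passing a single column reproduces the per-column case, so the same function could serve both the lapply-style callers and the pairwise or multivariate ones.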

lukejharmon · 2014-09-18T13:39:15Z

Yeah, that's great. The filtering is not really being used heavily in aceArbor right now, which may be behind persistent issues like #15.

curtislisle · 2014-09-18T14:01:30Z

Would it be best if we coordinated this character & column management between the Romanesco and aRbor layers? I understand if you guys want to standardize it all at the R level to allow aRbor to be functional outside of Arbor proper, just wondering…


uyedaj · 2014-09-18T14:23:58Z

Yes, so here is my thought:

I think we need both. I think it's great if we have common operations on data frames and trees available at the aRbor level (like eliminating NAs, filtering by category, selecting rows by condition, etc.).

These are duplicated right now in my treeplyr functions, and I don't think treeplyr should replace these most of the time. Where the treeplyr functions are really useful is that they can take any R expression, or combination of R expressions, to filter, select, mutate, or apply a function to a data frame/tree/tree+data.frame. This lets an aRbor user quickly apply a function to their data that we wouldn't want to implement as a standalone function, because it would be too idiosyncratic to their particular purpose (e.g. 'if(island=='Cuba') {SVL * 10}' because your collaborator who measured Cuban anoles measured in centimeters rather than millimeters). Having a specific function for every imaginable operation isn't feasible.
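For illustration, the Cuba example can be written as a vectorized expression in base R. This is only a sketch on a plain data frame with invented values; the column name `SVL_mm` is hypothetical, and the treeplyr call that would wrap such an expression is not shown.

```r
# Toy data, invented for illustration: Cuban SVL recorded in cm, the rest in mm.
dat <- data.frame(
  island = c("Cuba", "Hispaniola", "Cuba"),
  SVL    = c(4.5, 52, 6.0),
  row.names = c("sp_a", "sp_b", "sp_c")
)

# Convert only the Cuban measurements from centimeters to millimeters.
dat$SVL_mm <- with(dat, ifelse(island == "Cuba", SVL * 10, SVL))
```

`ifelse` is used because a bare `if` tests only a single value, while the condition here has to be evaluated for every row.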

curtislisle · 2014-09-18T14:43:30Z

Agreed. I like the flexibility of having the power at both levels. To me, it seems like Arbor will gradually evolve into having different “collections” of operations. Some will be simple wrappers above the treeplyr/aRbor/rotl layer and others might be more involved at the work step algorithm level. This way there would be simple block collections and “power user” block collections available.

A takeaway for your standup talks today could be a discussion of how to create these separate “collections” of operations.


lukejharmon mentioned this issue Sep 30, 2014

functions not working in Arbor #31

Closed
