create general na functions for treedata #25

Open
lukejharmon opened this issue Sep 18, 2014 · 7 comments

Comments

@lukejharmon
Member

I think we need three cases: single column (checks for NAs, removes from data and tree as needed); pairwise (removes any taxa not present in BOTH, for things like PGLS); and multivariate (removes any incomplete taxa, for things like phyloPCA).
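A rough base R sketch of the three cases, in case it helps the discussion. `complete_taxa` is a hypothetical helper, not an existing aRbor function, and it assumes the row names of the data frame are the taxon labels. It only returns the taxa to keep; pruning the tree is left as a comment.

```r
# Hypothetical helper (not part of aRbor): return the taxa that have no NAs
# in the selected columns. Row names are assumed to be taxon labels.
complete_taxa <- function(dat, cols = names(dat)) {
  keep <- stats::complete.cases(dat[, cols, drop = FALSE])
  rownames(dat)[keep]
}

# Made-up example data, for illustration only.
dat <- data.frame(
  SVL  = c(50, NA, 62, 48),
  mass = c(3.1, 2.8, NA, 2.5),
  tail = c(90, 85, 100, NA),
  row.names = c("sp_a", "sp_b", "sp_c", "sp_d")
)

single   <- complete_taxa(dat, "SVL")             # single column
pairwise <- complete_taxa(dat, c("SVL", "mass"))  # present in BOTH columns
multi    <- complete_taxa(dat)                    # complete across all columns

# Given a phylo object `tree`, the matching tips could then be pruned with
# ape::drop.tip(tree, setdiff(tree$tip.label, single)).
```

Keeping the function to a single `cols` argument means the three cases differ only in which columns are passed in.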

@lukejharmon
Member Author

@uyedaj thoughts?

@lukejharmon
Member Author

Actually this approach won't work with the lapply approach that is used by aceArbor and others. Ergh.

@uyedaj
Contributor

uyedaj commented Sep 18, 2014

If we want to feed in an entire data frame and run it for each column, then the functions need to take care of each column individually (as done in aceArbor), but the other possibilities can be filtered using treeplyFilter right now. We could automate this process with a friendlier wrapper that lets you pick the columns to filter for NAs and the Boolean operator ("or" or "and") used to combine them. That way we could cover the possibilities you list.
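One way such a wrapper might look, sketched in base R. `filter_na` and its `op` argument are hypothetical names, and this sketch does not call treeplyFilter. It assumes "and" means every selected column must be non-NA, and "or" means at least one must be.

```r
# Hypothetical wrapper: keep rows (taxa) based on NAs in the selected columns.
# op = "and": all selected columns must be non-NA.
# op = "or":  at least one selected column must be non-NA.
filter_na <- function(dat, cols, op = c("and", "or")) {
  op <- match.arg(op)
  present <- !is.na(dat[, cols, drop = FALSE])  # logical matrix, taxa x columns
  n_present <- rowSums(present)
  keep <- if (op == "and") n_present == length(cols) else n_present > 0
  dat[keep, , drop = FALSE]
}

# Made-up example data, for illustration only.
dat <- data.frame(
  SVL  = c(50, NA, 62, 48, NA),
  mass = c(3.1, 2.8, NA, 2.5, NA),
  row.names = c("sp_a", "sp_b", "sp_c", "sp_d", "sp_e")
)

both   <- filter_na(dat, c("SVL", "mass"), op = "and")
either <- filter_na(dat, c("SVL", "mass"), op = "or")
```

Here `both` covers the pairwise case (e.g. PGLS), while `either` only drops taxa with no data at all in the selected columns.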

@lukejharmon
Member Author

Yeah, that's great. The filtering is not really being used heavily in aceArbor right now, which may explain persistent issues like #15.

@curtislisle
Member

Would it be best if we coordinated this character & column management between the Romanesco and aRbor layers? I understand if you guys want to standardize it all at the R level to allow aRbor to be functional outside of Arbor proper, just wondering…


@uyedaj
Contributor

uyedaj commented Sep 18, 2014

Yes, so here is my thought:

I think we need both. I think it's great if we have common operations on data frames and trees available at the aRbor level (like eliminating NAs, filtering by category, selecting rows by condition, etc.).

These are duplicated right now in my treeplyr functions, and I don't think treeplyr should replace these most of the time. Where the treeplyr functions are really useful is that they can take any R expression, or combination of R expressions, to filter, select, mutate, or apply a function to a data frame, a tree, or a tree plus data frame. This lets the aRbor user quickly apply a function to their data that we wouldn't want to implement as a standalone function, because it would be too idiosyncratic to their particular purpose (e.g. 'if(island=="Cuba") {SVL * 10}' because your collaborator who measured Cuban anoles measured in centimeters rather than millimeters). Having a specific function for every imaginable operation isn't feasible.
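For reference, the unit-conversion example written out as a vectorised base R expression. The `anoles` data frame and its values are made up, and this deliberately uses plain `ifelse()` rather than any treeplyr call, since `if()` on its own only tests a single value.

```r
# Made-up data: SVL for Cuban species was recorded in centimeters,
# everything else in millimeters.
anoles <- data.frame(
  island = c("Cuba", "Jamaica", "Cuba"),
  SVL    = c(5, 48, 6.5),
  row.names = c("sp_a", "sp_b", "sp_c")
)

# Convert only the Cuban measurements to millimeters.
anoles$SVL_mm <- ifelse(anoles$island == "Cuba", anoles$SVL * 10, anoles$SVL)
```

This is the kind of one-off expression that is easier to pass in by the user than to ship as its own function.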

@curtislisle
Member

Agreed. I like the flexibility of having the power at both levels. To me, it seems like Arbor will gradually evolve into having different “collections” of operations. Some will be simple wrappers above the treeplyr/aRbor/rotl layer and others might be more involved at the work step algorithm level. This way there would be simple block collections and “power user” block collections available.

A takeaway for your standup talks today could be how to create these separate “collections” of operations.

