Reduce package dependencies (e1071, and randomForest)? #180

Open
pa-nathaniel opened this issue Mar 5, 2023 · 2 comments

Comments

@pa-nathaniel
Collaborator

pa-nathaniel commented Mar 5, 2023

I am struggling to install FFTrees on an M2 Mac because randomForest fails to install there (apparently a problem with one of its dependencies). It is really frustrating, and it feels like a shame to have all of the great FFTrees functionality gated on being able to install randomForest, which is only used as a competitive algorithm.

This makes me wonder: what would the pros and cons of reducing dependencies be? Including non-essential dependencies is generally discouraged, and the more I think about it, randomForest and the other packages used as competitive algorithms are definitely not essential for seeing the benefits of FFTrees.

How about removing randomForest, and maybe e1071 (for svm()), as dependencies and just using CART (rpart::rpart()) and logistic regression (LR) as competitive algorithms?

I feel like 99.9% of users won't miss it and it could reduce the barrier to entry.

@hneth what do you think?
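
For illustration, here is a minimal sketch of what a comparison limited to CART and logistic regression might look like outside of FFTrees. It assumes only rpart (a recommended package that ships with standard R distributions) and base R's glm(). The mtcars data, the formula, and the 0.5 cutoff are arbitrary choices for this example and do not reflect how FFTrees implements its competitive algorithms internally.

```r
# Sketch only: data set, predictors, and cutoff are illustrative assumptions.
library(rpart)

# Binary outcome: transmission type (0 = automatic, 1 = manual)
dat <- transform(mtcars, am = factor(am))

# CART via rpart::rpart()
cart_fit <- rpart(am ~ hp + wt, data = dat, method = "class")
cart_pred <- predict(cart_fit, newdata = dat, type = "class")

# Logistic regression via stats::glm()
lr_fit <- glm(am ~ hp + wt, data = dat, family = binomial())
lr_prob <- predict(lr_fit, newdata = dat, type = "response")
lr_pred <- factor(ifelse(lr_prob > 0.5, "1", "0"), levels = levels(dat$am))

# In-sample accuracy of each model (no train/test split in this sketch)
cart_acc <- mean(cart_pred == dat$am)
lr_acc <- mean(lr_pred == dat$am)
print(c(cart = cart_acc, lr = lr_acc))
```

Neither model needs randomForest or e1071, so a benchmark along these lines would not add installation requirements beyond rpart.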

pa-nathaniel changed the title from "Reduce package dependencies? Remove some competitive algorithm functionality?" to "Reduce package dependencies (e1071, and randomForest)?" on Mar 5, 2023
@hneth
Collaborator

hneth commented Mar 6, 2023

I haven't experienced this barrier, so far, but you're raising a valid and important point, of course.

So far, I've been viewing the competitive algorithms in the FFTrees package as a nice add-on with more benefits than costs. Given the lack of a generally accepted gold standard and the availability of a vast range of possible classification strategies, it's crucial to compare the performance of FFTs to some alternative models. The current range links and contrasts our seemingly naive trees with fancier methods typically associated with buzz-words like "statistical modeling" and "machine learning". And while I suspect that many users appreciate the automatic availability of such performance benchmarks, it's highly undesirable when enabling these benchmarks prevents them from installing and using FFTrees.

Hence, perhaps the key questions and trade-offs here are:

  1. What proportion of users is lost due to such dependency issues?
  2. Would winning them back outweigh the cost to existing users of having to create their own benchmarks?
  3. What other costs do we incur by excluding or including the benchmarks?

With regard to 3: Beyond their technical demands, another critical issue with highly sophisticated alternative benchmarks is that our default usage often fails to exploit their full capacity. This is unavoidable and to be expected, as we're not even trying to optimize the performance of those algorithms. But when someone then finds a superior solution (e.g., by using RLR instead of LR), enthusiasts of those alternative algorithms (or skeptics of FFTs) may construe our omission as a general argument against simpler strategies. Hence, removing non-optimized alternatives could also preempt accusations that our competition is not "fair" or "objective" (which may often be justified, not because of bias or malice, but simply because we're devoting more attention and effort to our favored model than to its alternatives).

@ndphillips
Owner

I take your points. I suspect most people who want to compare the effectiveness of FFTrees to other algorithms should be using packages built for that purpose (such as tidymodels and parsnip) rather than using the (somewhat hacky) solutions we built into this package.

I think it would be wise to:

  1. Remove all competing algorithms from FFTrees
  2. Update version number to a new minor (or major?) number to indicate this is a breaking change
  3. Provide a link to other packages and recommended workflows if people want to do a comparison between FFTrees and other algorithms.

I'll create a PR for this, but since it's a major change, I won't merge it until I get a review from @hneth.
