superClass Update #60

Open
bkavlak opened this issue Mar 26, 2020 · 0 comments

bkavlak commented Mar 26, 2020

Hi!

Ned Horning may have mentioned me. I checked superClass.R, and it is statistically quite rich. I have been using the random forest model in my crop type classification projects for a year, and the biggest challenge is performance, since I work with multi-temporal data (e.g., more than 250 variables). Here are my suggestions:

  • Add the ranger random forest implementation, which is much faster than RF. (ranger may lack some of the statistical capabilities that RF has, so it could be added as an option rather than as a replacement.)

  • I'd suggest separating the data extraction, training, and prediction phases, since the user may want to try GPU computing with newer packages such as h2o4gpu, which is much faster than ranger if you have an NVIDIA GPU. Adding such packages directly could be risky, since they have many dependencies. Also, I think that a single function that does everything gets in the way of user engagement and makes it harder to understand what is going wrong. This function assumes the user is an expert on the subject. (I just want to point this out in case it is not intended.)

  • I also have problems with CPU parallelization on my Ubuntu machine. If you agree, I can try to rewrite some parts in a functional programming style with the furrr package, which is a parallel version of the purrr package. I'm quite new to package development, so I may be missing something that matters for a package that everybody uses. (Examples for data extraction here: Functional Training Collection nedhorning/RandomForestForRemoteSensing#2)

  • Lastly, since processing takes a long time, console output that shows the current step and the overall progress would be very helpful. For example, the function has been running for one day, and I can't estimate how much time is left.
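
To make the suggestions above more concrete, here is a minimal sketch of how the three phases could be split, with ranger for training and console messages for progress. This is not how superClass currently works. The function names (`extract_training_data`, `train_model`, `predict_in_chunks`) are hypothetical, and `iris` is only a stand-in for pixel values extracted from a raster stack.

```r
library(ranger)

# Phase 1: data extraction (hypothetical helper).
# Assumption: in practice this would extract raster values under the
# training polygons; here the built-in iris data plays that role.
extract_training_data <- function() {
  message("[1/3] Extracting training data ...")
  iris
}

# Phase 2: training with ranger, reporting sample count and elapsed time.
train_model <- function(train_df, response = "Species",
                        num_trees = 500, num_threads = 2) {
  message("[2/3] Training ranger on ", nrow(train_df), " samples and ",
          ncol(train_df) - 1, " predictors ...")
  t0 <- Sys.time()
  fit <- ranger(
    dependent.variable.name = response,
    data = train_df,
    num.trees = num_trees,
    num.threads = num_threads,
    seed = 42
  )
  elapsed <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
  message(sprintf("      training finished in %.1f s", elapsed))
  fit
}

# Phase 3: prediction in chunks, so progress can be printed after each chunk.
predict_in_chunks <- function(fit, new_df, chunk_size = 50) {
  starts <- seq(1, nrow(new_df), by = chunk_size)
  out <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    rows <- starts[i]:min(starts[i] + chunk_size - 1, nrow(new_df))
    out[[i]] <- predict(fit, data = new_df[rows, , drop = FALSE])$predictions
    message(sprintf("[3/3] Predicted chunk %d of %d", i, length(starts)))
  }
  # unlist() keeps the factor class when all list elements are factors
  unlist(out)
}

train_df <- extract_training_data()
fit <- train_model(train_df, num_trees = 100)
pred <- predict_in_chunks(fit, train_df[, names(train_df) != "Species"])
```

With the phases separated like this, the user could replace only `train_model` with a GPU-based backend while keeping the same extraction and prediction steps, and the chunk loop is also a natural place to parallelize later (for example with furrr).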

Working on such issues is good practice for me. If we can agree on how to proceed, I can start looking into what I can do.

Thanks for the package!
Batuhan
