Training speed of Regression Forest #45

Open
GoogleCodeExporter opened this issue Mar 7, 2016 · 1 comment

Comments

@GoogleCodeExporter

First, thank you very much for this wonderful software!

I notice that for the same number of samples and features, when the only 
difference is the label type (so one problem is classification and the other 
is regression), building the regression forest takes considerably longer than 
building the classification forest. Both runs use the default msplit and the 
same ntrees, and we also estimate variable importance along the way. Is there 
a reason for this?

Thanks a lot!


Original issue reported on code.google.com by KangD...@gmail.com on 27 Sep 2012 at 8:15

@GoogleCodeExporter

Hi Kang

Yes, there is a difference between the regression and classification code. 
When creating a tree you need to split the data, but before splitting you need 
to sort the data falling into a node. The classification code uses a 
pre-sorted array, which makes it scale as O(n) in the number of examples n, 
whereas the regression code sorts on the fly, which makes it scale as 
O(n log n), the best scaling a comparison sort can achieve.
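
To make the difference concrete, here is a minimal Python sketch of the two strategies. It is not the package's actual code; the function names and the toy data are made up for illustration. It assumes the pre-sorted approach sorts each feature once and then keeps the order through every split with a stable partition, while the on-the-fly approach re-sorts the samples of each node.

```python
import random

def split_presorted(order, goes_left):
    """Partition an index list that is already sorted by feature value.

    The partition is stable, so both children stay sorted and no
    re-sort is needed: one O(n) pass per node.
    """
    left = [i for i in order if goes_left[i]]
    right = [i for i in order if not goes_left[i]]
    return left, right

def sort_on_the_fly(indices, feature):
    """Sort the samples of a node from scratch: O(n log n) per node."""
    return sorted(indices, key=lambda i: feature[i])

random.seed(0)
n = 1000
feature = [random.random() for _ in range(n)]
other = [random.random() for _ in range(n)]  # hypothetical second feature

# Pre-sorted strategy: sort once, before any tree is grown.
order = sorted(range(n), key=lambda i: feature[i])

# Split the root on the other feature, then hand sorted lists to the children.
goes_left = [v < 0.5 for v in other]
left, right = split_presorted(order, goes_left)

# On-the-fly strategy: each child would sort its own samples again.
left_resorted = sort_on_the_fly(left, feature)
right_resorted = sort_on_the_fly(right, feature)
```

Both strategies give the children the same sorted order; the pre-sorted one just gets there with a linear pass instead of a new sort at every node.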

I am guessing you have lots of examples, and that is one reason regression 
might be slower.

The other reason might be that the regression trees are split all the way 
down (i.e. leaf nodes hold the minimum number of examples), whereas your 
classification trees might be much simpler (a low VC dimension).

Calculate the mean number of nodes per tree in the models you created; that 
might give you a better idea:
mean(modelRf.ndbigtree) (classification)
mean(modelRf.ndtree) (regression)
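
As a rough illustration of why the node counts can differ so much, here is a toy Python sketch. It is not how the package grows trees: it assumes a single feature, a median split instead of a best-split search, and a hypothetical `min_leaf` stopping size. A node stops splitting when it is pure or small enough, and continuous targets are almost never pure.

```python
import random

def count_nodes(xs, ys, min_leaf=5):
    """Count the nodes of a toy 1-D tree that always splits at the median x.

    A node becomes a leaf when all its targets are equal (pure) or it
    holds at most `min_leaf` samples.
    """
    if len(ys) <= min_leaf or len(set(ys)) == 1:
        return 1
    pairs = sorted(zip(xs, ys))
    mid = len(pairs) // 2
    left_x, left_y = zip(*pairs[:mid])
    right_x, right_y = zip(*pairs[mid:])
    return (1 + count_nodes(left_x, left_y, min_leaf)
              + count_nodes(right_x, right_y, min_leaf))

random.seed(0)
xs = [random.random() for _ in range(1024)]
class_labels = [int(x > 0.5) for x in xs]               # two classes
real_targets = [x + random.gauss(0, 0.1) for x in xs]   # continuous targets

n_class = count_nodes(xs, class_labels)
n_reg = count_nodes(xs, real_targets)
print(n_class, n_reg)
```

The classification tree stops early wherever a node is already pure, while the regression tree keeps splitting until the size limit is reached, so it ends up with far more nodes to build.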



Original comment by abhirana on 27 Sep 2012 at 10:41
