Slow performance while running predict for quantile regression #1308

Open
gautham2492 opened this issue May 25, 2023 · 1 comment

@gautham2492

I am using the quantile forest in the grf package on a dataset of around 12 million records. I have a couple of questions:

  • The model trains fine, i.e., in around 1 hour with 300 trees (I am running it on a Databricks cluster with 192 GB RAM and 48 cores). However, prediction takes an extremely long time (around 1 full day for 200k records). Why is this, and is there a way I can speed it up? My model has only 4 features.

  • Is there any way to do hyperparameter tuning on the quantile forest? I notice that grf's tuning is available for regression forests but not for quantile forests.

Any leads would be helpful.

@Ri0016

Ri0016 commented May 30, 2023

Hi gautham,

I'm not a member of the grf lab, but I work on random forest algorithms. The following is my guess at an explanation of your situation.

It depends on the tree size (the number of nodes in each tree, or the depth of each tree).

Suppose the average depth of each tree is d.
The complexity of constructing a forest with B trees is O(B * mtry * 2^d).
The complexity of predicting one point is O(B * d).

Quantile estimation also adds a sorting cost. So the complexity of predicting one point is O(B * d * n * log(n)), where n is the sample size. In your case n is very large, and I think this is the key point.
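To illustrate where a sorting step can enter quantile prediction, here is a minimal Python sketch of a weighted quantile. It is not grf's implementation: the function name `weighted_quantile` and the idea of passing per-sample forest weights as a plain list are assumptions made for illustration only.

```python
from typing import Sequence


def weighted_quantile(outcomes: Sequence[float],
                      weights: Sequence[float],
                      q: float) -> float:
    """Return the smallest outcome whose cumulative weight reaches q.

    Hypothetical helper: `weights` stands in for the weights a forest
    might assign to training samples for one test point.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")

    # Keep only samples with positive weight, then sort by outcome.
    # This sort is the O(m log m) step, where m is the number of
    # positively weighted samples.
    pairs = sorted((y, w) for y, w in zip(outcomes, weights) if w > 0)
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    # Walk the sorted outcomes until the cumulative weight reaches q.
    cumulative = 0.0
    for y, w in pairs:
        cumulative += w
        if cumulative >= q * total:
            return y
    return pairs[-1][0]
```

In this sketch the sort runs once per test point, so its cost grows with the number of training samples that receive nonzero weight for that point.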

I think d is not very large (less than 20) in your situation. Then B * d * 200k is greater than B * mtry * 2^d, so it is reasonable for prediction to be slower than construction.
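The comparison can be checked with a few lines of Python. This is only arithmetic on the two expressions quoted in this comment, ignoring constants and the sorting term; B = 300 and 200k test records come from the question, while mtry = 4 (all four features) and the listed depths are assumptions.

```python
def build_cost(num_trees: int, mtry: int, depth: int) -> int:
    """Operation count B * mtry * 2^d for constructing the forest."""
    return num_trees * mtry * 2 ** depth


def predict_cost(num_trees: int, depth: int, num_test_points: int) -> int:
    """Operation count B * d * (number of test points) for prediction."""
    return num_trees * depth * num_test_points


B = 300           # trees, from the question
MTRY = 4          # assumed: at most the 4 available features
N_TEST = 200_000  # test records, from the question

# Print both counts for a few assumed average depths.
for d in (10, 15, 19, 20):
    print(d, build_cost(B, MTRY, d), predict_cost(B, d, N_TEST))
```

Under these assumed values the prediction count is the larger of the two for depths below 20, and the two counts are of similar size around d = 20.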

There is also another possible explanation: you may have forgotten to pass a value to num.threads.

Hope this helps.
