Optimised tree growing method #74

Open
ArashBayatDev opened this issue Apr 10, 2018 · 1 comment

Comments

@ArashBayatDev
Collaborator

I recommend the following improvements to the VariantSpark Random Forest importance analysis.

  1. Compute the importance scores and write them to a file after every 1000 trees built.

  2. Automatically identify when enough trees have been built. If the first suggestion is implemented, we can compare the importance scores at each step (every 1000 trees built) with the scores computed at the previous step. If little has changed, we can stop building more trees.

  3. Frequently (every -rbs trees) dump the models (built trees) to disk, and allow previously built models to be integrated into a new run. If the process crashes halfway, the models produced so far can be used in the next run.
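To make suggestions 1 and 2 concrete, here is a minimal Python sketch of the stopping rule. It is not VariantSpark code. `importance_after` is a hypothetical callback standing in for "build trees until n exist and return the importance score per variable", and both the relative L1 change metric and the `tol` threshold are assumptions, since the issue does not say how "little change" should be measured.

```python
from typing import Callable, Dict, Optional


def relative_change(prev: Dict[str, float], curr: Dict[str, float]) -> float:
    """L1 distance between two importance snapshots, scaled by the current one."""
    keys = set(prev) | set(curr)
    diff = sum(abs(curr.get(k, 0.0) - prev.get(k, 0.0)) for k in keys)
    scale = sum(abs(v) for v in curr.values())
    return diff / scale if scale > 0 else float("inf")


def grow_until_stable(
    importance_after: Callable[[int], Dict[str, float]],
    batch_size: int = 1000,
    max_trees: int = 100_000,
    tol: float = 0.01,
) -> int:
    """Return the number of trees built when the importance scores settle.

    importance_after(n) is a hypothetical callback (an assumption of this
    sketch): it builds trees until n exist in total and returns the
    importance score of every variable.
    """
    prev: Optional[Dict[str, float]] = None
    n_trees = 0
    while n_trees < max_trees:
        n_trees += batch_size
        curr = importance_after(n_trees)
        # A real run would also write `curr` to a file here (suggestion 1).
        if prev is not None and relative_change(prev, curr) < tol:
            break  # little change since the previous step: stop building
        prev = curr
    return n_trees
```

A rank-based comparison (for example, checking whether the top-k variables keep the same order) might suit importance analysis better than a magnitude-based one. The L1 metric is used here only because it keeps the sketch short.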

@BauerLab changed the title from "Improvement" to "Optimised tree growing method" on Jan 25, 2019
@Yatish0833
Collaborator

Next step: update the VariantSpark code locally so that it frequently (every -rbs trees) dumps the models (built trees) to disk. These dumps will form a test dataset for testing the hypothesis above. The test results will be posted on this thread so that everyone involved can agree on whether or not to move forward with this feature.
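As a rough illustration of the dump-and-resume behaviour discussed here, the sketch below writes each batch of trees to disk and reloads earlier batches at start-up. Everything in it is an assumption made for illustration: `build_tree` is a hypothetical callback, trees are taken to be JSON-serialisable, the `batch_*.json` file layout is invented, and `batch_size` simply plays the role of the -rbs value. It does not reflect how VariantSpark stores its models.

```python
import json
from pathlib import Path
from typing import Any, Callable, List


def load_checkpoints(ckpt_dir: Path) -> List[Any]:
    """Collect the trees from every batch file left behind by earlier runs."""
    trees: List[Any] = []
    for path in sorted(ckpt_dir.glob("batch_*.json")):
        trees.extend(json.loads(path.read_text()))
    return trees


def grow_with_checkpoints(
    build_tree: Callable[[int], Any],  # hypothetical: builds the i-th tree
    n_trees: int,
    batch_size: int,
    ckpt_dir: Path,
) -> List[Any]:
    """Build n_trees in total, dumping every batch and reusing earlier dumps."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    trees = load_checkpoints(ckpt_dir)  # resume from a previous (crashed) run
    while len(trees) < n_trees:
        start = len(trees)
        stop = min(start + batch_size, n_trees)
        batch = [build_tree(i) for i in range(start, stop)]
        # Write to a temporary name first, then rename, so that a crash
        # cannot leave a half-written batch file for the next run to load.
        final = ckpt_dir / f"batch_{start:08d}.json"
        tmp = ckpt_dir / (final.name + ".tmp")
        tmp.write_text(json.dumps(batch))
        tmp.replace(final)
        trees.extend(batch)
    return trees
```

If a run dies part-way through a batch, only that unfinished batch is lost. The next call with the same `ckpt_dir` picks up from the last complete dump.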

2 participants