Hierarchical sampling of data? #57

Open
GoogleCodeExporter opened this issue Mar 7, 2016 · 3 comments

Comments

@GoogleCodeExporter

I am wondering if it is possible to incorporate hierarchical sampling of the 
data into the random forest. 

Essentially, I have multiple observations acquired from the same subject, which 
means that the out-of-bag estimates are not necessarily independent of the 
training data: a row can be out of bag while other rows from the same subject 
are in bag. I'm having trouble re-calibrating the model using out-of-bag 
predictions because of this.
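
To make the concern concrete, here is a minimal sketch in plain Python (not the toolbox's MATLAB or C code). It assumes an ordinary row-level bootstrap with replacement, and the function name `oob_subject_overlap` is made up for illustration. It measures what fraction of out-of-bag rows belong to a subject that also has at least one in-bag row:

```python
import random

def oob_subject_overlap(subject_ids, seed=None):
    """Fraction of out-of-bag rows whose subject also has an in-bag row.

    Assumes a plain row-level bootstrap (sampling rows with replacement),
    which ignores the subject grouping entirely.
    """
    rng = random.Random(seed)
    n_rows = len(subject_ids)

    # Row-level bootstrap: draw n_rows row indices with replacement.
    inbag_rows = set(rng.choices(range(n_rows), k=n_rows))
    oob_rows = [row for row in range(n_rows) if row not in inbag_rows]
    if not oob_rows:
        return 0.0

    # Subjects that contribute at least one row to the in-bag sample.
    inbag_subjects = {subject_ids[row] for row in inbag_rows}
    shared = sum(1 for row in oob_rows if subject_ids[row] in inbag_subjects)
    return shared / len(oob_rows)
```

With one row per subject the overlap is always zero; the more rows each subject contributes, the closer it gets to one, which is the situation described above.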

I looked at the stratified sampling, but it does not look to be the same as 
what I'm asking for.

Original issue reported on code.google.com by alistair...@gmail.com on 27 Feb 2013 at 1:35

@GoogleCodeExporter

I am guessing that would be possible, but some array in the C code would need 
to be changed.

I think the best approach would be to do the sampling outside, in MATLAB: 
create the in-bag/out-of-bag indices for each tree there, and then make each 
tree sample according to those indices. That way you can separate the sampling 
from the C code.

I am guessing you want the in-bag/out-of-bag indices created as follows: treat 
each subject as a sample, bootstrap sample from the array of subjects, and 
then draw some observations from each selected subject, giving a kind of 
hierarchical sampling.
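
The two-stage scheme described above could be sketched like this. This is plain Python for illustration only, not the toolbox's code; `hierarchical_bootstrap` and `rows_per_subject` are hypothetical names, and drawing rows within a subject with replacement is an assumption, since the thread does not specify it.

```python
import random
from collections import defaultdict

def hierarchical_bootstrap(subject_ids, rows_per_subject=None, seed=None):
    """Return (inbag_counts, oob_rows) for a single tree.

    subject_ids[i] is the subject that row i belongs to.
    inbag_counts[i] is how many times row i was drawn.
    """
    rng = random.Random(seed)

    # Group row indices by subject, keeping first-seen subject order.
    rows_by_subject = defaultdict(list)
    for row, subject in enumerate(subject_ids):
        rows_by_subject[subject].append(row)
    subjects = list(rows_by_subject)

    # Stage 1: bootstrap the subjects (with replacement).
    drawn_subjects = rng.choices(subjects, k=len(subjects))

    # Stage 2: for each drawn subject, bootstrap its own rows.
    inbag_counts = [0] * len(subject_ids)
    for subject in drawn_subjects:
        rows = rows_by_subject[subject]
        k = len(rows) if rows_per_subject is None else rows_per_subject
        for row in rng.choices(rows, k=k):
            inbag_counts[row] += 1

    # A row is out of bag only if its whole subject was never drawn,
    # so no subject appears on both sides.
    drawn = set(drawn_subjects)
    oob_rows = [row for row, subject in enumerate(subject_ids)
                if subject not in drawn]
    return inbag_counts, oob_rows
```

Calling this once per tree with a different seed would give one in-bag count vector and one out-of-bag list per tree, which is the kind of array the trees would then need to accept.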

Anyway, I think it is doable. I am not sure whether you will be up to writing 
some C code; I am a bit held up until the end of April, so I may not be able 
to code it up before then.

https://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Reg_C/src/reg_RF.cpp#386
https://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/src/classRF.cpp#404

I guess you could add an option at those points: use the predefined 
in-bag/out-of-bag indices if such an array is supplied, or fall back to the 
existing sampling path if it is not.
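
As a rough sketch of that switch (plain Python, not the actual C code at the linked lines; `inbag_for_tree` and `predefined_inbag` are hypothetical names, and an ordinary row-level bootstrap stands in for the existing path as an assumption):

```python
import random

def inbag_for_tree(n_rows, tree_index, predefined_inbag=None, seed=None):
    """Per-row in-bag counts for one tree.

    predefined_inbag, if given, holds one list of n_rows counts per tree
    and is used as-is; otherwise rows are bootstrapped as usual.
    """
    if predefined_inbag is not None:
        counts = list(predefined_inbag[tree_index])
        if len(counts) != n_rows:
            raise ValueError("predefined in-bag vector has the wrong length")
        return counts

    # Fallback: ordinary bootstrap over rows, with replacement.
    rng = random.Random(seed)
    counts = [0] * n_rows
    for row in rng.choices(range(n_rows), k=n_rows):
        counts[row] += 1
    return counts
```

Rows with a count of zero are the out-of-bag rows for that tree, so a single count vector per tree is enough to carry both sets.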

Original comment by abhirana on 28 Feb 2013 at 5:52

  • Changed state: Accepted

@GoogleCodeExporter

Yeah, what you said is exactly what I would want to do. Do you know if the 
stratified sampling accepts 0? If so, you could make multiple calls to the C 
function and do a sort of hacky version of the sampling you suggest. Other 
than that, I am not sure of a way to modify the sampling from MATLAB. If I've 
missed something, let me know, because that is definitely an option.

I am trying to avoid coding C, you may have noticed, I promise it's for good 
reasons ;)

Original comment by alistair...@gmail.com on 28 Feb 2013 at 11:12

@GoogleCodeExporter

I apologize for my late reply.

Sorry, it looks like the stratified sampling requires a non-zero value :(
Hmm, it looks like this will need some C coding.

Original comment by abhirana on 9 Mar 2013 at 7:14
