Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document HoeffdingTree #3572

Merged
merged 10 commits into from
Dec 23, 2023
Merged

Document HoeffdingTree #3572

merged 10 commits into from
Dec 23, 2023

Conversation

rcurtin
Copy link
Member

@rcurtin rcurtin commented Dec 11, 2023

Here's rendered documentation: https://www.ratml.org/misc/mlpack-markdown-doc/hoeffding_tree.html

As with a lot of the previous examples, I had to make fairly intrusive changes to Train() to match the API required by the hyperparameter tuner. Because we don't yet require C++17 (where I could use std::optional), there are a lot of overloads. Once we do, the code can be cleaned up.

The only other functionality I added was a user-facing Reset() function to reset the tree, plus a few accessors to get properties of the trees.

I also added a number of tests for everything that I implemented, as well as a test to make sure that training on floating-point data works.

@shrit
Copy link
Member

shrit commented Dec 14, 2023

@rcurtin it is ready to be reviewed ?

@rcurtin
Copy link
Member Author

rcurtin commented Dec 14, 2023

Yep, ready for review 👍

@shrit
Copy link
Member

shrit commented Dec 14, 2023

Perfect, will give it a chance tomorrow

Copy link
Member

@shrit shrit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good, similar comments to the last review of the logistic regression class the following links are not working:
Screenshot 2023-12-15 at 14-20-04 https __www ratml org

Also regarding the fitness function, it would be great to have more details about them if we have many, because currently, we are only mentioning the Gini Impurity.
I am not an expert in trees, so I have no idea how many fitness functions you can have, or what we have already implemented.

Screenshot 2023-12-15 at 18-10-26 https __www ratml org

The last thing is double about the precision, but apart from that great work, we can merge this really soon

Comment on lines +366 to +367
const bool batchTraining,
const double successProbability);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the sucessProbability as ElemType?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do this, mostly so that when you switch to using a different ElemType, as few things change in the API as possible. (That will also make the documentation job in #3584 easier.) For something like the success probability, using a double or a float or whatever other element type really won't make any serious difference, so I think it might be easier to leave as-is.

const arma::Row<size_t>& labels,
const size_t numClasses,
const bool batchTraining,
const double successProbability,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

const size_t numClasses = 0);
const size_t numClasses,
const bool batchTraining,
const double successProbability,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

@rcurtin
Copy link
Member Author

rcurtin commented Dec 19, 2023

Very good, similar comments to the last review of the logistic regression class the following links are not working

Yep, the invalid links will be handled in a follow-up PR. 👍

Also regarding the fitness function, it would be great to have more details about them if we have many, because currently, we are only mentioning the Gini Impurity. I am not an expert in trees, so I have no idea how many fitness functions you can have, or what we have already implemented.

Good point. I added a sentence or two with some links in b74b8bc; let me know what you think. 👍

The last thing is double about the precision, but apart from that great work, we can merge this really soon

In this case, actually the way HoeffdingTree is written internally, the fitness functions should always return double regardless of the ElemType of the original data. This is because the fitness functions for the Hoeffding tree use an arma::Mat<size_t> containing class counts. Like I mentioned in another comment, I don't think it'll make any measurable difference to use double, float, or whatever other type there, so probably just best to leave it as double in my opinion.

@rcurtin rcurtin mentioned this pull request Dec 19, 2023
@shrit
Copy link
Member

shrit commented Dec 21, 2023

All is good from my side, and yes one double scalar value is not an issue, it is usually vectors that can consume memory fast.

Copy link

@mlpack-bot mlpack-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second approval provided automatically after 24 hours. 👍

@rcurtin rcurtin merged commit 45ccc83 into mlpack:master Dec 23, 2023
19 checks passed
@rcurtin rcurtin deleted the hoeffding_trees_doc branch December 23, 2023 17:14
@rcurtin rcurtin mentioned this pull request May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants