Some hyperparameter questions #529
Comments
Hi @alvanli -- Thank you for this question because it's both interesting and something we haven't talked about or documented in detail yet. It could also be important in some circumstances. EBMs are by default no longer a purely cyclic algorithm as described in the original GA2M papers. If you prefer to restore the cyclic nature of EBMs, you can do this easily by setting greediness or greedy_ratio to zero. The current default algorithm is a hybrid that sits somewhere between fully greedy and fully cyclic. I've been calling it "semi-greedy" boosting.

The greediness parameter has been available for some time, but only recently became the default. I'll start chronologically by describing the original greediness parameter, since that's good background for what greedy_ratio is now. In the original cyclic algorithm, a round was defined as a complete sweep of all the features, so if you had 100 features, you'd make 100 boosting steps per round. During each boosting step we'd calculate the per-feature update, but we'd also get, essentially for free, the gain calculation prior to the update. The interesting thing is, since our learning rate is so low, we make small incremental steps and the gain calculation remains relatively valid after applying the boosting update. We can use this by putting the slightly stale gain calculations into a priority queue and then greedily picking which feature to boost next. Generally, assuming correlations aren't too bad, boosting one feature doesn't affect the gain of other features too much, but even so we'd prefer to periodically re-sweep all the features to refresh their gain calculations. We originally did this by intermixing cyclic rounds with greedy rounds: the cyclic rounds handle refreshing the gains, and the greedy rounds allow the algorithm to focus on the most important features.

Why was this algorithm change desirable?
Well, the main issue with purely cyclic boosting was that it tended to overfit some features and underfit others. This new algorithm seems to be better at finding the right amount of boosting to do on each feature. It appears, although we need more time to digest this, that some of the jaggedness on EBM graphs was due to late-stage overfitting of some features which would otherwise be smoother.

The original semi-greedy algorithm kept the number of boosting steps per round fixed. The greediness parameter defined how often to execute a greedy round vs a cyclic round. A good number used to be 0.5, which would alternate between greedy and cyclic rounds. Sometimes you might want 0.66, which would do 1 cyclic round for every 2 greedy rounds.

In the 0.6.0 release we made 2 slight changes to the algorithm. It's clear that the cyclic rounds need to have as many boosting steps as the number of features, since every feature needs to be visited during the round. The greedy rounds, however, do not need to rigidly obey the same restriction. If there are 100 features, we could allow 90 greedy boosting steps, for example. greedy_ratio sets the ratio of the number of boosting steps in the greedy rounds vs the cyclic rounds. So, for example, the value 1.0 is exactly equivalent to the old greediness value of 0.5, since with a ratio of 1.0 you would get 100 cyclic steps intermixed with 100 greedy steps. The default greedy_ratio is now 1.5, which means that if there are 100 features there will be 150 greedy boosting steps between cyclic rounds.

The algorithm described above tends to work well on many datasets, but let's say for whatever reason you don't want to have any cyclic rounds at all.
You can set greedy_ratio to some huge number that basically makes all boosting greedy, but then your algorithm is no longer periodically refreshing the gain calculations through cyclic rounds, which means that the gain calculations for features that aren't being greedily chosen during the greedy rounds get more and more stale.

For boosting, it takes almost the same amount of work to calculate the gain as it does to calculate the update. To refresh the gain calculations we need to pay this cost, but instead of applying the update, we could just do the calculations and then throw away the update. This is what the cyclic_progress parameter is for. The easiest way to think about this parameter is as a boolean that indicates whether we apply the updates during the cyclic rounds or not. If the updates are not applied and greedy_ratio is set to a low value, then the algorithm becomes something a lot closer to what XGBoost does, but still retains the additive nature of EBMs.

Generally, the current default of greedy_ratio set to 1.5 seems to do well on many datasets, and avoids the enormous computational cost that updating the gains on each boosting step would otherwise require.
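To make the schedule above concrete, here is a minimal pure-Python sketch of one way the semi-greedy schedule could work: each block does a cyclic round (which always refreshes gains, and applies updates only when cyclic_progress is on) followed by round(greedy_ratio * n_features) greedy steps driven by a priority queue of slightly stale gains. All names are mine and the gain numbers are fake placeholders; this is illustrative only, not interpret's internals.

```python
import heapq

def boosting_schedule(n_features, n_blocks, greedy_ratio=1.5, cyclic_progress=True):
    """Toy model of semi-greedy boosting (illustrative names, fake gains).

    Returns (updates_applied_per_feature, total_gain_refreshes)."""
    gains = {f: 1.0 for f in range(n_features)}  # stand-in gain estimates
    applied = [0] * n_features
    refreshes = 0
    for _ in range(n_blocks):
        # Cyclic round: visit every feature, always refreshing its gain.
        for f in range(n_features):
            gains[f] = 1.0          # pretend we recomputed the gain from data
            refreshes += 1
            if cyclic_progress:     # otherwise the update is thrown away
                applied[f] += 1
        # Greedy round: repeatedly pop the best (possibly stale) gain,
        # boost that feature, and push back a shrunk gain estimate.
        heap = [(-g, f) for f, g in gains.items()]
        heapq.heapify(heap)
        for _ in range(round(greedy_ratio * n_features)):
            neg_g, f = heapq.heappop(heap)
            applied[f] += 1
            gains[f] = -neg_g * 0.9  # boosting reduces that feature's gain
            heapq.heappush(heap, (-gains[f], f))
    return applied, refreshes
```

With 100 features and greedy_ratio=1.0, each block is 100 cyclic steps plus 100 greedy steps, matching the old greediness=0.5 behavior; with cyclic_progress off, the cyclic rounds still pay the gain-refresh cost but contribute no updates.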
Hi @alvanli -- On the other questions about max_leaves:

The max_leaves parameter is currently only honored when boosting mains. For pairs we make one cut in one dimension and then make cuts on each side of the primary cut for the other dimension. At some point we'll implement the obvious improvement that makes pair cuts more flexible and allows for multiple cuts in each dimension. FAST is currently even more restricted in that it chooses cuts in both dimensions that intersect in a cross. At some point we'll add options to allow for more complex tree growth, but the current FAST algorithm has the benefit of being FAST and easier to implement.

For mains, it appears the best value for max_leaves on many/most datasets is 3. 2 is pretty bad, actually. 4 tends to be a bit worse than 3, but it's generally pretty close, so I added it as a hyperparameter to get feedback, and I could see it perhaps being better on some datasets. Incidentally, I think this property is one of the main advantages that EBMs have over XGBoost and similar algorithms, since XGBoost tends to grow trees until depth 6, which can cause overfitting on the leaf splits. The other benefit that EBMs have over XGBoost is that EBMs consider pair splits jointly, which would be cost prohibitive for unrestricted trees.
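The pair-cut geometry described above can be sketched as a tiny region lookup: one primary cut on the first dimension, then an independent cut on the second dimension within each side. This is illustrative geometry only (names and signature are mine, not interpret's implementation); FAST's cross-shaped cut is the special case where both secondary cuts coincide.

```python
def pair_region(x, y, cut_x, cut_y_left, cut_y_right):
    """Index (0-3) of the region a point falls in: a primary cut on x,
    then a per-side cut on y. cut_y_left == cut_y_right gives FAST's cross."""
    if x < cut_x:
        return 0 if y < cut_y_left else 1
    else:
        return 2 if y < cut_y_right else 3
```

Allowing multiple cuts per dimension, as mentioned above, would generalize this lookup beyond four regions.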
thank you for the always very detailed explanation! I learned a lot reading these tickets 🤗
Hi,

Greediness

I saw that in 0.6.0, greediness was split into greedy_ratio and cyclic_progress. From my understanding, the original greediness meant the proportion of rounds that take the feature split that best minimizes the residual (greedy) vs go through each feature in order (cyclic). Does cyclic_progress then mean the proportion of "boosting cycles that will actively contribute to improving the model's performance"?

Max leaves