Models like GradientBoosting and RandomForest currently do not allow any prior feature importance to be encoded, even though doing so could improve model performance.
Use case: learning from panel data augmented by large embeddings
A current best practice is to reduce the embedding dimensionality with transformations like PCA so that the embedding features do not dominate the model (on small datasets they contribute too much noise), but information is lost. Better would be a learned hyperparameter that controls how frequently these features are considered, in the spirit of max_features.
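For concreteness, a minimal sketch of that workaround (column indices and the PCA size are illustrative):

```python
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

# Illustrative column indices: a 1024-d embedding block plus 10 panel features.
embedding_cols = list(range(0, 1024))
other_cols = list(range(1024, 1034))

# Shrink the embedding block so it no longer dominates the feature pool.
reduce_embeddings = ColumnTransformer([
    ("emb", PCA(n_components=32), embedding_cols),
    ("other", "passthrough", other_cols),
])

model = make_pipeline(reduce_embeddings, RandomForestRegressor())
```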
Approach
This could be made possible simply by increasing the probability of selecting these features whenever candidate features are drawn and max_features < n_features.
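A minimal sketch of that selection step, assuming a hypothetical per-feature weight vector (feature_weights is not an existing scikit-learn parameter):

```python
import numpy as np

def sample_split_candidates(n_features, max_features, feature_weights, rng):
    """Draw the features considered at one split, weighted by prior importance."""
    p = np.asarray(feature_weights, dtype=float)
    p /= p.sum()  # normalize weights to selection probabilities
    return rng.choice(n_features, size=max_features, replace=False, p=p)

rng = np.random.default_rng(0)
# Four features; the last is twice as likely to be offered to the splitter.
print(sample_split_candidates(4, 2, [1.0, 1.0, 1.0, 2.0], rng))
```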
For linear models, prior importance of feature subsets can be expressed before fitting by scaling the features; for example, ColumnTransformer allows transformer_weights to be specified.
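This already works today; a minimal example (column indices and the 0.1 weight are illustrative):

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Down-weight the embedding block relative to the panel features; a linear
# model fitted downstream then effectively penalizes those coefficients more.
ct = ColumnTransformer(
    transformers=[
        ("emb", StandardScaler(), list(range(0, 1024))),
        ("other", StandardScaler(), list(range(1024, 1034))),
    ],
    transformer_weights={"emb": 0.1, "other": 1.0},
)
```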
How can we similarly make tree-based models pay more attention to certain features, given that tree-based models like RandomForest do not depend on feature scaling? The simplest way appears to be feature-wise subsampling (non-uniform, weighted selection of the features considered for splitting) during fit, as sketched above.
Implementation outline
Option 1: As noted above, we can already scale features using ColumnTransformer. Extend its usefulness to tree-based models by allowing estimators to infer feature importance from the relative average variance of each feature. Add a boolean parameter (default=False) to enable this behavior, as in the sketch below.
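A sketch of what Option 1's inference step could compute (hypothetical behavior, not an existing option): after ColumnTransformer has rescaled the columns, per-feature selection probabilities would follow the relative variance:

```python
import numpy as np

def variance_based_weights(X):
    """Hypothetical inference step: selection probabilities from relative variance."""
    var = np.var(np.asarray(X, dtype=float), axis=0)
    return var / var.sum()
```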
Option 2: Extend the tree models' parameters by allowing feature_weights to be specified explicitly as a list with one entry per feature. The probability of subsampling each feature for each tree would be proportional to the specified relative feature_weights.
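A hypothetical API sketch for Option 2 (feature_weights does not exist in scikit-learn today; feature counts are illustrative):

```python
from sklearn.ensemble import RandomForestRegressor

n_emb, n_other = 1024, 10
# One weight per feature: down-weight each embedding dimension.
weights = [0.1] * n_emb + [1.0] * n_other

# Hypothetical parameter -- not part of the current scikit-learn API:
# model = RandomForestRegressor(max_features="sqrt", feature_weights=weights)
```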
Integration consideration
To benefit from sklearn's hyperparameter search abilities (e.g. GridSearchCV), one approach would be to let the user specify, as a prior, the different categories of features whose relative category-level importance should be learned during fit, each with a reasonable weight prior over its feature-category group. Categories are already naturally partitioned when ColumnTransformer is used, so an elegant implementation could reuse this information. Note the distinction: the user would merely specify a prior over what count as different feature categories, rather than specifying their importance. A natural prior, for instance, would be "embeddings" (e.g. for a 1024-length feature vector) and "other" (for the traditional panel data also included as part of X), as in the sketch below.
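A sketch of how that could look from the user's side, again assuming the hypothetical feature_weights parameter from Option 2 (category names and grid values are illustrative). Only the category-level weight is searched; the per-feature expansion is mechanical:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

n_emb, n_other = 1024, 10

def category_weights(emb_weight):
    # Expand one category-level weight into the per-feature list of Option 2.
    return [emb_weight] * n_emb + [1.0] * n_other

param_grid = {
    "feature_weights": [category_weights(w) for w in (0.01, 0.1, 1.0)],
}
# Hypothetical once feature_weights exists:
# search = GridSearchCV(RandomForestRegressor(), param_grid, cv=3)
```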