3. Oblique Random Forests#

In oblique random forests (see the ObliqueRandomForestClassifier and ObliqueRandomForestRegressor classes), each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. The construction of an oblique random forest is otherwise the same as that of a random forest, except in how the candidate splits are computed in each tree.

Just as random forests achieve reduced variance by combining diverse trees, sometimes at the cost of a slight increase in bias, oblique random forests aim to do the same by constructing even more diverse trees, thereby improving model generalization. In practice, the variance reduction is often significant, yielding an overall better model.

In contrast to the original publication [1], the scikit-tree implementation allows the user to control the number of features to combine when computing candidate splits. This is done via the feature_combinations parameter. For more information and intuition, see the documentation on oblique decision trees.
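As a quick illustration, below is a minimal sketch of fitting an oblique forest with an explicit feature_combinations setting. The top-level sktree import path and the toy dataset are assumptions made for the example; adjust them to your installation.

```python
# A minimal sketch; the import path from the top-level scikit-tree (sktree)
# namespace is an assumption and may differ across versions.
from sklearn.datasets import make_classification
from sktree import ObliqueRandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# feature_combinations controls how many features are linearly combined
# when each candidate oblique split is computed.
clf = ObliqueRandomForestClassifier(
    n_estimators=100, feature_combinations=2, random_state=0
)
clf.fit(X, y)
print(clf.score(X, y))
```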

4. Feature importance evaluation#

The relative rank (i.e., depth) of a feature used as a decision node in a tree can be used to assess the relative importance of that feature with respect to the predictability of the target variable. Features used at the top of the tree contribute to the final prediction decision of a larger fraction of the input samples. The expected fraction of the samples they contribute to can thus be used as an estimate of the relative importance of the features. In scikit-tree, the fraction of samples a feature contributes to is combined with the decrease in impurity from splits on that feature to create a normalized estimate of its predictive power. This is essentially the same way it is done in scikit-learn.

By averaging the estimates of predictive ability over several randomized trees one can reduce the variance of such an estimate and use it for feature selection. This is known as the mean decrease in impurity, or MDI. Refer to [L2014] for more information on MDI and feature importance evaluation with Random Forests. We implement the approach taken in [2] and [3].
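As an informal sketch of this averaging, the per-tree impurity-based importances can be inspected through the fitted ensemble's estimators_ attribute, assuming the oblique forest follows the standard scikit-learn forest API and reusing the clf from the example above:

```python
import numpy as np

# Assuming `clf` is a fitted oblique forest following the scikit-learn
# forest API: each underlying tree exposes feature_importances_.
per_tree = np.array([tree.feature_importances_ for tree in clf.estimators_])

# The forest-level MDI importance is the average over trees; the spread
# across trees gives a rough idea of the estimate's variance.
mdi_mean = per_tree.mean(axis=0)
mdi_std = per_tree.std(axis=0)
```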

Warning

The impurity-based feature importances computed on tree-based models suffer from two flaws that can lead to misleading conclusions. First, they are computed on statistics derived from the training dataset and therefore do not necessarily tell us which features are most important for making good predictions on a held-out dataset. Second, they favor high-cardinality features, that is, features with many unique values. Permutation feature importance is an alternative to impurity-based feature importance that does not suffer from these flaws. These two methods of obtaining feature importance are explored in: Permutation Importance vs Random Forest Feature Importance (MDI).
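As a hedged sketch, permutation importance can be computed on a held-out split with scikit-learn's model-agnostic permutation_importance helper, reusing the clf, X, and y from the example above:

```python
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf.fit(X_train, y_train)

# Importance is measured as the drop in held-out score when a feature's
# values are randomly shuffled.
result = permutation_importance(
    clf, X_test, y_test, n_repeats=10, random_state=0
)
print(result.importances_mean)
```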

In practice, those estimates are stored as an attribute named feature_importances_ on the fitted model. This is an array with shape (n_features,) whose values are non-negative and sum to 1.0. The higher the value, the more important the contribution of the matching feature to the prediction function.
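For instance, continuing with the fitted clf from the sketches above, these basic properties of the array can be checked as follows:

```python
import numpy as np

importances = clf.feature_importances_

# One importance value per input feature, non-negative and normalized.
assert importances.shape == (X.shape[1],)
assert np.all(importances >= 0)
assert np.isclose(importances.sum(), 1.0)

# Rank features from most to least important.
ranking = np.argsort(importances)[::-1]
```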