# API Documentation
Scikit manifold oblique random forests.
## Supervised
Decision-tree models are traditionally implemented with axis-aligned splits, storing the mean outcome (i.e., the label vote) in the leaf nodes. However, more exotic “oblique” splits are possible, which apply a function of multiple feature columns to create a “new feature value” to split on.
This can take the form of a random (sparse) linear combination of feature columns, or even take advantage of the structure in the data (e.g. if it is an image) to sample feature indices in a manifold-aware fashion. This class of models generalizes the splitting function in the trees, while everything else is consistent with how scikit-learn builds trees.
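For example, fitting an oblique forest follows the usual scikit-learn workflow. The snippet below is a minimal sketch that assumes the scikit-learn-compatible estimator API; the `sktree` import path is an assumption and should be adjusted to the installed package name.

```python
# A minimal sketch, assuming the scikit-learn-compatible estimator API;
# the `sktree` import path is an assumption -- adjust to the installed package.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from sktree import ObliqueRandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Intended as a drop-in replacement for RandomForestClassifier: each candidate
# split uses a sparse random linear combination of features, not one column.
clf = ObliqueRandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```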
| Class | Description |
| --- | --- |
| `ObliqueRandomForestClassifier` | An oblique random forest classifier. |
| `ObliqueRandomForestRegressor` | An oblique random forest regressor. |
| `PatchObliqueRandomForestClassifier` | A patch-oblique random forest classifier. |
| `PatchObliqueRandomForestRegressor` | A patch-oblique random forest regressor. |
| `HonestForestClassifier` | A forest classifier with honest leaf estimates. |
| `ObliqueDecisionTreeClassifier` | An oblique decision tree classifier. |
| `ObliqueDecisionTreeRegressor` | An oblique decision tree regressor. |
| `PatchObliqueDecisionTreeClassifier` | An oblique decision tree classifier that operates over patches of data. |
| `PatchObliqueDecisionTreeRegressor` | An oblique decision tree regressor that operates over patches of data. |
| `HonestTreeClassifier` | A decision tree classifier with honest predictions. |
## Unsupervised
Decision-tree models are traditionally used for classification and regression. However, they are also powerful non-parametric embedding and clustering models. The `RandomTreesEmbedding` in scikit-learn is an example of an unsupervised tree model. We implement other state-of-the-art models that explicitly split based on unsupervised criteria such as variance and BIC.
| Class | Description |
| --- | --- |
| `UnsupervisedRandomForest` | Unsupervised random forest. |
| `UnsupervisedObliqueRandomForest` | Unsupervised oblique random forest. |
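As a minimal sketch of the workflow, assuming the scikit-learn-style `fit(X)`/`apply(X)` interface (the `sktree` import path is again an assumption):

```python
# A minimal sketch, assuming a scikit-learn-style `fit(X)` / `apply(X)`
# interface; the class name is taken from the table above.
from sklearn.datasets import make_blobs

from sktree import UnsupervisedRandomForest

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

est = UnsupervisedRandomForest(n_estimators=100, random_state=0)
est.fit(X)  # no labels: splits are scored by an unsupervised criterion

# As with RandomTreesEmbedding, the per-tree leaf indices act as an embedding.
leaves = est.apply(X)
print(leaves.shape)  # (n_samples, n_estimators)
```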
The trees that comprise those forests are also available as standalone classes.
| Class | Description |
| --- | --- |
| `UnsupervisedDecisionTree` | Unsupervised decision tree. |
| `UnsupervisedObliqueDecisionTree` | Unsupervised oblique decision tree. |
## Distance Metrics
Trees inherently produce a “distance-like” metric. We provide an API for extracting pairwise distances from trees, including a correction that turns the “tree distance” into a proper distance metric.
| Function | Description |
| --- | --- |
| `compute_forest_similarity_matrix` | Compute the similarity matrix of samples in X using a trained forest. |
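For intuition, the uncorrected “tree distance” can be sketched with scikit-learn alone: two samples are similar in proportion to how often they land in the same leaf. This illustrates the idea only; it is not the library's corrected metric.

```python
# A sketch of the uncorrected "tree distance" using scikit-learn alone:
# similarity = fraction of trees in which two samples share a leaf.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
sim = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=-1)
dist = 1.0 - sim  # distance-like, but not yet a proper metric
print(sim.shape, dist[0, :5])
```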
In addition to providing a distance metric based on leaves, tree models provide a natural way to compute neighbors based on the splits. We provide an API for extracting the nearest neighbors from a tree model, with an interface similar to scikit-learn's `NearestNeighbors`.
| Class | Description |
| --- | --- |
| `NearestNeighborsMetaEstimator` | Meta-estimator for nearest neighbors. |
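As an illustration of the pattern (not the meta-estimator itself), the leaf co-occurrence distance from the previous snippet can be fed to scikit-learn's `NearestNeighbors` via `metric="precomputed"`:

```python
# A sketch with scikit-learn alone: neighbours under a leaf co-occurrence
# distance, queried through NearestNeighbors with metric="precomputed".
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

X, y = load_iris(return_X_y=True)
leaves = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y).apply(X)
dist = 1.0 - (leaves[:, None, :] == leaves[None, :, :]).mean(axis=-1)

nn = NearestNeighbors(n_neighbors=5, metric="precomputed").fit(dist)
_, neighbor_idx = nn.kneighbors(dist[:1])  # neighbours of sample 0
print(neighbor_idx)
```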
## Experimental Functionality
We also include experimental functionality that is a work in progress.
| Function | Description |
| --- | --- |
| `mutual_info_ksg` | Compute the generalized (conditional) mutual information KSG estimate. |
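For intuition, the basic (unconditional) KSG estimator can be re-implemented in a few lines with SciPy's k-d trees. The sketch below is illustrative and is not the package's implementation.

```python
# An illustrative re-implementation of the basic KSG estimator (Kraskov et
# al., 2004), not the package's code: I(X;Y) from k-nearest-neighbour counts.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=5):
    n = len(x)
    xy = np.hstack([x, y])
    # Chebyshev distance to the k-th neighbour in the joint (X, Y) space.
    eps = cKDTree(xy).query(xy, k=k + 1, p=np.inf)[0][:, -1]
    # Count marginal neighbours strictly inside each point's eps-ball.
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True)
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf, return_length=True)
    nx, ny = np.asarray(nx) - 1, np.asarray(ny) - 1  # exclude the point itself
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
y = x + 0.5 * rng.normal(size=(1000, 1))  # correlated, so MI > 0
print(ksg_mi(x, y))
```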
We also include functions that help simulate and evaluate mutual information (MI) and conditional mutual information (CMI) estimators. Specifically, these functions simulate multivariate Gaussian data and compute the analytical solutions for the entropy, MI, and CMI of Gaussian distributions.
| Function | Description |
| --- | --- |
| `simulate_multivariate_gaussian` | Multivariate Gaussian simulation for testing entropy and MI estimators. |
| `simulate_helix` | Simulate data from a helix. |
| `simulate_sphere` | Simulate samples generated on a sphere. |
| `mi_gaussian` | Compute mutual information of a multivariate Gaussian. |
| `cmi_gaussian` | Compute the analytical CMI for a multivariate Gaussian distribution. |
| `entropy_gaussian` | Compute entropy of a multivariate Gaussian. |
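For reference, the closed forms behind these helpers are standard. The sketch below uses hypothetical helper names (not necessarily the package's API) and reports quantities in nats.

```python
# A minimal sketch of the standard Gaussian closed forms (hypothetical helper
# names, not necessarily the package's API). All quantities are in nats.
import numpy as np

def gaussian_entropy(cov):
    """H(X) = 0.5 * log((2*pi*e)^d * det(cov)) for X ~ N(mu, cov)."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

def gaussian_mi(cov, dx):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); X is the first `dx` coordinates."""
    cov = np.atleast_2d(cov)
    return (gaussian_entropy(cov[:dx, :dx])
            + gaussian_entropy(cov[dx:, dx:])
            - gaussian_entropy(cov))

# Bivariate check: I(X;Y) = -0.5 * log(1 - rho**2) for correlation rho.
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
print(gaussian_mi(cov, 1), -0.5 * np.log(1 - rho**2))
```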