sktree.stats.PermutationForestRegressor
- class sktree.stats.PermutationForestRegressor(estimator=None, test_size=0.2, random_state=None, verbose=0)
Hypothesis testing of covariates with a permutation forest regressor.
This implements permutation testing of a null hypothesis using a random forest. The null distribution is generated by permuting the covariates at the given indices n_repeats times and training a random forest on each permuted instance. These are compared to the original random forest that was fit on the regular, non-permuted data.
Warning
Permutation testing with forests is computationally expensive. As a result, if you are testing for the importance of feature sets, consider using sktree.FeatureImportanceForestRegressor or sktree.FeatureImportanceForestClassifier instead, which are much more computationally efficient.
Note
This does not allow testing on the posteriors.
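The procedure described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: a k-nearest-neighbour regressor stands in for the forest, the train/test split is held fixed across repeats, and the p-value uses the common (1 + count) / (1 + n_repeats) convention with lower MSE counted as better. The helper names knn_mse, permute_columns and permutation_test are hypothetical.

```python
import random


def knn_mse(X_train, y_train, X_test, y_test, k=3):
    """Test-set MSE of a k-nearest-neighbour regressor (stand-in for a forest)."""
    total = 0.0
    for x, y in zip(X_test, y_test):
        # Rank training rows by squared Euclidean distance to x, keep the k closest.
        nearest = sorted(
            range(len(X_train)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
        )[:k]
        pred = sum(y_train[i] for i in nearest) / k
        total += (pred - y) ** 2
    return total / len(y_test)


def permute_columns(X, covariate_index, rng):
    """Return a copy of X whose selected columns are shuffled jointly across rows."""
    order = list(range(len(X)))
    rng.shuffle(order)
    X_perm = [row[:] for row in X]
    for new_row, src in zip(X_perm, order):
        for j in covariate_index:
            new_row[j] = X[src][j]
    return X_perm


def permutation_test(X, y, covariate_index, n_repeats=50, test_size=0.2, seed=0):
    rng = random.Random(seed)
    n_test = max(1, int(round(test_size * len(X))))
    idx = list(range(len(X)))
    rng.shuffle(idx)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def score(data):
        # Refit the stand-in model on the training rows and score the held-out rows.
        return knn_mse(
            [data[i] for i in train_idx], [y[i] for i in train_idx],
            [data[i] for i in test_idx], [y[i] for i in test_idx],
        )

    observed = score(X)  # model fit on the non-permuted data
    null_dist = [
        score(permute_columns(X, covariate_index, rng)) for _ in range(n_repeats)
    ]
    # Assumed convention: count permuted fits that do at least as well (lower MSE).
    pvalue = (1 + sum(s <= observed for s in null_dist)) / (1 + n_repeats)
    return observed, null_dist, pvalue


# Synthetic data: column 0 drives y, column 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(100)]
y = [10.0 * row[0] + data_rng.gauss(0.0, 0.1) for row in X]

stat_signal, null_signal, p_signal = permutation_test(X, y, covariate_index=[0])
stat_noise, null_noise, p_noise = permutation_test(X, y, covariate_index=[1])
print(f"covariate 0: MSE={stat_signal:.3f}, p={p_signal:.3f}")
print(f"covariate 1: MSE={stat_noise:.3f}, p={p_noise:.3f}")
```

Because one model is refit per permutation, the cost grows linearly with n_repeats, which is the expense the warning above refers to.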
- Parameters:
- estimator : object, default=None
Type of forest estimator to use. By default None, which defaults to sklearn.ensemble.RandomForestRegressor with default parameters.
- test_size : float, default=0.2
The proportion of samples to leave out for each tree to compute the metric on.
- random_state : int, RandomState instance or None, default=None
Controls both the randomness of the bootstrapping of the samples used when building trees (if bootstrap=True) and the sampling of the features to consider when looking for the best split at each node (if max_features < n_features). See Glossary for details.
- verbose : int, default=0
Controls the verbosity when fitting and predicting.
- Attributes:
- samples_ : ArrayLike of shape (n_samples,)
The indices of the samples used in the final test.
- y_true_ : ArrayLike of shape (n_samples_final,)
The true labels of the samples used in the final test.
- posterior_ : ArrayLike of shape (n_samples_final, n_outputs)
The predicted posterior probabilities of the samples used in the final test.
- null_dist_ : ArrayLike of shape (n_repeats,)
The null distribution of the test statistic.
- posterior_null_ : ArrayLike of shape (n_samples_final, n_outputs, n_repeats)
The posterior probabilities of the samples used in the final test for each permutation for the null distribution.
Methods
- statistic(X, y[, covariate_index, metric, ...]): Compute the test statistic.
- test(X, y, covariate_index[, metric, ...]): Perform hypothesis test using permutation testing.
- reset
- statistic(X, y, covariate_index=None, metric='mse', return_posteriors=False, check_input=True, seed=None, **metric_kwargs)
Compute the test statistic.
- Parameters:
- X : ArrayLike of shape (n_samples, n_features)
The data matrix.
- y : ArrayLike of shape (n_samples, n_outputs)
The target matrix.
- covariate_index : ArrayLike of shape (n_covariates,), optional
The index array of covariates to shuffle, by default None.
- metric : str, optional
The metric to compute, by default “mse”.
- return_posteriors : bool, optional
Whether or not to return the posteriors, by default False.
- check_input : bool, optional
Whether or not to check the input, by default True.
- seed : int, optional
The random seed to use, by default None.
- **metric_kwargs : dict, optional
Keyword arguments to pass to the metric function.
- Returns:
- stat : float
The test statistic.
- posterior_final : ArrayLike of shape (n_samples_final, n_outputs), optional
If return_posteriors is True, then the posterior probabilities of the samples used in the final test. n_samples_final is equal to n_samples if all samples are encountered in the test set of at least one tree in the posterior computation.
- samples : ArrayLike of shape (n_samples_final,), optional
The indices of the samples used in the final test. n_samples_final is equal to n_samples if all samples are encountered in the test set of at least one tree in the posterior computation.
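To see why n_samples_final can be smaller than n_samples, the following sketch simulates which samples land in the held-out set of at least one tree. It assumes each tree draws its own held-out subset uniformly without replacement, which the text above does not state; the function name held_out_coverage is hypothetical.

```python
import random


def held_out_coverage(n_samples, n_trees, test_size=0.2, seed=0):
    """Return the sorted indices held out by at least one tree, plus per-sample counts."""
    rng = random.Random(seed)
    n_test = max(1, int(test_size * n_samples))
    counts = [0] * n_samples
    for _ in range(n_trees):
        # Assumption: every tree leaves out its own random subset of size n_test.
        for i in rng.sample(range(n_samples), n_test):
            counts[i] += 1
    samples = [i for i, c in enumerate(counts) if c > 0]
    return samples, counts


samples_few, _ = held_out_coverage(n_samples=100, n_trees=3)
samples_many, _ = held_out_coverage(n_samples=100, n_trees=100)
print(len(samples_few), len(samples_many))
```

With 3 trees and test_size=0.2, at most 60 of the 100 samples can ever be held out, so n_samples_final is strictly smaller than n_samples. With many trees, every sample is almost surely held out at least once and the two sizes coincide.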
- test(X, y, covariate_index, metric='mse', n_repeats=1000, return_posteriors=False, **metric_kwargs)
Perform hypothesis test using permutation testing.
- Parameters:
- X : ArrayLike of shape (n_samples, n_features)
The data matrix.
- y : ArrayLike of shape (n_samples, n_outputs)
The target matrix.
- covariate_index : ArrayLike of shape (n_covariates,)
The covariate indices of X to shuffle.
- metric : str, optional
Metric to compute, by default “mse”.
- n_repeats : int, optional
Number of times to sample the null distribution, by default 1000.
- return_posteriors : bool, optional
Whether or not to return the posteriors, by default False.
- **metric_kwargs : dict, optional
Keyword arguments to pass to the metric function.
- Returns:
- property train_test_samples_
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
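The memory trade-off in the note above can be illustrated with a toy class that stores one seed per estimator and rebuilds the index lists on every property access. This is a sketch of the general pattern, not the library's code: the class TinyEnsemble, its constructor arguments, and the (train, test) pair layout are assumptions suggested only by the property name.

```python
import random


class TinyEnsemble:
    """Toy ensemble that regenerates its sample splits instead of storing them."""

    def __init__(self, n_samples, n_estimators, test_size=0.2, seed=0):
        self.n_samples = n_samples
        self.test_size = test_size
        master = random.Random(seed)
        # Keep one integer seed per estimator rather than the index lists themselves.
        self._seeds = [master.randrange(2**32) for _ in range(n_estimators)]

    @property
    def train_test_samples_(self):
        # Rebuilt on every access: cheap to store, but costs time per call.
        n_test = int(self.test_size * self.n_samples)
        splits = []
        for s in self._seeds:
            idx = list(range(self.n_samples))
            random.Random(s).shuffle(idx)
            splits.append((sorted(idx[n_test:]), sorted(idx[:n_test])))
        return splits


ens = TinyEnsemble(n_samples=10, n_estimators=3)
first = ens.train_test_samples_
second = ens.train_test_samples_
print(first == second, first is second)
```

The two accesses return equal contents but distinct list objects, so code that reads the property inside a loop would do better to fetch it once and reuse the result.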