sktree.stats.PermutationForestClassifier

class sktree.stats.PermutationForestClassifier(estimator=None, test_size=0.2, random_state=None, verbose=0)

Hypothesis testing of covariates with a permutation forest classifier.

This class implements a permutation test of a null hypothesis using random forests. The null distribution is generated by permuting the covariates selected by covariate_index n_repeats times and training a random forest on each permuted dataset. Each resulting statistic is compared with that of the original random forest, which is trained on the non-permuted data.

Warning

Permutation testing with forests is computationally expensive. If you are testing the importance of feature sets, consider using sktree.FeatureImportanceForestRegressor or sktree.FeatureImportanceForestClassifier instead, which are much more computationally efficient.

Note

This does not allow testing on the posteriors.
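The procedure described above can be sketched with plain scikit-learn. This is a minimal illustration of the idea, not the sktree implementation: the helper names (permute_columns, held_out_mse, permutation_test), the single train/test split, the joint shuffling of the selected columns, and the add-one p-value are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def permute_columns(X, covariate_index, rng):
    """Copy X and shuffle the rows of the selected columns (jointly)."""
    X_perm = X.copy()
    row_order = rng.permutation(X.shape[0])
    X_perm[:, covariate_index] = X[row_order][:, covariate_index]
    return X_perm


def held_out_mse(X, y, seed):
    """Fit a forest on a train split; score P(y=1) on the held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    forest = RandomForestClassifier(n_estimators=25, random_state=seed)
    forest.fit(X_tr, y_tr)
    return mean_squared_error(y_te, forest.predict_proba(X_te)[:, 1])


def permutation_test(X, y, covariate_index, n_repeats=20, seed=0):
    """Hypothetical helper: observed statistic, p-value, null distribution."""
    rng = np.random.default_rng(seed)
    observed = held_out_mse(X, y, seed)
    null_dist = np.array([
        held_out_mse(permute_columns(X, covariate_index, rng), y, seed)
        for _ in range(n_repeats)
    ])
    # Lower MSE is better, so count null statistics at least as small
    # as the observed one (assumed add-one convention).
    pvalue = (1 + np.sum(null_dist <= observed)) / (1 + n_repeats)
    return observed, pvalue, null_dist


# Synthetic binary data in which only column 0 carries signal.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

observed, pvalue, null_dist = permutation_test(X, y, covariate_index=[0])
print(observed, pvalue)
```

Every repeat trains a new forest, which is why the cost grows linearly with n_repeats.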

Parameters:
estimator : object, default=None

Type of forest estimator to use. If None, sklearn.ensemble.RandomForestClassifier is used.

test_size : float, default=0.2

The proportion of samples held out for each tree; the metric is computed on these held-out samples.

random_state : int, RandomState instance or None, default=None

Controls both the randomness of the bootstrapping of the samples used when building trees (if bootstrap=True) and the sampling of the features to consider when looking for the best split at each node (if max_features < n_features). See Glossary for details.

verbose : int, default=0

Controls the verbosity when fitting and predicting.

Attributes:
samples_ : ArrayLike of shape (n_samples,)

The indices of the samples used in the final test.

y_true_ : ArrayLike of shape (n_samples_final,)

The true labels of the samples used in the final test.

posterior_ : ArrayLike of shape (n_samples_final, n_outputs)

The predicted posterior probabilities of the samples used in the final test.

null_dist_ : ArrayLike of shape (n_repeats,)

The null distribution of the test statistic.

posterior_null_ : ArrayLike of shape (n_samples_final, n_outputs, n_repeats)

The posterior probabilities of the samples used in the final test, one slice per permutation of the null distribution.

Methods

statistic(X, y[, covariate_index, metric, ...])

Compute the test statistic.

test(X, y, covariate_index[, metric, ...])

Perform hypothesis test using permutation testing.

reset

statistic(X, y, covariate_index=None, metric='mse', return_posteriors=False, check_input=True, seed=None, **metric_kwargs)

Compute the test statistic.

Parameters:
X : ArrayLike of shape (n_samples, n_features)

The data matrix.

y : ArrayLike of shape (n_samples, n_outputs)

The target matrix.

covariate_index : ArrayLike of shape (n_covariates,), optional

The index array of covariates to shuffle, by default None.

metric : str, optional

The metric to compute, by default “mse”.

return_posteriors : bool, optional

Whether or not to return the posteriors, by default False.

check_input : bool, optional

Whether or not to check the input, by default True.

seed : int, optional

The random seed to use, by default None.

**metric_kwargs : dict, optional

Keyword arguments to pass to the metric function.

Returns:
stat : float

The test statistic.

posterior_final : ArrayLike of shape (n_samples_final, n_outputs), optional

Returned only if return_posteriors is True. The posterior probabilities of the samples used in the final test. n_samples_final equals n_samples only if every sample appears in the test set of at least one tree during the posterior computation.

samples : ArrayLike of shape (n_samples_final,), optional

The indices of the samples used in the final test. n_samples_final equals n_samples only if every sample appears in the test set of at least one tree during the posterior computation.
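The relationship between n_samples_final and n_samples can be illustrated with a small simulation. The sketch below assumes each tree independently holds out a random fraction test_size of the samples; the actual sampling scheme in sktree may differ, and held_out_coverage is a hypothetical helper.

```python
import numpy as np


def held_out_coverage(n_samples, n_trees, test_size=0.2, seed=0):
    """Indices that land in the held-out set of at least one tree."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_size * n_samples))
    seen = np.zeros(n_samples, dtype=bool)
    for _ in range(n_trees):
        # Assumed scheme: each tree holds out a fresh random subset.
        test_idx = rng.permutation(n_samples)[:n_test]
        seen[test_idx] = True
    return np.flatnonzero(seen)


few = held_out_coverage(n_samples=100, n_trees=2)
many = held_out_coverage(n_samples=100, n_trees=100)
print(len(few), len(many))
```

With only a few trees, many samples never appear in a held-out set, so n_samples_final is smaller than n_samples; with many trees, full coverage becomes overwhelmingly likely.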

test(X, y, covariate_index, metric='mse', n_repeats=1000, return_posteriors=False, **metric_kwargs)

Perform hypothesis test using permutation testing.

Parameters:
X : ArrayLike of shape (n_samples, n_features)

The data matrix.

y : ArrayLike of shape (n_samples, n_outputs)

The target matrix.

covariate_index : ArrayLike of shape (n_covariates,)

The covariate indices of X to shuffle.

metric : str, optional

Metric to compute, by default “mse”.

n_repeats : int, optional

Number of times to sample the null distribution, by default 1000.

return_posteriors : bool, optional

Whether or not to return the posteriors, by default False.

**metric_kwargs : dict, optional

Keyword arguments to pass to the metric function.

Returns:
observe_stat : float

Observed test statistic.

pvalue : float

The p-value of the test.
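For intuition, a p-value can be derived from an observed statistic and a null distribution as sketched below. The add-one correction and the direction of the comparison are common conventions assumed for this example; this page does not state which convention sktree uses, and permutation_pvalue is a hypothetical helper.

```python
import numpy as np


def permutation_pvalue(observed, null_dist, lower_is_better=True):
    """Fraction of null statistics at least as extreme as the observed one."""
    null_dist = np.asarray(null_dist, dtype=float)
    if lower_is_better:
        # Error-type metrics such as MSE: smaller values are more extreme.
        n_extreme = np.sum(null_dist <= observed)
    else:
        n_extreme = np.sum(null_dist >= observed)
    # The add-one terms keep the p-value strictly positive.
    return (1 + n_extreme) / (1 + null_dist.size)


null_dist = np.array([0.30, 0.28, 0.31, 0.27, 0.29])
strong = permutation_pvalue(0.10, null_dist)
weak = permutation_pvalue(0.29, null_dist)
print(strong, weak)
```

Under this convention the smallest attainable p-value is 1 / (n_repeats + 1), so n_repeats bounds the resolution of the test.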

property train_test_samples_

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
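The trade-off in the note above (recompute on access rather than store) can be sketched with a toy class. TinyEnsemble, its stored seeds, and the (train, test) pair layout are hypothetical and only illustrate the pattern; they do not describe sktree internals.

```python
import numpy as np


class TinyEnsemble:
    """Hypothetical stand-in that stores per-member seeds, not index arrays."""

    def __init__(self, n_samples, n_members, test_size=0.2, seed=0):
        self.n_samples = n_samples
        self.test_size = test_size
        seed_rng = np.random.default_rng(seed)
        # Only one small integer per ensemble member is kept in memory.
        self._seeds = seed_rng.integers(0, 2**32 - 1, size=n_members)

    @property
    def train_test_samples_(self):
        """Rebuild the (train, test) index pairs from the stored seeds."""
        n_test = int(round(self.test_size * self.n_samples))
        splits = []
        for member_seed in self._seeds:
            member_rng = np.random.default_rng(int(member_seed))
            order = member_rng.permutation(self.n_samples)
            splits.append((order[n_test:], order[:n_test]))
        return splits


ens = TinyEnsemble(n_samples=10, n_members=3)
first = ens.train_test_samples_
second = ens.train_test_samples_
print(first is second)
```

Each access returns a new list with identical contents, so callers that need the indices repeatedly may prefer to cache the result in a local variable.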