treeple.stats.build_coleman_forest#

treeple.stats.build_coleman_forest(est, perm_est, X, y, covariate_index=None, metric='s@98', n_repeats=10000, verbose=False, seed=None, return_posteriors=True, use_sparse=False, **metric_kwargs)[source]#

Build a hypothesis testing forest using a two-forest approach.

The two-forest approach stems from the Coleman et al. 2022 paper, where two forests are trained: one on the original dataset, and one on the permuted dataset. The dataset is either permuted once, or independently for each tree in the permuted forest. The original test statistic is computed by comparing the metric on both forests (metric_forest - metric_perm_forest). For full details, see [1].

Parameters:

estForest: The type of forest to use. Must be enabled with bootstrap=True.
perm_estForest: The forest to use for the permuted dataset.
XArrayLike of shape (n_samples, n_features): Data.
yArrayLike of shape (n_samples, n_outputs): Binary target, so n_outputs should be at most 1.
covariate_indexArrayLike, optional of shape (n_covariates,): The index array of covariates to shuffle, by default None, which defaults to all covariates.
metricstr, optional: The metric to compute, by default “s@98”, for sensitivity at 98% specificity.
n_repeatsint, optional: Number of times to bootstrap sample the two forests to construct the null distribution, by default 10000. The construction of the null forests will be parallelized according to the n_jobs argument of the est forest.
verbosebool, optional: Verbosity, by default False.
seedint, optional: Random seed, by default None.
return_posteriorsbool, optional: Whether or not to return the posteriors, by default True.
use_sparsebool, optional: Whether or not to use a sparse for the calculation of the permutation statistics, by default False. Doesn’t affect return values.
**metric_kwargsdict, optional: Additional keyword arguments to pass to the metric function.

Returns:

observe_statfloat: The test statistic. To compute the test statistic, take permute_stat_ and subtract observe_stat_.
pvaluefloat: The p-value of the test statistic.
orig_forest_probaArrayLike of shape (n_estimators, n_samples, n_outputs): The predicted posterior probabilities for each estimator on their out of bag samples.
perm_forest_probaArrayLike of shape (n_estimators, n_samples, n_outputs): The predicted posterior probabilities for each of the permuted estimators on their out of bag samples.
null_distArrayLike of shape (n_repeats,): The null statistic differences from permuted forests.

References

Examples using `treeple.stats.build_coleman_forest`#

Calculating p-value (MIGHT)

Calculating p-value with multiview data (CoMIGHT)

treeple.stats.build_coleman_forest#

Examples using treeple.stats.build_coleman_forest#

Examples using `treeple.stats.build_coleman_forest`#