treeple.experimental.conditional_resample

treeple.experimental.conditional_resample(conditional_array, *arrays, nn_estimator=None, replace=True, replace_nbrs=True, n_samples=None, random_state=None, stratify=None)

Conditionally resample arrays or sparse matrices in a consistent way.

The default strategy implements one step of the bootstrapping procedure. Conditional resampling is a modification of the bootstrap technique that preserves the conditional distribution of the data. This is done by fitting a nearest neighbors estimator on the conditional array and then resampling the nearest neighbors of each sample.
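One possible reading of the procedure described above is sketched below with plain NumPy. This is not treeple's implementation: the helper name `conditional_resample_sketch`, the `n_neighbors` and `seed` arguments, and the brute-force Euclidean neighbor search are all assumptions made for illustration.

```python
import numpy as np


def conditional_resample_sketch(conditional_array, *arrays, n_neighbors=5, seed=None):
    """Hypothetical helper: bootstrap sample indices, then swap each drawn
    sample for one of its nearest neighbors in the conditional array."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(conditional_array, dtype=float)
    n = Z.shape[0]

    # Brute-force pairwise squared Euclidean distances, shape (n, n).
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k closest rows for every row (a row is its own neighbor).
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Bootstrap step: draw sample indices with replacement.
    boot = rng.integers(0, n, size=n)
    # Neighbor step: for each drawn sample, pick one of its neighbors uniformly.
    pick = rng.integers(0, n_neighbors, size=n)
    idx = nbrs[boot, pick]

    # Apply the same indices to every array so they stay aligned.
    resampled = [np.asarray(a)[idx] for a in arrays]
    return resampled, boot, idx


# Two well-separated groups in the conditional variable.
Z = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
x = np.arange(6)
(x_resampled,), boot, idx = conditional_resample_sketch(Z, x, n_neighbors=3, seed=0)
```

Because each replacement comes from the neighborhood of the drawn sample, a resampled value always originates from a point with a similar conditional value; in this toy data, it never crosses between the two groups.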

Parameters:
conditional_array : array_like of shape (n_samples, n_features)

The array whose conditional distribution is preserved.

*arrays : sequence of array_like of shape (n_samples,) or (n_samples, n_outputs)

Indexable data structures to resample. These can be arrays, lists, dataframes or SciPy sparse matrices with a consistent first dimension.

nn_estimator : estimator object, default=None

The nearest neighbors estimator to use. If None, then a sklearn.neighbors.NearestNeighbors instance is used.

replace : bool, default=True

Whether to resample with replacement. If False, (sliced) random permutations are drawn instead. The replacement takes place at the level of the sample index.

replace_nbrs : bool, default=True

Implements resampling with replacement at the level of the nearest neighbors.

n_samples : int, default=None

Number of samples to generate. If left as None, this is automatically set to the first dimension of the arrays. If replace is False, it should not be larger than the length of the arrays.
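The interaction between replace and n_samples can be illustrated with a small index-drawing sketch. The helper `draw_indices` is hypothetical and only mirrors the behavior described above (drawing with replacement versus a sliced random permutation); it is not part of treeple.

```python
import numpy as np


def draw_indices(n_total, n_samples=None, replace=True, seed=None):
    """Hypothetical helper: draw sample indices as the parameters describe."""
    rng = np.random.default_rng(seed)
    if n_samples is None:
        # Default: as many samples as the first dimension of the arrays.
        n_samples = n_total
    if replace:
        # With replacement: indices may repeat, n_samples may exceed n_total.
        return rng.integers(0, n_total, size=n_samples)
    if n_samples > n_total:
        raise ValueError("Cannot draw more samples than available without replacement.")
    # Without replacement: a random permutation, sliced to n_samples.
    return rng.permutation(n_total)[:n_samples]


with_repl = draw_indices(5, n_samples=8, replace=True, seed=0)
without_repl = draw_indices(5, n_samples=3, replace=False, seed=0)
```

Without replacement every index appears at most once, which is why n_samples is bounded by the length of the arrays in that case.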

random_state : int, RandomState instance or None, default=None

Determines random number generation for shuffling the data. Pass an int for reproducible results across multiple function calls. See Glossary.

stratify : array_like of shape (n_samples,) or (n_samples, n_outputs), default=None

If not None, the data is resampled in a stratified fashion, using this array as the class labels.

Returns:
resampled_arrays : sequence of array_like of shape (n_samples,) or (n_samples, n_outputs)

Sequence of resampled copies of the collections. The original arrays are not impacted.
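What "consistent" resampling and "copies" mean in practice can be shown with NumPy indexing alone. The snippet below is a generic illustration, not a call into treeple: the variable names and the seed are arbitrary, and the index draw stands in for whatever indices the function selects internally.

```python
import numpy as np

X = np.arange(10).reshape(5, 2)  # row i is [2*i, 2*i + 1]
y = np.array([0, 1, 0, 1, 1])
X_before = X.copy()

# One set of indices, applied to every array, keeps rows aligned.
rng = np.random.default_rng(42)
idx = rng.integers(0, len(X), size=len(X))
X_res, y_res = X[idx], y[idx]

# Integer-array indexing returns copies, so editing the result
# leaves the original array untouched.
X_res[0] = -1

# Re-seeding reproduces the same draw, which is the role random_state plays.
idx_again = np.random.default_rng(42).integers(0, len(X), size=len(X))
```

Here `X` still equals `X_before` after the edit to `X_res`, and `idx_again` matches `idx` because both generators were seeded identically.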