treeple.experimental.conditional_resample
- treeple.experimental.conditional_resample(conditional_array, *arrays, nn_estimator=None, replace=True, replace_nbrs=True, n_samples=None, random_state=None, stratify=None)
Conditionally resample arrays or sparse matrices in a consistent way.
The default strategy implements one step of the bootstrapping procedure. Conditional resampling is a modification of the bootstrap technique that preserves the conditional distribution of the data. This is done by fitting a nearest neighbors estimator on the conditional array and then resampling the nearest neighbors of each sample.
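The procedure described above can be sketched with NumPy alone. This is an illustrative sketch, not treeple's implementation: the names `neighbor_indices`, `conditional_resample_sketch` and `n_neighbors` are hypothetical, and a brute-force distance computation stands in for the nearest neighbors estimator.

```python
import numpy as np


def neighbor_indices(conditional_array, n_neighbors):
    """Hypothetical helper: indices of the k closest rows for every row."""
    Z = np.asarray(conditional_array, dtype=float)
    # Brute-force pairwise squared Euclidean distances, shape (n, n).
    diff = Z[:, None, :] - Z[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Keep the k closest rows (a row is at distance 0 from itself).
    return np.argsort(dist, axis=1)[:, :n_neighbors]


def conditional_resample_sketch(conditional_array, *arrays, n_neighbors=5,
                                random_state=None):
    """Hypothetical sketch: bootstrap rows, then swap each for a neighbor."""
    rng = np.random.default_rng(random_state)
    nbrs = neighbor_indices(conditional_array, n_neighbors)
    n = nbrs.shape[0]
    # Bootstrap step: draw sample indices with replacement.
    drawn = rng.integers(0, n, size=n)
    # For each drawn index, pick one of its neighbors uniformly at random.
    picks = rng.integers(0, n_neighbors, size=n)
    idx = nbrs[drawn, picks]
    # Index every array with the same rows so they stay aligned.
    return [np.asarray(a)[idx] for a in arrays]


# Two well-separated clusters in the conditional array.
data_rng = np.random.default_rng(0)
Z = np.concatenate([data_rng.normal(0, 1, (20, 2)),
                    data_rng.normal(100, 1, (20, 2))])
labels = np.repeat([0, 1], 20)

labels_r, Z_r = conditional_resample_sketch(Z, labels, Z,
                                            n_neighbors=3, random_state=0)
```

Because each replacement row comes from the neighborhood of the drawn row in `Z`, a resampled value is always taken from a sample whose conditioning values are close to the drawn one.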
- Parameters:
- conditional_array : array_like of shape (n_samples, n_features)
The array to condition on when resampling, so that the conditional distribution of the data is preserved.
- *arrays : sequence of array_like of shape (n_samples,) or (n_samples, n_outputs)
Indexable data-structures can be arrays, lists, dataframes or scipy sparse matrices with consistent first dimension.
- nn_estimator : estimator object, default=None
The nearest neighbors estimator to use. If None, then a sklearn.neighbors.NearestNeighbors instance is used.
- replace : bool, default=True
Implements resampling with replacement. If False, this will implement (sliced) random permutations. The replacement will take place at the level of the sample index.
- replace_nbrs : bool, default=True
Implements resampling with replacement at the level of the nearest neighbors.
- n_samples : int, default=None
Number of samples to generate. If left to None this is automatically set to the first dimension of the arrays. If replace is False it should not be larger than the length of arrays.
- random_state : int, RandomState instance or None, default=None
Determines random number generation for shuffling the data. Pass an int for reproducible results across multiple function calls. See Glossary.
- stratify : array_like of shape (n_samples,) or (n_samples, n_outputs), default=None
If not None, data is split in a stratified fashion, using this as the class labels.
- Returns:
- resampled_arrays : sequence of array_like of shape (n_samples,) or (n_samples, n_outputs)
Sequence of resampled copies of the collections. The original arrays are not impacted.
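The difference between `replace=True` and `replace=False` at the level of the sample index, and the resulting constraint on `n_samples`, can be illustrated with a short NumPy sketch. The helper `draw_indices` is hypothetical and only mirrors the documented behavior; it is not taken from treeple's source.

```python
import numpy as np


def draw_indices(n_total, n_samples=None, replace=True, random_state=None):
    """Hypothetical helper: choose row indices with or without replacement."""
    rng = np.random.default_rng(random_state)
    # Default to the first dimension of the arrays, as documented.
    if n_samples is None:
        n_samples = n_total
    if replace:
        # Bootstrap draw: the same index may appear more than once.
        return rng.integers(0, n_total, size=n_samples)
    if n_samples > n_total:
        raise ValueError("n_samples cannot exceed the number of rows "
                         "when replace is False")
    # Sliced random permutation: every index appears at most once.
    return rng.permutation(n_total)[:n_samples]


with_repl = draw_indices(10, replace=True, random_state=0)
without_repl = draw_indices(10, n_samples=6, replace=False, random_state=0)
```

Without replacement a permutation of the rows can supply at most one copy of each index, which is why `n_samples` may not exceed the length of the arrays in that case.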