treeple.datasets.make_gaussian_mixture
- treeple.datasets.make_gaussian_mixture(centers, covariances, n_samples=100, transform='linear', noise=None, noise_dims=None, class_probs=None, random_state=None, shuffle=False, return_latents=False, add_latent_noise=False)
Two-view Gaussian mixture model dataset generator.
This creates a two-view dataset from a Gaussian mixture model and a (possibly nonlinear) transformation.
- Parameters:
- centers (1D array-like or list of 1D array-likes): The mean(s) of the Gaussian(s) from which the latent points are sampled. If centers is a list of 1D array-likes, each element is the mean of a distinct Gaussian, which is sampled from with probability given by class_probs. Otherwise, centers is the mean of a single Gaussian from which all points are sampled.
- covariances (2D array-like or list of 2D array-likes): The covariance matrix or matrices of the Gaussian(s), matched to the specified centers.
- n_samples (int, default=100): The number of points in each view, divided across the Gaussians according to class_probs.
- transform ('linear' | 'sin' | 'poly' | callable, default='linear'): Transformation applied to the latent variable. If a callable, it is applied to the latent variable directly; otherwise, the named built-in transformation is used.
- noise (float or None, default=None): Variance of the zero-mean Gaussian noise added to the first view.
- noise_dims (int or None, default=None): Number of additional dimensions of standard normal noise to append to both views.
- class_probs (array-like, default=None): Probabilities of a latent point being sampled from each of the Gaussians. Must sum to 1. If None, a uniform distribution over the Gaussians is used.
- random_state (int, default=None): Random seed. If set, the generated data can be reproduced.
- shuffle (bool, default=False): If True, the samples are shuffled so that the labels are not ordered by class.
- return_latents (bool, default=False): If True, also returns the non-noisy latent variables.
- add_latent_noise (bool, default=False): If True, adds noise to the latent variables before applying the transformation.
- Returns:
- Xs (list of np.ndarray, each of shape (n_samples, n_features)): The latent data and its noisy transformation.
- y (np.ndarray of shape (n_samples,)): The integer labels giving each sample's Gaussian membership.
- latents (np.ndarray of shape (n_samples, n_features)): The non-noisy latent variables. Only returned if return_latents=True.
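The n_samples and class_probs parameters together decide how many latent points each Gaussian contributes. The sketch below shows one way that split could be computed and validated. It is a hypothetical helper (split_samples is not part of treeple), and the rounding rule is an assumption: counts are floor(n_samples * p_i), with leftover points given to the classes with the largest fractional parts so the counts always sum to n_samples.

```python
import numpy as np


def split_samples(n_samples, n_classes, class_probs=None):
    """Hypothetical helper: divide n_samples points across n_classes Gaussians.

    Assumption (not necessarily treeple's rule): counts are floor(n_samples * p_i),
    and any leftover points go to the classes with the largest fractional parts
    (largest-remainder rounding), so the counts always sum to n_samples.
    """
    if class_probs is None:
        # Documented default: uniform over the Gaussians.
        class_probs = np.full(n_classes, 1.0 / n_classes)
    class_probs = np.asarray(class_probs, dtype=float)
    if class_probs.shape != (n_classes,):
        raise ValueError("class_probs needs one probability per Gaussian")
    if np.any(class_probs < 0) or not np.isclose(class_probs.sum(), 1.0):
        raise ValueError("class_probs must be non-negative and sum to 1")

    raw = n_samples * class_probs
    counts = np.floor(raw).astype(int)
    leftover = n_samples - int(counts.sum())
    # Class indices sorted by descending fractional part; the first `leftover` get +1.
    order = np.argsort(raw - counts)[::-1]
    counts[order[:leftover]] += 1
    return counts.tolist()


print(split_samples(10, 2))              # [5, 5]
print(split_samples(10, 2, [0.3, 0.7]))  # [3, 7]
```

Largest-remainder rounding is chosen here only because it keeps the total equal to n_samples; simple truncation would be an equally plausible reading of the documentation.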
Notes
For each class \(i\) with prior probability \(p_i\), center and covariance matrix \(\mu_i\) and \(\Sigma_i\), and \(n\) total samples, the latent data is sampled such that:
\[X_1, \dots, X_{np_i} \overset{i.i.d.}{\sim} \mathcal{N}(\mu_i, \Sigma_i), \qquad y_1 = \dots = y_{np_i} = i\]
Two views of data are returned: the first is the latent samples, and the second is a specified transformation of the latent samples. Additional noise may be added to the first view, or appended as noise dimensions to both views.
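The generative process above can be sketched in NumPy as follows. This is a minimal illustration, not treeple's implementation: sample_two_view_mixture is a hypothetical name, class i is assumed to receive int(n_samples * p_i) points, noise is treated as a variance (so its standard deviation is sqrt(noise)), and shuffle and add_latent_noise are left out for brevity.

```python
import numpy as np


def sample_two_view_mixture(centers, covariances, n_samples=100, class_probs=None,
                            transform=np.sin, noise=None, noise_dims=None, seed=None):
    """Sketch of the two-view Gaussian mixture process described in the Notes.

    Hypothetical helper, not treeple's code. Assumes class i receives
    int(n_samples * p_i) points and that `noise` is a variance.
    """
    rng = np.random.default_rng(seed)
    centers = np.atleast_2d(np.asarray(centers, dtype=float))
    covariances = np.asarray(covariances, dtype=float)
    if covariances.ndim == 2:
        # A single covariance matrix for a single Gaussian.
        covariances = covariances[np.newaxis]
    n_classes = centers.shape[0]
    if class_probs is None:
        class_probs = np.full(n_classes, 1.0 / n_classes)

    latents, labels = [], []
    for i, (mu, cov, p) in enumerate(zip(centers, covariances, class_probs)):
        n_i = int(n_samples * p)
        # X_1, ..., X_{n p_i} drawn i.i.d. from N(mu_i, Sigma_i), all labelled i.
        latents.append(rng.multivariate_normal(mu, cov, size=n_i))
        labels.append(np.full(n_i, i))
    latent = np.vstack(latents)
    y = np.concatenate(labels)

    # View 1 is the latent sample (plus optional noise); view 2 is its transform.
    view1 = latent.copy()
    if noise is not None:
        view1 += rng.normal(scale=np.sqrt(noise), size=view1.shape)
    view2 = transform(latent)

    Xs = [view1, view2]
    if noise_dims is not None:
        # Append noise_dims columns of standard normal noise to both views.
        Xs = [np.hstack([X, rng.standard_normal((X.shape[0], noise_dims))]) for X in Xs]
    return Xs, y


Xs, y = sample_two_view_mixture([[0, 1], [0, -1]], [np.eye(2), np.eye(2)],
                                n_samples=10, seed=0)
print([X.shape for X in Xs], y)  # [(10, 2), (10, 2)] [0 0 0 0 0 1 1 1 1 1]
```

Without noise, the second view is exactly the transform of the first, which makes the shared latent structure between the two views easy to check.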
Examples
```python
>>> from treeple.datasets.multiview import make_gaussian_mixture
>>> import numpy as np
>>> n_samples = 10
>>> centers = [[0, 1], [0, -1]]
>>> covariances = [np.eye(2), np.eye(2)]
>>> Xs, y = make_gaussian_mixture(centers, covariances, n_samples=n_samples,
...                               shuffle=True, random_state=42)
>>> len(Xs), y.shape
(2, (10,))
```
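The transform parameter also accepts a callable, which is applied to the latent variable. The sketch below builds one such callable in plain NumPy. make_tanh_transform is a hypothetical factory, and the assumption that the callable receives an (n_samples, n_features) array and should return an array with the same number of rows is inferred from the shapes documented above, not confirmed by treeple.

```python
import numpy as np


def make_tanh_transform(n_features, seed=0):
    """Hypothetical factory for a nonlinear transform callable.

    Returns f(X) = tanh(X @ W) for a fixed random square matrix W, mapping an
    (n_samples, n_features) latent array to an array of the same shape.
    """
    W = np.random.default_rng(seed).standard_normal((n_features, n_features))

    def transform(X):
        X = np.asarray(X, dtype=float)
        return np.tanh(X @ W)

    return transform


tanh_transform = make_tanh_transform(n_features=2)
latent = np.random.default_rng(1).standard_normal((10, 2))
view2 = tanh_transform(latent)
print(view2.shape)  # (10, 2)
```

With treeple installed, such a callable could then be passed as make_gaussian_mixture(centers, covariances, transform=tanh_transform). Fixing W inside the closure keeps the transformation identical for every sample, which is what makes the second view a deterministic function of the latent points.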