treeple.datasets.make_gaussian_mixture

treeple.datasets.make_gaussian_mixture(centers, covariances, n_samples=100, transform='linear', noise=None, noise_dims=None, class_probs=None, random_state=None, shuffle=False, return_latents=False, add_latent_noise=False)

Two-view Gaussian mixture model dataset generator.

This creates a two-view dataset by sampling latent points from a Gaussian mixture model and applying a (possibly nonlinear) transformation to them.

Parameters:
centers : 1D array_like or list of 1D array-likes

The mean(s) of the Gaussian(s) from which the latent points are sampled. If centers is a list of 1D array-likes, each entry is the mean of a distinct Gaussian, which is sampled from with the probability given by class_probs. Otherwise, centers is the mean of a single Gaussian from which all points are sampled.

covariances : 2D array_like or list of 2D array-likes

The covariance matrix or matrices of the Gaussian(s), matched one-to-one with the specified centers.

n_samples : int, default=100

The number of points in each view, divided across the Gaussians according to class_probs.

transform : {'linear', 'sin', 'poly'} or callable, default='linear'

Transformation applied to the latent variables to produce the second view. If callable, it is applied to the latent variables; otherwise, the named built-in transformation is used.

noise : float or None, default=None

Variance of the zero-mean Gaussian noise added to the first view.

noise_dims : int or None, default=None

Number of additional dimensions of standard normal noise added to both views.

class_probs : array_like or None, default=None

The probability that a latent point is sampled from each of the Gaussians, one entry per Gaussian. Must sum to 1. If None, it is taken to be uniform over the Gaussians.

random_state : int or None, default=None

Seed for the random number generator; fixing it makes the generated data reproducible.

shuffle : bool, default=False

If True, the samples and their labels are shuffled so that the labels are not grouped by class.

return_latents : bool, default=False

If True, also returns the non-noisy latent variables.

add_latent_noise : bool, default=False

If True, adds noise to the latent variables before the transformation is applied.
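A minimal NumPy sketch of how transform, noise and noise_dims could act on an array of latent points, written from the parameter descriptions above rather than from treeple's source. The names build_views and cubic are hypothetical, and the order of operations is an assumption. Two points it makes concrete: because noise is a variance, the standard normal draws are scaled by its square root; and an elementwise callable such as cubic is a safe choice for transform, since it gives the same result whether it receives one sample or the whole latent array.

```python
import numpy as np


def cubic(x):
    # Hypothetical elementwise transform: it gives the same result whether it
    # receives one sample of shape (n_features,) or the whole latent array.
    return x ** 3 - x


def build_views(latent, transform, noise=None, noise_dims=None, rng=None):
    """Hypothetical sketch of how transform, noise and noise_dims shape the views.

    Written from the parameter descriptions, not from treeple's source; the
    order of operations is an assumption.
    """
    rng = np.random.default_rng(rng)
    # Second view: the transformation of the latent points.
    second = transform(latent)
    first = latent
    if noise is not None:
        # `noise` is a variance, so standard normal draws are scaled by sqrt(noise).
        first = latent + np.sqrt(noise) * rng.standard_normal(size=latent.shape)
    views = [first, second]
    if noise_dims is not None:
        # Append `noise_dims` columns of standard normal noise to every view.
        views = [
            np.hstack([X, rng.standard_normal(size=(X.shape[0], noise_dims))])
            for X in views
        ]
    return views


latent = np.random.default_rng(0).normal(size=(50_000, 2))
Xs = build_views(latent, cubic, noise=0.25, noise_dims=3, rng=1)
print([X.shape for X in Xs])  # [(50000, 5), (50000, 5)]
```

With treeple itself, a function like cubic would be passed as transform=cubic.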

Returns:
Xs : list of np.ndarray, each of shape (n_samples, n_features)

The two views: the latent samples (with optional added noise) and their transformation.

y : np.ndarray, shape (n_samples,)

Integer-valued labels indicating which Gaussian each sample was drawn from.

latents : np.ndarray, shape (n_samples, n_features)

The non-noisy latent variables. Only returned if return_latents=True.
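Since y labels the rows of each view, shuffling (shuffle=True) only keeps the data meaningful if the views and the labels are permuted together. A minimal NumPy sketch of that idea, using a stand-in second view; it illustrates the principle and is not treeple's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Before shuffling, labels are grouped by class: 0, ..., 0, 1, ..., 1.
latent = np.arange(20, dtype=float).reshape(10, 2)  # row k is [2k, 2k + 1]
y = np.repeat([0, 1], 5)
Xs = [latent, latent ** 2]  # stand-in second view

# One shared permutation is applied to every view and to the labels,
# so each row keeps its label after shuffling.
perm = rng.permutation(len(y))
Xs_shuffled = [X[perm] for X in Xs]
y_shuffled = y[perm]
print(y_shuffled)
```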

Notes

For each class \(i\) with prior probability \(p_i\), center \(\mu_i\), and covariance matrix \(\Sigma_i\), and with \(n\) total samples, the latent data for class \(i\) are sampled such that:

\[X_1, \dots, X_{np_i} \overset{i.i.d.}{\sim} \mathcal{N}(\mu_i, \Sigma_i), \qquad y_1 = \dots = y_{np_i} = i\]

Two views of data are returned, the first being the latent samples and the second being a specified transformation of the latent samples. Additional noise may be added to the first view or added as noise dimensions to both views.
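The sampling rule above can be sketched in NumPy as follows. This is an illustration written from the Notes, not treeple's implementation: sample_latents is a hypothetical name, and truncating \(np_i\) to an integer (as well as tolerating floating-point rounding when checking that class_probs sums to 1) are assumptions.

```python
import numpy as np


def sample_latents(centers, covariances, n_samples, class_probs=None, rng=None):
    """Sketch of the latent sampling rule in the Notes (not treeple's source).

    Class i receives n * p_i points, truncated to an integer here (an
    assumption), drawn i.i.d. from N(mu_i, Sigma_i) and labelled i.
    """
    rng = np.random.default_rng(rng)
    k = len(centers)
    if class_probs is None:
        class_probs = np.full(k, 1.0 / k)  # uniform over the Gaussians
    class_probs = np.asarray(class_probs, dtype=float)
    if not np.isclose(class_probs.sum(), 1.0):
        raise ValueError("class_probs must sum to 1")
    counts = (class_probs * n_samples).astype(int)

    latents, labels = [], []
    for i, (mu, sigma, n_i) in enumerate(zip(centers, covariances, counts)):
        # X_1, ..., X_{n p_i} ~ N(mu_i, Sigma_i), all with label i
        latents.append(rng.multivariate_normal(mu, sigma, size=n_i))
        labels.append(np.full(n_i, i))
    return np.concatenate(latents), np.concatenate(labels)


centers = [[0, 1], [0, -1]]
covariances = [np.eye(2), np.eye(2)]
latent, y = sample_latents(centers, covariances, n_samples=10, rng=0)
print(latent.shape, y)  # (10, 2) [0 0 0 0 0 1 1 1 1 1]
```

When \(np_i\) is not a whole number, truncation gives that class slightly fewer points, so the total can fall short of n_samples; this sketch does not try to redistribute the remainder.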

Examples

>>> from treeple.datasets.multiview import make_gaussian_mixture
>>> import numpy as np
>>> n_samples = 10
>>> centers = [[0, 1], [0, -1]]
>>> covariances = [np.eye(2), np.eye(2)]
>>> Xs, y = make_gaussian_mixture(centers, covariances, n_samples=n_samples,
...                               shuffle=True, random_state=42)
>>> len(Xs)
2
>>> Xs[0].shape
(10, 2)
>>> y.shape
(10,)