treeple.datasets.make_joint_factor_model

treeple.datasets.make_joint_factor_model(n_views, n_features, n_samples=100, joint_rank=1, noise_std=1, m=1.5, random_state=None, return_decomp=False)

Joint factor model data generator.

Samples from a low rank, joint factor model where there is one set of shared scores.

Parameters:
n_views: int

Number of views to sample. This corresponds to B in the notes.

n_features: int or list of int

Number of features in each view. A list specifies a different number of features for each view.

n_samples: int (default 100)

Number of samples in each view.

joint_rank: int (default 1)

Rank of the common signal across views.

noise_std: float (default 1)

Scale of noise distribution.

m: float (default 1.5)

Signal strength.

random_state: int or RandomState instance, optional (default=None)

Controls random orthonormal matrix sampling and random noise generation. Set for reproducible results.

return_decomp: bool, default=False

If True, also returns the joint scores U and the view_loadings.

Returns:
Xs: list of array-likes

List of sampled data matrices with the following attributes.

  • Xs length: n_views

  • Xs[i] shape: (n_samples, n_features_i).

U: array of shape (n_samples, joint_rank)

The true orthonormal joint scores matrix. Returned if return_decomp is True.

view_loadings: list of numpy.ndarray

The true view loadings matrices. Returned if return_decomp is True.

Notes

The data is generated as follows, where:

  • \(b\) indexes the views.

  • \(U\) is a (n_samples, joint_rank) matrix with orthonormal columns, holding the joint scores shared by all views.

  • svals are the joint_rank singular values of the shared signal.

  • \(W_b\) are (n_features_b, joint_rank) view loadings matrices with orthonormal columns. They linearly map the joint scores into each view's feature space while preserving inner products.

For \(b = 1, \dots, B\):

X_b = U @ diag(svals) @ W_b^T + noise_std * E_b

where U and each W_b have orthonormal columns and E_b is an (n_samples, n_features_b) noise matrix. The singular values increase linearly, following [1], Section 2.2.3.
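The generative formula above can be sketched in plain NumPy. This is an illustrative sketch, not the library's implementation: the names `rand_orthonormal` and `sketch_joint_factor_model` are hypothetical, the noise E_b is assumed to be standard Gaussian, and the singular values are assumed to be `m * [1, ..., joint_rank]` (the source only states that they increase linearly, so the library's exact scaling may differ).

```python
import numpy as np


def rand_orthonormal(n_rows, n_cols, rng):
    """Sample an (n_rows, n_cols) matrix with orthonormal columns via QR."""
    q, _ = np.linalg.qr(rng.normal(size=(n_rows, n_cols)))
    return q


def sketch_joint_factor_model(n_views, n_features, n_samples=100,
                              joint_rank=1, noise_std=1.0, m=1.5, seed=None):
    """Hypothetical re-implementation of X_b = U diag(svals) W_b^T + noise."""
    rng = np.random.default_rng(seed)
    # An int means every view has the same number of features.
    if isinstance(n_features, int):
        n_features = [n_features] * n_views

    # Assumption: linearly increasing singular values scaled by m.
    svals = m * np.arange(1, joint_rank + 1, dtype=float)

    # One set of joint scores shared across views, one loadings matrix per view.
    U = rand_orthonormal(n_samples, joint_rank, rng)
    view_loadings = [rand_orthonormal(d, joint_rank, rng) for d in n_features]

    # Assumption: E_b has i.i.d. standard normal entries.
    Xs = [
        (U * svals) @ W.T + noise_std * rng.normal(size=(n_samples, d))
        for W, d in zip(view_loadings, n_features)
    ]
    return Xs, U, view_loadings, svals


Xs, U, Ws, svals = sketch_joint_factor_model(
    n_views=2, n_features=[5, 8], n_samples=50,
    joint_rank=3, noise_std=0.0, seed=0,
)
print([X.shape for X in Xs])  # [(50, 5), (50, 8)]

# Without noise, U diag(svals) W_b^T is already an SVD of X_b, so the
# leading singular values of X_b are svals (in descending order).
top = np.linalg.svd(Xs[0], compute_uv=False)[:3]
print(np.allclose(top, svals[::-1]))  # True
```

Setting `noise_std=0.0` in the usage lines makes the low-rank structure directly visible. With noise added, the singular values of each view are only approximately equal to svals.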

References