treeple.datasets.make_joint_factor_model

treeple.datasets.make_joint_factor_model(n_views, n_features, n_samples=100, joint_rank=1, noise_std=1, m=1.5, random_state=None, return_decomp=False)

Joint factor model data generator.

Samples from a low-rank joint factor model in which all views share one set of scores.

Parameters:

n_views (int): Number of views to sample. This corresponds to B in the Notes.
n_features (int, or list of int): Number of features in each view. A list specifies a different number of features for each view.
n_samples (int, default 100): Number of samples in each view.
joint_rank (int, default 1): Rank of the common signal across views.
noise_std (float, default 1): Scale of the noise distribution.
m (float, default 1.5): Signal strength.
random_state (int or RandomState instance, optional, default None): Controls random orthonormal matrix sampling and random noise generation. Set for reproducible results.
return_decomp (bool, default False): If True, also returns the joint scores U and the view_loadings.

Returns:

Xs: list of array-likes

List of sample data matrices with the following properties:

Xs length: n_views
Xs[i] shape: (n_samples, n_features_i).

U: array of shape (n_samples, joint_rank)

The true orthonormal joint scores matrix. Returned if return_decomp is True.

view_loadings: list of numpy.ndarray

The true view loadings matrices. Returned if return_decomp is True.
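
The return contract above can be checked with a small helper. This is a minimal sketch and not part of treeple: `check_decomp` is a hypothetical name, and the demo inputs are stand-ins built with NumPy so that the block runs without treeple installed.

```python
import numpy as np


def check_decomp(Xs, U, view_loadings):
    """Check the shapes and orthonormality described in the Returns section."""
    n_samples, joint_rank = U.shape
    assert len(Xs) == len(view_loadings)
    # U has orthonormal columns: U^T U = I
    assert np.allclose(U.T @ U, np.eye(joint_rank))
    for X, W in zip(Xs, view_loadings):
        # Xs[i] has shape (n_samples, n_features_i)
        assert X.shape == (n_samples, W.shape[0])
        # Each loadings matrix is (n_features_i, joint_rank) with orthonormal columns
        assert W.shape[1] == joint_rank
        assert np.allclose(W.T @ W, np.eye(joint_rank))
    return True


# Stand-in inputs (assumption: built here by QR, not produced by treeple)
rng = np.random.default_rng(0)
U = np.linalg.qr(rng.normal(size=(50, 2)))[0]
view_loadings = [np.linalg.qr(rng.normal(size=(d, 2)))[0] for d in (4, 6)]
Xs = [U @ W.T for W in view_loadings]
ok = check_decomp(Xs, U, view_loadings)
```

With treeple installed, the same check should apply to the output of `make_joint_factor_model(2, [4, 6], n_samples=50, joint_rank=2, return_decomp=True)`, assuming the tuple comes back in the order listed above (Xs, U, view_loadings).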

Notes

The data is generated as follows, where:

\(b\) indexes the views, from 1 to B, where B = n_views.
\(U\) is the (n_samples, joint_rank) orthonormal matrix of joint scores shared by all views.
svals are the joint_rank singular values.
\(W_b\) is the (n_features_b, joint_rank) loadings matrix of view b. Its columns are orthonormal, so the linear map it defines preserves inner products (i.e. it acts as an isometry).

For b = 1, ..., B: X_b = U @ diag(svals) @ W_b^T + noise_std * E_b

where U and each W_b have orthonormal columns and E_b is an (n_samples, n_features_b) noise matrix. The singular values increase linearly, following [1], Section 2.2.3.
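
The generative formula can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's implementation: the function name `sketch_joint_factor_model` is hypothetical, the orthonormal matrices are drawn by QR decomposition of Gaussian matrices, the noise E_b is assumed standard normal, and the singular values are assumed to be a simple linear ramp scaled by m (the exact scaling treeple uses is not stated here).

```python
import numpy as np


def random_orthonormal(n_rows, n_cols, rng):
    """Return an (n_rows, n_cols) matrix with orthonormal columns (needs n_rows >= n_cols)."""
    q, _ = np.linalg.qr(rng.normal(size=(n_rows, n_cols)))
    return q


def sketch_joint_factor_model(n_views, n_features, n_samples=100,
                              joint_rank=1, noise_std=1.0, m=1.5, seed=None):
    rng = np.random.default_rng(seed)
    # A single int means every view has the same number of features
    if isinstance(n_features, int):
        n_features = [n_features] * n_views

    # Shared joint scores U and one loadings matrix W_b per view
    U = random_orthonormal(n_samples, joint_rank, rng)
    view_loadings = [random_orthonormal(d, joint_rank, rng) for d in n_features]

    # Assumption: linearly increasing singular values m * (1, 2, ..., joint_rank)
    svals = m * np.arange(1, joint_rank + 1, dtype=float)

    # X_b = U @ diag(svals) @ W_b^T + noise_std * E_b, with E_b assumed Gaussian
    Xs = [
        (U * svals) @ W.T + noise_std * rng.normal(size=(n_samples, d))
        for W, d in zip(view_loadings, n_features)
    ]
    return Xs, U, view_loadings
```

Multiplying `U * svals` scales each column of U by its singular value, which is equivalent to `U @ np.diag(svals)` without forming the diagonal matrix.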

References