Exploratory analysis on volume correlations

Plot the data#

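# Third-party imports used on this page. The project helpers (load_volume_corr,
# load_vertex_df, load_fa_corr, rank, run_ksample, run_pairwise, plot_heatmaps,
# plot_pairwise, GENOTYPES) come from the project's own code, whose import line
# is not shown here.
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.stats import kruskal

# Load the data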

volume_correlations, labels = load_volume_corr()

meta = load_vertex_df()

volume_correlations = np.array([rank(v) for v in np.abs(volume_correlations)])

_images/97fe6eec26217e3d7bf8d39b2d06d1fda170354a3528d77c818d780485e350f2.png

Vectorize matrix and compute kruskal-wallis#

We use the Kruskal-Wallis (KW) test for speed, since large distance matrices are expensive to compute.

# use kruskal-wallis for speed

idx = np.triu_indices_from(volume_correlations[0], k=1)

kruskal(*[c[idx] for c in volume_correlations])

KruskalResult(statistic=0.0, pvalue=1.0)
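One possible reading of the statistic of exactly 0 above: if each matrix is rank-transformed on its own before the test, every group ends up holding the same set of rank values, so a rank-based test has nothing left to separate. The sketch below illustrates this on synthetic data. It assumes that `rank` behaves like `scipy.stats.rankdata` applied to the whole matrix, which this page does not confirm, and `random_symmetric` is a hypothetical helper introduced only for the illustration.

```python
import numpy as np
from scipy.stats import kruskal, rankdata

rng = np.random.default_rng(0)


def random_symmetric(n):
    """Hypothetical stand-in for one absolute correlation matrix."""
    a = rng.random((n, n))
    sym = (a + a.T) / 2
    np.fill_diagonal(sym, 1.0)
    return sym


# Three synthetic "genotype" matrices, each ranked independently
# (assumption: this mirrors what `rank` does on this page).
mats = [random_symmetric(6) for _ in range(3)]
ranked = [rankdata(m).reshape(m.shape) for m in mats]

# Vectorize the strict upper triangle, as in the cell above.
iu = np.triu_indices_from(ranked[0], k=1)
samples = [r[iu] for r in ranked]

# Every sample now contains the same multiset of ranks,
# so the Kruskal-Wallis statistic is (numerically) zero.
result = kruskal(*samples)
print(result)
```

If this reading applies, the omnibus test on independently ranked matrices is uninformative by construction, and the finer-grained community-level comparisons are where differences could show up.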

Try a priori communities#

vertex_hemispheres = meta.Hemisphere.values
vertex_structures = meta.Level_1.values
vertex_hemisphere_structures = (meta.Hemisphere + "-" + meta.Level_1).values
vertex_hemisphere_substructures = (meta.Hemisphere + "-" + meta.Level_1 + "-" + meta.Level_2).values

volume_ksample = [
    run_ksample(volume_correlations, labels, idx, test="kruskal", absolute=True)
    for idx, labels in enumerate(
        [
            vertex_hemispheres,
            vertex_structures,
            vertex_hemisphere_structures,
            vertex_hemisphere_substructures,
        ]
    )
]

volume_ksample = pd.concat(volume_ksample, ignore_index=True)

volume_ksample.to_csv(
    "../results/outputs/ranked_volume_correlation_3sample_apriori.csv", index=False
)

sns.set_context("talk", font_scale=0.5)

fig, _ = plot_heatmaps(volume_ksample, cbar=True)
fig.savefig("./figures/apriori_ksample_ranked_volume.pdf")


fig, _ = plot_heatmaps(volume_ksample, cbar=False, ranked_pvalue=True)
fig.savefig("./figures/apriori_ksample_ranked_volume_ranked_pval.pdf")

_images/2c63d736f658244b88f97a0c44b7f8ec0c287f9f7d40010cc21137604b80adba.png

_images/03b9c6f987f8f3e62e00345943f72db7c9e2d8d5d6189b6696a87ffd21905644.png

Do FA#

fa_correlations, labels = load_fa_corr()

fa_correlations = np.array([rank(v) for v in np.abs(fa_correlations)])

fa_ksample = [
    run_ksample(fa_correlations, labels, idx, test="manova", absolute=True)
    for idx, labels in enumerate(
        [
            vertex_hemispheres,
            vertex_structures,
            vertex_hemisphere_structures,
            vertex_hemisphere_substructures,
        ]
    )
]

fa_ksample = pd.concat(fa_ksample, ignore_index=True)
fa_ksample.to_csv("../results/outputs/ranked_fa_correlation_3sample_apriori.csv", index=False)

fig, _ = plot_heatmaps(fa_ksample, cbar=True)
fig.savefig("./figures/apriori_ksample_ranked_fa.pdf")


fig, _ = plot_heatmaps(fa_ksample, cbar=False, ranked_pvalue=True)
fig.savefig("./figures/apriori_ksample_ranked_fa_ranked_pval.pdf")

_images/7ed09171d1db07591a133b1e436e83412e5b71aca994d1ba7b116e9366014bb9.png

_images/96de24bb6508dc329d9220073dae2257b8f19a24e47246273dd437ccec9596ad.png

Statistical Experiment#

For a given pair of sub-regions \(k\) and \(l\) (Left/Right Forebrain, Midbrain, Hindbrain, White Matter Tracts), do the edges between nodes in those two sub-regions have the same distribution across genotypes, or a different one? Formally, consider the following model. Let \(a_{ij}^{(y)}\) be the weight of edge \((i, j)\) in a network of class \(y \in \{APOE22, APOE33, APOE44\}\), and let \(z_i \in [K]\) be the community label of node \(i\):

\[\begin{align*} a_{ij}^{(y)} | z_i = k, z_j = l \overset{ind.}{\sim} F^{(y)}_{k,l} \end{align*}\]

where \(F^{(y)}_{k,l}\) is the distribution function of the edges between communities \(k\) and \(l\) in a network of class \(y\). For a given community pair \((k,l)\) and a given pair of classes \((y, y')\), the hypothesis of interest is:

\[\begin{align*} H_{0, k, l}^{(y, y')} : F^{(y)}_{k,l} = F^{(y')}_{k,l} \text{ against } H_{A, k, l}^{(y, y')} : F^{(y)}_{k,l} \neq F^{(y')}_{k,l} \end{align*}\]
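To make the model concrete, the sample drawn from \(F^{(y)}_{k,l}\) is the set of edge weights \(a_{ij}^{(y)}\) with \(z_i = k\) and \(z_j = l\). A minimal sketch of extracting that sample from one adjacency matrix follows; `block_edges`, the toy matrix, and the label strings are hypothetical and are not taken from the project code.

```python
import numpy as np


def block_edges(adjacency, node_labels, k, l):
    """Return the edge weights a_ij with z_i == k and z_j == l."""
    labels = np.asarray(node_labels)
    rows = np.flatnonzero(labels == k)
    cols = np.flatnonzero(labels == l)
    block = adjacency[np.ix_(rows, cols)]
    if k == l:
        # Within one community, keep each undirected edge once
        # and skip self-loops on the diagonal.
        return block[np.triu_indices_from(block, k=1)]
    return block.ravel()


# Toy symmetric network with five nodes in two hypothetical communities.
labels = np.array(
    ["L-Forebrain", "L-Forebrain", "R-Forebrain", "R-Forebrain", "R-Forebrain"]
)
adjacency = np.arange(25, dtype=float).reshape(5, 5)
adjacency = (adjacency + adjacency.T) / 2  # entry (i, j) equals 3 * (i + j)

between = block_edges(adjacency, labels, "L-Forebrain", "R-Forebrain")
within = block_edges(adjacency, labels, "R-Forebrain", "R-Forebrain")
print(between)  # [ 6.  9. 12.  9. 12. 15.]
print(within)   # [15. 18. 21.]
```

Applying the same extraction to the networks of two classes \(y\) and \(y'\) yields the two samples that the hypothesis test compares.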

A \(p\)-value below the cutoff \(\alpha = 0.05\) (after Bonferroni-Holm adjustment) for a given pair of classes \((y, y')\) at a given community pair \((k, l)\) means that the data do not support the null hypothesis that the edges of that community pair share the same distribution in the two classes. A suitable test for this setting (univariate data, assumed to be independent) is a two-sample rank test: the Wilcoxon signed-rank test if the edges are treated as paired across classes, or the Mann-Whitney U test if they are treated as unpaired, which is the option the code below passes as test="mannwhitney". Both are available in scipy.
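As a rough illustration of the testing step, the sketch below runs both rank tests from scipy on synthetic edge weights for a single community pair and applies a Holm step-down adjustment across the three class pairs. The data, the `holm_adjust` helper, and the choice to adjust across class pairs only are assumptions made for the example; the internals of `run_pairwise` are not shown on this page.

```python
import numpy as np
from scipy.stats import mannwhitneyu, wilcoxon


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest p-value (i = 0..m-1) is multiplied by (m - i).
    scaled = p[order] * (m - np.arange(m))
    # Enforce monotonicity of the adjusted values and cap them at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


rng = np.random.default_rng(0)
# Synthetic edge weights for one community pair (k, l) in each class.
edges = {
    "APOE22": rng.normal(0.0, 1.0, size=50),
    "APOE33": rng.normal(0.0, 1.0, size=50),
    "APOE44": rng.normal(1.5, 1.0, size=50),  # shifted on purpose
}
pairs = [("APOE22", "APOE33"), ("APOE22", "APOE44"), ("APOE33", "APOE44")]

# Unpaired comparison (Mann-Whitney U) and paired comparison (Wilcoxon signed-rank).
unpaired = [
    mannwhitneyu(edges[a], edges[b], alternative="two-sided").pvalue
    for a, b in pairs
]
paired = [wilcoxon(edges[a], edges[b]).pvalue for a, b in pairs]

unpaired_adj = holm_adjust(unpaired)
for (a, b), p_adj in zip(pairs, unpaired_adj):
    print(f"{a} vs {b}: Holm-adjusted p = {p_adj:.3g}, reject = {p_adj < 0.05}")
```

The paired test requires the two samples to have equal length and a meaningful one-to-one matching of edges, whereas the unpaired test does not.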

The outcomes (p-values) can be visualized as heatmaps, one for each pair of classes. Since the edges are undirected, the matrix of community pairs is symmetric, so we can ignore its lower triangle. The p-values are then ranked, where a larger rank indicates a smaller p-value.
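The ranking step can be sketched as follows, assuming the p-values for one pair of classes are stored in a symmetric community-by-community matrix. The numbers are made up, and the exact ranking used by `plot_heatmaps(..., ranked_pvalue=True)` is not shown on this page, so treat this as one plausible convention rather than the project's implementation.

```python
import numpy as np
from scipy.stats import rankdata

# Made-up symmetric matrix of p-values for three communities.
pvals = np.array(
    [
        [0.50, 0.01, 0.20],
        [0.01, 0.03, 0.90],
        [0.20, 0.90, 0.04],
    ]
)

# Keep the upper triangle, including the diagonal (within-community pairs).
iu = np.triu_indices_from(pvals, k=0)

# Negate before ranking so that the smallest p-value gets the largest rank.
ranks = np.full(pvals.shape, np.nan)
ranks[iu] = rankdata(-pvals[iu])
print(ranks)
```

Entries left as NaN correspond to the redundant lower triangle and can be masked when drawing the heatmap.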

volume_pairwise = run_pairwise(
    volume_correlations,
    GENOTYPES,
    vertex_hemisphere_substructures,
    absolute=False,
    test="mannwhitney",
)

fig, _ = plot_pairwise(volume_pairwise, volume_ksample)
fig.savefig("./figures/apriori_pairwise_ranked_volume.pdf")

fa_pairwise = run_pairwise(
    fa_correlations,
    GENOTYPES,
    vertex_hemisphere_substructures,
    absolute=False,
    test="mannwhitney",
)

fig, _ = plot_pairwise(fa_pairwise, fa_ksample)

fig.savefig("./figures/apriori_pairwise_ranked_fa.pdf")