Kernels
Contents
Kernels¶
We build our simple kernels using global, node-wise, and edge-wise properties of the network. Both datasets have the shape
For Matched Networks¶
Attached is the code for calculating the dissimilarity matrix using different kernels for matched networks.
Click to show
# HIDE CODE
import numpy as np
def calculate_dissim(graphs, method="density", norm=None, normalize=True):
""" Calculate the dissimilarity matrix using the input kernel. """
glob = False
node = False
edge = False
if method == "density":
glob = True
num_nodes = graphs.shape[1]
num_edges_possible = num_nodes ** 2 - num_nodes
metric = np.zeros(len(graphs))
for i, graph in enumerate(graphs):
num_edges = np.count_nonzero(graph)
metric[i] = num_edges / num_edges_possible
elif method == "avgedgeweight":
glob = True
metric = np.zeros(len(graphs))
for i, graph in enumerate(graphs):
num_edges = np.count_nonzero(graph)
metric[i] = np.sum(graph) / num_edges
elif method == "avgadjmatrix":
glob = True
metric = np.zeros(len(graphs))
for i, graph in enumerate(graphs):
metric[i] = np.average(graph)
elif method == "degree":
node = True
metric = np.zeros((graphs.shape[0], graphs.shape[1]))
for i, graph in enumerate(graphs):
for j, row in enumerate(graph):
metric[i, j] = np.count_nonzero(row)
elif method == "strength":
node = True
metric = np.zeros((graphs.shape[0], graphs.shape[1]))
for i, graph in enumerate(graphs):
for j, row in enumerate(graph):
metric[i, j] = np.sum(row)
elif method == "edgeweight":
edge = True
metric = graphs
else:
print("Not a valid kernel name.")
dissim_matrix = np.zeros((len(graphs), len(graphs)))
for i, metric1 in enumerate(metric):
for j, metric2 in enumerate(metric):
if glob and norm == None:
diff = np.abs(metric1- metric2)
elif (node or edge) and norm == "l1":
diff = np.linalg.norm(metric1 - metric2, ord=1)
elif (node or edge) and norm == "l2":
diff = np.linalg.norm(metric1 - metric2, ord=2)
else:
print("L1, L2 norms only apply to node or edge-wise kernels.")
dissim_matrix[j, i] = diff
if normalize:
dissim_matrix = dissim_matrix / np.max(dissim_matrix)
return dissim_matrix
Global Properties¶
For each of these kernels, each element of the
Density¶
Graph density refers to the ratio between the number of edges present in the graph and the number of possible edges and is calculated by
where
Average Edge Weight¶
To calculate average edge weight, we sum all the nonzero values in the adjacency matrix and divide by the number of edges present.
Average of the Adjacency Matrix¶
To calculate the average of the adjacency matrix, we sum all the nonzero values in the adjacency matrix and divide by the total number of possible edges. Note that if the graph is unweighted, the average of the adjacency matrix would be the same as graph density.
where
Node-wise Properties¶
For each of these kernels, two
Note
The
where
Note
The
where
Node Degree¶
The degree of a node is the number of edges a node has. In an
Node Strength¶
The strength of a node is the sum of the weights of the edges a node has. In an
Edge-wise Properties¶
Similar to the kernels above, two
Note
The
Note
The
where
Edge Weight¶
For each pair of
For Unmatched Networks¶
The kernels we use above assume the nodes are matched. However, when we have no known matching of the nodes, we cannot use kernels based on the node-wise or edge-wise properties of the network because we cannot calculate a difference vector or matrix directly and calculate its
Click to show
# HIDE CODE
from scipy.stats import ks_2samp
def calculate_dissim_unmatched(graphs, method="degree", normalize=True):
""" Calculate the dissimilarity matrix using the input kernel. """
if method == "degree":
metric = np.zeros((graphs.shape[0], graphs.shape[1]))
for i, graph in enumerate(graphs):
for j, node in enumerate(graph):
metric[i, j] = np.count_nonzero(node)
elif method == "strength":
metric = np.zeros((graphs.shape[0], graphs.shape[1]))
for i, graph in enumerate(graphs):
for j, node in enumerate(graph):
metric[i, j] = np.sum(node)
elif method == "edgeweight":
metric = []
for graph in graphs:
metric.append(np.ravel(np.nonzero(graph)))
else:
print("Not a valid kernel name.")
dissim_matrix = np.zeros((len(graphs), len(graphs)))
for i, metric1 in enumerate(metric):
for j, metric2 in enumerate(metric):
diff, _ = ks_2samp(np.array(metric1), np.array(metric2), mode='asymp')
dissim_matrix[i, j] = diff
if normalize:
dissim_matrix = dissim_matrix / np.max(dissim_matrix)
return dissim_matrix
Established Kernels¶
The following kernels use existing algorithms and/or equations that are much more complex and computationally expensive than those described above. We use them as a point of comparison for the performance for our “simple” kernels.
Laplacian Spectral Distance¶
We compute the I-DAD Laplacian of each graph, which is calculated by
where
Graphs can often be characterized by its Laplacian eigenspectrum, which makes this a useful metric to consider when assessing similarity between graphs [Das, 2004]. The I-DAD Laplacian is particularly useful because all its eigenvalues are real, and for undirected graphs, all eigenvalues are guaranteed to be in the range [0, 2] [Suárez et al., 2022].
We implement the Laplacian spectral kernel by applying pass-to-ranks transformations on the networks, sorting the eigenvalues of the graph Laplacian without any smoothing, then calculating the
Note
Using the code attached below, there are several ways we can derive and use the Laplacian eigenspectrum. We can transform the graphs through pass-to-ranks or binarization before finding its graph Laplacian, and we can simply sort its eigenvalues or smooth them using a Gaussian kernel to construct the eigenspectrum. We also have multiple ways to quantify dissimilarity, such as calculating the cosine difference between two eigenspectra, or by finding the
We use the Laplacian spectral kernel for both matched and unmatched networks.
Click to show
# HIDE CODE
import numpy as np
from graspologic.utils import pass_to_ranks, to_laplacian
from sklearn.neighbors import KernelDensity
from scipy.spatial.distance import cosine
def laplacian_dissim(graphs, transform: str=None, metric: str='l2', smooth_eigvals: bool=False, \
normalize=True):
if transform == 'pass-to-ranks':
for i, graph in enumerate(graphs):
graph = pass_to_ranks(graph)
graphs[i] = graph
elif transform == 'binarize':
graphs[graphs != 0] = 1
elif transform == None:
graphs = graphs
else:
print('Supported transformations are "pass-to-ranks" (simple-nonzero), "binarize", or None.')
eigs = []
for i, graph in enumerate(graphs):
# calculate laplacian
lap = to_laplacian(graph, 'I-DAD')
# find and sort eigenvalues
w = np.linalg.eigvals(lap)
w = np.sort(w)
if smooth_eigvals:
kde = KernelDensity(kernel='gaussian', bandwidth=0.015).fit(w.reshape(-1, 1))
xs = np.linspace(0, 2, 2000)
xs = xs[:, np.newaxis]
log_dens = kde.score_samples(xs)
eigs.append(np.exp(log_dens))
else:
eigs.append(w)
dissim_matrix = np.zeros((len(graphs), len(graphs)))
for i, eig1 in enumerate(eigs):
for j, eig2 in enumerate(eigs):
if metric == 'cosine':
diff = cosine(eig1, eig2)
elif metric == 'l1':
diff = np.linalg.norm(eig1 - eig2, ord=1)
elif metric == 'l2':
diff = np.linalg.norm(eig1 - eig2, ord=2)
dissim_matrix[i, j] = diff
if normalize:
dissim_matrix = dissim_matrix / np.max(dissim_matrix)
return dissim_matrix
For Matched Networks - Omnibus Embedding¶
We generate omnibus embeddings of the matched networks, then calculate the Frobenius norm of the difference between two embeddings to construct the dissimilarity matrix.
Click to show
# HIDE CODE
from graspologic.embed import OmnibusEmbed
# embed using Omni
embedder = OmnibusEmbed(n_elbows=4)
omni_embedding = embedder.fit_transform(graphs)
omni_matrix = np.zeros((len(graphs), len(graphs)))
for i, embedding1 in enumerate(omni_embedding):
for j, embedding2 in enumerate(omni_embedding):
dist = np.linalg.norm(embedding1 - embedding2, ord="fro")
omni_matrix[i, j] = dist
# scale values from 0 to 1
scaled_omni_dissim = omni_matrix / np.max(omni_matrix)
For Unmatched Networks - Latent Distribution Test Statistic¶
The latent distribution test is a two-sample hypothesis test for whether or not two graphs have the same distributions of latent positions. This test does not assume any known matching, as it can use alignment techniques such as seedless-procrustes to find the best orthogonal alignment between two embeddings.
First we generate adjacency spectral embeddings of the largest connected component of the unmatched graphs, then calculate the latent distribution test statistic for every pair to construct the dissimilarity matrix using parameters shown in the code below.
Click to show
# HIDE CODE
from graspologic.inference import latent_distribution_test
from graspologic.embed import AdjacencySpectralEmbed
from graspologic.utils import largest_connected_component
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")
ase_graphs = []
for i, graph in enumerate(graphs):
lcc_graph = largest_connected_component(graph)
ase_graph = AdjacencySpectralEmbed(n_components=4).fit_transform(lcc_graph)
ase_graphs.append(ase_graph)
# calculate alignments
latent_dissim = np.zeros((len(ase_graphs), len(ase_graphs)))
Qs = []
for j in tqdm(range(0, len(ase_graphs) - 1)):
for i in range(j+1, len(ase_graphs)):
statistic, _, misc_dict = latent_distribution_test(ase_graphs[i], ase_graphs[j], test='mgc', metric='euclidean', \
n_bootstraps=0, align_type='seedless_procrustes', input_graph=False)
latent_dissim[i, j] = statistic
Qs.append(misc_dict['Q'])
# scale values from 0 to 1 and make matrix symmetric
scaled_latent_dissim = latent_dissim / np.max(latent_dissim)
scaled_latent_dissim = scaled_latent_dissim + scaled_latent_dissim.T
References¶
- 1
Laura E Suárez, Yossi Yovel, Martijin P Van den Heuvel, Olaf Sporns, Yaniv Assaf, Guillaume Lajoie, and Bratislav Misic. A connectomics-based taxonomy of mammals. bioRxiv, 03 2022. doi:10.1101/2022.03.11.483995.
- 2
Eric W. Weisstein. L^1-norm. Accessed May 16, 2022 [Online]. URL: https://mathworld.wolfram.com/L1-Norm.html.
- 3
Eric W. Weisstein. L^2-norm. Accessed May 16, 2022 [Online]. URL: https://mathworld.wolfram.com/L2-Norm.html.
- 4(1,2)
Eric W. Weisstein. Matrix norm. Accessed May 16, 2022 [Online]. URL: https://mathworld.wolfram.com/MatrixNorm.html.
- 5
K. Ch. Das. The laplacian spectrum of a graph. Computers & Mathematics with Applications, 48(5-6):715–724, 09 2004. doi:10.1016/j.camwa.2004.05.005.
- 6
Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, MD, 1985.