Kernels
We build our simple kernels using global, node-wise, and edge-wise properties of the network. Both datasets have the shape \((N, n, n)\), where \(N\) is the number of networks in the dataset and \(n\) is the number of nodes in each network. Each graph is represented by an \((n \times n)\) adjacency matrix, where the existence of an edge between node \(i\) and node \(j\) is denoted by a nonzero value at the \((i, j)\)-th element.
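For concreteness, here is a minimal sketch of this data layout, using a hypothetical dataset of \(N = 3\) random binary graphs on \(n = 5\) nodes (the variable name `graphs` matches the code below):
import numpy as np

rng = np.random.default_rng(0)
N, n = 3, 5  # hypothetical dataset sizes
graphs = (rng.random((N, n, n)) < 0.3).astype(float)  # shape (N, n, n)
for graph in graphs:
    np.fill_diagonal(graph, 0)  # no self-loops, so the diagonal is zero
print(graphs.shape)  # (3, 5, 5)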
For Matched Networks
Below is the code for calculating the dissimilarity matrix using different kernels for matched networks.
# HIDE CODE
import numpy as np


def calculate_dissim(graphs, method="density", norm=None, normalize=True):
    """Calculate the dissimilarity matrix using the input kernel."""
    glob = False
    node = False
    edge = False
    if method == "density":
        glob = True
        num_nodes = graphs.shape[1]
        num_edges_possible = num_nodes ** 2 - num_nodes
        metric = np.zeros(len(graphs))
        for i, graph in enumerate(graphs):
            num_edges = np.count_nonzero(graph)
            metric[i] = num_edges / num_edges_possible
    elif method == "avgedgeweight":
        glob = True
        metric = np.zeros(len(graphs))
        for i, graph in enumerate(graphs):
            num_edges = np.count_nonzero(graph)
            metric[i] = np.sum(graph) / num_edges
    elif method == "avgadjmatrix":
        glob = True
        metric = np.zeros(len(graphs))
        for i, graph in enumerate(graphs):
            metric[i] = np.average(graph)
    elif method == "degree":
        node = True
        metric = np.zeros((graphs.shape[0], graphs.shape[1]))
        for i, graph in enumerate(graphs):
            for j, row in enumerate(graph):
                metric[i, j] = np.count_nonzero(row)
    elif method == "strength":
        node = True
        metric = np.zeros((graphs.shape[0], graphs.shape[1]))
        for i, graph in enumerate(graphs):
            for j, row in enumerate(graph):
                metric[i, j] = np.sum(row)
    elif method == "edgeweight":
        edge = True
        metric = graphs
    else:
        raise ValueError("Not a valid kernel name.")

    dissim_matrix = np.zeros((len(graphs), len(graphs)))
    for i, metric1 in enumerate(metric):
        for j, metric2 in enumerate(metric):
            if glob and norm is None:
                # global kernels: absolute difference of scalar metrics
                diff = np.abs(metric1 - metric2)
            elif (node or edge) and norm == "l1":
                diff = np.linalg.norm(metric1 - metric2, ord=1)
            elif (node or edge) and norm == "l2":
                diff = np.linalg.norm(metric1 - metric2, ord=2)
            else:
                raise ValueError("Use norm=None for global kernels, and "
                                 "norm='l1' or 'l2' for node- or edge-wise kernels.")
            dissim_matrix[i, j] = diff
    if normalize:
        dissim_matrix = dissim_matrix / np.max(dissim_matrix)
    return dissim_matrix
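As a usage sketch (assuming `graphs` is an \((N, n, n)\) array of matched networks, such as the hypothetical example above):
# node-degree kernel, comparing degree vectors with the L2 norm
degree_dissim = calculate_dissim(graphs, method="degree", norm="l2")

# global density kernel; scalar metrics need no norm
density_dissim = calculate_dissim(graphs, method="density", norm=None)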
Global Properties
For each of these kernels, each element of the \((N \times N)\) dissimilarity matrix is the absolute difference between the metrics, where \(N\) is the number of networks in the dataset.
Density
Graph density refers to the ratio between the number of edges present in the graph and the number of possible edges and is calculated by
\[d = \frac{m}{n^2 - n}\]
where \(m\) is the number of edges present (the number of nonzero entries of the adjacency matrix) and \(n\) is the number of nodes in the graph.
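For example, an undirected, unweighted graph on \(n = 3\) nodes with edges between nodes 1 and 2 and between nodes 2 and 3 has four nonzero adjacency entries, giving a density of \(4 / (3^2 - 3) = 4/6 \approx 0.67\).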
Average Edge Weight
To calculate average edge weight, we sum all the nonzero values in the adjacency matrix and divide by the number of edges present.
Average of the Adjacency Matrix
To calculate the average of the adjacency matrix, we sum all the values in the adjacency matrix and divide by the total number of entries:
\[\bar{A} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij}\]
where \(n\) is the number of nodes in the graph. Note that if the graph is unweighted, the average of the adjacency matrix is nearly identical to graph density; the two differ only in the denominator (\(n^2\) versus \(n^2 - n\)).
Node-wise Properties
For each of these kernels, two \((N \times N)\) dissimilarity matrices are constructed: one whose elements are the \(L^1\)-norms of the difference vectors between the metrics, and one whose elements are the \(L^2\)-norms.
Note
The \(L^1\)-norm of a vector is the sum of the magnitudes of its elements. Formally,
\[\|\boldsymbol{x}\|_1 = \sum_{i=1}^{n} |x_i|\]
where \(x_i\) for \(i = 1, 2, ..., n\) are the elements of the vector \(\boldsymbol{x}\) [Weisstein, n.d.].
Note
The \(L^2\)-norm of a vector is its Euclidean distance from the origin. Formally,
\[\|\boldsymbol{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}\]
where \(x_i\) for \(i = 1, 2, ..., n\) are the elements of the vector \(\boldsymbol{x}\) [Weisstein, n.d.].
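For example, for \(\boldsymbol{x} = (1, -2, 3)\), \(\|\boldsymbol{x}\|_1 = |1| + |-2| + |3| = 6\), while \(\|\boldsymbol{x}\|_2 = \sqrt{1 + 4 + 9} = \sqrt{14} \approx 3.74\).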
Node Degree
The degree of a node is the number of edges incident to it. In an \((n \times n)\) adjacency matrix \(A\), the degree of node \(i\) is calculated by \(\sum_{j=1}^n a_{ij}\), where \(a_{ij} = 1\) if \(A_{ij} \neq 0\) and \(a_{ij} = 0\) if \(A_{ij} = 0\). Thus for each graph, we have an \((n \times 1)\) vector whose \(i\)-th element is the degree of node \(i\), and the dissimilarity matrix is constructed based on these vectors.
Node Strength
The strength of a node is the sum of the weights of the edges incident to it. In an \((n \times n)\) adjacency matrix \(A\), the strength of node \(i\) is calculated by \(\sum_{j=1}^n A_{ij}\). Thus for each graph, we have an \((n \times 1)\) vector whose \(i\)-th element is the strength of node \(i\), and the dissimilarity matrix is constructed based on these vectors.
Edge-wise Properties
Similar to the kernels above, two \((N \times N)\) dissimilarity matrices are constructed, based on the \(L^1\)-matrix norm and the \(L^2\)-matrix norm of the difference matrix between the metrics.
Note
The \(L^1\)-matrix norm is also called the maximum absolute column sum norm and is calculated by
\[\|A\|_1 = \max_{j} \sum_{i=1}^{n} |a_{ij}|\]
[Weisstein, n.d.].
Note
The \(L^2\)-matrix norm is also called the spectral norm and is calculated by
\[\|A\|_2 = \sqrt{\lambda_{\max}\left(A^H A\right)}\]
where \(A^H\) is the conjugate transpose of \(A\) and \(\lambda_{\max}\) denotes the largest eigenvalue. In our case, matrix \(A\) is an adjacency matrix and thus always real, so \(A^H = A^T\) [Weisstein, n.d.].
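For example, for \(A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\), the \(L^1\)-matrix norm is \(\max(|1| + |3|, |2| + |4|) = 6\), and the \(L^2\)-matrix norm is \(\sqrt{\lambda_{\max}(A^T A)} \approx 5.46\).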
Edge Weight
For each pair of \((n \times n)\) graphs, the difference in edge weights is calculated as the difference of their adjacency matrices, yielding an \((n \times n)\) difference matrix. The dissimilarity matrix is then constructed based on the \(L^1\)- and \(L^2\)-matrix norms of these difference matrices.
For Unmatched Networks
The kernels above assume the nodes are matched. When no matching of the nodes is known, we cannot use kernels based on the node-wise or edge-wise properties of the network directly, because we cannot compute a difference vector or matrix and take its \(L^1\)- or \(L^2\)-norm. Instead, for those kernels we flatten each graph's metric values into a sample and compute the two-sample Kolmogorov-Smirnov test statistic between the two "distributions". The underlying metrics remain the same. The modified code is below.
# HIDE CODE
import numpy as np
from scipy.stats import ks_2samp


def calculate_dissim_unmatched(graphs, method="degree", normalize=True):
    """Calculate the dissimilarity matrix using the input kernel."""
    if method == "degree":
        metric = np.zeros((graphs.shape[0], graphs.shape[1]))
        for i, graph in enumerate(graphs):
            for j, node in enumerate(graph):
                metric[i, j] = np.count_nonzero(node)
    elif method == "strength":
        metric = np.zeros((graphs.shape[0], graphs.shape[1]))
        for i, graph in enumerate(graphs):
            for j, node in enumerate(graph):
                metric[i, j] = np.sum(node)
    elif method == "edgeweight":
        metric = []
        for graph in graphs:
            # collect each graph's nonzero edge weights
            # (np.nonzero alone would return indices, not values)
            metric.append(graph[np.nonzero(graph)])
    else:
        raise ValueError("Not a valid kernel name.")

    dissim_matrix = np.zeros((len(graphs), len(graphs)))
    for i, metric1 in enumerate(metric):
        for j, metric2 in enumerate(metric):
            # two-sample KS statistic between the two metric "distributions"
            diff, _ = ks_2samp(np.array(metric1), np.array(metric2), mode="asymp")
            dissim_matrix[i, j] = diff
    if normalize:
        dissim_matrix = dissim_matrix / np.max(dissim_matrix)
    return dissim_matrix
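As a usage sketch (again assuming `graphs` is an array of networks with no known node matching):
# KS-statistic dissimilarity between the edge-weight distributions
edgeweight_dissim = calculate_dissim_unmatched(graphs, method="edgeweight")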
Established Kernels
The following kernels use existing algorithms and/or equations that are much more complex and computationally expensive than those described above. We use them as a point of comparison for the performance of our "simple" kernels.
Laplacian Spectral Distance
We compute the I-DAD Laplacian of each graph, which is calculated by
\[L = I - D^{-1/2} A D^{-1/2}\]
where \(D\) is the diagonal matrix of node degrees, \(I\) is the identity matrix, and \(A\) is the adjacency matrix.
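As a minimal numpy sketch of this formula (assuming `A` is the adjacency matrix of a graph with no isolated nodes, and using weighted degrees; the code below instead relies on graspologic's `to_laplacian`):
import numpy as np

def idad_laplacian(A):
    """Compute L = I - D^{-1/2} A D^{-1/2} directly from the definition."""
    degrees = A.sum(axis=1)                # (weighted) degree of each node
    d_inv_sqrt = np.diag(degrees ** -0.5)  # D^{-1/2}
    return np.eye(A.shape[0]) - d_inv_sqrt @ A @ d_inv_sqrt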
Graphs can often be characterized by their Laplacian eigenspectra, which makes this a useful metric to consider when assessing similarity between graphs [Das, 2004]. The I-DAD Laplacian is particularly useful because all of its eigenvalues are real, and for undirected graphs, all eigenvalues are guaranteed to lie in the range \([0, 2]\) [Suárez et al., 2022].
We implement the Laplacian spectral kernel by applying pass-to-ranks transformations on the networks, sorting the eigenvalues of the graph Laplacian without any smoothing, then calculating the \(L^2\)-norm of the difference vector to construct the dissimilarity matrix.
Note
Using the code attached below, there are several ways to derive and use the Laplacian eigenspectrum. We can transform the graphs through pass-to-ranks or binarization before finding the graph Laplacian, and we can either simply sort the eigenvalues or smooth them with a Gaussian kernel to construct the eigenspectrum. We also have multiple ways to quantify dissimilarity, such as the cosine distance between two eigenspectra, or the \(L^1\)- or \(L^2\)-norm of the difference vector between two eigenspectra. The implementation stated above is the combination that yields the highest discriminability value on the matched networks. More details can be found in the Laplacian section in the Appendix.
We use the Laplacian spectral kernel for both matched and unmatched networks.
# HIDE CODE
import numpy as np
from graspologic.utils import pass_to_ranks, to_laplacian
from sklearn.neighbors import KernelDensity
from scipy.spatial.distance import cosine


def laplacian_dissim(graphs, transform: str = None, metric: str = 'l2',
                     smooth_eigvals: bool = False, normalize=True):
    """Calculate the dissimilarity matrix from the Laplacian eigenspectra."""
    if transform == 'pass-to-ranks':
        for i, graph in enumerate(graphs):
            graphs[i] = pass_to_ranks(graph)
    elif transform == 'binarize':
        graphs[graphs != 0] = 1
    elif transform is not None:
        raise ValueError('Supported transformations are "pass-to-ranks" '
                         '(simple-nonzero), "binarize", or None.')

    eigs = []
    for graph in graphs:
        # calculate the I-DAD Laplacian
        lap = to_laplacian(graph, 'I-DAD')
        # find and sort its eigenvalues
        w = np.sort(np.linalg.eigvals(lap))
        if smooth_eigvals:
            # smooth the eigenspectrum with a Gaussian kernel density estimate
            kde = KernelDensity(kernel='gaussian', bandwidth=0.015).fit(w.reshape(-1, 1))
            xs = np.linspace(0, 2, 2000)[:, np.newaxis]
            log_dens = kde.score_samples(xs)
            eigs.append(np.exp(log_dens))
        else:
            eigs.append(w)

    dissim_matrix = np.zeros((len(graphs), len(graphs)))
    for i, eig1 in enumerate(eigs):
        for j, eig2 in enumerate(eigs):
            if metric == 'cosine':
                diff = cosine(eig1, eig2)
            elif metric == 'l1':
                diff = np.linalg.norm(eig1 - eig2, ord=1)
            elif metric == 'l2':
                diff = np.linalg.norm(eig1 - eig2, ord=2)
            else:
                raise ValueError('Supported metrics are "cosine", "l1", or "l2".')
            dissim_matrix[i, j] = diff
    if normalize:
        dissim_matrix = dissim_matrix / np.max(dissim_matrix)
    return dissim_matrix
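As a usage sketch, the implementation described above (pass-to-ranks, sorted raw eigenvalues, \(L^2\)-norm) corresponds to:
laplacian_dissim_matrix = laplacian_dissim(graphs, transform='pass-to-ranks',
                                           metric='l2', smooth_eigvals=False)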
For Matched Networks - Omnibus Embedding
We generate omnibus embeddings of the matched networks, then calculate the Frobenius norm of the difference between each pair of embeddings to construct the dissimilarity matrix.
Note
The Frobenius norm of an \((m \times n)\) matrix \(A\) is given by
\[\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}\]
[Golub and Van Loan, 1985].
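For example, \(\left\| \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \right\|_F = \sqrt{1 + 4 + 9 + 16} = \sqrt{30} \approx 5.48\).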
# HIDE CODE
from graspologic.embed import OmnibusEmbed

# embed all graphs jointly using the omnibus embedding
embedder = OmnibusEmbed(n_elbows=4)
omni_embedding = embedder.fit_transform(graphs)

omni_matrix = np.zeros((len(graphs), len(graphs)))
for i, embedding1 in enumerate(omni_embedding):
    for j, embedding2 in enumerate(omni_embedding):
        # Frobenius norm of the difference between the two embeddings
        dist = np.linalg.norm(embedding1 - embedding2, ord="fro")
        omni_matrix[i, j] = dist

# scale values from 0 to 1
scaled_omni_dissim = omni_matrix / np.max(omni_matrix)
For Unmatched Networks - Latent Distribution Test Statistic
The latent distribution test is a two-sample hypothesis test of whether two graphs have the same distribution of latent positions. This test does not assume any known matching, as it can use alignment techniques such as seedless Procrustes to find the best orthogonal alignment between two embeddings.
First, we generate adjacency spectral embeddings of the largest connected component of each unmatched graph, then calculate the latent distribution test statistic for every pair to construct the dissimilarity matrix, using the parameters shown in the code below.
# HIDE CODE
from graspologic.inference import latent_distribution_test
from graspologic.embed import AdjacencySpectralEmbed
from graspologic.utils import largest_connected_component
from tqdm import tqdm
import warnings

warnings.filterwarnings("ignore")

# embed the largest connected component of each graph
ase_graphs = []
for graph in graphs:
    lcc_graph = largest_connected_component(graph)
    ase_graph = AdjacencySpectralEmbed(n_components=4).fit_transform(lcc_graph)
    ase_graphs.append(ase_graph)

# calculate the test statistic (with seedless-procrustes alignment) for each pair
latent_dissim = np.zeros((len(ase_graphs), len(ase_graphs)))
Qs = []
for j in tqdm(range(0, len(ase_graphs) - 1)):
    for i in range(j + 1, len(ase_graphs)):
        statistic, _, misc_dict = latent_distribution_test(
            ase_graphs[i], ase_graphs[j], test="mgc", metric="euclidean",
            n_bootstraps=0, align_type="seedless_procrustes", input_graph=False)
        latent_dissim[i, j] = statistic
        Qs.append(misc_dict["Q"])

# scale values from 0 to 1 and make the matrix symmetric
scaled_latent_dissim = latent_dissim / np.max(latent_dissim)
scaled_latent_dissim = scaled_latent_dissim + scaled_latent_dissim.T
References
1. Laura E. Suárez, Yossi Yovel, Martijn P. Van den Heuvel, Olaf Sporns, Yaniv Assaf, Guillaume Lajoie, and Bratislav Misic. A connectomics-based taxonomy of mammals. bioRxiv, 03 2022. doi:10.1101/2022.03.11.483995.
2. Eric W. Weisstein. L^1-norm. Accessed May 16, 2022 [Online]. URL: https://mathworld.wolfram.com/L1-Norm.html.
3. Eric W. Weisstein. L^2-norm. Accessed May 16, 2022 [Online]. URL: https://mathworld.wolfram.com/L2-Norm.html.
4. Eric W. Weisstein. Matrix norm. Accessed May 16, 2022 [Online]. URL: https://mathworld.wolfram.com/MatrixNorm.html.
5. K. Ch. Das. The Laplacian spectrum of a graph. Computers & Mathematics with Applications, 48(5-6):715–724, 09 2004. doi:10.1016/j.camwa.2004.05.005.
6. Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, MD, 1985.