12.2. Spectral Method Theory#
We’ve tried to present the material so far in a way that is as easy to understand as possible, requiring only a passing familiarity with probability and statistics and a working knowledge of machine learning. Unfortunately, there is really no way around it now: this section is about to get mathematically hairy. To understand this section properly, you should have an extremely firm understanding of linear algebra, and more than likely, a working knowledge of matrix analysis and multivariate probability theory. While we’ve already seen a concentration inequality in the last section (Chebyshev’s inequality), you should also know how such inequalities extend to random vectors and matrices before proceeding. Before taking on this section, we would recommend checking out the excellent primer [6], which should give you a good foundation for understanding many of the results we take a look at here. We aren’t going to prove most of these results; if you want more details, please check out [3].
Buckle up!
12.2.1. Disclaimer about classical statistical asymptotic theory#
While classical statistics has a large literature deriving the large-sample properties of estimators, these concepts are more challenging in network analysis for multiple reasons. To start with, the very basic concept of sample size is not particularly clear. We often associate sample size with the number of observations, which are usually assumed to be independent from each other (for example, think of the number of poll participants used to estimate polling preferences). In a network, having independent observations is no longer possible in the same way, since all we observe are edges, and each edge reflects some type of interaction between two vertices. We therefore often treat the vertices as the sampled units. However, every time a new vertex is added, a new set of interactions with all the existing vertices is added to the model, which often requires including more parameters; this is the second important challenge in studying networks. A body of new literature has addressed some of these challenges for the models and estimators introduced in the previous sections, and we review some of these results here.
12.2.2. Adjacency spectral embedding#
In the following sections, we summarize some of the main results in the literature about spectral embeddings. A more in-depth review of these results is presented in [3] and [7]. In this section, we review some theoretical properties of the adjacency spectral embedding (ASE), introduced in Section 5.3. We focus on contextualizing this method using the random dot product graph (RDPG) model from Section 4.3. If you haven’t already, we’d also recommend checking out the appendix material in Section 11.6.
The results we present in this section address the general question of how effective spectral embedding methods are at estimating the latent positions of a random network generated from the RDPG model. Ideally, we would like these embeddings to be as close as possible to the true latent positions. The results in this section show that these estimates are indeed close if the network is sufficiently large. Moreover, the limiting distribution of these estimates can be characterized explicitly, in a fashion similar to the classic central limit theorems in statistics.
In the rest of the section, we consider a random adjacency matrix $A$ sampled from an RDPG with latent position matrix $X \in \mathbb{R}^{n \times d}$. The rows of $X$, denoted $x_1, \ldots, x_n$, are the latent positions of the $n$ vertices, so each pair of vertices $i$ and $j$ is connected independently with probability $x_i^\top x_j$; for some of the results below, the latent positions themselves are assumed to be drawn independently from a common distribution.
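To make this setup concrete, here is a minimal sketch of sampling an RDPG and embedding it with ASE using graspologic. The specific latent positions below are an arbitrary illustrative choice (scaled so that all dot products stay in $[0, 1]$), not anything prescribed by the theory:

from graspologic.simulations import rdpg
from graspologic.embed import AdjacencySpectralEmbed as ASE
import numpy as np

# an arbitrary choice of latent positions for illustration:
# n = 200 vertices, each with a d = 2 dimensional latent position
n, d = 200, 2
rng = np.random.default_rng(0)
X = rng.uniform(0.2, 0.8, size=(n, d)) / np.sqrt(d)  # keeps x_i^T x_j inside [0, 1]

# sample the adjacency matrix; vertices i and j are connected with probability x_i^T x_j
A = rdpg(X)

# the adjacency spectral embedding produces an estimate Xhat of X
Xhat = ASE(n_components=d).fit_transform(A)
print(A.shape, Xhat.shape)  # (200, 200) (200, 2)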
12.2.2.1. Statistical error of the adjacency spectral embedding#
12.2.2.1.1. Consistency#
As we have observed before (for instance, see Section 4.3), the latent position matrix $X$ of an RDPG is not observed directly; all we get to see is the adjacency matrix $A$. The adjacency spectral embedding produces an estimate $\hat X$ of $X$, and we would like to understand how close $\hat X$ is to $X$ as the number of vertices grows.
When we want to compare the estimates and the true latent positions, we face an unavoidable problem: the parameters of a random dot product graph, namely, the latent positions, are not identifiable. In other words, given a matrix of latent positions $X$ and any orthogonal matrix $W$ (one satisfying $W W^\top = I$), the latent positions $X$ and $XW$ generate networks with exactly the same distribution, since $(XW)(XW)^\top = X W W^\top X^\top = X X^\top$. As a consequence, the latent positions can only be estimated up to an orthogonal transformation, and any comparison between $\hat X$ and $X$ has to account for this.
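To see the non-identifiability concretely, here is a small sketch (the latent positions and the orthogonal matrix below are arbitrary illustrative choices) verifying that $X$ and $XW$ produce exactly the same edge probability matrix:

import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(1)
n, d = 100, 2
X = rng.uniform(0.2, 0.8, size=(n, d)) / np.sqrt(d)  # arbitrary latent positions

W = ortho_group.rvs(d, random_state=1)  # a random d x d orthogonal matrix

P_X = X @ X.T                # edge probabilities from X
P_XW = (X @ W) @ (X @ W).T   # edge probabilities from the rotated positions XW

# the two probability matrices are identical (up to floating point error),
# so no method can distinguish X from XW using the network alone
print(np.allclose(P_X, P_XW))  # True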
The first result we review concerns the error of the adjacency spectral embedding (ASE) method; that is, the difference between the estimated latent positions $\hat X$ and the true latent positions $X$, after accounting for the orthogonal non-identifiability. Under suitable conditions on the RDPG, there exists a sequence of orthogonal matrices $W_n$ and a constant $C$ such that, with high probability,

$$\max_{i \in \{1, \ldots, n\}} \left\|\hat x_i - W_n x_i\right\| \leq \frac{C \log^2 n}{\sqrt{n}}.$$

Here, $\hat x_i$ is the $i$-th row of $\hat X$ (the estimated latent position of vertex $i$), and $W_n$ is the orthogonal matrix that best aligns the estimates with the truth. Because the right-hand side vanishes as the number of vertices grows, the aligned estimates converge uniformly to the true latent positions; in this sense, the ASE is a consistent estimator of the latent positions.
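In practice, a convenient way to compute an alignment like $W_n$ is by solving an orthogonal Procrustes problem. Here is a minimal sketch (the helper is our own illustration, assuming `X` and `Xhat` are arrays of true and estimated latent positions such as those in the sketch above):

import numpy as np
from scipy.linalg import orthogonal_procrustes

def aligned_max_error(Xhat, X):
    # find the orthogonal matrix W that best aligns the estimates to the truth
    W, _ = orthogonal_procrustes(Xhat, X)
    # maximum row-wise estimation error after alignment
    return np.max(np.linalg.norm(Xhat @ W - X, axis=1))

For a network like the one sampled earlier, this quantity should be on the order of $\log^2(n)/\sqrt{n}$, in line with the bound above.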
12.2.2.1.2. Asymptotic normality#
A further result on the asymptotic properties of the ASE concerns the distribution of the estimation error; that is, the difference between the estimated latent position $\hat x_i$ and the true latent position $x_i$ of each vertex (again, up to an orthogonal transformation).
Distributional results on the rows of the adjacency spectral embedding show that the error in estimating the true latent positions converges to a multivariate normal distribution as the size of the network grows. In particular, it has been shown that the appropriately scaled and aligned difference between $\hat x_i$ and $x_i$ satisfies

$$\sqrt{n}\left(W_n \hat x_i - x_i\right) \rightarrow \mathcal{N}\big(0, \Sigma(x_i)\big),$$

where $W_n$ is again a sequence of orthogonal matrices, and the covariance matrix has the form

$$\Sigma(x) = \Delta^{-1}\, \mathbb{E}\!\left[\left(x^\top x_1 - (x^\top x_1)^2\right) x_1 x_1^\top\right] \Delta^{-1}, \qquad \Delta = \mathbb{E}\left[x_1 x_1^\top\right],$$

with $x_1$ denoting a latent position drawn from the latent position distribution.

For certain classes of distributions, it is possible to simplify this expression further to get a more specific form of the asymptotic distribution of $\sqrt{n}(W_n \hat x_i - x_i)$; we will see a concrete example for the Erdős-Rényi model below.
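If we want to use this central limit theorem for inference, the covariance $\Sigma(x_i)$ can be approximated by plugging the estimated latent positions into the expression above. Here is a rough sketch of such a plug-in estimator; the helper name is our own illustration rather than a graspologic function, and it assumes `Xhat` is an ASE estimate like the one computed earlier:

import numpy as np

def plugin_covariance(Xhat, i):
    """Plug-in estimate of the CLT covariance Sigma(x_i) for vertex i."""
    n, d = Xhat.shape
    Delta = Xhat.T @ Xhat / n                        # estimate of E[x_1 x_1^T]
    dots = Xhat @ Xhat[i]                            # estimates of x_i^T x_j for every j
    weights = dots - dots**2                         # Bernoulli variances x_i^T x_j (1 - x_i^T x_j)
    middle = (Xhat * weights[:, None]).T @ Xhat / n  # estimate of the inner expectation
    Delta_inv = np.linalg.inv(Delta)
    return Delta_inv @ middle @ Delta_inv

Dividing this estimate by $n$ gives an approximate covariance for $\hat x_i$ itself; in the Erdős-Rényi example below, the plug-in estimate should come out close to $1 - p$.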
The two results presented in this section constitute the basic properties of the adjacency spectral embedding under the RDPG model, but existing results are not limited to this particular method and model. Extensions of these results to a broader class of network models have already been developed (including the so-called generalized random dot product graphs). Analogous results also exist for other types of embedding methodologies. For instance, the Laplacian spectral embedding (LSE, from Section 5.3) possesses analogous properties to the ASE (namely, consistent estimates with a known asymptotic distribution), although the particular form of the corresponding theorems is a bit more complicated. The study of different network models and embedding methods is still an active research area, and new results keep coming every day.
12.2.2.2. Example: Erdős-Rényi graphs and the adjacency spectral embedding#
The ER model (from Section 4.1 and Section 11.4) is the simplest example of a Random Dot Product Graph, in which all edge probabilities are the same. We will use this model to derive concrete expressions for the equations in the theory we have presented. Suppose that $A$ is the adjacency matrix of a network with $n$ nodes in which every pair of vertices is connected independently with the same probability $p$; that is, $A \sim ER_n(p)$.

Thus, the ER graph is an RDPG in which every vertex has the same one-dimensional latent position $x_i = \sqrt{p}$, since then $x_i^\top x_j = \sqrt{p}\cdot\sqrt{p} = p$ for every pair of vertices; the latent position matrix is $X = \sqrt{p}\,\mathbf{1}_n$, with $\mathbf{1}_n$ the length-$n$ vector of all ones.

The adjacency spectral embedding of the ER graph into one dimension produces estimates $\hat x_1, \ldots, \hat x_n$ of these latent positions, and the consistency result above tells us that, with high probability,

$$\max_{i \in \{1, \ldots, n\}} \left|\hat x_i - \sqrt{p}\, w_n\right| \leq \frac{C \log^2 n}{\sqrt{n}}.$$

Here, $w_n \in \{-1, +1\}$ is the one-dimensional analogue of the orthogonal matrix $W_n$ (a sign flip), and $C$ is a constant. To check this bound empirically, we simulate 50 networks from $ER_n(0.5)$ for each of several numbers of nodes $n$:
from graspologic.simulations import er_np
import numpy as np
# set the probability ahead of time
p = 0.5
ns = np.round(10**np.linspace(1.5, 3.5, 5)).astype(int)
# run the simulations for each number of nodes
As = [[er_np(n, p) for i in range(0, 50)] for n in ns]
Next, we spectrally embed each network into a single dimension, and then check whether the signs of the estimated and true latent positions agree, flipping the estimates if they do not (in one dimension, the orthogonal non-identifiability is just a sign flip):
from graspologic.embed import AdjacencySpectralEmbed as ASE
# instantiate an ASE instance with 1 embedding dimension
ase = ASE(n_components=1)
# check for sign alignment, and flip the signs if Xhat and X disagree
def orthogonal_align(Xhat, p=0.5):
    if (Xhat * np.sqrt(p)).sum() < 0:
        Xhat = -Xhat
    return Xhat
# compute the estimated latent positions and realign in one step
Xhats_aligned = [[orthogonal_align(ase.fit_transform(A)) for A in An] for An in As]
Finally, we compute the maximum difference between the estimated latent positions (after alignment) and the true latent position $\sqrt{p}$ for each network, normalize it by the factor $\frac{\log^2 n}{\sqrt{n}}$ appearing in the consistency bound, and plot the result as a function of the number of nodes:
import pandas as pd
import seaborn as sns
network_numbers = np.vstack([np.vstack([np.full((n, 1), i) for i in range(0, 50)]) for n in ns]).flatten()
node_indices = np.vstack([np.vstack([np.arange(0, n).reshape((n, 1)) for i in range(0, 50)]) for n in ns]).flatten()
node_counts = np.vstack([np.vstack([np.full((n, 1), n) for i in range(0, 50)]) for n in ns]).flatten()
Xhats_aligned_flat = np.vstack([np.vstack([Xhat for Xhat in Xhats_n]) for Xhats_n in Xhats_aligned]).flatten()
df = pd.DataFrame({"Xhat": Xhats_aligned_flat, "i" : node_indices, "n" : node_counts,
"j" : network_numbers, "X" : np.sqrt(p)})
df["abs_diff"] = np.abs(df["Xhat"] - df["X"])
max_pernet = df.groupby(
["n", "j"]
).agg({
"abs_diff": "max"
}).reset_index()
max_pernet["norm_factor"] = np.log(max_pernet["n"])**2/np.sqrt(max_pernet["n"])
max_pernet["norm_diff"] = max_pernet["abs_diff"]/max_pernet["norm_factor"]
ax = sns.lineplot(data=max_pernet, x="n", y="norm_diff")
ax.set_xlabel("Number of Nodes")
ax.set_ylabel(r"$\frac{\sqrt{n}\,\, \max_i |\hat x_i - \sqrt{p}\, w_n|}{\log^2(n)}$");

As we can see, this value drops off pretty rapidly as we increase the number of nodes. Therefore, we could choose basically any constant attained by this curve (such as, for instance, its value at the smallest number of nodes) as the constant $C$, and the bound $\max_i |\hat x_i - \sqrt{p}\, w_n| \leq \frac{C \log^2 n}{\sqrt{n}}$ would hold for all of the larger networks, just as the consistency result predicts.
Similarly, we can also obtain an expression for the asymptotic distribution of the difference between the true and estimated positions. First, notice that since every latent position equals $\sqrt{p}$, the second-moment matrix is simply $\Delta = \mathbb{E}[x_1 x_1^\top] = p$. The exact form of the covariance term then simplifies to

$$\Sigma(\sqrt{p}) = \Delta^{-1}\left(p - p^{2}\right) p\, \Delta^{-1} = \frac{\left(p - p^{2}\right) p}{p^{2}} = 1 - p.$$

Combining these results, we get that the limiting distribution of the difference between the estimated and the true latent position of each vertex is

$$\sqrt{n}\left(\hat x_i - \sqrt{p}\, w_n\right) \rightarrow \mathcal{N}(0, 1 - p),$$

where $w_n \in \{-1, +1\}$ again accounts for the sign non-identifiability.
The asymptotic normality result is illustrated in the simulation setting below. In particular, observe that, as the size of the network $n$ grows, the histogram of $\sqrt{n}(\hat x_i - \sqrt{p})$ looks more and more like the density of a $\mathcal{N}(0, 1 - p)$ distribution, shown in red.
df_reduced = df[df["j"] == 0] # check what happens for the first network from each set
df_reduced = df_reduced.copy()
# isolate the factor sqrt(n)*(xhat_i - sqrt(p)) that we want the limiting distribution of
df_reduced["limiting_factor"] = df_reduced.apply(lambda x: np.sqrt(x.n) *(x.Xhat - x.X), axis=1)
# np.sqrt(df_reduced["n"]) * (df_reduced["Xhat"] - df_reduced["X"])
from scipy.stats import norm
g = sns.FacetGrid(df_reduced, col="n")
g.map(sns.histplot, "limiting_factor", stat="density")
truth = pd.DataFrame({"x" : np.linspace(-2, 2, 100)})
truth["y"] = norm.pdf(truth["x"], scale=np.sqrt(1-p))
axes = g.fig.axes
for ax in axes:
    sns.lineplot(data=truth, x="x", y="y", ax=ax, color="red");
g.set_axis_labels(r"$\sqrt{n}(\hat x_i - \sqrt{p})$");

As the number of nodes $n$ increases, the distribution of $\sqrt{n}(\hat x_i - \sqrt{p})$ gets closer and closer to the normal distribution predicted by the theory.
12.2.2.3. Application: two-graph hypothesis testing#
The results previously discussed demonstrate that the true and estimated latent positions are close to each other, and in fact, their distance gets smaller as the number of vertices grows. Beyond quantifying estimation error, these results also make it possible to use the estimated latent positions for formal statistical inference; we illustrate this with a two-network hypothesis testing problem.
Comparing the distribution of two populations is a frequent problem in statistics and across multiple domains. In classical statistics, a typical strategy to perform this task is to compare the mean of two populations by using an appropriate test statistic. Theoretical results on the distribution of this statistic (either exact or asymptotic) are then used to derive a measure of uncertainty for this problem (such as p-values or confidence intervals). Similarly, when comparing two observed graphs, we may wonder whether they were generated by the same mechanism. The results discussed before have been used to develop valid statistical tests for two-network hypothesis testing questions.
A network hypothesis test for the equivalence between the latent positions of the vertices of a pair of networks with aligned vertices can be constructed by using the estimates of the latent positions. Formally, let $A^{(1)}$ and $A^{(2)}$ be the adjacency matrices of two networks on the same set of $n$ vertices, generated as RDPGs with latent position matrices $X^{(1)}$ and $X^{(2)}$. We would like to test the null hypothesis that the two networks share the same latent positions (up to an orthogonal transformation),

$$H_0: X^{(1)} = X^{(2)} W \text{ for some orthogonal matrix } W,$$

against the alternative that no such orthogonal matrix exists. A natural test statistic compares the two adjacency spectral embeddings after optimally aligning them,

$$T = \frac{\displaystyle\min_{W :\, W^\top W = I} \left\|\hat X^{(1)} W - \hat X^{(2)}\right\|_F}{c},$$

where $\hat X^{(1)}$ and $\hat X^{(2)}$ are the adjacency spectral embeddings of the two networks and $\|\cdot\|_F$ denotes the Frobenius norm.

Here, $c$

is a constant that standardizes the test statistic. It can be shown that, under appropriate regularity conditions, this test statistic will go to zero as the number of vertices grows when the null hypothesis is true, but not when the two networks have genuinely different latent positions, which justifies rejecting the null hypothesis for large values of $T$. In practice, critical values and p-values for this test are typically obtained with a parametric bootstrap.
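To give a sense of how such a test is carried out in practice, here is a rough sketch of a parametric-bootstrap version of the procedure above. The helper names are our own, the statistic is left unstandardized, and this is an illustration rather than the exact procedure analyzed in the literature (graspologic also provides a ready-made latent position test in its `inference` module):

import numpy as np
from scipy.linalg import orthogonal_procrustes
from graspologic.embed import AdjacencySpectralEmbed as ASE
from graspologic.simulations import sample_edges

def procrustes_distance(A1, A2, d):
    # Frobenius distance between the two embeddings after optimal orthogonal alignment
    Xhat1 = ASE(n_components=d).fit_transform(A1)
    Xhat2 = ASE(n_components=d).fit_transform(A2)
    W, _ = orthogonal_procrustes(Xhat1, Xhat2)
    return np.linalg.norm(Xhat1 @ W - Xhat2)

def latent_position_test_sketch(A1, A2, d, n_bootstraps=200):
    # observed value of the (unstandardized) test statistic
    stat = procrustes_distance(A1, A2, d)
    # estimate the latent positions under the null using the first network,
    # then resample pairs of networks from these estimated positions
    Xhat = ASE(n_components=d).fit_transform(A1)
    P_null = np.clip(Xhat @ Xhat.T, 0, 1)
    null_stats = np.array([
        procrustes_distance(sample_edges(P_null), sample_edges(P_null), d)
        for _ in range(n_bootstraps)
    ])
    # p-value: fraction of bootstrap replicates at least as extreme as the observed statistic
    p_value = (1 + np.sum(null_stats >= stat)) / (1 + n_bootstraps)
    return stat, p_value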
12.2.3. Theory for multiple network models#
Models for multiple network data often assume that there is a known one-to-one correspondence between the vertices of the graphs. If this correspondence is unknown, an estimate can be obtained via graph matching, which you learned about in Section 7.3. Once the vertices are correctly matched, models for multiple networks exploit the shared structure across the graphs to obtain accurate estimates. In this section, we review some theoretical results in this setting. You can learn more about the graph matching problem in excellent survey papers such as [8], [9], or [10].
12.2.3.2. Joint spectral embeddings#
12.2.3.3. Omnibus Embedding (omni)#
The omnibus embedding described in Section 5.4 jointly estimates the latent positions under the joint random dot product graph (JRDPG) model, in which all of the networks share the same latent positions. For this model, the omnibus embedding has been shown to provide consistent estimates of the shared latent positions for each vertex in each network, in the same sense as the single-network ASE.
Furthermore, a central limit theorem for the rows of the omnibus embedding asserts that

$$\sqrt{n}\left(W_n \hat x_i^{(k)} - x_i\right) \rightarrow \mathcal{N}\left(0, \Sigma^{\mathrm{OMNI}}(x_i)\right)$$

for some covariance matrix $\Sigma^{\mathrm{OMNI}}(x_i)$ that depends on the latent position $x_i$ and on the number of networks, where $\hat x_i^{(k)}$ is the estimated latent position of vertex $i$ in network $k$ and $W_n$ is again a sequence of orthogonal matrices [2].
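As a quick illustration of what the omnibus embedding produces in code, here is a minimal sketch using graspologic's `OmnibusEmbed`; the shared latent positions below are an arbitrary illustrative choice matching the JRDPG assumption:

import numpy as np
from graspologic.embed import OmnibusEmbed
from graspologic.simulations import rdpg

rng = np.random.default_rng(2)
n, d, m = 150, 2, 4
X = rng.uniform(0.2, 0.8, size=(n, d)) / np.sqrt(d)  # shared latent positions (arbitrary choice)

# m networks sampled independently from the same latent positions (JRDPG)
graphs = [rdpg(X) for _ in range(m)]

# the omnibus embedding returns one estimate of the latent positions per vertex per network
omni = OmnibusEmbed(n_components=d)
Zhat = omni.fit_transform(graphs)
print(Zhat.shape)  # (m, n, d): network-specific estimates of the shared latent positions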
12.2.3.4. Multiple adjacency spectral embedding (MASE)#
The multiple adjacency spectral embedding, described in Section 5.4, estimates the parameters of the common subspace independent-edge (COSIE) model, in which the probability matrix of each network is written as $P^{(k)} = V R^{(k)} V^\top$, with $V$ a shared invariant subspace common to all networks and $R^{(k)}$ a network-specific score matrix. It has been shown that the MASE estimate $\hat V$ of the shared subspace is consistent, with an estimation error (after orthogonal alignment) that vanishes as the number of vertices and the number of networks grow [3].

In addition, the entries of the estimated score matrices $\hat R^{(k)}$ are asymptotically normally distributed around the corresponding entries of the true score matrices as the number of vertices grows, which again makes it possible to build test statistics and confidence intervals for comparing the score matrices of different networks.
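Similarly, here is a minimal sketch of the multiple adjacency spectral embedding using graspologic's `MultipleASE`; the simulated networks below share one arbitrary set of latent positions purely for illustration:

import numpy as np
from graspologic.embed import MultipleASE
from graspologic.simulations import rdpg

rng = np.random.default_rng(3)
n, d, m = 150, 2, 4
V = rng.uniform(0.2, 0.8, size=(n, d)) / np.sqrt(d)  # arbitrary shared latent positions
graphs = [rdpg(V) for _ in range(m)]                  # m networks sharing the same positions

# estimate the shared invariant subspace and the per-network score matrices
mase = MultipleASE(n_components=d)
Vhat = mase.fit_transform(graphs)   # shape (n, d): estimate of the shared subspace
Rhats = mase.scores_                # shape (m, d, d): one estimated score matrix per network
print(Vhat.shape, Rhats.shape)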
12.2.4. References#
- 1
Avanti Athreya, Donniell E. Fishkind, Minh Tang, Carey E. Priebe, Youngser Park, Joshua T. Vogelstein, Keith Levin, Vince Lyzinski, and Yichen Qin. Statistical inference on random dot product graphs: a survey. J. Mach. Learn. Res., 18(1):8393–8484, January 2017. doi:10.5555/3122009.3242083.
- 2
Keith Levin, A. Athreya, M. Tang, V. Lyzinski, and C. Priebe. A Central Limit Theorem for an Omnibus Embedding of Multiple Random Dot Product Graphs. 2017 IEEE International Conference on Data Mining Workshops (ICDMW), 2017. URL: https://www.semanticscholar.org/paper/A-Central-Limit-Theorem-for-an-Omnibus-Embedding-of-Levin-Athreya/fdc658ee1a25c511d7da405a9df7b30b613e8dc8.
- 3
Jesús Arroyo, Avanti Athreya, Joshua Cape, Guodong Chen, Carey E. Priebe, and Joshua T. Vogelstein. Inference for Multiple Heterogeneous Networks with a Common Invariant Subspace. Journal of Machine Learning Research, 22(142):1–49, 2021. URL: https://jmlr.org/papers/v22/19-558.html.
- 4
Vince Lyzinski, Donniell E. Fishkind, and Carey E. Priebe. Seeded graph matching for correlated Erdös-Rényi graphs. J. Mach. Learn. Res., 15(1):3513–3540, January 2014. doi:10.5555/2627435.2750357.
- 5
Minh Tang, Avanti Athreya, Daniel L. Sussman, Vince Lyzinski, Youngser Park, and Carey E. Priebe. A Semiparametric Two-Sample Hypothesis Testing Problem for Random Graphs. J. Comput. Graph. Stat., 26(2):344–354, April 2017. doi:10.1080/10618600.2016.1193505.
- 6
Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, Cambridge, England, UK, September 2018. ISBN 978-1-10823159-6. doi:10.1017/9781108231596.
- 7
Daniel L. Sussman, Minh Tang, Donniell E. Fishkind, and Carey E. Priebe. A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs. J. Am. Stat. Assoc., 107(499):1119–1128, September 2012. doi:10.1080/01621459.2012.699795.
- 8
D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty Years of Graph Matching in Pattern Recognition. Int. J. Pattern Recognit. Artif. Intell., 18(03):265–298, May 2004. doi:10.1142/S0218001404003228.
- 9
Pasquale Foggia, Gennaro Percannella, and Mario Vento. Graph Matching and Learning in Pattern Recognition in the Last 10 Years. Int. J. Pattern Recognit. Artif. Intell., 28(01):1450001, October 2013. doi:10.1142/S0218001414500013.
- 10
Junchi Yan, Xu-Cheng Yin, Weiyao Lin, Cheng Deng, Hongyuan Zha, and Xiaokang Yang. A Short Survey of Recent Advances in Graph Matching. In ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pages 167–174. Association for Computing Machinery, New York, NY, USA, June 2016. doi:10.1145/2911996.2912035.