8.1. Anomaly Detection For Timeseries of Networks
There is a particular type of sea slug which has gills on the outside of its body. When you squirt water at these gills, they withdraw into the slug. The interesting thing about this type of slug is that the brain network involved in this gill withdrawal reflex is entirely mapped out, from the neurons which detect and transmit information about the water into the slug’s brain, to the neurons that leave the brain and fire at its muscles. (For those interested, this is a real thing - look up Eric Kandel’s research on Aplysia!)
Say you’re a researcher studying these sea slugs, and you have a bunch of brain networks of the same slug. We can define each node as a single neuron, and edges denote connections between neurons. Each of the brain networks that you have were taken at different time points: some before water started getting squirted at the slug’s gills, and some as the water was getting squirted. Your goal is to reconstruct when water started to get squirted, using only the networks themselves. You hypothesize that there should be some signal change in your networks which can tell you the particular time at which water started getting squirted. Given the network data you have, how do you figure out which timepoints these are?
The broader class of problems this question addresses is called anomaly detection. The idea, in general, is that you have a bunch of snapshots of the same network over time. Although the nodes are the same, the edges are changing at each time point. Your goal is to figure out which time points correspond to the most change, either in the entire network or in particular groups of nodes. You can think of a network as “anomalous” with respect to time if some potentially small group of nodes within the network concurrently changes behavior at some point in time compared to the recent past, while the remaining nodes continue with whatever noisy, normal behavior they had.
In particular, what we would really like to do is separate the signal from the noise. All of the nodes in the network are likely changing a bit over time, since there is some variability intrinsic in the system. Random noise might just dictate that some edges get randomly deleted and some get randomly created at each step. We want to figure out if there are timepoints where the change isn’t just random noise: we’re trying to figure out a point in time where the probability distribution that the network itself is generated from changes.
Let’s simulate some network timeseries data so that we can explore anomaly detection more thoroughly.
8.1.1. Simulating Network Timeseries Data
For this example, we're going to assume that our sea slug brain is measured at $T = 12$ timepoints, and that each network has $n = 100$ nodes, one for each neuron. Ten of the timepoints will be non-anomalous, and two of them (the sixth and seventh) will be perturbed.
To make the ten non-anomalous timepoints, we’ll:
1. Generate $n = 100$ latent positions. Each latent position $x_i$ will be a (uniformly) random number between 0.2 and 0.8, for $i$ from $1$ to $100$.
2. Use graspologic's rdpg function to sample an adjacency matrix using the one-dimensional latent position matrix $X$, whose rows are the latent positions $x_i$. Do this ten times, to generate the networks with adjacency matrices $A^{(t)}$, where $t$ is between $1$ and $12$, excluding $6$ through $7$.
And to make the two perturbed time points, we’ll do the following twice:
1. The first $20$ latent positions will be uniform random numbers between $0.6$ and $1.0$. These latent positions will be denoted $y_1$ through $y_{20}$, respectively. The remaining $80$ latent positions will be the same as they were for the non-anomalous timepoints, $x_{21}$ through $x_{100}$.
2. Use graspologic's rdpg function to sample an adjacency matrix using the one-dimensional latent position matrix $Y$, whose first twenty rows are the latent positions $y_i$ and whose remaining rows are the latent positions $x_i$. This generates the anomalous networks with adjacency matrices $A^{(6)}$ and $A^{(7)}$, which have different latent positions from the non-anomalous networks.
Once we have this simulated data, we’ll move into some discussion about how we’ll approach detecting the anomalous time points.
Below is code for generating the data. We define a function to generate a particular time-point, with an argument which toggles whether we’ll perturb latent positions for that time point. Then, we just loop through our time-points to sample an adjacency matrix for each one.
import numpy as np
from graspologic.simulations import rdpg

T = 12  # number of timepoints
n = 100  # number of nodes

# non-anomalous latent positions
X = np.random.uniform(.2, .8, size=(n, 1))

# array of latent positions for each time point
Xts = []
for time in range(T):
    if time > 4 and time < 7:
        # the sixth and seventh timepoints get perturbed latent positions
        # for their first 20 nodes
        Xt = np.copy(X)
        Xt[0:20, :] = np.random.uniform(.6, 1.0, size=(20, 1))
        Xts.append(Xt)
    else:
        Xts.append(X)

# sample an adjacency matrix from each latent position matrix
networks = [rdpg(Xt) for Xt in Xts]
networks = np.array(networks)
You can see the adjacency matrices we generated below. Note that you can't really distinguish the ten non-anomalous time points from the two perturbed time points with the naked eye, even though the difference is there, so it would be pretty difficult to mark the time points manually. And if you have many time points, rather than just a few, you'd want to be able to automate the process.
import matplotlib.pyplot as plt
from graphbook_code import heatmap, cmaps
import seaborn as sns

def rm_ticks(ax, x=False, y=False, **kwargs):
    if x is not None:
        ax.axes.xaxis.set_visible(x)
    if y is not None:
        ax.axes.yaxis.set_visible(y)
    sns.despine(ax=ax, **kwargs)

fig = plt.figure();

# adjacency matrices
perturbed_points = {5, 6}
for i in range(T):
    if i not in perturbed_points:
        ax = fig.add_axes([.02*i, -.02*i, .8, .8])
    else:
        ax = fig.add_axes([.02*i + .8, -.02*i, .8, .8])
    ax = heatmap(networks[i], ax=ax, cbar=False)
    if i == 0:
        ax.set_title("Ten Normal Time Points", loc="left", fontsize=16)
    if i == 5:
        ax.set_title("Two Perturbed Time Points", loc="left", fontsize=16)
    rm_ticks(ax, top=False, right=False)

plt.figtext(1, -.3, "Figure 8.1")
fig.suptitle("Network Timeseries Data", fontsize=24, x=1);

8.1.2. Approaches for Anomaly Detection
It’s time to start thinking about how we’d approach figuring out which of the time points are anomalies.
One of the simplest approaches to this problem might just be to figure out which node has the highest count of edge changes across your timeseries. For each node across the timeseries, you’d count the number of new edges that appeared (compared to the previous point in time), and the number of existing edges that were deleted. Whichever count is highest could be your anomalous node.
This might give you a rough estimate – and you could even potentially find perturbed time points with this approach – but it’s not necessarily the best solution. Counting edges doesn’t account for other important pieces of information: for instance, you might be interested in which other nodes new edges were formed with. It seems like deleting or creating edges with more important nodes, for instance, should be weighted higher than deleting or creating edges with unimportant nodes.
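To make the counting idea concrete, here is a minimal sketch of that heuristic, assuming the networks array of adjacency matrices built above; the helper name and the per-node tallying scheme are just illustrative choices, not the method used in the rest of this section.

import numpy as np

def count_edge_changes(networks):
    """For each consecutive pair of timepoints, count how many edges incident
    to each node were created or deleted relative to the previous timepoint."""
    diffs = np.abs(np.diff(networks, axis=0))  # (T-1, n, n) change indicators
    return diffs.sum(axis=2)                   # (T-1, n) per-node change counts

changes = count_edge_changes(networks)
# the consecutive pair of timepoints, and the node, with the most edge changes
pair_idx, node_idx = np.unravel_index(changes.argmax(), changes.shape)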
So let’s try another method. You might actually be able to guess it! The idea will be to simply estimate each network’s latent positions, followed by a hypothesis testing approach. Here’s the idea.
Let's call the latent positions for our network at timepoint $t$ $X^{(t)}$, and the latent positions at the previous timepoint $X^{(t-1)}$. Our test statistic will be the norm of their difference, $s^{(t)} = \|X^{(t)} - X^{(t-1)}\|_2$.
This quantity will be relatively big if the latent positions changed a lot between timepoints $t-1$ and $t$, and relatively small if they changed only a little.
In other words, we're trying to find the time points where the norm of the difference between the latent positions at time $t$ and the latent positions at time $t-1$ is large.
There's an alternate problem where you restrict your view to subsets of nodes rather than entire adjacency matrices. The idea is that you'd find time points which are anomalous for particular nodes or groups of nodes, rather than for the entire network. The general idea is the same: you find latent positions, then test how big the difference is between timepoint $t$ and timepoint $t-1$ for just the nodes you care about.
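As a rough illustration of this node-restricted variant, one could compute the same kind of norm using only the rows of the (jointly aligned) latent position estimates that correspond to the nodes of interest. This is only a sketch: it anticipates the omnibus embedding used later in this section, and node_subset is a hypothetical index set.

import numpy as np
from graspologic.embed import OmnibusEmbed

# hypothetical subset of nodes we want to monitor (e.g., the first 20 neurons)
node_subset = np.arange(20)

# jointly embed all timepoints so the estimates share one latent space
latents_all = OmnibusEmbed(n_components=1).fit_transform(networks)

# norm of the difference restricted to the rows for the nodes of interest,
# for each consecutive pair of timepoints
subset_stats = [
    np.linalg.norm(latents_all[t][node_subset] - latents_all[t - 1][node_subset], ord=2)
    for t in range(1, len(latents_all))
]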
8.1.2.1. Challenges of anomaly detection
In practice, however, this is a bit more challenging than just looking for consecutive time points whose differences exceed zero. Even in a perfect environment, such as under simulation, we run across two variations of problems that we have seen before. To perform timeseries anomaly detection, what you are first going to assume is that each network $A^{(t)}$ is a random dot product graph (RDPG) with latent position matrix $X^{(t)}$.
For each timepoint, we have the following question of interest: are the latent positions equal; that is, is $X^{(t)} = X^{(t-1)}$, or is $X^{(t)} \neq X^{(t-1)}$?
8.1.2.1.1. The nonidentifiability problem
When you observe the network $A^{(t)}$, its latent position matrix is not uniquely determined: for any orthogonal matrix $W$, the rotated matrix $X^{(t)} W$ produces exactly the same edge probability matrix, since $X^{(t)} W \left(X^{(t)} W\right)^\top = X^{(t)} X^{(t)\top}$. This means that estimates of the latent positions at two timepoints can differ by a rotation even when the underlying latent positions are identical, so we need to estimate them in a way that orients them in the same latent space before comparing them.
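To see this concretely, here is a small illustrative check (not part of the analysis itself, and using a two-dimensional example so the rotation is nontrivial) showing that a latent position matrix and a rotated copy of it produce exactly the same edge probability matrix.

import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.uniform(0.2, 0.8, size=(100, 2))  # a two-dimensional example

theta = np.pi / 3
W = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # an orthogonal rotation matrix

# X_demo and X_demo @ W produce exactly the same probability matrix P = XX^T,
# so the latent positions are only identified up to such a rotation
print(np.allclose(X_demo @ X_demo.T, (X_demo @ W) @ (X_demo @ W).T))  # True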
8.1.2.1.2. We only get to obtain estimates of latent positions
Now that you have latent position estimates for each network, remember that they are only estimates, $\hat X^{(t)}$, not the true latent positions $X^{(t)}$. Even if the true latent positions at two timepoints are identical, the estimates will differ a bit simply because each network is a random sample. So we can't just check whether the difference between the estimates is exactly zero; we need to decide whether the observed difference is bigger than what estimation noise alone would produce.
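As a small illustration of this estimation noise (a sketch using the simulation's latent position matrix X from above, not part of the pipeline below), two networks sampled from exactly the same latent positions still produce estimates whose difference has a nonzero norm.

import numpy as np
from graspologic.simulations import rdpg
from graspologic.embed import OmnibusEmbed

# two independent networks sampled from the *same* latent position matrix X
A1, A2 = rdpg(X), rdpg(X)
est = OmnibusEmbed(n_components=1).fit_transform([A1, A2])

# nonzero even though the true latent positions are identical
print(np.linalg.norm(est[0] - est[1], ord=2))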
8.1.3. Detecting if the First Time Point is an Anomaly
We’ll start with the first time point, which (because we generated the data!) we know in advance is not an anomaly.
So, here’s what’s going on in the code below:
1. We embed the networks $A^{(1)}, \ldots, A^{(12)}$ using the OMNI embedding from Section 5.4.5, to produce estimates of the latent position matrices $\hat X^{(1)}, \ldots, \hat X^{(12)}$ for all timepoints, oriented in the same latent space.
2. We compute the norm of the difference between each pair of consecutive estimates, letting $s^{(t)} = \|\hat X^{(t)} - \hat X^{(t-1)}\|_2$, with numpy, for the timepoint $t$. We repeat this for all timepoints except for the first, so $t$ goes from $2$ to $12$. With $T = 12$ timepoints in our example, this means that we have $11$ total test statistics.
An important point to clarify is that there are many different matrix norms: the Frobenius norm, the spectral norm, and so on. In our case, we'll be using the operator norm. The ord parameter of numpy's norm function determines which norm we use, and ord=2 gives the operator norm, which is the largest singular value of the matrix.
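As a quick illustrative check (not part of the analysis itself), we can verify with numpy that ord=2 returns the largest singular value, and that it generally differs from the Frobenius norm.

import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 3))

# operator (spectral) norm: the largest singular value of M
print(np.isclose(np.linalg.norm(M, ord=2),
                 np.linalg.svd(M, compute_uv=False)[0]))  # True
# Frobenius norm: generally a different value
print(np.linalg.norm(M, ord="fro"))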
Again, this norm, intuitively, will tell us how different two matrices are. If the norm of the difference between the true latent position matrices $X^{(t)}$ and $X^{(t-1)}$ is zero, the latent positions are identical; the larger the norm of the difference between the estimates $\hat X^{(t)}$ and $\hat X^{(t-1)}$, the more evidence we have that the latent positions changed between the two timepoints.
from graspologic.embed import OmnibusEmbed as OMNI

def compute_statistic(X, Y):
    """
    A function which computes the test statistic, the 2-norm.
    """
    return np.linalg.norm(X - Y, ord=2)

d = 1
# embed the networks with OMNI embedding
latents = OMNI(n_components=d).fit_transform(networks)

stat_observed = compute_statistic(latents[0], latents[1])
print(f'{stat_observed:.3f}')
0.622
from matplotlib import pyplot as plt
import seaborn as sns
fig, ax = plt.subplots(1, 1, figsize=(10, 4))
ax.set_title("")
ax.set_ylabel("")
ax.axvline(stat_observed, color="red")
ax.text(1.05*stat_observed , .5, "$s^{(t)} = ||\\hat X^{(2)} - \\hat X^{(1)}||_2$", color="red");
ax.set_xlim([0, 1.4*stat_observed])
ax.set_xlabel("Operator norm of difference");

8.1.4. Hypothesis Testing With our Test Statistic
We observe the test statistic shown with the red line above. If the first two adjacency matrices have the same latent positions, we would hope that their latent position estimates (after adjusting for rotations) are reasonably similar and the statistic would be relatively small. On the other hand, if their latent positions are different, we would hope that their latent position estimates (even after adjusting for rotations) would be reasonably different and the statistic would be relatively large. However, the word relative is the key word here: relative to what? How should we determine whether the statistic is small enough to say that the latent positions are the same?
Well, what if we could use our estimated latent positions to work out what values the test statistic tends to take when the latent positions really are identical? Then we could compare the statistic we actually observed against that collection of values.
Back in two-sample testing (Section 7.1), we came across this same fundamental problem when attempting to determine whether two latent position matrices were identical, and here we're going to resort to virtually the same strategy: the parametric bootstrap.
8.1.4.1. Generating the null distribution of each pair of timepoints using the parametric bootstrap
So: recall, our original question was to determine whether $X^{(t)} = X^{(t-1)}$; that is, whether the latent positions at timepoint $t$ are the same as the latent positions at timepoint $t-1$.
Our approach is going to be as follows. Given a pair of (already-aligned) estimated latent position matrices $\hat X^{(t-1)}$ and $\hat X^{(t)}$, we will repeatedly:
1. Generate two pairs of adjacency matrices: one pair by sampling twice, independently, from an RDPG with latent position matrix $\hat X^{(t-1)}$, and another pair by sampling twice, independently, from an RDPG with latent position matrix $\hat X^{(t)}$. Notice that the two adjacency matrices within each pair have an identical underlying latent position matrix, which is the estimate of the latent position matrix we previously produced with OMNI.
2. For each pair of adjacency matrices (one pair for timepoint $t-1$ and one for timepoint $t$), embed the two adjacency matrices into $d$ dimensions using the OMNI embedding, to produce two new estimates of the latent positions.
3. Compute the corresponding statistic for each pair, $s^{(t)}_r$, the operator norm of the difference between the two re-estimated latent position matrices, again using numpy.
So, what’s going on here exactly?
For the timepoint of interest (and likewise for the previous timepoint), we take the estimated latent position matrix and use it to sample two networks which, by construction, have exactly the same underlying latent positions. Re-embedding those two networks and computing the statistic between the resulting estimates tells us how large the statistic tends to be when the latent positions truly are identical. The function below produces one such replicate.
def generate_replicate(X, d=d):
    """
    Generates two estimated latent position matrices which have the same
    underlying latent position matrix, X.
    """
    A2 = rdpg(X)
    A1 = rdpg(X)
    latents = OMNI(n_components=d).fit_transform([A2, A1])
    return latents[0], latents[1]
Since these are estimates of latent position matrices obtained from samples of random networks (which are RDPGs), they won't be identical, and that is exactly what makes them useful: they give us an idea of just how much two estimated latent position matrices will differ, in terms of the statistic $s^{(t)}_r$, when the underlying latent position matrix is the same.
Finally, we aggregate all of the statistics we computed, giving us a collection of $2R$ null test statistics: with $R = 1000$ replicates for each of the two estimated latent position matrices, that's $2000$ statistics in total.
def compute_null_statistics(X1, X2, R=1000, d=d):
    """
    Compute the test statistics under the null for two latent position
    matrices, X1 and X2.
    """
    s = []
    for r in range(R):
        for X in [X1, X2]:
            Xrep2, Xrep1 = generate_replicate(X, d=d)
            s.append(compute_statistic(Xrep2, Xrep1))
    return np.array(s)
null_stats = compute_null_statistics(latents[1], latents[0])
Finally, we'll do exactly what we did in two-sample testing (Section 7.1): we'll compare the statistic we obtained directly from our data, $s^{(t)}$, to the distribution of the bootstrapped null statistics, and ask whether the observed statistic looks unusually large relative to them.
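As a quick sanity check before formalizing this as a p-value in the next subsection, we can look at the fraction of bootstrapped statistics that are at least as large as the observed one; this one-liner is purely illustrative and reuses the null_stats and stat_observed variables from above.

# fraction of null statistics at least as large as the observed statistic;
# a value far from 0 means the observed statistic is unremarkable under the null
print((null_stats >= stat_observed).mean())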
fig, ax = plt.subplots(figsize=(8, 6))
plot = sns.histplot(null_stats, ax=ax)
plot.set_title("Distribution of test statistics with the same latent positions");
plot.set_xlabel("Test statistic value");
plot.vlines(stat_observed, 0, 200, colors='r')
ax.text(1.02*stat_observed , 180, "$s^{(t)} = ||\\hat X^{(2)} - \\hat X^{(1)}||_2$", color="red");
plot.text(.50, 70, "Bootstrapped Distribution", color="blue", rotation=65);

Fortunately, the observed statistic falls squarely inside the bulk of the bootstrapped distribution, which is exactly what we'd hope for here, since the first two timepoints really do share the same latent positions.
8.1.4.2. Estimating the p-value using the bootstrapped samples
We'll again use this observation to determine a p-value: the fraction of bootstrapped test statistics that are at least as large as the statistic we actually observed.
Just like we did for two-sample testing in Section 7.1, we add a one to both the numerator and the denominator when computing this fraction, so that the estimated p-value can never be exactly zero.
def distn_test_anomaly(X1, X2, d=d, R=1000):
    """
    Performs an anomaly distribution test.
    """
    tstat = compute_statistic(X1, X2)
    null_stats = compute_null_statistics(X1, X2, d=d, R=R)
    pval = ((null_stats >= tstat).sum() + 1)/(len(null_stats) + 1)
    return tstat, pval

tstat, pval = distn_test_anomaly(latents[1], latents[0])
print("Test statistic: {:3f}, p-value: {:3f}".format(tstat, pval))
Test statistic: 0.621803, p-value: 0.287856
When we use a decision threshold of $\alpha = 0.05$, the p-value exceeds $\alpha$, so we fail to reject the null hypothesis: we do not have evidence that the latent positions of the first two timepoints differ. This is what we'd expect, since neither of the first two timepoints is an anomaly.
What happens at the true anomalous timepoints, timepoints six and seven? Let's repeat the test for every remaining pair of consecutive timepoints.
pvals = {}
for t in range(2, T):
    # compare each timepoint's estimated latent positions to the previous timepoint's
    _, pval = distn_test_anomaly(latents[t], latents[t - 1])
    pvals[f"{t}:{t+1}"] = pval
Again, since we performed multiple comparisons, we adjust the p-values for multiple testing using the Holm-Bonferroni correction.
from statsmodels.stats.multitest import multipletests
alpha = 0.05
_, adj_pvals, _, _ = multipletests([float(p) for p in pvals.values()], alpha=alpha, method="holm")
from graphbook_code import cmaps
data = np.fromiter(adj_pvals, dtype=float)
fig, ax = plt.subplots(figsize=(10, 1))
plot = sns.heatmap(data[:, None].T, cmap=cmaps["sequential"],
cbar=False, annot=True, vmin=0, vmax=1);
plot.set_title("p-values for each pair of timepoints after Bonferroni-Holm");
plot.axes.yaxis.set_ticklabels([]);
plot.axes.xaxis.set_ticklabels(list(pvals.keys()));
plot.set_xlabel("Timepoint pairs");

Our results are consistent with what we would expect based on how the simulation was generated.
The first few pairs of timepoints, 2:3 through 4:5, compare unperturbed timepoints whose latent positions are identical, so their adjusted p-values are large and we do not flag them as anomalies.
However, the pairs 5:6 and 7:8, where an unperturbed timepoint is compared with a perturbed one, have small adjusted p-values: these are the transitions into and out of the anomaly, and we correctly detect them.
Finally, when we get back to the pair 6:7, both timepoints are perturbed in exactly the same way, so their latent positions match again and the adjusted p-value is large, just as it is for the remaining unperturbed pairs 8:9 through 11:12.
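If we want the flagged pairs programmatically rather than reading them off the heatmap, we can keep the reject indicator that multipletests returns (we discarded it above in favor of just the adjusted p-values). A minimal sketch, reusing the pvals dictionary and alpha from before:

# re-run the Holm-Bonferroni correction, this time keeping the reject decisions
reject, adj_pvals, _, _ = multipletests(
    [float(p) for p in pvals.values()], alpha=alpha, method="holm"
)

# the consecutive timepoint pairs declared anomalous at level alpha
flagged_pairs = [pair for pair, r in zip(pvals.keys(), reject) if r]
print(flagged_pairs)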
8.1.5. The Distribution of the Bootstrapped Test Statistic
One issue that could pop up is that the bootstrapped test statistic is slightly biased. Since we're generating it from an estimate of the latent position matrix rather than from the true latent position matrix, the distribution of the bootstrapped statistics won't match the true distribution of the test statistic exactly.
Below you can see the true distribution of the test statistic for the real, unperturbed set of latent positions $X$, alongside the distribution of the bootstrapped test statistic generated from an estimate of those latent positions. The two distributions are similar, but not identical, which is the price we pay for bootstrapping from an estimate.
from graspologic.simulations import rdpg
import numpy as np
from graspologic.embed import OmnibusEmbed as OMNI

X = np.random.uniform(.2, .8, size=(100, 1))
networks = []
networks.append(rdpg(X))
networks.append(rdpg(X))

def get_statistic(adjacencies, return_latents=False):
    """
    Get the operator norm of the difference of two matrices.
    """
    omni = OMNI(n_components=2)
    latents_est = omni.fit_transform(adjacencies)
    X, Y = latents_est[0], latents_est[1]
    y = np.linalg.norm(X - Y, ord=2)
    if return_latents:
        return y, X
    else:
        return y

omni = OMNI(n_components=1)
latents = omni.fit_transform(networks)

# distribution of the test statistic when bootstrapping from an *estimated*
# latent position matrix
ys_bootstrap = []
for i in range(1000):
    A_, B_ = rdpg(latents[0]), rdpg(latents[0])
    y_ = get_statistic([A_, B_])
    ys_bootstrap.append(y_)

# distribution of the test statistic when sampling from the *true* latent positions
ys_true = []
for i in range(1000):
    A_, B_ = rdpg(X), rdpg(X)
    y_ = get_statistic([A_, B_])
    ys_true.append(y_)

import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(ys_true, label="true distribution of $s^{(t)}_r$", color="blue")
sns.histplot(ys_bootstrap, label="distribution of bootstrapped $s^{(t)}_r$ values", color="red")
plt.gca().legend()
plt.figtext(.5, 0, "Figure 8.5")
plt.gca().set_title("Distribution Comparison For the Bootstrapped and True Test Statistic");

8.1.6. References
This strategy was first described in [1]. Check it out for more details.
[1] Guodong Chen, Jesús Arroyo, Avanti Athreya, Joshua Cape, Joshua T. Vogelstein, Youngser Park, Chris White, Jonathan Larson, Weiwei Yang, and Carey E. Priebe. Multiple Network Embedding for Anomaly Detection in Time Series of Graphs. arXiv, August 2020. arXiv:2008.10055, doi:10.48550/arXiv.2008.10055.