11.5. Stochastic Block Models#

11.5.1. A Priori Stochastic Block Model#

The a priori SBM is an SBM in which we know ahead of time (a priori) which nodes are in which communities. Here, we will use the variable $K$ to denote the maximum number of different communities. The ordering of the communities does not matter; the community we call 1 versus 2 versus $K$ is largely a symbolic distinction (the only thing that matters is that they are different). The a priori SBM has the following parameter:

| Parameter | Space | Description |
| --- | --- | --- |
| $B$ | $[0,1]^{K \times K}$ | The block matrix, which assigns edge probabilities for pairs of communities |

To describe the a priori SBM, we will designate the community each node is a part of using a vector, which has a single community assignment for each node in the network. We will call this node assignment vector $\vec\tau$, and it is an $n$-length vector (one element for each node) with elements which can take values from 1 to $K$. In symbols, we would say that $\vec\tau \in \{1, ..., K\}^n$. What this means is that a given element of $\vec\tau$, $\tau_i$, is the community assignment (either 1, 2, so on and so forth up to $K$) for the $i^{th}$ node. If we had an example where there were 2 communities ($K = 2$), for instance, and the first two nodes are in community 1 and the second two in community 2, then $\vec\tau$ would be a vector which looks like:

$$\vec\tau = \begin{bmatrix}1 \\ 1 \\ 2 \\ 2\end{bmatrix}$$

Next, let’s discuss the matrix $B$, which is known as the block matrix of the SBM. We write down that $B \in [0,1]^{K \times K}$, which means that the block matrix is a matrix with $K$ rows and $K$ columns. If we have a pair of nodes and know which of the $K$ communities each node is from, the block matrix tells us the probability that those two nodes are connected. If our networks are simple, the matrix $B$ is also symmetric, which means that if $b_{kk'} = p$ where $p$ is a probability, then $b_{k'k} = p$, too. The requirement of $B$ to be symmetric exists only if we are dealing with undirected networks.

Finally, let’s think about how to write down the generative model for the a priori SBM. Intuitively, what we want to reflect is that if we know node $i$ is in community $k$ and node $j$ is in community $k'$, then the $(k, k')$ entry of the block matrix is the probability that $i$ and $j$ are connected. We say that given $\tau_i = k$ and $\tau_j = k'$, $\mathbf a_{ij}$ is sampled independently from a $Bern(b_{kk'})$ distribution for all $j > i$. Note that the adjacencies $\mathbf a_{ij}$ are not necessarily identically distributed, because the probability depends on the communities of the two nodes incident to edge $(i, j)$. If $\mathbf A$ is an a priori SBM network with parameter $B$, and $\vec\tau$ is a realization of the node-assignment vector, we write that $\mathbf A \sim SBM_{n, \vec\tau}(B)$.
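To make this concrete, here is a minimal sketch of how one might sample a simple (undirected, loopless) a priori SBM with plain numpy. The helper name `sample_a_priori_sbm` is our own, the block matrix values are made-up, and we use zero-indexed community labels (0 and 1 in place of communities 1 and 2) to match Python’s conventions:

```python
import numpy as np

def sample_a_priori_sbm(tau, B, rng=None):
    """Sample a simple (undirected, loopless) a priori SBM.

    tau: length-n array of community labels in {0, ..., K-1}
    B:   K x K symmetric block matrix of edge probabilities
    """
    rng = np.random.default_rng(rng)
    n = len(tau)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # edge (i, j) is Bernoulli with probability b_{tau_i, tau_j}
            A[i, j] = rng.binomial(1, B[tau[i], tau[j]])
            A[j, i] = A[i, j]  # symmetric, since the network is undirected
    return A

# two communities: first two nodes in community 1, second two in community 2
tau = np.array([0, 0, 1, 1])
B = np.array([[0.7, 0.1],
              [0.1, 0.5]])
A = sample_a_priori_sbm(tau, B, rng=0)
```

Each pair $(i, j)$ with $j > i$ gets an independent coin flip whose probability is looked up from the block matrix, which mirrors the generative description above.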

11.5.1.1. Probability#

What does the probability for the a priori SBM look like? In our previous description, we admittedly simplified things to an extent to keep the wording down. In truth, we model the a priori SBM using a latent variable model, which means that the node assignment vector, $\vec{\pmb\tau}$, is treated as random. For the case of the a priori SBM, it just so happens that we know the specific value that this latent variable $\vec{\pmb\tau}$ takes, $\vec\tau$, ahead of time.

Fortunately, since $\vec\tau$ is a parameter of the a priori SBM, the probability is a bit simpler than for the a posteriori SBM. This is because the a posteriori SBM requires an integration over potential realizations of $\vec{\pmb\tau}$, whereas the a priori SBM does not, since we already know that $\vec{\pmb\tau}$ was realized as $\vec\tau$.

Putting these steps together gives us that:

$$
\begin{align*}
Pr_\theta(A) &= Pr_\theta(\mathbf A = A \,|\, \vec{\pmb\tau} = \vec\tau) \\
&= \prod_{j > i} Pr_\theta(\mathbf a_{ij} = a_{ij} \,|\, \vec{\pmb\tau} = \vec\tau), && \text{Independence Assumption}
\end{align*}
$$

Next, for the a priori SBM, we know that each edge $\mathbf a_{ij}$ only actually depends on the community assignments of nodes $i$ and $j$, so we know that $Pr_\theta(\mathbf a_{ij} = a_{ij} \,|\, \vec{\pmb\tau} = \vec\tau) = Pr(\mathbf a_{ij} = a_{ij} \,|\, \pmb\tau_i = k, \pmb\tau_j = k')$, where $k$ and $k'$ are any of the $K$ possible communities. This is because the community assignments of nodes that are not nodes $i$ and $j$ do not matter for edge $(i, j)$, due to the independence assumption.

Next, let’s think about the probability matrix $P = (p_{ij})$ for the a priori SBM. We know that, given $\tau_i = k$ and $\tau_j = k'$, each adjacency $\mathbf a_{ij}$ is sampled independently from a $Bern(b_{kk'})$ distribution. This means that $p_{ij} = b_{kk'}$. Completing our analysis from above:

$$
\begin{align*}
Pr_\theta(A) &= \prod_{j > i} b_{\tau_i \tau_j}^{a_{ij}} (1 - b_{\tau_i \tau_j})^{1 - a_{ij}} \\
&= \prod_{k, k' \in [K]} b_{kk'}^{m_{kk'}} (1 - b_{kk'})^{n_{kk'} - m_{kk'}}
\end{align*}
$$

where $n_{kk'}$ denotes the total number of edges possible between nodes assigned to community $k$ and nodes assigned to community $k'$. That is, $n_{kk'} = \sum_{j > i} 1_{\tau_i = k} 1_{\tau_j = k'}$. Further, we will use $m_{kk'}$ to denote the total number of edges observed between these two communities. That is, $m_{kk'} = \sum_{j > i} 1_{\tau_i = k} 1_{\tau_j = k'} a_{ij}$. Note that for a single $(k, k')$ community pair, the probability is analogous to the probability of a realization of an ER random variable.
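If it helps to see these quantities in code, here is a short sketch (our own hypothetical helpers, using the zero-indexed labels from the sampler above) that tallies $n_{kk'}$ and $m_{kk'}$ from an adjacency matrix and evaluates the log of the probability above; it assumes every entry of $B$ is strictly between 0 and 1 so the logarithms are finite:

```python
import numpy as np

def block_edge_counts(A, tau, K):
    """Tally n_{kk'} (possible node pairs) and m_{kk'} (observed edges)
    for each community pair, counting each pair j > i once."""
    n = len(tau)
    n_kk = np.zeros((K, K))
    m_kk = np.zeros((K, K))
    for i in range(n):
        for j in range(i + 1, n):
            n_kk[tau[i], tau[j]] += 1
            m_kk[tau[i], tau[j]] += A[i, j]
    return n_kk, m_kk

def sbm_log_probability(A, tau, B):
    """log Pr_theta(A) = sum over (k, k') of
    m_{kk'} log(b_{kk'}) + (n_{kk'} - m_{kk'}) log(1 - b_{kk'})."""
    n_kk, m_kk = block_edge_counts(A, tau, B.shape[0])
    return np.sum(m_kk * np.log(B) + (n_kk - m_kk) * np.log(1 - B))
```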

Like the ER model, there are again equivalence classes of the sample space $\mathcal A_n$ in terms of their probability. For a two-community setting, with $\vec\tau$ and $B$ given, the equivalence classes are the sets:

$$E_{a,b,c}(\vec\tau, B) = \left\{A \in \mathcal A_n : m_{11} = a, \; m_{21} = m_{12} = b, \; m_{22} = c\right\}$$

The number of equivalence classes possible scales with the number of communities, and the manner in which nodes are assigned to communities (particularly, the number of nodes in each community).

11.5.2. A Posteriori Stochastic Block Model#

In the a posteriori Stochastic Block Model (SBM), we consider node assignment to one of $K$ communities to be a random variable that we don’t already know, unlike in the a priori SBM. We’re going to see a funky word come up that you’re probably not familiar with: the $K$ probability simplex. What the heck is a probability simplex?

The intuition for a simplex is probably something you’re very familiar with, but just haven’t seen a word for. Let’s say I have a vector, $\vec\pi = (\pi_k)_{k \in [K]}$, which has a total of $K$ elements. $\vec\pi$ will be a vector which indicates the probability that a given node is assigned to each of our $K$ communities, so we need to impose some additional constraints. Symbolically, we would say that, for all $i$, and for all $k$:

$$\pi_k = Pr(\pmb\tau_i = k)$$

The $\vec\pi$ we’re going to use has a very special property: all of its elements are non-negative: for all $k$, $\pi_k \geq 0$. This makes sense since $\pi_k$ is being used to represent the probability of a node $i$ being in group $k$, so it certainly can’t be negative. Further, there’s another thing that we want our $\vec\pi$ to have: in order for each element $\pi_k$ to indicate the probability of assignment to community $k$, we need all of the $\pi_k$s to sum up to one. This is because of something called the Law of Total Probability. If there are $K$ total values that $\pmb\tau_i$ could take, then it is the case that:

$$\sum_{k = 1}^K Pr(\pmb\tau_i = k) = \sum_{k = 1}^K \pi_k = 1$$

So, back to our question: how does a probability simplex fit in? Well, the $K$ probability simplex describes all of the possible values that our vector $\vec\pi$ could take! In symbols, the $K$ probability simplex is:

$$\left\{\vec\pi : \text{for all } k, \; \pi_k \geq 0, \; \sum_{k = 1}^K \pi_k = 1\right\}$$

So the $K$ probability simplex is just the space of all possible vectors which could indicate assignment probabilities to one of $K$ communities.
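As a quick sanity check, here is a tiny sketch (the helper name `in_probability_simplex` is our own) that tests whether a vector satisfies both conditions, with a small tolerance since floating-point sums are rarely exact:

```python
import numpy as np

def in_probability_simplex(pi, tol=1e-12):
    """Return True if pi has non-negative entries that sum to one."""
    pi = np.asarray(pi, dtype=float)
    return bool(np.all(pi >= -tol) and np.isclose(pi.sum(), 1.0, atol=tol))

print(in_probability_simplex([0.5, 0.3, 0.2]))   # True
print(in_probability_simplex([0.7, 0.7, -0.4]))  # False: negative entry
```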

What does the probability simplex look like? Below, we take a look at the 2-probability simplex (2-dimensional $\vec\pi$s) and the 3-probability simplex (3-dimensional $\vec\pi$s):

```python
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import matplotlib.pyplot as plt

fig = plt.figure(figsize=plt.figaspect(.5))
fig.suptitle("Probability Simplexes")

# 2-probability simplex: the segment from (1, 0) to (0, 1)
ax = fig.add_subplot(1, 2, 1)
x = [1, 0]
y = [0, 1]
ax.plot(x, y)
ax.set_xticks([0, .5, 1])
ax.set_yticks([0, .5, 1])
ax.set_xlabel(r"$\pi_1$")
ax.set_ylabel(r"$\pi_2$")
ax.set_title("2-probability simplex")

# 3-probability simplex: the triangle with vertices (1,0,0), (0,1,0), (0,0,1)
ax = fig.add_subplot(1, 2, 2, projection='3d')
x = [1, 0, 0]
y = [0, 1, 0]
z = [0, 0, 1]
verts = [list(zip(x, y, z))]
ax.add_collection3d(Poly3DCollection(verts, alpha=.6))
ax.view_init(elev=20, azim=10)
ax.set_xticks([0, .5, 1])
ax.set_yticks([0, .5, 1])
ax.set_zticks([0, .5, 1])
ax.set_xlabel(r"$\pi_1$")
ax.set_ylabel(r"$\pi_2$")
ax.set_zlabel(r"$\pi_3$", rotation=0)
ax.set_title("3-probability simplex")

plt.show()
```
*Figure: the 2-probability simplex (left) and the 3-probability simplex (right).*

The values of $\vec\pi = (\pi_k)$ that are in the $K$-probability simplex are indicated by the shaded region of each figure. This comprises the $(\pi_1, \pi_2)$ pairs that fall along the diagonal line from $(0, 1)$ to $(1, 0)$ for the 2-simplex, and the $(\pi_1, \pi_2, \pi_3)$ tuples that fall on the surface of the triangle with vertices at $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$.

This model has the following parameters:

| Parameter | Space | Description |
| --- | --- | --- |
| $\vec\pi$ | the $K$ probability simplex | The probability of a node being assigned to each of the $K$ communities |
| $B$ | $[0,1]^{K \times K}$ | The block matrix, which assigns edge probabilities for pairs of communities |

The a posteriori SBM is a bit more complicated than the a priori SBM. We will think about the a posteriori SBM as a variation of the a priori SBM, where instead of the node-assignment vector being treated as a known fixed value (the community assignments), we will treat it as unknown. $\vec{\pmb\tau}$ is called a latent variable, which means that it is a quantity that is never actually observed, but which will be useful for describing our model. In this case, $\vec{\pmb\tau}$ takes values in the space $\{1, ..., K\}^n$. This means that for a given realization of $\vec{\pmb\tau}$, denoted by $\vec\tau$, each of the $n$ nodes in the network has an integer value between 1 and $K$ indicating which community it is from. Statistically, we write that the node assignment for node $i$, denoted by $\pmb\tau_i$, is sampled independently and identically from $Categorical(\vec\pi)$. Stated another way, the vector $\vec\pi$ indicates the probability $\pi_k$ of assignment to each community $k$ in the network.

The matrix $B$ behaves exactly the same as it did for the a priori SBM. Finally, let’s think about how to write down the generative model for the a posteriori SBM. The model for the a posteriori SBM is, in fact, nearly the same as for the a priori SBM: we still say that given $\tau_i = k$ and $\tau_j = k'$, the $\mathbf a_{ij}$ are independent $Bern(b_{kk'})$. Here, however, we also describe that the $\pmb\tau_i$ are sampled independently and identically from $Categorical(\vec\pi)$, as we learned above. If $\mathbf A$ is the adjacency matrix for an a posteriori SBM network with parameters $\vec\pi$ and $B$, we write that $\mathbf A \sim SBM_n(\vec\pi, B)$.
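Sticking with our numpy sketches, sampling from the a posteriori SBM just adds one step in front of the a priori sampler: draw the latent communities first, then sample edges conditioned on them. This reuses the hypothetical `sample_a_priori_sbm` helper from earlier, and the $\vec\pi$ and $B$ values below are made up:

```python
import numpy as np

def sample_a_posteriori_sbm(n, pi, B, rng=None):
    """Sample an a posteriori SBM: draw tau_i ~ Categorical(pi) i.i.d.,
    then sample edges exactly as in the a priori SBM given tau."""
    rng = np.random.default_rng(rng)
    tau = rng.choice(len(pi), size=n, p=pi)  # latent communities, zero-indexed
    return sample_a_priori_sbm(tau, B, rng=rng), tau

pi = np.array([0.5, 0.5])  # equal chance of either community
B = np.array([[0.7, 0.1],
              [0.1, 0.5]])
A, tau = sample_a_posteriori_sbm(100, pi, B, rng=0)
```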

11.5.2.1. Probability#

What does the probability for the a posteriori SBM look like? In this case, $\theta = (\vec\pi, B)$ are the parameters for the model, so the probability for a realization $A$ of $\mathbf A$ is:

$$Pr_\theta(A) = Pr_\theta(\mathbf A = A)$$

Next, we use the fact that the probability that $\mathbf A = A$ is, in fact, the integration (over realizations of $\vec{\pmb\tau}$) of the joint $(\mathbf A, \vec{\pmb\tau})$. In this case, we will let $\mathcal T = \{1, ..., K\}^n$ be the space of all possible realizations that $\vec{\pmb\tau}$ could take:

$$Pr_\theta(A) = \sum_{\vec\tau \in \mathcal T} Pr_\theta(\mathbf A = A, \vec{\pmb\tau} = \vec\tau) \tag{11.1}$$

Next, remember that by definition of a conditional probability for a random variable $\mathbf x$ taking value $x$, conditioned on random variable $\mathbf y$ taking the value $y$, that $Pr(\mathbf x = x \,|\, \mathbf y = y) = \frac{Pr(\mathbf x = x, \mathbf y = y)}{Pr(\mathbf y = y)}$. Note that by multiplying through by $Pr(\mathbf y = y)$, we can see that $Pr(\mathbf x = x, \mathbf y = y) = Pr(\mathbf x = x \,|\, \mathbf y = y) Pr(\mathbf y = y)$. Using this logic for $\mathbf A$ and $\vec{\pmb\tau}$:

$$Pr_\theta(A) = \sum_{\vec\tau \in \mathcal T} Pr_\theta(\mathbf A = A \,|\, \vec{\pmb\tau} = \vec\tau) Pr(\vec{\pmb\tau} = \vec\tau)$$

Intuitively, for each term in the sum, we are treating $\vec{\pmb\tau}$ as taking a fixed value, $\vec\tau$, to evaluate this probability statement.

We will start by describing $Pr(\vec{\pmb\tau} = \vec\tau)$. Remember that for $\vec{\pmb\tau}$, each entry $\pmb\tau_i$ is sampled independently and identically from $Categorical(\vec\pi)$. The probability mass for a $Categorical(\vec\pi)$-valued random variable is $Pr(\pmb\tau_i = \tau_i; \vec\pi) = \pi_{\tau_i}$. Finally, note that if we are taking the products of $n$ $\pi_{\tau_i}$ terms, many of these values will end up being the same. Consider, for instance, the vector $\vec\tau = [1, 2, 1, 2, 1]$. We end up with three terms of $\pi_1$ and two terms of $\pi_2$, and it does not matter which order we multiply them in. Rather, all we need to keep track of are the counts of each $\pi_k$ term. Written another way, we can use the indicator that $\tau_i = k$, given by $1_{\tau_i = k}$, and a running counter over all of the community probability assignments $\pi_k$ to make this expression a little more sensible. We will use the symbol $n_k = \sum_{i = 1}^n 1_{\tau_i = k}$ to denote this value, which is the number of nodes in community $k$:

$$
\begin{align*}
Pr_\theta(\vec{\pmb\tau} = \vec\tau) &= \prod_{i = 1}^n Pr_\theta(\pmb\tau_i = \tau_i), && \text{Independence Assumption} \\
&= \prod_{i = 1}^n \pi_{\tau_i}, && \text{p.m.f. of a Categorical R.V.} \\
&= \prod_{k = 1}^K \pi_k^{n_k}, && \text{Reorganizing what we are taking products of}
\end{align*}
$$
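For instance, under the hypothetical choice $\vec\pi = (0.6, 0.4)$, the example $\vec\tau = [1, 2, 1, 2, 1]$ above has probability $\pi_1^3 \pi_2^2 = 0.6^3 \cdot 0.4^2 = 0.03456$; this quick sketch confirms that the element-wise product and the grouped form agree:

```python
import numpy as np

pi = np.array([0.6, 0.4])        # hypothetical community probabilities
tau = np.array([1, 2, 1, 2, 1])  # the example assignment above

elementwise = np.prod(pi[tau - 1])                  # product of pi_{tau_i}
n_k = np.array([(tau == k).sum() for k in (1, 2)])  # node counts n_k
grouped = np.prod(pi ** n_k)                        # pi_1^{n_1} * pi_2^{n_2}
print(elementwise, grouped)  # both 0.6**3 * 0.4**2 = 0.03456
```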

Next, let’s think about the conditional probability term, $Pr_\theta(\mathbf A = A \,|\, \vec{\pmb\tau} = \vec\tau)$. Remember that the entries are all independent conditional on $\vec{\pmb\tau}$ taking the value $\vec\tau$. It turns out this is exactly the same result that we obtained for the a priori SBM:

$$Pr_\theta(\mathbf A = A \,|\, \vec{\pmb\tau} = \vec\tau) = \prod_{k, k' \in [K]} b_{kk'}^{m_{kk'}} (1 - b_{kk'})^{n_{kk'} - m_{kk'}}$$

Combining these into the integrand gives:

$$
\begin{align*}
Pr_\theta(A) &= \sum_{\vec\tau \in \mathcal T} Pr_\theta(\mathbf A = A \,|\, \vec{\pmb\tau} = \vec\tau) Pr_\theta(\vec{\pmb\tau} = \vec\tau) \\
&= \sum_{\vec\tau \in \mathcal T} \left[\prod_{k = 1}^K \pi_k^{n_k} \cdot \prod_{k, k' \in [K]} b_{kk'}^{m_{kk'}} (1 - b_{kk'})^{n_{kk'} - m_{kk'}}\right]
\end{align*}
$$

Evaluating this sum explicitly proves to be relatively tedious and is a bit outside of the scope of this book, so we will omit it here.
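That said, for a toy network the sum can be brute-forced by enumerating every $\vec\tau \in \mathcal T$, which makes the structure of the marginalization concrete. This is only a sketch (our own hypothetical `a_posteriori_sbm_probability` helper, again with zero-indexed labels), and it is exponential in $n$ since $|\mathcal T| = K^n$:

```python
import numpy as np
from itertools import product

def a_posteriori_sbm_probability(A, pi, B):
    """Brute-force Pr_theta(A) by summing the joint over every
    tau in T = {0, ..., K-1}^n; only feasible for tiny n."""
    n, K = A.shape[0], len(pi)
    total = 0.0
    for tau in product(range(K), repeat=n):
        p_tau = np.prod([pi[t] for t in tau])  # Pr(tau = tau)
        p_A = 1.0                              # Pr(A = A | tau = tau)
        for i in range(n):
            for j in range(i + 1, n):
                b = B[tau[i], tau[j]]
                p_A *= b if A[i, j] else 1 - b
        total += p_tau * p_A
    return total
```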

11.5.3. Degree-Corrected Stochastic Block Model (DCSBM)#

Let’s think back to our school example for the Stochastic Block Model. Remember, we had 100 students, each of whom could go to one of two possible schools: school one or school two. Our network had 100 nodes, representing each of the students. We said that the school each student attended was represented by their node assignment $\tau_i$ to one of two possible communities. The matrix $B$ was the block probability matrix, where $b_{11}$ was the probability that students in school one were friends, $b_{22}$ was the probability that students in school two were friends, and $b_{12} = b_{21}$ was the probability that students were friends if they did not go to the same school. In this case, we said that $\mathbf A$ was an $SBM_{n, \vec\tau}(B)$ random network.

When would this setup not make sense? Let’s say that Alice and Bob both go to the same school, but Alice is more popular than Bob. Since Alice is more popular than Bob, we might want to say that for any classmate, Alice gets an additional “popularity benefit” to her probability of being friends with that classmate, and Bob gets an “unpopularity penalty.” The problem here is that within a single community of an SBM, the SBM assumes that the expected node degree (the number of nodes each node is connected to, on average) is the same for all nodes within a single community. This means that we would be unable to reflect this benefit/penalty system for Alice and Bob, since each student will have the same number of friends, on average. This problem is referred to as community degree homogeneity in a Stochastic Block Model network. Community degree homogeneity just means that the expected node degree is homogeneous, or the same, for all nodes within a community.

Degree Homogeneity in a Stochastic Block Model Network

Suppose that $\mathbf A \sim SBM_{n, \vec\tau}(B)$, where $\mathbf A$ has $K = 2$ communities. What is the expected node degree of each node in $\mathbf A$?

For an arbitrary node $v_i$ which is in community $k$ (either one or two), we will compute the expected value of the degree $\deg(v_i)$, written $\mathbb E[\deg(v_i); \tau_i = k]$. We will let $n_k$ represent the number of nodes whose node assignments $\tau_i$ are to community $k$. Let’s see what happens:

$$\mathbb E[\deg(v_i); \tau_i = k] = \mathbb E\left[\sum_{j = 1}^n \mathbf a_{ij}\right] = \sum_{j = 1}^n \mathbb E[\mathbf a_{ij}]$$

We use the linearity of expectation to get from the first line to the second. Next, instead of summing over all the nodes, we’ll break the sum up into the nodes which are in the same community $k$ as node $i$, and the ones in the other community $k'$. We use the notation $k'$ to emphasize that $k$ and $k'$ are different values:

$$\mathbb E[\deg(v_i); \tau_i = k] = \sum_{j : i \neq j, \tau_j = k} \mathbb E[\mathbf a_{ij}] + \sum_{j : \tau_j = k'} \mathbb E[\mathbf a_{ij}]$$

In the first sum, we have $n_k - 1$ total terms (the number of nodes that aren’t node $i$, but are in the same community), and in the second sum, we have $n_{k'}$ total terms (the number of nodes that are in the other community). Next, recall that the probability of an edge within community $k$ is $b_{kk}$, while the probability of an edge between communities $k$ and $k'$ is $b_{kk'}$. Finally, we will use that the expected value of an adjacency $\mathbf a_{ij}$ which is Bernoulli distributed is its probability:

$$
\begin{align*}
\mathbb E[\deg(v_i); \tau_i = k] &= \sum_{j : i \neq j, \tau_j = k} b_{kk} + \sum_{j : \tau_j = k'} b_{kk'}, && \mathbf a_{ij} \text{ are Bernoulli distributed} \\
&= (n_k - 1) b_{kk} + n_{k'} b_{kk'}
\end{align*}
$$

This holds for any node $i$ which is in community $k$. Therefore, the expected node degree is the same, or homogeneous, within a community of an SBM.
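As a quick empirical check, here is a sketch that reuses our hypothetical `sample_a_priori_sbm` helper from earlier to average node degrees over repeated samples; the community sizes and block probabilities are made-up values, and the within-community averages should land near $(n_k - 1) b_{kk} + n_{k'} b_{kk'}$:

```python
import numpy as np

# Two communities of 50 nodes each, with made-up block probabilities.
tau = np.repeat([0, 1], 50)
B = np.array([[0.6, 0.2],
              [0.2, 0.4]])

# Average node degrees over repeated samples from the a priori SBM.
rng = np.random.default_rng(0)
degrees = np.zeros(100)
n_samples = 200
for _ in range(n_samples):
    A = sample_a_priori_sbm(tau, B, rng=rng)
    degrees += A.sum(axis=1)
degrees /= n_samples

print(degrees[tau == 0].mean())  # approx (50 - 1)*0.6 + 50*0.2 = 39.4
print(degrees[tau == 1].mean())  # approx (50 - 1)*0.4 + 50*0.2 = 29.6
```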

To address this limitation, we turn to the Degree-Corrected Stochastic Block Model, or DCSBM. As with the Stochastic Block Model, there are both an a priori and an a posteriori DCSBM.

11.5.3.1. A Priori DCSBM#

Like the a priori SBM, the a priori DCSBM is one in which we know ahead of time which nodes are in which communities. Here, we will use the variable $K$ to denote the number of different communities. The a priori DCSBM has the following two parameters:

| Parameter | Space | Description |
| --- | --- | --- |
| $B$ | $[0,1]^{K \times K}$ | The block matrix, which assigns edge probabilities for pairs of communities |
| $\vec\theta$ | $\mathbb R_+^n$ | The degree correction vector, which adjusts the expected degree of each node |

The latent community assignment vector $\vec{\pmb\tau}$ with a known a priori realization $\vec\tau$ and the block matrix $B$ are exactly the same for the a priori DCSBM as they were for the a priori SBM.

The vector $\vec\theta$ is the degree correction vector. Each entry $\theta_i$ is a positive scalar which defines how much more (or less) connected node $i$ tends to be, relative to what the block probabilities alone would dictate.

Finally, let’s think about how to write down the generative model for the a priori DCSBM. We say that given $\tau_i = k$ and $\tau_j = k'$, $\mathbf a_{ij}$ is sampled independently from a $Bern(\theta_i \theta_j b_{kk'})$ distribution for all $j > i$ (with the understanding that $\vec\theta$ must be chosen so that $\theta_i \theta_j b_{kk'}$ never exceeds 1, since it is a probability). As we can see, $\theta_i$ in a sense is “correcting” the probabilities of each adjacency to node $i$ to be higher or lower than that which is given by the block probability $b_{kk'}$ alone. If $\mathbf A$ is an a priori DCSBM network with parameters $\vec\theta$ and $B$, we write that $\mathbf A \sim DCSBM_{n, \vec\tau}(\vec\theta, B)$.
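Here is a matching numpy sketch for the a priori DCSBM (again our own hypothetical helper, with zero-indexed labels and made-up parameter values); note that it assumes $\vec\theta$ has been scaled so every product $\theta_i \theta_j b_{kk'}$ stays in $[0, 1]$:

```python
import numpy as np

def sample_a_priori_dcsbm(tau, theta, B, rng=None):
    """Sample a simple a priori DCSBM: edge (i, j) is Bernoulli with
    probability theta_i * theta_j * b_{tau_i, tau_j}. Assumes theta
    is scaled so that every such product stays in [0, 1]."""
    rng = np.random.default_rng(rng)
    n = len(tau)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            p = theta[i] * theta[j] * B[tau[i], tau[j]]
            A[i, j] = A[j, i] = rng.binomial(1, p)
    return A

# Alice (theta = 1.5) is more popular than Bob (theta = 0.5), same community.
tau = np.array([0, 0, 1, 1])
theta = np.array([1.5, 0.5, 1.0, 1.0])  # hypothetical degree corrections
B = np.array([[0.5, 0.1],
              [0.1, 0.5]])
A = sample_a_priori_dcsbm(tau, theta, B, rng=0)
```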

11.5.3.1.1. Probability#

The derivation for the probability is the same as for the a priori SBM, with the change that $p_{ij} = \theta_i \theta_j b_{kk'}$ instead of just $b_{kk'}$. This gives that the probability turns out to be:

$$Pr_\theta(A) = \prod_{j > i} (\theta_i \theta_j b_{\tau_i \tau_j})^{a_{ij}} (1 - \theta_i \theta_j b_{\tau_i \tau_j})^{1 - a_{ij}}$$

The expression doesn’t simplify much more due to the fact that the probabilities are dependent on the particular $i$ and $j$, so we can’t just reduce the statement in terms of $n_{kk'}$ and $m_{kk'}$ like for the SBM.

11.5.3.2. A Posteriori DCSBM#

The a posteriori DCSBM is to the a posteriori SBM what the a priori DCSBM was to the a priori SBM. The changes are very minimal, so we will not write it all out explicitly here; the preceding section on the a priori DCSBM should tell you what needs to change. We will leave it as an exercise to the reader to write down the model and probability statement for realizations of the DCSBM.