11.5.1. A Priori Stochastic Block Model
The a priori SBM is an SBM in which we know ahead of time (a priori) which nodes are in which communities. Here, we will use the variable to denote the maximum number of different communities. The ordering of the communities does not matter; the community we call versus versus is largely a symbolic distinction (the only thing that matters is that they are different). The a priori SBM has the following parameter:
Parameter |
Space |
Description |
|
[0,1] |
The block matrix, which assigns edge probabilities for pairs of communities |
To describe the A Priori SBM, we will designate the community each node is a part of using a vector, which has a single community assignment for each node in the network. We will call this node assignment vector , and it is a -length vector (one element for each node) with elements which can take values from to . In symbols, we would say that . What this means is that for a given element of , , that is the community assignment (either , , so on and so forth up to ) for the node. If there we hahd an example where there were communities () for instance, and the first two nodes are in community and the second two in community , then would be a vector which looks like:
Next, let’s discuss the matrix , which is known as the block matrix of the SBM. We write down that , which means that the block matrix is a matrix with rows and columns. If we have a pair of nodes and know which of the communities each node is from, the block matrix tells us the probability that those two nodes are connected. If our networks are simple, the matrix is also symmetric, which means that if where is a probability, that , too. The requirement of to be symmetric exists only if we are dealing with undirected networks.
Finally, let’s think about how to write down the generative model for the a priori SBM. Intuitionally what we want to reflect is, if we know that node is in community and node is in community , that the entry of the block matrix is the probability that and are connected. We say that given and , is sampled independently from a distribution for all . Note that the adjacencies are not necessarily identically distributed, because the probability depends on the community of edge . If is an a priori SBM network with parameter , and is a realization of the node-assignment vector, we write that .
11.5.1.1. Probability
What does the probability for the a priori SBM look like? In our previous description, we admittedly simplified things to an extent to keep the wording down. In truth, we model the a priori SBM using a latent variable model, which means that the node assignment vector, , is treated as random. For the case of the a priori SBM, it just so happens that we know the specific value that this latent variable takes, , ahead of time.
Fortunately, since is a parameter of the a priori SBM, the probability is a bit simpler than for the a posteriori SBM. This is because the a posteriori SBM requires an integration over potential realizations of , whereas the a priori SBM does not, since we already know that was realized as .
Putting these steps together gives us that:
Next, for the a priori SBM, we know that each edge only actually depends on the community assignments of nodes and , so we know that , where and are any of the possible communities. This is because the community assignments of nodes that are not nodes and do not matter for edge , due to the independence assumption.
Next, let’s think about the probability matrix for the a priori SBM. We know that, given that and , each adjacency is sampled independently and identically from a distribution. This means that . Completing our analysis from above:
Where denotes the total number of edges possible between nodes assigned to community and nodes assigned to community . That is, . Further, we will use to denote the total number of edges observed between these two communities. That is, . Note that for a single community pair, that the probability is analogous to the probability of a realization of an ER random variable.
Like the ER model, there are again equivalence classes of the sample space in terms of their probability. For a two-community setting, with and given, the equivalence classes are the sets:
The number of equivalence classes possible scales with the number of communities, and the manner in which nodes are assigned to communities (particularly, the number of nodes in each community).
11.5.2. A Posteriori Stochastic Block Model
In the a posteriori Stochastic Block Model (SBM), we consider that node assignment to one of communities is a random variable, that we don’t know already like te a priori SBM. We’re going to see a funky word come up, that you’re probably not familiar with, the probability simplex. What the heck is a probability simplex?
The intuition for a simplex is probably something you’re very familiar with, but just haven’t seen a word describe. Let’s say I have a vector, , which has a total of elements. will be a vector, which indicates the probability that a given node is assigned to each of our communities, so we need to impose some additional constraints. Symbolically, we would say that, for all , and for all :
The we’re going to use has a very special property: all of its elements are non-negative: for all , . This makes sense since is being used to represent the probability of a node being in group , so it certainly can’t be negative. Further, there’s another thing that we want our to have: in order for each element to indicate the probability of something to be assigned to , we need all of the s to sum up to one. This is because of something called the Law of Total Probability. If we have total values that could take, then it is the case that:
So, back to our question: how does a probability simplex fit in? Well, the probability simplex describes all of the possible values that our vector could take! In symbols, the probability simplex is:
So the probability simplex is just the space for all possible vectors which could indicate assignment probabilities to one of communities.
What does the probability simplex look like? Below, we take a look at the -probability simplex (2-d s) and the -probability simplex (3-dimensional s):
The values of that are in the -probability simplex are indicated by the shaded region of each figure. This comprises the pairs that fall along a diagonal line from to for the -simplex, and the tuples that fall on the surface of the triangular shape above with nodes at , , and .
This model has the following parameters:
Parameter |
Space |
Description |
|
the probability simplex |
The probability of a node being assigned to community |
|
[0,1] |
The block matrix, which assigns edge probabilities for pairs of communities |
The a posteriori SBM is a bit more complicated than the a priori SBM. We will think about the a posteriori SBM as a variation of the a priori SBM, where instead of the node-assignment vector being treated as a known fixed value (the community assignments), we will treat it as unknown. is called a latent variable, which means that it is a quantity that is never actually observed, but which will be useful for describing our model. In this case, takes values in the space . This means that for a given realization of , denoted by , that for each of the nodes in the network, we suppose that an integer value between and indicates which community a node is from. Statistically, we write that the node assignment for node , denoted by , is sampled independently and identically from . Stated another way, the vector indicates the probability of assignment to each community in the network.
The matrix behaves exactly the same as it did with the a posteriori SBM. Finally, let’s think about how to write down the generative model in the a posteriori SBM. The model for the a posteriori SBM is, in fact, nearly the same as for the a priori SBM: we still say that given and , that are independent . Here, however, we also describe that are sampled independent and identically from , as we learned above. If is the adjacency matrix for an a posteriori SBM network with parameters and , we write that .
11.5.2.1. Probability
What does the probability for the a posteriori SBM look like? In this case, are the parameters for the model, so the probability for a realization of is:
Next, we use the fact that the probability that is, in fact, the integration (over realizations of ) of the joint . In this case, we will let be the space of all possible realizations that could take:
(11.1)
Next, remember that by definition of a conditional probability for a random variable taking value conditioned on random variable taking the value , that . Note that by multiplying through by , we can see that . Using this logic for and :
Intuitively, for each term in the sum, we are treating as taking a fixed value, , to evaluate this probability statement.
We will start by describing . Remember that for , that each entry is sampled independently and identically from .The probability mass for a -valued random variable is . Finally, note that if we are taking the products of terms, that many of these values will end up being the same. Consider, for instance, if the vector . We end up with three terms of , and two terms of , and it does not matter which order we multiply them in. Rather, all we need to keep track of are the counts of each term. Written another way, we can use the indicator that , given by , and a running counter over all of the community probability assignments to make this expression a little more sensible. We will use the symbol to denote this value, which is the number of nodes in community :
Next, let’s think about the conditional probability term, . Remember that the entries are all independent conditional on taking the value . It turns out this is exactly the same result that we obtained for the a priori SBM:
Combining these into the integrand gives:
Evaluating this sum explicitly proves to be relatively tedious and is a bit outside of the scope of this book, so we will omit it here.
11.5.3. Degree-Corrected Stochastic Block Model (DCSBM)
Let’s think back to our school example for the Stochastic Block Model. Remember, we had 100 students, each of whom could go to one of two possible schools: school one or school two. Our network had 100 nodes, representing each of the students. We said that the school for which each student attended was represented by their node assignment to one of two possible communities. The matrix was the block probaability matrix, where was the probability that students in school one were friends, was the probability that students in school two were friends, and was the probability that students were friends if they did not go to the same school. In this case, we said that was an random network.
When would this setup not make sense? Let’s say that Alice and Bob both go to the same school, but Alice is more popular than Bob. In general since Alice is more popular than Bob, we might want to say that for any clasasmate, Alice gets an additional “popularity benefit” to her probability of being friends with the other classmate, and Bob gets an “unpopularity penalty.” The problem here is that within a single community of an SBM, the SBM assumes that the node degree (the number of nodes each nodes is connected to) is the same for all nodes within a single community. This means that we would be unable to reflect this benefit/penalty system to Alice and Bob, since each student will have the same number of friends, on average. This problem is referred to as community degree homogeneity in a Stochastic Block Model Network. Community degree homogeneity just means that the node degree is homogeneous, or the same, for all nodes within a community.
Degree Homogeneity in a Stochastic Block Model Network
Suppose that , where has communities. What is the node degree of each node in ?
For an arbitrary node which is in community (either one or two), we will compute the expectated value of the degree , written . We will let represent the number of nodes whose node assignments are to community . Let’s see what happens:
We use the linearity of expectation again to get from the top line to the second line. Next, instead of summing over all the nodes, we’ll break the sum up into the nodes which are in the same community as node , and the ones in the other community . We use the notation to emphasize that and are different values:
In the first sum, we have total edges (the number of nodes that aren’t node , but are in the same community), and in the second sum, we have total edges (the number of nodes that are in the other community). Finally, we will use that the probability of an edge in the same community is , but the probability of an edge between the communities is . Finally, we will use that the expected value of an adjacency which is Bernoulli distributed is its probability:
This holds for any node which is in community . Therefore, the expected node degree is the same, or homogeneous, within a community of an SBM.
To address this limitation, we turn to the Degree-Corrected Stochastic Block Model, or DCSBM. As with the Stochastic Block Model, there is both a a priori and a posteriori DCSBM.
11.5.3.1. A Priori DCSBM
Like the a priori SBM, the a priori DCSBM is where we know which nodes are in which communities ahead of time. Here, we will use the variable to denote the number of different communiies. The a priori DCSBM has the following two parameters:
Parameter |
Space |
Description |
|
[0,1] |
The block matrix, which assigns edge probabilities for pairs of communities |
|
|
The degree correction vector, which adjusts the degree for pairs of nodes |
The latent community assignment vector with a known a priori realization and the block matrix are exactly the same for the a priori DCSBM as they were for the a priori SBM.
The vector is the degree correction vector. Each entry is a positive scalar. defines how much more (or less) edges associated with node are connected due to their association with node .
Finally, let’s think about how to write down the generative model for the a priori DCSBM. We say that and , is sampled independently from a distribution for all . As we can see, in a sense is “correcting” the probabilities of each adjacency to node to be higher, or lower, depending on the value of that that which is given by the block probabilities . If is an a priori DCSBM network with parameters and , we write that .
11.5.3.1.1. Probability
The derivation for the probability is the same as for the a priori SBM, with the change that instead of just . This gives that the probability turns out to be:
The expression doesn’t simplify much more due to the fact that the probabilities are dependent on the particular and , so we can’t just reduce the statement in terms of and like for the SBM.
11.5.3.2. A Posteriori DCSBM
The a posteriori DCSBM is to the a posteriori SBM what the a priori DCSBM was to the a priori SBM. The changes are very minimal, so we will omit explicitly writing it all down here so we can get this section wrapped up, with the idea that the preceding section on the a priori DCSBM should tell you what needs to change. We will leave it as an exercise to the reader to write down a model and probability statement for realizations of the DCSBM.