11.3. Foundation
To understand network models, it is crucial to understand the concept of a network as a random quantity which takes a probability distribution. We have a realization $A$, and we think that this realization is random in some way. Stated another way, we think that there exists a network-valued random variable $\mathbf{A}$ that governs the realizations we get to see. Since $\mathbf{A}$ is a random variable, we can describe it using a probability distribution. The distribution of the random network $\mathbf{A}$ is the function $\mathbb{P}$ which assigns probabilities to every possible configuration that $\mathbf{A}$ could take. Notationally, we write that $\mathbf{A} \sim \mathbb{P}$, which is read in words as "the random network $\mathbf{A}$ is distributed according to $\mathbb{P}$."
In the preceding description, we made a fairly substantial claim: $\mathbb{P}$ assigns probabilities to every possible configuration that realizations of $\mathbf{A}$, denoted by $A$, could take. How many possibilities are there for a network with $n$ nodes? Let's limit ourselves to simple networks: that is, $\mathbf{A}$ takes values $A$ that are unweighted ($A$ is binary), undirected ($A$ is symmetric), and loopless ($A$ is hollow). In words, $\mathcal{A}_n$ is the set of all possible adjacency matrices that correspond to simple networks with $n$ nodes. Stated another way: every $A$ that is found in $\mathcal{A}_n$ is a binary $n \times n$ matrix ($A \in \{0, 1\}^{n \times n}$), is symmetric ($A = A^\top$), and is hollow ($\mathrm{diag}(A) = 0$, or $a_{ii} = 0$ for all $i$). We describe $\mathcal{A}_n$ as:

$$\mathcal{A}_n = \left\{A : A \in \{0, 1\}^{n \times n},\; A = A^\top,\; \mathrm{diag}(A) = 0\right\}$$
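As a quick sanity check, here is a minimal sketch (the helper name `is_simple_adjacency` is ours, purely for illustration) that tests whether a given matrix belongs to $\mathcal{A}_n$:

```python
import numpy as np

def is_simple_adjacency(A):
    """Check whether A lies in A_n: binary, symmetric, and hollow."""
    A = np.asarray(A)
    is_square = A.ndim == 2 and A.shape[0] == A.shape[1]
    is_binary = np.isin(A, [0, 1]).all()
    is_symmetric = np.array_equal(A, A.T)
    is_hollow = not np.any(np.diag(A))
    return is_square and is_binary and is_symmetric and is_hollow

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
print(is_simple_adjacency(A))  # True
```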
To summarize the statement that $\mathbb{P}$ assigns probabilities to every possible configuration that realizations of $\mathbf{A}$ can take, we write that $\mathbb{P} : \mathcal{A}_n \rightarrow [0, 1]$. This means that for any $A \in \mathcal{A}_n$ which is a possible realization of the random network $\mathbf{A}$, $\mathbb{P}(\mathbf{A} = A)$ is a probability (it takes a value between $0$ and $1$). If it is completely unambiguous what the random variable refers to, we might abbreviate $\mathbb{P}(\mathbf{A} = A)$ with $\mathbb{P}(A)$. This statement can alternatively be read as: the probability that the random variable $\mathbf{A}$ takes the value $A$ is $\mathbb{P}(A)$. Finally, let's address the question we raised in the previous paragraph: how many possible adjacency matrices are in $\mathcal{A}_n$?
Let's imagine what just one $A \in \mathcal{A}_n$ can look like. Note that each matrix $A$ has $n \times n = n^2$ possible entries in total, since $A$ is an $n \times n$ matrix. There are $n$ possible self-loops for a network, but since $\mathbf{A}$ is simple, it is loopless. This means that we can subtract $n$ possible edges from the $n^2$ total, leaving us with $n^2 - n$ possible edges that may or may not exist. If we think in terms of a realization $A$, this means that we are ignoring the diagonal entries $a_{ii}$, for all $i$. Remember that a simple network is also undirected. In terms of the realization $A$, this means that for every pair $i$ and $j$, $a_{ij} = a_{ji}$. If we were to learn about an entry in the upper triangle of $A$ (an $a_{ij}$ where $j > i$), note that we have also learned what $a_{ji}$ is, too. This symmetry of $A$ means that of the $n^2 - n$ entries that are not on the diagonal of $A$, we would, in fact, "double count" the possible number of unique values that $A$ could have. This means that $A$ has a total of $\frac{n^2 - n}{2}$ entries which are free, which is equal to the expression $\binom{n}{2}$. Finally, note that each free entry of $A$ can take one of two possible values: $0$ or $1$. To write this down formally, for every possible edge which is randomly determined, we have two possible values that edge could take. Let's think about building some intuition here:
If $A$ is $2 \times 2$, there is $\binom{2}{2} = 1$ unique entry of $A$, which takes one of $2$ values. There are $2^1 = 2$ possible ways that $A$ could look:

$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
If $A$ is $3 \times 3$, there are $\binom{3}{2} = 3$ unique entries of $A$, each of which takes one of $2$ values. There are $2^3 = 8$ possible ways that $A$ could look:

$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},\;
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},\;
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix},\;
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix},\;
\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix},\;
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix},\;
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix},\;
\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$
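To make this counting concrete, here is a short enumeration sketch (the function name is ours) which builds every element of $\mathcal{A}_n$ by assigning $0$ or $1$ to each of the $\binom{n}{2}$ free upper-triangular entries:

```python
import itertools
import numpy as np

def enumerate_simple_networks(n):
    """Yield every matrix in A_n by assigning 0 or 1 to each of the
    n-choose-2 free upper-triangular entries, then mirroring."""
    iu = np.triu_indices(n, k=1)          # indices of the free entries
    for bits in itertools.product([0, 1], repeat=len(iu[0])):
        A = np.zeros((n, n), dtype=int)
        A[iu] = bits                       # fill the upper triangle
        yield A + A.T                      # symmetric and hollow by construction

print(sum(1 for _ in enumerate_simple_networks(3)))  # 8, matching 2^(3 choose 2)
```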
How do we generalize this to an arbitrary choice of $n$? The answer is to use combinatorics. Basically, the approach is to look at each free entry of $A$, and multiply the total number of possibilities by $2$ for every element which can take $2$ different values. Stated another way, if there are $2$ choices for each one of $x$ possible items, we have $2^x$ possible ways in which we could select those items. But we already know how many free entries there are in $A$, so we are ready to come up with an expression for the number. In total, there are $2^{\binom{n}{2}}$ unique adjacency matrices in $\mathcal{A}_n$. Stated another way, the cardinality of $\mathcal{A}_n$, described by the expression $|\mathcal{A}_n|$, is $2^{\binom{n}{2}}$. The cardinality here just means the number of elements that the set $\mathcal{A}_n$ contains. When $n$ is just $15$, note that $|\mathcal{A}_{15}| = 2^{\binom{15}{2}} = 2^{105}$, which, when expressed as a power of $10$, is more than $10^{31}$ possible networks that can be realized with just $15$ nodes! As $n$ increases, how many unique possible networks are there? Look at the value of $|\mathcal{A}_n|$ as a function of $n$ computed below: as $n$ gets big, $|\mathcal{A}_n|$ grows really, really fast!
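To see this growth numerically, a small sketch using Python's standard library:

```python
from math import comb

# |A_n| = 2^(n choose 2) grows explosively with the number of nodes n
for n in [5, 10, 15, 20, 25]:
    digits = len(str(2 ** comb(n, 2)))       # number of decimal digits
    print(f"n = {n:2d}: |A_n| = 2^{comb(n, 2):3d} ~ 10^{digits - 1}")
```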
So, now we know that we have probability distributions on networks, and a set $\mathcal{A}_n$ which defines all of the adjacency matrices that every probability distribution must assign a probability to. Now, just what is a network model? A network model is a set $\mathcal{P}$ of probability distributions on $\mathcal{A}_n$. Stated another way, we can describe $\mathcal{P}$ to be:

$$\mathcal{P} \subseteq \left\{\mathbb{P} : \mathbb{P} \text{ is a probability distribution on } \mathcal{A}_n\right\}$$
In general, we will simplify $\mathcal{P}$ through something called parametrization. We define $\Theta$ to be the set of all possible parameters of the random network model, and $\theta \in \Theta$ is a particular parameter choice that governs the parameters of a specific network-valued random variable $\mathbf{A}$. In this case, we will write $\mathcal{P}$ as the set:

$$\mathcal{P}(\Theta) = \left\{\mathbb{P}_\theta : \theta \in \Theta\right\}$$
If $\mathbf{A}$ is a random network that follows a network model, we will write that $\mathbf{A} \sim \mathbb{P}_\theta$, for some set of parameters $\theta$.
If you are used to traditional univariate or multivariate statistical modelling, an extremely natural choice when you have a discrete sample space (like $\mathcal{A}_n$, which is discrete because we can count it) would be to use a categorical model. In the categorical model, we would have a single parameter for each possible configuration of an $n$-node network; that is, $|\theta| = |\mathcal{A}_n| = 2^{\binom{n}{2}}$. What is wrong with this model? The limitations are two-fold:
As we explained previously, when $n$ is just $15$, we would need over $10^{31}$ bits of storage just to define $\theta$. This amounts to more than $10^9$ zettabytes, which exceeds the storage capacity of the entire world.
With a single network observed (or really, any number of networks we could collect in the real world) we would never be able to get a reasonable estimate of $2^{\binom{n}{2}}$ parameters for any non-trivial number of nodes $n$. For the case of one observed network $A$, an estimate of $\theta$ (referred to as $\hat\theta$) would simply place a $1$ in the entry corresponding to our observed network, and a $0$ everywhere else. Inferentially, this would imply that the network-valued random variable $\mathbf{A}$ which governs realizations is deterministic, even if this is not the case. Even if we collected potentially many observed networks, we would still (with very high probability) just get $\hat\theta$ as a series of point masses on the observed networks we see, and $0$s everywhere else. This would mean our parameter estimates would not generalize to new observations at all, with high probability, as the simulation below illustrates.
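A quick simulation makes the second limitation vivid. Under the purely illustrative assumption that each possible edge of a $6$-node network is an independent fair coin flip, even $100$ observed networks leave $\hat\theta$ as a scattering of point masses:

```python
import numpy as np
rng = np.random.default_rng(0)

n, m = 6, 100                       # 6 nodes -> |A_6| = 2^15 = 32768 configurations
n_free = n * (n - 1) // 2           # 15 free entries per network

seen = set()
for _ in range(m):
    bits = rng.integers(0, 2, size=n_free)   # one fair coin flip per possible edge
    seen.add(bits.tobytes())                  # hash the sampled configuration

# The categorical estimate puts mass ~1/m on each observed configuration
# and exactly 0 on every one of the tens of thousands of unseen ones.
print(f"{len(seen)} distinct networks observed out of 32768 possible")
```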
So, what are some more reasonable descriptions of $\mathcal{P}$? We explore some choices below. Particularly, we will be most interested in the independent-edge networks. These are the families of networks in which the generative procedure which governs the random networks assumes that the edges of the network are generated independently. Statistical independence is a property which greatly simplifies many of the modelling assumptions that are crucial for proper estimation and rigorous statistical inference, and which we will learn more about in the later chapters.
11.3.1. Equivalence Classes
In all of the below models, we will explore the concept of the probability equivalence class, or an equivalence class, for short. The probability is a function which, in general, describes how well a particular observation $A$ can be described by a random variable $\mathbf{A}$ with parameters $\theta$, written $\mathbf{A} \sim \mathbb{P}_\theta$. The probability $\mathbb{P}_\theta(A)$ will be used to describe the probability of observing the realization $A$ if the underlying random variable $\mathbf{A}$ has parameters $\theta$. Why does this matter when it comes to equivalence classes? An equivalence class $E$ is a subset of the sample space $\mathcal{A}_n$ which has the following properties. Holding the parameters $\theta$ fixed:
1. If $A$ and $A'$ are members of the same equivalence class $E$ (written $A, A' \in E$), then $\mathbb{P}_\theta(A) = \mathbb{P}_\theta(A')$.
2. If $A$ and $A''$ are members of different equivalence classes; that is, $A \in E$ and $A'' \in E'$ where $E, E'$ are distinct equivalence classes, then $\mathbb{P}_\theta(A) \neq \mathbb{P}_\theta(A'')$.
Using points 1 and 2, we can establish that if $E$ and $E'$ are two different equivalence classes, then $E \cap E' = \varnothing$. That is, the equivalence classes are mutually disjoint.
We can use the preceding properties to deduce that, given the sample space $\mathcal{A}_n$ and a probability function $\mathbb{P}_\theta$, we can define a partition of the sample space into equivalence classes $E_i$, where $i \in \mathcal{I}$ for an arbitrary indexing set $\mathcal{I}$. A partition of $\mathcal{A}_n$ is a collection of sets which are mutually disjoint, and whose union is the whole space. That is, $\bigcup_{i \in \mathcal{I}} E_i = \mathcal{A}_n$.
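As a preview of how this plays out (a minimal sketch, assuming the homogeneous case where every edge shares a single probability $p$, a model we formalize in a later section), the equivalence classes of $\mathcal{A}_3$ are exactly the groups of networks with the same number of edges:

```python
import itertools
from collections import Counter

# Assumed preview: every edge shares one probability p. Then P_theta(A)
# depends on A only through its edge count, so each equivalence class is
# E_i = {A : A has exactly i edges}.
n, p = 3, 0.3
n_free = n * (n - 1) // 2            # 3 free entries for n = 3

class_sizes = Counter()
for bits in itertools.product([0, 1], repeat=n_free):
    class_sizes[sum(bits)] += 1      # group configurations by edge count

for i, size in sorted(class_sizes.items()):
    prob = p**i * (1 - p)**(n_free - i)
    print(f"E_{i}: {size} networks, each with probability {prob:.4f}")
```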
We will see more below about how the equivalence classes come into play with network models, and in a later section, we will see their relevance to the estimation of the parameters $\theta$.
11.3.2. Independent-Edge Random Networks
The below models are all special families of something called independent-edge random networks. An independent-edge random network is a network-valued random variable in which the collection of edges $\{\mathbf{a}_{ij}\}$ are mutually independent. In words, this means that for every adjacency $\mathbf{a}_{ij}$ of the network-valued random variable $\mathbf{A}$, $\mathbf{a}_{ij}$ is independent of $\mathbf{a}_{i'j'}$ any time that $(i, j) \neq (i', j')$. When the networks are simple, the easiest thing to do is to assume that each edge $(i, j)$ is connected with some probability $p_{ij}$, which might be different for each edge. We use the subscript $ij$ to denote that this probability is not necessarily the same for each edge. This simple model can be described as $\mathbf{a}_{ij} \sim \mathrm{Bern}(p_{ij})$, for every $j > i$, independently of every other edge in $\mathbf{A}$. We only look at the entries $\mathbf{a}_{ij}$ where $j > i$, since our networks are simple. This means that knowing a realization of $\mathbf{a}_{ij}$ also gives us the realization of $\mathbf{a}_{ji}$ (and thus $\mathbf{a}_{ji}$ is a deterministic function of $\mathbf{a}_{ij}$). Further, we know that the random network is loopless, which means that every $\mathbf{a}_{ii} = 0$. We will call the matrix $P = (p_{ij})$ the probability matrix of the network-valued random variable $\mathbf{A}$. In general, we will see a common theme for the probabilities of a realization $A$ of a network-valued random variable $\mathbf{A}$, which is that independence will greatly simplify our computation. Remember that if $\mathbf{x}$ and $\mathbf{y}$ are binary variables which are independent, then $\mathbb{P}(\mathbf{x} = x, \mathbf{y} = y) = \mathbb{P}(\mathbf{x} = x)\mathbb{P}(\mathbf{y} = y)$. Using this fact:

$$\mathbb{P}_\theta(A) = \prod_{j > i} \mathbb{P}_\theta(\mathbf{a}_{ij} = a_{ij})$$
Next, we will use the fact that if a random variable $\mathbf{a}_{ij}$ has the Bernoulli distribution with probability $p_{ij}$, then $\mathbb{P}_\theta(\mathbf{a}_{ij} = a_{ij}) = p_{ij}^{a_{ij}}(1 - p_{ij})^{1 - a_{ij}}$:

$$\mathbb{P}_\theta(A) = \prod_{j > i} p_{ij}^{a_{ij}}(1 - p_{ij})^{1 - a_{ij}}$$
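The product above translates directly into code. Here is a minimal sketch (the function name is ours) that evaluates $\mathbb{P}_\theta(A)$ on the log scale, which avoids numerical underflow once $n$ grows; it assumes every $p_{ij}$ is strictly between $0$ and $1$:

```python
import numpy as np

def log_probability(A, P):
    """Log of prod_{j>i} p_ij^{a_ij} (1 - p_ij)^{1 - a_ij},
    summing over the free (upper-triangular) entries only."""
    iu = np.triu_indices(A.shape[0], k=1)
    a, p = A[iu], P[iu]
    return np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
P = np.full((3, 3), 0.5)              # hypothetical probability matrix
print(np.exp(log_probability(A, P)))  # 0.125 = 0.5^3, one factor per free entry
```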
Now that we've specified a probability and a very generalizable model, we've learned the full story behind network models and are ready to skip to estimating parameters, right? Wrong! Unfortunately, if we tried to estimate anything about each $p_{ij}$ individually, we would obtain that $\hat p_{ij} = a_{ij}$ if we only have one realization $A$. Even if we had many realizations of $\mathbf{A}$, this still would not be very interesting, since we have a lot of $p_{ij}$s to estimate, and we've ignored any sort of structural model that might give us deeper insight into $\mathbf{A}$. In the below sections, we will learn successively less restrictive (and hence, more expressive) assumptions about the $p_{ij}$s, which will allow us to convey fairly complex random networks, but still leave us with plenty of interesting things to learn about later on.
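To close the loop, here is a minimal sampling sketch (the function name and the example $P$ are ours) that draws one realization from an independent-edge model with probability matrix $P$:

```python
import numpy as np
rng = np.random.default_rng(42)

def sample_network(P):
    """Draw one simple network: flip an independent coin with probability
    p_ij for each upper-triangular entry, then symmetrize."""
    n = P.shape[0]
    iu = np.triu_indices(n, k=1)
    A = np.zeros((n, n), dtype=int)
    A[iu] = rng.binomial(1, P[iu])    # one Bernoulli draw per free entry
    return A + A.T                     # symmetric and hollow by construction

P = np.array([[0.0, 0.8, 0.2],
              [0.8, 0.0, 0.5],
              [0.2, 0.5, 0.0]])        # a hypothetical probability matrix
print(sample_network(P))
```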