11.6. RDPGs and more general network models#

11.6.1. Random Dot Product Graph (RDPG)#

11.6.1.1. A Priori RDPG#

The a priori Random Dot Product Graph is an RDPG in which we know a priori the latent position matrix $X$. The a priori RDPG has the following parameter:

| Parameter | Space | Description |
| --- | --- | --- |
| $X$ | $\mathbb{R}^{n \times d}$ | The matrix of latent positions for each of the $n$ nodes. |

$X$ is called the latent position matrix of the RDPG. We write that $X \in \mathbb{R}^{n \times d}$, which means that it is a matrix with real values, $n$ rows, and $d$ columns. We will use the notation $\vec x_i$ to refer to the $i^{th}$ row of $X$; $\vec x_i$ is referred to as the latent position of node $i$. This looks something like this:

$$X = \begin{bmatrix} \vec x_1^\top \\ \vdots \\ \vec x_n^\top \end{bmatrix}$$

Noting that $X$ has $d$ columns, this implies that $\vec x_i \in \mathbb{R}^d$, or that each node's latent position is a real-valued $d$-dimensional vector.

What is the generative model for the a priori RDPG? As we discussed above, given $X$: for all $j > i$, $\mathbf a_{ij} \sim Bern(\vec x_i^\top \vec x_j)$ independently. If $i < j$, $\mathbf a_{ji} = \mathbf a_{ij}$ (the network is undirected), and $\mathbf a_{ii} = 0$ (the network is loopless). If $\mathbf A$ is an a priori RDPG with parameter $X$, we write that $\mathbf A \sim RDPG_n(X)$.
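To make the generative model concrete, here is a minimal sketch, using only numpy, of how one could sample an a priori RDPG given a latent position matrix; the particular $X$ below is a made-up example whose inner products all lie in $[0, 1]$:

```python
import numpy as np

def sample_rdpg(X, seed=None):
    """Sample an undirected, loopless a priori RDPG with latent position matrix X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    P = X @ X.T                              # probability matrix, p_ij = x_i^T x_j
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):            # sample only the upper triangle (j > i)
            A[i, j] = rng.binomial(1, P[i, j])
            A[j, i] = A[i, j]                # undirected: mirror into the lower triangle
    return A

# a made-up latent position matrix with n = 4 nodes and d = 2 dimensions
X = np.array([[0.7, 0.2], [0.6, 0.3], [0.2, 0.7], [0.3, 0.6]])
A = sample_rdpg(X, seed=0)
```

The double loop makes the edge-wise independence explicit; vectorizing the Bernoulli draws over the upper triangle works just as well.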

11.6.1.1.1. Probability#

Given $X$, the probability for an RDPG is relatively straightforward, as an RDPG is another Independent-Edge Random Graph. The independence assumption vastly simplifies our resulting expression. We will also use many of the results we've identified above, such as the p.m.f. of a Bernoulli random variable. Finally, we'll note that the probability matrix $P = \left(\vec x_i^\top \vec x_j\right)$, so $p_{ij} = \vec x_i^\top \vec x_j$:

$$\begin{aligned}
Pr_\theta(A) &= Pr_\theta(\mathbf A = A) = \prod_{j > i} Pr(\mathbf a_{ij} = a_{ij}), && \text{Independence Assumption} \\
&= \prod_{j > i} \left(\vec x_i^\top \vec x_j\right)^{a_{ij}}\left(1 - \vec x_i^\top \vec x_j\right)^{1 - a_{ij}}, && \mathbf a_{ij} \sim Bern(\vec x_i^\top \vec x_j)
\end{aligned}$$

Unfortunately, the probability equivalence classes are a bit harder to understand intuitively here compared to the ER and SBM examples, so we won't write them down here, but they still exist!
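Translating the probability above into code is straightforward. Here is a minimal sketch, working on the log scale for numerical stability, and assuming every $p_{ij} = \vec x_i^\top \vec x_j$ lies strictly between 0 and 1:

```python
import numpy as np

def rdpg_log_probability(A, X):
    """Log of Pr_theta(A) for an a priori RDPG, summing over the upper triangle (j > i)."""
    P = X @ X.T
    iu = np.triu_indices_from(A, k=1)        # indices with j > i
    a, p = A[iu], P[iu]
    # assumes 0 < p_ij < 1 for every pair, so the logs stay finite
    return np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))
```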

11.6.1.2. A Posteriori RDPG#

Like for the a posteriori SBM, the a posteriori RDPG introduces another strange set: the intersection of the unit ball and the non-negative orthant. Huh? This sounds like a real mouthful, but it turns out to be rather straightforward. You are probably already very familiar with a particular orthant: in two dimensions, an orthant is called a quadrant. Basically, an orthant just extends the concept of a quadrant to spaces which might have more than two dimensions. The non-negative orthant happens to be the orthant where all of the entries are non-negative. We call the $K$-dimensional non-negative orthant the set of points in $K$-dimensional real space where:

$$\left\{\vec x \in \mathbb{R}^K : x_k \geq 0 \text{ for all } k\right\}$$

In two dimensions, this is the traditional upper-right portion of the standard coordinate axis. To give you a picture, the 2-dimensional non-negative orthant is the blue region of the following figure:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axisartist import SubplotZero
import matplotlib.patches as patch

class myAxes():
    
    def __init__(self, xlim=(-5,5), ylim=(-5,5), figsize=(6,6)):
        self.xlim = xlim
        self.ylim = ylim
        self.figsize  = figsize
        self.__scale_arrows()
    def __drawArrow(self, x, y, dx, dy, width, length):
        plt.arrow(
            x, y, dx, dy, 
            color       = 'k',
            clip_on     = False, 
            head_width  = self.head_width, 
            head_length = self.head_length
        ) 
        
    def __scale_arrows(self):
        """ Make the arrows look good regardless of the axis limits """
        xrange = self.xlim[1] - self.xlim[0]
        yrange = self.ylim[1] - self.ylim[0]
        
        self.head_width  = min(xrange/30, 0.25)
        self.head_length = min(yrange/30, 0.3)
        
    def __drawAxis(self):
        """
        Draws the 2D cartesian axis
        """
        # A subplot with two additional axis, "xzero" and "yzero"
        # corresponding to the cartesian axis
        ax = SubplotZero(self.fig, 1, 1, 1)
        self.fig.add_subplot(ax)
        
        # make xzero axis (horizontal axis line through y=0) visible.
        for axis in ["xzero","yzero"]:
            ax.axis[axis].set_visible(True)
        # make the other axis (left, bottom, top, right) invisible
        for n in ["left", "right", "bottom", "top"]:
            ax.axis[n].set_visible(False)
            
        # Plot limits
        plt.xlim(self.xlim)
        plt.ylim(self.ylim)
        ax.set_yticks([-1, 1, ])
        ax.set_xticks([-2, -1, 0, 1, 2])
        # Draw the arrows
        self.__drawArrow(self.xlim[1], 0, 0.01, 0, 0.3, 0.2) # x-axis arrow
        self.__drawArrow(0, self.ylim[1], 0, 0.01, 0.2, 0.3) # y-axis arrow
        self.ax=ax
        
    def draw(self):
        # First draw the axis
        self.fig = plt.figure(figsize=self.figsize)
        self.__drawAxis()

axes = myAxes(xlim=(-2.5,2.5), ylim=(-2,2), figsize=(9,7))
axes.draw()

rectangle = patch.Rectangle((0, 0), 3, 3, fc='blue', ec='blue', alpha=0.2)
axes.ax.add_patch(rectangle)
plt.show()

Now, what is the unit ball? You are probably familiar with the idea of the unit ball, even if you haven't heard it called that specifically. Remember that the Euclidean norm for a point $\vec x$ which has coordinates $x_i$ for $i = 1, ..., K$ is given by the expression:

$$\left|\left|\vec x\right|\right|_2 = \sqrt{\sum_{i = 1}^K x_i^2}$$

The Euclidean unit ball is just the set of points whose Euclidean norm is at most 1. To be more specific, the closed unit ball with the Euclidean norm is the set of points:

$$\left\{\vec x \in \mathbb{R}^K : \left|\left|\vec x\right|\right|_2 \leq 1\right\}$$

We draw the 2-dimensional unit ball with the Euclidean norm below, where the points that make up the unit ball are shown in red:

axes = myAxes(xlim=(-2.5,2.5), ylim=(-2,2), figsize=(9,7))
axes.draw()

circle = patch.Circle((0, 0), 1, fc='red', ec='red', alpha=0.3)
axes.ax.add_patch(circle)
plt.show()

Now what is their intersection? Remember that the intersection of two sets $A$ and $B$ is the set:

$$A \cap B = \{x : x \in A, x \in B\}$$

That is, each element must be in both sets to be in the intersection. The intersection of the unit ball and the non-negative orthant will be the set:

$$\mathcal X_K = \left\{\vec x \in \mathbb{R}^K : \left|\left|\vec x\right|\right|_2 \leq 1, x_k \geq 0 \text{ for all } k\right\}$$

Visually, this will be the set of points in the overlap of the unit ball and the non-negative orthant, which we show below in purple:

axes = myAxes(xlim=(-2.5,2.5), ylim=(-2,2), figsize=(9,7))
axes.draw()

circle = patch.Circle((0, 0), 1, fc='red', ec='red', alpha=0.3)
axes.ax.add_patch(circle)
rectangle = patch.Rectangle((0, 0), 3, 3, fc='blue', ec='blue', alpha=0.2)
axes.ax.add_patch(rectangle)
plt.show()

This space has an incredibly important consequence. It turns out that if $\vec x$ and $\vec y$ are both elements of $\mathcal X_K$, then their inner product $\langle \vec x, \vec y \rangle = \vec x^\top \vec y$ is at most 1 and at least 0. Without getting too technical, this is because of something called the Cauchy-Schwarz inequality and the properties of $\mathcal X_K$. If you remember from linear algebra, the Cauchy-Schwarz inequality states that $\langle \vec x, \vec y \rangle$ can be at most the product of $\left|\left|\vec x\right|\right|_2$ and $\left|\left|\vec y\right|\right|_2$. Since $\vec x$ and $\vec y$ both have norms less than or equal to 1 (since they are in the unit ball), their inner product is at most 1. Further, since $\vec x$ and $\vec y$ are in the non-negative orthant, their inner product can never be negative: both $\vec x$ and $\vec y$ have entries which are non-negative, and therefore their element-wise products can never be negative.
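As a quick numerical sanity check, here is a small sketch that draws random points from this intersection and verifies that every pairwise inner product lies in $[0, 1]$; the sampling scheme below (scaling points with large norm back onto the unit ball) is just a convenient way to land inside $\mathcal X_K$, not part of the model:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3

# draw points in the non-negative orthant, then rescale any point whose norm
# exceeds 1 back onto the unit ball, so every point lands in the intersection X_K
points = rng.uniform(0, 1, size=(500, K))
norms = np.linalg.norm(points, axis=1, keepdims=True)
points = np.where(norms > 1, points / norms, points)

inner_products = points @ points.T
print(inner_products.min() >= 0, inner_products.max() <= 1)  # True True
```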

The a posteriori RDPG is to the a priori RDPG what the a posteriori SBM was to the a priori SBM. Here, we suppose that we do not know the latent position matrix $X$, but instead know how we can characterize the individual latent positions. We have the following parameter:

| Parameter | Space | Description |
| --- | --- | --- |
| $F$ | inner-product distributions | A distribution which governs each latent position. |

The parameter $F$ is what is known as an inner-product distribution. In the simplest case, we will assume that $F$ is a distribution on a subset of the possible real $d$-dimensional vectors, with an important caveat: for any two vectors within this subset, their inner product must be a probability. We will refer to this subset as $\mathcal X_K$, which we learned about above. This means that for any $\vec x_i, \vec x_j$ that are in $\mathcal X_K$, it is always the case that $\vec x_i^\top \vec x_j$ is between 0 and 1. This is essential because, like previously, we will use $\vec x_i^\top \vec x_j$ to represent the probability of each edge in the adjacency matrix. Next, we will treat the latent position matrix as a matrix-valued random variable which is latent (remember, latent means that we don't get to see it in our real data). Like before, we will call $\vec{\mathbf x}_i$ the random latent position for node $i$ of our network. In this case, each $\vec{\mathbf x}_i$ is sampled independently and identically from the inner-product distribution $F$ described above. The latent position matrix is the matrix-valued random variable $\mathbf X$ whose rows are the latent vectors $\vec{\mathbf x}_i$, for each of the $n$ nodes.

The model for edges of the a posteriori RDPG can be described by conditioning on this unobserved latent position matrix. We write down that, conditioned on $\vec{\mathbf x}_i = \vec x$ and $\vec{\mathbf x}_j = \vec y$, if $j > i$, then $\mathbf a_{ij}$ is sampled independently from a $Bern(\vec x^\top \vec y)$ distribution. As before, if $i < j$, $\mathbf a_{ji} = \mathbf a_{ij}$ (the network is undirected), and $\mathbf a_{ii} = 0$ (the network is loopless). If $\mathbf A$ is the adjacency matrix for an a posteriori RDPG with parameter $F$, we write that $\mathbf A \sim RDPG_n(F)$.
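To tie the pieces together, here is a minimal sketch of the a posteriori generative process. The Dirichlet distribution is used purely as a made-up stand-in for the inner-product distribution $F$ (its samples are non-negative with norm at most 1, so they sit inside $\mathcal X_K$); any distribution supported on $\mathcal X_K$ would do:

```python
import numpy as np

def sample_a_posteriori_rdpg(n, d, seed=None):
    """Draw latent positions i.i.d. from a stand-in for F, then sample the network."""
    rng = np.random.default_rng(seed)
    X = rng.dirichlet(np.ones(d), size=n)    # n latent positions, each in X_K
    P = X @ X.T                              # conditional probability matrix
    A = rng.binomial(1, np.triu(P, k=1))     # sample the upper triangle (j > i)
    A = A + A.T                              # undirected and loopless
    return A, X

A, X = sample_a_posteriori_rdpg(n=50, d=3, seed=0)
```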

11.6.1.2.1. Probability#

The probability for the a posteriori RDPG is fairly complicated. This is because, like the a posteriori SBM, we do not actually get to see the latent position matrix $\mathbf X$, so we need to use integration to obtain an expression for the probability. Here, we are concerned with realizations of $\mathbf X$. Remember that $\mathbf X$ is just a matrix whose rows are the $\vec{\mathbf x}_i$, each of which individually has the distribution $F$; e.g., $\vec{\mathbf x}_i \sim F$ independently. For simplicity, we will assume that $F$ is a discrete distribution on $\mathcal X_K$. This makes the logic of what is going on below much simpler, since the notation gets less complicated, but does not detract from the generalizability of the result (the only difference is that sums would be replaced by multivariate integrals, and probability mass functions replaced by probability density functions).

We will let $p$ denote the probability mass function (p.m.f.) of this discrete distribution $F$. The strategy will be to use the independence assumption, followed by integration over the relevant rows of $\mathbf X$:

$$\begin{aligned}
Pr_\theta(A) &= Pr_\theta(\mathbf A = A) = \prod_{j > i} Pr(\mathbf a_{ij} = a_{ij}), && \text{Independence Assumption} \\
Pr(\mathbf a_{ij} = a_{ij}) &= \sum_{\vec x \in \mathcal X_K}\sum_{\vec y \in \mathcal X_K} Pr(\mathbf a_{ij} = a_{ij}, \vec{\mathbf x}_i = \vec x, \vec{\mathbf x}_j = \vec y), && \text{integration over } \vec{\mathbf x}_i \text{ and } \vec{\mathbf x}_j
\end{aligned}$$

Next, we will simplify this expression a little bit more, using the definition of a conditional probability like we did before for the SBM:

$$Pr(\mathbf a_{ij} = a_{ij}, \vec{\mathbf x}_i = \vec x, \vec{\mathbf x}_j = \vec y) = Pr(\mathbf a_{ij} = a_{ij} \mid \vec{\mathbf x}_i = \vec x, \vec{\mathbf x}_j = \vec y)\, Pr(\vec{\mathbf x}_i = \vec x, \vec{\mathbf x}_j = \vec y)$$

Further, remember that if $\mathbf a$ and $\mathbf b$ are independent, then $Pr(\mathbf a = a, \mathbf b = b) = Pr(\mathbf a = a)Pr(\mathbf b = b)$. Using that $\vec{\mathbf x}_i$ and $\vec{\mathbf x}_j$ are independent by definition:

$$Pr(\vec{\mathbf x}_i = \vec x, \vec{\mathbf x}_j = \vec y) = Pr(\vec{\mathbf x}_i = \vec x)\, Pr(\vec{\mathbf x}_j = \vec y)$$

Which means that:

$$Pr(\mathbf a_{ij} = a_{ij}, \vec{\mathbf x}_i = \vec x, \vec{\mathbf x}_j = \vec y) = Pr(\mathbf a_{ij} = a_{ij} \mid \vec{\mathbf x}_i = \vec x, \vec{\mathbf x}_j = \vec y)\, Pr(\vec{\mathbf x}_i = \vec x)\, Pr(\vec{\mathbf x}_j = \vec y)$$

Finally, we note that conditional on $\vec{\mathbf x}_i = \vec x$ and $\vec{\mathbf x}_j = \vec y$, $\mathbf a_{ij}$ is $Bern(\vec x^\top \vec y)$. This means that, in terms of our probability matrix, each entry $p_{ij} = \vec x_i^\top \vec x_j$. Therefore:

$$Pr(\mathbf a_{ij} = a_{ij} \mid \vec{\mathbf x}_i = \vec x, \vec{\mathbf x}_j = \vec y) = (\vec x^\top \vec y)^{a_{ij}}(1 - \vec x^\top \vec y)^{1 - a_{ij}}$$

This implies that:

$$Pr(\mathbf a_{ij} = a_{ij}, \vec{\mathbf x}_i = \vec x, \vec{\mathbf x}_j = \vec y) = (\vec x^\top \vec y)^{a_{ij}}(1 - \vec x^\top \vec y)^{1 - a_{ij}}\, Pr(\vec{\mathbf x}_i = \vec x)\, Pr(\vec{\mathbf x}_j = \vec y)$$

So our complete expression for the probability is:

$$Pr_\theta(A) = \prod_{j > i}\sum_{\vec x \in \mathcal X_K}\sum_{\vec y \in \mathcal X_K}(\vec x^\top \vec y)^{a_{ij}}(1 - \vec x^\top \vec y)^{1 - a_{ij}}\, Pr(\vec{\mathbf x}_i = \vec x)\, Pr(\vec{\mathbf x}_j = \vec y)$$
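For a discrete $F$ with a small support, this expression can be evaluated directly. Here is a minimal sketch; the two-point support and its p.m.f. below are made-up examples, chosen so that every inner product is a valid probability:

```python
import numpy as np

def a_posteriori_rdpg_probability(A, support, pmf):
    """Pr_theta(A) when F is discrete: `support` holds the possible latent
    positions (one per row) and `pmf` their probabilities under F."""
    n = A.shape[0]
    total = 1.0
    for i in range(n):
        for j in range(i + 1, n):
            # marginalize a_ij over the latent positions of nodes i and j
            edge_prob = 0.0
            for x, px in zip(support, pmf):
                for y, py in zip(support, pmf):
                    p = x @ y
                    edge_prob += (p ** A[i, j]) * ((1 - p) ** (1 - A[i, j])) * px * py
            total *= edge_prob
    return total

# a made-up discrete F with two possible latent positions, both in X_K
support = np.array([[0.8, 0.1], [0.1, 0.8]])
pmf = np.array([0.5, 0.5])
```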

11.6.2. Generalized Random Dot Product Graph (GRDPG)#

The Generalized Random Dot Product Graph, or GRDPG, is the most general random network model we will consider in this book. Note that for the RDPG, the probability matrix $P$ had entries $p_{ij} = \vec x_i^\top \vec x_j$. What about $p_{ji}$? Well, $p_{ji} = \vec x_j^\top \vec x_i$, which is exactly the same as $p_{ij}$! This means that even if we were to consider a directed RDPG, the probabilities that can be captured are always going to be symmetric. The GRDPG relaxes this assumption. This is achieved by using two latent position matrices, $X$ and $Y$, and letting $P = X Y^\top$. Now, the entries $p_{ij} = \vec x_i^\top \vec y_j$, but $p_{ji} = \vec x_j^\top \vec y_i$, which might be different.

11.6.2.1. A Priori GRDPG#

The a priori GRDPG is a GRDPG in which we know a priori the latent position matrices $X$ and $Y$. The a priori GRDPG has the following parameters:

| Parameter | Space | Description |
| --- | --- | --- |
| $X$ | $\mathbb{R}^{n \times d}$ | The matrix of left latent positions for each of the $n$ nodes. |
| $Y$ | $\mathbb{R}^{n \times d}$ | The matrix of right latent positions for each of the $n$ nodes. |

$X$ and $Y$ behave nearly the same as the latent position matrix $X$ for the a priori RDPG, with the exception that they will be called the left latent position matrix and the right latent position matrix, respectively. Further, the vectors $\vec x_i$ will be the left latent positions, and $\vec y_i$ the right latent positions, of node $i$, for each node $i = 1, ..., n$.

What is the generative model for the a priori GRDPG? As we discussed above, given $X$ and $Y$: for all $j \neq i$, $\mathbf a_{ij} \sim Bern(\vec x_i^\top \vec y_j)$ independently. If we consider only loopless networks, $\mathbf a_{ii} = 0$. If $\mathbf A$ is an a priori GRDPG with left and right latent position matrices $X$ and $Y$, we write that $\mathbf A \sim GRDPG_n(X, Y)$.
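Here is a minimal sketch of sampling a directed, loopless a priori GRDPG with numpy; the left and right latent position matrices below are made-up examples for which every entry of $X Y^\top$ is a valid probability:

```python
import numpy as np

def sample_grdpg(X, Y, seed=None):
    """Sample a directed, loopless a priori GRDPG from left/right latent positions."""
    rng = np.random.default_rng(seed)
    P = X @ Y.T                  # p_ij = x_i^T y_j, not necessarily symmetric
    A = rng.binomial(1, P)       # independent Bernoulli draw for every ordered pair
    np.fill_diagonal(A, 0)       # loopless: no self-edges
    return A

# made-up left and right latent position matrices
X = np.array([[0.9, 0.1], [0.2, 0.6], [0.5, 0.5]])
Y = np.array([[0.3, 0.4], [0.7, 0.2], [0.1, 0.8]])
A = sample_grdpg(X, Y, seed=0)
```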

11.6.2.2. A Posteriori GRDPG#

The a posteriori GRDPG is very similar to the a posteriori RDPG. We have two parameters:

| Parameter | Space | Description |
| --- | --- | --- |
| $F$ | inner-product distributions | A distribution which governs the left latent positions. |
| $G$ | inner-product distributions | A distribution which governs the right latent positions. |

Here, we treat the left and right latent position matrices as latent variable matrices, like we did for the a posteriori RDPG. That is, the left latent positions $\vec{\mathbf x}_i$ are sampled independently and identically from $F$, and the right latent positions $\vec{\mathbf y}_i$ are sampled independently and identically from $G$.

The model for edges of the a posteriori GRDPG can be described by conditioning on the unobserved left and right latent position matrices. We write down that, conditioned on $\vec{\mathbf x}_i = \vec x$ and $\vec{\mathbf y}_j = \vec y$, if $j \neq i$, then $\mathbf a_{ij}$ is sampled independently from a $Bern(\vec x^\top \vec y)$ distribution. As before, assuming the network is loopless, $\mathbf a_{ii} = 0$. If $\mathbf A$ is the adjacency matrix for an a posteriori GRDPG with parameters $F$ and $G$, we write that $\mathbf A \sim GRDPG_n(F, G)$.
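Analogous to the a posteriori RDPG sketch above, and again using Dirichlet distributions purely as made-up stand-ins for $F$ and $G$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.dirichlet(np.ones(d), size=n)   # left latent positions, drawn i.i.d. from F
Y = rng.dirichlet(np.ones(d), size=n)   # right latent positions, drawn i.i.d. from G
A = rng.binomial(1, X @ Y.T)            # directed edges, p_ij = x_i^T y_j
np.fill_diagonal(A, 0)                  # loopless
```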

11.6.3. Inhomogeneous Erdös-Rényi (IER)#

In the preceding models, we typically made assumptions about how we could characterize the edge-existence probabilities using fewer than $\binom{n}{2}$ different probabilities (one for each potential edge). The reason for this is that $n$ is usually relatively large, so attempting to actually learn $\binom{n}{2}$ different probabilities is not, in general, going to be very feasible (it is never feasible when we have a single network, since a single network gives us only one observation for each independent edge). Further, it is relatively difficult to ask questions for which it is actually favorable to assume the edges share nothing in common: even if the edges don't share the same probabilities, there may be properties underlying those probabilities, such as the latent positions that we saw above with the RDPG, that we might still want to characterize.

Nonetheless, the most general model for an independent-edge random network is known as the Inhomogeneous Erdös-Rényi (IER) Random Network. An IER Random Network is characterized by the following parameter:

| Parameter | Space | Description |
| --- | --- | --- |
| $P$ | $[0, 1]^{n \times n}$ | The edge probability matrix. |

The probability matrix $P$ is an $n \times n$ matrix, where each entry $p_{ij}$ is a probability (a value between 0 and 1). Further, if we restrict ourselves to the case of simple networks like we have done so far, $P$ will also be symmetric ($p_{ij} = p_{ji}$ for all $i$ and $j$). The generative model is similar to the preceding models we have seen: given the $(i, j)$ entry of $P$, denoted $p_{ij}$, the edges $\mathbf a_{ij}$ are independent $Bern(p_{ij})$, for any $j > i$. Further, $\mathbf a_{ii} = 0$ for all $i$ (the network is loopless), and $\mathbf a_{ji} = \mathbf a_{ij}$ (the network is undirected). If $\mathbf A$ is the adjacency matrix for an IER network with probability matrix $P$, we write that $\mathbf A \sim IER_n(P)$.
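A minimal sketch of sampling a simple IER network from an arbitrary symmetric probability matrix; the small $P$ below is a made-up example:

```python
import numpy as np

def sample_ier(P, seed=None):
    """Sample a simple (undirected, loopless) IER network from probability matrix P."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.binomial(1, P), k=1)  # independent draws, keep only j > i
    return upper + upper.T                    # mirror for undirectedness; diagonal stays 0

# a made-up symmetric probability matrix for n = 3 nodes
P = np.array([[0.0, 0.7, 0.2],
              [0.7, 0.0, 0.5],
              [0.2, 0.5, 0.0]])
A = sample_ier(P, seed=0)
```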

It is worth noting that all of the preceding models we have discussed so far are special cases of the IER model. This means that, for instance, if we were to consider only the probability matrices where all of the entries are the same, we could represent the ER models. Similarly, if we were only to consider the probability matrices $P$ where $P = X X^\top$, we could represent any RDPG.

The IER Random Network can be thought of as the limit of Stochastic Block Models as the number of communities grows to equal the number of nodes in the network. Stated another way, an SBM Random Network where each node is in its own community is equivalent to an IER Random Network. Under this formulation, note that the block matrix for such an SBM, $B$, would be an $n \times n$ matrix with a potentially unique entry for every pair of nodes. Taking $P$ to be this block matrix shows that the IER is a limiting case of SBMs.

11.6.3.1. Probability#

The probability for a network which is IER is very straightforward. We use the independence assumption and the p.m.f. of a Bernoulli-distributed random variable $\mathbf a_{ij}$:

$$Pr_\theta(A) = Pr(\mathbf A = A) = \prod_{j > i} p_{ij}^{a_{ij}}(1 - p_{ij})^{1 - a_{ij}}$$
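Mirroring the RDPG calculation earlier, here is a minimal sketch of this probability, computed on the log scale for numerical stability and assuming every off-diagonal $p_{ij}$ is strictly between 0 and 1:

```python
import numpy as np

def ier_log_probability(A, P):
    """Log of Pr_theta(A) for an IER network, summing over the upper triangle (j > i)."""
    iu = np.triu_indices_from(A, k=1)
    a, p = A[iu], P[iu]
    return np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))
```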