Generative Graph Models
Chapter 8 Traditional Graph Generation Methods
The previous parts of this book introduced a wide variety of methods for learning representations of graphs. In this final part of the book, we will discuss a distinct but closely related task: the problem of graph generation.
The goal of graph generation is to build models that can generate realistic graph structures. In some ways, we can view this graph generation problem as the mirror image of the graph embedding problem. Instead of assuming that we are given a graph structure G = (V, E) as input to our model, in graph generation we want the output of our model to be a graph G. Of course, simply generating an arbitrary graph is not necessarily that challenging. For instance, it is trivial to generate a fully connected graph or a graph with no edges. The key challenge in graph generation, however, is generating graphs that have certain desirable properties. As we will see in the following chapters, both the way in which we define "desirable properties" and the way we perform graph generation vary drastically between different approaches.
In this chapter, we begin with a discussion of traditional approaches to graph generation. These traditional approaches pre-date most research on graph representation learning, and even machine learning research in general. The methods we will discuss in this chapter thus provide the backdrop to motivate the deep learning-based approaches that we will introduce in Chapter 9.
8.1 Overview of Traditional Approaches
Traditional approaches to graph generation generally involve specifying some kind of generative process, which defines how the edges in a graph are created. In most cases we can frame this generative process as a way of specifying the probability or likelihood P(A[u, v] = 1) of an edge existing between two nodes u and v. The challenge is crafting a generative process that is both tractable and able to generate graphs with non-trivial properties or characteristics. Tractability is essential because we want to be able to sample or analyze the graphs that are generated. However, we also want these graphs to have some properties that make them good models for the kinds of graphs we see in the real world.
The three approaches we review in this subsection represent a small but representative subset of the traditional graph generation approaches that exist in the literature. For a more thorough survey and discussion, we recommend Newman as a useful resource.
8.2 ER Model
Perhaps the simplest and most well-known generative model of graphs is the Erdős-Rényi (ER) model. In this model we define the likelihood of an edge occurring between any pair of nodes as
P(A[u, v] = 1) = r, ∀u ∈ V, v ∈ V, u ≠ v,    (8.1)
where r ∈ [0,1] is a parameter controlling the density of the graph. In other words, the ER model simply assumes that the probability of an edge occurring between any pair of nodes is equal to r.
The ER model is attractive due to its simplicity. To generate a random ER graph, we simply choose (or sample) how many nodes we want, set the density parameter r, and then use Equation (8.1) to generate the adjacency matrix. Since the edge probabilities are all independent, the time complexity to generate a graph is O(|V|^2), i.e., linear in the size of the adjacency matrix.
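The sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `sample_er_graph` and its `seed` argument are hypothetical, and the sketch assumes an undirected graph without self-loops, so each unordered node pair is sampled once and mirrored across the diagonal.

```python
import random


def sample_er_graph(num_nodes, r, seed=None):
    """Sample an undirected ER graph as a dense adjacency matrix.

    Assumptions (not stated in the text): the graph is undirected and
    has no self-loops, so only pairs with u < v are sampled.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    rng = random.Random(seed)
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    # One independent Bernoulli(r) draw per node pair: O(|V|^2) work.
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            if rng.random() < r:
                adj[u][v] = 1
                adj[v][u] = 1
    return adj


adj = sample_er_graph(num_nodes=6, r=0.5, seed=0)
for row in adj:
    print(row)
```

The double loop makes the quadratic cost explicit: every node pair is visited exactly once, regardless of how sparse the resulting graph turns out to be.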
The downside of the ER model, however, is that it does not generate very realistic graphs. In particular, the only property that we can control in the ER model is the density of the graph, since the parameter r is equal (in expectation) to the fraction of node pairs that are connected, which gives an expected average degree of r(|V| - 1). Other graph properties, such as the degree distribution, the existence of community structures, node clustering coefficients, and the occurrence of structural motifs, are not captured by the ER model. It is well known that graphs generated by the ER model fail to reflect the distribution of these more complex graph properties, which are known to be important in the structure and function of real-world graphs.
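To make the limitation concrete, note that under the ER model each node's degree is a sum of |V| - 1 independent Bernoulli(r) draws, so the expected average degree is r(|V| - 1) and the degrees cluster around that value. The following sketch checks this empirically. The helper name `er_degrees` and the parameter values are illustrative assumptions, and the graph is again assumed to be undirected with no self-loops.

```python
import random
from collections import Counter


def er_degrees(num_nodes, r, seed=None):
    """Return the degree of every node in one sampled ER graph.

    Assumes an undirected graph with no self-loops; only the degrees
    are stored, not the adjacency matrix.
    """
    rng = random.Random(seed)
    degrees = [0] * num_nodes
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            if rng.random() < r:
                degrees[u] += 1
                degrees[v] += 1
    return degrees


num_nodes, r = 500, 0.02  # illustrative values
degrees = er_degrees(num_nodes, r, seed=0)
mean_degree = sum(degrees) / num_nodes
expected = r * (num_nodes - 1)

print(f"expected average degree: {expected:.2f}")
print(f"observed average degree: {mean_degree:.2f}")
print(f"min / max degree: {min(degrees)} / {max(degrees)}")

# Degree histogram: counts concentrate near the expected value.
histogram = Counter(degrees)
for degree in sorted(histogram):
    print(degree, histogram[degree])
```

Changing r shifts where the histogram is centered, but there is no parameter that changes its overall shape, which is one way to see why the degree distribution is outside the model's control.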
8.3 Stochastic Block Models
Many traditional graph generation approaches seek to improve the ER model by better capturing additional properties of real-world graphs, which the ER model ignores. One prominent example is the class of stochastic block models (SBMs), which seek to generate graphs with community structure.
In a basic SBM, we specify a number γ of different blocks: C_1, ..., C_γ. Every node u ∈ V then has a probability p_i of belonging to block i, i.e., p_i = P(u ∈ C_i), ∀u ∈ V, i = 1, ..., γ, where Σ_{i=1}^{γ} p_i = 1. Edge probabilities are then specified by a block-to-block probability matrix C ∈ [0,1]^{γ×γ}, where C[i, j] gives the probability of an edge occurring between a node in block C_i and a node in block C_j. The generative process for the basic SBM is as follows: