Student Paper Notes: The structure and function of complex networks by M.E.J. Newman

This time the student notes I post are coming from the reading of a scientific paper written by Mark Newman, Professor of Physics in the University of Michigan.

The link to the paper is this one. It is a 58-page paper with 419 references on complex networks. As an enticing intro to suggest its reading, nothing else is required. It complements the previous notes that I published on the Network Science book authored by Albert-László Barabási. Compared to the Network Science book, this paper is slightly more condensed.

This paper contains relevant historical references on the field of Network Science and it has also a considerable mathematical component.

The following points are notes mostly literally extracted from this paper. The intent of this post is not to create new content but rather to provide Network Science students with a (literal and non-comprehensive) summary of this paper.

Why this Information Security blog touches the field of Network Science? I am convinced that we can apply Network Science learning points to our Information Security real-life scenarios.

As always, a little disclaimer: These notes do not replace the reading of the paper, they are just that, some student notes (or fragments) of the paper.

Introduction

- This paper reviews recent and non-recent work on the structure and function of networked systems.
- Vertices are nodes and edges are links.
- Degree: The number of edges connected to a vertex.
- Field of research: Consideration of large-scale statistical properties of graphs.
- The body of theory of Network Science has three aims: First, to find statistical properties like path lengths and degree distributions that define structure and behaviour of networked systems and to suggest appropriate ways to measure these properties. Second, to propose models of networks and third, to predict the behaviour of networked systems.
- A hyperedge joins more than two vertices together.

Networks in the real world

- An acyclic graph has no closed loops. The WWW is in general cyclic.
- The small-world effect: Most pairs of vertices in most networks seem to be connected by a short path through the network.

Properties of networks

- Network transitivity (or, sometimes, clustering): "the friend of your friend is likely to be your friend".
- The clustering coefficient C is the mean probability that the friend of your friend is also your friend.
- The clustering coefficient C measures the density of triangles in the network.
- The probability that two vertices point to each other in a directed network is called reciprocity. In a directed network edges have a sense.

- In a random network, in which each edge is present or absent with equal probability, the degree distribution is binomial or Poisson in the limit of large graph size.
- Real world networks are rarely random in their degree distributions.
The degree of the vertices in most real networks are highly right-skewed i.e. their degree distributions have a long right tail of values that are far above the mean.
- Networks with power-law degree distributions are referred as scale-free (degree distribution).
- The maximum degree of a vertex in a network will in general depend on the size of the network.

- Network resilience: Distance was almost entirely unaffected by random vertex removal. However, when removal targets highest degree vertices, it has devastating effects. An example of this is the Internet.

- Assortative mixing or homophily: selective linking. A classical example of assortative mixing in social networks is mixing by race.
- In some networks, high-degree vertices tend to associate with other high-degree vertices. Most social networks appear to be assortative, other types of networks (biological, technological and information) appear to be disassortative.
- The traditional method for extracting community structure from a network is cluster analysis (also called hierarchical analysis).
- Community structure is a common network property.

- Navigation: Stanley Milgram's small-world experiment. Short paths exist in the network and network members are good at finding them. This is important e.g. for efficient databases structures or better peer-to-peer computer networks.
- Betweenness centrality of a vertex in a network is the number of geodesic paths between other vertices that run through it. Betweenness centrality can also be viewed as a measure of network resilience.

Random graphs

- Poisson random graphs (or Bernoulli graphs) are not adequate to describe some important properties of real-world networks.
- The most important property of a random graph is that it possesses a phase transition, from a low-density with few edges and all components being small, having an exponential size distribution and finite mean size to a high density phase in which an extensive fraction of all vertices are joined together in a single giant component.
- The random graph reproduces well the small-world effect, however, in almost all other respects, the properties of a random graph do no match those of real-world networks.
- A random graph has a Poisson degree distribution, entirely random mixing patterns and no correlation between degrees of adjacent vertices, no community structure and, finally, navigation is not possible using local algorithms.
- The property of real graphs that is simplest to add to random graphs is the non-Poisson degree distributions i.e. the "configuration model". An example of this would be a network with a power-law degree distribution.
- Other examples are: directed graphs, bipartite graphs (which have two types of vertices and edges running only between vertices of unlike types - these ones are sometimes studied using "one-mode" projections).
- An additional random graph model for degree correlation is the exponential random graph. A more specialised model is proposed by Maslov.

Exponential random graphs and Markov graphs

- The only solvable random graph models that currently incorporate transitivity are the bipartite and the community-structured models plus certain dual graph models.
- Progress in understanding transitivity require different (and new) models.


The small-world model

- A less sophisticated but more tractable model of a network with high transitivity. However, the degree distribution of the small-world model does not match most real-world networks very well.

- A lot of attention has been given to the average geodesic path length of the small-world model.

Models of network growth

- The best studied class of network models aim to explain the origin of highly skewed degree distributions.
- Probably Price described the first example of a scale-free network, the network of citations between scientific papers.
- Power laws arise when "the rich get richer" i.e. the amount you get goes up with the amount you already have. Price named that "cumulative advantage". Barabasi and Albert named that "preferential attachment".
- z lately denotes degree (and not m) i.e. the total number of edges in the graph.
- The mechanism of cumulative advantage proposed by Price is widely accepted as the explanation for the power-law degree distribution in real-world networks such as the WWW, the citation network and possibly the Internet.
- The difference between the Price model and the Barabasi and Albert model is that in the latter one the edges are undirected, so there is no distinction between in and out degree. The Barabasi and Albert model is simpler and slightly less realistic.
- There is a correlation between the age of vertices and their degrees, with older vertices having higher mean degree.
- Krapivsky and Redner show that there are correlations between the degrees of adjacent vertices in the model.
- The assumption of linear preferential attachment seems to be a reasonable approximation to the truth.
- The real WWW does not present the correlations between age and degree of vertices as found in the Barabasi and Albert model. This is, according to Adamic and Huberman, because the degree of vertices is also a function of their intrinsic worth.
- Bianconi and Barabasi have presented an extension of the Barabasi and Albert model: Each newly appearing vertex is given a "fitness" that represent its attractiveness i.e. its propensity to accrue new links.
- Price's model is intended to be a model of a citation network. Citation networks are really directed, acyclic and all vertices (approximately) belong to a single component, unless they do not cite and are not cited.
- Simple growth model by Callaway et al.: Vertices normally have degree zero when they are first added to the graph. This model does not show preferential attachment so no power-law distributions but exponential.
- Some networks appear to have power-law degree distributions but they do not show preferential attachment e.g. biochemical interaction networks. These networks could grow by copying vertices.

Processes taking place on networks

- Looking at the behaviour of models of physical, biological, social processes going on these networks.
- In Physics, vertices are sites and edges are bonds.
- A percolation process is one in which vertices or edges on a graph are randomly designated either "occupied" or "unoccupied", asking about various properties of the resulting vertices patterns.
- The problem of resilience to random failure of vertices in a network is equivalent to a site percolation process on the network. The number of remaining vertices that can still successfully communicate is precisely the giant component of the corresponding percolation model.
- Networks with power-law degree distributions are highly susceptible to targeted attacks: one only need to remove a small percentage of vertices to destroy the giant component entirely.
- Cascading failures: Watts provided a simple model for cascading failures as a type of percolation. It could be solved using generating function methods similar to those for simple vertex removal.

Epidemiological processes

- The SIR model divides the population into three classes: Susceptible, Infected and Recovered (with permanent illness immunity).
- Deseases do not always spread on scale-free networks.
- Vaccination can be modeled as a site percolation process.
- As networks tend to be particularly vulnerable to the removal of their highest degree vertices, targeted vaccination is expected to be particularly effective.
- It is not always easy to find the highest degree vertices in a social network.
- One is more likely to find high-degree vertices by following edges than by choosing vertices at random.
- Therefore, a population can be immunised by choosing a random person and vaccinating a friend of that person and then repeating again the process.
- The SIS model: an example is computer viruses.
- At least in networks with right-skewed degree distributions, propagation of the disease turns out to be relatively robust against random vaccionations but highly susceptible to vaccination of the highest-degree individuals.

Exhaustive search 

- A page is important if it is linked by many many pages.

Guided search

- Performs small special-purpose crawls.
- It relies on the assumption that pages containing a particular information on a particular topic tend to be clustered together in local regions of the graph.

Network navigation

- The objective is to design networks structures that make a particular search algorithm perform well.
- The "social distance" is measured by the height of the lowest level in the tree at which they are both connected. In other words, how far one must go up the tree to find the lowest “common ancestor” of the pair.

Phase transition on networks

- E.g. models of opinion formation in social networks.

Other processes on networks

- Voter models, difussion, genetic regulatory models, etc.

- The study of complex networks is still in its infancy. 
 

Happy networking!

The network castle