Using Networks To Make Predictions - A lecture (3 of 3) by Mark Newman

For those who want an introduction to the world of complex networks, the three lectures given by Mark Newman, a British physicist, at the Santa Fe Institute on 14, 15 and 16 September 2010 are a great way to get to know a little about this field.

The first lecture introduced the concept of networks. The second lecture talked about network characteristics (centrality, degree, transitivity, homophily and modularity). Let's continue with the third lecture, which this time covers the impact of network science. You can find it here.

In this post I summarise (certainly in a very personal fashion, although some points are directly extracted from his slides) the learning points I extracted from the lecture.

Dynamics in networks
- For example, how does a rumor spread in a network?
- This aspect is much more controversial than the points touched on in lectures 1 and 2.
- An example: Citation networks (e.g. the network of legal opinions or the network of scientific papers).
- "Price observed that the distribution of the number of citations a paper gets follows a power law or Pareto distribution - a fat-tailed distribution in which most papers get few citations and a few get many".
- This power law is somewhat surprising.

Power laws
- In comparison to a normal distribution, a power law means that some nodes have a number of links several orders of magnitude higher than the typical node. This does not happen in normal distributions.
- Examples of cases that follow a Pareto law (power law) are word counts in books, web hits, wealth distribution, family names, city populations, etc.
- Power law - the 80/20 rule. E.g. "the top 20% own 86% of the wealth. 10% of the cities have 60% of the people. 75% of people have surnames in the top 1%."
- Power laws are a very active area of study in complex systems.
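
To get a feel for the difference between a fat-tailed and a normal distribution, here is a minimal Python sketch (mine, not from the lecture). It draws random samples from a Pareto distribution and from a normal distribution and measures what share of the total is held by the top 20%. The helper `top_share`, the shape value 1.16 (the value often quoted for an 80/20 split), and the mean and standard deviation of the normal samples are all illustrative assumptions.

```python
import random

def top_share(values, fraction):
    """Share of the total held by the top `fraction` of the values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 100_000

# Fat-tailed samples: Pareto with shape 1.16 (illustrative assumption)
pareto = [rng.paretovariate(1.16) for _ in range(n)]

# Thin-tailed samples: normal with made-up mean 100 and std 15, clipped at 0
normal = [max(0.0, rng.gauss(100, 15)) for _ in range(n)]

print("top 20% share, Pareto:", round(top_share(pareto, 0.2), 2))
print("top 20% share, normal:", round(top_share(normal, 0.2), 2))
print("largest / median, Pareto:", max(pareto) / sorted(pareto)[n // 2])
print("largest / median, normal:", max(normal) / sorted(normal)[n // 2])
```

In the Pareto sample the top 20% should hold a much larger share than in the normal sample, and the largest value should sit orders of magnitude above the median.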

Where do power laws come from? Preferential attachment
- The importance of getting an early lead e.g. with an excellent product, or by good marketing.  
- A plausible theory is preferential attachment. Interestingly enough, this theory ignores the content of the papers. It only uses the number of links the nodes have.
- First mover advantage: In citations, if you are one of the first to write on a topic, your paper will be cited anyway, regardless of its content. The first papers have the early lead in that specific field.
- How many citations you get depends on how many you already have.
- In conclusion, it is much more effective, according to this theory, to write a mediocre paper on tomorrow's field rather than a superb paper in today's field.
- The long tail effect: A small number of nodes with lots of connections.
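
The mechanism described above can be sketched in a few lines of Python. This is my own minimal simulation in the spirit of Price's citation model, not code from the lecture: each new paper cites a few earlier papers, picked with probability proportional to the citations they already have plus one (so that uncited papers can still be found). The function name `simulate_citations` and the parameter `refs_per_paper` are hypothetical.

```python
import random

def simulate_citations(n_papers, refs_per_paper=3, seed=0):
    """Grow a citation network by preferential attachment.

    Returns a list where entry i is the number of citations of paper i
    (papers are numbered in order of publication).
    """
    rng = random.Random(seed)
    citations = [0]   # paper 0 starts uncited
    # Each paper appears in the urn once at birth plus once per citation,
    # so a uniform draw from the urn is proportional to citations + 1.
    urn = [0]
    for new in range(1, n_papers):
        k = min(refs_per_paper, new)      # cannot cite more papers than exist
        cited = set()
        while len(cited) < k:
            cited.add(rng.choice(urn))
        for old in sorted(cited):
            citations[old] += 1
            urn.append(old)
        citations.append(0)
        urn.append(new)
    return citations

counts = simulate_citations(5000)
early = sum(counts[:500]) / 500     # average for the first 10% of papers
late = sum(counts[-500:]) / 500     # average for the last 10% of papers
print("mean citations, earliest papers:", early)
print("mean citations, latest papers:", late)
print("most cited paper has", max(counts), "citations")
```

Note that the content of a paper appears nowhere in the simulation. The early papers should still end up with far more citations than the late ones, which is the first mover advantage in action.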

The spread of a disease over a network
- Percolation model. In a given network, I colour some of the edges (those along which the disease is transmitted). The coloured edges form a different network, built from my initial network.
- How does the structure of the network influence the spread of a disease?
- Degree is the number of connections you have.
- Hubs are extremely effective at passing diseases along.
- What if we vaccinate the hubs? Targeted vaccination.
- Herd immunity.
- Targeted attacks are much more effective (clear link with information security).
- We can use the network itself to find the hubs.
- People who should be vaccinated are the most mentioned friends.
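
The points above can be tried out on a toy network. The following Python sketch is my own illustration under stated assumptions, not the model from the slides: it grows a network with hubs by preferential attachment, vaccinates 10% of the nodes in three different ways (at random, the true hubs, and the most mentioned friends), and then measures the largest group of unvaccinated nodes that are still connected, as a rough upper bound on how far an outbreak could spread. All function names and parameter values are hypothetical.

```python
import random
from collections import Counter, deque

def preferential_graph(n, m=2, seed=0):
    """Each new node links to m existing nodes, chosen in proportion to degree."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    stubs = []                        # node v appears degree(v) times
    for i in range(m + 1):            # small fully connected starting core
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            stubs += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        for old in sorted(chosen):
            adj[new].add(old)
            adj[old].add(new)
            stubs += [new, old]
    return adj

def largest_outbreak(adj, vaccinated):
    """Size of the largest connected group of unvaccinated nodes."""
    seen = set(vaccinated)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def vaccinate_random(adj, k, rng):
    return set(rng.sample(sorted(adj), k))

def vaccinate_hubs(adj, k):
    """Needs full knowledge of the network: pick the k highest-degree nodes."""
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])

def vaccinate_named_friends(adj, k, rng):
    """Ask random people to name a friend; vaccinate the k most mentioned."""
    nodes = sorted(adj)
    mentions = Counter()
    for _ in range(len(nodes)):
        person = rng.choice(nodes)
        mentions[rng.choice(sorted(adj[person]))] += 1
    return {v for v, _ in mentions.most_common(k)}

rng = random.Random(1)
graph = preferential_graph(2000)
k = 200                               # vaccinate 10% of the population
after_random = largest_outbreak(graph, vaccinate_random(graph, k, rng))
after_hubs = largest_outbreak(graph, vaccinate_hubs(graph, k))
after_friends = largest_outbreak(graph, vaccinate_named_friends(graph, k, rng))
print("random:", after_random, "hubs:", after_hubs, "named friends:", after_friends)
```

The "named friends" strategy never looks at the degrees directly. It relies on the network itself to surface the hubs, since a person with many connections is more likely to be named by someone.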

Network robustness
- Can we tell whether a network is robust by looking at its structure? Let's go back to the concept of homophily (mentioned in Part 2 of this series of lectures).
- Homophily by degree: Party people hanging out with party people (positive correlation coefficient in social networks: high-degree nodes connect with high-degree nodes).
- You get a very dense core and very clean borders. Social networks are then very robust networks. This is exactly the opposite of what we would like in terms of disease spread.
- Social networks are very robust and therefore hard to vaccinate against diseases.
- The Internet, however, is fragile. The high-degree nodes connect with the low-degree nodes, so the high-degree nodes are spread out all over the network. Such networks are not so robust: if you knock down the nodes with high degree, you knock down the network very quickly.
- Number of connections (x axis) is the degree.
- The crucial factor in the spread of disease is airplanes.
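
The "correlation coefficient" behind homophily by degree can be computed directly from a list of edges. Below is a minimal Python sketch of my own (not from the lecture): it takes the degrees found at the two ends of every edge and computes their Pearson correlation. The function name `degree_assortativity` and the two toy networks are illustrative assumptions.

```python
from itertools import combinations

def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every edge in both directions so the measure is symmetric
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n                 # identical for xs and ys by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("all nodes have the same degree; coefficient undefined")
    return cov / var

# "Party people with party people": two separate cliques of different sizes,
# so every edge joins two nodes of equal degree.
cliques = list(combinations("abc", 2)) + list(combinations("vwxyz", 2))

# One hub linked to five poorly connected nodes.
star = [("hub", leaf) for leaf in range(5)]

print("cliques:", degree_assortativity(cliques))   # strongly positive
print("star:", degree_assortativity(star))         # strongly negative
```

A positive value corresponds to the social network pattern described above, a negative value to the hub-and-spoke pattern attributed to the Internet.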

Future directions
- Great slide: This is a very, very new field. "We need to:
- Improve the measurement of networks.
- Understand how networks change over time.
- Understand how changing a network can change its performance, and perhaps improve it.
- Get better at predicting network phenomena.
- Predict how society will react or evolve based on social networks.
- Prevent disease outbreaks before they happen. 
- And..?"
- Sometimes you engineer a network and sometimes it works!

