The friendship paradox is the phenomenon first observed by the sociologist Scott L. Feld in 1991 that most people have fewer friends than their friends have, on average. It can be explained as a form of sampling bias
in which people with greater numbers of friends have an increased
likelihood of being observed among one's own friends. In contradiction
to this, most people believe that they have more friends than their
friends have.
The same observation can be applied more generally to social networks defined by other relations than friendship: for instance, most people's sexual partners have had (on the average) a greater number of sexual partners than they have.
The same observation can be applied more generally to social networks defined by other relations than friendship: for instance, most people's sexual partners have had (on the average) a greater number of sexual partners than they have.
Mathematical explanation
In spite of its apparently paradoxical nature, the phenomenon is real, and can be explained as a consequence of the general mathematical properties of social networks. The mathematics behind this are directly related to the arithmetic-geometric mean inequality and the Cauchy–Schwarz inequality.
Formally, Feld assumes that a social network is represented by an undirected graph G = (V, E), where the set V of vertices corresponds to the people in the social network, and the set E of edges corresponds to the friendship relation between pairs of people. That is, he assumes that friendship is a symmetric relation: if X is a friend of Y, then Y is a friend of X. He models the average number of friends of a person in the social network as the average of the degrees of the vertices in the graph. That is, if vertex v has d(v) edges touching it (representing a person who has d(v) friends), then the average number μ of friends of a random person in the graph is
The average number of friends that a typical friend has can be
modeled by choosing a random person (who has at least one friend), and
then calculating how many friends their friends have on average. This
amounts to choosing, uniformly at random, an edge of the graph
(representing a pair of friends) and an endpoint of that edge (one of
the friends), and again calculating the degree of the selected endpoint.
The probability of a certain vertex to be chosen is :
The first factor corresponds to how likely it is that the chosen edge
contains the vertex, which increases when the vertex has more friends.
The halving factor simply comes from the fact that each edge has two
vertices. So the expected value of the number of friends of a (randomly
chosen) friend is :
We know from the definition of variance that :
where is the variance of the degrees in the graph. This allows us to compute the desired expected value :
For a graph that has vertices of varying degrees (as is typical for social networks), both μ and are positive, which implies that the average degree of a friend is strictly greater than the average degree of a random node.
Another way of understanding how the first term came is as follows. For each friendship (u, v), a node u mentions that v is a friend and v has d(v) friends. There are d(v) such friends who mention this. Hence the square of d(v) term. We add this for all such friendships in the network from both the u's and v's
perspective, which gives the numerator. The denominator is the number
of total such friendships, which is twice the total edges in the network
(one from the u's perspective and the other from the v's).
After this analysis, Feld goes on to make some more qualitative
assumptions about the statistical correlation between the number of
friends that two friends have, based on theories of social networks such
as assortative mixing,
and he analyzes what these assumptions imply about the number of people
whose friends have more friends than they do. Based on this analysis,
he concludes that in real social networks, most people are likely to
have fewer friends than the average of their friends' numbers of
friends. However, this conclusion is not a mathematical certainty; there
exist undirected graphs (such as the graph formed by removing a single
edge from a large complete graph)
that are unlikely to arise as social networks but in which most
vertices have higher degree than the average of their neighbors'
degrees.
Applications
The
analysis of the friendship paradox implies that the friends of randomly
selected individuals are likely to have higher than average centrality. This observation has been used as a way to forecast and slow the course of epidemics,
by using this random selection process to choose individuals to
immunize or monitor for infection while avoiding the need for a complex
computation of the centrality of all nodes in the network.
A study in 2010 by Christakis and Fowler showed that flu
outbreaks can be detected almost 2 weeks before traditional surveillance
measures can by using the friendship paradox in monitoring the
infection in a social network. They found that using the friendship paradox to analyze the health of central
friends is "an ideal way to predict outbreaks, but detailed information
doesn't exist for most groups, and to produce it would be
time-consuming and costly."
The "generalized friendship paradox" states that the friendship
paradox applies to other characteristics as well. For example, one's
co-authors are on average likely to be more prominent, with more
publications, more citations and more collaborators, or one's followers on Twitter have more followers. The same effect has also been demonstrated for Subjective Well-Being by Bollen et al (2017),
who used a large-scale Twitter network and longitudinal data on
subjective well-being for each individual in the network to demonstrate
that both a Friendship and a "happiness" paradox can occur in online
social networks.