The term "network medicine" was introduced by Albert-László Barabási in an the article "Network Medicine – From Obesity to the 'Diseasome'", published in The New England Journal of Medicine, in 2007. Barabási states that biological systems,
similarly to social and technological systems, contain many components
that are connected in complicated relationships but are organized by
simple principles. Relaying on the tools and principles of network theory, the organizing principles can be analyzed by representing systems as complex networks, which are collections of nodes
linked together by a particular biological or molecular relationship.
For networks pertaining to medicine, nodes represent biological factors (biomolecules,
diseases, phenotypes, etc.) and links (edges) represent their
relationships (physical interactions, shared metabolic pathway, shared
gene, shared trait, etc.).
Barabási suggested that understanding human disease requires us to focus on three key networks: the metabolic network, the disease network, and the social network. Network medicine is based on the idea that the complexity of gene regulation, metabolic reactions, and protein-protein interactions
can be captured by representing these processes as complex networks, and that analyzing such networks will shed light on the
causes and mechanisms of diseases. It is possible, for example, to infer
a bipartite graph representing the connections of diseases to their associated genes using the OMIM database.
The projection of the diseases, called the human disease network (HDN),
is a network of diseases connected to each other if they share a common
gene. Using the HDN, diseases can be classified and analyzed through
the genetic relationships between them. Network medicine has proven to
be a valuable tool in analyzing big biomedical data.
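As a rough illustration (using the Python networkx library and an invented toy table of disease-gene associations rather than actual OMIM records), the sketch below builds a small bipartite disease-gene graph and projects it onto the disease layer to obtain a miniature HDN:

```python
# A minimal sketch: project a bipartite disease-gene graph onto its disease layer;
# two diseases become linked when they share at least one associated gene.
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical disease-gene associations (illustrative only, not OMIM data).
associations = {
    "disease_A": {"TP53", "BRCA1"},
    "disease_B": {"BRCA1", "PTEN"},
    "disease_C": {"INS"},
}

B = nx.Graph()
diseases = list(associations)
B.add_nodes_from(diseases, bipartite=0)
for disease, genes in associations.items():
    for gene in genes:
        B.add_node(gene, bipartite=1)
        B.add_edge(disease, gene)

# Projection onto diseases; edge weights count the shared genes.
hdn = bipartite.weighted_projected_graph(B, diseases)
print(hdn.edges(data=True))   # e.g. ('disease_A', 'disease_B', {'weight': 1})
```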
Using interactome networks, one can discover and classify
diseases, as well as develop treatments, through knowledge of the
associations of their components and the roles those components play in the networks. One observation is that
diseases can be classified not by their principal phenotypes (pathophenotypes) but by their disease module, which is a neighborhood or group of components in the interactome that, if disrupted, results in a specific pathophenotype.
Disease modules can be used in a variety of ways, such as predicting
disease genes that have not been discovered yet. Therefore, network
medicine looks to identify the disease module for a specific pathophenotype using clustering algorithms.
Human disease networks,
also called the diseasome, are networks in which the nodes are diseases
and the links represent the strength of the correlation between them. This
correlation is commonly quantified based on associated cellular
components that two diseases share. The first published human disease
network (HDN) looked at genes, finding that many disease-associated genes are non-essential genes, as these are the genes that do not completely disrupt the network and can therefore be passed down through generations. Metabolic disease networks (MDN), in which two diseases are connected by a shared metabolite or metabolic pathway, have also been extensively studied and are especially relevant in the case of metabolic disorders.
Three representations of the diseasome are:
Shared gene formalism states that if a gene is
linked to two different disease phenotypes, then the two diseases likely
have a common genetic origin (genetic disorders).
Shared metabolic pathway formalism states that if a
metabolic pathway is linked to two different diseases, then the two
diseases likely have a shared metabolic origin (metabolic disorders).
Disease comorbidity formalism uses phenotypic disease networks (PDN), where two diseases are linked if the observed comorbidity between their phenotypes exceeds a predefined threshold.
This does not look at the mechanism of action of diseases, but captures
disease progression and how highly connected diseases correlate with
higher mortality rates.
Some disease networks connect diseases to associated factors outside the human cell. Networks of environmental and genetic etiological factors linked with shared diseases, called the "etiome", can also be used to assess the clustering of environmental factors in these networks and to understand the role the environment plays in the interactome.
The human symptom-disease network (HSDN), published in June 2014,
showed that the symptoms of disease and disease associated cellular
components were strongly correlated and that diseases of the same
categories tend to form highly connected communities, with respect to
their symptoms.
Network pharmacology is a developing field rooted in systems pharmacology that looks at the effect of drugs on both the interactome and the diseasome. The topology of a biochemical reaction network determines the shape of the drug dose-response curve as well as the type of drug-drug interactions,
and thus can help in designing efficient and safe therapeutic strategies. In
addition, the drug-target network (DTN) can play an important role in
understanding the mechanisms of action of approved and experimental
drugs. The network theory view of pharmaceuticals is based on the effect of the drug in the interactome, especially the region that the drug target occupies. Combination therapy for a complex disease (polypharmacology) is suggested in this field since one active pharmaceutical ingredient (API) aimed at one target may not affect the entire disease module. The concept of disease modules can be used to aid in drug discovery, drug design, and the development of biomarkers for disease detection.
There are a variety of ways of identifying drugs using network
pharmacology; a simple example is the "guilt by association"
method: if two diseases are treated by the same drug, a drug
that treats one disease may also treat the other. Drug repurposing, drug-drug interactions and drug side-effects have also been studied in this field.
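A minimal sketch of the guilt-by-association idea, with made-up drug and disease names standing in for real data:

```python
# "Guilt by association": if two diseases share a treating drug, every other drug
# of one disease becomes a repurposing candidate for the other.
treats = {
    "drug_X": {"disease_A", "disease_B"},
    "drug_Y": {"disease_A"},
    "drug_Z": {"disease_C"},
}

# Invert the mapping: disease -> set of drugs that treat it.
drugs_of = {}
for drug, diseases in treats.items():
    for d in diseases:
        drugs_of.setdefault(d, set()).add(drug)

candidates = set()
for d1 in drugs_of:
    for d2 in drugs_of:
        if d1 < d2 and drugs_of[d1] & drugs_of[d2]:      # the two diseases share a drug
            for drug in drugs_of[d1] ^ drugs_of[d2]:     # drugs used for only one of them
                target = d2 if drug in drugs_of[d1] else d1
                candidates.add((drug, target))

print(candidates)   # e.g. {('drug_Y', 'disease_B')}
```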
The next iteration of network pharmacology used an entirely different
disease definition: dysfunction in signaling modules derived
from protein-protein interaction modules. Both the latter and the
interactome had many conceptual shortcomings, e.g., each protein appears
only once in the interactome, whereas in reality, one protein can occur
in different contexts and different cellular locations. Such signaling
modules are therapeutically best targeted at several sites, which is now
the new and clinically applied definition of network pharmacology. To
achieve higher than current precision, patients must not be selected
solely on descriptive phenotypes but also based on diagnostics that
detect the module dysregulation. Moreover, such mechanism-based network
pharmacology has the advantage that each of the drugs used within one
module is highly synergistic, which allows for reducing the doses of
each drug, which then reduces the potential of these drugs acting on
other proteins outside the module and hence the chance for unwanted side
effects.
Network epidemics
Network epidemics has been built by applying network science to existing epidemic models, as many transportation networks and social networks play a role in the spread of disease. Social networks have been used to assess the role of social ties in the spread of obesity in populations. Epidemic models and concepts, such as spreading and contact tracing, have been adapted for use in network analysis. These models can inform public health policies, for example by guiding strategies such as targeted immunization, and have recently been used to model the spread of the Ebola virus epidemic in West Africa across countries and continents.
Drug prescription networks (DPNs)
Recently, some researchers have tended to represent medication use in the form of
networks. The nodes in these networks represent medications and the
edges represent some sort of relationship between these medications.
Cavallo et al. (2013)
described the topology of a co-prescription network to demonstrate
which drug classes are most co-prescribed. Bazzoni et al. (2015) concluded that the DPNs of co-prescribed medications are dense, highly clustered, modular and assortative. Askar et al. (2021) created a network of the severe drug-drug interactions (DDIs) showing that it consisted of many clusters.
Other networks
The development of organs
and other biological systems can be modelled as network structures
where the clinical (e.g., radiographic, functional) characteristics can
be represented as nodes and the relationships between these
characteristics are represented as the links
among such nodes. Therefore, it is possible to use networks to model how organ systems dynamically interact.
Massachusetts Institute of Technology
offers an undergraduate course called "Network Medicine: Using Systems
Biology and Signaling Networks to Create Novel Cancer Therapeutics". Also, Harvard
Catalyst (The Harvard Clinical and Translational Science Center) offers
a three-day course entitled "Introduction to Network Medicine", open to
clinical and science professionals with doctorate degrees.
The analysis of electric power systems could be conducted using network theory from two main points of view:
An abstract perspective (i.e., as a graph consisting of nodes
and edges), regardless of the electric power aspects (e.g., transmission
line impedances). Most of these studies focus only on the abstract
structure of the power grid, using the node degree distribution and
betweenness distribution, which provides substantial insight into
the vulnerability assessment of the grid. Through these types of
studies, the category of the grid structure can be identified from the
complex network perspective (e.g., single-scale, scale-free). This
classification might help electric power system engineers in the
planning stage or while upgrading the infrastructure (e.g., adding a new
transmission line) to maintain a proper redundancy level in the
transmission system; a brief sketch of such an abstract analysis is given after this list.
Weighted graphs that blend an abstract understanding of complex network theories and electric power systems properties.
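A brief sketch of the abstract graph view, assuming networkx and an invented toy topology in place of a real transmission grid:

```python
# A minimal sketch of the "abstract" view of a power grid: compute the degree
# distribution and betweenness centrality that vulnerability studies typically
# start from (buses as nodes, transmission lines as edges; topology is illustrative).
import networkx as nx
from collections import Counter

grid = nx.Graph([(1, 2), (2, 3), (3, 4), (4, 1), (2, 5), (5, 6), (6, 3)])

degree_distribution = Counter(d for _, d in grid.degree())
betweenness = nx.betweenness_centrality(grid)

print(degree_distribution)                       # degree -> number of buses
print(max(betweenness, key=betweenness.get))     # most "central" bus, a likely weak point
```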
With the recent explosion of publicly available high throughput biological data, the analysis of molecular networks has gained significant interest.
The type of analysis in this context is closely related to social
network analysis, but often focuses on local patterns in the network.
For example, network motifs are small subgraphs that are over-represented in the network. Similarly, activity motifs
are patterns in the attributes of nodes and edges in the network that
are over-represented given the network structure. Using networks to
analyze patterns in biological systems, such as food-webs, allows us to
visualize the nature and strength of interactions between species. The
analysis of biological networks with respect to diseases has led to the development of the field of network medicine. Recent examples of application of network theory in biology include applications to understanding the cell cycle as well as a quantitative framework for developmental processes.
Narrative network analysis
The automatic parsing of textual corpora has enabled the extraction of actors and their relational networks on a vast scale. The resulting narrative networks,
which can contain thousands of nodes, are then analyzed by using tools
from network theory to identify the key actors, the key communities or
parties, and general properties such as robustness or structural
stability of the overall network, or centrality of certain nodes. This automates the approach introduced by Quantitative Narrative Analysis, whereby subject-verb-object triplets are identified with pairs of actors linked by an action, or pairs formed by actor-object.
Link analysis
Link analysis
is a subset of network analysis, exploring associations between
objects. An example may be examining the addresses of suspects and
victims, the telephone numbers they have dialed, and financial
transactions that they have partaken in during a given timeframe, and
the familial relationships between these subjects as a part of police
investigation. Link analysis here provides the crucial relationships and
associations between very many objects of different types that are not
apparent from isolated pieces of information. Computer-assisted or fully
automatic computer-based link analysis is increasingly employed by banks and insurance agencies in fraud detection, by telecommunication operators in telecommunication network analysis, by the medical sector in epidemiology and pharmacology, by law enforcement in investigations, by search engines for relevance rating (and conversely by spammers for spamdexing and by business owners for search engine optimization),
and everywhere else where relationships between many objects have to be
analyzed. Links are also derived from the similarity of the time behavior of
the two nodes. Examples include climate networks, where the links between
two locations (nodes) are determined, for example, by the similarity of
the rainfall or temperature fluctuations in both sites.
The concepts of assortativity and disassortativity are used to characterize the linking preferences of
hubs in a network. Hubs are nodes which have a large number of links.
Some hubs tend to link to other hubs while others avoid connecting to
hubs and prefer to connect to nodes with low connectivity. We say a hub
is assortative when it tends to connect to other hubs. A disassortative
hub avoids connecting to other hubs. If hubs have connections with the
expected random probabilities, they are said to be neutral. There are
three methods to quantify degree correlations.
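In practice, a single coefficient is often used; the sketch below (assuming networkx) computes the degree assortativity coefficient of a test graph, where positive values indicate assortative hubs, negative values disassortative hubs, and values near zero neutral hubs:

```python
# A minimal sketch: quantify hub linking preferences with the degree assortativity
# coefficient of a scale-free test graph.
import networkx as nx

G = nx.barabasi_albert_graph(n=1000, m=3, seed=42)
r = nx.degree_assortativity_coefficient(G)
print(f"degree assortativity r = {r:.3f}")   # finite BA graphs tend to come out near neutral
```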
Recurrence networks
The recurrence matrix of a recurrence plot
can be considered as the adjacency matrix of an undirected and
unweighted network. This allows for the analysis of time series by
network measures. Applications range from the detection of regime changes
and the characterization of dynamics to synchronization analysis.
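A minimal sketch of this construction, assuming numpy and networkx and an arbitrary recurrence threshold applied to a toy sine-wave series:

```python
# Threshold the pairwise distances of a time series (the recurrence matrix of a
# recurrence plot) and read the result as the adjacency matrix of an undirected,
# unweighted network.
import numpy as np
import networkx as nx

t = np.linspace(0, 8 * np.pi, 300)
x = np.sin(t)                                   # toy time series

eps = 0.1                                       # recurrence threshold (assumed)
dist = np.abs(x[:, None] - x[None, :])          # pairwise distances between states
recurrence = (dist < eps).astype(int)
np.fill_diagonal(recurrence, 0)                 # no self-loops

G = nx.from_numpy_array(recurrence)
print(nx.average_clustering(G))                 # network measures applied to the time series
```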
Spatial networks
Many real networks are embedded in space. Examples include transportation
and other infrastructure networks, and brain neural networks. Several models
for spatial networks have been developed.
Spread
Content in a complex network can spread via two major methods: conserved spread and non-conserved spread.
In conserved spread, the total amount of content that enters a complex
network remains constant as it passes through. The model of conserved
spread can best be represented by a pitcher containing a fixed amount of
water being poured into a series of funnels connected by tubes. Here,
the pitcher represents the original source and the water is the content
being spread. The funnels and connecting tubing represent the nodes and
the connections between nodes, respectively. As the water passes from
one funnel into another, the water disappears instantly from the funnel
that was previously exposed to the water. In non-conserved spread, the
amount of content changes as it enters and passes through a complex
network. The model of non-conserved spread can best be represented by a
continuously running faucet running through a series of funnels
connected by tubes. Here, the amount of water from the original source
is infinite. Also, any funnels that have been exposed to the water
continue to experience the water even as it passes into successive
funnels. The non-conserved model is the most suitable for explaining
the transmission of most infectious diseases, neural excitation, information and rumors, etc.
Network immunization
The question of how to efficiently immunize scale-free networks which
represent realistic networks, such as the Internet and social networks,
has been studied extensively. One such strategy is to immunize the
largest-degree nodes, i.e., to perform targeted (intentional) attacks, since in this case fewer nodes need to be immunized.
However, in most realistic networks the global structure is not available and the largest degree nodes are unknown.
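As an illustration of targeted immunization under the assumption that the global structure is known, the sketch below (using networkx and an arbitrary scale-free test graph) compares hub removal with random removal:

```python
# Remove ("immunize") the highest-degree nodes of a scale-free graph and watch the
# giant component shrink, compared with removing the same number of random nodes.
import random
import networkx as nx

def giant_fraction(G):
    return max(len(c) for c in nx.connected_components(G)) / G.number_of_nodes()

G = nx.barabasi_albert_graph(n=2000, m=2, seed=1)
n_remove = 100

targeted = G.copy()
hubs = sorted(targeted.degree, key=lambda kv: kv[1], reverse=True)[:n_remove]
targeted.remove_nodes_from(node for node, _ in hubs)

randomised = G.copy()
randomised.remove_nodes_from(random.sample(list(randomised.nodes()), n_remove))

print("giant component after hub immunization:   ", giant_fraction(targeted))
print("giant component after random immunization:", giant_fraction(randomised))
```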
The
study of networks has emerged in diverse disciplines as a means of
analyzing complex relational data. The earliest known paper in this
field is the famous Seven Bridges of Königsberg written by Leonhard Euler in 1736. Euler's mathematical description of vertices and edges was the foundation of graph theory, a branch of mathematics that studies the properties of pairwise relations in a network structure. The field of graph theory continued to develop and found applications in chemistry (Sylvester, 1878).
Dénes Kőnig,
a Hungarian mathematician and professor, wrote the first book on graph
theory, entitled "Theory of Finite and Infinite Graphs", in 1936.
In the 1930s Jacob Moreno, a psychologist in the Gestalt tradition, arrived in the United States. He developed the sociogram
and presented it to the public in April 1933 at a convention of medical
scholars. Moreno claimed that "before the advent of sociometry no one
knew what the interpersonal structure of a group 'precisely' looked
like" (Moreno, 1953). The sociogram was a representation of the social
structure of a group of elementary school students. The boys were
friends of boys and the girls were friends of girls with the exception
of one boy who said he liked a single girl. The feeling was not
reciprocated. This network representation of social structure was found
so intriguing that it was printed in The New York Times (April 3, 1933, page 17). The sociogram has found many applications and has grown into the field of social network analysis.
Probabilistic theory in network science developed as an offshoot of graph theory with Paul Erdős and Alfréd Rényi's eight famous papers on random graphs. For social networks the exponential random graph model or p* is a notational framework used to represent the probability space of a tie occurring in a social network. An alternate approach to network probability structures is the network probability matrix,
which models the probability of edges occurring in a network, based on
the historic presence or absence of the edge in a sample of networks.
In 1998, David Krackhardt and Kathleen Carley
introduced the idea of a meta-network with the PCANS Model. They
suggest that "all organizations are structured along these three
domains, Individuals, Tasks, and Resources". Their paper introduced the
concept that networks occur across multiple domains and that they are
interrelated. This field has grown into another sub-discipline of
network science called dynamic network analysis.
More recently, other network science efforts have focused on mathematically describing different network topologies. Duncan Watts and Steven Strogatz reconciled empirical data on networks with mathematical representation, describing the small-world network. Albert-László Barabási and Réka Albert discovered scale-free networks,
a property that captures the fact that in real networks hubs coexist
with many small-degree vertices, and offered a dynamical model to
explain the origin of this scale-free state.
Department of Defense initiatives
The U.S. military first became interested in network-centric warfare
as an operational concept based on network science in 1996. John A.
Parmentola, the U.S. Army Director for Research and Laboratory
Management, proposed to the Army's Board on Science and Technology
(BAST) on December 1, 2003 that Network Science become a new Army
research area. The BAST, the Division on Engineering and Physical
Sciences for the National Research Council (NRC) of the National
Academies, serves as a convening authority for the discussion of science
and technology issues of importance to the Army and oversees
independent Army-related studies conducted by the National Academies.
The BAST conducted a study to find out whether identifying and funding a
new field of investigation in basic research, Network Science, could
help close the gap between what is needed to realize Network-Centric
Operations and the current primitive state of fundamental knowledge of
networks.
As a result, the BAST issued the NRC study in 2005 titled Network
Science (referenced above) that defined a new field of basic research
in Network Science for the Army. Based on the findings and
recommendations of that study and the subsequent 2007 NRC report titled
Strategy for an Army Center for Network Science, Technology, and
Experimentation, Army basic research resources were redirected to
initiate a new basic research program in Network Science. To build a
new theoretical foundation for complex networks, some of the key Network
Science research efforts now ongoing in Army laboratories address:
Mathematical models of network behavior to predict performance with network size, complexity, and environment
Optimized human performance required for network-enabled warfare
Networking within ecosystems and at the molecular level in cells.
As initiated in 2004 by Frederick I. Moxley with support he solicited
from David S. Alberts, the Department of Defense helped to establish the
first Network Science Center in conjunction with the U.S. Army at the
United States Military Academy (USMA). Under the tutelage of Dr. Moxley
and the faculty of the USMA, the first interdisciplinary undergraduate
courses in Network Science were taught to cadets at West Point.
In order to better instill the tenets of network science among its
cadre of future leaders, the USMA has also instituted a five-course
undergraduate minor in Network Science.
In 2006, the U.S. Army and the United Kingdom (UK) formed the Network and Information Science International Technology Alliance,
a collaborative partnership among the Army Research Laboratory, UK
Ministry of Defense and a consortium of industries and universities in
the U.S. and UK. The goal of the alliance is to perform basic research
in support of Network-Centric Operations across the needs of both
nations.
In 2009, the U.S. Army formed the Network Science CTA, a collaborative research alliance among the Army Research Laboratory, CERDEC,
and a consortium of about 30 industrial R&D labs and universities
in the U.S. The goal of the alliance is to develop a deep understanding
of the underlying commonalities among intertwined social/cognitive,
information, and communications networks, and as a result improve our
ability to analyze, predict, design, and influence complex systems
interweaving many kinds of networks.
Subsequently, as a result of these efforts, the U.S. Department
of Defense has sponsored numerous research projects that support Network
Science.
Network Classification
Deterministic Network
Deterministic networks are defined in contrast to probabilistic networks. In unweighted deterministic
networks, edges either exist or do not: conventionally, 0 represents the
non-existence of an edge and 1 represents its existence. In
weighted deterministic networks, the edge value represents the weight of
each edge, for example, the strength level.
Probabilistic Network
In probabilistic networks, the value attached to each edge represents the
likelihood that the edge exists. For example, if an edge has a
value equal to 0.9, we say the existence probability of this edge is
0.9.
Network properties
Often, networks have certain attributes that can be calculated to analyze the
properties and characteristics of the network. The behavior of these
network properties often defines network models
and can be used to analyze how certain models contrast with each other.
Many of the definitions for other terms used in network science can be
found in Glossary of graph theory.
Size
The size of a network can refer to the number of nodes $N$ or, less commonly, the number of edges $E$, which (for connected graphs with no multi-edges) can range from $N - 1$ (a tree) to $\binom{N}{2}$ (a complete graph). In the case of a simple graph (a network in which
at most one (undirected) edge exists between each pair of vertices, and
in which no vertices connect to themselves), we have $E_{\max} = \binom{N}{2} = \frac{N(N-1)}{2}$; for directed graphs (with no self-connected nodes), $E_{\max} = N(N-1)$; for directed graphs with self-connections allowed, $E_{\max} = N^{2}$. In the circumstance of a graph within which multiple edges may exist between a pair of vertices, $E_{\max} = \infty$.
Density
The density $D$ of a network is defined as a normalized ratio between 0 and 1 of the number of edges $E$ to the number of possible edges in a network with $N$ nodes. Network density is a measure of the percentage of "optional" edges that exist in the network and can be computed as
$$D = \frac{E - E_{\min}}{E_{\max} - E_{\min}},$$
where $E_{\min}$ and $E_{\max}$ are the minimum and maximum number of edges in a connected network with $N$ nodes, respectively. In the case of simple graphs, $E_{\max}$ is given by the binomial coefficient $\binom{N}{2}$ and $E_{\min} = N - 1$, giving density
$$D = \frac{E - N + 1}{\binom{N}{2} - N + 1}.$$
Another possible equation is $D = \frac{T}{N(N-1)}$, where the ties $T$ are unidirectional (Wasserman & Faust 1994). This gives a better overview of the network density, because unidirectional relationships can be measured.
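A small sketch comparing the two density notions on a toy graph (assuming networkx, whose density function uses the simple ratio of existing to possible edges):

```python
# Compare networkx's edge density E / (N(N-1)/2) with the "optional edge" density
# (E - E_min) / (E_max - E_min) described above, for a small undirected graph.
import networkx as nx

G = nx.erdos_renyi_graph(n=20, p=0.2, seed=7)
N, E = G.number_of_nodes(), G.number_of_edges()

e_min = N - 1                    # spanning tree
e_max = N * (N - 1) // 2         # complete simple graph
optional_density = (E - e_min) / (e_max - e_min)

print(nx.density(G), optional_density)
```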
Planar network density
The density $D$ of a network, where there is no intersection between edges, is defined as a ratio of the number of edges $E$ to the number of possible edges in a network with $N$ nodes, given by a graph with no intersecting edges $E_{\max} = 3N - 6$, giving
$$D = \frac{E - N + 1}{2N - 5}.$$
Average degree
The degree $k$ of a node is the number of edges connected to it. Closely related to the density of a network is the average degree, $\langle k\rangle = \frac{2E}{N}$ (or, in the case of directed graphs, $\langle k\rangle = \frac{E}{N}$,
the former factor of 2 arising from each edge in an undirected graph
contributing to the degree of two distinct vertices). In the ER random graph model ($G(N, p)$) we can compute the expected value of $\langle k\rangle$ (equal to the expected value of $k$ of an arbitrary vertex): a random vertex has $N - 1$ other vertices in the network available, and with probability $p$ connects to each. Thus, $\langle k\rangle = p(N - 1)$.
Average shortest path length (or characteristic path length)
The average shortest path length is calculated by finding the shortest path
between all pairs of nodes, and taking the average over all paths of
the length thereof (the length being the number of intermediate edges
contained in the path, i.e., the distance between the two vertices
within the graph). This shows us, on average, the number of steps it
takes to get from one member of the network to another. The behavior of
the expected average shortest path length (that is, the ensemble average
of the average shortest path length) as a function of the number of
vertices $N$ of a random network model defines whether that model exhibits the small-world effect; if it scales as $O(\ln N)$,
the model generates small-world nets. For faster-than-logarithmic
growth, the model does not produce small worlds. The special case of $O(\ln\ln N)$ is known as the ultra-small world effect.
Diameter of a network
As
another means of measuring network graphs, we can define the diameter
of a network as the longest of all the calculated shortest paths in a
network. It is the shortest distance between the two most distant nodes
in the network. In other words, once the shortest path length from
every node to all other nodes is calculated, the diameter is the longest
of all the calculated path lengths. The diameter is representative of
the linear size of a network. If nodes A-B-C-D are connected in a path, going from
A to D gives a diameter of 3 (3 hops, 3 links).
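The sketch below verifies the A-B-C-D example with networkx:

```python
# The diameter of a four-node path is 3; the average shortest path length can be
# computed the same way.
import networkx as nx

G = nx.path_graph(["A", "B", "C", "D"])
print(nx.diameter(G))                       # 3
print(nx.average_shortest_path_length(G))   # (1+2+3+1+2+1)/6 ≈ 1.67
```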
Clustering coefficient
The
clustering coefficient is a measure of an
"all-my-friends-know-each-other" property. This is sometimes described
as the friends of my friends are my friends. More precisely, the
clustering coefficient of a node is the ratio of existing links
connecting a node's neighbors to each other to the maximum possible
number of such links. The clustering coefficient for the entire network
is the average of the clustering coefficients of all the nodes. A high
clustering coefficient for a network is another indication of a small world.
The clustering coefficient of the $i$'th node is
$$C_i = \frac{2e_i}{k_i(k_i - 1)},$$
where $k_i$ is the number of neighbours of the $i$'th node, and $e_i$ is the number of connections between these neighbours. The maximum possible number of connections between neighbours is, then,
$$\binom{k_i}{2} = \frac{k_i(k_i - 1)}{2}.$$
From a probabilistic standpoint, the expected local clustering
coefficient is the likelihood of a link existing between two arbitrary
neighbors of the same node.
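A minimal sketch, assuming networkx, that computes $C_i$ by hand for one node and checks it against the library:

```python
# Compute the local clustering coefficient C_i = 2*e_i / (k_i*(k_i-1)) directly for
# one node and compare with networkx, then take the network average.
import networkx as nx

G = nx.Graph([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")])

node = "a"
neighbours = list(G.neighbors(node))                 # b, c, d  -> k_i = 3
e_i = G.subgraph(neighbours).number_of_edges()       # only b-c  -> e_i = 1
k_i = len(neighbours)
print(2 * e_i / (k_i * (k_i - 1)))                   # 1/3
print(nx.clustering(G, node))                        # 1/3, same value
print(nx.average_clustering(G))                      # average over all nodes
```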
Connectedness
The way in which a network is connected plays a large part in how
networks are analyzed and interpreted. Networks are classified into four
different categories:
Clique/Complete Graph: a completely connected
network, where all nodes are connected to every other node. These
networks are symmetric in that all nodes have in-links and out-links
from all others.
Giant Component: A single connected component which contains most of the nodes in the network.
Weakly Connected Component: A collection of nodes in which there exists a path from any node to any other, ignoring directionality of the edges.
Strongly Connected Component: A collection of nodes in which there exists a directed path from any node to any other.
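A small illustration of these categories on a toy directed graph, assuming networkx:

```python
# Components of a directed toy graph: weakly connected components ignore edge
# direction, strongly connected components respect it.
import networkx as nx

D = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4), (5, 6)])

print(list(nx.weakly_connected_components(D)))          # [{1, 2, 3, 4}, {5, 6}]
print(list(nx.strongly_connected_components(D)))        # {1, 2, 3} is the only non-trivial one
print(max(nx.weakly_connected_components(D), key=len))  # the "giant" component here
```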
Centrality indices produce rankings which seek to identify the most
important nodes in a network model. Different centrality indices encode
different contexts for the word "importance." The betweenness centrality, for example, considers a node highly important if it forms bridges between many other nodes. The eigenvalue centrality,
in contrast, considers a node highly important if many other highly
important nodes link to it. Hundreds of such measures have been proposed
in the literature.
Centrality indices are only accurate for identifying the most
important nodes. The measures are seldom, if ever, meaningful for the
remainder of network nodes. Also, their indications are only accurate within their assumed context
for importance, and tend to "get it wrong" for other contexts.
For example, imagine two separate communities whose only link is an
edge between the most junior member of each community. Since any
transfer from one community to the other must go over this link, the two
junior members will have high betweenness centrality. But, since they
are junior, (presumably) they have few connections to the "important"
nodes in their community, meaning their eigenvalue centrality would be
quite low.
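The sketch below reproduces this thought experiment with two five-node cliques joined through two "junior" nodes (an invented toy graph, assuming networkx):

```python
# Two communities (cliques) joined by a single edge between their most junior members;
# the bridging nodes score high on betweenness but low on eigenvector centrality.
import networkx as nx

A = nx.complete_graph(5)                                      # community A: nodes 0-4
B = nx.relabel_nodes(nx.complete_graph(5), lambda n: n + 5)   # community B: nodes 5-9
G = nx.union(A, B)
G.add_node(10); G.add_node(11)                         # the two "junior" members
G.add_edge(10, 0); G.add_edge(11, 5)                   # each attached to one community
G.add_edge(10, 11)                                     # the only inter-community link

bet = nx.betweenness_centrality(G)
eig = nx.eigenvector_centrality(G, max_iter=1000)
print(sorted(bet, key=bet.get, reverse=True)[:2])      # the junior nodes 10 and 11
print(sorted(eig, key=eig.get, reverse=True)[:2])      # senior, well-connected nodes
```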
Limitations to centrality measures have led to the development of more general measures.
Two examples are
the accessibility, which uses the diversity of random walks to measure how accessible the rest of the network is from a given start node,
and the expected force, derived from the expected value of the force of infection generated by a node.
Both of these measures can be meaningfully computed from the structure of the network alone.
Nodes in a network may be partitioned into groups representing
communities. Depending on the context, communities may be distinct or
overlapping. Typically, nodes in such communities will be strongly
connected to other nodes in the same community, but weakly connected to
nodes outside the community. In the absence of a ground truth describing the community structure
of a specific network, several algorithms have been developed to infer
possible community structures using either supervised or unsupervised
clustering methods.
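As one example of an unsupervised approach, the sketch below applies a modularity-based clustering method from networkx to a standard benchmark graph:

```python
# Infer communities in a benchmark graph with modularity-based (unsupervised) clustering.
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()
communities = community.greedy_modularity_communities(G)
print([sorted(c) for c in communities])        # groups of strongly inter-connected nodes
print(community.modularity(G, communities))    # quality score of the partition
```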
Network models
Network models serve as a foundation to understanding interactions within empirical complex networks. Various random graph generation models produce network structures that may be used in comparison to real-world complex networks.
Erdős–Rényi random graph model
The Erdős–Rényi model, named for Paul Erdős and Alfréd Rényi, is used for generating random graphs in which edges are set between nodes with equal probabilities. It can be used in the probabilistic method
to prove the existence of graphs satisfying various properties, or to
provide a rigorous definition of what it means for a property to hold
for almost all graphs.
To generate an Erdős–Rényi model two parameters must be specified: the total number of nodes n and the probability p that a random pair of nodes has an edge.
Because the model is generated without bias to particular nodes,
the degree distribution is binomial: for a randomly chosen vertex $v$,
$$P(\deg(v) = k) = \binom{n-1}{k} p^{k} (1 - p)^{n-1-k}.$$
In this model the clustering coefficient is 0 a.s. The behavior of $G(n, p)$ can be broken into three regions:
Subcritical ($np < 1$): All components are simple and very small; the largest component has size $|C_1| = O(\log n)$.
Critical ($np = 1$): $|C_1| = O(n^{2/3})$.
Supercritical ($np > 1$): $|C_1| \approx yn$, where $y = y(np)$ is the positive solution to the equation $e^{-pny} = 1 - y$.
The largest connected component has high complexity. All other components are simple and small, $|C_2| = O(\log n)$.
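A minimal sketch of the three regimes, assuming networkx and arbitrary illustrative parameters:

```python
# Vary the mean degree np of G(n, p) and watch the relative size of the largest
# connected component across the subcritical, critical, and supercritical regimes.
import networkx as nx

n = 10_000
for c in (0.5, 1.0, 2.0):
    G = nx.fast_gnp_random_graph(n, c / n, seed=0)
    giant = max(nx.connected_components(G), key=len)
    print(f"np = {c}: largest component holds {len(giant) / n:.3f} of the nodes")
```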
Configuration model
The configuration model takes a degree sequence or degree distribution
(which subsequently is used to generate a degree sequence) as the
input, and produces randomly connected graphs in all respects other than
the degree sequence. This means that for a given choice of the degree
sequence, the graph is chosen uniformly at random from the set of all
graphs that comply with this degree sequence. The degree of a randomly chosen vertex is an independent and identically distributed random variable with integer values. When $\langle k^{2}\rangle - 2\langle k\rangle > 0$, the configuration graph contains the giant connected component, which has infinite size.
The rest of the components have finite sizes, which can be quantified
with the notion of the size distribution: the probability that a randomly sampled node is connected to a component of a given size is given by convolution powers of the degree distribution $u(k)$. The giant component can be destroyed by randomly removing the critical fraction $p_c$ of all edges; this process is called percolation on random networks. When the second moment of the degree distribution is finite, $\langle k^{2}\rangle < \infty$, this critical edge fraction is given by $p_c = 1 - \frac{1}{\langle k^{2}\rangle/\langle k\rangle - 1}$, and the average vertex-vertex distance $\ell$ in the giant component scales logarithmically with the total size of the network, $\ell \sim \ln N$.
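A minimal sketch of the construction, assuming networkx and an arbitrary Poisson degree sequence:

```python
# Draw a degree sequence, build a configuration-model graph that is random in every
# respect except that degree sequence, and check the size of its giant component.
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
degrees = rng.poisson(lam=3, size=5000)
if degrees.sum() % 2:                      # the total degree must be even
    degrees[0] += 1

G = nx.configuration_model(degrees, seed=0)
G = nx.Graph(G)                            # collapse multi-edges
G.remove_edges_from(nx.selfloop_edges(G))  # drop self-loops

giant = max(nx.connected_components(G), key=len)
print(len(giant) / G.number_of_nodes())
```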
In the directed configuration model, the degree of a node is given by two numbers, the in-degree $k_{\text{in}}$ and out-degree $k_{\text{out}}$, and consequently, the degree distribution is two-variate. The expected number of in-edges and out-edges coincides, so that $\langle k_{\text{in}}\rangle = \langle k_{\text{out}}\rangle$. The directed configuration model contains the giant component if and only if an inequality on the moments of the joint degree distribution is satisfied. Note that $\langle k_{\text{in}}\rangle$ and $\langle k_{\text{out}}\rangle$ are equal and therefore interchangeable in that inequality. The probability that a randomly chosen vertex belongs to a component of a given size can again be written in terms of convolution powers of the joint degree distribution.
Watts–Strogatz small world model
An initial lattice structure is used to generate a Watts–Strogatz model. Each node in the network is initially linked to its $\langle k\rangle$ closest neighbors. Another parameter is specified as the rewiring probability $p$. Each edge has a probability $p$ that it will be rewired to the graph as a random edge. The expected number of rewired links in the model is $pE = pN\langle k\rangle/2$.
As the Watts–Strogatz model begins as a non-random lattice
structure, it has a very high clustering coefficient along with a high
average path length. Each rewire is likely to create a shortcut between
highly connected clusters. As the rewiring probability increases, the
clustering coefficient decreases more slowly than the average path length. In
effect, this allows the average path length of the network to decrease
significantly with only a slight decrease in clustering coefficient.
Higher values of p force more rewired edges, which in effect makes the
Watts–Strogatz model a random network.
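A minimal sketch of this trade-off, assuming networkx and arbitrary illustrative parameters:

```python
# Sweep the Watts-Strogatz rewiring probability p and watch the average path length
# collapse while the clustering coefficient stays comparatively high.
import networkx as nx

n, k = 1000, 10                       # nodes; each node starts linked to k nearest neighbours
for p in (0.0, 0.01, 0.1, 1.0):
    G = nx.watts_strogatz_graph(n, k, p, seed=0)
    L = nx.average_shortest_path_length(G)
    C = nx.average_clustering(G)
    print(f"p = {p:<5} L = {L:6.2f}  C = {C:.3f}")
```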
Barabási–Albert (BA) preferential attachment model
The Barabási–Albert model
is a random network model used to demonstrate a preferential attachment
or a "rich-get-richer" effect. In this model, an edge is most likely
to attach to nodes with higher degrees.
The network begins with an initial network of m0 nodes. m0 ≥ 2
and the degree of each node in the initial network should be at
least 1, otherwise it will always remain disconnected from the rest of
the network.
In the BA model, new nodes are added to the network one at a time. Each new node is connected to
existing nodes with a probability that is proportional to the number of
links that the existing nodes already have. Formally, the probability $p_i$ that the new node is connected to node $i$ is
$$p_i = \frac{k_i}{\sum_j k_j},$$
where $k_i$ is the degree of node $i$.
Heavily linked nodes ("hubs") tend to quickly accumulate even more
links, while nodes with only a few links are unlikely to be chosen as
the destination for a new link. The new nodes have a "preference" to
attach themselves to the already heavily linked nodes.
The degree distribution resulting from the BA model is scale free; in
particular, for large degree it is a power law of the form:
$$P(k) \sim k^{-3}.$$
Hubs exhibit high betweenness centrality which allows short paths to
exist between nodes. As a result, the BA model tends to have very short
average path lengths. The clustering coefficient of this model also
tends to 0.
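A minimal sketch, assuming networkx and arbitrary parameters, that grows a BA graph and inspects these hallmarks:

```python
# Grow a Barabasi-Albert graph and inspect a heavy-tailed degree distribution,
# short average paths, and low clustering.
import networkx as nx

G = nx.barabasi_albert_graph(n=2000, m=3, seed=0)
degrees = [d for _, d in G.degree()]

print(max(degrees))                             # a few hubs acquire very high degree
print(nx.average_shortest_path_length(G))       # short average path length
print(nx.average_clustering(G))                 # clustering tends toward 0 as n grows
```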
The Barabási–Albert model
was developed for undirected networks, aiming to explain the
universality of the scale-free property, and applied to a wide range of
different networks and applications. The directed version of this model
is the Price model, which was developed specifically for citation networks.
In non-linear preferential attachment (NLPA), existing nodes in the
network gain new edges proportionally to the node degree raised to a
constant positive power, $\alpha$. Formally, this means that the probability that node $i$ gains a new edge is given by
$$\Pi(k_i) = \frac{k_i^{\alpha}}{\sum_j k_j^{\alpha}}.$$
If $\alpha = 1$, NLPA reduces to the BA model and is referred to as "linear". If $0 < \alpha < 1$, NLPA is referred to as "sub-linear" and the degree distribution of the network tends to a stretched exponential distribution. If $\alpha > 1$, NLPA is referred to as "super-linear" and a small number of nodes connect to almost all other nodes in the network. For both $\alpha < 1$ and $\alpha > 1$, the scale-free property of the network is broken in the limit of infinite system size. However, if $\alpha$ is only slightly larger than $1$, NLPA may result in degree distributions which appear to be transiently scale free.
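A minimal sketch of NLPA with one edge per new node (a simplification of the general case), written in plain Python with networkx used only to store the graph:

```python
# Non-linear preferential attachment: each new node attaches to an existing node i
# with probability proportional to k_i ** alpha (alpha = 1 recovers the linear rule).
import random
import networkx as nx

def nlpa_graph(n, alpha, seed=None):
    rng = random.Random(seed)
    G = nx.Graph([(0, 1)])                       # small seed network
    for new in range(2, n):
        nodes = list(G.nodes())
        weights = [G.degree(v) ** alpha for v in nodes]
        target = rng.choices(nodes, weights=weights, k=1)[0]
        G.add_edge(new, target)
    return G

for alpha in (0.5, 1.0, 1.5):
    G = nlpa_graph(2000, alpha, seed=0)
    print(alpha, max(d for _, d in G.degree()))  # hub sizes grow sharply with alpha
```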
Mediation-driven attachment (MDA) model
In the mediation-driven attachment (MDA) model, a new node coming with $m$ edges picks an existing connected node at random and then connects itself not with that one but with $m$ of its neighbors, also chosen at random. The probability $\Pi(i)$ that node $i$ of the existing nodes is picked is
$$\Pi(i) = \frac{k_i}{N}\,\frac{\sum_{j=1}^{k_i} \frac{1}{k_j}}{k_i}.$$
The factor $\left(\sum_{j=1}^{k_i} \frac{1}{k_j}\right)/k_i$ is the inverse of the harmonic mean
(IHM) of the degrees of the $k_i$ neighbors of node $i$. Extensive numerical investigation suggests that for $m$ above an approximate threshold value, the mean IHM value in the large-$N$ limit becomes a constant, which means $\Pi(i) \propto k_i$.
It implies that the more links (degree) a node has, the higher its chance of gaining more links, since they can be
reached in a larger number of ways through mediators, which essentially
embodies the intuitive idea of the rich-get-richer mechanism (or the preferential attachment rule
of the Barabási–Albert model). Therefore, the MDA network can be seen to follow
the PA rule, but in disguise.
However, for $m = 1$ it describes a winner-takes-all mechanism, as we find that almost all of the total nodes have degree one and one node is super-rich in degree. As the value of $m$ increases, the disparity between the super-rich and the poor decreases, and above the threshold value of $m$ we find a transition from a rich-get-super-richer to a rich-get-richer mechanism.
Fitness model
Another model where the key ingredient is the nature of the vertex has been introduced by Caldarelli et al. Here a link is created between two vertices $i, j$ with a probability given by a linking function $f(\eta_i, \eta_j)$ of the fitnesses of the vertices involved.
The degree of a vertex $i$ is given by
$$k(\eta_i) = N \int_0^{\infty} f(\eta_i, \eta_j)\,\rho(\eta_j)\,d\eta_j.$$
If $k(\eta_i)$ is an invertible and increasing function of $\eta_i$, then
the probability distribution $P(k)$ is given by
$$P(k) = \rho(\eta(k))\,\eta'(k).$$
As a result, if the fitnesses $\eta$ are distributed as a power law, then the node degrees follow a power law as well.
Less intuitively, with a fast decaying probability distribution such as $\rho(\eta) = e^{-\eta}$,
together with a linking function of the kind
$$f(\eta_i, \eta_j) = \Theta(\eta_i + \eta_j - Z),$$
with $Z$ a constant and $\Theta$ the Heaviside function, we also obtain
scale-free networks.
Such a model has been successfully applied to describe trade between nations by using GDP as fitness for the various nodes and a linking function of the kind
$$f(\eta_i, \eta_j) = \frac{\delta\,\eta_i\eta_j}{1 + \delta\,\eta_i\eta_j}.$$
Exponential random graph models (ERGM)
We adopt the notation in which a random graph $Y$ is represented via a set of $n$ nodes and a collection of tie variables $\{Y_{ij} : i, j = 1, \ldots, n\}$, indexed by pairs of nodes $(i, j)$, where $Y_{ij} = 1$ if the nodes $(i, j)$ are connected by an edge and $Y_{ij} = 0$ otherwise.
The basic assumption of ERGMs is that the structure in an observed graph $y$ can be explained by a given vector of sufficient statistics $s(y)$, which are a function of the observed network and, in some cases, nodal attributes. The probability of a graph $y$ in an ERGM is defined by:
$$P(Y = y \mid \theta) = \frac{\exp\!\big(\theta^{T} s(y)\big)}{c(\theta)},$$
where $\theta$ is a vector of model parameters associated with $s(y)$ and $c(\theta)$ is a normalising constant.
Since the 1970s, the empirical study of networks has played a central role in social science, and many of the mathematical and statistical tools used for studying networks have been first developed in sociology. Amongst many other applications, social network analysis has been used to understand the diffusion of innovations, news and rumors. Similarly, it has been used to examine the spread of both diseases and health-related behaviors. It has also been applied to the study of markets, where it has been used to examine the role of trust in exchange relationships and of social mechanisms in setting prices. Similarly, it has been used to study recruitment into political movements, armed groups,
and other social organizations. It has also been used to conceptualize
scientific disagreements as well as academic prestige. In the second
language acquisition literature, it has an established history in study
abroad research, revealing how peer learner interaction networks
influence their language progress. More recently, network analysis (and its close cousin traffic analysis) has gained a significant use in military intelligence, for uncovering insurgent networks of both hierarchical and leaderless nature. In criminology,
it is being used to identify influential actors in criminal gangs,
offender movements, co-offending, predict criminal activities and make
policies.
Dynamic network analysis
Dynamic network analysis
examines the shifting structure of relationships among different
classes of entities in complex socio-technical systems, and
reflects social stability and changes such as the emergence of new
groups, topics, and leaders. Dynamic Network Analysis focuses on meta-networks composed of multiple types of nodes (entities) and multiple types of links.
These entities can be highly varied. Examples include people,
organizations, topics, resources, tasks, events, locations, and beliefs.
Dynamic network techniques are particularly useful for assessing
trends and changes in networks over time, identification of emergent
leaders, and examining the co-evolution of people and ideas.
Biological network analysis
With
the recent explosion of publicly available high throughput biological
data, the analysis of molecular networks has gained significant
interest. The type of analysis in this context is closely related to
social network analysis, but often focuses on local patterns in the
network. For example, network motifs are small subgraphs that are over-represented in the network. Activity motifs
are patterns in the attributes of nodes and
edges in the network that are over-represented given the network
structure. The analysis of biological networks has led to the development of network medicine, which looks at the effect of diseases in the interactome.
Pandemic analysis
The SIR model is one of the most well-known algorithms for predicting the spread of global pandemics within an infectious population.
Susceptible to infected
$$S \rightarrow I: \qquad \beta I$$
The formula above describes the "force" of infection for each susceptible unit in an infectious population, where $\beta$ is equivalent to the transmission rate of said disease.
To track the change of those susceptible in an infectious population:
$$\frac{dS}{dt} = -\beta S I$$
Infected to recovered
$$\frac{dI}{dt} = \beta S I - \gamma I$$
Over time, the number of those infected fluctuates by the specified rate of recovery (represented by $\gamma$ but usually expressed as one over the average infectious period $\tau$), the number of infectious individuals, $I$, and the change in time.
Infectious period
Whether a population will be overcome by a pandemic, with regard to the SIR model, is dependent on the value of $R_0 = \beta/\gamma = \beta\tau$, the "average number of people infected by an infected individual."
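A minimal sketch that integrates these equations with a simple Euler step; the parameter values are purely illustrative:

```python
# Integrate dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I.
beta, gamma = 0.3, 0.1          # transmission rate and recovery rate (R0 = beta/gamma = 3)
S, I, R = 0.99, 0.01, 0.0       # fractions of the population
dt, steps = 0.1, 2000

peak_I = I
for _ in range(steps):
    dS = -beta * S * I
    dI = beta * S * I - gamma * I
    dR = gamma * I
    S, I, R = S + dS * dt, I + dI * dt, R + dR * dt
    peak_I = max(peak_I, I)

print(f"epidemic peak: {peak_I:.2f}, final share ever infected: {R + I:.2f}")
```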
Web link analysis
Several Web search ranking algorithms use link-based centrality metrics, including (in order of appearance) Marchiori's Hyper Search, Google's PageRank, Kleinberg's HITS algorithm, and the CheiRank and TrustRank
algorithms. Link analysis is also conducted in information science and
communication science in order to understand and extract information
from the structure of collections of web pages. For example, the
analysis might be of the interlinking between politicians' web sites or
blogs. Another use is for classifying pages according to their mention
in other pages.
PageRank
PageRank
works by randomly picking "nodes" or websites and then with a certain
probability, "randomly jumping" to other nodes. By randomly jumping to
these other nodes, it helps PageRank completely traverse the network as
some webpages exist on the periphery and would not as readily be
assessed.
Each node $u$ has a PageRank defined by the sum, over the pages $v$ that link to $u$, of one over the number of outlinks (the "out-degree") of $v$ times the "importance" or PageRank of $v$:
$$PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)},$$
where $B_u$ is the set of pages linking to $u$ and $L(v)$ is the out-degree of $v$.
Random jumping
As
explained above, PageRank enlists random jumps in attempts to assign
PageRank to every website on the internet. These random jumps find
websites that might not be found during the normal search methodologies
such as breadth-first search and depth-first search.
An improvement over the aforementioned formula for determining
PageRank includes adding these random jump components. Without the
random jumps, some pages would receive a PageRank of 0, which would not
be desirable.
The first is $\alpha$, the probability that a random jump will occur. Contrasting is the "damping factor", or $1 - \alpha$.
Another way of looking at it:
$$PR(u) = \frac{\alpha}{N} + (1 - \alpha) \sum_{v \in B_u} \frac{PR(v)}{L(v)}.$$
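A minimal sketch, assuming networkx and an invented five-edge web graph, that computes PageRank both with the library and by direct iteration of the damped formula above:

```python
# PageRank on a tiny directed graph, once with networkx and once by power iteration
# of PR(u) = alpha/N + (1 - alpha) * sum_v PR(v)/L(v).
import networkx as nx

G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")])
alpha_jump = 0.15                                  # probability of a random jump
print(nx.pagerank(G, alpha=1 - alpha_jump))        # networkx's damping factor = 1 - alpha_jump

N = G.number_of_nodes()
pr = {v: 1 / N for v in G}
for _ in range(100):
    pr = {
        u: alpha_jump / N
        + (1 - alpha_jump) * sum(pr[v] / G.out_degree(v) for v in G.predecessors(u))
        for u in G
    }
print(pr)                                          # close to the networkx values
```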
Centrality measures
Information about the relative importance of nodes and edges in a graph can be obtained through centrality measures, widely used in disciplines like sociology.
Centrality measures are essential when a network analysis has to answer
questions such as: "Which nodes in the network should be targeted to
ensure that a message or information spreads to all or most nodes in the
network?" or conversely, "Which nodes should be targeted to curtail the
spread of a disease?". Formally established measures of centrality are degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, and katz centrality. The objective of network analysis generally determines the type of centrality measure(s) to be used.
Degree centrality of a node in a network is the number of links (edges) incident on the node.
Closeness centrality determines how "close" a node is to
other nodes in a network by measuring the sum of the shortest distances
(geodesic paths) between that node and all other nodes in the network.
Betweenness centrality determines the relative importance of a
node by measuring the amount of traffic flowing through that node to
other nodes in the network. This is done by measuring the fraction of
paths connecting all pairs of nodes and containing the node of interest.
Group Betweenness centrality measures the amount of traffic flowing
through a group of nodes.
Eigenvector centrality is a more sophisticated version of
degree centrality where the centrality of a node not only depends on the
number of links incident on the node but also the quality of those
links. This quality factor is determined by the eigenvectors of the
adjacency matrix of the network.
Katz centrality of a node is measured by summing the geodesic
paths between that node and all (reachable) nodes in the network.
These paths are weighted, paths connecting the node with its immediate
neighbors carry higher weights than those which connect with nodes
farther away from the immediate neighbors.
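The sketch below evaluates the five measures listed above on a standard benchmark graph, assuming networkx (the Katz attenuation factor is an arbitrary choice below the reciprocal of the largest adjacency eigenvalue):

```python
# The five centrality measures listed above, computed for node 0 of a benchmark graph.
import networkx as nx

G = nx.karate_club_graph()
print(nx.degree_centrality(G)[0])
print(nx.closeness_centrality(G)[0])
print(nx.betweenness_centrality(G)[0])
print(nx.eigenvector_centrality(G)[0])
print(nx.katz_centrality(G, alpha=0.05)[0])   # alpha must stay below 1/lambda_max
```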
The SIR model
In
1927, W. O. Kermack and A. G. McKendrick created a model in which they
considered a fixed population with only three compartments: susceptible, $S(t)$; infected, $I(t)$; and recovered, $R(t)$. The compartments used for this model consist of three classes:
$S(t)$ is used to represent the number of individuals not yet infected with the disease at time t, or those susceptible to the disease
$I(t)$ denotes the number of individuals who have been infected with the
disease and are capable of spreading the disease to those in the
susceptible category
$R(t)$ is the compartment used for those individuals who have been infected
and then recovered from the disease. Those in this category are not
able to be infected again or to transmit the infection to others.
The flow of this model may be considered as follows:
$$S \rightarrow I \rightarrow R$$
Using a fixed population, $N = S(t) + I(t) + R(t)$, Kermack and McKendrick derived the following equations:
$$\frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \gamma I, \qquad \frac{dR}{dt} = \gamma I.$$
Several assumptions were made in the formulation of these equations:
First, an individual in the population must be considered as having an
equal probability as every other individual of contracting the disease
with a rate of ,
which is considered the contact or infection rate of the disease.
Therefore, an infected individual makes contact and is able to transmit
the disease with others per unit time and the fraction of contacts by an infected with a susceptible is . The number of new infections in unit time per infective then is , giving the rate of new infections (or those leaving the susceptible category) as
(Brauer & Castillo-Chavez, 2001). For the second and third
equations, consider the population leaving the susceptible class as
equal to the number entering the infected class. However, infectives
are leaving this class per unit time to enter the recovered/removed
class at a rate $\gamma I$ per unit time (where $\gamma$ represents the mean recovery rate, and $1/\gamma$ the mean infective period). These processes which occur simultaneously are referred to as the Law of Mass Action,
a widely accepted idea that the rate of contact between two groups in a
population is proportional to the size of each of the groups concerned
(Daley & Gani, 2005). Finally, it is assumed that the rate of
infection and recovery is much faster than the time scale of births and
deaths and therefore, these factors are ignored in this model.
More can be read on this model on the Epidemic model page.
The master equation approach
A master equation
can express the behaviour of an undirected growing network where, at
each time step, a new node is added to the network, linked to an old
node (randomly chosen and without preference). The initial network is
formed by two nodes and two links between them at time $t = 2$; this configuration is necessary only to simplify further calculations, so at time $t$ the network will have $t$ nodes and $t$ links.
The master equation for this network is:
$$p(k, s, t+1) = \frac{1}{t}\,p(k-1, s, t) + \left(1 - \frac{1}{t}\right) p(k, s, t),$$
where $p(k, s, t)$ is the probability to have the node $s$ with degree $k$ at time $t$, and $s$ is the time step when this node was added to the network. Note that there are only two ways for an old node $s$ to have $k$ links at time $t + 1$:
The node $s$ has degree $k - 1$ at time $t$ and will be linked by the new node with probability $1/t$
The node $s$ already has degree $k$ at time $t$ and will not be linked by the new node.
After simplifying this model, the degree distribution is
$$P(k) = 2^{-k}.$$
Based on this growing network, an epidemic model is developed
following a simple rule: Each time the new node is added and after
choosing the old node to link, a decision is made: whether or not this
new node will be infected. The master equation for this epidemic model
is:
where represents the decision to infect () or not (). Solving this master equation, the following solution is obtained:
Multilayer networks are networks with multiple kinds of relations.
Attempts to model real-world systems as multidimensional networks have
been used in various fields such as social network analysis,
economics, history, urban and international transport, ecology,
psychology, medicine, biology, commerce, climatology, physics,
computational neuroscience, operations management, and finance.
Interdependent networks
are networks where the functioning of nodes in one network depends on
the functioning of nodes in another network. In nature, networks rarely
appear in isolation; rather, networks are typically elements of
larger systems and interact with elements in that complex system. Such
complex dependencies can have non-trivial effects on one another. A well-studied example is the interdependency of infrastructure networks:
the power stations which form the nodes of the power grid require fuel
delivered via a network of roads or pipes, and are also controlled via
the nodes of a communications network. Though the transportation network
does not depend on the power network to function, the communications
network does. In such infrastructure networks, the dysfunction of a
critical number of nodes in either the power network or the
communication network can lead to cascading failures across the system,
with potentially catastrophic results for the functioning of the whole system.
If the two networks were treated in isolation, this important feedback
effect would not be seen and predictions of network robustness would be
greatly overestimated.
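A rough sketch of such a cascade, using an invented toy rule rather than any specific published model, with networkx and arbitrary parameters:

```python
# Toy cascade in two interdependent networks: each node in A depends on its
# same-labelled node in B and vice versa, and a node also fails when it drops out
# of the giant component of its own layer.
import networkx as nx

def giant(G):
    return max(nx.connected_components(G), key=len) if G.number_of_nodes() else set()

A = nx.erdos_renyi_graph(200, 0.03, seed=1)     # e.g. power grid layer
B = nx.erdos_renyi_graph(200, 0.03, seed=2)     # e.g. communications layer

A.remove_nodes_from(range(20))                  # initial failure of 20 power nodes
changed = True
while changed:
    changed = False
    for G, H in ((A, B), (B, A)):
        dead = set(G.nodes()) - set(H.nodes())          # lost interdependent support
        dead |= set(G.nodes()) - giant(G)               # detached from own giant component
        if dead:
            G.remove_nodes_from(dead)
            changed = True

print(len(A), len(B))                            # surviving nodes after the cascade
```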