A gene (or genetic) regulatory network (GRN) is a collection of molecular regulators that interact with each other and with other substances in the cell to govern the gene expression levels of mRNA and proteins. These play a central role in morphogenesis, the creation of body structures, which in turn is central to evolutionary developmental biology (evo-devo).
The regulator can be DNA, RNA, protein
and complexes of these. The interaction can be direct or indirect
(through transcribed RNA or translated protein). In general, each mRNA
molecule goes on to make a specific protein (or set of proteins). In
some cases this protein will be structural,
and will accumulate at the cell membrane or within the cell to give it
particular structural properties. In other cases the protein will be an enzyme,
i.e., a micro-machine that catalyses a certain reaction, such as the
breakdown of a food source or toxin. Some proteins though serve only to
activate other genes, and these are the transcription factors that are the main players in regulatory networks or cascades. By binding to the promoter
region at the start of other genes they turn them on, initiating the
production of another protein, and so on. Some transcription factors are
inhibitory.
In single-celled organisms, regulatory networks respond to the
external environment, optimizing the cell at a given time for survival
in this environment. Thus a yeast cell, finding itself in a sugar
solution, will turn on genes to make enzymes that process the sugar to
alcohol.
This process, which we associate with wine-making, is how the yeast
cell makes its living, gaining energy to multiply, which under normal
circumstances would enhance its survival prospects.
In multicellular animals the same principle has been put in the service of gene cascades that control body-shape.
Each time a cell divides, two cells result which, although they contain
the same genome in full, can differ in which genes are turned on and
making proteins. Sometimes a 'self-sustaining feedback loop' ensures
that a cell maintains its identity and passes it on. Less understood is
the mechanism of epigenetics by which chromatin
modification may provide cellular memory by blocking or allowing
transcription. A major feature of multicellular animals is the use of morphogen
gradients, which in effect provide a positioning system that tells a
cell where in the body it is, and hence what sort of cell to become. A
gene that is turned on in one cell may make a product that leaves the
cell and diffuses
through adjacent cells, entering them and turning on genes only when it
is present above a certain threshold level. These cells are thus
induced into a new fate, and may even generate other morphogens that signal back to the original cell. Over longer distances morphogens may use the active process of signal transduction. Such signalling controls embryogenesis, the building of a body plan from scratch through a series of sequential steps. It also controls and maintains adult bodies through feedback processes, and the loss of such feedback because of a mutation can be responsible for the cell proliferation that is seen in cancer. In parallel with this process of building structure, the gene cascade turns on genes that make structural proteins that give each cell the physical properties it needs.
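The threshold reading of a morphogen gradient can be illustrated with a toy calculation. The sketch below assumes an exponentially decaying concentration profile and a single fixed threshold; the function names, the decay length and the threshold value are illustrative assumptions, not measured quantities.

```python
import math

def morphogen_concentration(distance, source_level=1.0, decay_length=5.0):
    """Assumed exponential gradient: the level falls off with distance from the source cell."""
    return source_level * math.exp(-distance / decay_length)

def cell_fate(distance, threshold=0.3):
    """A cell turns on the target gene only where the morphogen exceeds the threshold."""
    if morphogen_concentration(distance) > threshold:
        return "induced"
    return "default"

# Fates of a row of cells at distances 0, 1, ..., 11 from the source.
fates = [cell_fate(d) for d in range(12)]
print(fates)
```

Cells close to the source end up "induced" and cells further away keep their "default" fate, so a smooth gradient is converted into a sharp boundary between two cell types.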
Overview
At
one level, biological cells can be thought of as "partially mixed bags"
of biological chemicals – in the discussion of gene regulatory networks,
these chemicals are mostly the messenger RNAs (mRNAs) and proteins
that arise from gene expression. These mRNA and proteins interact with
each other with various degrees of specificity. Some diffuse around the
cell. Others are bound to cell membranes,
interacting with molecules in the environment. Still others pass
through cell membranes and mediate long range signals to other cells in a
multi-cellular organism. These molecules and their interactions
make up a gene regulatory network. A typical gene regulatory network can be drawn as a diagram of nodes and edges.
The nodes of this network can represent genes, proteins, mRNAs,
protein/protein complexes or cellular processes. Nodes that are depicted
as lying along vertical lines are associated with the cell/environment
interfaces, while the others are free-floating and can diffuse.
Edges between nodes represent interactions between the nodes, which can
correspond to individual molecular reactions between DNA, mRNA, miRNA,
proteins or molecular processes through which the products of one gene
affect those of another, though the lack of experimentally obtained
information often implies that some reactions are not modeled at such a
fine level of detail. These interactions can be inductive (usually
represented by arrowheads or the + sign), with an increase in the
concentration of one leading to an increase in the other, inhibitory
(represented with filled circles, blunt arrows or the minus sign), with
an increase in one leading to a decrease in the other, or dual, when,
depending on the circumstances, the regulator can activate or inhibit the
target node. The nodes can regulate themselves directly or indirectly,
creating feedback loops, which form cyclic chains of dependencies in the
topological network. The network structure is an abstraction of the
system's molecular or chemical dynamics, describing the manifold ways in
which one substance affects all the others to which it is connected. In
practice, such GRNs are inferred from the biological literature on a
given system and represent a distillation of the collective knowledge
about a set of related biochemical reactions. To speed up the manual
curation of GRNs, some recent efforts try to use text mining, curated
databases, network inference from massive data, model checking and other
information extraction technologies for this purpose.
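The graph view described above, with signed edges and feedback loops, can be sketched in a few lines. The three-gene network below is hypothetical, and the helper names are assumptions made for illustration; the sign of a loop is taken as the product of the signs of its edges.

```python
# Hypothetical network: (regulator, target) -> +1 for activation, -1 for inhibition.
edges = {
    ("A", "B"): +1,   # A activates B
    ("B", "C"): +1,   # B activates C
    ("C", "A"): -1,   # C inhibits A, closing a feedback loop
    ("C", "C"): +1,   # C activates itself (direct self-regulation)
}

def successors(node):
    """Targets directly regulated by the given node."""
    return [target for (source, target) in edges if source == node]

def find_feedback_loops():
    """Enumerate simple cycles by depth-first search.
    Each cycle is reported once, starting from its alphabetically smallest node."""
    loops = []

    def walk(start, node, path):
        for nxt in successors(node):
            if nxt == start:
                loops.append(path[:])
            elif nxt > start and nxt not in path:
                walk(start, nxt, path + [nxt])

    for start in sorted({n for edge in edges for n in edge}):
        walk(start, start, [start])
    return loops

def loop_sign(loop):
    """Product of edge signs around the loop: +1 positive feedback, -1 negative feedback."""
    sign = 1
    for i, node in enumerate(loop):
        sign *= edges[(node, loop[(i + 1) % len(loop)])]
    return sign

loops = find_feedback_loops()
for loop in loops:
    print(loop, "positive" if loop_sign(loop) > 0 else "negative")
```

In this toy network the chain A, B, C forms a negative feedback loop, while the self-activation of C is a positive one.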
Genes can be viewed as nodes in the network, with input being proteins such as transcription factors, and outputs being the level of gene expression.
The value of the node depends on a function of the values
of its regulators at previous time steps (in the Boolean network
described below these are Boolean functions, typically AND, OR, and NOT). These functions have been interpreted as performing a kind of information processing
within the cell, which determines cellular behavior. The basic drivers
within cells are concentrations of some proteins, which determine both
spatial (location within the cell or tissue) and temporal (cell cycle or
developmental stage) coordinates of the cell, as a kind of "cellular
memory". The gene networks are only beginning to be understood, and it
is a next step for biology to attempt to deduce the functions for each
gene "node", to help understand the behavior of the system in increasing levels of complexity, from gene to signaling pathway, cell or tissue level.
Mathematical models
of GRNs have been developed to capture the behavior of the system being
modeled, and in some cases generate predictions corresponding with
experimental observations. In some other cases, models have made
accurate novel predictions that can be tested experimentally,
suggesting new approaches that might not otherwise be considered
when designing an experimental protocol. Modeling techniques include ordinary differential equations (ODEs), Boolean networks, Petri nets, Bayesian networks, graphical Gaussian models, stochastic models, and process calculi. Conversely, techniques have been proposed for generating models of GRNs that best explain a set of time series
observations. Recently it has been shown that ChIP-seq signals of
histone modifications are more strongly correlated with transcription factor
motifs at promoters than RNA levels are.
Hence it has been proposed that time-series histone modification ChIP-seq
could provide more reliable inference of gene regulatory networks
than methods based on expression levels.
Structure and evolution
Global feature
Gene regulatory networks are generally thought to be made up of a few highly connected nodes (hubs) and many poorly connected nodes nested within a hierarchical regulatory regime. Thus gene regulatory networks approximate a hierarchical scale-free network topology. This is consistent with the view that most genes have limited pleiotropy and operate within regulatory modules. This structure is thought to evolve through the preferential attachment of duplicated genes to more highly connected genes. Recent work has also shown that natural selection tends to favor networks with sparse connectivity.
There are primarily two ways that networks can evolve, both of
which can occur simultaneously. The first is that network topology can
be changed by the addition or subtraction of nodes (genes), or that parts of
the network (modules) may be expressed in different contexts. The Drosophila Hippo signaling pathway provides a good example. The Hippo signaling pathway controls both mitotic growth and post-mitotic cellular differentiation.
Recently it was found that the network the Hippo signaling pathway
operates in differs between these two functions which in turn changes
the behavior of the Hippo signaling pathway. This suggests that the
Hippo signaling pathway operates as a conserved regulatory module that
can be used for multiple functions depending on context.
Thus, changing network topology can allow a conserved module to serve
multiple functions and alter the final output of the network. The second
way networks can evolve is by changing the strength of interactions
between nodes, such as how strongly a transcription factor may bind to a
cis-regulatory element. Such variation in the strength of network edges has
been shown to underlie between-species variation in vulval cell fate
patterning of Caenorhabditis worms.
Local feature
Another widely cited characteristic of gene regulatory networks is their abundance of certain repetitive sub-networks known as network motifs.
Network motifs can be regarded as repetitive topological patterns when
dividing a big network into small blocks. Previous analysis found
several types of motifs that appeared more often in gene regulatory
networks than in randomly generated networks.
As an example, one such motif is the feed-forward loop, which
consists of three nodes. This motif is the most abundant among all possible
motifs made up of three nodes, as is shown in the gene regulatory
networks of fly, nematode, and human.
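Detecting a three-node motif amounts to searching the network for a particular wiring pattern. A minimal sketch, assuming a feed-forward loop is defined as three distinct nodes A, B, C with edges A to B, B to C and A to C, is given below; the example network and the function name are hypothetical. Judging enrichment would additionally require comparing the count against randomized networks, which is not shown here.

```python
from itertools import permutations

# Hypothetical directed network: regulator -> set of direct targets.
network = {
    "X": {"Y", "Z"},
    "Y": {"Z"},
    "Z": {"W"},
    "W": set(),
}

def feed_forward_loops(graph):
    """Return every ordered triple (A, B, C) with edges A->B, B->C and A->C."""
    found = []
    for a, b, c in permutations(graph, 3):
        if b in graph[a] and c in graph[b] and c in graph[a]:
            found.append((a, b, c))
    return found

ffls = feed_forward_loops(network)
print(ffls)
```

The brute-force search over all ordered triples is cubic in the number of nodes, which is adequate for a small illustration but not for genome-scale networks.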
The enriched motifs have been proposed to follow convergent evolution, suggesting they are "optimal designs" for certain regulatory purposes.
For example, modeling shows that feed-forward loops are able to
coordinate the change in node A (in terms of concentration and activity)
and the expression dynamics of node C, creating different input-output
behaviors. The galactose utilization system of E. coli contains a feed-forward loop which accelerates the activation of the galactose utilization operon galETK, potentially facilitating the metabolic transition to galactose when glucose is depleted. The feed-forward loop in the arabinose utilization system of E. coli
delays the activation of the arabinose catabolism operon and transporters,
potentially avoiding unnecessary metabolic transitions due to temporary
fluctuations in upstream signaling pathways. Similarly, in the Wnt signaling pathway of Xenopus,
the feed-forward loop acts as a fold-change detector that responds to
the fold change, rather than the absolute change, in the level of
β-catenin, potentially increasing the resistance to fluctuations in
β-catenin levels. Following the convergent evolution hypothesis, the enrichment of feed-forward loops would be an adaptation
for fast response and noise resistance. A recent study found that
yeast grown in an environment of constant glucose developed mutations in
glucose signaling pathways and growth regulation pathways, suggesting that
regulatory components responding to environmental changes are
dispensable in a constant environment.
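The delaying behavior attributed to some feed-forward loops can be reproduced in a toy model. The sketch below assumes a loop in which A activates B, and C is produced only when A is on AND B has exceeded a threshold; all rate constants are set to 1 and the threshold to 0.5 purely for illustration, and the equations are integrated with a simple Euler scheme. It is not a model of any specific operon.

```python
def time_to_half_max(use_feed_forward, dt=0.01, t_end=10.0, threshold_b=0.5):
    """Input A switches on at t = 0 and stays on.
    Returns the time at which C first reaches 0.5 (half of its maximum level).
    With use_feed_forward=True, C is produced only once B exceeds threshold_b."""
    b = c = 0.0
    steps = 0
    while steps * dt < t_end:
        b_ok = (b > threshold_b) or not use_feed_forward
        production_c = 1.0 if b_ok else 0.0
        b += dt * (1.0 - b)            # dB/dt = 1 - B  (A is on)
        c += dt * (production_c - c)   # dC/dt = production - C
        steps += 1
        if c >= 0.5:
            return steps * dt
    return None

direct = time_to_half_max(use_feed_forward=False)   # C driven by A alone
delayed = time_to_half_max(use_feed_forward=True)   # C needs A AND B
print(direct, delayed)
```

Under these assumptions C responds later when it has to wait for B to accumulate, so a brief pulse of A that ends before B crosses its threshold would produce no C at all.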
On the other hand, some researchers hypothesize that the enrichment of network motifs is non-adaptive.
In other words, gene regulatory networks can evolve to a similar
structure without the specific selection on the proposed input-output
behavior. Support for this hypothesis often comes from computational
simulations. For example, fluctuations in the abundance of feed-forward
loops in a model that simulates the evolution of gene regulatory
networks by randomly rewiring nodes may suggest that the enrichment of
feed-forward loops is a side-effect of evolution.
In another model of gene regulatory network evolution, the ratio of the
frequencies of gene duplication and gene deletion shows great influence
on network topology: certain ratios lead to the enrichment of
feed-forward loops and create networks that show features of
hierarchical scale-free networks.
Bacterial regulatory networks
Regulatory networks allow bacteria to adapt to almost every environmental niche on earth.
A network of interactions among diverse types of molecules including
DNA, RNA, proteins and metabolites, is utilised by the bacteria to
achieve regulation of gene expression. In bacteria, the principal
function of regulatory networks is to control the response to
environmental changes, for example nutritional status and environmental
stress. A complex organization of networks permits the microorganism to coordinate and integrate multiple environmental signals.
Modeling
Coupled ordinary differential equations
It is common to model such a network with a set of coupled ordinary differential equations (ODEs) or stochastic differential equations (SDEs), describing the reaction kinetics of the constituent parts. Suppose that our regulatory network has $N$ nodes, and let $S_1(t), S_2(t), \ldots, S_N(t)$ represent the concentrations of the $N$ corresponding substances at time $t$. Then the temporal evolution of the system can be described approximately by
$$\frac{dS_j}{dt} = f_j(S_1, S_2, \ldots, S_N)$$
where the functions $f_j$ express the dependence of $S_j$ on the concentrations of other substances present in the cell. The functions $f_j$ are ultimately derived from basic principles of chemical kinetics or simple expressions derived from these, e.g. Michaelis-Menten enzymatic kinetics. Hence, the functional forms of the $f_j$ are usually chosen as low-order polynomials or Hill functions that serve as an ansatz for the real molecular dynamics. Such models are then studied using the mathematics of nonlinear dynamics. System-specific information, like reaction rate constants and sensitivities, is encoded as constant parameters.
By solving for the fixed point of the system:
$$\frac{dS_j}{dt} = 0$$
for all $j$,
one obtains (possibly several) concentration profiles of proteins and
mRNAs that are theoretically sustainable (though not necessarily stable). Steady states of kinetic equations thus correspond to potential cell types, and oscillatory solutions to the above equation to naturally cyclic cell types. Mathematical stability of these attractors can usually be characterized by linearizing the equations around the critical points (the signs of the real parts of the eigenvalues of the Jacobian), and then corresponds to biochemical stability of the concentration profile. Critical points and bifurcations
in the equations correspond to critical cell states in which small
state or parameter perturbations could switch the system between one of
several stable differentiation fates. Trajectories correspond to the
unfolding of biological pathways and transients of the equations to
short-term biological events. For a more mathematical discussion, see
the articles on non-linearity, dynamical systems, bifurcation theory, and chaos theory.
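The correspondence between stable steady states and alternative cell fates can be illustrated with a two-gene example. The sketch below assumes two mutually repressing genes whose production follows a Hill function and whose degradation is linear; the parameter values (maximum rate 4, Hill coefficient 2), the Euler integration scheme and the function names are assumptions chosen for illustration only.

```python
def hill_repression(x, max_rate=4.0, n=2):
    """Production rate of a gene repressed by a regulator at concentration x."""
    return max_rate / (1.0 + x ** n)

def simulate(s1, s2, dt=0.01, steps=5000):
    """Euler integration of
         dS1/dt = hill_repression(S2) - S1
         dS2/dt = hill_repression(S1) - S2
    starting from the given concentrations; returns the final state."""
    for _ in range(steps):
        ds1 = hill_repression(s2) - s1
        ds2 = hill_repression(s1) - s2
        s1, s2 = s1 + dt * ds1, s2 + dt * ds2
    return s1, s2

# Two slightly different initial conditions end in different steady states.
state_a = simulate(1.5, 1.0)
state_b = simulate(1.0, 1.5)
print(state_a, state_b)
```

With these parameters the system is bistable: whichever gene starts slightly ahead ends up highly expressed while the other is repressed, a simple analogue of a switch between two differentiation fates.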
Boolean network
The following example illustrates how a Boolean network
can model a GRN together with its gene products (the outputs) and the
substances from the environment that affect it (the inputs). Stuart Kauffman was among the first biologists to use the metaphor of Boolean networks to model genetic regulatory networks.
- Each gene, each input, and each output is represented by a node in a directed graph in which there is an arrow from one node to another if and only if there is a causal link between the two nodes.
- Each node in the graph can be in one of two states: on or off.
- For a gene, "on" corresponds to the gene being expressed; for inputs and outputs, "on" corresponds to the substance being present.
- Time is viewed as proceeding in discrete steps. At each step, the new state of a node is a Boolean function of the prior states of the nodes with arrows pointing towards it.
The validity of the model can be tested by comparing simulation
results with time series observations. A partial validation of a Boolean
network model can also come from testing the predicted existence of a
yet unknown regulatory connection between two particular transcription
factors that each are nodes of the model.
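The rules listed above translate directly into a short simulation. The sketch below uses a hypothetical network of one environmental input and three genes, updates all nodes synchronously, and follows the trajectory until a state repeats; the update rules and helper names are assumptions made for illustration.

```python
# Hypothetical update rules: each node's next state is a Boolean function
# of the current states of the nodes that regulate it.
rules = {
    "input": lambda s: s["input"],               # environmental signal, held constant
    "A": lambda s: s["input"] and not s["C"],    # A needs the input and is repressed by C
    "B": lambda s: s["A"],                       # B is activated by A
    "C": lambda s: s["A"] and s["B"],            # C needs both A AND B
}

def step(state):
    """Synchronous update: every node is recomputed from the previous state."""
    return {node: bool(rule(state)) for node, rule in rules.items()}

def find_attractor(state):
    """Iterate until a state repeats; return the repeating cycle of states."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

start_on = {"input": True, "A": False, "B": False, "C": False}
start_off = {"input": False, "A": False, "B": False, "C": False}
cycle_on = find_attractor(start_on)
cycle_off = find_attractor(start_off)
print(len(cycle_on), len(cycle_off))
```

With the input present, the negative feedback from C onto A makes the network cycle through a sequence of states; without the input, it settles into a single fixed state with all genes off.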
Continuous networks
Continuous
network models of GRNs are an extension of the Boolean networks
described above. Nodes still represent genes and connections between
them regulatory influences on gene expression. Genes in biological
systems display a continuous range of activity levels and it has been
argued that using a continuous representation captures several
properties of gene regulatory networks not present in the Boolean model. Formally, most of these approaches are similar to an artificial neural network, as inputs to a node are summed up and the result serves as input to a sigmoid function, although proteins often control gene expression in a synergistic, i.e. non-linear, way. However, there is now a continuous network model
that allows grouping of inputs to a node thus realizing another level
of regulation. This model is formally closer to a higher order recurrent neural network. The same model has also been used to mimic the evolution of cellular differentiation and even multicellular morphogenesis.
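The basic neural-network-like update, in which the weighted inputs to a gene are summed and passed through a sigmoid, can be sketched as follows. The two-gene network, the weights and the biases are hypothetical values chosen for illustration; the grouping of inputs used by the higher-order model mentioned above is not implemented here.

```python
import math

def sigmoid(x):
    """Squashes any real-valued input into an activity level between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical regulatory weights: target gene -> {regulator: weight}.
# Positive weights activate, negative weights repress.
weights = {
    "g1": {"g2": -4.0},
    "g2": {"g1": 4.0},
}
bias = {"g1": 2.0, "g2": -2.0}

def update(levels):
    """One synchronous step: sum the weighted regulator levels, then apply the sigmoid."""
    return {
        gene: sigmoid(sum(w * levels[src] for src, w in inputs.items()) + bias[gene])
        for gene, inputs in weights.items()
    }

levels = update({"g1": 1.0, "g2": 0.0})
print(levels)
```

Unlike the Boolean case, each gene here takes a graded activity level, so weak and strong regulation can be distinguished.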
Stochastic gene networks
Recent (as of 2007) experimental results
have demonstrated that gene expression is a stochastic process. Thus,
many authors are now using the stochastic formalism, after the work by
Arkin et al. Work on single gene expression and small synthetic genetic networks, such as the genetic toggle switch of Tim Gardner and Jim Collins,
provided additional experimental data on the phenotypic variability and
the stochastic nature of gene expression. The first versions of
stochastic models of gene expression involved only instantaneous
reactions and were driven by the Gillespie algorithm.
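A minimal sketch of the Gillespie algorithm for instantaneous reactions is given below. It assumes the simplest possible gene expression model, with a molecule produced at a constant rate and degraded at a rate proportional to its copy number; the rate values, function name and fixed random seed are illustrative assumptions, and no time delays are included.

```python
import random

def gillespie_birth_death(k_make=10.0, k_decay=1.0, t_end=2000.0, seed=0):
    """Simulate production (rate k_make) and degradation (rate k_decay * n).
    Returns the time-averaged copy number and the final copy number."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    weighted_sum = 0.0                      # integral of n over time
    while True:
        rate_make = k_make
        rate_decay = k_decay * n
        total = rate_make + rate_decay
        wait = rng.expovariate(total)       # exponentially distributed waiting time
        if t + wait >= t_end:
            weighted_sum += n * (t_end - t)
            break
        weighted_sum += n * wait
        t += wait
        # Choose which reaction fires, with probability proportional to its rate.
        if rng.random() * total < rate_make:
            n += 1
        else:
            n -= 1
    return weighted_sum / t_end, n

mean_copies, final_copies = gillespie_birth_death()
print(mean_copies, final_copies)
```

For this model the time-averaged copy number should lie close to the ratio of the production and degradation rates, while individual trajectories fluctuate around it.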
Since some processes, such as gene transcription, involve many
reactions and could not be correctly modeled as an instantaneous
reaction in a single step, it was proposed to model these reactions as
single step multiple delayed reactions in order to account for the time
it takes for the entire process to be complete.
From here, a set of reactions was proposed
that allows generating GRNs. These are then simulated using a modified
version of the Gillespie algorithm that can simulate multiple time-delayed
reactions (chemical reactions where each of the products is
provided a time delay that determines when it will be released in the
system as a "finished product").
For example, basic transcription of a gene can be represented by
the following single-step reaction (RNAP is the RNA polymerase, RBS is
the RNA ribosome binding site, and Pro i is the promoter region of gene i):
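The general shape of such a delayed reaction can be written as follows. This is a sketch only: the rate constant $k_i$ and the delays $\tau_1$ and $\tau_2$ are placeholder symbols assumed here for illustration, and the delay in brackets after a product indicates how long after the reaction event that product is released into the system.

```latex
\mathrm{RNAP} + \mathrm{Pro}_i \xrightarrow{\;k_i\;} \mathrm{Pro}_i(\tau_1) + \mathrm{RBS}_i(\tau_1) + \mathrm{RNAP}(\tau_2)
```

Read this way, the polymerase and the promoter are consumed when transcription starts, and the promoter, the ribosome binding site of the new transcript, and the polymerase each become available again only after their respective delays.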
Furthermore, there seems to be a trade-off between the noise in gene
expression, the speed with which genes can switch, and the metabolic
cost associated with their functioning. More specifically, for any given
level of metabolic cost, there is an optimal trade-off between noise and
processing speed and increasing the metabolic cost leads to better
speed-noise trade-offs.
A recent work proposed a simulator (SGNSim, Stochastic Gene Networks Simulator),
that can model GRNs where transcription and translation are modeled as
multiple time-delayed events and whose dynamics are driven by a stochastic
simulation algorithm (SSA) able to deal with multiple time delayed
events.
The time delays can be drawn from several distributions and the reaction
rates from complex
functions or from physical parameters. SGNSim can generate ensembles of
GRNs within a set of user-defined parameters, such as topology. It can
also be used to model specific GRNs and systems of chemical reactions.
Genetic perturbations such as gene deletions, gene over-expression,
insertions, and frame shift mutations can also be modeled.
The GRN is created from a graph with the desired topology,
imposing in-degree and out-degree distributions. Gene promoter
activities are affected by the expression products of other genes, which act as
inputs in the form of monomers or combined into multimers, and are set as
direct or indirect. Next, each direct input is assigned to an operator
site and different transcription factors can be allowed, or not, to
compete for the same operator site, while indirect inputs are given a
target. Finally, a function is assigned to each gene, defining the
gene's response to a combination of transcription factors (promoter
state). The transfer functions (that is, how genes respond to a
combination of inputs) can be assigned to each combination of promoter
states as desired.
In other recent work, multiscale models of gene regulatory
networks have been developed that focus on synthetic biology
applications. Simulations have been used that model all biomolecular
interactions in transcription, translation, regulation, and induction of
gene regulatory networks, guiding the design of synthetic systems.
Prediction
Other
work has focused on predicting the gene expression levels in a gene
regulatory network. The approaches used to model gene regulatory
networks have been constrained to be interpretable and, as a result, are
generally simplified versions of the network. For example, Boolean
networks have been used due to their simplicity and ability to handle
noisy data, but lose information by having a binary representation
of the genes. Similarly, artificial neural networks have been used without a hidden layer
so that they can be interpreted, losing the ability to model higher-order
correlations in the data. By using a model that is not constrained to
be interpretable, a more accurate model can be produced. Being able to
predict gene expression more accurately provides a way to explore how
drugs affect a system of genes, as well as to find which genes are
interrelated in a process. This has been encouraged by the DREAM
competition, which promotes the search for the best prediction algorithms. Some other recent work has used artificial neural networks with a hidden layer.