Histone H1 is one of the five main histone protein families which are components of chromatin in eukaryotic cells. Though highly conserved, it is nevertheless the most variable histone in sequence across species.
Structure
A diagram showing where H1 can be found in the nucleosome
Metazoan H1 proteins feature a central globular "winged helix" domain, a long C-terminal tail, and a short N-terminal tail. H1 is involved in packing the "beads on a string" sub-structures into a higher-order structure whose details have not yet been solved. The H1-like proteins found in protists and bacteria, otherwise known as nucleoproteins HC1 and HC2 (Pfam PF07432, PF07382), lack the central domain and the N-terminal tail.
H1 is less conserved than core histones. The globular domain is the most conserved part of H1.
Function
Unlike the other histones, H1 does not make up the nucleosome
"bead". Instead, it sits on top of the structure, keeping in place the
DNA that has wrapped around the nucleosome. H1 is present in half the
amount of the other four histones, which contribute two molecules to
each nucleosome bead. In addition to binding to the nucleosome, the H1
protein binds to the "linker DNA" (approximately 20-80 nucleotides in
length) region between nucleosomes, helping stabilize the zig-zagged
30 nm chromatin fiber. Much has been learned about histone H1 from studies on purified chromatin
fibers. Ionic extraction of linker histones from native or
reconstituted chromatin promotes its unfolding under hypotonic
conditions from fibers of 30 nm width to beads-on-a-string nucleosome
arrays.
It is uncertain whether H1 promotes a solenoid-like chromatin fiber, in which exposed linker DNA is shortened, or whether it merely promotes a change in the angle of adjacent nucleosomes, without affecting linker length. However, linker histones have been demonstrated to drive the compaction of chromatin fibres that had been reconstituted in vitro using synthetic DNA arrays of the strong '601' nucleosome positioning element.
Nuclease digestion and DNA footprinting experiments suggest that the
globular domain of histone H1 localizes near the nucleosome dyad, where
it protects approximately 15-30 base pairs of additional DNA. In addition, experiments on reconstituted chromatin reveal a characteristic stem motif at the dyad in the presence of H1.
Despite gaps in our understanding, a general model has emerged wherein
H1's globular domain closes the nucleosome by crosslinking incoming and
outgoing DNA, while the tail binds to linker DNA and neutralizes its
negative charge.
Many experiments addressing H1 function have been performed on
purified, processed chromatin under low-salt conditions, but H1's role
in vivo is less certain. Cellular studies have shown that overexpression
of H1 can cause aberrant nuclear morphology and chromatin structure,
and that H1 can serve as both a positive and negative regulator of
transcription, depending on the gene. In Xenopus
egg extracts, linker histone depletion causes ~2-fold lengthwise
extension of mitotic chromosomes, while overexpression causes
chromosomes to hypercompact into an inseparable mass.
Complete knockout of H1 in vivo has not been achieved in multicellular
organisms due to the existence of multiple isoforms that may be present
in several gene clusters, but various linker histone isoforms have been
depleted to varying degrees in Tetrahymena,
C. elegans, Arabidopsis, fruit fly, and mouse, resulting in various
organism-specific defects in nuclear morphology, chromatin structure,
DNA methylation, and/or specific gene expression.
Dynamics
While
most histone H1 in the nucleus is bound to chromatin, H1 molecules
shuttle between chromatin regions at a fairly high rate.
It is difficult to understand how such a dynamic protein could be
a structural component of chromatin, but it has been suggested that the
steady-state equilibrium within the nucleus still strongly favors
association between H1 and chromatin, meaning that despite its dynamics,
the vast majority of H1 at any given timepoint is chromatin bound.
H1 compacts and stabilizes DNA under force and during chromatin
assembly, which suggests that dynamic binding of H1 may provide
protection for DNA in situations where nucleosomes need to be removed.
Cytoplasmic factors appear to be necessary for the dynamic
exchange of histone H1 on chromatin, but these have yet to be
specifically identified.
H1 dynamics may be mediated to some degree by O-glycosylation and
phosphorylation. O-glycosylation of H1 may promote chromatin
condensation and compaction. Phosphorylation during interphase has been
shown to decrease H1 affinity for chromatin and may promote chromatin
decondensation and active transcription. However, during mitosis
phosphorylation has been shown to increase the affinity of H1 for
chromosomes and therefore promote mitotic chromosome condensation.
The H1 family in animals includes multiple H1 isoforms that can be
expressed in different or overlapping tissues and developmental stages
within a single organism. The reason for these multiple isoforms remains
unclear, but both their evolutionary conservation from sea urchin to
humans as well as significant differences in their amino acid sequences
suggest that they are not functionally equivalent. One isoform is histone H5, which is found only in avian erythrocytes; unlike mammalian erythrocytes, these cells have nuclei. Another isoform is the oocyte/zygotic H1M isoform (also known as B4 or H1foo), found in sea urchins, frogs, mice, and humans, which is replaced in the embryo by the somatic isoforms H1A-E and by H1⁰, which resembles H5. Despite having more negative charges than somatic isoforms, H1M binds with higher affinity to mitotic chromosomes in Xenopus egg extracts.
Post-translational modifications
Like other histones, the histone H1 family is extensively post-translationally modified. These post-translational modifications (PTMs) include serine and threonine phosphorylation, lysine acetylation, lysine methylation and ubiquitination. They serve a variety of functions but are less well studied than the PTMs of other histones.
Schematic representation of the assembly of the core histones into the nucleosome
In biology, histones are highly basic proteins abundant in lysine and arginine residues that are found in eukaryotic cell nuclei and in most Archaeal phyla. They act as spools around which DNA winds to create structural units called nucleosomes. Nucleosomes in turn are wrapped into 30-nanometer fibers that form tightly packed chromatin. Histones prevent DNA from becoming tangled and protect it from DNA damage. In addition, histones play important roles in gene regulation and DNA replication. Without histones, unwound DNA in chromosomes would be very long. For example, each human cell has about 1.8 meters of DNA if completely stretched out; however, when wound about histones, this length is reduced to about 90 micrometers (0.09 mm) of 30 nm diameter chromatin fibers.
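The size of this reduction follows directly from the two lengths quoted above. The following is a minimal sketch in Python, assuming a stretched length of 1.8 m and a packed length of 0.09 mm as stated in the text; the constant and function names are illustrative only.

```python
# Lengths taken from the paragraph above, expressed in micrometers
# so that the arithmetic stays in whole numbers.
STRETCHED_DNA_UM = 1_800_000  # 1.8 m of fully extended DNA
PACKED_FIBER_UM = 90          # 0.09 mm of 30 nm chromatin fiber


def packing_ratio(stretched_um: int, packed_um: int) -> float:
    """Return how many times shorter the packed form is than the stretched form."""
    if packed_um <= 0:
        raise ValueError("packed length must be positive")
    return stretched_um / packed_um


ratio = packing_ratio(STRETCHED_DNA_UM, PACKED_FIBER_UM)
print(f"Packing ratio implied by these two lengths: about {ratio:,.0f}-fold")
```

This is only the ratio implied by these two particular figures; packing ratios quoted elsewhere may compare different lengths or levels of compaction.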
There are five families of histones, which are designated H1/H5 (linker histones) and H2A, H2B, H3, and H4 (core histones). The nucleosome core is formed of two H2A-H2B dimers and a H3-H4 tetramer. The tight wrapping of DNA around histones is, to a large degree, a result of electrostatic attraction between the positively charged histones and the negatively charged phosphate backbone of DNA.
Histones may be chemically modified through the action of enzymes
to regulate gene transcription. The most common modifications are the methylation of arginine or lysine residues or the acetylation of lysine. Methylation can affect how other proteins such as transcription factors
interact with the nucleosomes. Lysine acetylation eliminates a positive
charge on lysine thereby weakening the electrostatic attraction between
histone and DNA, resulting in partial unwinding of the DNA, making it
more accessible for gene expression.
Classes and variants
Histone heterooctamer (H3,H4,H2A,H2B) + DNA fragment, Frog
Five major families of histone proteins exist: H1/H5, H2A, H2B, H3, and H4.
Histones H2A, H2B, H3 and H4 are known as the core or nucleosomal
histones, while histones H1/H5 are known as the linker histones.
The core histones all exist as dimers,
which are similar in that they all possess the histone fold domain:
three alpha helices linked by two loops. It is this helical structure
that allows for interaction between distinct dimers, particularly in a
head-tail fashion (also called the handshake motif). The resulting four distinct dimers then come together to form one octameric nucleosome core, approximately 63 Angstroms in diameter (a solenoid-like particle). Around 146 base pairs
(bp) of DNA wrap around this core particle 1.65 times in a left-handed
super-helical turn to give a particle of around 100 Angstroms across. The linker histone H1 binds the nucleosome at the entry and exit sites of the DNA, thus locking the DNA into place
and allowing the formation of higher order structure. The most basic
such formation is the 10 nm fiber or beads on a string conformation.
This involves the wrapping of DNA around nucleosomes with approximately
50 base pairs of DNA separating each pair of nucleosomes (also referred to as linker DNA).
Higher-order structures include the 30 nm fiber (forming an irregular
zigzag) and 100 nm fiber, these being the structures found in normal
cells. During mitosis and meiosis, the condensed chromosomes are assembled through interactions between nucleosomes and other regulatory proteins.
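The beads-on-a-string figures above (about 146 bp wrapped around each core particle plus about 50 bp of linker DNA) allow a rough estimate of how many nucleosomes and histone molecules a stretch of DNA requires. The sketch below is an idealized calculation in Python: it assumes perfectly regular spacing and one linker histone per nucleosome, and the 1,000,000 bp input is an arbitrary example value, not a figure from the text.

```python
CORE_DNA_BP = 146    # DNA wrapped around one histone octamer (from the text)
LINKER_DNA_BP = 50   # approximate linker DNA between nucleosomes (from the text)
CORE_HISTONES_PER_NUCLEOSOME = 8    # two each of H2A, H2B, H3 and H4
LINKER_HISTONES_PER_NUCLEOSOME = 1  # assumption: one H1 per nucleosome


def estimate_nucleosomes(dna_bp: int) -> dict:
    """Estimate nucleosome and histone counts for a DNA length in base pairs."""
    if dna_bp < 0:
        raise ValueError("DNA length cannot be negative")
    repeat_bp = CORE_DNA_BP + LINKER_DNA_BP  # one nucleosome repeat unit
    nucleosomes = dna_bp // repeat_bp        # whole repeats that fit
    return {
        "repeat_bp": repeat_bp,
        "nucleosomes": nucleosomes,
        "core_histones": nucleosomes * CORE_HISTONES_PER_NUCLEOSOME,
        "linker_histones": nucleosomes * LINKER_HISTONES_PER_NUCLEOSOME,
    }


# Arbitrary example input: one megabase of DNA.
example = estimate_nucleosomes(1_000_000)
print(example)
```

Real chromatin has variable linker lengths and nucleosome-free regions, so the result is an order-of-magnitude guide rather than a measurement.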
Histones are subdivided into canonical replication-dependent histones, whose genes are expressed during the S phase of the cell cycle, and replication-independent histone variants, expressed during the whole cell cycle. In mammals, genes encoding canonical histones are typically clustered along chromosomes in 4 different highly conserved loci, lack introns and use a stem-loop structure at the 3' end instead of a polyA tail. Genes encoding histone variants are usually not clustered, have introns, and their mRNAs are regulated with polyA tails.
Complex multicellular organisms typically have a higher number of
histone variants providing a variety of different functions.
Functionally, histone variants contribute to transcriptional control, epigenetic memory, and DNA repair, serving specialized functions beyond nucleosome packaging and playing distinct roles in chromatin dynamics.
For example, H2A.Z
is enriched at regulatory elements and promoters of actively
transcribed genes, where it modulates nucleosome stability and
transcription factor binding. In contrast, H3.3, a replacement variant
of Histone H3,
is associated with active transcription and is preferentially deposited
at enhancer elements and transcribed gene bodies. Another critical
variant, CENPA, replaces H3 in centromeric nucleosomes, providing a structural foundation essential for chromosome segregation.
Variants also play essential roles in DNA repair.
Variants such as H2A.X are phosphorylated at sites of DNA damage,
marking regions for recruitment of repair proteins. This modification,
commonly referred to as γH2A.X, serves as a key signal in the cellular
response to double-strand breaks, facilitating efficient DNA repair processes. Defects in histone variant regulation have been linked to genome instability, a hallmark of many cancers and age-related diseases.
Recent data are accumulating about the roles of diverse histone
variants highlighting the functional links between variants and the
delicate regulation of organism development. Histone variant proteins from different organisms, their classification and variant-specific features can be found in the "HistoneDB 2.0 - Variants" database. Several pseudogenes have also been discovered and identified in very close sequences of their respective functional ortholog genes.
The following is a list of human histone proteins, genes and pseudogenes:
The nucleosome core is formed of two H2A-H2B dimers and a H3-H4 tetramer, forming two nearly symmetrical halves by tertiary structure (C2 symmetry; one macromolecule is the mirror image of the other).
The H2A-H2B dimers and H3-H4 tetramer also show pseudodyad symmetry.
The four 'core' histones (H2A, H2B, H3 and H4) are relatively similar in structure and are highly conserved through evolution, all featuring a 'helix turn helix turn helix' motif (a DNA-binding protein motif that recognizes specific DNA sequences). They also share the feature of long 'tails' on one end of the amino acid chain, which is the location of post-translational modification (see below).
Archaeal histones contain only an H3-H4-like dimeric structure made out of a single type of unit. Such dimeric structures can stack into a tall superhelix ("hypernucleosome") onto which DNA coils in a manner similar to nucleosome spools. Only some archaeal histones have tails.
The distance between the spools around which eukaryotic cells wind their DNA has been determined to range from 59 to 70 Å.
In all, histones make five types of interactions with DNA:
Salt bridges and hydrogen bonds between side chains of basic amino acids (especially lysine and arginine) and phosphate oxygens on DNA
Helix dipoles from alpha helices in H2B, H3, and H4, which cause a net positive charge to accumulate at the point of interaction with negatively charged phosphate groups on DNA
Hydrogen bonds between the DNA backbone and the amide group on the main chain of histone proteins
Nonpolar interactions between the histone and deoxyribose sugars on DNA
Non-specific minor groove insertions of the H3 and H2B N-terminal tails into two minor grooves each on the DNA molecule
The highly basic nature of histones, aside from facilitating DNA-histone interactions, contributes to their water solubility.
In general, genes that are active have less bound histone, while inactive genes are highly associated with histones during interphase. It also appears that the structure of histones has been evolutionarily conserved, as any deleterious mutations would be severely maladaptive. All histones have a highly positively charged N-terminus with many lysine and arginine residues.
Evolution and species distribution
Core histones are found in the nuclei of eukaryotic cells and in most Archaeal phyla, but not in bacteria. The unicellular algae known as dinoflagellates were previously thought to be the only eukaryotes that completely lack histones, but later studies showed that their DNA still encodes histone genes.
Unlike the core histones, homologs of the lysine-rich linker histone
(H1) proteins are found in bacteria, otherwise known as nucleoprotein
HC1/HC2.
It has been proposed that core histone proteins are
evolutionarily related to the helical part of the extended AAA+ ATPase
domain, the C-domain, and to the N-terminal substrate recognition domain
of Clp/Hsp100 proteins. Despite the differences in their topology,
these three folds share a homologous helix-strand-helix (HSH) motif. It's also proposed that they may have evolved from ribosomal proteins (RPS6/RPS15), both being short and basic proteins.
Archaeal histones may well resemble the evolutionary precursors to eukaryotic histones.
Histone proteins are among the most highly conserved proteins in
eukaryotes, emphasizing their important role in the biology of the
nucleus. In contrast, mature sperm cells largely use protamines to package their genomic DNA, most likely because this allows them to achieve an even higher packaging ratio.
There are some variant forms in some of the major classes.
They share amino acid sequence homology and core structural similarity
to a specific class of major histones but also have their own feature
that is distinct from the major histones. These minor histones usually carry out specific functions of the chromatin metabolism. For example, histone H3-like CENPA is associated with only the centromere
region of the chromosome. Histone H2A variant H2A.Z is associated with
the promoters of actively transcribed genes and also involved in the
prevention of the spread of silent heterochromatin. Furthermore, H2A.Z has roles in chromatin for genome stability. Another H2A variant H2A.X is phosphorylated at S139 in regions around double-strand breaks and marks the region undergoing DNA repair. Histone H3.3 is associated with the body of actively transcribed genes.
Function
Basic units of chromatin structure
Compacting DNA strands
Histones act as spools around which DNA winds. This enables the compaction necessary to fit the large genomes of eukaryotes inside cell nuclei: the compacted molecule is 40,000 times shorter than an unpacked molecule.
Chromatin regulation
Histone tails and their function in chromatin formation
Schematic representation of histone modifications. Based on Rodriguez-Paredes and Esteller, Nature, 2011
A huge catalogue of histone modifications has been described, but a functional understanding of most is still lacking. Collectively, it is thought that histone modifications may underlie a histone code, whereby combinations of histone modifications have specific meanings. However, most functional data concern individual prominent histone modifications that are biochemically amenable to detailed study.
Chemistry
Lysine methylation
The addition of one, two, or three methyl groups to lysine has little effect on the chemistry of the histone; methylation leaves the charge of the lysine intact and adds a minimal number of atoms, so steric interactions are mostly unaffected. However, proteins containing Tudor, chromo or PHD domains, amongst others, can recognise lysine methylation with exquisite sensitivity and differentiate mono-, di- and tri-methyl lysine, to the extent that, for some lysines (e.g., H4K20), mono-, di- and tri-methylation appear to have different meanings. Because of this, lysine methylation tends to be a very informative mark and dominates the known histone modification functions.
Glutamine serotonylation
Recently it has been shown that the addition of a serotonin group to the glutamine at position 5 of H3 happens in serotonergic cells such as neurons. This is part of the differentiation of the serotonergic cells. This post-translational modification happens in conjunction with the H3K4me3 modification. The serotonylation potentiates the binding of the general transcription factor TFIID to the TATA box.
Arginine methylation
What was said above of the chemistry of lysine methylation also
applies to arginine methylation, and some protein domains—e.g., Tudor
domains—can be specific for methyl arginine instead of methyl lysine.
Arginine is known to be mono- or di-methylated, and methylation can be
symmetric or asymmetric, potentially with different meanings.
Arginine citrullination
Enzymes called peptidylarginine deiminases
(PADs) hydrolyze the imine group of arginines and attach a keto group,
so that there is one less positive charge on the amino acid residue.
This process has been implicated in the activation of gene expression by
making the modified histones less tightly bound to DNA and thus making
the chromatin more accessible.
PADs can also produce the opposite effect by removing or inhibiting
mono-methylation of arginine residues on histones and thus antagonizing
the positive effect arginine methylation has on transcriptional
activity.
Lysine acetylation
Addition of an acetyl group has a major chemical effect on lysine as
it neutralises the positive charge. This reduces electrostatic
attraction between the histone and the negatively charged DNA backbone,
loosening the chromatin structure; highly acetylated histones form more
accessible chromatin and tend to be associated with active
transcription. Lysine acetylation appears to be less precise in meaning
than methylation, in that histone acetyltransferases tend to act on more
than one lysine; presumably this reflects the need to alter multiple
lysines to have a significant effect on chromatin structure. One example of this modification is H3K27ac.
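The charge argument above can be made concrete with a simple residue count. The sketch below is a deliberately simplified model in Python: it assigns +1 to lysine (K) and arginine (R), -1 to aspartate (D) and glutamate (E), and 0 to acetylated lysine, and it ignores the chain termini, histidine, and pKa effects. The 20-residue sequence is the commonly cited start of the histone H3 N-terminal tail; it is supplied here for illustration and does not appear in the text above, and the choice of K9 and K14 as acetylated sites is likewise just an example.

```python
# First 20 residues of the histone H3 N-terminal tail, one-letter code.
# Assumption: supplied for illustration; not given in the surrounding text.
H3_TAIL = "ARTKQTARKSTGGKAPRKQL"

# Simplified side-chain charges at physiological pH.
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(sequence: str, acetylated_positions=frozenset()) -> int:
    """Sum side-chain charges, treating acetylated lysines (1-based) as neutral."""
    total = 0
    for position, residue in enumerate(sequence, start=1):
        if position in acetylated_positions:
            if residue != "K":
                raise ValueError(f"position {position} is {residue}, not lysine")
            continue  # acetyl-lysine contributes no charge in this model
        total += SIDE_CHAIN_CHARGE.get(residue, 0)
    return total


unmodified = net_charge(H3_TAIL)
acetylated = net_charge(H3_TAIL, frozenset({9, 14}))  # example sites K9 and K14
print(f"Unmodified tail charge: {unmodified:+d}")
print(f"With K9 and K14 acetylated: {acetylated:+d}")
```

Each acetylated lysine removes one positive charge from the tail in this model, which is the sense in which acetylation weakens the electrostatic attraction to the DNA backbone.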
Serine/threonine/tyrosine phosphorylation
Addition of a negatively charged phosphate group can lead to major
changes in protein structure, leading to the well-characterised role of phosphorylation
in controlling protein function. It is not clear what structural
implications histone phosphorylation has, but histone phosphorylation
has clear functions as a post-translational modification, and binding
domains such as BRCT have been characterised.
Effects on transcription
Most well-studied histone modifications are involved in control of transcription.
Actively transcribed genes
Two histone modifications are particularly associated with active transcription:
Trimethylation of H3 lysine 4 (H3K4me3)
This trimethylation occurs at the promoter of active genes and is performed by the COMPASS complex.
Despite the conservation of this complex and histone modification from
yeast to mammals, it is not entirely clear what role this modification
plays. However, it is an excellent mark of active promoters and the
level of this histone modification at a gene's promoter is broadly
correlated with transcriptional activity of the gene. The formation of
this mark is tied to transcription in a rather convoluted manner: early
in transcription of a gene, RNA polymerase II undergoes a switch from 'initiating' to 'elongating', marked by a change in the phosphorylation states of the RNA polymerase II C-terminal domain (CTD). The same enzyme that phosphorylates the CTD also phosphorylates the Rad6 complex, which in turn adds a ubiquitin mark to H2B K123 (K120 in mammals). H2BK123Ub occurs throughout transcribed regions, but this mark is required for COMPASS to trimethylate H3K4 at promoters.
Trimethylation of H3 lysine 36 (H3K36me3)
This trimethylation occurs in the body of active genes and is deposited by the methyltransferase Set2. This protein associates with elongating RNA polymerase II, and H3K36Me3 is indicative of actively transcribed genes.
H3K36Me3 is recognised by the Rpd3 histone deacetylase complex, which
removes acetyl modifications from surrounding histones, increasing
chromatin compaction and repressing spurious transcription.
Increased chromatin compaction prevents transcription factors from
accessing DNA, and reduces the likelihood of new transcription events
being initiated within the body of the gene. This process therefore
helps ensure that transcription is not interrupted.
Repressed genes
Three histone modifications are particularly associated with repressed genes:
Trimethylation of H3 lysine 27 (H3K27me3)
This histone modification is deposited by the polycomb complex PRC2. It is a clear marker of gene repression, and is likely bound by other proteins to exert a repressive function. Another polycomb complex, PRC1, can bind H3K27me3 and adds the histone modification H2AK119Ub, which aids chromatin compaction. Based on these data it appears that PRC1 is recruited through the action of PRC2; however, recent studies show that PRC1 is recruited to the same sites in the absence of PRC2.
Trimethylation of H4 lysine 20 (H4K20me3)
This modification is tightly associated with heterochromatin,
although its functional importance remains unclear. This mark is placed
by the Suv4-20h methyltransferase, which is at least in part recruited
by heterochromatin protein 1.
Bivalent promoters
Analysis of histone modifications in embryonic stem cells (and other stem cells) revealed many gene promoters carrying both H3K4Me3 and H3K27Me3;
in other words, these promoters display both activating and repressing
marks simultaneously. This peculiar combination of modifications marks
genes that are poised for transcription; they are not required in stem
cells, but are rapidly required after differentiation into some
lineages. Once the cell starts to differentiate, these bivalent
promoters are resolved to either active or repressive states depending
on the chosen lineage.
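The logic of bivalent promoters can be summarized as a small classification rule. The sketch below, in Python, assumes that the marks observed at a promoter are available as a collection of strings; the function name, the state labels, and the example promoters are hypothetical, and real analyses work from quantitative signal rather than simple presence or absence.

```python
from typing import Iterable

ACTIVATING_MARK = "h3k4me3"   # associated with active promoters
REPRESSIVE_MARK = "h3k27me3"  # deposited by PRC2, associated with repression


def classify_promoter(marks: Iterable[str]) -> str:
    """Classify a promoter by which of the two marks it carries."""
    # Lower-case the names so that "H3K4Me3" and "H3K4me3" are treated alike.
    observed = {mark.lower() for mark in marks}
    has_activating = ACTIVATING_MARK in observed
    has_repressive = REPRESSIVE_MARK in observed
    if has_activating and has_repressive:
        return "bivalent"   # poised: both marks present at once
    if has_activating:
        return "active"
    if has_repressive:
        return "repressed"
    return "unmarked"


# Hypothetical promoters, for illustration only.
examples = {
    "promoter_1": ["H3K4Me3", "H3K27Me3"],
    "promoter_2": ["H3K4me3"],
    "promoter_3": ["H3K27me3"],
}
for name, marks in examples.items():
    print(name, classify_promoter(marks))
```

In this picture, differentiation corresponds to a promoter moving from the "bivalent" state to either "active" or "repressed" as one of the two marks is lost.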
Other functions
DNA damage repair
Marking
sites of DNA damage is an important function for histone modifications.
Without a repair marker, DNA would get destroyed by damage accumulated
from sources such as the ultraviolet radiation of the sun.
Phosphorylation of H2AX at serine 139 (γH2AX)
Phosphorylated H2AX (also known as gamma H2AX) is a marker for DNA double strand breaks, and forms part of the response to DNA damage.
H2AX is phosphorylated early after detection of DNA double strand
break, and forms a domain extending many kilobases either side of the
damage. Gamma H2AX acts as a binding site for the protein MDC1, which in turn recruits key DNA repair proteins and as such, gamma H2AX forms a vital part of the machinery that ensures genome stability.
Acetylation of H3 lysine 56 (H3K56Ac)
H3K56Ac is required for genome stability. H3K56 is acetylated by the p300/Rtt109 complex,
but is rapidly deacetylated around sites of DNA damage. H3K56
acetylation is also required to stabilise stalled replication forks,
preventing dangerous replication fork collapses. Although in general mammals make far greater use of histone
modifications than microorganisms, a major role of H3K56Ac in DNA
replication exists only in fungi, and this has become a target for
antibiotic development.
Trimethylation of H3 lysine 36 (H3K36me3)
H3K36me3 has the ability to recruit the MSH2-MSH6 (hMutSα) complex of the DNA mismatch repair pathway. Consistently, regions of the human genome with high levels of H3K36me3 accumulate fewer somatic mutations due to mismatch repair activity.
Chromosome condensation
Phosphorylation of H3 at serine 10 (phospho-H3S10)
The mitotic kinase aurora B phosphorylates histone H3 at serine 10, triggering a cascade of changes that mediate mitotic chromosome condensation.
Condensed chromosomes therefore stain very strongly for this mark, but
H3S10 phosphorylation is also present at certain chromosome sites
outside mitosis, for example in pericentric heterochromatin of cells
during G2. H3S10 phosphorylation has also been linked to DNA damage
caused by R-loop formation at highly transcribed sites.
Phosphorylation of H2B at serine 10/14 (phospho-H2BS10/14)
Phosphorylation of H2B at serine 10 (yeast) or serine 14 (mammals)
is also linked to chromatin condensation, but for the very different
purpose of mediating chromosome condensation during apoptosis.
This mark is not simply a late acting bystander in apoptosis as yeast
carrying mutations of this residue are resistant to hydrogen
peroxide-induced apoptotic cell death.
Addiction
Epigenetic modifications of histone tails in specific regions of the brain are of central importance in addictions.
Once particular epigenetic alterations occur, they appear to be long
lasting "molecular scars" that may account for the persistence of
addictions.
Cigarette smokers (about 15% of the US population) are usually addicted to nicotine.
After 7 days of nicotine treatment of mice, acetylation of both
histone H3 and histone H4 was increased at the FosB promoter in the nucleus accumbens of the brain, causing a 61% increase in FosB expression. This would also increase expression of the splice variant Delta FosB. In the nucleus accumbens of the brain, Delta FosB functions as a "sustained molecular switch" and "master control protein" in the development of an addiction.
About 7% of the US population is addicted to alcohol.
In rats exposed to alcohol for up to 5 days, there was an increase in
histone 3 lysine 9 acetylation in the pronociceptin promoter in the
brain amygdala
complex. This acetylation is an activating mark for pronociceptin.
The nociceptin/nociceptin opioid receptor system is involved in the
reinforcing or conditioning effects of alcohol.
Methamphetamine addiction occurs in about 0.2% of the US population. Chronic methamphetamine use causes methylation of the lysine in position 4 of histone 3 located at the promoters of the c-fos and the C-C chemokine receptor 2 (ccr2) genes, activating those genes in the nucleus accumbens (NAc). c-fos is well known to be important in addiction. The ccr2 gene is also important in addiction, since mutational inactivation of this gene impairs addiction.
Histone chaperones
Histone chaperones
are specialized proteins that assist in the proper handling, transport,
and assembly of histones, preventing their aggregation and ensuring
their appropriate deposition onto DNA. These proteins play a crucial
role in regulating nucleosome assembly and disassembly, influencing transcriptional activity, DNA replication, and repair. Unlike enzymatic chromatin remodeling,
histone chaperones function by binding histones in a regulated manner,
modulating chromatin structure without direct catalytic activity.
One key function of histone chaperones is maintaining a reservoir
of histones, regulating their supply to ensure proper chromatin
formation. During DNA replication and transcription,
histone chaperones such as ASF1 and FACT facilitate nucleosome
reassembly, ensuring the preservation of histone modifications that
define cellular identity. Moreover, histone chaperones contribute to nucleosome disassembly in response to cellular stress or DNA damage, thereby allowing access to repair machinery.
Histone chaperones also participate in the selective deposition of histone variants, which are functionally distinct from canonical histones. For example, HIRA is a chaperone that specifically deposits the histone variant H3.3, a marker of active chromatin regions. Similarly, CAF-1
is responsible for incorporating H3.1 and H3.2 into newly replicated
DNA, highlighting the functional specialization within chaperone
networks.
Given their critical roles, misregulation of histone chaperones
has been implicated in diseases such as cancer. Aberrant chaperone
activity can lead to improper histone deposition, genome instability, and altered gene expression, contributing to tumorigenesis.
Current research is exploring histone chaperones as potential
therapeutic targets, particularly in cancers characterized by disrupted
chromatin landscapes.
Chaperone networks
The coordinated action of multiple histone chaperones forms an intricate network responsible for histone transport, chromatin assembly, and genome maintenance. Chaperone networks facilitate the transport of histones, which are synthesized in the cytoplasm and must be escorted to the cell nucleus. This network ensures histones are deposited at the appropriate genomic locations, maintaining chromatin integrity and function.
Histone chaperones play a crucial role in responding to DNA
damage by regulating chromatin accessibility. For example, in response
to double strand breaks,
chaperones such as FACT and ASF1 help disassemble nucleosomes at damage
sites, allowing repair factors to access the lesion. Once repair is
completed, these chaperones facilitate the reassembly of nucleosomes, restoring chromatin structure and ensuring epigenetic information is maintained.
In addition to their role in genome stability, histone chaperones contribute to epigenetic inheritance.
During cell division, chromatin states must be faithfully propagated to
daughter cells. Chaperones help distribute parental histones onto newly
synthesized DNA strands, preserving histone modifications and ensuring
continuity of cellular identity. Disruptions in these processes can lead
to epigenetic abnormalities associated with developmental disorders.
Synthesis
The
first step of chromatin structure duplication is the synthesis of
histone proteins: H1, H2A, H2B, H3, H4. These proteins are synthesized
during S phase of the cell cycle. There are different mechanisms which
contribute to the increase of histone synthesis.
Yeast
Yeast
carry one or two copies of each histone gene, which are not clustered
but rather scattered throughout chromosomes. Histone gene transcription
is controlled by multiple gene regulatory proteins such as transcription
factors which bind to histone promoter regions. In budding yeast, the
candidate gene for activation of histone gene expression is SBF. SBF is a
transcription factor that is activated in late G1 phase, when it
dissociates from its repressor Whi5. This occurs when Whi5 is phosphorylated by Cdc28, which acts as a G1/S Cdk.
Suppression of histone gene expression outside of S phase is
dependent on Hir proteins which form inactive chromatin structure at the
locus of histone genes, causing transcriptional activators to be
blocked.
Metazoan
In metazoans
the increase in the rate of histone synthesis is due to the increase in
processing of pre-mRNA to its mature form as well as decrease in mRNA
degradation; this results in an increase of active mRNA for translation
of histone proteins. The mechanism for mRNA activation has been found to
be the removal of a segment of the 3' end of the mRNA strand, and is
dependent on association with stem-loop binding protein (SLBP). SLBP also stabilizes histone mRNAs during S phase by blocking degradation by the 3'hExo nuclease.
SLBP levels are controlled by cell-cycle proteins, causing SLBP to
accumulate as cells enter S phase and degrade as cells leave S phase.
SLBP is marked for degradation by phosphorylation at two threonine
residues by cyclin-dependent kinases, possibly cyclin A/Cdk2, at the
end of S phase.
Metazoans also have multiple copies of histone genes clustered on
chromosomes, which are localized in structures called Cajal bodies, as
determined by genome-wide chromosome conformation capture analysis
(4C-Seq).
Link between cell-cycle control and synthesis
Nuclear
protein Ataxia-Telangiectasia (NPAT), also known as nuclear protein
coactivator of histone transcription, is a transcription factor which
activates histone gene transcription on chromosomes 1 and 6 of human
cells. NPAT is also a substrate of cyclin E-Cdk2, which is required for
the transition between G1 phase and S phase. NPAT activates histone gene
expression only after it has been phosphorylated by the G1/S-Cdk cyclin
E-Cdk2 in early S phase. This shows an important regulatory link between cell-cycle control and histone synthesis.
History
Histones were discovered in 1884 by Albrecht Kossel. The word "histone" dates from the late 19th century and is derived from the German word "Histon", a word itself of uncertain origin, perhaps from Ancient Greek ἵστημι (hístēmi, “make stand”) or ἱστός (histós, “loom”).
In the early 1960s, before the types of histones were known and
before histones were known to be highly conserved across taxonomically
diverse organisms, James F. Bonner
and his collaborators began a study of these proteins that were known
to be tightly associated with the DNA in the nucleus of higher
organisms. Bonner and his postdoctoral fellow Ru Chih C. Huang
showed that isolated chromatin would not support RNA transcription in
the test tube, but if the histones were extracted from the chromatin,
RNA could be transcribed from the remaining DNA. Their paper became a citation classic.
Paul Ts'o and James Bonner convened a World Congress on
Histone Chemistry and Biology in 1964, in which it became clear that
there was no consensus on the number of kinds of histone and that no one
knew how they would compare when isolated from different organisms.
Bonner and his collaborators then developed methods to separate each
type of histone, purified individual histones, compared amino acid
compositions in the same histone from different organisms, and compared
amino acid sequences of the same histone from different organisms in
collaboration with Emil Smith from UCLA. For example, they found Histone IV sequence to be highly conserved between peas and calf thymus.
However, their work on the biochemical characteristics of individual
histones did not reveal how the histones interacted with each other or
with DNA to which they were tightly bound.
Also in the 1960s, Vincent Allfrey and Alfred Mirsky
had suggested, based on their analyses of histones, that acetylation
and methylation of histones could provide a transcriptional control
mechanism, but did not have available the kind of detailed analysis that
later investigators were able to conduct to show how such regulation
could be gene-specific.
Until the early 1990s, histones were dismissed by most as inert
packing material for eukaryotic nuclear DNA, a view based in part on the
models of Mark Ptashne
and others, who believed that transcription was activated by
protein-DNA and protein-protein interactions on largely naked DNA
templates, as is the case in bacteria.
During the 1980s, Yahli Lorch and Roger Kornberg showed that a nucleosome on a core promoter prevents the initiation of transcription in vitro, and Michael Grunstein
demonstrated that histones repress transcription in vivo, leading to
the idea of the nucleosome as a general gene repressor. Relief from
repression is believed to involve both histone modification and the
action of chromatin-remodeling complexes. Vincent Allfrey and Alfred
Mirsky had earlier proposed a role of histone modification in
transcriptional activation, regarded as a molecular manifestation of epigenetics. Michael Grunstein and David Allis
found support for this proposal, in the importance of histone
acetylation for transcription in yeast and the activity of the
transcriptional activator Gcn5 as a histone acetyltransferase.
The discovery of the H5 histone appears to date back to the 1970s, and it is now considered an isoform of Histone H1.
ChIP-on-chip (also known as ChIP-chip) is a technology that combines chromatin immunoprecipitation ("ChIP") with DNA microarray ("chip"). Like regular ChIP, ChIP-on-chip is used to investigate interactions between proteins and DNA in vivo. Specifically, it allows the identification of the cistrome, the sum of binding sites, for DNA-binding proteins on a genome-wide basis. Whole-genome analysis can be performed to determine the locations of binding sites for almost any protein of interest. As the name of the technique suggests, such proteins are generally those operating in the context of chromatin. The most prominent representatives of this class are transcription factors, replication-related proteins, like origin recognition complex protein (ORC), histones, their variants, and histone modifications.
The goal of ChIP-on-chip is to locate protein binding sites that may help identify functional elements in the genome.
For example, in the case of a transcription factor as a protein of
interest, one can determine its transcription factor binding sites
throughout the genome. Other proteins allow the identification of promoter regions, enhancers, repressors and silencing elements, insulators, boundary elements, and sequences that control DNA replication.
If histones are the subject of interest, it is believed that the
distribution of modifications and their localizations may offer new
insights into the mechanisms of regulation.
One of the long-term goals ChIP-on-chip was designed for is to establish a catalogue of (selected) organisms that lists all protein-DNA interactions
under various physiological conditions. This knowledge would ultimately
help in the understanding of the machinery behind gene regulation, cell proliferation,
and disease progression. Hence, ChIP-on-chip has the potential both to
complement our knowledge of how the genome is organized at the
nucleotide level and to provide information on higher levels of
regulation, such as those studied in epigenetics.
The technical platforms to conduct ChIP-on-chip experiments are DNA microarrays, or "chips". They can be classified and distinguished according to various characteristics:
Probe type: DNA arrays can comprise either mechanically spotted cDNAs or PCR products, mechanically spotted oligonucleotides, or oligonucleotides that are synthesized in situ. The early versions of microarrays were designed to detect RNAs from expressed genomic regions (open reading frames, or ORFs). Although such arrays are well suited to studying gene expression profiles, they are of limited use in ChIP experiments, since most "interesting" proteins with respect to this technique bind in intergenic regions.
Nowadays, even custom-made arrays can be designed and fine-tuned to
match the requirements of an experiment. Also, any sequence of
nucleotides can be synthesized to cover genic as well as intergenic
regions.
Probe size: Early versions of cDNA arrays had a probe
length of about 200 bp. More recent arrays use oligonucleotides ranging from
70-mers (Microarrays, Inc.) down to 25-mers (Affymetrix) (as of February 2007).
Probe composition: There are tiled and non-tiled DNA
arrays. Non-tiled arrays use probes selected according to non-spatial
criteria, i.e., the DNA sequences used as probes have no fixed distances
in the genome. Tiled arrays, however, select a genomic region (or even a
whole genome) and divide it into equal chunks. Such a region is called
a tiled path. The average distance between each pair of neighboring chunks
(measured from the center of each chunk) gives the resolution of the
tiled path. A path can be overlapping, end-to-end or spaced.
Array size: The first microarrays used for ChIP-on-chip
contained about 13,000 spotted DNA segments representing all ORFs and
intergenic regions from the yeast genome.
Affymetrix now offers whole-genome tiled yeast arrays with a
resolution of 5 bp (3.2 million probes in all). Tiled arrays for the
human genome are also becoming more powerful. To name one
example, Affymetrix offers a set of seven arrays with about 90 million
probes, spanning the complete non-repetitive part of the human genome
with about 35 bp spacing (as of February 2007).
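The definition of tiled-path resolution given under "Probe composition" can be illustrated with a minimal Python sketch. The function names (tiled_path, resolution, layout) and the example coordinates are hypothetical and chosen only for illustration; they do not describe any vendor's array design software.

```python
from statistics import mean

def tiled_path(region_start, region_end, probe_len, step):
    """Return (start, end) probe intervals that tile the region.

    Probes have a fixed length; `step` is the distance between the
    start coordinates of neighboring probes (an illustrative parameter).
    """
    probes = []
    start = region_start
    while start + probe_len <= region_end:
        probes.append((start, start + probe_len))
        start += step
    return probes

def resolution(probes):
    """Average distance between the centers of neighboring probes."""
    centers = [(s + e) / 2 for s, e in probes]
    return mean(b - a for a, b in zip(centers, centers[1:]))

def layout(probe_len, step):
    """Classify a tiled path as overlapping, end-to-end, or spaced."""
    if step < probe_len:
        return "overlapping"
    if step == probe_len:
        return "end-to-end"
    return "spaced"

# Example: 25-mers placed every 5 bp across a 100 bp region.
probes = tiled_path(0, 100, probe_len=25, step=5)
print(resolution(probes), layout(25, 5))
```

With equal chunks at a fixed step, the resolution is simply the step size; the averaging only matters when probe spacing is irregular.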
Besides the actual microarray, other hardware and software are
necessary to run ChIP-on-chip experiments. It is generally the case that
one company's microarrays cannot be analyzed by another company's
processing hardware. Hence, buying an array also requires buying the
associated workflow equipment. The most important elements are, among
others, hybridization ovens, chip scanners, and software packages for
subsequent numerical analysis of the raw data.
Workflow of a ChIP-on-chip experiment
Starting
with a biological question, a ChIP-on-chip experiment can be divided
into three major steps: The first is to set up and design the experiment
by selecting the appropriate array and probe type. Second, the actual
experiment is performed in the wet-lab. Last, during the dry-lab portion
of the cycle, gathered data are analyzed to either answer the initial
question or lead to new questions so that the cycle can start again.
Wet-lab portion of the workflow
Workflow overview of the wet-lab portion of a ChIP-on-chip experiment.
In the first step, the protein of interest (POI) is cross-linked to the DNA site it binds while the cells are still intact. Usually this is done by a gentle formaldehyde fixation that is reversible with heat.
Then, the cells are lysed and the DNA is sheared by sonication or by digestion with micrococcal nuclease. This results in double-stranded DNA fragments, normally 1 kb or less in length. Those that were cross-linked to the POI form a POI-DNA complex.
In the next step, only these complexes are filtered out of the set of DNA fragments, using an antibody
specific to the POI. The antibodies may be attached to a solid surface,
may have a magnetic bead, or some other physical property that allows
separation of cross-linked complexes and unbound fragments. This
procedure is essentially an immunoprecipitation (IP) of the protein. This can be done either by using a tagged protein with an antibody against the tag (e.g., FLAG, HA, c-myc) or with an antibody to the native protein.
The cross-linking of POI-DNA complexes is reversed (usually by
heating) and the DNA strands are purified. For the rest of the workflow,
the POI is no longer necessary.
After an amplification and denaturation step, the single-stranded DNA fragments are labeled with a fluorescent tag such as Cy5 or Alexa 647.
Finally, the fragments are poured over the surface of the DNA
microarray, which is spotted with short, single-stranded sequences that
cover the genomic portion of interest. Whenever a labeled fragment
"finds" a complementary fragment on the array, the two hybridize and again form a double-stranded DNA fragment.
Dry-lab portion of the workflow
Workflow overview of the dry-lab portion of a ChIP-on-chip experiment.
After a sufficiently large time frame to allow hybridization, the
array is illuminated with fluorescent light. Those probes on the array
that are hybridized to one of the labeled fragments emit a light signal
that is captured by a camera. This image contains all raw data for the
remaining part of the workflow.
These raw data, encoded as a false-color image,
need to be converted to numerical values before the actual analysis
can be done. Analyzing the raw data and extracting information from them
often remains the most challenging part of ChIP-on-chip experiments.
Problems arise throughout this portion of the workflow, ranging from the
initial chip read-out, to suitable methods for subtracting background
noise, and finally to appropriate algorithms that normalize the data and make them available for subsequent statistical analysis,
which then ideally leads to a better understanding of the biological
question that the experiment seeks to address. Furthermore, because of the
different array platforms and the lack of standardization between them, data
storage and exchange are a major problem. Generally speaking, the data
analysis can be divided into three major steps:
During the first step, the captured fluorescence signals from the
array are normalized, using control signals derived from the same or a
second chip. Such control signals tell which probes on the array were
hybridized correctly and which bound nonspecifically.
In the second step, numerical and statistical tests are applied
to control data and IP fraction data to identify POI-enriched regions
along the genome. The following three methods are used widely: median percentile rank, single-array error, and sliding-window.
These methods generally differ in how low-intensity signals are
handled, how much background noise is accepted, and which trait for the
data is emphasized during the computation. In the recent past, the
sliding-window approach seems to be favored and is often described as
most powerful.
In the third step, these regions are analyzed further. If, for
example, the POI was a transcription factor, such regions would
represent its binding sites. Subsequent analysis then may want to infer
nucleotide motifs and other patterns to allow functional annotation of
the genome.
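The first two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any particular analysis package: normalization is reduced to a median-centered log2 ratio of IP signal over control signal, and the sliding-window test is reduced to a mean-over-threshold rule. The function names and the window, threshold, and min_probes parameters are hypothetical.

```python
import math
from statistics import median

def log_ratios(ip, control):
    """Step 1 (assumed form): log2(IP / control) per probe, median-centered."""
    ratios = [math.log2(i / c) for i, c in zip(ip, control)]
    m = median(ratios)
    return [r - m for r in ratios]

def sliding_window_peaks(positions, values, window, threshold, min_probes=3):
    """Step 2 (assumed form): flag probes whose surrounding window is enriched.

    A probe is flagged when the window centered on it holds at least
    `min_probes` probes and their mean value reaches `threshold`.
    Runs of consecutive flagged probes are merged into (start, end) regions.
    Positions are assumed to be sorted in ascending order.
    """
    half = window / 2
    flags = []
    for center in positions:
        in_win = [v for p, v in zip(positions, values) if abs(p - center) <= half]
        flags.append(len(in_win) >= min_probes
                     and sum(in_win) / len(in_win) >= threshold)
    regions, start, prev = [], None, None
    for pos, flag in zip(positions, flags):
        if flag and start is None:
            start = pos                      # a new enriched region begins
        if not flag and start is not None:
            regions.append((start, prev))    # region ended at the previous probe
            start = None
        prev = pos
    if start is not None:
        regions.append((start, prev))
    return regions

# Toy data: ten probes spaced 100 bp apart, with one enriched stretch.
positions = list(range(0, 1000, 100))
control = [100.0] * 10
ip = [100.0, 100.0, 100.0, 100.0, 400.0, 800.0, 400.0, 100.0, 100.0, 100.0]
print(sliding_window_peaks(positions, log_ratios(ip, control),
                           window=200, threshold=1.0))
```

Averaging over neighboring probes makes a single noisy probe less likely to be called as a binding site, at the cost of blurring region boundaries by up to half a window. Real analyses typically replace the fixed threshold with a statistical test against the control distribution.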
Strengths and weaknesses
Using tiled arrays, ChIP-on-chip
allows for high resolution of genome-wide maps. These maps can
determine the binding sites of many DNA-binding proteins like
transcription factors and also chromatin modifications.
Although ChIP-on-chip can be a powerful technique in the area of
genomics, it is very expensive. Most published studies using
ChIP-on-chip repeat their experiments at least three times to ensure
biologically meaningful maps. The cost of the DNA microarrays is often a
limiting factor to whether a laboratory should proceed with a
ChIP-on-chip experiment. Another limitation is the size of DNA fragments
that can be achieved. Most ChIP-on-chip protocols utilize sonication as
a method of breaking up DNA into small pieces. However, sonication is
limited to a minimal fragment size of 200 bp. For higher resolution
maps, this limitation should be overcome to achieve smaller fragments,
preferably to single nucleosome
resolution. As mentioned previously, the statistical analysis of the
huge amount of data generated from arrays is a challenge and
normalization procedures should aim to minimize artifacts and determine
what is really biologically significant. So far, application to
mammalian genomes has been a major limitation, for example, due to the
significant percentage of the genome that is occupied by repeats.
However, as ChIP-on-chip technology advances, high resolution whole
mammalian genome maps should become achievable.
Antibodies used for ChIP-on-chip can be an important limiting factor. ChIP-on-chip requires highly specific antibodies that recognize their epitope both in free solution and under fixed conditions. An antibody that has been demonstrated to successfully immunoprecipitate cross-linked chromatin is termed "ChIP-grade". Companies that provide ChIP-grade antibodies include Abcam, Cell Signaling Technology, Santa Cruz, and Upstate. To overcome the problem of specificity, the protein of interest can be fused to a tag, such as FLAG or HA, that is recognized by antibodies. An alternative to ChIP-on-chip that does not require antibodies is DamID.
Also available are antibodies against a specific histone modification, such as H3
trimethyl K4 (H3K4me3). As mentioned before, the combination of these antibodies
and ChIP-on-chip has become extremely powerful for whole-genome
analysis of histone modification patterns and will contribute
tremendously to our understanding of the histone code and epigenetics.
A study demonstrating the non-specific nature of DNA-binding
proteins has been published in PLoS Biology. This indicates that
independent confirmation of functional relevance is a necessary step in
any ChIP-chip experiment.
History
A first ChIP-on-chip experiment was performed in 1999 to analyze the distribution of cohesin along budding yeast chromosome III.
Although the genome was not completely represented, the protocol in
this study remains equivalent to those used in later studies. The
ChIP-on-chip technique using all of the ORFs of the genome (that
nevertheless remains incomplete, missing intergenic regions) was then
applied successfully in three papers published in 2000 and 2001. The authors identified binding sites for individual transcription factors in the budding yeast Saccharomyces cerevisiae. In 2002, Richard Young's group
determined the genome-wide positions of 106 transcription factors using
a c-Myc tagging system in yeast. The first demonstration of the
mammalian ChIP-on-chip technique, which reported the isolation of nine
chromatin fragments containing weak and strong E2F binding sites, was carried out
by Peggy Farnham's lab in collaboration with Michael Zhang's lab and
published in 2001.
This study was followed several months later by a collaboration
between the Young lab and the laboratory of Brian Dynlacht, which used
the ChIP-on-chip technique to show for the first time that E2F targets
encode components of the DNA damage checkpoint and repair pathways, as
well as factors involved in chromatin assembly/condensation, chromosome
segregation, and the mitotic spindle checkpoint. Other applications for ChIP-on-chip include DNA replication, recombination,
and chromatin structure. Since then, ChIP-on-chip has become a powerful
tool in determining genome-wide maps of histone modifications and many
more transcription factors. ChIP-on-chip in mammalian systems has been
difficult due to the large and repetitive genomes. Thus, many studies in
mammalian cells have focused on select promoter regions that are
predicted to bind transcription factors and have not analyzed the entire
genome. However, whole mammalian genome arrays have recently become
commercially available from companies like Nimblegen. In the future, as
ChIP-on-chip arrays become more and more advanced, high resolution whole
genome maps of DNA-binding proteins and chromatin components for
mammals will be analyzed in more detail.
Alternatives
Introduced in 2007, ChIP sequencing
(ChIP-seq) is a technology that uses chromatin immunoprecipitation of
proteins of interest cross-linked to the DNA but then, instead of using a
microarray, uses the more accurate, higher-throughput method of
sequencing to localize interaction points.
DamID is an alternative method that does not require antibodies.
ChIP-exo uses exonuclease treatment to achieve up to single base pair resolution.
CUT&RUN sequencing uses antibody recognition with targeted enzymatic cleavage to address some technical limitations of ChIP.