Saturday, August 31, 2024

DNA digital data storage

From Wikipedia, the free encyclopedia

DNA digital data storage is the process of encoding and decoding binary data to and from synthesized strands of DNA.

While DNA as a storage medium has enormous potential because of its high storage density, its practical use is currently severely limited because of its high cost and very slow read and write times.

In June 2019, scientists reported that all 16 GB of text from the English Wikipedia had been encoded into synthetic DNA. In 2021, scientists reported that a custom DNA data writer had been developed that was capable of writing data into DNA at 1 Mbps.

Encoding methods

Many methods for encoding data in DNA are possible. The optimal methods are those that make economical use of DNA and protect against errors. If the message DNA is intended to be stored for a long period of time, for example, 1,000 years, it is also helpful if the sequence is obviously artificial and the reading frame is easy to identify.

Encoding text

Several simple methods for encoding text have been proposed. Most of these involve translating each letter into a corresponding "codon", consisting of a unique small sequence of nucleotides in a lookup table. Some examples of these encoding schemes include Huffman codes, comma codes, and alternating codes.

Encoding arbitrary data

To encode arbitrary data in DNA, the data is typically first converted into ternary (base 3) data rather than binary (base 2) data. Each digit (or "trit") is then converted to a nucleotide using a lookup table. To prevent homopolymers (repeating nucleotides), which can cause problems with accurate sequencing, the result of the lookup also depends on the preceding nucleotide. Using the example lookup table below, if the previous nucleotide in the sequence is T (thymine), and the trit is 2, the next nucleotide will be G (guanine).

Trits to nucleotides (example)
Previous 0 1 2
T A C G
G T A C
C G T A
A C G T
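
The rotating lookup can be sketched in a few lines of Python. This is a minimal illustration, not a published scheme: packing each byte into six trits, the starting "previous" base of A, and all function names are assumptions made for the example. Only the table itself comes from the text above.

```python
# NEXT_BASE[previous][trit] reproduces the example lookup table above.
NEXT_BASE = {"T": "ACG", "G": "TAC", "C": "GTA", "A": "CGT"}
TRITS_PER_BYTE = 6  # assumption: 3**6 = 729 >= 256, so six trits hold one byte


def byte_to_trits(value):
    """Write one byte as six base-3 digits, most significant first."""
    digits = []
    for _ in range(TRITS_PER_BYTE):
        value, remainder = divmod(value, 3)
        digits.append(remainder)
    return digits[::-1]


def trits_to_dna(trits, start="A"):
    """Map each trit to a base that depends on the preceding base."""
    previous, bases = start, []
    for trit in trits:
        previous = NEXT_BASE[previous][trit]
        bases.append(previous)
    return "".join(bases)


def dna_to_trits(dna, start="A"):
    """Invert trits_to_dna by looking up each base in the row of its predecessor."""
    previous, trits = start, []
    for base in dna:
        trits.append(NEXT_BASE[previous].index(base))
        previous = base
    return trits


def encode(data):
    trits = [t for byte in data for t in byte_to_trits(byte)]
    return trits_to_dna(trits)


def decode(dna):
    trits = dna_to_trits(dna)
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)
```

Because no row of the table contains its own "previous" base, two identical nucleotides can never be adjacent in the output, whatever the input data.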

Various systems may be incorporated to partition and address the data, as well as to protect it from errors. One approach to error correction is to regularly intersperse synchronization nucleotides between the information-encoding nucleotides. These synchronization nucleotides can act as scaffolds when reconstructing the sequence from multiple overlapping strands.

In vivo

The genetic code within living organisms can potentially be co-opted to store information. Furthermore, synthetic biology can be used to engineer cells with "molecular recorders" to allow the storage and retrieval of information stored in the cell's genetic material. CRISPR gene editing can also be used to insert artificial DNA sequences into the genome of the cell. For encoding developmental lineage data (a "molecular flight recorder"), roughly 30 trillion cell nuclei per mouse * 60 recording sites per nucleus * 7-15 bits per site yields about 2 terabytes per mouse written (but only very selectively read).

In-vivo light-based direct image and data recording

A proof-of-concept in vivo direct DNA data recording system was demonstrated by incorporating optogenetically regulated recombinases into an engineered "molecular recorder", which allows light-based stimuli to be encoded directly into engineered E. coli cells. This approach can also be parallelized to store and write text or data in 8-bit form through the use of physically separated individual cell cultures in cell-culture plates.

This approach leverages the editing of a "recorder plasmid" by the light-regulated recombinases, allowing for identification of cell populations exposed to different stimuli. The physical stimulus is thereby encoded directly into the "recorder plasmid" through recombinase action. Unlike other approaches, it does not require manual design, insertion and cloning of artificial sequences to record the data into the genetic code. In this recording process, the cell population in each well of a cell-culture plate can be treated as a digital "bit", functioning as a biological transistor capable of recording a single bit of data.
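
As a rough sketch of the 8-bit idea, the following Python maps text onto a grid of wells, one row of eight wells per character. The layout (ASCII, most significant bit first, True meaning "expose this well to light") is a hypothetical convention chosen for illustration, not the one used in the demonstration.

```python
def text_to_wells(text):
    """Return one row of eight booleans per character (True = illuminate the well)."""
    rows = []
    for byte in text.encode("ascii"):
        rows.append([bool((byte >> (7 - i)) & 1) for i in range(8)])
    return rows


def wells_to_text(rows):
    """Read the rows back, treating each edited well as a 1 and each unedited well as a 0."""
    chars = []
    for row in rows:
        value = 0
        for bit in row:
            value = (value << 1) | int(bit)
        chars.append(chr(value))
    return "".join(chars)
```

For example, the letter "A" (binary 01000001) would require light exposure in the second and eighth wells of its row only.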

History

The idea of DNA digital data storage dates back to 1959, when the physicist Richard P. Feynman, in "There's Plenty of Room at the Bottom: An Invitation to Enter a New Field of Physics", outlined the general prospects for creating artificial objects similar to objects of the microcosm (including biological ones) and having similar or even more extensive capabilities. In 1964–65, the Soviet physicist Mikhail Samoilovich Neiman published three articles about microminiaturization in electronics at the molecular-atomic level, which independently presented general considerations and some calculations regarding the possibility of recording, storing, and retrieving information on synthesized DNA and RNA molecules. After Neiman's first paper had been published, and after the editor had received the manuscript of his second paper (on January 8, 1964, as indicated in that paper), an interview with the cybernetician Norbert Wiener was published. In it, Wiener expressed ideas about the miniaturization of computer memory that were close to those Neiman had proposed independently. Neiman mentioned Wiener's ideas in the third of his papers. This story has been described in detail.

One of the earliest uses of DNA storage occurred in a 1988 collaboration between artist Joe Davis and researchers from Harvard University. The image they stored in a DNA sequence in E. coli was organized in a 5 x 7 matrix that, once decoded, formed a picture of an ancient Germanic rune representing life and the female Earth. In the matrix, ones corresponded to dark pixels while zeros corresponded to light pixels.

In 2007 a device was created at the University of Arizona using addressing molecules to encode mismatch sites within a DNA strand. These mismatches were then able to be read out by performing a restriction digest, thereby recovering the data.

In 2011, George Church, Sri Kosuri, and Yuan Gao carried out an experiment to encode a 659 kB book that was co-authored by Church. To do this, the research team used a two-to-one correspondence in which a binary zero was represented by either an adenine or a cytosine and a binary one was represented by either a guanine or a thymine. After examination, 22 errors were found in the DNA.
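
The two-to-one correspondence can be illustrated with a short Python sketch. The mapping (0 to A or C, 1 to G or T) is from the text. The rule used here to pick between the two candidate bases, namely taking whichever one differs from the previous base, is an assumption added for illustration and is not claimed to be the authors' procedure.

```python
CHOICES = {"0": "AC", "1": "GT"}                      # each bit has two candidate bases
BIT_OF = {"A": "0", "C": "0", "G": "1", "T": "1"}     # decoding ignores which one was used


def encode_bits(bits):
    """Encode a string of '0'/'1' characters, avoiding repeated adjacent bases."""
    bases, previous = [], ""
    for bit in bits:
        first, second = CHOICES[bit]
        base = second if first == previous else first
        bases.append(base)
        previous = base
    return "".join(bases)


def decode_bases(dna):
    """Recover the bit string; A and C read as 0, G and T read as 1."""
    return "".join(BIT_OF[base] for base in dna)
```

With this rule, `encode_bits("0000")` alternates between A and C instead of producing a run of four identical bases, while decoding is unaffected by the choice.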

In 2012, George Church and colleagues at Harvard University published an article in which DNA was encoded with digital information that included an HTML draft of a 53,400-word book written by the lead researcher, eleven JPEG images, and one JavaScript program. Multiple copies were added for redundancy, and 5.5 petabits can be stored in each cubic millimeter of DNA. The researchers used a simple code where bits were mapped one-to-one with bases, which had the shortcoming that it led to long runs of the same base, the sequencing of which is error-prone. This result showed that, besides its other functions, DNA can also serve as another type of storage medium, like hard disk drives and magnetic tapes.

In 2013, an article led by researchers from the European Bioinformatics Institute (EBI) and submitted at around the same time as the paper of Church and colleagues detailed the storage, retrieval, and reproduction of over five million bits of data. All the DNA files reproduced the information with an accuracy between 99.99% and 100%. The main innovations in this research were the use of an error-correcting encoding scheme to ensure the extremely low data-loss rate, as well as the idea of encoding the data in a series of overlapping short oligonucleotides identifiable through a sequence-based indexing scheme. Also, the sequences of the individual strands of DNA overlapped in such a way that each region of data was repeated four times to avoid errors. Two of these four strands were constructed backwards, also with the goal of eliminating errors. The costs per megabyte were estimated at $12,400 to encode data and $220 for retrieval. However, it was noted that the exponential decrease in DNA synthesis and sequencing costs, if it continues into the future, should make the technology cost-effective for long-term data storage by 2023.
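
The fourfold overlap can be sketched as follows. The fragment length of 100 bases and the step of 25 bases are assumed values chosen so that interior positions are covered four times. Reverse-complementing every second fragment is one interpretation of "constructed backwards", and the integer index stands in for the sequence-based indexing scheme. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def fragment(seq, length=100, step=25):
    """Cut seq into indexed, overlapping fragments; odd-numbered ones are reversed."""
    fragments = []
    for index, start in enumerate(range(0, len(seq) - length + 1, step)):
        piece = seq[start:start + length]
        if index % 2 == 1:
            piece = reverse_complement(piece)
        fragments.append((index, piece))
    return fragments


def coverage(seq_len, length=100, step=25):
    """Count how many fragments cover each position of the original sequence."""
    counts = [0] * seq_len
    for start in range(0, seq_len - length + 1, step):
        for position in range(start, start + length):
            counts[position] += 1
    return counts
```

With these parameters every position away from the two ends appears in four different fragments, so a base lost or misread in one fragment can be recovered from the other three.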

In 2013, software called DNACloud was developed by Manish K. Gupta and co-workers to encode computer files to their DNA representation. It implements a memory-efficient version of the algorithm proposed by Goldman et al. to encode (and decode) data to DNA (.dnac files).

The long-term stability of data encoded in DNA was reported in February 2015, in an article by researchers from ETH Zurich. The team added redundancy via Reed–Solomon error-correction coding and protected the DNA by encapsulating it within silica glass spheres via sol-gel chemistry.

In 2016, research by Church and Technicolor Research and Innovation was published in which 22 MB of an MPEG-compressed movie sequence were stored in and recovered from DNA. The sequence was recovered with zero errors.

In March 2017, Yaniv Erlich and Dina Zielinski of Columbia University and the New York Genome Center published a method known as DNA Fountain that stored data at a density of 215 petabytes per gram of DNA. The technique approaches the Shannon capacity of DNA storage, achieving 85% of the theoretical limit. The method was not ready for large-scale use, as it costs $7000 to synthesize 2 megabytes of data and another $2000 to read it.

In March 2018, the University of Washington and Microsoft published results demonstrating storage and retrieval of approximately 200 MB of data. The research also proposed and evaluated a method for random access of data items stored in DNA. In March 2019, the same team announced that they had demonstrated a fully automated system to encode and decode data in DNA.

Research published by Eurecom and Imperial College in January 2019 demonstrated the ability to store structured data in synthetic DNA. The research showed how to encode structured or, more specifically, relational data in synthetic DNA and also demonstrated how to perform data processing operations (similar to SQL) directly on the DNA as chemical processes.

In April 2019, through a collaboration with TurboBeads Labs in Switzerland, Mezzanine by Massive Attack was encoded into synthetic DNA, making it the first album to be stored in this way.

In June 2019, scientists reported that all 16 GB of Wikipedia had been encoded into synthetic DNA. In 2021, CATALOG reported that they had developed a custom DNA writer capable of writing data at 1 Mbps into DNA.

The first article describing data storage on native DNA sequences via enzymatic nicking was published in April 2020. In the paper, scientists demonstrated a new method of recording information in the DNA backbone that enables bit-wise random access and in-memory computing.

In 2021, a research team at Newcastle University led by N. Krasnogor implemented a stack data structure using DNA, allowing for last-in, first-out (LIFO) data recording and retrieval. Their approach used hybridization and strand displacement to record DNA signals in DNA polymers, which were then released in reverse order. The study demonstrated that data structure-like operations are possible in the molecular realm. The researchers also explored the limitations and future improvements for dynamic DNA data structures, highlighting the potential for DNA-based computational systems.

Davos Bitcoin Challenge

On January 21, 2015, Nick Goldman from the European Bioinformatics Institute (EBI), one of the original authors of the 2013 Nature paper, announced the Davos Bitcoin Challenge at the World Economic Forum annual meeting in Davos. During his presentation, DNA tubes were handed out to the audience, with the message that each tube contained the private key of exactly one bitcoin, all coded in DNA. The first one to sequence and decode the DNA could claim the bitcoin and win the challenge. The challenge was set for three years and would close if nobody claimed the prize before January 21, 2018.

Almost three years later, on January 19, 2018, the EBI announced that a Belgian PhD student, Sander Wuyts, of the University of Antwerp and Vrije Universiteit Brussel, was the first to complete the challenge. In addition to the instructions on how to claim the bitcoin (stored as a plain text file and a PDF file), the logo of the EBI, the logo of the company that printed the DNA (CustomArray), and a sketch of James Joyce were retrieved from the DNA.

The Lunar Library

The Lunar Library, launched on the Beresheet lander by the Arch Mission Foundation, carries information encoded in DNA, which includes 20 famous books and 10,000 images. DNA was chosen as one of the storage media because it can last a long time; the Arch Mission Foundation suggests that it can still be read after billions of years. The lander crashed on 11 April 2019 and was lost.

DNA of things

The concept of the DNA of Things (DoT) was introduced in 2019 by a team of researchers from Israel and Switzerland, including Yaniv Erlich and Robert Grass. DoT encodes digital data into DNA molecules, which are then embedded into objects. This makes it possible to create objects that carry their own blueprint, similar to biological organisms. In contrast to the Internet of things, which is a system of interrelated computing devices, DoT creates objects that are independent storage objects, completely off-grid.

As a proof of concept for DoT, the researchers 3D-printed a Stanford bunny which contains its blueprint in the plastic filament used for printing. By clipping off a tiny bit of the ear of the bunny, they were able to read out the blueprint, multiply it and produce a next generation of bunnies. In addition, the ability of DoT to serve steganographic purposes was shown by producing non-distinguishable lenses which contain a YouTube video integrated into the material.

DNA synthesis

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/DNA_synthesis
Structure of double-stranded DNA, the product of DNA synthesis, showing individual nucleotide units and bonds.

DNA synthesis is the natural or artificial creation of deoxyribonucleic acid (DNA) molecules. DNA is a macromolecule made up of nucleotide units, which are linked by covalent bonds and hydrogen bonds, in a repeating structure. DNA synthesis occurs when these nucleotide units are joined to form DNA; this can occur artificially (in vitro) or naturally (in vivo). Nucleotide units are made up of a nitrogenous base (cytosine, guanine, adenine or thymine), pentose sugar (deoxyribose) and phosphate group. Each unit is joined when a covalent bond forms between its phosphate group and the pentose sugar of the next nucleotide, forming a sugar-phosphate backbone. DNA is a complementary, double stranded structure as specific base pairing (adenine and thymine, guanine and cytosine) occurs naturally when hydrogen bonds form between the nucleotide bases.

There are several different definitions for DNA synthesis: it can refer to DNA replication - DNA biosynthesis (in vivo DNA amplification), polymerase chain reaction - enzymatic DNA synthesis (in vitro DNA amplification) or gene synthesis - physically creating artificial gene sequences. Though each type of synthesis is very different, they do share some features. Nucleotides that have been joined to form polynucleotides can act as a DNA template for one form of DNA synthesis - PCR - to occur. DNA replication also works by using a DNA template: the DNA double helix unwinds during replication, exposing unpaired bases for new nucleotides to hydrogen bond to. Gene synthesis, however, does not require a DNA template and genes are assembled de novo.

DNA synthesis occurs in all eukaryotes and prokaryotes, as well as some viruses. The accurate synthesis of DNA is important in order to avoid mutations to DNA. In humans, mutations could lead to diseases such as cancer so DNA synthesis, and the machinery involved in vivo, has been studied extensively throughout the decades. In the future these studies may be used to develop technologies involving DNA synthesis, to be used in data storage.

DNA replication

Overview of the steps in DNA replication
DNA replication, and the various enzymes involved

In nature, DNA molecules are synthesised by all living cells through the process of DNA replication. This typically occurs as a part of cell division. DNA replication occurs so that, during cell division, each daughter cell contains an accurate copy of the genetic material of the cell. In vivo DNA synthesis (DNA replication) is dependent on a complex set of enzymes which have evolved to act during the S phase of the cell cycle, in a concerted fashion. In both eukaryotes and prokaryotes, DNA replication occurs when specific topoisomerases, helicases and gyrases (replication initiator proteins) uncoil the double-stranded DNA, exposing the nitrogenous bases. These enzymes, along with accessory proteins, form a macromolecular machine which ensures accurate duplication of DNA sequences. Complementary base pairing takes place, forming a new double-stranded DNA molecule. This is known as semi-conservative replication since one strand of the new DNA molecule is from the 'parent' strand.

Eukaryotic enzymes continuously encounter DNA damage, which can perturb DNA replication. This damage is in the form of DNA lesions that arise spontaneously or due to DNA damaging agents. DNA replication machinery is therefore highly controlled in order to prevent collapse when encountering damage. Control of the DNA replication system ensures that the genome is replicated only once per cycle; over-replication induces DNA damage. Deregulation of DNA replication is a key factor in genomic instability during cancer development.

This highlights the specificity of DNA synthesis machinery in vivo. Various means exist to artificially stimulate the replication of naturally occurring DNA, or to create artificial gene sequences. However, DNA synthesis in vitro can be a very error-prone process.

DNA repair synthesis

Damaged DNA is subject to repair by several different enzymatic repair processes, where each individual process is specialized to repair particular types of damage. The DNA of humans is subject to damage from multiple natural sources and insufficient repair is associated with disease and premature aging. Most DNA repair processes form single-strand gaps in DNA during an intermediate stage of the repair, and these gaps are filled in by repair synthesis. The specific repair processes that require gap filling by DNA synthesis include nucleotide excision repair, base excision repair, mismatch repair, homologous recombinational repair, non-homologous end joining and microhomology-mediated end joining.

Reverse transcription

Reverse transcription is part of the replication cycle of particular virus families, including retroviruses. It involves copying RNA into double-stranded complementary DNA (cDNA), using reverse transcriptase enzymes. In retroviruses, viral RNA is inserted into a host cell nucleus. There, a viral reverse transcriptase enzyme adds DNA nucleotides onto the RNA sequence, generating cDNA that is inserted into the host cell genome by the enzyme integrase, encoding viral proteins.

Polymerase chain reaction

A polymerase chain reaction is a form of enzymatic DNA synthesis in the laboratory, using cycles of repeated heating and cooling of the reaction for DNA melting and enzymatic replication of the DNA.

DNA synthesis during PCR is very similar to that in living cells but uses very specific reagents and conditions. During PCR, DNA is chemically extracted from host chaperone proteins and then heated, causing thermal dissociation of the DNA strands. Two new cDNA strands are built from the original strand; these strands can be split again to act as the template for further PCR products. The original DNA is multiplied through many rounds of PCR. More than a billion copies of the original DNA strand can be made.
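
The "more than a billion copies" figure follows from repeated doubling. The sketch below assumes ideal amplification, in which every strand is copied in every cycle; the `efficiency` parameter is a hypothetical knob for less-than-perfect cycles.

```python
import math


def copies_after(cycles, starting_copies=1, efficiency=1.0):
    """Copies present after a number of cycles; efficiency=1.0 means perfect doubling."""
    return starting_copies * (1 + efficiency) ** cycles


def cycles_needed(target_copies, starting_copies=1, efficiency=1.0):
    """Smallest whole number of cycles that reaches at least target_copies."""
    ratio = target_copies / starting_copies
    return math.ceil(math.log(ratio) / math.log(1 + efficiency))
```

Under perfect doubling, 30 cycles turn a single template into 2^30 (about 1.07 billion) copies, so `cycles_needed(1e9)` evaluates to 30.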

Random mutagenesis

For many experiments, such as structural and evolutionary studies, scientists need to produce a large library of variants of a particular DNA sequence. Random mutagenesis takes place in vitro, when mutagenic replication with a low fidelity DNA polymerase is combined with selective PCR amplification to produce many copies of mutant DNA.

RT-PCR

RT-PCR differs from conventional PCR as it synthesizes cDNA from mRNA, rather than template DNA. The technique couples a reverse transcription reaction with PCR-based amplification, as an RNA sequence acts as a template for the enzyme, reverse transcriptase. RT-PCR is often used to test gene expression in particular tissue or cell types at various developmental stages or to test for genetic disorders.

Gene synthesis

Artificial gene synthesis is the process of synthesizing a gene in vitro without the need for initial template DNA samples. In 2010 J. Craig Venter and his team were the first to use entirely synthesized DNA to create a self-replicating microbe, dubbed Mycoplasma laboratorium.

Oligonucleotide synthesis

Oligonucleotide synthesis is the chemical synthesis of sequences of nucleic acids. The majority of biological research and bioengineering involves synthetic DNA, which can include oligonucleotides, synthetic genes, or even chromosomes. Today, all synthetic DNA is custom-built using the phosphoramidite method developed by Marvin H. Caruthers. Oligos are synthesized from building blocks which replicate natural bases. The process has been automated since the late 1970s and can be used to form desired genetic sequences as well as for other uses in medicine and molecular biology. However, creating sequences chemically is impractical beyond 200-300 bases, and is an environmentally hazardous process. These oligos, of around 200 bases, can be connected using DNA assembly methods, creating larger DNA molecules.

Some studies have explored the possibility of enzymatic synthesis using terminal deoxynucleotidyl transferase (TdT), a DNA polymerase that requires no template. However, this method is not yet as effective as chemical synthesis, and is not commercially available.

With advances in artificial DNA synthesis, the possibility of DNA data storage is being explored. With its ultrahigh storage density and long-term stability, synthetic DNA is an interesting option to store large amounts of data. Although information can be retrieved very quickly from DNA through next generation sequencing technologies, de novo synthesis of DNA is a major bottleneck in the process. Only one nucleotide can be added per cycle, with each cycle taking seconds, so the overall synthesis is very time-consuming, as well as very error prone. However, if biotechnology improves, synthetic DNA could one day be used in data storage.

Base pair synthesis

It has been reported that new nucleobase pairs can be synthesized, in addition to the natural A-T (adenine - thymine) and G-C (guanine - cytosine) pairs. Synthetic nucleotides can be used to expand the genetic alphabet and allow specific modification of DNA sites. Even just a third base pair would expand the number of amino acids that can be encoded by DNA from the existing 20 amino acids to a possible 172. Hachimoji DNA is built from eight nucleotide letters, forming four possible base pairs. It therefore doubles the size of the genetic alphabet relative to natural DNA, raising the theoretical capacity from 2 to 3 bits per nucleotide. In studies, RNA has even been produced from hachimoji DNA. This technology could also be used to allow data storage in DNA.
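
The figure of 172 can be reproduced with a short count. The sketch assumes, as one plausible reading that the text does not spell out, that codons stay three letters long and that every codon made possible by the third base pair is assigned to a new amino acid.

```python
def codon_count(alphabet_size, codon_length=3):
    """Number of distinct codons for a given number of nucleotide letters."""
    return alphabet_size ** codon_length


natural_codons = codon_count(4)                    # two base pairs, four letters
expanded_codons = codon_count(6)                   # three base pairs, six letters
new_codons = expanded_codons - natural_codons      # codons that did not exist before
possible_amino_acids = 20 + new_codons             # existing 20 plus one per new codon
```

Here four letters give 64 codons, six letters give 216, and the 152 additional codons plus the existing 20 amino acids give 172.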

Oligonucleotide

From Wikipedia, the free encyclopedia

Oligonucleotides are short DNA or RNA molecules, oligomers, that have a wide range of applications in genetic testing, research, and forensics. Commonly made in the laboratory by solid-phase chemical synthesis, these small fragments of nucleic acids can be manufactured as single-stranded molecules with any user-specified sequence, and so are vital for artificial gene synthesis, polymerase chain reaction (PCR), DNA sequencing, molecular cloning and as molecular probes. In nature, oligonucleotides are usually found as small RNA molecules that function in the regulation of gene expression (e.g. microRNA), or are degradation intermediates derived from the breakdown of larger nucleic acid molecules.

Oligonucleotides are characterized by the sequence of nucleotide residues that make up the entire molecule. The length of the oligonucleotide is usually denoted by "-mer" (from Greek meros, "part"). For example, an oligonucleotide of six nucleotides (nt) is a hexamer, while one of 25 nt would usually be called a "25-mer". Oligonucleotides readily bind, in a sequence-specific manner, to their respective complementary oligonucleotides, DNA, or RNA to form duplexes or, less often, hybrids of a higher order. This basic property serves as a foundation for the use of oligonucleotides as probes for detecting specific sequences of DNA or RNA. Examples of procedures that use oligonucleotides include DNA microarrays, Southern blots, ASO analysis, fluorescent in situ hybridization (FISH), PCR, and the synthesis of artificial genes.
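
Sequence-specific binding can be illustrated with a reverse-complement lookup. The sketch below treats hybridization as exact Watson-Crick matching on DNA only, which is a simplification, since real probes tolerate some mismatches depending on conditions. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the strand that pairs with seq, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_binding_sites(probe, target):
    """Return 0-based positions in target where the probe can pair along its full length."""
    site = reverse_complement(probe)
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions


def mer_name(oligo):
    """Length label in the "-mer" convention, e.g. "25-mer"."""
    return f"{len(oligo)}-mer"
```

For example, the 7-mer probe TGTAATC pairs with the site GATTACA, so `find_binding_sites("TGTAATC", "CCGATTACAGG")` reports a single site at position 2.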

Oligonucleotides are composed of 2'-deoxyribonucleotides (oligodeoxyribonucleotides), which can be modified at the backbone or on the 2' sugar position to achieve different pharmacological effects. These modifications give new properties to the oligonucleotides and make them a key element in antisense therapy.

Synthesis

Oligonucleotides are chemically synthesized using building blocks, protected phosphoramidites of natural or chemically modified nucleosides or, to a lesser extent, of non-nucleosidic compounds. The oligonucleotide chain assembly proceeds in the 3' to 5' direction by following a routine procedure referred to as a "synthetic cycle". Completion of a single synthetic cycle results in the addition of one nucleotide residue to the growing chain. A less than 100% yield of each synthetic step and the occurrence of side reactions set practical limits on the efficiency of the process. Oligonucleotide sequences are usually short (13–25 nucleotides long), and the maximum length of synthetic oligonucleotides hardly exceeds 200 nucleotide residues. HPLC and other methods can be used to isolate products with the desired sequence.
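
The effect of an imperfect step yield compounds with length, which a short calculation makes concrete. The 99% coupling efficiency is an assumed, illustrative value, and the model counts one coupling per added residue after the first (taken to be pre-attached to the solid support) while ignoring side reactions.

```python
def full_length_yield(length, coupling_efficiency=0.99):
    """Fraction of chains reaching full length after (length - 1) coupling steps."""
    return coupling_efficiency ** (length - 1)


for n in (25, 100, 200):
    print(f"{n}-mer: {full_length_yield(n):.1%} full-length product")
```

Under these assumptions a 25-mer is obtained at roughly 79% of the theoretical maximum, whereas a 200-mer drops to roughly 14%, which is consistent with the practical length limit described above.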

Chemical modifications

Creating chemically stable short oligonucleotides was the earliest challenge in developing ASO therapies. Naturally occurring oligonucleotides are easily degraded by nucleases, enzymes that cleave the bonds between nucleotides and are abundant in every cell type. Short oligonucleotide sequences also have weak intrinsic binding affinities, which contributes to their degradation in vivo.

Backbone modifications

Nucleoside organothiophosphate (PS) analogs of nucleotides give oligonucleotides some beneficial properties. Key beneficial properties that PS backbones give nucleotides are diastereomer identification of each nucleotide and the ability to easily follow reactions involving the phosphorothioate nucleotides, which is useful in oligonucleotide synthesis. PS backbone modifications to oligonucleotides protect them against unwanted degradation by enzymes. Modifying the nucleotide backbone is widely used because it can be achieved with relative ease and accuracy on most nucleotides. Fluorescent modifications at the 5' and 3' ends of oligonucleotides have been reported as a way to evaluate oligonucleotide structures, dynamics and interactions with the environment.

Sugar ring modifications

Another type of modification that is useful for medical applications of oligonucleotides is modification of the 2' sugar position. Modifying the 2' position of the sugar increases the effectiveness of oligonucleotides by enhancing their target binding capabilities, specifically in antisense oligonucleotide therapies. Such modifications also decrease nonspecific protein binding, increasing the accuracy of targeting specific proteins. Two of the most commonly used modifications are 2'-O-methyl and 2'-O-methoxyethyl. Fluorescent modifications on the nucleobase have also been reported.

Antisense oligonucleotides

Antisense oligonucleotides (ASO) are single strands of DNA or RNA that are complementary to a chosen sequence. In the case of antisense RNA they prevent protein translation of certain messenger RNA strands by binding to them, in a process called hybridization. Antisense oligonucleotides can be used to target a specific, complementary (coding or non-coding) RNA. If binding takes place this hybrid can be degraded by the enzyme RNase H. RNase H is an enzyme that hydrolyzes RNA, and when used in an antisense oligonucleotide application results in 80-95% down-regulation of mRNA expression.

The use of Morpholino antisense oligonucleotides for gene knockdowns in vertebrates, which is now a standard technique in developmental biology and is used to study altered gene expression and gene function, was first developed by Janet Heasman using Xenopus. FDA-approved Morpholino drugs include eteplirsen and golodirsen. The antisense oligonucleotides have also been used to inhibit influenza virus replication in cell lines.

Neurodegenerative diseases that are a result of a single mutant protein are good targets for antisense oligonucleotide therapies because of their ability to target and modify very specific sequences of RNA with high selectivity. Many genetic diseases including Huntington's disease, Alzheimer's disease, Parkinson's disease, and amyotrophic lateral sclerosis (ALS) have been linked to DNA alterations that result in incorrect RNA sequences and result in mistranslated proteins that have a toxic physiological effect.

Cell internalisation

Cell uptake/internalisation still represents the biggest hurdle towards successful oligonucleotide (ON) therapeutics. Straightforward uptake, as for most small-molecule drugs, is hindered by the polyanionic backbone and the molecular size of ONs. The exact mechanisms of uptake and intracellular trafficking towards the place of action are still largely unclear. Moreover, small differences in ON structure/modification (vide supra) and differences in cell type lead to huge differences in uptake. It is believed that cell uptake occurs via different pathways after adsorption of ONs on the cell surface. Notably, studies show that most tissue culture cells readily take up ASOs (phosphorothioate linkage) in a non-productive way, meaning that no antisense effect is observed. In contrast, conjugation of ASOs with ligands recognised by G protein-coupled receptors leads to increased productive uptake. Besides that classification (non-productive vs. productive), cell internalisation mostly proceeds in an energy-dependent way (receptor-mediated endocytosis), but energy-independent passive diffusion (gymnosis) cannot be ruled out. After passing the cell membrane, ON therapeutics are encapsulated in early endosomes, which are transported towards late endosomes that ultimately fuse with lysosomes containing degrading enzymes at low pH. To exert its therapeutic function, the ON needs to escape the endosome prior to its degradation. Currently there is no universal method to overcome the problems of delivery, cell uptake and endosomal escape, but there are several approaches tailored to specific cells and their receptors.

Conjugation of ON therapeutics to an entity responsible for cell recognition/uptake not only increases uptake (vide supra) but is also believed to decrease the complexity of cell uptake, as mainly one (ideally known) mechanism is then involved. This has been achieved with small molecule-ON conjugates, for example those bearing an N-acetyl galactosamine, which targets receptors on hepatocytes. These conjugates are an excellent example of increased cell uptake paired with targeted delivery, as the corresponding receptors are overexpressed on the target cells, leading to a targeted therapeutic (compare antibody-drug conjugates, which exploit overexpressed receptors on cancer cells). Antibodies are another broadly used and heavily investigated entity for targeted delivery and increased cell uptake of oligonucleotides.

Analytical techniques

Chromatography

Alkylamides can be used as chromatographic stationary phases. Those phases have been investigated for the separation of oligonucleotides. Ion-pair reverse-phase high-performance liquid chromatography is used to separate and analyse the oligonucleotides after automated synthesis.

Mass spectrometry

A mixture of 5-methoxysalicylic acid and spermine can be used as a matrix for oligonucleotide analysis in MALDI mass spectrometry. Electrospray ionization mass spectrometry (ESI-MS) is also a powerful tool to characterize the mass of oligonucleotides.

DNA microarray

DNA microarrays are a useful analytical application of oligonucleotides. Compared to standard cDNA microarrays, oligonucleotide-based microarrays offer more controlled hybridization specificity and the ability to measure the presence and prevalence of alternatively spliced or polyadenylated sequences. One subtype of DNA microarrays can be described as substrates (nylon, glass, etc.) to which oligonucleotides have been bound at high density. There are a number of applications of DNA microarrays within the life sciences.

Amylin


Amylin, or islet amyloid polypeptide (IAPP), is a 37-residue peptide hormone. It is co-secreted with insulin from the pancreatic β-cells in the ratio of approximately 100:1 (insulin:amylin). Amylin plays a role in glycemic regulation by slowing gastric emptying and promoting satiety, thereby preventing post-prandial spikes in blood glucose levels.

IAPP is processed from an 89-residue coding sequence. Proislet amyloid polypeptide (proIAPP, proamylin, proislet protein) is produced in the pancreatic beta cells (β-cells) as a 67 amino acid, 7404 Dalton pro-peptide and undergoes post-translational modifications including protease cleavage to produce amylin.

Synthesis

ProIAPP consists of 67 amino acids, which follow a 22-amino-acid signal peptide that is rapidly cleaved after translation of the 89-amino-acid coding sequence. The human sequence (from N-terminus to C-terminus) is:

(MGILKLQVFLIVLSVALNHLKA) TPIESHQVEKR^ KCNTATCATQRLANFLVHSSNNFGAILSSTNVGSNTYG^ KR^ NAVEVLKREPLNYLPL. The signal peptide (in parentheses) is removed during translation of the protein and transport into the endoplasmic reticulum. Once inside the endoplasmic reticulum, a disulfide bond is formed between cysteine residues 2 and 7. Later in the secretory pathway, the precursor undergoes additional proteolysis and posttranslational modification (sites indicated by ^). Eleven amino acids are removed from the N-terminus by the enzyme proprotein convertase 2 (PC2), while 16 are removed from the C-terminus of the proIAPP molecule by proprotein convertase 1/3 (PC1/3). At the C-terminus, carboxypeptidase E then removes the terminal lysine and arginine residues. The terminal glycine that results from this cleavage allows the enzyme peptidylglycine alpha-amidating monooxygenase (PAM) to add an amine group. After this, the transformation from the precursor protein proIAPP to the biologically active IAPP is complete (IAPP sequence: KCNTATCATQRLANFLVHSSNNFGAILSSTNVGSNTY).
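The processing steps described above can be checked by simple string bookkeeping. The following is a minimal sketch, not a biological model: the segment names and the function `process_preproiapp` are hypothetical labels introduced here, the sequences are copied from the text, and the order of the two convertase steps is chosen only for illustration.

```python
# Segments of the 89-residue human coding sequence, copied from the text.
SIGNAL_PEPTIDE = "MGILKLQVFLIVLSVALNHLKA"                   # 22 residues
N_FLANK = "TPIESHQVEKR"                                     # 11 residues, removed by PC2
CORE_WITH_GLY = "KCNTATCATQRLANFLVHSSNNFGAILSSTNVGSNTYG"    # IAPP plus terminal Gly
DIBASIC = "KR"                                              # removed by carboxypeptidase E
C_FLANK = "NAVEVLKREPLNYLPL"                                # 16 residues, removed by PC1/3

PREPRO = SIGNAL_PEPTIDE + N_FLANK + CORE_WITH_GLY + DIBASIC + C_FLANK


def process_preproiapp(prepro):
    """Return (proIAPP, mature IAPP) by trimming the segments named in the text."""
    pro = prepro[len(SIGNAL_PEPTIDE):]        # signal peptide removed -> proIAPP
    chain = pro[:-len(C_FLANK)]               # PC1/3 removes 16 C-terminal residues
    if not chain.endswith(DIBASIC):
        raise ValueError("expected a C-terminal Lys-Arg pair")
    chain = chain[:-len(DIBASIC)]             # carboxypeptidase E removes Lys and Arg
    chain = chain[len(N_FLANK):]              # PC2 removes 11 N-terminal residues
    if not chain.endswith("G"):
        raise ValueError("expected a C-terminal glycine for amidation")
    mature = chain[:-1]                       # PAM: Gly is consumed, leaving a C-terminal amide
    return pro, mature


pro_iapp, iapp = process_preproiapp(PREPRO)
cysteines = [i + 1 for i, aa in enumerate(iapp) if aa == "C"]  # 1-based positions

print(len(PREPRO), len(pro_iapp), len(iapp))  # 89 67 37
print(iapp)
print(cysteines)                              # [2, 7]
```

The lengths 89, 67 and 37 and the cysteine positions 2 and 7 agree with the numbers quoted in the text. The amidation itself is a chemical modification that a plain string cannot represent.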

Regulation

Insofar as both IAPP and insulin are produced by the pancreatic β-cells, impaired β-cell function (due to lipotoxicity and glucotoxicity) will affect both insulin and IAPP production and release.

Insulin and IAPP are regulated by similar factors since they share a common regulatory promoter motif. The IAPP promoter is also activated by stimuli which do not affect insulin, such as tumor necrosis factor alpha and fatty acids. One of the defining features of Type 2 diabetes is insulin resistance. This is a condition wherein the body is unable to utilize insulin effectively, resulting in increased insulin production; since proinsulin and proIAPP are cosecreted, this results in an increase in the production of proIAPP as well. Although little is known about IAPP regulation, its connection to insulin indicates that regulatory mechanisms that affect insulin also affect IAPP. Thus blood glucose levels play an important role in regulation of proIAPP synthesis.

Function

Amylin functions as part of the endocrine pancreas and contributes to glycemic control. The peptide is secreted from the pancreatic islets into the blood circulation and is cleared by peptidases in the kidney. It is not found in the urine.

Amylin's metabolic function is well characterized as an inhibitor of the appearance of nutrients (especially glucose) in the plasma. It thus functions as a synergistic partner to insulin, with which it is cosecreted from pancreatic beta cells in response to meals. The overall effect is to slow the rate of appearance (Ra) of glucose in the blood after eating; this is accomplished via coordinated slowing of gastric emptying, inhibition of digestive secretion (gastric acid, pancreatic enzymes, and bile ejection), and a resulting reduction in food intake. Appearance of new glucose in the blood is reduced by inhibiting secretion of the gluconeogenic hormone glucagon. These actions, which are mostly carried out via a glucose-sensitive part of the brain stem, the area postrema, may be overridden during hypoglycemia. They collectively reduce the total insulin demand.

Amylin also acts in bone metabolism, along with the related peptides calcitonin and calcitonin gene-related peptide.

Rodent amylin knockouts do not have a normal reduction of appetite following food consumption. Because it is an amidated peptide, like many neuropeptides, it is believed to be responsible for the effect on appetite.

Structure

The human form of IAPP has the amino acid sequence KCNTATCATQRLANFLVHSSNNFGAILSSTNVGSNTY, with a disulfide bridge between cysteine residues 2 and 7. Both the amidated C-terminus and the disulfide bridge are necessary for the full biological activity of amylin. IAPP is capable of forming amyloid fibrils in vitro. Within the fibrillization reaction, the early prefibrillar structures are extremely toxic to beta-cell and insulinoma cell cultures. Later amyloid fiber structures also seem to have some cytotoxic effect on cell cultures. Studies have shown that fibrils are the end product and not necessarily the most toxic form of amyloid proteins and peptides in general. A non-fibril-forming peptide (residues 1–19 of human amylin) is toxic like the full-length peptide, but the corresponding segment of rat amylin is not. Solid-state NMR spectroscopy has also demonstrated that the 20–29 fragment of human amylin fragments membranes. Rats and mice have six substitutions (three of which are proline substitutions at positions 25, 28 and 29) that are believed to prevent the formation of amyloid fibrils, although not completely, as shown by the propensity of rat IAPP to form amyloid fibrils in vitro. Rat IAPP is nontoxic to beta-cells when overexpressed in transgenic rodents.
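To make the numbering concrete, the sketch below locates positions 25, 28 and 29 in the human sequence and replaces them with proline. The helper `substitute` is a hypothetical name introduced here. The result is the human sequence with only these three changes, not the full rat sequence, whose other three substitutions are not listed in the text.

```python
HUMAN_IAPP = "KCNTATCATQRLANFLVHSSNNFGAILSSTNVGSNTY"
PROLINE_POSITIONS = (25, 28, 29)  # 1-based positions named in the text


def substitute(sequence, substitutions):
    """Return a copy of sequence with {1-based position: residue} replacements applied."""
    residues = list(sequence)
    for position, new_residue in substitutions.items():
        residues[position - 1] = new_residue
    return "".join(residues)


# Which human residues sit at the three positions?
replaced = {p: HUMAN_IAPP[p - 1] for p in PROLINE_POSITIONS}
variant = substitute(HUMAN_IAPP, {p: "P" for p in PROLINE_POSITIONS})

print(replaced)  # {25: 'A', 28: 'S', 29: 'S'}
print(variant)   # KCNTATCATQRLANFLVHSSNNFGPILPPTNVGSNTY
```

All three positions fall inside the 20–29 segment discussed above.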

History

As early as 1901, long before amylin deposition was associated with diabetes, scientists described the phenomenon of "islet hyalinization", which could be found in some cases of diabetes. A thorough study of this phenomenon became possible only much later. In 1986, an aggregate was successfully isolated from an insulin-producing tumor, a protein called IAP (Insulinoma Amyloid Peptide) was characterized, and amyloid was isolated from the pancreas of a diabetic patient, but the isolated material was not sufficient for full characterization. Full characterization was achieved only a year later by two research teams continuing the work from 1986.

Clinical significance

ProIAPP has been linked to Type 2 diabetes and the loss of islet β-cells. Islet amyloid formation, initiated by the aggregation of proIAPP, may contribute to this progressive loss of islet β-cells. It is thought that proIAPP forms the first granules that allow for IAPP to aggregate and form amyloid which may lead to amyloid-induced apoptosis of β-cells.

IAPP is cosecreted with insulin. Insulin resistance in Type 2 diabetes produces a greater demand for insulin production, which results in the secretion of proinsulin. ProIAPP is secreted simultaneously; however, the enzymes that convert these precursor molecules into insulin and IAPP, respectively, are not able to keep up with the high levels of secretion, ultimately leading to the accumulation of proIAPP.

In particular, the impaired processing of proIAPP that occurs at the N-terminal cleavage site is a key factor in the initiation of amyloid. Post-translational modification of proIAPP occurs at both the carboxy terminus and the amino terminus; however, the processing of the amino terminus occurs later in the secretory pathway. This might be one reason why it is more susceptible to impaired processing under conditions where secretion is in high demand. Thus, the conditions of Type 2 diabetes (high glucose concentrations and increased secretory demand for insulin and IAPP) could lead to the impaired N-terminal processing of proIAPP. The unprocessed proIAPP can then serve as the nucleus upon which IAPP can accumulate and form amyloid.

The amyloid formation might be a major mediator of apoptosis, or programmed cell death, in the islet β-cells. Initially, the proIAPP aggregates within secretory vesicles inside the cell. The proIAPP acts as a seed, collecting matured IAPP within the vesicles, forming intracellular amyloid. When the vesicles are released, the amyloid grows as it collects even more IAPP outside the cell. The overall effect is an apoptosis cascade initiated by the influx of ions into the β-cells.

General Scheme for Amyloid Formation

In summary, impaired N-terminal processing of proIAPP is an important factor initiating amyloid formation and β-cell death. These amyloid deposits are pathological characteristics of the pancreas in Type 2 diabetes. However, it is still unclear as to whether amyloid formation is involved in or merely a consequence of type 2 diabetes. Nevertheless, it is clear that amyloid formation reduces working β-cells in patients with Type 2 diabetes. This suggests that repairing proIAPP processing may help to prevent β-cell death, thereby offering hope as a potential therapeutic approach for Type 2 diabetes.

Amyloid deposits deriving from islet amyloid polypeptide (IAPP, or amylin) are commonly found in the pancreatic islets of patients with type 2 diabetes mellitus or with an insulinoma. While the association of amylin with the development of type 2 diabetes has been known for some time, its direct role as the cause has been harder to establish. Some studies suggest that amylin, like the related beta-amyloid (Abeta) associated with Alzheimer's disease, can induce apoptotic cell death in insulin-producing beta cells, an effect that may be relevant to the development of type 2 diabetes.

A 2008 study reported a synergistic effect for weight loss with leptin and amylin coadministration in diet-induced obese rats by restoring hypothalamic sensitivity to leptin. However, a Phase 2 clinical trial was halted in 2011 because of a problem involving antibody activity that might have neutralized the weight-loss effect of metreleptin in two patients who took the drug in a previously completed clinical study. The trial combined metreleptin, a version of the human hormone leptin, and pramlintide, which is Amylin Pharmaceuticals' diabetes drug Symlin, into a single obesity therapy. A proteomics study showed that human amylin shares common toxicity targets with beta-amyloid (Abeta), suggesting that type 2 diabetes and Alzheimer's disease share common toxicity mechanisms.

Pharmacology

Pramlintide (brand name Symlin), a synthetic analog of human amylin with proline substitutions at positions 25, 28 and 29, was approved in 2005 for adult use in patients with diabetes mellitus type 1 or type 2. Insulin and pramlintide, injected separately but both before a meal, work together to control the post-prandial glucose excursion.

Amylin is degraded in part by insulin-degrading enzyme. Another long-acting analogue of amylin is cagrilintide, being developed by Novo Nordisk as a treatment for type 2 diabetes and obesity (now in Phase 3 trials, co-formulated with semaglutide as a once-weekly subcutaneous injection under the proposed brand name CagriSema).

Receptors

There appear to be at least three distinct receptor complexes that amylin binds to with high affinity. All three complexes contain the calcitonin receptor at the core, plus one of three receptor activity-modifying proteins, RAMP1, RAMP2, or RAMP3.

Nucleoside phosphoramidite

Protected 2'-deoxynucleoside phosphoramidites.

Nucleoside phosphoramidites are derivatives of natural or synthetic nucleosides. They are used to synthesize oligonucleotides, relatively short fragments of nucleic acid and their analogs. Nucleoside phosphoramidites were first introduced in 1981 by Beaucage and Caruthers. To avoid undesired side reactions, reactive hydroxy and exocyclic amino groups present in natural or synthetic nucleosides are appropriately protected. As long as a nucleoside analog contains at least one hydroxy group, the use of the appropriate protecting strategy allows one to convert that to the respective phosphoramidite and to incorporate the latter into synthetic nucleic acids. To be incorporated in the middle of an oligonucleotide chain using phosphoramidite strategy, the nucleoside analog must possess two hydroxy groups or, less often, a hydroxy group and another nucleophilic group (amino or mercapto). Examples include, but are not limited to, alternative nucleotides, LNA, morpholino, nucleosides modified at the 2'-position (OMe, protected NH2, F), nucleosides containing non-canonical bases (hypoxanthine and xanthine contained in natural nucleosides inosine and xanthosine, respectively, tricyclic bases such as G-clamp, etc.) or bases derivatized with a fluorescent group or a linker arm.

Preparation

There are three main methods for the preparation of nucleoside phosphoramidites.

  • DMT = 4,4'-dimethoxytrityl; B = optionally protected nucleic base; R = phosphate protecting group
    The common method involves treatment of a protected nucleoside bearing a single free hydroxy group with a phosphorodiamidite under the catalytic action of a weak acid. Although some bisamidites were reported as thermally unstable compounds, 2-cyanoethyl N,N,N',N'-tetraisopropylphosphorodiamidite, the amidite used to prepare commercial nucleoside phosphoramidites, is relatively stable. It can be synthesized using a two-step, one-pot procedure and purified by vacuum distillation. An excellent review outlines the use of the latter reagent in the preparation of nucleosidic and non-nucleosidic phosphoramidites in great detail.
  • In the second method, the protected nucleoside is treated with the phosphorochloridite in the presence of an organic base, most commonly N-ethyl-N,N-diisopropylamine (Hünig's base).
  • In the third method, the protected nucleoside is first treated with chloro N,N,N',N'-tetraisopropyl phosphorodiamidite in the presence of an organic base, most commonly N-ethyl-N,N-diisopropylamine (Hünig's base), to form a protected nucleoside diamidite. The latter is treated with the alcohol corresponding to the desired phosphite protecting group, for instance, 2-cyanoethanol, in the presence of a weak acid.

Nucleoside phosphoramidites are purified by column chromatography on silica gel. To preserve the stability of the phosphoramidite moiety, it is advisable to equilibrate the column with an eluent containing 3 to 5% of triethylamine and to maintain this concentration in the eluent throughout the entire course of the separation. The purity of a phosphoramidite may be assessed by 31P NMR spectroscopy. As the P(III) atom in a nucleoside phosphoramidite is chiral, the compound displays two peaks at about 149 ppm, corresponding to its two diastereomers. The potentially present phosphite triester impurity displays a peak at 138–140 ppm. H-phosphonate impurities display peaks at 8 and 10 ppm.
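The shift ranges above lend themselves to a small lookup. The sketch below gives a tentative assignment for a 31P chemical shift; the function name `assign_31p_peak` is hypothetical, and the windows of ±2 ppm around 149 ppm and 7–11 ppm around the H-phosphonate peaks are assumptions made for illustration, not values from the text.

```python
def assign_31p_peak(shift_ppm):
    """Return a tentative assignment for a 31P chemical shift in ppm."""
    if 147.0 <= shift_ppm <= 151.0:      # assumed window around ~149 ppm
        return "phosphoramidite diastereomer"
    if 138.0 <= shift_ppm <= 140.0:      # range given in the text
        return "phosphite triester impurity"
    if 7.0 <= shift_ppm <= 11.0:         # assumed window around the 8 and 10 ppm peaks
        return "H-phosphonate impurity"
    return "unassigned"


# Hypothetical peak list for a crude sample.
for shift in (149.2, 148.9, 139.1, 8.0, 25.0):
    print(shift, assign_31p_peak(shift))
```

A real purity assessment would also integrate the peaks; this sketch only labels them.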

Chemical properties of phosphoramidite moiety

Nucleoside phosphoramidites are relatively stable compounds with a prolonged shelf-life when stored as powders under anhydrous conditions in the absence of air at temperatures below 4 °C. The amidites withstand mild basic conditions. In contrast, in the presence of even mild acids, phosphoramidites decompose almost instantaneously. The phosphoramidites are relatively stable to hydrolysis under neutral conditions. For instance, the half-life of 2-cyanoethyl 5'-O-(4,4'-dimethoxytrityl)thymidine-3'-O-(N,N-diisopropylamino)phosphite in 95% aqueous acetonitrile at 25 °C is 200 h.
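To put the 200 h half-life in practical terms, the sketch below estimates how much amidite remains after a given time in solution. It assumes simple first-order (exponential) decay, which the text does not state, and the function name `fraction_remaining` is hypothetical.

```python
def fraction_remaining(hours, half_life_hours=200.0):
    """Fraction of phosphoramidite left after `hours`, assuming first-order decay."""
    return 0.5 ** (hours / half_life_hours)


# Hypothetical time points: one day, one week, one half-life.
for hours in (24, 168, 200):
    print(f"{hours:>4} h: {fraction_remaining(hours):.1%} remaining")
```

Under this assumption roughly 8% of the amidite is lost per day in 95% aqueous acetonitrile at 25 °C.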

  • X = O, S, NH.
    The most important feature of phosphoramidites is their ability to undergo the phosphoramidite coupling reaction, that is, to react with nucleophilic groups in the presence of an acidic azole catalyst such as 1H-tetrazole, 2-ethylthiotetrazole, 2-benzylthiotetrazole, 4,5-dicyanoimidazole, or a number of similar compounds. The reaction proceeds extremely rapidly. This very feature makes nucleoside phosphoramidites useful intermediates in oligonucleotide synthesis. Stereochemically, the phosphoramidite coupling leads to epimerisation (formation of diastereomers) at the P(III) chiral center.

When water serves as the nucleophile, the product is an H-phosphonate diester, as shown in the scheme above. Because residual water is present in solvents and reagents, the formation of this compound is the most common complication in the preparative use of phosphoramidites, particularly in oligonucleotide synthesis.

  • X = S, Se.
    Phosphoramidites are readily oxidized with weak oxidizing reagents, for instance, with aqueous iodine in the presence of weak bases or with hydrogen peroxide, to form the respective phosphoramidates.

Similarly, phosphoramidites react with other chalcogens. When brought in contact with a solution of sulfur or a number of compounds collectively referred to as sulfurizing agents, phosphoramidites quantitatively form phosphorothioamidates. The reaction with selenium or selenium derivatives produces phosphoroselenoamidates. In all reactions of this type, the configuration at the phosphorus atom is retained.

  • Nucleoside phosphoramidites undergo the Michaelis-Arbuzov reaction to form the respective phosphonamidates. One example describes the preparation of phosphonamidates in the presence of acrylonitrile. Reportedly, at room temperature the reaction is stereoselective, with retention of configuration at the phosphorus center. In contrast, when carried out at 55 °C, the reaction leads to racemized products.
  • Similarly to phosphines and tertiary phosphites, phosphoramidites readily undergo the Staudinger reaction:

(RO)2P-N(R1)2 + R2-N3 + H2O → (RO)2P(=O)-N(R1)2 + R2-NH2 + N2

Protecting strategy

The naturally occurring nucleotides (nucleoside-3'- or 5'-phosphates) and their phosphodiester analogs are insufficiently reactive to afford an expeditious synthetic preparation of oligonucleotides in high yields. The selectivity and the rate of the formation of internucleosidic linkages are dramatically improved by using 3'-O-(N,N-diisopropyl phosphoramidite) derivatives of nucleosides (nucleoside phosphoramidites) that serve as building blocks in phosphite triester methodology. To prevent undesired side reactions, all other functional groups present in nucleosides must be rendered unreactive (protected) by attaching protecting groups. Upon the completion of the oligonucleotide chain assembly, all the protecting groups are removed to yield the desired oligonucleotides. Below, the protecting groups currently used in commercially available and most common nucleoside phosphoramidite building blocks are briefly reviewed:

  • The 5'-hydroxyl group is protected by an acid-labile DMT (4,4'-dimethoxytrityl) group.
  • Thymine and uracil, nucleic bases of thymidine and uridine, respectively, do not have exocyclic amino groups and hence do not require any protection. In contrast, nucleic bases adenine, cytosine, and guanine bear the exocyclic amino groups, which are reactive with the activated phosphoramidites under the conditions of the coupling reaction. Although, at the expense of additional steps in the synthetic cycle, the oligonucleotide chain assembly may be carried out using phosphoramidites with unprotected amino groups, most often these are kept permanently protected over the entire length of the oligonucleotide chain assembly. The protection of the exocyclic amino groups must be orthogonal to that of the 5'-hydroxy group because the latter is removed at the end of each synthetic cycle. The simplest to implement and hence the most widely accepted is the strategy where the exocyclic amino groups bear a base-labile protection. Most often, two protection schemes are used.
  • In the first, the standard and more robust scheme (Figure), Bz (benzoyl) protection is used for A, dA, C, and dC, while G and dG are protected with an isobutyryl group. More recently, the Ac (acetyl) group is often used to protect C and dC, as shown in Figure.
  • In the second, mild protection scheme, A and dA are protected with isobutyryl or phenoxyacetyl groups (PAC). C and dC bear acetyl protection, and G and dG are protected with 4-isopropylphenoxyacetyl (i-Pr-PAC) or dimethylformamidino (dmf) groups. Mild protecting groups are removed more readily than the standard protecting groups. However, the phosphoramidites bearing these groups are less stable when stored in solution.
  • The phosphite group is protected by a base-labile 2-cyanoethyl group. Once a phosphoramidite has been coupled to the solid support-bound oligonucleotide and the phosphite moieties have been converted to the P(V) species, the phosphate protection is no longer required for further coupling reactions to succeed.
2'-O-Protected ribonucleoside phosphoramidites.
  • In RNA synthesis, the 2'-hydroxy group is protected with a TBDMS (t-butyldimethylsilyl) group or with a TOM (tri-iso-propylsilyloxymethyl) group, both being removable by treatment with fluoride ion.
  • The phosphite moiety also bears a diisopropylamino (iPr2N) group reactive under acidic conditions. On activation, the diisopropylamino group leaves, to be substituted by the 5'-hydroxy group of the support-bound oligonucleotide.
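The two base-protection schemes listed above can be summarised as a lookup table. The sketch below is one way to encode them; the names `BASE_PROTECTION` and `base_protecting_groups` are hypothetical, "iBu" is used here as shorthand for isobutyryl, and the standard scheme is read as benzoyl (or, more recently, acetyl) on C, benzoyl on A, and isobutyryl on G.

```python
# Exocyclic-amine protecting groups per base, as listed in the text.
BASE_PROTECTION = {
    "standard": {"A": ("Bz",), "C": ("Bz", "Ac"), "G": ("iBu",)},
    "mild": {"A": ("iBu", "PAC"), "C": ("Ac",), "G": ("i-Pr-PAC", "dmf")},
}
UNPROTECTED_BASES = {"T", "U"}  # no exocyclic amino group, so no protection needed


def base_protecting_groups(nucleoside, scheme="standard"):
    """Return the protecting-group options for e.g. 'dA', 'G' or 'T'."""
    # The same groups are used for ribo- and 2'-deoxyribonucleosides.
    base = nucleoside[1:] if nucleoside.startswith("d") else nucleoside
    if base in UNPROTECTED_BASES:
        return ()
    return BASE_PROTECTION[scheme][base]


print(base_protecting_groups("dG"))          # ('iBu',)
print(base_protecting_groups("dC", "mild"))  # ('Ac',)
print(base_protecting_groups("T"))           # ()
```

The table covers only the nucleobase protection. The 5'-DMT, 2-cyanoethyl and 2'-silyl groups apply regardless of the base.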

Macrocycle

https://en.wikipedia.org/wiki/Macrocycle
Erythromycin, a macrolide antibiotic, is one of many naturally occurring macrocycles.

Macrocycles are often described as molecules and ions containing a ring of twelve or more atoms. Classical examples include the crown ethers, calixarenes, porphyrins, and cyclodextrins. Macrocycles describe a large, mature area of chemistry.

IUPAC definition

Macrocycle: Cyclic macromolecule or a macromolecular cyclic portion of a macromolecule.

Note 1: A cyclic macromolecule has no end-groups but may nevertheless be regarded as a chain.

Note 2: In the literature, the term macrocycle is sometimes used for molecules of low relative molecular mass that would not be considered macromolecules.

Synthesis

The formation of macrocycles by ring-closure is called macrocyclization. Pioneering work was reported for studies on terpenoid macrocycles. The central challenge to macrocyclization is that ring-closing reactions do not favor the formation of large rings. Instead, small rings or polymers tend to form. This kinetic problem can be addressed by using high-dilution reactions, whereby intramolecular processes are favored relative to polymerizations.
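The benefit of high dilution can be illustrated with a deliberately simplified kinetic model. It assumes that ring closure is first order in substrate (rate k_intra·c) and that chain growth is second order (rate k_inter·c²). The ratio k_intra/k_inter is often called the effective molarity, and the value of 0.01 M used below is hypothetical.

```python
def cyclization_fraction(concentration_M, effective_molarity_M):
    """Instantaneous fraction of reacting substrate that cyclizes.

    rate_intra / (rate_intra + rate_inter) with rate_intra = k_intra * c and
    rate_inter = k_inter * c**2 simplifies to EM / (EM + c), where
    EM = k_intra / k_inter.
    """
    return effective_molarity_M / (effective_molarity_M + concentration_M)


EM = 0.01  # hypothetical effective molarity in mol/L
for c in (1.0, 0.1, 0.01, 0.001):
    print(f"c = {c:>5} M: {cyclization_fraction(c, EM):.1%} cyclization")
```

In this model, lowering the concentration from 1 M to 1 mM raises the share of ring closure from about 1% to about 91%.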

Some macrocyclizations are favored using template reactions. Templates are ions, molecules, surfaces etc. that bind and pre-organize compounds, guiding them toward formation of a particular ring size. The crown ethers are often generated in the presence of an alkali metal cation, which organizes the condensing components by complexation. An illustrative macrocyclization is the synthesis of (−)-muscone from (+)-citronellal. The 15-membered ring is generated by ring-closing metathesis.

Synthesis of muscone via RCM
Uroporphyrinogen III, biosynthetic precursor to porphyrins.

Stereocontrol

In stereochemistry, macrocyclic stereocontrol refers to the directed outcome of a given intermolecular or intramolecular chemical reaction that is governed by the conformational preference of a macrocycle.

Stereocontrol for cyclohexane rings is well established in organic chemistry, in large part due to the axial/equatorial preferential positioning of substituents on the ring. Macrocyclic stereocontrol models the substitution and reactions of medium and large rings in organic chemistry, with remote stereogenic elements providing enough conformational influence to direct the outcome of a reaction.

Early work in synthetic chemistry assumed that macrocycles were far too floppy to provide any degree of stereochemical or regiochemical control in a reaction. The experiments of W. Clark Still in the late 1970s and 1980s challenged this assumption, while several others found crystallographic and NMR data suggesting that macrocyclic rings were not the floppy, conformationally ill-defined species many assumed.

The degree to which a macrocyclic ring is either rigid or floppy depends significantly on the substitution of the ring and the overall size. Significantly, even small conformational preferences, such as those envisioned in floppy macrocycles, can profoundly influence the ground state of a given reaction, providing stereocontrol such as in the synthesis of miyakolide. Computational modeling can predict conformations of medium rings with reasonable accuracy, as Still used molecular mechanics modeling computations to predict ring conformations to determine potential reactivity and stereochemical outcomes.

Reaction classes used in synthesis of natural products under the macrocyclic stereocontrol model for obtaining a desired stereochemistry include: hydrogenations such as in neopeltolide and (±)-methynolide, epoxidations such as in (±)-periplanone B and lonomycin A, hydroborations such as in 9-dihydroerythronolide B, enolate alkylations such as in (±)-3-deoxyrosaranolide, dihydroxylations such as in cladiell-11-ene-3,6,7-triol, and reductions such as in eucannabinolide.

Conformational preferences

Macrocycles can access a number of stable conformations, with a preference for those that minimize the number of transannular nonbonded interactions within the ring. Medium rings (8–11 atoms) are the most strained, with strain energies of 9–13 kcal/mol; the factors important in larger macrocyclic conformations can thus be modeled by looking at medium-ring conformations. Conformational analysis of odd-membered rings suggests they tend to reside in less symmetrical forms with smaller energy differences between stable conformations.

Cyclooctane

Conformational analysis of medium rings begins with examination of cyclooctane. Spectroscopic methods have determined that cyclooctane possesses three main conformations: chair-boat, chair-chair, and boat-boat. Cyclooctane prefers to reside in a chair-boat conformation, minimizing the number of eclipsing ethane interactions (shown in blue), as well as torsional strain. The chair-chair conformation is the second most abundant conformation at room temperature, with an observed chair-boat:chair-chair ratio of 96:4.
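A population ratio such as 96:4 can be converted into a free energy difference with the Boltzmann relation ΔG = RT ln(major/minor). The sketch below does this under two assumptions that are not in the text: "room temperature" is taken as 298.15 K, and only the two named conformers are treated as being in equilibrium.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def free_energy_gap(major, minor, temperature_K=298.15):
    """Free energy difference (kcal/mol) implied by a two-state population ratio."""
    return R_KCAL * temperature_K * math.log(major / minor)


gap = free_energy_gap(96, 4)
print(f"chair-chair lies about {gap:.2f} kcal/mol above chair-boat")  # about 1.88
```

A gap of under 2 kcal/mol is thus enough to make one conformer dominate almost completely.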

Substitution positional preferences in the ground state conformer of methyl cyclooctane can be approximated using parameters similar to those for smaller rings. In general, the substituents exhibit preferences for equatorial placement, except for the lowest energy structure (pseudo A-value of -0.3 kcal/mol in figure below) in which axial substitution is favored. The "pseudo A-value" is best treated as the approximate energy difference between placing the methyl substituent in the equatorial or axial positions. The most energetically unfavorable interaction involves axial substitution at the vertex of the boat portion of the ring (6.1 kcal/mol).

These energetic differences can help rationalize the lowest energy conformations of 8 atom ring structures containing an sp2 center. In these structures, the chair-boat is the ground state model, with substitution forcing the structure to adopt a conformation such that non-bonded interactions are minimized from the parent structure. From the cyclooctene figure below, it can be observed that one face is more exposed than the other, foreshadowing a discussion of privileged attack angles (see peripheral attack).

X-ray analysis of functionalized cyclooctanes provided proof of conformational preferences in these medium rings. Significantly, calculated models matched the obtained X-ray data, indicating that computational modeling of these systems could in some cases quite accurately predict conformations. The increased sp2 character of the cyclopropane rings favors placing them similarly, such that non-bonded interactions are relieved.

Cyclodecane

Similar to cyclooctane, a cyclodecane ring exhibits several conformations with two lower energy conformations. The boat-chair-boat conformation is energetically minimized, while the chair-chair-chair conformation has significant eclipsing interactions.

These ground-state conformational preferences are useful analogies to more highly functionalized macrocyclic ring systems, where local effects can still be governed to first approximation by energy minimized conformations even though the larger ring size allows more conformational flexibility of the entire structure. For example, in methyl cyclodecane, the ring can be expected to adopt the minimized conformation of boat-chair-boat. The figure below shows the energetic penalty between placing the methyl group at certain sites within the boat-chair-boat structure. Unlike canonical small ring systems, the cyclodecane system with the methyl group placed at the "corners" of the structure exhibits no preference for axial vs. equatorial positioning due to the presence of an unavoidable gauche-butane interaction in both conformations. Significantly more intense interactions develop when the methyl group is placed in the axial position at other sites in the boat-chair-boat conformation.


Larger ring systems

Similar principles guide the lowest energy conformations of larger ring systems. Along with the acyclic stereocontrol principles outlined below, subtle interactions between remote substituents in large rings, analogous to those observed for 8-10 membered rings, can influence the conformational preferences of a molecule. In conjunction with remote substituent effects, local acyclic interactions can also play an important role in determining the outcome of macrocyclic reactions. The conformational flexibility of larger rings potentially allows for a combination of acyclic and macrocyclic stereocontrol to direct reactions.

Reactivity and conformational preferences

The stereochemical result of a given reaction on a macrocycle capable of adopting several conformations can be modeled by a Curtin-Hammett scenario. In the diagram below, the two ground state conformations exist in an equilibrium, with some difference in their ground state energies. Conformation B is lower in energy than conformation A, and because it has a similar energy barrier to its transition state in a hypothetical reaction, the product formed is predominantly product B (P B), arising from conformation B via transition state B (TS B). The inherent preference of a ring to exist in one conformation over another provides a tool for stereoselective control of reactions by biasing the ring into a given configuration in the ground state. The energy differences ΔΔG and ΔG0 are significant considerations in this scenario. The preference for one conformation over another can be characterized by ΔG0, the free energy difference, which can, at some level, be estimated from conformational analysis. The free energy difference between the two transition states of each conformation on its path to product formation is given by ΔΔG. The value of ΔG0 between not just one, but many accessible conformations is the underlying energetic impetus for reactions occurring from the most stable ground state conformation and is the crux of the peripheral attack model outlined below.
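Under Curtin-Hammett conditions (conformers interconverting much faster than they react), the product ratio depends only on the free energy difference between the two transition states. The sketch below evaluates [P B]/[P A] = exp(ΔΔG/RT), with ΔΔG defined as G(TS A) − G(TS B). The sign convention, the temperature of 298.15 K and the sample energies are assumptions chosen for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def product_ratio(ddg_ts_kcal, temperature_K=298.15):
    """[P B]/[P A] for ddG = G(TS A) - G(TS B), assuming Curtin-Hammett conditions."""
    return math.exp(ddg_ts_kcal / (R_KCAL * temperature_K))


# Hypothetical transition-state energy differences in kcal/mol.
for ddg in (0.0, 0.5, 1.0, 2.0):
    ratio = product_ratio(ddg)
    print(f"ddG = {ddg:.1f} kcal/mol -> P B : P A = {ratio:.1f} : 1")
```

When both conformers face similar barriers, as in the scenario described above, ΔΔG is approximately equal to ΔG0, so the ground-state preference carries over directly into the product ratio.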

The peripheral attack model

Macrocyclic rings containing sp2 centers display a conformational preference for the sp2 centers to avoid transannular nonbonded interactions by orienting perpendicular to the plane of the ring. W. Clark Still proposed that the ground state conformations of macrocyclic rings, containing the energy-minimized orientation of the sp2 center, display one face of an olefin outward from the ring. Addition of reagents from outside the ring, to the exposed olefin face (peripheral attack), is thus favored, while attack from across the ring on the inward diastereoface is disfavored. Ground state conformations dictate the exposed face of the reactive site of the macrocycle, so both local and distant stereocontrol elements must be considered. The peripheral attack model holds well for several classes of macrocycles, though it relies on the assumption that ground state geometries remain unperturbed in the corresponding transition state of the reaction.

Early investigations of macrocyclic stereocontrol studied the alkylation of 8-membered cyclic ketones with varying substitution. In the example below, alkylation of 2-methylcyclooctanone yielded predominantly the trans product. Proceeding from the lowest energy conformation of 2-methylcyclooctanone, peripheral attack is observed from either one of the low energy enolate conformations (energy difference of 0.5 kcal/mol), resulting in a trans product from either of the two depicted transition state conformations.

Unlike the cyclooctanone case, alkylation of 2-cyclodecanone rings does not display significant diastereoselectivity.

However, 10-membered cyclic lactones display significant diastereoselectivity. The proximity of the methyl group to the ester linkage was directly correlated with the diastereomeric ratio of the reaction products, with placement at the 9-position (below) yielding the highest selectivity. In contrast, when the methyl group was placed at the 7-position, a 1:1 mixture of diastereomers was obtained. Placement of the methyl group at the 9-position in the axial position yields the most stable ground state conformation of the 10-membered ring, leading to high diastereoselectivity.
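
To give a sense of scale for such diastereomeric ratios, the sketch below converts a ratio into the free energy gap between the competing pathways using ΔΔG = RT ln(dr). It assumes kinetic control and an assumed temperature of 298.15 K, since the source gives no reaction temperature. The function name and every ratio other than 1:1 are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def gap_from_dr(major, minor, temperature_k=298.15):
    """Free energy gap (kcal/mol) implied by a major:minor diastereomeric ratio."""
    if major <= 0 or minor <= 0:
        raise ValueError("both ratio components must be positive")
    return R_KCAL * temperature_k * math.log(major / minor)


# The 1:1 mixture reported for the 7-methyl lactone corresponds to no energetic bias.
print(gap_from_dr(1, 1))

# Hypothetical ratios, for scale only (not values from the source).
for dr in [(3, 1), (10, 1), (99, 1)]:
    print(dr, round(gap_from_dr(*dr), 2))
```

Because the relation is logarithmic, a gap of only a few kcal/mol between competing transition states is enough to give a highly selective reaction.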

Conjugate addition to the E-enone below also follows the expected peripheral attack model to yield predominantly trans product. High selectivity in this addition can be attributed to the placement of sp2 centers such that transannular nonbonded interactions are minimized, while also placing the methyl substitution in the more energetically favorable position for cyclodecane rings. This ground state conformation heavily biases conjugate addition to the less hindered diastereoface.

Similar to intermolecular reactions, intramolecular reactions can show significant stereoselectivity from the ground state conformation of the molecule. In the intramolecular Diels-Alder reaction depicted below, the lowest energy conformation yields the observed product. The structure minimizing repulsive steric interactions provides the observed product by having the lowest barrier to a transition state for the reaction. Though no external attack by a reagent occurs, this reaction can be thought of similarly to those modeled with peripheral attack; the lowest energy conformation is the most likely to react for a given reaction.

The lowest energy conformations of macrocycles also influence intramolecular reactions involving transannular bond formation. In the intramolecular Michael addition sequence below, the ground state conformation minimizes transannular interactions by placing the sp2 centers at the appropriate vertices, while also minimizing diaxial interactions.

Prominent examples in synthesis

These principles have been applied to multiple natural product targets containing medium and large rings. The syntheses of cladiell-11-ene-3,6,7-triol, (±)-periplanone B, eucannabinolide, and neopeltolide are all significant in their use of macrocyclic stereocontrol en route to the desired structural targets.

Cladiell-11-ene-3,6,7-triol

The cladiellin family of marine natural products possesses interesting molecular architecture, generally containing a 9-membered medium-sized ring. The synthesis of (−)-cladiella-6,11-dien-3-ol allowed access to a variety of other members of the cladiellin family. Notably, the conversion to cladiell-11-ene-3,6,7-triol makes use of macrocyclic stereocontrol in the dihydroxylation of a trisubstituted olefin. The synthetic step shown below is controlled by the ground state conformation of the macrocycle, allowing stereoselective dihydroxylation without the use of an asymmetric reagent. This substrate-controlled addition is an example of the peripheral attack model in which two centers on the molecule are functionalized at once in a concerted fashion.

(±)-Periplanone B

The synthesis of (±)-periplanone B is a prominent example of macrocyclic stereocontrol. Periplanone B is a sex pheromone of the female American cockroach, and has been the target of several synthetic attempts. Significantly, two reactions on the macrocyclic precursor to (±)-periplanone B were directed using only ground state conformational preferences and the peripheral attack model. Reacting from the most stable boat-chair-boat conformation, asymmetric epoxidation of the cis-internal olefin can be achieved without using a reagent-controlled epoxidation method or a directed epoxidation with an allylic alcohol.

Epoxidation of the ketone was achieved, and can be modeled by peripheral attack of the sulfur ylide on the carbonyl group in a Johnson-Corey-Chaykovsky reaction to yield the protected form of (±)-periplanone B. Deprotection of the alcohol followed by oxidation yielded the desired natural product.

Eucannabinolide

In the synthesis of the cytotoxic germacranolide sesquiterpene eucannabinolide, Still demonstrates the application of the peripheral attack model to the reduction of a ketone to set a new stereocenter using NaBH4. Significantly, the synthesis of eucannabinolide relied on the usage of molecular mechanics (MM2) computational modeling to predict the lowest energy conformation of the macrocycle to design substrate-controlled stereochemical reactions.

Neopeltolide

Neopeltolide was originally isolated from sponges near the Jamaican coast and exhibits nanomolar cytotoxic activity against several lines of cancer cells. The synthesis of the neopeltolide macrocyclic core features a hydrogenation controlled by the ground state conformation of the macrocycle.

Occurrence and applications

One important application is the class of macrocyclic antibiotics known as macrolides, e.g. clarithromycin. Many metallocofactors are bound to macrocyclic ligands, which include porphyrins, corrins, and chlorins. These rings arise from multistep biosynthetic processes that also feature macrocycles.

Macrocycles often bind ions and facilitate ion transport across hydrophobic membranes and solvents. The macrocycle envelops the ion with a hydrophobic sheath, which facilitates phase transfer properties.

The potassium (K+) complex of the macrocycle 18-crown-6.

Macrocycles are often bioactive and could be useful for drug delivery.

Subdivisions
