‘DNA Printing’ in the Cloud, Part 1
In DNA printing, computer code becomes genetic code. This is the reverse of DNA sequencing, in which the chemical bases adenine, thymine, cytosine and guanine in a DNA sample are read (historically with gel electrophoresis) and translated by computer into their representative letters A, T, C and G. In double-stranded DNA these bases pair as A/T and C/G.
This single-letter alphabet was standardized in 1970 by the International Union of Pure and Applied Chemistry (IUPAC). It is used in the text-based bioinformatics format called “FASTA,” in which each nucleotide is represented by a single letter.
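A FASTA record is simply a header line beginning with “>” followed by one or more lines of sequence letters. The minimal Python sketch below, which assumes well-formed input and uses a hypothetical record name, shows how software can read such a file into name/sequence pairs.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary.

    A record starts with a '>' header line; its sequence may be wrapped
    over any number of following lines. Duplicate headers overwrite
    earlier ones in this simple sketch.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records


# Hypothetical example record, wrapped over two lines.
example = """>gene_of_interest hypothetical example
ATGGCTAGC
TTAGGCTAA
"""
print(parse_fasta(example))
# {'gene_of_interest hypothetical example': 'ATGGCTAGCTTAGGCTAA'}
```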
Also known as “artificial gene synthesis,” DNA printing is a method in synthetic biology used to create artificial genes in the laboratory. What sets it apart from molecular cloning and the polymerase chain reaction (PCR) is that DNA printing can produce a completely synthetic double-stranded DNA molecule without the need for a preexisting DNA template.
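In a double-stranded molecule, each base on one strand pairs with its partner on the other (A with T, C with G), and the two strands run in opposite directions. The second strand is therefore the reverse complement of the first, which this short Python sketch computes for a hypothetical sequence.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3' like the input strand."""
    return strand.upper().translate(PAIR)[::-1]


top = "ATGGCTAGC"  # hypothetical strand, written 5'->3'
print(reverse_complement(top))  # GCTAGCCAT
```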
The chemistry behind DNA printing is known as “phosphoramidite chemistry,” and it is carried out as “solid-phase DNA synthesis.”
Artificial DNA in a Jar
“This means you can buy in jars chemicals which are derived from sugar cane, and the chemical phosphoramidites in these four bottles end up being the four bases of DNA … A/T, C/G, T/A, G/C … in a form that can be readily assembled,” explained Drew Endy, assistant professor of bioengineering at Stanford University, in a 2008 Long Now Foundation presentation titled “Creating Synthetic DNA.”
“So, you hook these bottles up to a machine, and into the machine comes information from a computer, a sequence of DNA … whatever you would like to build, and that machine will stitch the genetic materials together from scratch,” he continued. “It’s DNA synthesis … . You take information and the raw chemicals and you compile genetic material. It’s practically speaking the coolest, most impressive/scary technology I’ve encountered.”
Artificial DNA synthesis involves building a man-made version of the nucleic acid strands that form genetic code.
Currently, solid-phase synthesis is carried out automatically using computer-controlled instruments.
A FASTA file containing the sequence of the “gene of interest” (or of fragments of it) is downloaded to an automated synthesizer. The synthesizer’s onboard synthesis program uses this code to direct the addition of phosphoramidites, chemically protected building blocks of DNA, one for each of the four nucleobases (adenine, cytosine, guanine and thymine), represented in the computer by the letters A, C, G and T.
The desired sequence can also be entered on a keyboard. The system’s microprocessor then automatically opens the valves of the containers holding the phosphoramidite for each successive base, along with the reagents and solvents needed at each step, releasing them into a synthesis column. The column is packed with tiny beads (called a “resin”) made of controlled pore glass (CPG), polystyrene or silica. These beads provide the solid support on which DNA molecules are assembled.
The phosphoramidite building blocks are coupled one at a time to the growing nucleotide chain on the beads, in the order required by the sequence of the “gene of interest” and the intended downstream protein product (e.g., a vaccine or other biologic). Chemical synthesis runs in the opposite direction from synthesis in the cell (3′ to 5′), and a succinyl linker anchors the first nucleoside of each chain to the beads. Each added base goes through a repeated cycle of deblocking, coupling, capping and oxidation.
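The synthesizer program’s job can be pictured as turning a sequence into an ordered list of chemical steps. The Python sketch below assumes the 3′-terminal base is already attached to the bead, walks the rest of the sequence from the 3′ end toward the 5′ end, and emits one deblock/couple/cap/oxidize cycle per base. The step labels and the `synthesis_program` function are hypothetical; real instruments run their own vendor-specific protocols.

```python
# Hypothetical step labels for one phosphoramidite synthesis cycle.
CYCLE = ("deblock", "couple", "cap", "oxidize")


def synthesis_program(sequence: str) -> list[tuple[int, str, str]]:
    """List (cycle, step, monomer) actions to build `sequence` (given 5'->3').

    Chemical synthesis runs 3'->5', so the 3'-terminal base is assumed to be
    pre-attached to the support bead and the remaining bases are added from
    the 3' end toward the 5' end.
    """
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    program = []
    # Walk the sequence backward, skipping the base already on the bead.
    for cycle, base in enumerate(reversed(seq[:-1]), start=1):
        for step in CYCLE:
            monomer = f"d{base}-phosphoramidite" if step == "couple" else "-"
            program.append((cycle, step, monomer))
    return program


# For "GATC", C sits on the bead; the cycles then add T, A and G in that order.
for action in synthesis_program("GATC"):
    print(action)
```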
Once chain assembly is complete, the synthesized strand is cleaved chemically from the beads, released into solution and deprotected. Because a small fraction of chains fails at every coupling step, this method produces short single-stranded pieces (oligonucleotides, typically up to about 200 bases long); these are purified and then assembled into the full double-stranded synthetic gene or genes.
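The arithmetic behind that length limit is simple: if every coupling succeeds with the same probability, the fraction of full-length chains shrinks geometrically with length. The Python sketch below assumes a constant coupling efficiency of 99% per step, an illustrative value rather than a measured one, and a first base already on the bead. Under those assumptions roughly 83% of 20-base chains, 37% of 100-base chains and 14% of 200-base chains come out full length.

```python
def full_length_fraction(length: int, coupling_efficiency: float) -> float:
    """Fraction of chains expected to reach full length.

    Assumes the first base is pre-loaded on the bead, so a chain of
    `length` bases needs length - 1 couplings, each succeeding with the
    same probability `coupling_efficiency`.
    """
    if length < 1 or not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("length must be >= 1 and efficiency in [0, 1]")
    return coupling_efficiency ** (length - 1)


for n in (20, 100, 200):
    print(n, round(full_length_fraction(n, 0.99), 3))
```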
Combined with the assembly of many such fragments, the method has been used to generate functional bacterial or yeast chromosomes containing approximately 1 million base pairs. (By comparison, the human genome is made up of about 3 billion base pairs.)
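Building long constructs from short pieces depends on designing fragments whose ends overlap, so neighboring pieces can be joined in the right order. The Python sketch below shows only the string-level idea: it splits a sequence into fragments that share a fixed overlap and checks that they reassemble into the original. The 60-base fragment length and 20-base overlap are hypothetical defaults, and real assembly methods work on both strands and need overlaps that are unique within the construct.

```python
def overlapping_fragments(seq: str, frag_len: int = 60, overlap: int = 20) -> list[str]:
    """Split `seq` so each fragment shares `overlap` bases with the next one."""
    if not 0 <= overlap < frag_len:
        raise ValueError("need 0 <= overlap < frag_len")
    step = frag_len - overlap
    frags = []
    start = 0
    while True:
        frags.append(seq[start:start + frag_len])
        if start + frag_len >= len(seq):
            break  # this fragment reaches the end of the sequence
        start += step
    return frags


def reassemble(frags: list[str], overlap: int) -> str:
    """Join fragments by dropping each one's leading overlap (string-level only)."""
    seq = frags[0]
    for frag in frags[1:]:
        seq += frag[overlap:]
    return seq


target = "ACGT" * 50  # hypothetical 200-base target
pieces = overlapping_fragments(target)
print(len(pieces), reassemble(pieces, 20) == target)
```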
Making a Protein – Proteomics in Action
Once purified, the gene is ready to make a protein. The journey from gene to protein is complex and tightly controlled within each cell.
Isolation of a specific gene begins with scientists constructing a DNA library — a comprehensive collection of cloned DNA fragments from a particular cell, tissue or organism.
The DNA containing the target gene(s) is cut into fragments using restriction enzymes or the protein Cas9 (CRISPR-associated protein 9), an enzyme that acts like a pair of “molecular scissors” capable of cutting strands of DNA.
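A restriction enzyme cuts wherever it finds a specific short recognition sequence. The widely used enzyme EcoRI, for example, recognizes GAATTC and cuts between the G and the first A. The Python sketch below models only one strand and ignores the staggered “sticky” ends EcoRI leaves on the double helix; the `digest` function is hypothetical. Cas9 is not modeled here, since it finds its cut site through a guide RNA rather than a fixed recognition sequence.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut one strand of `seq` (5'->3') at every occurrence of `site`.

    `cut_offset` is the cut position within the site; the defaults model
    EcoRI, which cuts G^AATTC.
    """
    seq = seq.upper()
    pieces, last = [], 0
    pos = seq.find(site)
    while pos != -1:
        pieces.append(seq[last:pos + cut_offset])
        last = pos + cut_offset
        pos = seq.find(site, pos + 1)  # look for the next site
    pieces.append(seq[last:])
    return pieces


print(digest("AAGAATTCTTTGAATTCCC"))  # ['AAG', 'AATTCTTTG', 'AATTCCC']
```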
The target gene of interest is isolated and inserted into the purified DNA of a self-replicating genetic element, generally a virus or a bacterial plasmid. The gene of interest is joined (ligated) into the plasmid’s DNA to make a recombinant DNA molecule known as a plasmid “cloning expression vector.”
Cloning vectors are plasmids used primarily to propagate DNA. An expression vector is a specialized type of cloning vector designed to allow transcription of the genetic information into messenger RNA (mRNA) and translation into a protein.
Because bacteria divide rapidly, they can be used as “factories” to copy DNA fragments in large quantities. E. coli, the most common prokaryotic (lacking a membrane-bound nucleus) organism used in research, is widely used as a host because it is easy to manipulate and inexpensive to grow. It is an excellent host for producing various proteins, and in 1997 it became one of the first organisms to have its genome sequenced.
Once the vector is introduced into an E. coli cell (a step called transformation) for amplification, the rDNA molecule replicates inside the host cell as the cell divides, producing a clone of genetically identical cells. A collection of such clones, each carrying a different DNA fragment, makes up the “library.”
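Why bacterial “factories” scale up so quickly comes down to doubling. The Python sketch below assumes unlimited nutrients and a doubling time of about 20 minutes, a commonly cited figure for E. coli under favorable laboratory conditions; real cultures slow down and stop growing as nutrients run out. Under those assumptions, one cell becomes 2^24, or about 16.8 million, cells in 8 hours.

```python
import math


def cells_after(minutes: float, start_cells: int = 1, doubling_min: float = 20.0) -> float:
    """Cell count after `minutes` of unrestricted exponential growth (assumption)."""
    return start_cells * 2 ** (minutes / doubling_min)


def minutes_to_reach(target: float, start_cells: int = 1, doubling_min: float = 20.0) -> float:
    """Minutes needed to grow from `start_cells` to `target` cells at a fixed doubling time."""
    return doubling_min * math.log2(target / start_cells)


print(cells_after(8 * 60))           # one cell after 8 hours: 2**24 cells
print(minutes_to_reach(1e9) / 60)    # hours to reach a billion cells from one
```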
DNA contains the instructions to assemble amino acids in a specific order. Each cell type only “turns on” (or expresses) the genes that have the code for the proteins it needs to use.
Double-stranded DNA “breathes” (frays) in a rhythmic unwrapping and rewrapping, zippering and unzippering — a dynamic opening and closing of “bubbles” between the two strands that leads to the breaking apart of base pairs.
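One reason some regions open more easily than others: A/T pairs are held together by two hydrogen bonds and G/C pairs by three, so AT-rich stretches tend to breathe more readily. The Python sketch below counts hydrogen bonds and finds the window with the lowest G+C count. It is only a crude proxy, since real breathing also depends on base stacking and neighboring sequence, and both functions are hypothetical illustrations.

```python
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # hydrogen bonds per base pair


def hydrogen_bonds(seq: str) -> int:
    """Total base-pair hydrogen bonds in a duplex built on `seq`."""
    return sum(H_BONDS[b] for b in seq.upper())


def most_at_rich_window(seq: str, width: int) -> tuple[int, str]:
    """Start index and text of the first window with the lowest G+C count."""
    seq = seq.upper()
    if not 0 < width <= len(seq):
        raise ValueError("width must be between 1 and len(seq)")
    best_start = min(
        range(len(seq) - width + 1),
        key=lambda i: sum(b in "GC" for b in seq[i:i + width]),
    )
    return best_start, seq[best_start:best_start + width]


print(hydrogen_bonds("GCATAT"))                  # 14
print(most_at_rich_window("GCGCATATGCGC", 4))    # (4, 'ATAT')
```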
The bubble opening between the two strands creates a transient single-stranded DNA region of one or more bases, which can give proteins their initial access to the bases of the otherwise paired strands. RNA (ribonucleic acid) is a long, usually single-stranded chain of nucleotides that carries genetic information from DNA and helps the cell make proteins.
There are four main types of RNA involved in gene expression, each encoded by its own type of gene: mRNA (messenger RNA) encodes the amino acid sequence of a polypeptide; tRNA (transfer RNA) brings amino acids to ribosomes during translation; rRNA (ribosomal RNA), along with ribosomal proteins, makes up the ribosomes, the cell structures that translate the mRNA; and snRNA (small nuclear RNA), along with proteins, forms complexes that are used in RNA processing.
Gene DNA sequences instruct cells to produce particular proteins. Enzymes read the information in a DNA molecule and transcribe it into an intermediary molecule, messenger ribonucleic acid (mRNA).
Transcription begins when an enzyme called “RNA polymerase” attaches to the newly opened DNA template strand and begins assembling a new chain of nucleotides to produce a complementary RNA strand.
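Base by base, transcription follows the same pairing rules as DNA, except that RNA uses uracil (U) where DNA uses thymine (T). The Python sketch below builds an mRNA from a hypothetical template strand written 3′ to 5′, so the result reads 5′ to 3′. It also shows the equivalent shortcut from the coding (non-template) strand, which simply swaps T for U.

```python
# Template base -> RNA base; RNA uses uracil (U) instead of thymine (T).
TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe(template_3to5: str) -> str:
    """mRNA (5'->3') made from a DNA template strand written 3'->5'."""
    return template_3to5.upper().translate(TEMPLATE_TO_RNA)


def transcribe_from_coding(coding_5to3: str) -> str:
    """Same mRNA derived from the coding strand (5'->3'): swap T for U."""
    return coding_5to3.upper().replace("T", "U")


template = "TACCTGATT"  # hypothetical template strand, written 3'->5'
print(transcribe(template))                   # AUGGACUAA
print(transcribe_from_coding("ATGGACTAA"))    # AUGGACUAA
```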
The universal genetic code, the set of rules by which DNA sequences are read, enables a cell to translate the nucleotide “language” of DNA into the amino acid “language” of proteins, which are long chains of amino acids joined end to end. Amino acids have many functions, but the best known is their role as the building blocks of proteins.
The protein-coding sequence of a messenger RNA (mRNA) chain is read in codons, triplets of adjacent nucleotides (for example, AUG or GAC). Each codon specifies a single, specific amino acid in the synthesis of a protein molecule, except for the three stop codons (UAA, UAG and UGA), which signal the end of the chain.
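The standard genetic code can be written compactly as 64 codons in a fixed order, each mapped to a one-letter amino acid symbol. The Python sketch below builds that table and reads an mRNA codon by codon from its first base until it hits a stop codon. It is a simplification: it assumes the reading frame starts at the first base, whereas a real ribosome normally starts at an AUG start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG;
# '*' marks the three stop codons (UAA, UAG, UGA).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon; return amino acid letters."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):  # any trailing partial codon is ignored
        aa = CODON_TABLE[mrna[i:i + 3].upper()]
        if aa == "*":
            break  # stop codon ends the chain
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGACUAA"))  # MD (methionine, aspartate, then stop)
```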
Here’s where the gene of interest begins morphing into the protein of interest. When the DNA segment containing the gene of interest is transcribed, each DNA base corresponds to one base of RNA, now mRNA, with uracil (U) taking the place of thymine (T).
This mRNA molecule then carries DNA’s coded instructions for making a protein. Next, the information it contains is translated into the “language” of amino acids, the building blocks of proteins.
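Together, transcription and translation are known as “gene expression” (or, loosely, “protein synthesis”). In bacteria such as E. coli, which have no nucleus, both steps take place in the cytoplasm, the cell substance inside the outer membrane. In cells with a nucleus, transcription occurs in the nucleus and translation in the cytoplasm, the substance between the nucleus and the outer membrane.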
In cells with a nucleus, the finished mRNA molecule carries the DNA message out of the nucleus into the cytoplasm, to the protein-manufacturing ribosomes; in bacteria, ribosomes can begin translating an mRNA while it is still being transcribed. Ribosomal ribonucleic acid (rRNA), the RNA component of the ribosome, is essential for protein synthesis.
During translation, the ribosomal subunits assemble like a sandwich around the mRNA strand that carries the genetic code for creating a protein. The assembled ribosome then attracts transfer RNA (tRNA) molecules, each tethered to an amino acid.
E. coli can make amino acids itself or take them up into the cytoplasm from its surroundings, such as a nutrient medium. tRNA transfers amino acids from the cytoplasm to the ribosome.
The complex ribosomal structures physically move along an mRNA molecule like a train on a track, catalyzing the assembly of amino acids into protein chains. They also bind tRNAs and various accessory molecules necessary for protein synthesis.
A long chain of amino acids emerges as the ribosome decodes the mRNA sequence into a polypeptide chain, or a new protein.
As the E. coli host cells multiply, the cloned genes direct the production of the recombinant protein, which accumulates in the cells. A colony grown from a clone that carries the gene of interest is expanded into a large culture.
The next task is to collect and purify the desired recombinant protein. The first step in collecting a recombinant protein expressed in E. coli is lysis (breaking open) of the cells to release the protein of interest.
In the cell lysis process, the bacterial cell envelope is ruptured, exposing the cell contents. Lipids from the cell membranes are broken up with detergents (surfactants). Extraction, separation and purification techniques are then used to concentrate the protein of interest.
After extraction, the target protein must be purified, that is, separated from cell debris and other insoluble material, contaminants, plasmid DNA, and other proteins and macromolecules from the crude biological source. Purification is commonly achieved with chromatography techniques such as affinity, ion-exchange and size-exclusion chromatography.
Many commercial protein products are formulated in buffered solutions such as phosphate-buffered saline. Liquid formulations are usually preferred for injectable protein therapeutics because they are convenient for the end user and easy for the manufacturer to prepare.
The most common liquid product containers are bottles, flasks, vials and trays. The liquid form is not always feasible, given the susceptibility of proteins to denaturation and aggregation under stresses such as heating, freezing, pH changes and agitation, all of which could result in the loss of biological activity.
Lyophilization, also called “freeze-drying,” is a method of drying biological materials that minimizes damage to their internal structure. Lyophilized products generally have better stability than their liquid counterparts.
Lyophilized protein products can be shipped and stored as powder in plastic or glass jars and bottles. At the time of use, the powder is reconstituted into a liquid formulation. The protein can also be supplied in a two-chamber cartridge, with the lyophilized powder in the front chamber and a diluent in the rear chamber. A reconstitution device is used to mix the diluent and powder.
Some proteins designed for oral consumption can be distributed as capsules consisting of powder or jelly enclosed in a dissolvable gelatin container. A tablet is a compressed powder in solid form.
DNA Synthesizers
DNA synthesizers are machines used to custom-build DNA molecules containing a particular sequence of nucleotides. The DNA they make can be used in research and in gene therapies that aim to treat disease by replacing a faulty or damaged section of DNA with a corrected one.
The devices accept digital representations of DNA in the FASTA file format, which can be sent over the Internet, and build the corresponding molecules from phosphoramidites of the four nitrogenous bases (A, C, G and T) that make up DNA.
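Before a sequence is sent for synthesis, it is common to screen it for features that are hard to make, such as unexpected characters, extreme GC content or long runs of a single base. The Python sketch below runs those three checks. The GC range of 25% to 65% and the homopolymer limit of 8 are hypothetical, illustrative thresholds, not any provider’s actual rules.

```python
import re


def check_synthesis_order(seq: str, gc_range=(0.25, 0.65), max_homopolymer=8) -> list[str]:
    """Return a list of problems found in `seq`; an empty list means it passed.

    The GC range and homopolymer limit are illustrative assumptions,
    not any vendor's real acceptance criteria.
    """
    seq = seq.upper()
    problems = []
    bad = sorted(set(seq) - set("ACGT"))
    if bad:
        problems.append(f"non-ACGT characters: {''.join(bad)}")
    if seq:
        gc = (seq.count("G") + seq.count("C")) / len(seq)
        if not gc_range[0] <= gc <= gc_range[1]:
            problems.append(f"GC content {gc:.0%} outside {gc_range[0]:.0%}-{gc_range[1]:.0%}")
    else:
        problems.append("empty sequence")
    # Longest run of one repeated character (a homopolymer).
    longest = max((len(m.group()) for m in re.finditer(r"(.)\1*", seq)), default=0)
    if longest > max_homopolymer:
        problems.append(f"homopolymer run of {longest} exceeds {max_homopolymer}")
    return problems


print(check_synthesis_order("ATGC" * 10))           # []
print(check_synthesis_order("A" * 12 + "GC" * 4))   # ['homopolymer run of 12 exceeds 8']
```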
Following are some examples of leading commercial DNA synthesis platforms:
- the GenPlus Next-Gen HT Gene Synthesis platform from GenScript Biotech Corp.;
- a semiconductor-based synthetic DNA manufacturing process featuring a high-throughput silicon platform from Twist Bioscience Corp.;
- the Invitrogen GeneArt GeneAssembler gene synthesis platform from Thermo Fisher Scientific; and
- the Gene Designer from ATUM (formerly DNA2.0).