How DNA Works
A 7-minute read
Nearly every cell in your body contains about two meters of DNA, coiled so tightly it fits inside a nucleus roughly a tenth the width of a human hair. That DNA is the instruction manual for building and running you.
In 1952, Rosalind Franklin and her student Raymond Gosling aimed X-rays at fibers of DNA and captured an image that would reshape biology. The photograph, known as Photo 51, revealed a fuzzy X pattern that told a precise story: DNA was a helix. A year later, James Watson and Francis Crick used her data to build the first accurate model of the double helix. When Crick announced in a Cambridge pub that they had “found the secret of life,” he was, in a real sense, right.
The short answer
DNA (deoxyribonucleic acid) is a molecule that stores the genetic instructions used to build and operate living organisms. It consists of two strands wound around each other in a double helix, connected by four chemical bases: adenine, thymine, cytosine, and guanine. The sequence of those bases encodes genes, which tell cells how to make proteins, and proteins do nearly everything else.
The full picture
The structure: a twisted ladder
Think of DNA as a twisted ladder. The two sides of the ladder are made of alternating sugar (deoxyribose) and phosphate molecules. The rungs connecting the two sides are pairs of chemical bases: adenine always pairs with thymine, and cytosine always pairs with guanine. These specific partnerships, called base pairs, are what make DNA copyable and readable.
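The pairing rules are simple enough to write as a lookup table. The minimal Python sketch below (the names `PAIRS` and `complement` are purely illustrative) builds the partner strand for a short stretch of DNA, one base at a time.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base that sits opposite each base of `strand`, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())

print(complement("ATGCGT"))  # TACGCA
```

Applying the rules twice returns the original strand, which is why either side of the ladder fully determines the other. Real strands run in opposite directions (they are antiparallel), so the partner is usually written reversed; this sketch ignores that orientation detail.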
The human genome contains about 3.2 billion base pairs, and most cells carry two copies of it, one from each parent. Stretched out end to end, the DNA from a single cell would be roughly two meters long. Yet it’s coiled, folded, and compacted into a nucleus about 6 micrometers across. The compaction ratio is extraordinary, like packing a thread about 10 kilometers long into a walnut shell.
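The compaction ratio can be checked with the article’s own figures. This is a back-of-the-envelope sketch; the walnut size of about 3 cm is an assumption chosen only to scale the ratio up to everyday dimensions.

```python
# Figures from the text: ~2 m of DNA per cell, nucleus ~6 micrometers across.
dna_length_m = 2.0
nucleus_diameter_m = 6e-6

ratio = dna_length_m / nucleus_diameter_m
print(f"DNA length is about {ratio:,.0f} times the nucleus diameter")

# ASSUMPTION: a walnut shell roughly 3 cm across.
walnut_diameter_m = 0.03
thread_km = ratio * walnut_diameter_m / 1000
print(f"Equivalent thread in a walnut: about {thread_km:.0f} km")
```

The DNA is roughly a third of a million times longer than the nucleus is wide; keeping that same ratio, a 3 cm shell would hold a thread on the order of 10 km.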
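The four bases spell out a code. Groups of three bases, called codons, specify individual amino acids (a few codons instead mark where a protein ends), and strings of amino acids fold into proteins. Because each position holds one of only four bases, it takes two bits to store; the 3.2 billion bases of a human genome therefore come to about 800 megabytes of data, a little more than a CD holds.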
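The “two bits per base” arithmetic can be made concrete by actually packing a sequence into bits. The Python sketch below uses an arbitrary assignment of A, C, G, T to the values 0 through 3; the functions `pack` and `unpack` are hypothetical helpers for illustration, not a standard file format.

```python
# Four possible bases means each one fits in 2 bits (2**2 == 4).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"  # index matches the 2-bit value in CODE

def pack(sequence: str) -> int:
    """Pack a DNA string into a single integer, 2 bits per base."""
    value = 0
    for base in sequence:
        value = (value << 2) | CODE[base]
    return value

def unpack(value: int, length: int) -> str:
    """Recover the string; `length` is needed because leading A's (00) leave no trace."""
    bases = []
    for _ in range(length):
        bases.append(LETTERS[value & 0b11])  # lowest 2 bits = last base
        value >>= 2
    return "".join(reversed(bases))

# Storage for one copy of the human genome at 2 bits per base.
genome_bp = 3_200_000_000
genome_bytes = genome_bp * 2 // 8
print(f"{genome_bytes / 1e6:.0f} MB")  # 800 MB
```

Seven bases of `"GATTACA"` fit in 14 bits this way, and `unpack(pack("GATTACA"), 7)` returns the original string.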
Genes: the functional units
Not all DNA is genes. Of the 3.2 billion base pairs in human DNA, only about 1.5% encodes proteins. The rest was long dismissed as “junk DNA,” but large portions of it regulate when and how genes are switched on, help maintain chromosome structure, and perform other functions researchers are still mapping.
A gene is a stretch of DNA that encodes instructions for making a specific protein. Humans have roughly 20,000 protein-coding genes, a surprisingly low number given our biological complexity. A fruit fly has about 14,000. The difference in complexity between species comes less from the number of genes and more from how genes are regulated and combined.
Genes are located on chromosomes. Humans have 46 chromosomes, arranged in 23 pairs: one set from each parent. The 23rd pair determines biological sex: XX in females, XY in males.
How DNA is read: transcription and translation
The process of turning a gene into a protein happens in two stages.
First, transcription: a molecular machine called RNA polymerase reads one strand of the DNA and produces a complementary strand of messenger RNA (mRNA), in which the base uracil (U) takes the place of thymine. mRNA is a single-stranded molecule that carries the gene’s instructions out of the nucleus.
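Transcription follows the same pairing logic as DNA, with one substitution: RNA uses uracil (U) where DNA uses thymine. This simplified Python sketch (the function `transcribe` and the example sequence are illustrative) builds an mRNA against a template strand, ignoring strand direction.

```python
# RNA polymerase pairs each template-strand DNA base with an RNA base.
# RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_strand: str) -> str:
    """Return the mRNA built against `template_strand` (orientation ignored)."""
    return "".join(DNA_TO_RNA[base] for base in template_strand)

mrna = transcribe("TACAAACCGATT")
print(mrna)  # AUGUUUGGCUAA
```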
Second, translation: the mRNA travels to ribosomes, the cell’s protein factories. There, transfer RNA (tRNA) molecules read the mRNA three bases at a time, each bringing the matching amino acid. The ribosome links those amino acids together in sequence, producing a protein.
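Translation can be sketched as reading an mRNA three bases at a time and looking each codon up in a table. The table below is a deliberately tiny subset of the 64-codon genetic code, and `translate` is a hypothetical helper; a codon outside the subset raises `KeyError`.

```python
# A tiny subset of the standard genetic code, for illustration only.
CODON_TABLE = {
    "AUG": "Met",  # methionine; also the usual "start" signal
    "UUU": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "GCU": "Ala",  # alanine
    "UAA": None,   # stop codons: no amino acid, translation ends
    "UAG": None,
    "UGA": None,
}

def translate(mrna: str) -> list[str]:
    """Read `mrna` three bases at a time, collecting amino acids until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon reached
            break
        protein.append(amino_acid)
    return protein

print(translate("AUGUUUGGCUAA"))  # ['Met', 'Phe', 'Gly']
```

In a cell, each lookup corresponds to a tRNA delivering its amino acid, and the list-building step to the ribosome linking them into a chain.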
The protein then folds into a specific three-dimensional shape determined by its amino acid sequence, and that shape determines what the protein does. Hemoglobin carries oxygen. Insulin signals cells to absorb glucose. Collagen provides structural support. Virtually every biological function is executed by a protein encoded in DNA.
How DNA copies itself: replication
Before a cell divides, it must copy all of its DNA: both copies of the 3.2-billion-base-pair genome. This process, called replication, is remarkably accurate: thanks to multiple proofreading and repair mechanisms, the error rate is roughly one mistake per billion base pairs copied.
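Even an error rate that low adds up over a genome this large. This short calculation uses the article’s round figures and assumes a cell copies two genome copies per division; the result is an average, not an exact count.

```python
# Figures from the text: ~3.2 billion base pairs per genome copy and
# roughly one mistake per billion base pairs copied.
bp_per_genome_copy = 3_200_000_000
copies_per_cell = 2   # one set of chromosomes from each parent
error_rate = 1e-9     # mistakes per base pair copied

bp_copied = bp_per_genome_copy * copies_per_cell
expected_errors = bp_copied * error_rate
print(f"~{expected_errors:.1f} uncorrected errors per cell division")
```

A handful of new changes per division is small for any one cell, but it is the raw material for the mutations discussed below.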
An enzyme called helicase unzips the double helix, separating the two strands. Another enzyme, DNA polymerase, reads each strand and builds a new complementary strand alongside it. Because A pairs only with T and C pairs only with G, each original strand serves as a template for an exact copy. The result is two identical double helices where one existed before.
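The template logic can be sketched in a few lines: separate the two strands, build a complement for each, and check that both daughter molecules match the original. The names `replicate` and `complement` are illustrative, and strand direction is ignored.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Build the partner strand using the A-T and C-G pairing rules."""
    return "".join(PAIRS[base] for base in strand)

def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Unzip a (top, bottom) duplex and build a new partner for each old strand."""
    top, bottom = duplex          # helicase separates the strands
    new_bottom = complement(top)  # polymerase builds on the top template
    new_top = complement(bottom)  # ...and on the bottom template
    return [(top, new_bottom), (new_top, bottom)]

original = ("ATGCGT", complement("ATGCGT"))
daughters = replicate(original)
print(daughters)  # [('ATGCGT', 'TACGCA'), ('ATGCGT', 'TACGCA')]
```

Each daughter keeps one original strand and gains one new one, yet both are identical to the starting molecule.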
Copying the entire genome takes about eight hours. Replication begins at many thousands of sites along the chromosomes, and the replication forks that spread out from them work in parallel to finish in time for cell division.
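A rough calculation shows why parallel forks are unavoidable. The fork speed of 50 bases per second used here is an assumed round number for illustration, not a measured value; the other figures come from the article.

```python
import math

# From the text: two genome copies of ~3.2 billion base pairs, copied in ~8 hours.
bp_to_copy = 2 * 3_200_000_000
s_phase_seconds = 8 * 60 * 60

# ASSUMPTION for illustration only: one replication fork moves ~50 bases per second.
fork_speed_bp_per_s = 50

one_fork_years = bp_to_copy / fork_speed_bp_per_s / (365 * 24 * 3600)
forks_needed = math.ceil(bp_to_copy / (fork_speed_bp_per_s * s_phase_seconds))

print(f"One fork alone: about {one_fork_years:.1f} years")
print(f"Forks working at once to finish in 8 hours: at least {forks_needed:,}")
```

Under this assumption a single fork would need about four years, while a few thousand forks running at once could finish within the eight-hour window. Changing the assumed speed shifts the numbers but not the conclusion.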
Mutations: when the code changes
Mutations are changes to the DNA sequence. They can be caused by errors during replication, exposure to UV radiation or certain chemicals, or viral infections. Most are either repaired by the cell’s DNA repair machinery or have no noticeable effect. Some are harmful. A small number are beneficial.
Cells have multiple layers of DNA repair. Mismatch repair enzymes scan newly synthesized DNA for incorrectly paired bases. Nucleotide excision repair cuts out short stretches of DNA containing bulky damage, such as the lesions caused by UV light, so they can be rebuilt. If the damage is too severe to fix, cells can trigger self-destruction (apoptosis) rather than pass on broken instructions.
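The detection step of mismatch repair amounts to checking every rung of the ladder against the pairing rules. This toy Python version (`find_mismatches` is a hypothetical name) only locates bad pairs; real repair enzymes must also work out which strand is the newly made one and correct that strand.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def find_mismatches(top: str, bottom: str) -> list[int]:
    """Return positions where the two strands do not form a valid base pair."""
    return [i for i, (a, b) in enumerate(zip(top, bottom)) if PAIRS[a] != b]

print(find_mismatches("ATGCGT", "TACGTA"))  # [4]
```

Position 4 pairs G with T instead of C, so it is the only position flagged.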
When repair mechanisms fail, mutations accumulate. This is central to aging and cancer: over decades, errors build up in the DNA of dividing cells, sometimes hitting genes that control cell growth and eventually causing uncontrolled division.
Epigenetics: reading the same code differently
Your genome is fixed from conception onward (setting aside mutations). But which genes get expressed varies enormously between cell types and in response to environment. This is the domain of epigenetics.
Chemical tags called methyl groups can attach to DNA bases, typically silencing the nearby gene. Proteins called histones, around which DNA is wrapped, can be chemically modified to either loosen or tighten the DNA’s packaging, making genes more or less accessible to RNA polymerase. Diet, stress, exercise, and exposure to chemicals can all affect these epigenetic marks, changing gene expression without changing the DNA sequence itself.
Some epigenetic marks can be passed from parent to child, which means the environment experienced by one generation can influence gene expression in the next without any change to the underlying sequence. This is well documented in plants and some animals; how much it happens in humans is still debated.
Why it matters
The practical implications of understanding DNA have already transformed medicine. Genetic tests can now identify mutations that raise the risk of breast cancer (BRCA1 and BRCA2), predict how a patient will respond to specific drugs (pharmacogenomics), and diagnose rare hereditary disorders before symptoms appear. The cost of sequencing an entire human genome fell from roughly $100 million in 2001 to around $1,000 by the mid-2010s, and it continues to drop.
Beyond medicine, DNA forensics can identify individuals from trace amounts of biological material. Agriculture uses genetic knowledge to breed crops with higher yields or disease resistance. Evolutionary biology can now trace human migration out of Africa tens of thousands of years ago using DNA from ancient bones.
CRISPR-based therapies that edit the genetic code are now entering clinical use. The first approved CRISPR treatment, Casgevy (approved in 2023), edits a patient’s own blood stem cells so they switch fetal hemoglobin production back on, compensating for the defective adult hemoglobin that causes sickle cell disease. Understanding DNA is the prerequisite for all of it.
Common misconceptions
“Your genes determine your destiny.” Genes set probabilities, not outcomes. Someone carrying a BRCA1 mutation has a significantly elevated lifetime risk of breast cancer, but many carriers never develop it. Environment, lifestyle, epigenetics, and chance all interact with genetic predisposition. The phrase “genes load the gun, environment pulls the trigger” is a simplification, but it captures something real.
“Humans have more genes than simpler organisms.” Not necessarily. Humans have roughly 20,000 protein-coding genes. The water flea Daphnia pulex has about 31,000. The number of genes correlates poorly with organismal complexity. What matters more is how genes are regulated, spliced, and combined, not how many there are.
“DNA is identical in every cell.” The sequence is (almost) identical, but expression varies dramatically. A muscle cell and a neuron contain the same genome but behave entirely differently because different sets of genes are switched on. The genome is less like a single instruction manual and more like a library: every cell has access to every book, but each cell type only reads certain ones.
Key terms
- DNA (deoxyribonucleic acid): The molecule that stores genetic information as a sequence of four chemical bases
- Base pair: A matched pair of DNA bases connected across the two strands: adenine with thymine, cytosine with guanine
- Gene: A stretch of DNA that encodes instructions for making a specific protein
- Genome: The complete set of DNA in an organism, including all genes and non-coding regions
- Chromosome: A long DNA molecule, tightly coiled around proteins, that carries many genes
- mRNA (messenger RNA): A single-stranded molecule that carries a gene’s instructions from the nucleus to ribosomes
- Protein: A molecule built from amino acids according to DNA instructions; performs most biological functions
- Mutation: A change in the DNA sequence, which may be neutral, harmful, or occasionally beneficial
- Epigenetics: Chemical modifications to DNA or histones that change gene expression without altering the DNA sequence
- Replication: The process of copying DNA before cell division