An analogy from everyday life may be useful here. The process of moving from DNA to mRNA to protein is a bit like controlling an image from a digital photograph. Let’s say we take a photograph on a digital camera of the most amazing thing in the world. We want other people to have access to the image, but we don’t want them to be able to change the original in any way. The raw data file from the camera is like the DNA blueprint. We copy it into another format that can’t be changed very much – a PDF maybe – and then we email out thousands of copies of this PDF to everyone who asks for it. The PDF is the messenger RNA. If people want to, they can print paper copies from this PDF, as many as they want, and these paper copies are the proteins. So everyone in the world can print the image, but there is only one original file.
Why so complicated? Why not just have a direct mechanism? There are a number of good reasons why evolution has favoured this indirect method. One of them is to prevent damage to the script, the original image file. When DNA is unzipped it is relatively susceptible to damage and that’s something that cells have evolved to avoid. The indirect way in which DNA codes for proteins minimises the period of time for which a particular stretch of DNA is open and vulnerable. Another reason this indirect method has been favoured by evolution is that it allows a lot of control over the amount of a specific protein that’s produced, and this creates flexibility.
Consider the protein called alcohol dehydrogenase (ADH). This is produced in the liver and breaks down alcohol. If we drink a lot of alcohol, the cells of our livers will increase the amounts of ADH they produce. If we don’t drink for a while, the liver will produce less of this protein. This is one of the reasons why people who drink frequently are better able to tolerate the immediate effects of alcohol than those who rarely drink, who will become tipsy very quickly on just a couple of glasses of wine. The more often we drink alcohol, the more ADH protein our livers produce (up to a limit). The cells of the liver don’t do this by increasing the number of copies of the
As we shall see, epigenetics is one of the mechanisms a cell uses to control the amount of a particular protein that is produced, especially by controlling how many mRNA copies are made from the original template.
The last few paragraphs have all been about how genes encode proteins. How many genes are there in our cells? This seems like a simple question but oddly enough there is no agreed figure on this. This is because scientists can’t agree on how to define a gene. It used to be quite straightforward – a gene was a stretch of DNA that encoded a protein. We now know that this is far too simplistic. However, it’s certainly true to say that all proteins are encoded by genes, even if not all genes encode proteins. There are about 20,000 to 24,000 protein-encoding genes in our DNA, a much lower estimate than the 100,000 that scientists thought was a good guess just ten years ago[17].
Editing the script
Most genes in human cells have quite a similar structure. There’s a region at the beginning called the promoter, which binds the protein complexes that copy the DNA to form mRNA. The protein complexes move along through what’s known as the body of the gene, making a long mRNA strand, until they finally fall off at the end of the gene.
Imagine a gene body that is 3,000 base-pairs long, a perfectly sensible length for a gene. The mRNA copied from it will be 3,000 bases long, because mRNA is a single strand. Each amino acid is encoded by a codon composed of three bases, so we would predict that this mRNA will encode a protein that is 1,000 amino acids long. But, perhaps unexpectedly, what we find is that the protein is usually considerably shorter than this.
If the sequence of a gene is typed out it looks like a long string of combinations of the letters A, C, G and T. But if we analyse this with the right software, we find that we can divide that long string into two types of sequences. The first type is called an exon (for
When the mRNA is first copied from the DNA it contains the whole run of exons and introns. Once this long RNA molecule has been created, another multi-sub-unit protein complex comes along. It removes all the intron sequences and then joins up the exons to create an mRNA that codes for a continuous run of amino acids. This editing process is called splicing.
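For readers who like to see an idea written out as code, here is a toy sketch of splicing in Python. It is only an illustration: the 30-base sequence, the exon positions and the function names `splice` and `codon_count` are all invented for this example, and a real cell finds the exon boundaries by chemistry rather than by being handed a list of coordinates.

```python
# Toy illustration only: the sequence and exon coordinates are invented.

def splice(pre_mrna, exons):
    """Join the exon segments of a pre-mRNA and discard everything between them.

    Each exon is a (start, end) pair of 0-based positions, with end exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def codon_count(rna):
    """Number of complete three-base codons in an RNA sequence."""
    return len(rna) // 3


# Two exons (upper case) on either side of one intron (lower case).
pre_mrna = "AUGGCUGAA" + "guaaguuuucag" + "CCGUUUUAA"
exons = [(0, 9), (21, 30)]  # assumed positions of the two exons

mature = splice(pre_mrna, exons)

print(len(pre_mrna), codon_count(pre_mrna))  # length and codons before splicing
print(len(mature), codon_count(mature))      # length and codons after splicing
```

The unspliced molecule contains more three-base codons than the spliced one, which is the same reason the protein from our imaginary 3,000 base-pair gene turns out shorter than 1,000 amino acids. The count here is a simple division by three, so it also includes the final codon, which in a real mRNA acts as a stop signal rather than encoding an amino acid.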