So what are these non-coding regions of the genome doing, and why are they so important? It’s when we start to consider this that we begin to notice what a strong effect language and terminology have on human thought processes. These regions are called non-coding, but what we mean is that they don’t code for protein.
There is a well-known scientific proverb: absence of evidence is not the same as evidence of absence. For example, in astronomy, once scientists had developed telescopes that could detect infrared radiation, they were able to detect thousands of stars that had never been ‘seen’ before. The stars had always been there, but we couldn’t detect them conclusively until we had an instrument for doing so. A more everyday example might be a mobile phone signal. Such signals are all around us, but we cannot detect them unless we have a mobile phone. In other words, what we find depends very much on how we are looking.
Scientists identify the genes which are expressed in a specific cell type by analysing the RNA molecules. This is done by extracting all the RNA from cells and then analysing it using a variety of techniques, so that you build a database of all the RNA molecules that are present. When researchers in the 1980s first began investigating which genes were expressed in a given cell type, the techniques available were relatively insensitive. They were also designed to detect only mRNA molecules, as these were the ones that were assumed to be important. These methods tended to be good at detecting highly expressed mRNAs and quite poor at detecting the less well-expressed sequences. Another confounding factor was that the software used to analyse mRNA was set so that it would ignore signals originally generated from repetitive, i.e. ‘junk’, DNA.
These techniques served us very well for profiling the mRNA that we were already interested in – the mRNA molecules that coded for proteins. But as we have seen, this only represents about 2 per cent of the genome. It wasn’t until new detection technologies were coupled with hugely increased computing power that we began to realise that something very interesting was happening in the remaining 98 per cent – the non-coding part of our genome.
With these improved methodologies, the scientific world began to appreciate that there was actually a huge amount of transcription going on in the parts of the genome that didn’t code for proteins. Initially this was dismissed as ‘transcriptional noise’. It was suggested that there was a baseline murmur of expression from all over the genome, as if these regions of DNA occasionally produced an RNA molecule that got above a detection threshold. The concept was that although we could detect these molecules with our new, more sensitive equipment, they weren’t really biologically meaningful.
The phrase ‘transcriptional noise’ implies a basically random event. However, the patterns of expression of these non-protein-coding RNAs were different for different cell types, which suggested that their transcription was far from random[130]. For example, there was a lot of this expression in the brain. It’s now become clear that the patterns of expression are different in different brain regions[131]. This effect is reproducible when the various brain regions are compared from different individuals. This isn’t what we would expect if this low-level transcription of RNA was a purely random process.
It is becoming clearer that this transcription from genes that don’t code for protein is actually critically important for cellular function. Oddly, however, we remain caught in a linguistic trap of our own making. The RNA that is produced from these regions, the RNA that was previously under our radar, is still called non-coding RNA (ncRNA). It’s a sloppy shorthand, because what we really mean is non-protein-coding RNA.
Re-defining rubbish
This is the paradigm shift. For at least 40 years molecular biologists and geneticists have focused almost exclusively on the genes that code for proteins, and the proteins themselves. There have been exceptions, but we’ve just treated these as the odd bits of rubble on the top of the shed. But non-coding RNAs are finally starting to stand firmly alongside proteins as fully functional molecules. Different but equal.