Genes Shmenes

Once upon a time, Mendel invented this really neat theory that talked about the ratios of inheritance of biological expressions over the iterative process of generations. Mendelian genetics talked about recessive and dominant eigentraits, which later came to be called genes. Mendel was very successful in accounting for biological expression in these terms, and so people set out trying to find some visible evidence of the existence of the eigentraits.

Nearly a century later, Rosalind Franklin did some really neat crystallography, which Watson and Crick interpreted as a helical structure, thereby figuring out how DNA could make an excellent vector for inheritance of eigentraits. Their work spawned a huge endeavor, culminating in the vast undertaking we call genomics. A lot of this activity has been funded on the basis that genes are little programs sitting in the DNA, each corresponding with a Mendelian trait, and all we have to do is cut and paste these programs to obtain marvellous new medicines and diagnostic tools. In fact the analogy between genes and programs goes all the way back to Turing, who based the theory of computability on an only slightly veiled analogy to DNA transcription.

Now the trouble is, there are no genes in the genome. Not even one. The emperor of bioscience wears no clothes.

Whoa! That's not right! How dare he! Off with his head!

Sorry to be the one to break it to you, but your model of the genome is too simple to be true. In fact what's in there is a heckuvalot more complicated than little programs each corresponding with a Mendelian eigentrait. Here's just a little of it:

In short, Mendelian genes are behaviors of a hugely complex phase space of protein interactions, only some of which involve DNA expression feedback loops. There are no such things as genes - not as the nice neat digital protein replicators used to hype investment in *omics.

I smell a ParadigmShift in the wind ...


(My contribution should probably be to alter the above text, thus preserving pure DocumentMode; however, as the above text has an oratorical style, I will instead just add this footnote)

There are genes in the genome. Lots of them. Molecular biologists generally take the word 'gene' to mean 'a single transcription unit and its associated cis-acting regulatory sequences'; this is a contiguous, reasonably well-defined unit on the genome. It is, however, certainly a lot more complicated than one gene per trait (mostly - having cystic fibrosis, for instance, is linked to a single gene). Exons and introns are part of genes. Exons do get transcribed. Any one exon does appear in several different transcripts, but they are all different transcripts of one gene (unless there is "splicing in trans", which is rare). Proteins don't get recombined (except in the case of inteins, which are very rare).

The post-translational modifications you mention often lead to proteins which are combinations of the products of multiple translation units - these are called multi-modular proteins. The word recombination doesn't usually mean this, however, so the text above has been altered accordingly. Thanks!

I still disagree. Modular proteins are the products of single genes which happen to be factored into a sequence of independently-folding domains. You may be using a broader definition of 'protein' in which a protein can consist of several polypeptides, each encoded by its own gene (I would call this a 'protein complex'); this is not particularly connected with post-translational modification (although such modifications are often a crucial way of regulating the assembly of such complexes, and thus their function). There are ways in which whole protein molecules are fixed together by post-translational modifications, but they are not, in general, diversity-generating (eg proteins A and B may be linked by a disulphide bridge, but there is never a case where any one of A, B and C and any one of X, Y and Z can be linked), and once the link has been forged, it is not unmade until the protein is degraded.

I'll try to explain a little. This is a very simplified account!

The genome is a set of chromosomes. A chromosome is a sequence of bases, and is thus essentially a string over the alphabet "ACGT". Everything the genome does is because there are molecules in the cell which recognise particular substrings in the genome (their 'binding sites') and have appropriate biochemical behaviour. A gene is a complex region of the genome, containing binding sites for a wide array of molecules, which directs the production of a particular type of protein. The production of a protein may be divided into three steps: transcription, splicing and translation.
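The "string with binding sites" picture can be made concrete with a small sketch. This is only a toy, assuming an invented sequence and a fixed motif; real binding sites are usually fuzzier than an exact substring, and the function name find_binding_sites is made up for illustration.

```python
# Toy model: a chromosome as a string over "ACGT", and a "molecule"
# that recognises one fixed substring (its binding site).
# The sequence and the motif "TATAAA" are illustrative, not real genomic data.

def find_binding_sites(chromosome, motif):
    """Return every 0-based position where motif occurs (overlaps allowed)."""
    positions = []
    start = chromosome.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are found too.
        start = chromosome.find(motif, start + 1)
    return positions

chromosome = "GGCTATAAAGGCCATGACGTATAAATT"
print(find_binding_sites(chromosome, "TATAAA"))  # [3, 19]
```

The point of the sketch is only that "recognising a binding site" is, in information terms, a substring search; everything interesting in the cell comes from what the recognising molecule does next.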

Transcription is the process of copying the text of part of the genome (a 'transcription unit') into an intermediate form, a molecule of RNA (which is also a sequence of bases). The 'header' of a gene contains a region known as a promoter, which contains multiple binding sites for proteins involved in transcription; the presence of a promoter thus drives transcription. The 'trailer' of a gene contains sequences defining the end of the transcription unit (usually something called a 'polyadenylation signal', but sometimes a thing called a 'terminator').
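As a rough sketch of the header/trailer idea, transcription can be caricatured as "find the promoter, copy until the end signal". The motifs, the example sequence and the function name transcribe are assumptions made for illustration; the sketch also copies the coding strand directly (T becomes U) and ignores where the real machinery starts and cuts.

```python
# Toy transcription: copy the stretch of the coding strand that lies between
# a promoter motif and an end signal into RNA (T becomes U).
# PROMOTER and END_SIGNAL are simplified stand-ins chosen for illustration.

PROMOTER = "TATAAA"
END_SIGNAL = "AATAAA"   # DNA form of a polyadenylation-like signal

def transcribe(dna):
    """Return the RNA copy of the first transcription unit, or None."""
    p = dna.find(PROMOTER)
    if p == -1:
        return None                      # no promoter, so nothing is transcribed
    start = p + len(PROMOTER)            # transcription begins after the promoter
    end = dna.find(END_SIGNAL, start)
    if end == -1:
        return None                      # this toy model requires an end signal
    unit = dna[start:end + len(END_SIGNAL)]
    return unit.replace("T", "U")

dna = "GGCTATAAAGGATGGCCTGAAATAAACC"
print(transcribe(dna))  # GGAUGGCCUGAAAUAAA
```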

Splicing is a process in which bits of the RNA transcript are removed; this is where exons and introns come in. The transcript contains binding sites for molecules involved in splicing. The location of these defines a pattern in the RNA in which some bits are chopped out, and the regions on either side joined together. The bits which are chopped out are called introns, and the bits which are left in are called exons. It's a bit more complicated than this, because sometimes exons get removed too, or the splicing machinery changes its mind about where the edges of an exon are, with the result that there are many possible splicing products. The decision as to which is made involves both the static sequence elements in the transcript and the dynamic state of the cell.
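A minimal sketch of splicing, assuming the exon boundaries are simply handed to us as coordinates (in the cell they are worked out from sequence signals and the state of the cell). The sequence, the coordinates and the function name splice are invented for illustration.

```python
# Toy splicing: a primary transcript plus a list of exon intervals.
# A splice variant is made by keeping some exons, in order, and joining them.
# The sequence and coordinates are invented for illustration.

def splice(transcript, exons, keep=None):
    """Join the exons (half-open [start, end) intervals) selected by index."""
    if keep is None:
        keep = range(len(exons))         # default: keep every exon
    return "".join(transcript[exons[i][0]:exons[i][1]] for i in keep)

primary = (
    "AUGGCC"        # exon 0
    "GUAAGUUUCAG"   # intron, chopped out
    "GAUCCG"        # exon 1
    "GUGAGUCCAG"    # intron, chopped out
    "UUUUAA"        # exon 2
)
exons = [(0, 6), (17, 23), (33, 39)]

print(splice(primary, exons))               # AUGGCCGAUCCGUUUUAA
print(splice(primary, exons, keep=[0, 2]))  # AUGGCCUUUUAA (exon 1 skipped)
```

The second call is the "sometimes exons get removed too" case: same primary transcript, different mature transcript.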

Translation is a process in which the RNA transcript is used to control the synthesis of a protein. Again, this involves a sequence in the RNA to which the translation machinery binds, and a sequence which makes it stop at the end. Translation is fairly straightforward in information terms: groups of three bases are read off and used as a key into a lookup table of amino acids; the selected amino acid is added to the growing protein.
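The lookup-table description translates almost directly into code. The sketch below uses the standard genetic code, written compactly with one-letter amino acid codes and "*" for stop; the start rule (begin at the first AUG) is a deliberate simplification, and the function name translate and the example transcript are made up for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order
# (last base varying fastest). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(rna):
    """Read codons from the first AUG until a stop codon or the end of the RNA."""
    start = rna.find("AUG")              # simplification: first AUG is the start
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]   # three bases are the key into the table
        if aa == "*":
            break                        # stop codon: release the protein
        protein.append(aa)
    return "".join(protein)

print(translate("AUGGCCGAUCCGUUUUAA"))  # MADPF
```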

Thus, one gene on the genome specifies a single primary transcript, which can be spliced to form several different mature transcripts, which in turn specify one protein each. There is extra complexity at each of these steps which I have not mentioned: transcription and splicing occur simultaneously (well, they form a pipeline), and splicing can affect the position at which transcription terminates; various editing processes can alter the RNA (but these are rare); specific sequences in the RNA can override the normal translation process; after proteins are produced, there are a host of post-translational modifications which alter them, which are specified not just by the protein sequence, but by the three-dimensional shape into which it folds, which is itself a very complicated function of its sequence.

Oh, and non-local electron signalling along the genome is, as far as I know, at best, speculative.

ExecutiveSummary: Biology is a lot more complicated than you think.

-- TomAnderson


This is all very well, and thank you for taking the time, Tom, but it doesn't address the thesis above: Mendelian genes and genomic genes bear no particular relationship to each other. There are some critical sections of the genome which, if altered, prevent some of our molecular machinery from working. This is similar to pulling a chip out of a motherboard. If you do that and suddenly your memory manager breaks, that doesn't mean your memory manager is contained by that chip. Likewise there are no genes on the genome.

Let's make a long story short, based on known facts: a lot of the old dogma is incorrect, yes, and both you and Tom state things about the added complexity that is now understood to go beyond the old dogma.

But there are classic genes that are exons that are Mendelian traits, even though that turns out not to be the entire story, not by a long shot, and even though some biologists are still in denial about that. Because that is part of the story, however, it is simply false to say "there are no genes on the genome". There are some classical "genes" in the classical genome, and there's also a lot more going on as well.

Right ... there are SingletonPatterns in OO designs so it is simply false to say "there are no procedures in OO". The point is simply that the Mendelian gene = genomic sequence view is a FrameProblem.

I would quibble with a few parts of what Tom said, but otherwise what he said is based on good evidence, whereas saying "there are no genes on the genome" is merely inflammatory rhetoric that vastly exaggerates what is known.

Can you list a few of your classical genes so we can go deeper here?

The way that I would phrase it is that "gene" should be defined in terms of Mendelian traits when possible, and those biologists who insist that a "gene" must be defined in terms of simple exons are stubborn reactionaries using an unhelpful definition that will eventually be universally superseded. A "gene" should be considered to be a unit of transmission, whether via DNA or not, so that e.g. directly copied nuclear metabolic RNA that behaves as a Mendelian trait should also be called a "gene", even though that's not so neat and tidy as the previous DNA dogma.

BTW this is precisely why there are so many fewer "genes" in the human genome than had previously been predicted. If one insists on "genes" being exons, yes, there are surprisingly few, but taking all known mechanisms into account, there's no surprise. You can therefore tell the stubborn reactionaries simply by whether they're quoted in the news as being surprised at how few genes there are. -- DougMerritt


"... but your model of the genome is too simple to be true"

I should certainly hope so :)

(AllModelsAreWrongSomeModelsAreUseful)


I recently read, and very much enjoyed, two books that touch indirectly on this topic: TheGeniusWithin and EvoDevo.

TheGeniusWithin postulates that cells, immune systems, ant colonies, bacteria species, and our brains (among other biological systems) are "intelligent", in the sense they can learn, remember, and respond to stimuli accordingly.

(UniversalMind ...)

EvoDevo explains a new discipline that studies 1) how a single egg turns into an adult animal (development, or devo), and 2) how changes in this process produce new species (evolution, or evo). As part of explaining Kipling-esque topics such as "how the zebra got its stripes" and "how the butterfly got its spots", this book talks about "on-off switches", located near (but outside of) the active areas of our DNA that are called (have traditionally been called?) "genes". This book says "genes" comprise about 1% of DNA, and "switches" comprise about 3% of DNA (if my leaky memory is correct).


See Also: MemesShmemes


CategoryBiology


(last edited June 27, 2006)