Showing posts with label genes. Show all posts

Saturday, February 4, 2023

Chapter 7: Gene Families and the Birth and Death of Genes

Introduction

The histone gene family. Definition of gene family. Pseudogenes. (pp. 170-171)

The birth and death of genes

As genomes evolve, new genes are born and old genes die. "Birth & death evolution" was mainly developed and promoted by Masatoshi Nei beginning in the early 1970s. Many new genes arise by gene duplication but most of them become pseudogenes within a few million years. Some evolve new functions by subfunctionalization or neofunctionalization. (pp. 172-174)
[On the evolution of duplicated genes: subfunctionalization vs neofunctionalization]

   Box: The smell of sweat (pp. 174-175)

Gene duplication and mutationism

Gene duplication is due mostly to errors in recombination. It is a subset of segmental duplication and it leads to genome expansion. The creation of new genes by mutation is a key aspect of mutationism. (pp. 175-177)
[Mutation, Randomness, & Evolution] [Replaying life's tape] [What is "structuralism"?] [Reactionary fringe meets mutation-biased adaptation: Introduction]

Whole genome duplications and the fate of genes

Polyploidization and hybridization give rise to species with twice as much DNA. The fate of that extra DNA, especially extra genes, can be tracked over time. It looks like the extra DNA is another example of junk DNA, lending support to the idea that species can tolerate large amounts of nonfunctional DNA. (pp. 177-179)
[The birth and death of salmon genes] [Birth and death of genes in a hybrid frog genome]

   Box: Real orphans in the human genome

Completely new genes, de novo genes, are rare, but there are genuine examples of genes that are unique in the human genome (ORFans). They arise by gene duplication and they are often polymorphic. (p. 180)

Different kinds of pseudogenes

There are four different kinds of pseudogenes: duplicated genes that have died, processed pseudogenes, unitary pseudogenes, and polymorphic pseudogenes. The human genome has about 15,000 pseudogenes (5% of the genome) and almost all of them are junk. The fixation of a pseudogene involves two steps: mutation and fixation by random genetic drift. Pseudogenes can become unrecognizable after 100 million years. (pp. 181-184)
[Is the high frequency of blood type O in native Americans due to random genetic drift?]

   Box: Conserved pseudogenes and Ken Miller's argument against intelligent design

The presence of a conserved pseudogene in the beta-globin gene cluster in chimpanzee and human genomes is difficult to explain by intelligent design. The fact that a small segment of the beta-globin pseudogene contains a SAR sequence is irrelevant to the main argument. (pp. 185-186)

Are they really pseudogenes?

Pseudogenes are broken genes and they are junk by any reasonable definition (see "If It Walks Like a Duck" in chapter 3). Some scientists who are opposed to junk DNA have claimed that most pseudogenes must be functional based on the fact that a tiny number have secondarily acquired a function. This is an example of cherry picking. (pp. 186-188)
[Are pseudogenes really pseudogenes?]

   Box: The short legs of dachshunds (pp. 188-189)

How accurate is the genome sequence?

The accuracy of DNA sequencing methods is approaching 99.99%. If that is coupled to 30x coverage, the overall accuracy is good enough to reliably distinguish between functional genes and pseudogenes. You also need a reliable sequence of your personal genome if you are going to make decisions about your health. (pp. 189-191)
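
As a rough illustration of why coverage matters, here is a minimal Python sketch. It assumes that errors in different reads are independent, that each read is wrong at a given site with probability 0.0001 (the 99.99% figure above), and that the consensus base is called by simple majority vote. Those are simplifying assumptions made for this example, not a description of how real variant callers work, and the function name `consensus_error` is hypothetical.

```python
from math import comb


def consensus_error(per_read_error: float, coverage: int) -> float:
    """Probability that more than half of the reads at one site are wrong.

    Assumes independent reads (a simplification) and treats every wrong
    read as agreeing on the same wrong base, which is the worst case for
    a majority vote.
    """
    threshold = coverage // 2 + 1  # smallest number of wrong reads that is a strict majority
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


if __name__ == "__main__":
    # Single read versus 30x coverage at a per-read accuracy of 99.99%
    print(f"1x:  {consensus_error(1e-4, 1):.2e}")
    print(f"30x: {consensus_error(1e-4, 30):.2e}")
```

Under these assumptions the chance of a wrong majority call at 30x is vanishingly small compared with the single-read error rate. Real data are messier (systematic errors, uneven coverage, repetitive regions), so this is only an intuition for why deep coverage helps.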

Notes for Chapter 7 (pp. 327-328)

Friday, February 3, 2023

Chapter 6: How Many Genes? How Many Proteins?

Introduction
I think there are about 25,000 genes in the human genome but the annotated human genome says there are 45,000 and many scientists claim there are a lot more genes. Why is there a controversy over the number of genes? (pp. 136-137)
Defining a gene
It's important to have a usable definition of a gene. I define a gene as a DNA sequence that's transcribed to produce a functional product. The important point is that the gene product (RNA or protein) must have a biological function. (pp. 137-138)
[Dan Graur proposes a new definition of "gene"] [Gerald Fink promotes a new definition of a gene]
The molecular gene and the Mendelian gene
I'm talking about the molecular gene. The Mendelian gene is used in genetics and it's similar to the definition Richard Dawkins uses in his book The Selfish Gene. (pp. 138-139)
Counting genes
Draft sequences of genomes always contain predictions of large numbers of genes that are subsequently eliminated by annotators as more information becomes available. The current best estimates are that there are somewhat fewer than 20,000 protein-coding genes. (pp. 139-142)
[The 20th anniversary of the human genome sequence: 3. How many genes?] [How many protein-coding genes in the human genome? (2)] [How many protein-coding genes in the human genome?]
Counting proteins
The latest count is 18,407 proteins detected and 1,343 probable proteins that haven't yet been found for a total of 19,750. (pp. 142-143)
[How many proteins in the human proteome?]
The functions of protein-coding genes
There are about 10,000 housekeeping genes that encode the proteins required for basic metabolic processes. (pp. 143-144)
Historical estimates of the number of genes
Historical estimates predicted that the human genome would have about 30,000 genes and those estimates turned out to be approximately correct. Guesstimates about larger numbers of genes (e.g. 100,000) were not based on facts. (pp. 144-146)
[False history and the number of genes: 2016]
Confusion about the number of genes
The popular press claimed that knowledgeable scientists were predicting 100,000 genes but that's not correct. (p. 147)
[Nature falls (again) for gene hype]
The Deflated Ego Problem
Many scientists don't believe that humans could only have the same number of genes as nematodes and flowering plants. I call this The Deflated Ego Problem. (pp. 147-149)
[Deflated egos and the G-value paradox] [Revisiting the deflated ego problem] [The Deflated Ego Problem]
Introns and the size of genes
A typical protein-coding gene is 61,700 bp long but most of this is introns. Coding regions occupy about 1% of the genome and introns take up 37%. Genes account for 45% of the genome when you add in the noncoding genes. This number is not widely reported in the popular press. (pp. 149-151)
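
A quick back-of-the-envelope check suggests these numbers hang together. The sketch below multiplies a gene count by the typical gene length and divides by the genome size. The genome size of 3.1 billion bp is an assumption (it is not stated above), the gene count is rounded to 20,000 from the estimate of somewhat fewer than 20,000 protein-coding genes, and treating the "typical" length as an average is also an assumption. The function name is hypothetical.

```python
GENOME_BP = 3_100_000_000      # assumed haploid human genome size (not from the text)
PROTEIN_CODING_GENES = 20_000  # rounded from "somewhat fewer than 20,000"
TYPICAL_GENE_BP = 61_700       # typical protein-coding gene length from the text


def fraction_of_genome(n_genes: int, gene_bp: int, genome_bp: int) -> float:
    """Share of the genome covered by n_genes genes of gene_bp each."""
    return n_genes * gene_bp / genome_bp


share = fraction_of_genome(PROTEIN_CODING_GENES, TYPICAL_GENE_BP, GENOME_BP)

if __name__ == "__main__":
    print(f"Protein-coding genes cover about {share:.1%} of the genome")
```

With these inputs the share comes out a little under 40%, in the same range as the 1% coding plus 37% intron figures quoted above.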
Introns are mostly junk
The weight of evidence strongly favors the view that most of the DNA in introns is junk. The splice sites and the minimum amount of DNA required to form a loop suggest that only 50 bp in each intron is functional DNA. (pp. 151-152)
[Are introns mostly junk?] [Are splice variants functional or noise?]
   Box: Yeast loses its introns
Yeast has lost most of its introns since it diverged from other fungi. Most of the rest can be deleted without causing any decrease in fitness but a few seem to be essential. More than 98% of the introns in yeast are dispensable, confirming the idea that introns are mostly junk. (pp. 153-154)
[Yeast loses its introns]
Alternative splicing: common or rare?
One way to solve the Deflated Ego Problem is to assume that human genes can make many different proteins by an alternative splicing mechanism. There are many real examples of biologically relevant alternative splicing. (pp. 154-156)
[Debating alternative splicing (Part I)] [Debating alternative splicing (Part II)] [Debating alternative splicing (Part III)] [Debating alternative splicing (Part IV)]
How does alternative splicing work?
Biologically relevant alternative splicing occurs when splicing factors alter the activity of the spliceosome. Splicing errors are common, and misspliced transcripts (junk RNA) are easily detected and entered into the transcript databases. (pp. 156-160)
Splicing errors are the best explanation
It's relatively easy to identify most splicing errors and eliminate those transcripts from the annotated reference genome. The vast majority of splice variants fall into the splicing errors category. (pp. 160-163)
[Splicing errors or alternative splicing?] [Alternative splicing and evolution] [Using conservation to determine whether splice variants are functional] [Splice variants of the human triose phosphate isomerase gene: is alternative splicing real?]
The case for splicing errors
There are four good reasons for concluding that true alternative splicing is confined to less than 5% of human protein-coding genes. (p. 163)
[The frequency of splicing errors reflects the balance between selection and drift]
The controversy and how it’s reported
The controversy over the abundance of real alternative splicing is mostly ignored in the scientific literature and in the popular press. It is widely assumed that almost all human genes are alternatively spliced. (pp. 164-165)
[Alternative splicing: function vs noise] [The persistent myth of alternative splicing] [The textbook view of alternative splicing] [The proteome complexity myth]
   Box: The false logic of the argument for complexity
If alternative splicing is going to solve the Deflated Ego Problem then it must distinguish humans from other species. But all species produce abundant transcripts due to splicing errors so humans are no different than nematodes or flowering plants. (pp. 166-167)
[Alternative splicing in the nematode C. elegans]
Alternative splicing and disease
Genetic diseases can be caused by errors in splicing. Their widespread occurrence is taken to be proof that alternative splicing is ubiquitous, but disease-causing splice errors can also occur in junk DNA. (pp. 167-169)
Notes for Chapter 6 (pp. 324-327)