Saturday, February 4, 2023

Chapter 7: Gene Families and the Birth and Death of Genes

Introduction

The histone gene family. Definition of gene family. Pseudogenes. (pp. 170-171)

The birth and death of genes

As genomes evolve, new genes are born and old genes die. "Birth & death evolution" was mainly developed and promoted by Masatoshi Nei beginning in the early 1970s. Many new genes arise by gene duplication but most of them become pseudogenes within a few million years. Some evolve new functions by subfunctionalization or neofunctionalization. (pp. 172-174)
[On the evolution of duplicated genes: subfunctionalization vs neofunctionalization]

   Box: The smell of sweat (pp. 174-175)

Gene duplication and mutationism

Gene duplication is due mostly to errors in recombination. This is a subset of segmental duplication and it leads to genome expansion. The creation of new genes by mutation is a key aspect of mutationism. (pp. 175-177)
[Mutation, Randomness, & Evolution] [Replaying life's tape] [What is "structuralism"?] [Reactionary fringe meets mutation-biased adaptation: Introduction]

Whole genome duplications and the fate of genes

Polyploidization and hybridization give rise to species with twice as much DNA. The fate of that extra DNA, especially extra genes, can be tracked over time. It looks like the extra DNA is another example of junk DNA, lending support to the idea that species can tolerate large amounts of nonfunctional DNA. (pp. 177-179)
[The birth and death of salmon genes] [Birth and death of genes in a hybrid frog genome]

   Box: Real orphans in the human genome

Completely new genes, de novo genes, are rare but there are genuine examples of genes that are unique in the human genome (ORFans). They arise by gene duplication and they are often polymorphic. (p. 180)

Different kinds of pseudogenes

There are four different kinds of pseudogenes: death of a duplicated gene, processed, unitary, and polymorphic. The human genome has about 15,000 pseudogenes (5% of the genome) and almost all of them are junk. The fixation of a pseudogene involves two steps: mutation and fixation by random genetic drift. Pseudogenes can become unrecognizable after 100 million years. (pp. 181-184)
[Is the high frequency of blood type O in native Americans due to random genetic drift?]

   Box: Conserved pseudogenes and Ken Miller's argument against intelligent design

The presence of a conserved pseudogene in the beta globin gene cluster in chimpanzee and human genomes is difficult to explain by intelligent design. The fact that a small segment of the beta-globin pseudogene contains a SAR sequence is irrelevant to the main argument. (pp. 185-186)

Are they really pseudogenes?

Pseudogenes are broken genes and they are junk by any reasonable definition (see "If It Walks Like a Duck" in chapter 3). Some scientists who are opposed to junk DNA have claimed that most pseudogenes must be functional based on the fact that a tiny number have secondarily acquired a function. This is an example of cherry picking. (pp. 186-188)
[Are pseudogenes really pseudogenes?]

   Box: The short legs of dachshunds (pp. 188-189)

How accurate is the genome sequence?

The accuracy of DNA sequencing methods is approaching 99.99%. If that is coupled to 30x coverage, the overall accuracy is good enough to reliably distinguish between functional genes and pseudogenes. You also need a reliable sequence of your personal genome if you are going to make decisions about your health. (pp. 189-191)
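
As a rough illustration of why per-read accuracy combined with deep coverage can be good enough, the sketch below assumes that sequencing errors are independent between reads and that a position is miscalled only when more than half of the overlapping reads are wrong. Both the independence assumption and the majority-vote rule are simplifications introduced here, not the book's method. The 3.1 Gb genome size is taken from the Chapter 5 summary.

```python
from math import comb

def majority_error(per_read_error: float, coverage: int) -> float:
    """Probability that more than half of `coverage` independent reads
    are wrong at a single position (simple binomial model)."""
    threshold = coverage // 2 + 1
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )

GENOME_BP = 3.1e9       # genome size quoted in the Chapter 5 summary
PER_READ_ERROR = 1e-4   # 99.99% per-base accuracy
COVERAGE = 30           # 30x coverage

# Expected wrong base calls if every position were read only once.
single_pass_errors = GENOME_BP * PER_READ_ERROR

# Per-position miscall probability under the idealized majority-vote model.
consensus_error = majority_error(PER_READ_ERROR, COVERAGE)

print(f"Errors in a single pass: {single_pass_errors:,.0f}")
print(f"Per-site consensus error at 30x: {consensus_error:.2e}")
```

Real sequencing errors are partly systematic rather than independent, so actual consensus accuracy is lower than this idealized model suggests.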

Notes for Chapter 7 (pp. 327-328)

Friday, February 3, 2023

Chapter 6: How Many Genes? How Many Proteins?

Introduction
I think there are about 25,000 genes in the human genome but the annotated human genome says there are 45,000 and many scientists claim there are a lot more genes. Why is there a controversy over the number of genes? (pp. 136-137)
Defining a gene
It's important to have a usable definition of a gene. I define a gene as a DNA sequence that's transcribed to produce a functional product. The important point is that the gene product (RNA or protein) must have a biological function. (pp. 137-138)
[Dan Graur proposes a new definition of "gene"] [Gerald Fink promotes a new definition of a gene]
The molecular gene and the Mendelian gene
I'm talking about the molecular gene. The Mendelian gene is used in genetics and it's similar to the definition Richard Dawkins uses in his book The Selfish Gene. (pp. 138-139)
Counting genes
Draft sequences of genomes always contain predictions of large numbers of genes that are subsequently eliminated by annotators as more information becomes available. The current best estimates are that there are somewhat fewer than 20,000 protein-coding genes. (pp. 139-142)
[The 20th anniversary of the human genome sequence: 3. How many genes?] [How many protein-coding genes in the human genome? (2)] [How many protein-coding genes in the human genome?]
Counting proteins
The latest count is 18,407 proteins detected and 1,343 probable proteins that haven't yet been found for a total of 19,750. (pp. 142-143)
[How many proteins in the human proteome?]
The functions of protein-coding genes
There are about 10,000 housekeeping genes that encode the proteins required for basic metabolic processes. (pp. 143-144)
Historical estimates of the number of genes
Historical estimates predicted that the human genome would have about 30,000 genes and those estimates turned out to be approximately correct. Guesstimates about larger numbers of genes (e.g. 100,000) were not based on facts. (pp. 144-146)
[False history and the number of genes: 2016]
Confusion about the number of genes
The popular press claimed that knowledgeable scientists were predicting 100,000 genes but that's not correct. (p. 147)
[Nature falls (again) for gene hype]
The Deflated Ego Problem
Many scientists don't believe that humans could only have the same number of genes as nematodes and flowering plants. I call this The Deflated Ego Problem. (pp. 147-149)
[Deflated egos and the G-value paradox] [Revisiting the deflated ego problem] [The Deflated Ego Problem]
Introns and the size of genes
A typical protein-coding gene is 61,700 bp long but most of this is introns. Coding regions occupy about 1% of the genome and introns take up 37%. Genes account for 45% of the genome when you add in the noncoding genes. This number is not widely reported in the popular press. (pp. 149-151)
Introns are mostly junk
The weight of evidence strongly favors the view that most of the DNA in introns is junk. The splice sites and the minimum amount of DNA required to form a loop suggest that only 50 bp in each intron is functional DNA. (pp. 151-152)
[Are introns mostly junk?] [Are splice variants functional or noise?]
   Box: Yeast loses its introns
Yeast has lost most of its introns since it diverged from other fungi. Most of the rest can be deleted without causing any decrease in fitness but a few seem to be essential. More than 98% of the introns in yeast are dispensable, confirming the idea that introns are mostly junk. (pp. 153-154)
[Yeast loses its introns]
Alternative splicing: common or rare?
One way to solve the Deflated Ego Problem is to assume that human genes can make many different proteins by an alternative splicing mechanism. There are many real examples of biologically relevant alternative splicing. (pp. 154-156)
[Debating alternative splicing (Part I)] [Debating alternative splicing (Part II)] [Debating alternative splicing (Part III)] [Debating alternative splicing (Part IV)]
How does alternative splicing work?
Biologically relevant alternative splicing occurs when splicing factors alter the activity of the spliceosome. Splicing errors are common and misspliced transcripts (junk RNA) are easily detected and entered into the transcript databases. (pp. 156-160)
Splicing errors are the best explanation
It's relatively easy to identify most splicing errors and eliminate those transcripts from the annotated reference genome. The vast majority of splice variants fall into the splicing errors category. (pp. 160-163)
[Splicing errors or alternative splicing?] [Alternative splicing and evolution] [Using conservation to determine whether splice variants are functional] [Splice variants of the human triose phosphate isomerase gene: is alternative splicing real?]
The case for splicing errors
There are four good reasons for concluding that true alternative splicing is confined to less than 5% of human protein-coding genes. (p. 163)
[The frequency of splicing errors reflects the balance between selection and drift]
The controversy and how it’s reported
The controversy over the abundance of real alternative splicing is mostly ignored in the scientific literature and in the popular press. It is widely assumed that almost all human genes are alternatively spliced. (pp. 164-165)
[Alternative splicing: function vs noise] [The persistent myth of alternative splicing] [The textbook view of alternative splicing] [The proteome complexity myth]
   Box: The false logic of the argument for complexity
If alternative splicing is going to solve the Deflated Ego Problem then it must distinguish humans from other species. But all species produce abundant transcripts due to splicing errors so humans are no different than nematodes or flowering plants. (pp. 166-167)
[Alternative splicing in the nematode C. elegans]
Alternative splicing and disease
Genetic diseases can be caused by errors in splicing. Their widespread occurrence is taken to be proof that alternative splicing is ubiquitous, but disease-causing splice errors can also occur in junk DNA. (pp. 167-169)
Notes for Chapter 6 (pp. 324-327)

Chapter 5: The Big Picture

Introduction

DNA sequencing and assembly. Cost of sequencing. (pp. 116-118)

A typical gene

DNA sequences are deposited in GenBank. The gene for triose phosphate isomerase (TPI1) is a typical gene. Decoding a protein-coding gene. (pp. 118-122)

Annotators interpret the genome

Human annotators must interpret the DNA sequence. (pp. 122-123)
[Contaminated genome sequences]

How much of the genome has been sequenced?
About 95% of the genome has been sequenced in the standard reference genome. The rest is estimated from the size of the gaps, giving a total of 3.1 Gb. The complete telomere-to-telomere sequence of T2T-CHM13 is also 3.1 Gb. (pp. 123-125)
[Karen Miga and the telomere-to-telomere consortium] [A complete human genome sequence (2022)] [What do we do with two different human genome reference sequences?] [How big is the human genome (2023)?]
Whose genome was sequenced?
The Celera sequence was mostly Craig Venter's genome. The IHGP standard reference genome was originally a composite of several different individuals from Buffalo (New York, USA). (pp. 125-126)
How many genes?

The original genome sequence predicted 30,000-40,000 protein-coding genes but that number has dropped to about 20,000 in the current standard reference genome. There are about 5,000 noncoding genes but this number is disputed. Introns take up most of a protein-coding gene and introns are mostly junk DNA. (pp. 126-128)
[Are introns mostly junk?]

Pseudogenes
There are about 15,000 pseudogenes derived from protein-coding genes. The number derived from noncoding genes is not known. Pseudogenes account for about 5% of the genome. (p. 128)
Regulatory sequences
If we assume about 200 bp of regulatory sequence for each gene then regulatory sequences account for less than 0.2% of your genome. Many scientists believe this number should be much higher. (pp. 128-129)
Origins of replication
There are about 30,000-50,000 functioning origins of replication accounting for <0.3% of your genome. (pp. 129-130)
Centromeres
About 1% of your genome is occupied by centromeres. (p. 130)
[Centromere DNA] [Minimum Centromere Size in Plants]
Telomeres
Telomere sequences are about 0.1%. (pp. 130-131)
[Telomeres]
Scaffold Attachment regions (SARs)
SARs are required for chromatin organization and it's not clear how much DNA sequence is required. Assuming 100,000 loops and 100 bp of SAR per loop gives 0.3% of the genome. (p. 131)
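
The back-of-envelope percentages for SARs and for regulatory sequences can be checked with a few lines of arithmetic. This is only a sketch: the 3.1 Gb genome size comes from the section on how much of the genome has been sequenced, and the gene count of 25,000 used for the regulatory estimate is an assumption borrowed from the Chapter 6 summary, not a figure stated in this chapter.

```python
GENOME_BP = 3.1e9  # total genome size (3.1 Gb) quoted earlier in this chapter

def percent_of_genome(n_elements: int, bp_per_element: int,
                      genome_bp: float = GENOME_BP) -> float:
    """Percentage of the genome occupied by n_elements of a given size."""
    return 100 * n_elements * bp_per_element / genome_bp

# SARs: 100,000 loops with 100 bp of SAR per loop.
sars = percent_of_genome(100_000, 100)

# Regulatory sequences: 200 bp per gene.
# ASSUMPTION: about 25,000 genes, the estimate given in the Chapter 6 summary.
regulatory = percent_of_genome(25_000, 200)

print(f"SARs:       {sars:.2f}% of the genome")
print(f"Regulatory: {regulatory:.2f}% of the genome")
```

Both results are consistent with the rounded values in the text: about 0.3% for SARs and less than 0.2% for regulatory sequences.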
Transposons
About 55% of the genome contains transposon-related and virus-related sequences. They are scattered throughout the genome including within introns. (pp. 131-132)
Viruses
Defective viruses take up about 9% of the genome and functional, dormant, viruses account for less than 0.1%. (p. 132)
Mitochondrial DNA
Less than 0.01% of your genome is occupied by mitochondrial DNA fragments. (p. 132)
How much of our genome is functional?
Adding up all the known functional sequences gives a value of about 4% functional. The actual amount is probably closer to 8-10% based on sequence conservation. The total amount of presumed junk DNA comes to 89%. About 90% of your genome is junk. (pp. 132-133)
[The 20th anniversary of the human genome sequence: 4. Functional DNA in our genome]
What is junk DNA?
Junk DNA is DNA that can be deleted without reducing the fitness of the individual. The debate is not whether junk DNA exists (it does) but over the amount of junk DNA. Opponents of junk DNA think that it would have been eliminated by natural selection if it were really junk. This is a common view in the popular press and even in the scientific literature. My view is that genomes are sloppy and natural selection isn't capable of purging junk DNA in species with large genomes. (pp. 133-135)
[Identifying functional DNA (and junk) by purifying selection]
Notes for Chapter 5 (p. 324)

Wednesday, February 1, 2023

Chapter 4: Why Don't Mutations Kill Us?

Introduction
Gregor Mendel and mutations. Spontaneous mutations. Rate of mutation. (pp. 82-83)
[Mutation, Randomness, & Evolution]
Why aren’t we extinct? - a 100-year-old problem
History of mutation load (genetic load). Prediction of 30,000 genes. (pp. 83-84)
[What Is a Mutation?] [Genetic Load, Neutral Theory, and Junk DNA]
Biochemical mutation rate
Knowing the overall error rate of DNA replication (10^-10 mutations per base pair) and the number of cell divisions in the germ line gives an average of 138 new mutations per generation. (pp. 84-85)
[Parental age and the human mutation rate] [Estimating the Human Mutation Rate: Biochemical Method] [Human Y Chromosome Mutation Rates] [Mutation Rates]
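
The biochemical estimate is a product of three numbers: the replication error rate, the size of the haploid genome, and the number of germ-line cell divisions per generation. The sketch below shows one way to arrive at 138. The division counts (about 30 in the female germ line and about 400 in the male) are assumptions chosen for illustration and are not listed in this summary; the haploid genome size of 3.2 × 10^9 bp is the value from the Chapter 1 summary.

```python
ERROR_RATE = 1e-10         # mutations per base pair per replication
HAPLOID_GENOME_BP = 3.2e9  # haploid genome size (Chapter 1 summary)

# ASSUMPTIONS: illustrative germ-line division counts, not from this summary.
DIVISIONS_EGG = 30
DIVISIONS_SPERM = 400

# Expected new mutations each time one haploid genome is replicated.
per_replication = ERROR_RATE * HAPLOID_GENOME_BP

# A child inherits one genome from each parent, so the divisions add.
mutations_per_generation = per_replication * (DIVISIONS_EGG + DIVISIONS_SPERM)

print(f"Mutations per replication: {per_replication:.2f}")
print(f"New mutations per generation: {mutations_per_generation:.0f}")
```

Because most of the divisions occur in the male germ line under these assumptions, most new mutations would come from the father.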
Phylogenetic mutation rate
If you know the number of generations since the time of a common ancestor then you can calculate a mutation rate by looking at sequences that are evolving at the neutral rate. (pp. 85-86)
[Estimating the Human Mutation Rate: Phylogenetic Method] [Calculating time of divergence using genome sequences and mutation rates (humans vs other apes)]
   Box: Tick, tock, the molecular clock (p. 87)
   [The Modern Molecular Clock] [Can some genomes evolve more slowly than others?]
   [Reading the Entrails of Chickens] [Calibrating the Molecular Clock]
The direct method of calculating mutation rate
Comparing the sequences of a child and both parents gives you the number of new mutations per generation. (p. 88)
[Direct Measurement of Human Mutation Rate] [Parental age and the human mutation rate] [Estimating the Human Mutation Rate: Direct Method] [Human Mutation Rates] [Human mutation rates - what's the right number?] [Somatic cell mutation rate in humans]
You are not Craig Venter
Craig Venter's genome sequence was the first one to include all 46 chromosomes separately. The amount of heterogeneity in human genomes means that no two individuals are alike. (pp. 89-90)
[What happens when twins get their DNA tested?] [Genetic variation in the human population] [Genetic variation and the complete human genome sequence] [Sequencing both copies of your diploid genome] [Sequencing human diploid genomes] [All about Craig]
Revisiting the genetic load argument
Given the mutation rate and the probability of deleterious mutations, only a small percentage of the human genome can be susceptible to mutation or our species would go extinct. (pp. 90-94)
[Revisiting the genetic load argument with Dan Graur]
   Box: Human gene knockouts (pp. 92-93)
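
The core of the genetic load argument is simple proportionality: if mutations land uniformly across the genome, the number that hit functional DNA scales with the functional fraction. The sketch below uses the 138 mutations per generation from the biochemical estimate and compares a 10% functional genome with a fully functional one. The uniform-distribution assumption and the function name are introduced here for illustration only.

```python
def mutations_in_functional_dna(mutations_per_generation: float,
                                functional_fraction: float) -> float:
    """Expected new mutations per generation that fall in functional DNA,
    assuming mutations are spread uniformly across the genome."""
    return mutations_per_generation * functional_fraction

MUTATIONS_PER_GENERATION = 138  # biochemical estimate quoted earlier in this chapter

mostly_junk = mutations_in_functional_dna(MUTATIONS_PER_GENERATION, 0.10)
all_functional = mutations_in_functional_dna(MUTATIONS_PER_GENERATION, 1.0)

print(f"10% functional:  {mostly_junk:.1f} mutations in functional DNA")
print(f"100% functional: {all_functional:.1f} mutations in functional DNA")
```

Only some of the mutations that land in functional DNA are deleterious, so these figures are upper bounds on the deleterious load per generation.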
How much of our genome is conserved?
About 8-10% of the DNA sequences in the human genome are conserved in other species. (pp. 94-95)
Defining function
The best definition of function is the maintenance definition that relies on purifying selection. Functional DNA is any stretch of DNA whose deletion from the genome would reduce the fitness of the individual. (pp. 96-98)
[Identifying functional DNA (and junk) by purifying selection] [On the Meaning of the Word "Function"] [The Function Wars: Part I] [The Function Wars: Part II] [The Function Wars: Part III] [The Function Wars: Part IV] [Restarting the function wars (The Function Wars Part V)] [The Function Wars Part VI: The problem with selected effect function] [The Function Wars Part VII: Function monism vs function pluralism] [The Function Wars Part VIII: Selected effect function and de novo genes] [The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect] [The Function Wars Part X: "Spam DNA"?]
   Box: Levels of selection (pp. 99-101)
   [The Function Wars Part XIII: Ford Doolittle writes about transposons and levels of selection]
Why is the evidence of sequence conservation so hard to accept?
There are several arguments against sequence conservation as an indicator of function. (pp. 101-103)
   Box: Deleting DNA to prove that it is junk (pp. 104-105)
Bulk DNA hypotheses
Skeletal DNA hypotheses. The bodyguard hypothesis. Genetic diversity. (pp. 105-110)
[Teaching about genomes using Nessa Carey's book: Junk DNA]
Medical relevance
Medical relevance is a weak argument for function because mutations in junk DNA can cause genetic diseases. (pp. 110-112)
[Junk DNA vs noncoding]
Ignoring history
Opponents of junk DNA have propagated a false narrative about the history of junk DNA by claiming that scientists in the late 1960s and early 1970s thought that all noncoding DNA was junk. (pp. 112-115)
[The "standard" view of junk DNA is completely wrong] [Junk DNA vs noncoding DNA] [The surprising (?) conservation of noncoding DNA] [More misconceptions about junk DNA - what are we doing wrong?] [Alan McHughen defends his views on junk DNA] [A University of Chicago history graduate student's perspective on junk DNA] [Nature journalist is confused about noncoding RNAs and junk] [What is the dominant view of junk DNA?]
Notes for Chapter 4 (pp. 321-324)

Monday, August 29, 2022

Chapter 3: Repetitive DNA and Mobile Genetic Elements

Introduction
Half of our genome is composed of highly repetitive DNA and moderately repetitive DNA. Satellite DNA. C0t curves. (pp. 57-58)
[Transcription activity in repeat regions of the human genome]
Centromeres
Centromeres contain highly repetitive DNA. (p. 58)
[The structures of centromeres]
Telomeres

Telomeres at the ends of chromosomes contain repetitive DNA. (pp. 58-59)

   Box: Dead centromeres and telomeres (pp. 59-60)

Short tandem repeats (STRs)

Short tandem repeats (STRs) are short stretches of repetitive DNA. (p. 60)

   Box: DNA fingerprints (pp. 60-61)

Mobile genetic elements
Moderately repetitive DNA consists of interspersed copies of viruses and transposons. (p. 61)
Hidden viruses in your genome
The human genome contains copies of DNA viruses and RNA viruses. Most of them are due to ancient insertions and the viral genomes have acquired inactivating mutations. Many virus-related sequences are just fragments of the original virus genome. (pp. 61-65)
What do we need to know about transposons?
The two main types of transposons are DNA transposons and RNA transposons (retrotransposons). (pp. 65-67)
LINEs and SINEs
Long interspersed elements (LINEs) are transposons that carry a gene for reverse transcriptase. Most LINE-related sequences are degenerate versions of once-active transposons. Short interspersed elements (SINEs) are derived from small noncoding genes and they require exogenous reverse transcriptase to propagate. Alu elements are one example of a SINE and there are more than one million copies in the human genome. (pp. 67-70)
How much of our genome is composed of transposon-related sequences?
Most of the transposon-related sequences are inactive fragments of the original transposons. It's difficult to get a precise estimate of the total amount of transposon-related sequences but it's probably at least 50% of the human genome. (pp. 70-72)

   Box: What does the humped bladderwort tell us about junk DNA? (p. 72)

Selfish genes and selfish DNA
Selfish DNA refers to DNA sequences that can propagate by themselves within the genome. (p. 73)
[Junk DNA and selfish DNA] [The selfish gene vs the lucky allele]
Exaptation versus the post hoc fallacy
Some transposon-related sequences have secondarily acquired a function that contributes to the fitness of the organism. This is an example of exaptation. Some scientists believe that transposon-related sequences are retained in order to serve as a reservoir for future exaptation but this argument is related to a logical fallacy called the post hoc fallacy. (pp. 73-78)
[Peter Larsen: "There is no such thing as 'junk DNA'"]
Mitochondria are invading your genome!
The human genome contains fragments of mitochondrial DNA that have recently been incorporated by accident. (pp. 78-79)
[How much mitochondrial DNA in your genome?]
On the origin of junk DNA

A lot of junk DNA originates from ancient insertions of transposons and their subsequent degeneration by acquiring mutations. (pp. 79-80)

If it walks like a duck ...

Transposons look like junk, behave like junk, and evolve like junk, so let's just call them junk. (pp. 80-81)

Notes for Chapter 3 (pp. 320-321)

Thursday, August 11, 2022

Chapter 1: Introducing Genomes

Introduction
The discovery of DNA structure and the structure of nucleotides. Defining the 5′ and 3′ ends. (pp. 7-9)
The double helix
The structure of polynucleotides and the double helix. Base pairs, stacking interactions, and hydrogen bonds. (pp. 9-13)
The goal of the human genome project was to sequence all of the base pairs
Writing DNA sequences. (pp. 13-14)
Prokaryotes and eukaryotes
Differences between prokaryotes and eukaryotes. Bacteria vs prokaryotes. The Age of Bacteria. "Higher" vs "lower." (pp. 14-16)
[We live in the age of bacteria]
How big is your genome?

Historical estimates of the weight of DNA (3.5 pg). Calculating the number of base pairs (3.2 × 10^9 bp). Length of the genome. (pp. 16-18)
[How big is the human genome (2023)?] [Genome size confusion]
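
The conversion from DNA mass to base pairs can be sketched as follows. The average molar mass of 650 g/mol per base pair and the 0.34 nm rise per base pair (B-form DNA) are commonly used textbook values that are assumed here; the summary above gives only the input (3.5 pg) and the result.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)

# ASSUMPTIONS: textbook averages, not stated in the summary above.
BP_MOLAR_MASS = 650.0     # grams per mole of base pairs
RISE_PER_BP_NM = 0.34     # nanometres per base pair in B-form DNA

def base_pairs_from_mass(picograms: float) -> float:
    """Convert a mass of double-stranded DNA (in pg) to a number of base pairs."""
    grams = picograms * 1e-12
    return grams / BP_MOLAR_MASS * AVOGADRO

base_pairs = base_pairs_from_mass(3.5)

# Stretched-out length of that many base pairs, in metres.
length_m = base_pairs * RISE_PER_BP_NM * 1e-9

print(f"Base pairs: {base_pairs:.2e}")
print(f"Length: {length_m:.2f} m")
```

With these assumed constants, 3.5 pg works out to roughly 3.2 × 10^9 bp, which corresponds to about a metre of DNA.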

Packaging DNA: nucleosomes and chromatin
Histones, core particles, nucleosomes, and chromatin. Heterochromatin and euchromatin. Sequencing the euchromatic genome. (pp. 19-21)
Transcription
"A gene is a DNA sequence that's transcribed to produce a functional product." Transcription initiation, elongation, termination. (pp. 21-24)
Translation
Messenger RNA. Gene orientation: template strand, coding strand. Initiation, elongation (peptide bond), termination. (pp. 24-27)
The genetic code
Aminoacylated tRNA. Standard genetic code. (pp. 27-29)
Introns and exons
Protein-coding genes, noncoding genes. RNA processing, splicing, spliceosome. (pp. 29-32)
Notes for Chapter 1 (pp. 317-318)

Chapter 2: The Evolution of Sloppy Genomes

Introduction
Pufferfish, lungfish, frogs, and the C-Value Paradox. (pp. 33-34)
The complexity of genomes
Reassociation kinetics (C0t curves). Highly repetitive DNA, moderately repetitive DNA, unique sequence DNA. (pp. 34-35)
Variation in genome size
Junk DNA explains the variations in genome size. The C-Value Enigma. You don't need new genes to explain complexity. (pp. 35-37)
Instantaneous genome doubling
Polyploidy. Brassica species. Organisms can tolerate extra DNA. (pp. 38-39)
The Onion Test

The Onion test is a way of testing your junk DNA hypotheses. (pp. 39-40)
[The Onion Test]

Modern evolutionary theory

Evolution is a change in the frequency of alleles in a population. Adaptation, fixation, positive selection, negative selection, purifying selection. (pp. 40-41)
[Is the Modern Synthesis effectively dead?] [Kevin Laland's view of "modern" evolutionary theory (again)]

Random genetic drift
Beanbag genetics and population genetics. Fixation by random genetic drift. (pp. 41-43)
[On the importance of random genetic drift in modern evolutionary theory] [Evolution by chance] [The role of chance in evolution] [One philosopher's view of random genetic drift] [A philosopher's view of random genetic drift]
Neutral Theory

Kimura and the promotion of the neutral theory. Random genetic drift as a major cause of evolution. (pp. 43-44)
[Celebrating 50 years of Neutral Theory]

Nearly Neutral Theory
Ohta and the nearly neutral theory. The fixation of slightly deleterious alleles. (pp. 44-45)
Population genetics

Population size and selection coefficients. Probability of fixation. Population size and the fixation of slightly deleterious alleles. Drift-Barrier Hypothesis. (pp. 45-49)
[Learning about modern evolutionary theory: the drift-barrier hypothesis]
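
The relationship between population size, selection coefficient, and probability of fixation can be illustrated with Kimura's diffusion approximation for a new mutation in a diploid population. This is a minimal sketch that assumes the effective population size equals the census size N and uses the standard textbook formula; it is not taken from the summary above, and the example values of s and N are hypothetical.

```python
from math import expm1

def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for the fixation probability of a new mutation.

    s: selection coefficient (negative for deleterious alleles)
    n: diploid population size, assumed equal to the effective size
    Formula: (1 - exp(-2s)) / (1 - exp(-4Ns)); the neutral limit is 1/(2N).
    """
    if s == 0:
        return 1 / (2 * n)
    # expm1(x) = exp(x) - 1 keeps precision when s is tiny.
    return expm1(-2 * s) / expm1(-4 * n * s)

# Hypothetical example: a slightly deleterious allele (s = -0.00001)
# in a small and in a large population.
for n in (1_000, 1_000_000):
    neutral = fixation_probability(0, n)
    deleterious = fixation_probability(-1e-5, n)
    print(f"N = {n:>9,}: neutral {neutral:.2e}, slightly deleterious {deleterious:.2e}")
```

In the small population the slightly deleterious allele fixes almost as often as a neutral one, while in the large population its chance of fixation is effectively zero. That contrast is the intuition behind the drift-barrier hypothesis.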

   Box: Are humans still evolving? (pp. 49-50)

On the evolution of sloppy genomes

Insertions, deletions, and random genetic drift. (pp. 50-54)
[Evolution by Accident]

   Box: Chromosome dynamics (p. 54)
   [Segmental duplications in the human genome]

Bacteria have small genomes
Why do bacteria have small genomes? (pp. 54-56)

Notes for Chapter 2 (pp. 318-320)