Saturday, February 4, 2023

Chapter 7: Gene Families and the Birth and Death of Genes

Introduction

The histone gene family. Definition of gene family. Pseudogenes. (pp. 170-171)

The birth and death of genes

As genomes evolve, new genes are born and old genes die. "Birth & death evolution" was mainly developed and promoted by Masatoshi Nei beginning in the early 1970s. Many new genes arise by gene duplication but most of them become pseudogenes within a few million years. Some evolve new functions by subfunctionalization or neofunctionalization. (pp. 172-174)
[On the evolution of duplicated genes: subfunctionalization vs neofunctionalization]

Box: The smell of sweat (pp. 174-175)

Gene duplication and mutationism

Gene duplication is due mostly to errors in recombination. This is a subset of segmental duplication and it leads to genome expansion. The creation of new genes by mutation is a key aspect of mutationism. (pp. 175-177)
[Mutation, Randomness, & Evolution] [Replaying life's tape] [What is "structuralism"?] [Reactionary fringe meets mutation-biased adaptation: Introduction]

Whole genome duplications and the fate of genes

Polyploidization and hybridization give rise to species with twice as much DNA. The fate of that extra DNA, especially extra genes, can be tracked over time. It looks like the extra DNA is another example of junk DNA, lending support to the idea that species can tolerate large amounts of nonfunctional DNA. (pp. 177-179)
[The birth and death of salmon genes] [Birth and death of genes in a hybrid frog genome]

Box: Real orphans in the human genome

Completely new genes, de novo genes, are rare but there are genuine examples of genes that are unique in the human genome (ORFans). They arise by gene duplication and they are often polymorphic. (p. 180)

Different kinds of pseudogenes

There are four different kinds of pseudogenes: death of a duplicated gene, processed, unitary, and polymorphic. The human genome has about 15,000 pseudogenes (5% of the genome) and almost all of them are junk. The fixation of a pseudogene involves two steps: mutation and fixation by random genetic drift. Pseudogenes can become unrecognizable after 100 million years. (pp. 181-184)
[Is the high frequency of blood type O in native Americans due to random genetic drift?]
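
The drift step can be illustrated with a toy Wright-Fisher simulation. This is a minimal sketch and not something taken from the book: the population size, the number of trials, and the function name are assumptions chosen for illustration, and the new pseudogene allele is treated as strictly neutral. Under those assumptions, a single new mutation in a diploid population of N individuals should reach fixation with a probability close to 1/(2N).

```python
import random

def neutral_fixation_fraction(pop_size, trials, seed=0):
    """Estimate how often one new neutral allele drifts to fixation.

    pop_size is the number of diploid individuals (a hypothetical value),
    so each generation holds 2 * pop_size gene copies.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    fixed = 0
    for _ in range(trials):
        count = 1  # a single new pseudogene allele appears by mutation
        while 0 < count < copies:
            freq = count / copies
            # Wright-Fisher step: every copy in the next generation is
            # drawn independently from the current allele frequency.
            count = sum(1 for _ in range(copies) if rng.random() < freq)
        if count == copies:
            fixed += 1
    return fixed / trials

estimate = neutral_fixation_fraction(pop_size=50, trials=1000)
expected = 1 / (2 * 50)  # neutral expectation, 1/(2N)
print(f"simulated: {estimate:.3f}, neutral expectation: {expected:.3f}")
```

Most new alleles are lost within a few generations, which is why the simulated fraction is small. Real populations are far larger than the toy value used here, so the chance that any particular new pseudogene allele fixes is correspondingly lower.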

Box: Conserved pseudogenes and Ken Miller's argument against intelligent design

The presence of a conserved pseudogene in the beta globin gene cluster in chimpanzee and human genomes is difficult to explain by intelligent design. The fact that a small segment of the beta-globin pseudogene contains a SAR sequence is irrelevant to the main argument. (pp. 185-186)

Are they really pseudogenes?

Pseudogenes are broken genes and they are junk by any reasonable definition (see "If It Walks Like a Duck" in chapter 3). Some scientists who are opposed to junk DNA have claimed that most pseudogenes must be functional based on the fact that a tiny number have secondarily acquired a function. This is an example of cherry picking. (pp. 186-188)
[Are pseudogenes really pseudogenes?]

Box: The short legs of dachshunds (pp. 188-189)

How accurate is the genome sequence?

The accuracy of DNA sequencing methods is approaching 99.99%. If that is coupled to 30x coverage, the overall accuracy is good enough to reliably distinguish between functional genes and pseudogenes. You also need a reliable sequence of your personal genome if you are going to make decisions about your health. (pp. 189-191)
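
A rough calculation shows why coverage matters so much. The sketch below assumes that the 99.99% figure is a per-base accuracy for a single read, that errors in different reads are independent, and (pessimistically) that every erroneous read shows the same wrong base. None of those assumptions come from the book, and the function name is made up for illustration. It computes the probability that a strict majority of the reads covering one position are wrong.

```python
from math import comb

def consensus_error(per_read_error, coverage):
    """Probability that a strict majority of reads at one position are wrong.

    Assumes independent errors and, as a worst case, that all
    erroneous reads agree on the same incorrect base.
    """
    needed = coverage // 2 + 1  # smallest number of reads that is a majority
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(needed, coverage + 1)
    )

p = consensus_error(per_read_error=1e-4, coverage=30)
print(f"per-position consensus error at 30x: {p:.2e}")
```

Under these idealized assumptions the consensus error is vanishingly small. Real sequencing errors are not fully independent (some sequence contexts are systematically hard to read), so the practical error rate is higher than this calculation suggests.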

Notes for Chapter 7 (pp. 327-328)

Friday, February 3, 2023

Chapter 5: The Big Picture

Introduction

DNA sequencing and assembly. Cost of sequencing. (pp. 116-118)

A typical gene

DNA sequences are deposited in GenBank. The gene for triose phosphate isomerase (TPI1) is a typical gene. Decoding a protein-coding gene. (pp. 118-122)

Annotators interpret the genome

Human annotators must interpret the DNA sequence. (pp. 122-123)
[Contaminated genome sequences]

How much of the genome has been sequenced?

About 95% of the genome has been sequenced in the standard reference genome. The rest is estimated from the size of the gaps, giving a total of 3.1 Gb. The complete telomere-to-telomere sequence of T2T-CHM13 is also 3.1 Gb. (pp. 123-125)
[Karen Miga and the telomere-to-telomere consortium] [A complete human genome sequence (2022)] [What do we do with two different human genome reference sequences?] [How big is the human genome (2023)?]

Whose genome was sequenced?

The Celera sequence was mostly Craig Venter's genome. The IHGP standard reference genome was originally a composite of several different individuals from Buffalo (New York, USA). (pp. 125-126)

How many genes?

The original genome sequence predicted 30,000-40,000 protein-coding genes but that number has dropped to about 20,000 in the current standard reference genome. There are about 5,000 noncoding genes but this number is disputed. Introns take up most of a protein-coding gene and introns are mostly junk DNA. (pp. 126-128)
[Are introns mostly junk?]

Pseudogenes

There are about 15,000 pseudogenes derived from protein-coding genes. The number derived from noncoding genes is not known. Pseudogenes account for about 5% of the genome. (p. 128)

Regulatory sequences

If we assume about 200 bp of regulatory sequence for each gene then regulatory sequences account for less than 0.2% of your genome. Many scientists believe this number should be much higher. (pp. 128-129)
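
The arithmetic behind this estimate can be sketched in a few lines. The gene count used here (about 20,000 protein-coding plus about 5,000 noncoding genes) and the 3.1 Gb genome size are the figures quoted elsewhere in this chapter summary; combining them this way is an assumption, and the helper name is made up for illustration.

```python
GENOME_BP = 3.1e9  # total genome size quoted in this chapter

def percent_of_genome(n_elements, bp_per_element, genome_bp=GENOME_BP):
    """Percentage of the genome occupied by n_elements of a given size."""
    return 100 * n_elements * bp_per_element / genome_bp

# Assumed gene count: protein-coding plus noncoding genes.
genes = 20_000 + 5_000
regulatory = percent_of_genome(genes, 200)
print(f"regulatory sequences: {regulatory:.2f}% of the genome")
```

The same helper applies to the other back-of-the-envelope estimates in this chapter. For example, `percent_of_genome(100_000, 100)` reproduces the roughly 0.3% figure given for scaffold attachment regions.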

Origins of replication

There are about 30,000-50,000 functioning origins of replication accounting for <0.3% of your genome. (pp. 129-130)

Centromeres

About 1% of your genome is occupied by centromeres. (p. 130)
[Centromere DNA] [Minimum Centromere Size in Plants]

Telomeres

Telomere sequences are about 0.1%. (pp. 130-131)
[Telomeres]

Scaffold attachment regions (SARs)

SARs are required for chromatin organization and it's not clear how much DNA sequence is required. Assuming 100,000 loops and 100 bp of SAR per loop gives 0.3% of the genome. (p. 131)

Transposons

About 55% of the genome contains transposon-related and virus-related sequences. They are scattered throughout the genome including within introns. (pp. 131-132)

Viruses

Defective viruses take up about 9% of the genome and functional, dormant, viruses account for less than 0.1%. (p. 132)

Mitochondrial DNA

Less than 0.01% of your genome is occupied by mitochondrial DNA fragments. (p. 132)

How much of our genome is functional?

Adding up all the known functional sequences gives a value of about 4% functional. The actual amount is probably closer to 8-10% based on sequence conservation. The total amount of presumed junk DNA comes to 89%. About 90% of your genome is junk. (pp. 132-133)
[The 20th anniversary of the human genome sequence: 4. Functional DNA in our genome]

What is junk DNA?

Junk DNA is DNA that can be deleted without reducing the fitness of the individual. The debate is not whether junk DNA exists (it does) but over the amount of junk DNA. Opponents of junk DNA think that it would have been eliminated by natural selection if it were really junk. This is a common view in the popular press and even in the scientific literature. My view is that genomes are sloppy and natural selection isn't capable of purging junk DNA in species with large genomes. (pp. 133-135)
[Identifying functional DNA (and junk) by purifying selection]

Notes for Chapter 5 (p. 324)
