Showing posts with label junk DNA. Show all posts

Tuesday, May 2, 2023

Prologue

Introduction

It is humbling for me and awe-inspiring to realize that we have caught the first glimpse of our own instruction book, previously known only to God.

                                                                                                Francis Collins (2000)

Those were the words of Francis Collins when the President of the United States, Bill Clinton, announced the completion of the human genome sequence on June 26, 2000. But in spite of what Collins said there were a great many people besides God who had a pretty good idea of what was in your genome. Knowledgeable experts had been predicting for the past 30 years that the human genome would contain about 30,000 genes and lots of other functional regions. They estimated that the human genome was about 90% junk.

The publication of the human genome sequence proved that those knowledgeable experts were correct. (pp. 1-5)

The junk DNA wars
Most scientists were reluctant to believe the experts and they developed all sorts of hypotheses and speculations to avoid accepting the evidence that most of our genome is junk. This spawned the junk DNA wars that continue to this day.

"In this book I will attempt to show you that the concept of junk DNA is compatible with all the evidence, consistent with our understanding of evolution and population genetics, and possesses extraordinary explanatory power. It helps us make sense of biology. I will also show you that all the arguments against junk DNA are incompatible with our present understanding of molecular biology, incompatible with evolution, and lack explanatory power. They do not make sense." (pp. 5-6)

Chapter 11: Zen and the Art of Coping with a Sloppy Genome

Introduction
The title comes from Zen and the Art of Motorcycle Maintenance, one of the most popular philosophy books of all time. The theme of my book is very different; it's about the idea that life at the molecular level is very messy and error-prone and looks nothing like a well-constructed Swiss watch. The author of Zen writes a number of short essays called Chautauquas and I'm going to close this book with a few of my own. (pp. 297-298)
The limitations of genomics
Genomics focuses on a global analysis of the entire genome rather than on specific genes. Genomic studies collect large amounts of data that may be useful in uncovering new features and in forming new hypotheses but those hypotheses still need to be tested at the level of individual genes. Genomics workers often believe they have discovered novel features of the genome that overthrow old ideas—features such as tens of thousands of noncoding genes, abundant alternative splicing, and huge amounts of regulatory sequence—but they have only discovered data that may or may not point to new features pending closer analysis. (pp. 299-302)
The function wars
The ENCODE publicity campaign kicked off an extended discussion about the meaning of the word function, a discussion that Alex Palazzo calls the 'function wars.' The new function wars drew in philosophers who have been debating the meaning of function for many decades. The wars are over and the most reasonable definition of molecular function is the maintenance definition that describes functional DNA as DNA that is currently maintained by natural selection (purifying selection). (pp. 302-307)
[ENCODE and their current definition of "function"] [Identifying functional DNA (and junk) by purifying selection] [The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect] [The Function Wars Part VIII: Selected effect function and de novo genes] [The Function Wars Part VII: Function monism vs function pluralism] [The function wars are over] [Philosophers talking about genes] [When philosophers write about evolution] [When philosophers talk about genomes]

Scientific revolutions
The scientific literature and the popular press are full of reports of scientific revolutions that have just overthrown some old paradigm causing the textbooks to be rewritten. That's not how science really works. Most scientific revolutions develop slowly over a period of many years as more and more data causes us to revise our old ideas. Many of the so-called revolutions reported in the popular press are actually paradigm shafts, not paradigm shifts. The concept of junk DNA refers to a real revolution in our thinking about genomes. It developed over many years in the 1960s and 70s but it failed to convince most biologists. All of the announcements about disproving junk DNA are fake revolutions and paradigm shafts. (pp. 307-311)
[ENCODE and their current definition of "function"] [University press releases are a major source of science misinformation] [Press release from the Francis Crick Institute misrepresents junk DNA]
No comfort for Intelligent Design Creationists
Intelligent Design Creationists have been predicting for years that most of our genome would turn out to be functional. They interpret recent results to be a vindication of their prediction. I hope I've demonstrated that they are wrong. (pp. 311-312)
[Religion vs science (junk DNA): a blast from the past] [Stephen Meyer "predicts" there's no junk DNA] [Do Intelligent Design Creationists still think junk DNA refutes ID?] [You need to understand biology if you are going to debate an Intelligent Design Creationist]
Scientific controversies
There is a genuine controversy over the amount of junk DNA in the human genome but the existence of this controversy is hidden from the general public because most scientists ignore it. Why do most scientists refuse to even consider the idea that our genome could be full of junk? I outline four reasons for this behavior. (pp. 312-315)
[Scientists say "sloppy science" more serious than fraud]
Coping with a sloppy genome
I hope I've convinced you that it's possible to live with the idea that 90% of our genome is junk. (p. 315)

Notes for Chapter 11 (pp. 333-334)

References (pp. 335-358)

Index (pp. 359-372)

Saturday, February 18, 2023

Chapter 10: Turning Genes On and Off

Introduction
Francis Collins, and many others, believe that the concept of junk DNA is outmoded because recent discoveries have shown that most of the human genome is devoted to regulation. This is part of a clash of worldviews where one side sees the genome as analogous to a finely tuned Swiss watch with no room for junk and the other sees the genome as a sloppy entity that's just good enough to survive. (pp. 264-266)
What is regulation?
Regulation refers to gene expression that can be modified according to environmental conditions. (pp. 266-267)
[Protein concentrations in E. coli are mostly controlled at the level of transcription initiation]
Stochastic gene expression
The rate of transcription of a gene can vary from cell to cell due to the stochastic nature of transcription factor binding and the initiation of RNA synthesis. This is not regulation. (pp. 267-268)
What do we know about regulatory sequences?
Most transcription factors bind within a few hundred base pairs of the promoter. With a few exceptions, regulatory sequences are found in close proximity to the 5′ end of the gene. (pp. 268-270)
[Are multiple transcription start sites functional or mistakes?] [How many enhancers in the human genome?] [Are most transcription factor binding sites functional?] [The Encyclopedia of Evolutionary Biology revisits junk DNA]

   Box: The making of a queen by regulating gene expression (p. 270)

Regulation and evolution
One way of resolving the Deflated Ego Problem is to speculate that humans have a much more sophisticated regulatory network than other "lower" species. This is consistent with evo-devo, which postulates that the differences between species are due more to differences in regulating gene expression than to differences in the number of genes. However, most of the results from evo-devo suggest that the differences in regulation are due to very small changes in the binding of existing transcription factors and not huge changes in genome organization. (pp. 271-273)

   Box: Can complex regulation evolve by accident? (pp. 273-274)

Regulating gene expression by rearranging the genome
There are some well-studied examples of regulation that are connected to rearranging the genome by recombination. (pp. 275-277)
Open and closed domains
DNA is more accessible to transcription factors when nucleosomes are loosely organized in an open domain. Gene expression is repressed when the gene is embedded in a highly structured closed domain (heterochromatin). The transition from closed to open domains is coupled to demethylation of DNA and modification of histones. The DNase I sensitivity of DNA in an open domain is correlated with transcription activity. The spontaneous "breathing" of heterochromatic regions allows transcription factors to bind. (pp. 277-279)
[Epigenetic markers in the last 8% of the human genome sequence] [Chromatin organization at promoters in yeast cells]

   Box: X-chromosome inactivation (pp. 279-280)
   [Escape from X chromosome inactivation]

The recruitment model of gene expression
The recruitment model of gene expression says that the binding of transcription factors triggers the demethylation of DNA and the modification of histone proteins to maintain an open domain. The histone code model is often connected to the belief that DNA demethylation and histone modification are the key events in regulation and not epiphenomena. This view is usually associated with a strong belief in the importance of epigenetics. (pp. 281-282)
ENCODE promotes regulation
ENCODE researchers postulate that at least 20% of the genome is required for regulation and there are dozens of transcription factor binding sites for each gene. According to this view, this sophisticated regulation explains why complex humans can exist with the same number of genes as most other species. (pp. 283-285)
[ENCODE's false claims about the number of regulatory sites per gene]
Does regulation explain junk? How can we test the hypothesis?
There are very few known examples of human genes with the complicated regulatory mechanisms promoted by the ENCODE leaders. The few published genomics tests of the hypothesis do not support it. What's missing is the random genome project in order to emphasize the importance of a negative control. (pp. 285-287)
[How much of the human genome is devoted to regulation?]

   Box: A thought experiment (pp. 287-288)

3D chromosomes
One possibility is that a lot of extra DNA is required in humans in order to organize genes into large functional loops of chromatin. This idea has been promoted by several scientists, including Emile Zuckerkandl. (pp. 288-291)
What the heck is epigenetics?
The most useful definition of epigenetics is the Holliday definition that restricts the term to changes that could be inherited by daughter cells following cell division. Proponents of epigenetics claim that chromatin markers can be passed from generation to generation in humans and they determine whether a gene will be expressed or silenced. There is no known mechanism for passing such markers from somatic cells to the germ line. (pp. 292-293)
[What the heck is epigenetics?] [Nessa Carey talks about epigenetics] [What do believers in epigenetics think about junk DNA?]
Restriction/modification and the inheritance of methylated nucleotides
The restriction/modification system in bacteria is a good example of how methylation signals can be passed to daughter cells following cell division but it does not explain epigenetics. There is a lot of hype associated with epigenetics and much of it is unjustified. (pp. 293-296)

Notes for Chapter 10 (pp. 331-333)

Chapter 9: The ENCODE Publicity Campaign

Introduction
On Sept. 5, 2012 Nature published a number of papers by the ENCODE Consortium. (The papers were rejected by Science.) The main summary paper announced that 80% of the human genome has a function and many of the ENCODE leaders pronounced the death of junk DNA. (pp. 238-241)
[The 10th anniversary of the ENCODE publicity campaign fiasco]
ENCODE results
The main results were that the human genome has 20,687 protein-coding genes and 18,451 noncoding genes. About 62% of the genome is transcribed. There are 636,336 binding sites for the 120 transcription factors they examined and these cover 231 million bp or 8.1% of the genome. The researchers identified more than 5 million open chromatin domains accounting for about 40% of the genome. If you add up all the biochemically active DNA it comes to 80.4% of the genome. (pp. 241-244)
The ENCODE publicity campaign
The papers that appeared in the Sept. 5th edition of Nature were accompanied by a massive publicity campaign organized by the editors at Nature. There were press releases from the universities and government research centers that were involved in the project. The main message was that 80% of the genome is functional and the idea of junk DNA has been refuted. The de facto ENCODE leader, Ewan Birney, was hailed as a "Big Talker." (pp. 244-246)
[The ENCODE publicity campaign of 2007]
Criticisms of ENCODE
The blogosphere, Twitter, and Facebook erupted immediately with posts criticising ENCODE for misleading the public about the meaning of function and pointing out that junk DNA is alive and well. Brendan Maher, the feature editor for Nature, realized the next day (Sept. 6) that they had a problem and he announced that the main purpose of the publicity campaign was to create "the biggest splash possible" to promote results that usually don't get much attention in the popular press. He conceded that the claim of 80% functional might have been an exaggeration. Over the next couple of years a number of papers critical of the ENCODE claim were published in the scientific literature. I have never seen such strong and rapid criticism of papers published by leading scientists in a well-respected journal like Nature. (pp. 247-254)
Science journal doubles down
In December 2012 Science listed the ENCODE results as one of the breakthroughs of the year. Although it acknowledged the controversy, it still reported that 80% of the human genome is functional. (pp. 254-255)
ENCODE backpedals
In 2014, the ENCODE researchers partially retracted their claims about function and announced that the main purpose of ENCODE is to map all the spurious transcripts and spurious transcription factor binding sites in order to provide a resource for the community of scientists. (pp. 255-260)
[ENCODE and their current definition of "function"] [The Function Wars Part XII: Revising history and defending ENCODE] [Manolis Kellis dismisses junk DNA] [What did ENCODE researchers say on Reddit?] [Tim Minchin's "Storm," the animated movie, and another not-so-good Minchin cartoon]
ENCODE III
ENCODE III said in 2020 that there are 20,225 protein-coding genes and 37,595 noncoding genes. There are now 2,157,387 open chromatin domains and 1,224,154 transcription factor binding sites. ENCODE III made no claims about function. (pp. 260-261)
What went wrong?
ENCODE failed to consider the null hypothesis of no function. The researchers failed to acknowledge the criticisms of their claims back in 2007 and they failed to take into account alternative explanations of their data. This is not how science is supposed to work. (pp. 261-263)
[The 20th anniversary of the human genome sequence: 6. Nature doubles down on ENCODE results] [Style vs substance in science communication: The role of science writers in major science journals]

Notes for Chapter 9 (pp. 330-331)

Saturday, February 4, 2023

Chapter 7: Gene Families and the Birth and Death of Genes

Introduction

The histone gene family. Definition of gene family. Pseudogenes. (pp. 170-171)

The birth and death of genes

As genomes evolve, new genes are born and old genes die. "Birth & death evolution" was mainly developed and promoted by Masatoshi Nei beginning in the early 1970s. Many new genes arise by gene duplication but most of them become pseudogenes within a few million years. Some evolve new functions by subfunctionalization or neofunctionalization. (pp. 172-174)
[On the evolution of duplicated genes: subfunctionalization vs neofunctionalization]

   Box: The smell of sweat (pp. 174-175)

Gene duplication and mutationism

Gene duplication is due mostly to errors in recombination. This is a subset of segmental duplication and it leads to genome expansion. The creation of new genes by mutation is a key aspect of mutationism. (pp. 175-177)
[Mutation, Randomness, & Evolution] [Replaying life's tape] [What is "structuralism"?] [Reactionary fringe meets mutation-biased adaptation: Introduction]

Whole genome duplications and the fate of genes

Polyploidization and hybridization give rise to species with twice as much DNA. The fate of that extra DNA, especially extra genes, can be tracked over time. It looks like the extra DNA is another example of junk DNA, lending support to the idea that species can tolerate large amounts of nonfunctional DNA. (pp. 177-179)
[The birth and death of salmon genes] [Birth and death of genes in a hybrid frog genome]

   Box: Real orphans in the human genome

Completely new genes, de novo genes, are rare but there are genuine examples of genes that are unique in the human genome (ORFans). They arise by gene duplication and they are often polymorphic. (p. 180)

Different kinds of pseudogenes

There are four different kinds of pseudogenes: death of a duplicated gene, processed, unitary, and polymorphic. The human genome has about 15,000 pseudogenes (5% of the genome) and almost all of them are junk. The fixation of a pseudogene involves two steps: mutation and fixation by random genetic drift. Pseudogenes can become unrecognizable after 100 million years. (pp. 181-184)
[Is the high frequency of blood type O in native Americans due to random genetic drift?]

   Box: Conserved pseudogenes and Ken Miller's argument against intelligent design

The presence of a conserved pseudogene in the beta globin gene cluster in chimpanzee and human genomes is difficult to explain by intelligent design. The fact that a small segment of the beta-globin pseudogene contains a SAR sequence is irrelevant to the main argument. (pp. 185-186)

Are they really pseudogenes?

Pseudogenes are broken genes and they are junk by any reasonable definition (see "If It Walks Like a Duck" in chapter 3). Some scientists who are opposed to junk DNA have claimed that most pseudogenes must be functional based on the fact that a tiny number have secondarily acquired a function. This is an example of cherry picking. (pp. 186-188)
[Are pseudogenes really pseudogenes?]

   Box: The short legs of dachshunds (pp. 188-189)

How accurate is the genome sequence?

The accuracy of DNA sequencing methods is approaching 99.99%. If that is coupled to 30x coverage, the overall accuracy is good enough to reliably distinguish between functional genes and pseudogenes. You also need a reliable sequence of your personal genome if you are going to make decisions about your health. (pp. 189-191)
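As a rough illustration of why coverage matters, the sketch below works from the figures quoted above (99.99% per-base accuracy, 30x coverage) and the 3.1 Gb genome size given in the Chapter 5 summary. The majority-vote model is a deliberately simple assumption and not a description of any real base-calling pipeline: it treats read errors as independent and ignores systematic errors, repeats, and heterozygous sites.

```python
from math import comb

GENOME_BP = 3_100_000_000   # ~3.1 Gb, genome size quoted in the Chapter 5 summary
PER_BASE_ERROR = 1e-4       # 99.99% per-base accuracy
COVERAGE = 30               # each position read about 30 times

# Expected number of miscalled bases if every position were read only once.
single_pass_errors = GENOME_BP * PER_BASE_ERROR

def majority_wrong(p, n):
    """Probability that more than half of n independent reads are wrong at one site.

    Assumption: reads err independently with probability p (a simplification).
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

p_site = majority_wrong(PER_BASE_ERROR, COVERAGE)

print(f"expected errors in a single pass: {single_pass_errors:,.0f}")
print(f"P(majority of {COVERAGE} reads wrong at a site): {p_site:.2e}")
```

Under these assumptions a single pass would leave roughly 310,000 miscalled bases, while the chance that most of the 30 reads at a given site are wrong is vanishingly small. Real-world accuracy is limited by the non-random errors that this model leaves out.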

Notes for Chapter 7 (pp. 327-328)

Friday, February 3, 2023

Chapter 5: The Big Picture

Introduction

DNA sequencing and assembly. Cost of sequencing. (pp. 116-118)

A typical gene

DNA sequences are deposited in GenBank. The gene for triose phosphate isomerase (TPI1) is a typical gene. Decoding a protein-coding gene. (pp. 118-122)

Annotators interpret the genome

Human annotators must interpret the DNA sequence. (pp. 122-123)
[Contaminated genome sequences]

How much of the genome has been sequenced?
About 95% of the genome has been sequenced in the standard reference genome. The rest is estimated from the size of the gaps giving a total of 3.1 Gb. The complete telomere-to-telomere sequence of T2T-CHM13 is also 3.1 Gb. (pp. 123-125)
[Karen Miga and the telomere-to-telomere consortium] [A complete human genome sequence (2022)] [What do we do with two different human genome reference sequences?] [How big is the human genome (2023)?]
Whose genome was sequenced?
The Celera sequence was mostly Craig Venter's genome. The IHGP standard reference genome was originally a composite of several different individuals from Buffalo (New York, USA). (pp. 125-126)
How many genes?

The original genome sequence predicted 30,000-40,000 protein-coding genes but that number has dropped to about 20,000 in the current standard reference genome. There are about 5,000 noncoding genes but this number is disputed. Introns take up most of a protein-coding gene and introns are mostly junk DNA. (pp. 126-128)
[Are introns mostly junk?]

Pseudogenes
There are about 15,000 pseudogenes derived from protein-coding genes. The number derived from noncoding genes is not known. Pseudogenes account for about 5% of the genome. (p. 128)
Regulatory sequences
If we assume about 200 bp of regulatory sequence for each gene then regulatory sequences account for less than 0.2% of your genome. Many scientists believe this number should be much higher. (pp. 128-129)
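The arithmetic behind this estimate can be reproduced with a minimal sketch. It combines the 200 bp per gene assumed here with the approximate gene counts from "How many genes?" (about 20,000 protein-coding plus about 5,000 noncoding) and the 3.1 Gb genome size. Adding the two gene counts together is an assumption made for this illustration.

```python
GENOME_BP = 3_100_000_000      # ~3.1 Gb total genome size
PROTEIN_CODING_GENES = 20_000  # approximate count from "How many genes?"
NONCODING_GENES = 5_000        # approximate, and disputed
REGULATORY_BP_PER_GENE = 200   # the assumption stated in the text

# Total regulatory sequence if every gene has the same small footprint.
total_genes = PROTEIN_CODING_GENES + NONCODING_GENES
regulatory_bp = total_genes * REGULATORY_BP_PER_GENE
regulatory_pct = 100 * regulatory_bp / GENOME_BP

print(f"{regulatory_bp:,} bp = {regulatory_pct:.2f}% of the genome")
# prints "5,000,000 bp = 0.16% of the genome"
```

The result, about 0.16%, is consistent with the "less than 0.2%" figure. Scientists who expect far more regulatory sequence per gene are in effect arguing for a much larger value of `REGULATORY_BP_PER_GENE`.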
Origins of replication
There are about 30,000-50,000 functioning origins of replication accounting for <0.3% of your genome. (pp. 129-130)
Centromeres
About 1% of your genome is occupied by centromeres. (p. 130)
[Centromere DNA] [Minimum Centromere Size in Plants]
Telomeres
Telomere sequences are about 0.1%. (pp. 130-131)
[Telomeres]
Scaffold Attachment regions (SARs)
SARs are required for chromatin organization and it's not clear how much DNA sequence is required. Assuming 100,000 loops and 100 bp of SAR per loop gives 0.3% of the genome. (p. 131)
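Because the amount of sequence per SAR is uncertain, it can help to see how the estimate scales. The sketch below reproduces the 100,000 loops × 100 bp figure against a 3.1 Gb genome and then tries two other per-loop values. Those alternatives (50 bp and 200 bp) are hypothetical inputs chosen only for illustration. They are not estimates from the book.

```python
def sar_percent(loops, bp_per_loop, genome_bp=3_100_000_000):
    """Percentage of the genome occupied by SARs under the given assumptions."""
    return 100 * loops * bp_per_loop / genome_bp

# The figure used in the text: 100,000 loops with 100 bp of SAR each.
print(f"text assumption: {sar_percent(100_000, 100):.2f}%")  # prints "text assumption: 0.32%"

# Hypothetical alternatives, to show that the estimate scales linearly.
for bp in (50, 200):
    print(f"{bp} bp per loop: {sar_percent(100_000, bp):.2f}%")
```

Even if the per-loop requirement were doubled, SARs would still account for well under 1% of the genome in this model.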
Transposons
About 55% of the genome contains transposon-related and virus-related sequences. They are scattered throughout the genome including within introns. (pp. 131-132)
Viruses
Defective viruses take up about 9% of the genome and functional, dormant, viruses account for less than 0.1%. (p. 132)
Mitochondrial DNA
Less than 0.01% of your genome is occupied by mitochondrial DNA fragments. (p. 132)
How much of our genome is functional?
Adding up all the known functional sequences gives a value of about 4% functional. The actual amount is probably closer to 8-10% based on sequence conservation. The total amount of presumed junk DNA comes to 89%. About 90% of your genome is junk. (pp. 132-133)
[The 20th anniversary of the human genome sequence: 4. Functional DNA in our genome]
What is junk DNA?
Junk DNA is DNA that can be deleted without reducing the fitness of the individual. The debate is not whether junk DNA exists (it does) but over the amount of junk DNA. Opponents of junk DNA think that it would have been eliminated by natural selection if it were really junk. This is a common view in the popular press and even in the scientific literature. My view is that genomes are sloppy and natural selection isn't capable of purging junk DNA in species with large genomes. (pp. 133-135)
[Identifying functional DNA (and junk) by purifying selection]
Notes for Chapter 5 (p. 324)

Monday, August 29, 2022

Chapter 3: Repetitive DNA and Mobile Genetic Elements

Introduction
Half of our genome is composed of highly repetitive DNA and moderately repetitive DNA. Satellite DNA. C0t curves. (pp. 57-58)
[Transcription activity in repeat regions of the human genome]
Centromeres
Centromeres contain highly repetitive DNA. (p. 58)
[The structures of centromeres]
Telomeres

Telomeres at the ends of chromosomes contain repetitive DNA. (pp. 58-59)

   Box: Dead centromeres and telomeres (pp. 59-60)

Short tandem repeats (STRs)

Short tandem repeats (STRs) are short stretches of repetitive DNA. (p. 60)

   Box: DNA fingerprints (pp. 60-61)

Mobile genetic elements
Moderately repetitive DNA consists of interspersed copies of viruses and transposons. (p. 61)
Hidden viruses in your genome
The human genome contains copies of DNA viruses and RNA viruses. Most of them are due to ancient insertions and the viral genomes have acquired inactivating mutations. Many virus-related sequences are just fragments of the original virus genome. (pp. 61-65)
What do we need to know about transposons?
The two main types of transposons are DNA transposons and RNA transposons (retrotransposons). (pp. 65-67)
LINEs and SINEs
Long interspersed elements (LINEs) are transposons that carry a gene for reverse transcriptase. Most LINE-related sequences are degenerate versions of once-active transposons. Short interspersed elements (SINEs) are derived from small noncoding genes and they require exogenous reverse transcriptase to propagate. Alu elements are one example of a SINE and there are more than one million copies in the human genome. (pp. 67-70)
How much of our genome is composed of transposon-related sequences?
Most of the transposon-related sequences are inactive fragments of the original transposons. It's difficult to get a precise estimate of the total amount of transposon-related sequences but it's probably at least 50% of the human genome. (pp. 70-72)

   Box: What does the humped bladderwort tell us about junk DNA? (p. 72)

Selfish genes and selfish DNA
Selfish DNA refers to DNA sequences that can propagate by themselves within the genome. (p. 73)
[Junk DNA and selfish DNA] [The selfish gene vs the lucky allele]
Exaptation versus the post hoc fallacy
Some transposon-related sequences have secondarily acquired a function that contributes to the fitness of the organism. This is an example of exaptation. Some scientists believe that transposon-related sequences are retained in order to serve as a reservoir for future exaptation but this argument is related to a logical fallacy called the post hoc fallacy. (pp. 73-78)
[Peter Larsen: "There is no such thing as 'junk DNA'"]
Mitochondria are invading your genome!
The human genome contains fragments of mitochondrial DNA that have recently been incorporated by accident. (pp. 78-79)
[How much mitochondrial DNA in your genome?]
On the origin of junk DNA

A lot of junk DNA originates from ancient insertions of transposons and their subsequent degeneration by acquiring mutations. (pp. 79-80)

If it walks like a duck ...

Transposons look like junk, behave like junk, and evolve like junk, so let's just call them junk. (pp. 80-81)

Notes for Chapter 3 (pp. 320-321)