
Tuesday, May 2, 2023

Chapter 11: Zen and the Art of Coping with a Sloppy Genome

Introduction
The title comes from Zen and the Art of Motorcycle Maintenance, one of the most popular philosophy books of all time. The theme of my book is very different; it's about the idea that life at the molecular level is very messy and error-prone and looks nothing like a well-constructed Swiss watch. The author of Zen writes a number of short essays called Chautauquas and I'm going to close this book with a few of my own. (pp. 297-298)
The limitations of genomics
Genomics focuses on a global analysis of the entire genome rather than on specific genes. Genomic studies collect large amounts of data that may be useful in uncovering new features and in forming new hypotheses but those hypotheses still need to be tested at the level of individual genes. Genomics workers often believe they have discovered novel features of the genome that overthrow old ideas—features such as tens of thousands of noncoding genes, abundant alternative splicing, and huge amounts of regulatory sequence—but they have only discovered data that may or may not point to new features pending closer analysis. (pp. 299-302)
The function wars
The ENCODE publicity campaign kicked off an extended discussion about the meaning of the word function, a discussion that Alex Palazzo calls the 'function wars.' The new function wars drew in philosophers who have been debating the meaning of function for many decades. The wars are over, and the most reasonable definition of molecular function is the maintenance definition, which describes functional DNA as DNA that is currently maintained by natural selection (purifying selection). (pp. 302-307)
[ENCODE and their current definition of "function"] [Identifying functional DNA (and junk) by purifying selection] [The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect] [The Function Wars Part VIII: Selected effect function and de novo genes] [The Function Wars Part VII: Function monism vs function pluralism] [The function wars are over] [Philosophers talking about genes] [When philosophers write about evolution] [When philosophers talk about genomes]

Scientific revolutions
The scientific literature and the popular press are full of reports of scientific revolutions that have just overthrown some old paradigm causing the textbooks to be rewritten. That's not how science really works. Most scientific revolutions develop slowly over a period of many years as more and more data causes us to revise our old ideas. Many of the so-called revolutions reported in the popular press are actually paradigm shafts, not paradigm shifts. The concept of junk DNA refers to a real revolution in our thinking about genomes. It developed over many years in the 1960s and 70s but it failed to convince most biologists. All of the announcements about disproving junk DNA are fake revolutions and paradigm shafts. (pp. 307-311)
[ENCODE and their current definition of "function"] [University press releases are a major source of science misinformation] [Press release from the Francis Crick Institute misrepresents junk DNA]
No comfort for Intelligent Design Creationists
Intelligent Design Creationists have been predicting for years that most of our genome would turn out to be functional. They interpret recent results to be a vindication of their prediction. I hope I've demonstrated that they are wrong. (pp. 311-312)
[Religion vs science (junk DNA): a blast from the past] [Stephen Meyer "predicts" there's no junk DNA] [Do Intelligent Design Creationists still think junk DNA refutes ID?] [You need to understand biology if you are going to debate an Intelligent Design Creationist]
Scientific controversies
There is a genuine controversy over the amount of junk DNA in the human genome, but the existence of this controversy is hidden from the general public because most scientists ignore it. Why do most scientists refuse to even consider the idea that our genome could be full of junk? I outline four reasons for this behavior. (pp. 312-315)
[Scientists say "sloppy science" more serious than fraud]
Coping with a sloppy genome
I hope I've convinced you that it's possible to live with the idea that 90% of our genome is junk. (p. 315)

Notes for Chapter 11 (pp. 333-334)

References (pp. 335-358)

Index (pp. 359-372)

Saturday, February 18, 2023

Chapter 10: Turning Genes On and Off

Introduction
Francis Collins, and many others, believe that the concept of junk DNA is outmoded because recent discoveries have shown that most of the human genome is devoted to regulation. This is part of a clash of worldviews where one side sees the genome as analogous to a finely tuned Swiss watch with no room for junk and the other sees the genome as a sloppy entity that's just good enough to survive. (pp. 264-266)
What is regulation?
Regulation refers to gene expression that can be modified according to environmental conditions. (pp. 266-267)
[Protein concentrations in E. coli are mostly controlled at the level of transcription initiation]
Stochastic gene expression
The rate of transcription of a gene can vary from cell to cell due to the stochastic nature of transcription factor binding and the initiation of RNA synthesis. This is not regulation. (pp. 267-268)
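To illustrate why cell-to-cell variation is not evidence of regulation, here is a minimal sketch (not from the book) of a standard birth-death model of mRNA numbers, simulated with the Gillespie algorithm. Every simulated cell has exactly the same, constant transcription rate, so nothing is being regulated, yet mRNA counts still differ from cell to cell. The rate values `k_tx` and `k_deg` are arbitrary illustrative assumptions, not measured numbers.

```python
import random


def simulate_mrna(k_tx: float, k_deg: float, t_end: float, rng: random.Random) -> int:
    """Gillespie simulation of a simple birth-death model in one cell.

    k_tx: constant transcription rate (new mRNAs per unit time); no regulation.
    k_deg: degradation rate per mRNA molecule.
    Returns the number of mRNA molecules present at time t_end.
    """
    t, m = 0.0, 0
    while True:
        total = k_tx + k_deg * m           # combined rate of all possible events
        t += rng.expovariate(total)        # random waiting time to the next event
        if t > t_end:
            return m
        if rng.random() * total < k_tx:    # pick which event happened
            m += 1                         # a transcription event
        else:
            m -= 1                         # a degradation event


# 1,000 genetically identical cells with identical (hypothetical) rate constants.
rng = random.Random(42)
counts = [simulate_mrna(k_tx=10.0, k_deg=1.0, t_end=20.0, rng=rng) for _ in range(1000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(f"mean={mean:.2f} variance={var:.2f} min={min(counts)} max={max(counts)}")
```

In this simple model the steady-state count is Poisson-distributed with mean `k_tx / k_deg`, so the variance is roughly equal to the mean. The spread comes entirely from the randomness of individual initiation and degradation events.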
What do we know about regulatory sequences?
Most transcription factors bind within a few hundred base pairs of the promoter. With a few exceptions, regulatory sequences are found in close proximity to the 5′ end of the gene. (pp. 268-270)
[Are multiple transcription start sites functional or mistakes?] [How many enhancers in the human genome?] [Are most transcription factor binding sites functional?] [The Encyclopedia of Evolutionary Biology revisits junk DNA]

   Box: The making of a queen by regulating gene expression (p. 270)

Regulation and evolution
One way of resolving the Deflated Ego Problem is to speculate that humans have a much more sophisticated regulatory network than other "lower" species. This is consistent with evo-devo, which postulates that the differences between species are due more to differences in the regulation of gene expression than to differences in the number of genes. However, most of the results from evo-devo suggest that the differences in regulation are due to very small changes in the binding of existing transcription factors and not to huge changes in genome organization. (pp. 271-273)

   Box: Can complex regulation evolve by accident? (pp. 273-274)

Regulating gene expression by rearranging the genome
There are some well-studied examples of regulation that are connected to rearranging the genome by recombination. (pp. 275-277)
Open and closed domains
DNA is more accessible to transcription factors when nucleosomes are loosely organized in an open domain. Gene expression is repressed when the gene is embedded in a highly structured closed domain (heterochromatin). The transition from closed to open domains is coupled to demethylation of DNA and modification of histones. The DNase I sensitivity of DNA in an open domain is correlated with transcriptional activity. The spontaneous "breathing" of heterochromatic regions allows transcription factors to bind. (pp. 277-279)
[Epigenetic markers in the last 8% of the human genome sequence] [Chromatin organization at promoters in yeast cells]

   Box: X-chromosome inactivation (pp. 279-280)
   [Escape from X chromosome inactivation]

The recruitment model of gene expression
The recruitment model of gene expression says that the binding of transcription factors triggers the demethylation of DNA and the modification of histone proteins to maintain an open domain. The histone code model is often connected to the belief that DNA demethylation and histone modification are the key events in regulation and not epiphenomena. This view is usually associated with a strong belief in the importance of epigenetics. (pp. 281-282)
ENCODE promotes regulation
ENCODE researchers postulate that at least 20% of the genome is required for regulation and that there are dozens of transcription factor binding sites for each gene. According to this view, this sophisticated regulation explains why complex humans can exist with the same number of genes as most other species. (pp. 283-285)
[ENCODE's false claims about the number of regulatory sites per gene]
Does regulation explain junk? How can we test the hypothesis?
There are very few known examples of human genes with the complicated regulatory mechanisms promoted by the ENCODE leaders. The few published genomics tests of the hypothesis do not support it. What's missing is a negative control such as the random genome project. (pp. 285-287)
[How much of the human genome is devoted to regulation?]

   Box: A thought experiment (pp. 287-288)

3D chromosomes
One possibility is that a lot of extra DNA is required in humans in order to organize genes into large functional loops of chromatin. This idea has been promoted by several scientists, including Emile Zuckerkandl. (pp. 288-291)
What the heck is epigenetics?
The most useful definition of epigenetics is the Holliday definition that restricts the term to changes that could be inherited by daughter cells following cell division. Proponents of epigenetics claim that chromatin markers can be passed from generation to generation in humans and they determine whether a gene will be expressed or silenced. There is no known mechanism for passing such markers from somatic cells to the germ line. (pp. 292-293)
[What the heck is epigenetics?] [Nessa Carey talks about epigenetics] [What do believers in epigenetics think about junk DNA?]
Restriction/modification and the inheritance of methylated nucleotides
The restriction/modification system in bacteria is a good example of how methylation signals can be passed to daughter cells following cell division, but it does not explain epigenetics. There is a lot of hype associated with epigenetics and much of it is unjustified. (pp. 293-296)

Notes for Chapter 10 (pp. 331-333)

Chapter 9: The ENCODE Publicity Campaign

Introduction
On Sept. 5, 2012, Nature published a number of papers by the ENCODE Consortium. (The papers were rejected by Science.) The main summary paper announced that 80% of the human genome has a function, and many of the ENCODE leaders pronounced the death of junk DNA. (pp. 238-241)
[The 10th anniversary of the ENCODE publicity campaign fiasco]
ENCODE results
The main results were that the human genome has 20,687 protein-coding genes and 18,451 noncoding genes. About 62% of the genome is transcribed. There are 636,336 binding sites for the 120 transcription factors they examined, and these cover 231 million bp or 8.1% of the genome. The researchers identified more than 5 million open chromatin domains accounting for about 40% of the genome. If you add up all the biochemically active DNA, it comes to 80.4% of the genome. (pp. 241-244)
The ENCODE publicity campaign
The papers that appeared in the Sept. 5th edition of Nature were accompanied by a massive publicity campaign organized by the editors at Nature. There were press releases from the universities and government research centers that were involved in the project. The main message was that 80% of the genome is functional and the idea of junk DNA has been refuted. The de facto ENCODE leader, Ewan Birney, was hailed as a "Big Talker." (pp. 244-246)
[The ENCODE publicity campaign of 2007]
Criticisms of ENCODE
The blogosphere, Twitter, and Facebook erupted immediately with posts criticising ENCODE for misleading the public about the meaning of function and pointing out that junk DNA is alive and well. Brendan Maher, the feature editor for Nature, realized the next day (Sept. 6) that they had a problem, and he announced that the main purpose of the publicity campaign was to create "the biggest splash possible" to promote results that usually don't get much attention in the popular press. He conceded that the claim of 80% functional might have been an exaggeration. Over the next couple of years, a number of papers critical of the ENCODE claim were published in the scientific literature. I have never seen such strong and rapid criticism of papers published by leading scientists in a well-respected journal like Nature. (pp. 247-254)
Science journal doubles down
In December 2012, Science listed the ENCODE results as one of the breakthroughs of the year. Although it acknowledged the controversy, it still reported that 80% of the human genome is functional. (pp. 254-255)
ENCODE backpedals
In 2014, the ENCODE researchers partially retracted their claims about function and announced that the main purpose of ENCODE is to map all the spurious transcripts and spurious transcription factor binding sites in order to provide a resource for the community of scientists. (pp. 255-260)
[ENCODE and their current definition of "function"] [The Function Wars Part XII: Revising history and defending ENCODE] [Manolis Kellis dismisses junk DNA] [What did ENCODE researchers say on Reddit?] [Tim Minchin's "Storm," the animated movie, and another no-so-good Minchin cartoon]
ENCODE III
ENCODE III said in 2020 that there are 20,225 protein-coding genes and 37,595 noncoding genes. There are now 2,157,387 open chromatin domains and 1,224,154 transcription factor binding sites. ENCODE III made no claims about function. (pp. 260-261)
What went wrong?
ENCODE failed to consider the null hypothesis of no function. The researchers failed to acknowledge the criticisms of their claims back in 2007, and they failed to take into account alternative explanations of their data. This is not how science is supposed to work. (pp. 261-263)
[The 20th anniversary of the human genome sequence: 6. Nature doubles down on ENCODE results] [Style vs substance in science communication: The role of science writers in major science journals]

Notes for Chapter 9 (pp. 330-331)

Saturday, February 4, 2023

Chapter 7: Gene Families and the Birth and Death of Genes

Introduction

The histone gene family. Definition of gene family. Pseudogenes. (pp. 170-171)

The birth and death of genes

As genomes evolve, new genes are born and old genes die. "Birth & death evolution" was mainly developed and promoted by Masatoshi Nei beginning in the early 1970s. Many new genes arise by gene duplication, but most of them become pseudogenes within a few million years. Some evolve new functions by subfunctionalization or neofunctionalization. (pp. 172-174)
[On the evolution of duplicated genes: subfunctionalization vs neofunctionalization]

   Box: The smell of sweat (pp. 174-175)

Gene duplication and mutationism

Gene duplication is due mostly to errors in recombination. This is a subset of segmental duplication, and it leads to genome expansion. The creation of new genes by mutation is a key aspect of mutationism. (pp. 175-177)
[Mutation, Randomness, & Evolution] [Replaying life's tape] [What is "structuralism"?] [Reactionary fringe meets mutation-biased adaptation: Introduction]

Whole genome duplications and the fate of genes

Polyploidization and hybridization give rise to species with twice as much DNA. The fate of that extra DNA, especially extra genes, can be tracked over time. It looks like the extra DNA is another example of junk DNA, lending support to the idea that species can tolerate large amounts of nonfunctional DNA. (pp. 177-179)
[The birth and death of salmon genes] [Birth and death of genes in a hybrid frog genome]

   Box: Real orphans in the human genome

Completely new genes (de novo genes) are rare, but there are genuine examples of genes that are unique in the human genome (ORFans). They arise by gene duplication and they are often polymorphic. (p. 180)

Different kinds of pseudogenes

There are four different kinds of pseudogenes: death of a duplicated gene, processed, unitary, and polymorphic. The human genome has about 15,000 pseudogenes (5% of the genome) and almost all of them are junk. The fixation of a pseudogene involves two steps: mutation and fixation by random genetic drift. Pseudogenes can become unrecognizable after 100 million years. (pp. 181-184)
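The second step, fixation by random genetic drift, can be illustrated with a minimal sketch of the neutral Wright-Fisher model, a standard population-genetics model (the sketch is not taken from the book). A single new inactivating mutation is treated as selectively neutral, and each generation is a random binomial sample of the previous one. The population size `N` and the number of trials are hypothetical values chosen only to keep the run short. The classic expectation for a new neutral allele in a diploid population is a fixation probability of 1/(2N).

```python
import random


def fixation_fraction(pop_size: int, trials: int, rng: random.Random) -> float:
    """Fraction of new neutral alleles that drift all the way to fixation.

    Each trial starts with one copy of a new allele (e.g. a hypothetical
    inactivating mutation) among 2N gene copies in a diploid Wright-Fisher
    population, then resamples every generation with no selection until
    the allele is either lost or fixed.
    """
    copies = 2 * pop_size
    fixed = 0
    for _ in range(trials):
        count = 1  # one new mutant copy
        while 0 < count < copies:
            p = count / copies
            # Binomial sampling: each gene copy in the next generation
            # independently inherits the mutant allele with probability p.
            count = sum(1 for _ in range(copies) if rng.random() < p)
        if count == copies:
            fixed += 1
    return fixed / trials


rng = random.Random(1)
N = 25  # hypothetical (very small) diploid population size
observed = fixation_fraction(N, trials=10_000, rng=rng)
print(f"observed={observed:.4f} expected 1/(2N)={1 / (2 * N):.4f}")
```

Most new mutations are lost within a few generations. The few that fix do so purely by chance, which is the sense in which a broken gene can become a fixed pseudogene without any help from selection.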
[Is the high frequency of blood type O in native Americans due to random genetic drift?]

   Box: Conserved pseudogenes and Ken Miller's argument against intelligent design

The presence of a conserved pseudogene in the beta globin gene cluster in chimpanzee and human genomes is difficult to explain by intelligent design. The fact that a small segment of the beta-globin pseudogene contains a SAR sequence is irrelevant to the main argument. (pp. 185-186)

Are they really pseudogenes?

Pseudogenes are broken genes and they are junk by any reasonable definition (see "If It Walks Like a Duck" in chapter 3). Some scientists who are opposed to junk DNA have claimed that most pseudogenes must be functional based on the fact that a tiny number have secondarily acquired a function. This is an example of cherry picking. (pp. 186-188)
[Are pseudogenes really pseudogenes?]

   Box: The short legs of dachshunds (pp. 188-189)

How accurate is the genome sequence?

The accuracy of DNA sequencing methods is approaching 99.99%. If that is coupled to 30x coverage, the overall accuracy is good enough to reliably distinguish between functional genes and pseudogenes. You also need a reliable sequence of your personal genome if you are going to make decisions about your health. (pp. 189-191)
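A rough sense of why 30x coverage helps comes from a simple binomial calculation. This is a minimal sketch under idealized assumptions that are not from the book: each read is independently wrong at a site with probability 0.0001 (99.99% accuracy), and the consensus is a simple majority vote. It counts a site as miscalled whenever more than half the reads are wrong, which is pessimistic because those erroneous reads would also have to agree on the same wrong base. Real sequencing errors are not fully independent (systematic errors, repeats, mapping problems), so real-world accuracy is lower than this idealized figure. The genome size used is an approximate value.

```python
from math import comb


def consensus_error(per_read_error: float, depth: int) -> float:
    """Probability that a majority vote over `depth` reads is wrong at one site.

    Assumes independent read errors (an idealization) and treats a site as
    miscalled when more than half of the reads are in error.
    """
    need = depth // 2 + 1  # number of wrong reads needed to outvote the correct ones
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(need, depth + 1)
    )


p = consensus_error(1e-4, 30)
genome_size = 3_100_000_000  # approximate size of the human genome in bp
print(f"per-site error ~ {p:.1e}; expected miscalls per genome ~ {p * genome_size:.1e}")
```

Under these assumptions, random per-read errors essentially vanish from the consensus, which is why the practical limits on accuracy come from systematic errors rather than from the per-read error rate.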

Notes for Chapter 7 (pp. 327-328)