Showing posts with label regulatory sequences. Show all posts

Saturday, February 18, 2023

Chapter 10: Turning Genes On and Off

Introduction
Francis Collins, and many others, believe that the concept of junk DNA is outmoded because recent discoveries have shown that most of the human genome is devoted to regulation. This is part of a clash of worldviews where one side sees the genome as analogous to a finely tuned Swiss watch with no room for junk and the other sees the genome as a sloppy entity that's just good enough to survive. (pp. 264-266)
What is regulation?
Regulation refers to gene expression that can be modified according to environmental conditions. (pp. 266-267)
[Protein concentrations in E. coli are mostly controlled at the level of transcription initiation]
Stochastic gene expression
The rate of transcription of a gene can vary from cell to cell due to the stochastic nature of transcription factor binding and the initiation of RNA synthesis. This is not regulation. (pp. 267-268)
What do we know about regulatory sequences?
Most transcription factors bind within a few hundred base pairs of the promoter. With a few exceptions, regulatory sequences are found in close proximity to the 5′ end of the gene. (pp. 268-270)
[Are multiple transcription start sites functional or mistakes?] [How many enhancers in the human genome?] [Are most transcription factor binding sites functional?] [The Encyclopedia of Evolutionary Biology revisits junk DNA]

   Box: The making of a queen by regulating gene expression (p. 270)

Regulation and evolution
One way of resolving the Deflated Ego Problem is to speculate that humans have a much more sophisticated regulatory network than other "lower" species. This is consistent with evo-devo, which postulates that the differences between species are due more to differences in regulating gene expression than to differences in the number of genes. However, most of the results from evo-devo suggest that the differences in regulation are due to very small changes in the binding of existing transcription factors and not huge changes in genome organization. (pp. 271-273)

   Box: Can complex regulation evolve by accident? (pp. 273-274)

Regulating gene expression by rearranging the genome
There are some well-studied examples of regulation that are connected to rearranging the genome by recombination. (pp. 275-277)
Open and closed domains
DNA is more accessible to transcription factors when nucleosomes are loosely organized in an open domain. Gene expression is repressed when the gene is embedded in a highly structured closed domain (heterochromatin). The transition from closed to open domains is coupled to demethylation of DNA and modification of histones. The DNase I sensitivity of DNA in an open domain is correlated with transcription activity. The spontaneous "breathing" of heterochromatic regions allows transcription factors to bind. (pp. 277-279)
[Epigenetic markers in the last 8% of the human genome sequence] [Chromatin organization at promoters in yeast cells]

   Box: X-chromosome inactivation (pp. 279-280)
   [Escape from X chromosome inactivation]

The recruitment model of gene expression
The recruitment model of gene expression says that the binding of transcription factors triggers the demethylation of DNA and the modification of histone proteins to maintain an open domain. The histone code model is often connected to the belief that DNA demethylation and histone modification are the key events in regulation and not epiphenomena. This view is usually associated with a strong belief in the importance of epigenetics. (pp. 281-282)
ENCODE promotes regulation
ENCODE researchers postulate that at least 20% of the genome is required for regulation and there are dozens of transcription factor binding sites for each gene. According to this view, this sophisticated regulation explains why complex humans can exist with the same number of genes as most other species. (pp. 283-285)
[ENCODE's false claims about the number of regulatory sites per gene]
Does regulation explain junk? How can we test the hypothesis?
There are very few known examples of human genes with the complicated regulatory mechanisms promoted by the ENCODE leaders. The few published genomics tests of the hypothesis do not support it. What's missing is the random genome project in order to emphasize the importance of a negative control. (pp. 285-287)
[How much of the human genome is devoted to regulation?]

   Box: A thought experiment (pp. 287-288)

3D chromosomes
One possibility is that a lot of extra DNA is required in humans in order to organize genes into large functional loops of chromatin. This idea has been promoted by several scientists, including Emile Zuckerkandl. (pp. 288-291)
What the heck is epigenetics?
The most useful definition of epigenetics is the Holliday definition that restricts the term to changes that could be inherited by daughter cells following cell division. Proponents of epigenetics claim that chromatin markers can be passed from generation to generation in humans and they determine whether a gene will be expressed or silenced. There is no known mechanism for passing such markers from somatic cells to the germ line. (pp. 292-293)
[What the heck is epigenetics?] [Nessa Carey talks about epigenetics] [What do believers in epigenetics think about junk DNA?]
Restriction/modification and the inheritance of methylated nucleotides
The restriction/modification system in bacteria is a good example of how methylation signals can be passed to daughter cells following cell division but it does not explain epigenetics. There is a lot of hype associated with epigenetics and much of it is unjustified. (pp. 293-296)

Notes for Chapter 10 (pp. 331-333)

Friday, February 3, 2023

Chapter 5: The Big Picture

Introduction

DNA sequencing and assembly. Cost of sequencing. (pp. 116-118)

A typical gene

DNA sequences are deposited in GenBank. The gene for triose phosphate isomerase (TPI1) is a typical gene. Decoding a protein-coding gene. (pp. 118-122)

Annotators interpret the genome

Human annotators must interpret the DNA sequence. (pp. 122-123)
[Contaminated genome sequences]

How much of the genome has been sequenced?
About 95% of the genome has been sequenced in the standard reference genome. The rest is estimated from the size of the gaps, giving a total of 3.1 Gb. The complete telomere-to-telomere sequence of T2T-CHM13 is also 3.1 Gb. (pp. 123-125)
[Karen Miga and the telomere-to-telomere consortium] [A complete human genome sequence (2022)] [What do we do with two different human genome reference sequences?] [How big is the human genome (2023)?]
Whose genome was sequenced?
The Celera sequence was mostly Craig Venter's genome. The IHGP standard reference genome was originally a composite of several different individuals from Buffalo (New York, USA). (pp. 125-126)
How many genes?

The original genome sequence predicted 30,000-40,000 protein-coding genes but that number has dropped to about 20,000 in the current standard reference genome. There are about 5,000 noncoding genes but this number is disputed. Introns take up most of a protein-coding gene and introns are mostly junk DNA. (pp. 126-128)
[Are introns mostly junk?]

Pseudogenes
There are about 15,000 pseudogenes derived from protein-coding genes. The number derived from noncoding genes is not known. Pseudogenes account for about 5% of the genome. (p. 128)
Regulatory sequences
If we assume about 200 bp of regulatory sequence for each gene then regulatory sequences account for less than 0.2% of your genome. Many scientists believe this number should be much higher. (pp. 128-129)
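The arithmetic behind that estimate can be sketched in a few lines. This is only an illustration, not the book's own calculation: the gene count of 25,000 is an assumption that combines the roughly 20,000 protein-coding genes and roughly 5,000 noncoding genes mentioned above, and the variable and function names are made up for this example.

```python
# Rough check of the regulatory-sequence estimate.
GENOME_BP = 3.1e9        # total genome size from the summary above (3.1 Gb)
REG_BP_PER_GENE = 200    # assumed regulatory sequence per gene (from the text)
N_GENES = 25_000         # ASSUMPTION: ~20,000 protein-coding + ~5,000 noncoding genes


def percent_of_genome(total_bp, genome_bp=GENOME_BP):
    """Return total_bp as a percentage of the genome."""
    return 100.0 * total_bp / genome_bp


regulatory_pct = percent_of_genome(N_GENES * REG_BP_PER_GENE)
print(f"Regulatory sequences: {regulatory_pct:.2f}% of the genome")
```

With these inputs the result falls under 0.2%. Doubling the assumed gene count or the assumed amount of regulatory sequence per gene would still leave the total well below 1%.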
Origins of replication
There are about 30,000-50,000 functioning origins of replication accounting for <0.3% of your genome. (pp. 129-130)
Centromeres
About 1% of your genome is occupied by centromeres. (p. 130)
[Centromere DNA] [Minimum Centromere Size in Plants]
Telomeres
Telomere sequences are about 0.1%. (pp. 130-131)
[Telomeres]
Scaffold attachment regions (SARs)
SARs are required for chromatin organization and it's not clear how much DNA sequence is required. Assuming 100,000 loops and 100 bp of SAR per loop gives 0.3% of the genome. (p. 131)
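The SAR figure follows from a simple multiplication, sketched here using the loop count and SAR length stated above and the 3.1 Gb genome size given earlier. The variable names are only illustrative.

```python
# Rough check of the SAR estimate.
GENOME_BP = 3.1e9       # total genome size (3.1 Gb)
N_LOOPS = 100_000       # assumed number of chromatin loops (from the text)
SAR_BP_PER_LOOP = 100   # assumed SAR sequence per loop (from the text)

sar_bp = N_LOOPS * SAR_BP_PER_LOOP    # 10 million bp in total
sar_pct = 100.0 * sar_bp / GENOME_BP  # share of the genome, in percent
print(f"SARs: {sar_bp / 1e6:.0f} Mb, about {sar_pct:.1f}% of the genome")
```

Because both inputs are rough guesses, the result is best read as an order-of-magnitude estimate.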
Transposons
About 55% of the genome contains transposon-related and virus-related sequences. They are scattered throughout the genome including within introns. (pp. 131-132)
Viruses
Defective viruses take up about 9% of the genome and functional, dormant, viruses account for less than 0.1%. (p. 132)
Mitochondrial DNA
Less than 0.01% of your genome is occupied by mitochondrial DNA fragments. (p. 132)
How much of our genome is functional?
Adding up all the known functional sequences gives a value of about 4% functional. The actual amount is probably closer to 8-10% based on sequence conservation. The total amount of presumed junk DNA comes to 89%. About 90% of your genome is junk. (pp. 132-133)
[The 20th anniversary of the human genome sequence: 4. Functional DNA in our genome]
What is junk DNA?
Junk DNA is DNA that can be deleted without reducing the fitness of the individual. The debate is not whether junk DNA exists (it does) but over the amount of junk DNA. Opponents of junk DNA think that it would have been eliminated by natural selection if it were really junk. This is a common view in the popular press and even in the scientific literature. My view is that genomes are sloppy and natural selection isn't capable of purging junk DNA in species with large genomes. (pp. 133-135)
[Identifying functional DNA (and junk) by purifying selection]
Notes for Chapter 5 (p. 324)