Showing posts with label transposons. Show all posts

Friday, February 3, 2023

Chapter 5: The Big Picture

Introduction

DNA sequencing and assembly. Cost of sequencing. (pp. 116-118)

A typical gene

DNA sequences are deposited in GenBank. The gene for triose phosphate isomerase (TPI1) is a typical gene. Decoding a protein-coding gene. (pp. 118-122)

Annotators interpret the genome

Human annotators must interpret the DNA sequence. (pp. 122-123)
[Contaminated genome sequences]

How much of the genome has been sequenced?
About 95% of the genome has been sequenced in the standard reference genome. The rest is estimated from the size of the gaps, giving a total of 3.1 Gb. The complete telomere-to-telomere sequence of T2T-CHM13 is also 3.1 Gb. (pp. 123-125)
[Karen Miga and the telomere-to-telomere consortium] [A complete human genome sequence (2022)] [What do we do with two different human genome reference sequences?] [How big is the human genome (2023)?]
Whose genome was sequenced?
The Celera sequence was mostly Craig Venter's genome. The IHGP standard reference genome was originally a composite of several different individuals from Buffalo (New York, USA). (pp. 125-126)
How many genes?

The original genome sequence predicted 30,000-40,000 protein-coding genes but that number has dropped to about 20,000 in the current standard reference genome. There are about 5,000 noncoding genes but this number is disputed. Introns take up most of a protein-coding gene and introns are mostly junk DNA. (pp. 126-128)
[Are introns mostly junk?]

Pseudogenes
There are about 15,000 pseudogenes derived from protein-coding genes. The number derived from noncoding genes is not known. Pseudogenes account for about 5% of the genome. (p. 128)
Regulatory sequences
If we assume about 200 bp of regulatory sequence for each gene then regulatory sequences account for less than 0.2% of your genome. Many scientists believe this number should be much higher. (pp. 128-129)
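The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the gene count is an assumption (roughly 20,000 protein-coding genes plus roughly 5,000 noncoding genes, the figures given under "How many genes?"), the genome size is the 3.1 Gb quoted above, and the name `percent_of_genome` is made up for this example.

```python
# Rough estimate of the fraction of the genome devoted to regulatory sequences.
# All inputs are approximate figures taken from the chapter summary.
GENOME_SIZE_BP = 3_100_000_000   # 3.1 Gb
PROTEIN_CODING_GENES = 20_000    # approximate, current reference genome
NONCODING_GENES = 5_000          # approximate and disputed
REGULATORY_BP_PER_GENE = 200     # the assumption stated in the text


def percent_of_genome(total_bp, genome_bp=GENOME_SIZE_BP):
    """Return total_bp as a percentage of the genome size."""
    return 100 * total_bp / genome_bp


regulatory_bp = (PROTEIN_CODING_GENES + NONCODING_GENES) * REGULATORY_BP_PER_GENE
regulatory_percent = percent_of_genome(regulatory_bp)

print(f"{regulatory_bp:,} bp = {regulatory_percent:.2f}% of the genome")
```

With these inputs the total is 5 Mb, which is below the 0.2% ceiling. The result scales linearly with the assumed amount of regulatory sequence per gene, so a larger per-gene figure raises the percentage in proportion.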
Origins of replication
There are about 30,000-50,000 functioning origins of replication accounting for <0.3% of your genome. (pp. 129-130)
Centromeres
About 1% of your genome is occupied by centromeres. (p. 130)
[Centromere DNA] [Minimum Centromere Size in Plants]
Telomeres
Telomere sequences are about 0.1%. (pp. 130-131)
[Telomeres]
Scaffold Attachment regions (SARs)
SARs are required for chromatin organization, but it's not clear how much DNA sequence is required. Assuming 100,000 loops and 100 bp of SAR per loop gives 0.3% of the genome. (p. 131)
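The SAR figure is a simple product divided by genome size. A minimal Python sketch, assuming the 3.1 Gb genome size quoted earlier (the variable and function names are invented for this example):

```python
# Rough estimate of the fraction of the genome occupied by scaffold
# attachment regions (SARs), using the assumptions stated in the text.
GENOME_SIZE_BP = 3_100_000_000   # 3.1 Gb
CHROMATIN_LOOPS = 100_000        # assumed number of loops
SAR_BP_PER_LOOP = 100            # assumed SAR sequence per loop


def percent_of_genome(total_bp, genome_bp=GENOME_SIZE_BP):
    """Return total_bp as a percentage of the genome size."""
    return 100 * total_bp / genome_bp


sar_bp = CHROMATIN_LOOPS * SAR_BP_PER_LOOP
sar_percent = percent_of_genome(sar_bp)

print(f"{sar_bp:,} bp = {sar_percent:.1f}% of the genome")
```

Both inputs are rough guesses, so the result is an order-of-magnitude estimate rather than a measurement.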
Transposons
About 55% of the genome consists of transposon-related and virus-related sequences. They are scattered throughout the genome, including within introns. (pp. 131-132)
Viruses
Defective viruses take up about 9% of the genome, and functional but dormant viruses account for less than 0.1%. (p. 132)
Mitochondrial DNA
Less than 0.01% of your genome is occupied by mitochondrial DNA fragments. (p. 132)
How much of our genome is functional?
Adding up all the known functional sequences gives a value of about 4% functional. The actual amount is probably closer to 8-10% based on sequence conservation. The total amount of presumed junk DNA comes to 89%. About 90% of your genome is junk. (pp. 132-133)
[The 20th anniversary of the human genome sequence: 4. Functional DNA in our genome]
What is junk DNA?
Junk DNA is DNA that can be deleted without reducing the fitness of the individual. The debate is not whether junk DNA exists (it does) but over the amount of junk DNA. Opponents of junk DNA think that it would have been eliminated by natural selection if it were really junk. This is a common view in the popular press and even in the scientific literature. My view is that genomes are sloppy and natural selection isn't capable of purging junk DNA in species with large genomes. (pp. 133-135)
[Identifying functional DNA (and junk) by purifying selection]
Notes for Chapter 5 (p. 324)

Monday, August 29, 2022

Chapter 3: Repetitive DNA and Mobile Genetic Elements

Introduction
Half of our genome is composed of highly repetitive DNA and moderately repetitive DNA. Satellite DNA. C0t curves. (pp. 57-58)
[Transcription activity in repeat regions of the human genome]
Centromeres
Centromeres contain highly repetitive DNA. (p. 58)
[The structures of centromeres]
Telomeres

Telomeres at the ends of chromosomes contain repetitive DNA. (pp. 58-59)

   Box: Dead centromeres and telomeres (pp. 59-60)

Short tandem repeats (STRs)

Short tandem repeats (STRs) are short stretches of repetitive DNA. (p. 60)

   Box: DNA fingerprints (pp. 60-61)

Mobile genetic elements
Moderately repetitive DNA consists of interspersed copies of viruses and transposons. (p. 61)
Hidden viruses in your genome
The human genome contains copies of DNA viruses and RNA viruses. Most of them are due to ancient insertions and the viral genomes have acquired inactivating mutations. Many virus-related sequences are just fragments of the original virus genome. (pp. 61-65)
What do we need to know about transposons?
The two main types of transposons are DNA transposons and RNA transposons (retrotransposons). (pp. 65-67)
LINEs and SINEs
Long interspersed elements (LINEs) are transposons that carry a gene for reverse transcriptase. Most LINE-related sequences are degenerate versions of once-active transposons. Short interspersed elements (SINEs) are derived from small noncoding genes and they require exogenous reverse transcriptase to propagate. Alu elements are one example of a SINE and there are more than one million copies in the human genome. (pp. 67-70)
How much of our genome is composed of transposon-related sequences?
Most of the transposon-related sequences are inactive fragments of the original transposons. It's difficult to get a precise estimate of the total amount of transposon-related sequences but it's probably at least 50% of the human genome. (pp. 70-72)

   Box: What does the humped bladderwort tell us about junk DNA? (p. 72)

Selfish genes and selfish DNA
Selfish DNA refers to DNA sequences that can propagate by themselves within the genome. (p. 73)
[Junk DNA and selfish DNA] [The selfish gene vs the lucky allele]
Exaptation versus the post hoc fallacy
Some transposon-related sequences have secondarily acquired a function that contributes to the fitness of the organism. This is an example of exaptation. Some scientists believe that transposon-related sequences are retained in order to serve as a reservoir for future exaptation but this argument is related to a logical fallacy called the post hoc fallacy. (pp. 73-78)
[Peter Larsen: "There is no such thing as 'junk DNA'"]
Mitochondria are invading your genome!
The human genome contains fragments of mitochondrial DNA that have recently been incorporated by accident. (pp. 78-79)
[How much mitochondrial DNA in your genome?]
On the origin of junk DNA

A lot of junk DNA originates from ancient insertions of transposons and their subsequent degeneration by acquiring mutations. (pp. 79-80)

If it walks like a duck ...

Transposons look like junk, behave like junk, and evolve like junk, so let's just call them junk. (pp. 80-81)

Notes for Chapter 3 (pp. 320-321)