Perhaps you will be astonished by parasitic DNA. It is pretty astonishing.
EDIT: ah, right: the fun video that introduced me to this.
The proposition that DNA can be parasitic is fairly bulletproof, but it’s essentially impossible to prove any given piece of DNA nonfunctional—you can’t test it over an evolutionary timescale under all relevant conditions. Selfish DNA elements very frequently get incorporated into regulatory networks, and in fact are a major driving force behind evolution, particularly in animals, where the important differences are mostly in regulatory DNA.
Actually, we can guess that a piece of DNA is nonfunctional if it seems to have undergone neutral evolution (roughly, accumulation of functionally equivalent mutations) at a rate which implies that it was not subject to any noticeable selective constraint (purifying selection) over evolutionary time. Leaving aside transposons, repetition, and so on, that’s a main part of how we know that large amounts of junk DNA really are junk.
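To make the rate argument concrete, here is a minimal sketch of that comparison in Python; the counts and the neutral reference are invented for illustration, not taken from any real alignment.

```python
# Sketch of the "does this region evolve at the neutral rate?" reasoning.
# All counts are invented; a real analysis would use an alignment between
# two species plus a calibrated neutral reference (e.g. ancestral repeats
# or fourfold-degenerate sites).

def substitution_rate(substitutions: int, aligned_sites: int) -> float:
    """Fraction of aligned sites that differ between the two species."""
    return substitutions / aligned_sites

neutral_rate = substitution_rate(4_000, 10_000)   # presumed-neutral reference
candidate_rate = substitution_rate(390, 1_000)    # region of unknown function

# Constraint = how much more slowly the candidate evolves than neutral DNA.
constraint = 1 - candidate_rate / neutral_rate
print(f"neutral reference : {neutral_rate:.2f} substitutions per site")
print(f"candidate region  : {candidate_rate:.2f} substitutions per site")
print(f"estimated constraint: {constraint:.0%}")
# A candidate rate close to the neutral rate (constraint near 0%) is what
# "seems to have undergone neutral evolution" means here.
```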
There are pieces of DNA that preserve function but undergo apparently neutral evolution. A recent Nature article found a non-protein-coding piece of DNA that is necessary for development (by being transcribed into RNA), which had undergone close to neutral evolution from zebrafish to human yet maintained functional conservation. That is, taking the human transcript and inserting it into zebrafish spares it from death, indicating that (almost) completely different DNA performs the same function, and that simple conservation-based tests (looking for departures from neutral evolution) probably can’t detect it.
I’m having trouble working out the experimental conditions here. I take it they replaced a sequence of zebrafish DNA with its human equivalent, which seemed to have been undergoing nearly neutral selection, and didn’t observe developmental defects. But what was the condition where they did observe defects? If they just removed that section of DNA, that could suggest that some sequence is needed there but its contents are irrelevant. If they replaced it with a completely different section of DNA, that seems like it would be a lot more surprising.
You are correct—given the information above, it is possible (though unlikely) that the DNA was just there as a spacer between two other things and its content was irrelevant. However, the study controlled for this—they also mutated the zebrafish DNA in specific places and were able to induce the same defects as with the deletion.
What’s happening here is that the DNA is transcribed into non-protein-coding RNA. This RNA’s function and behavior will be determined by, but impossible to predict from, its sequence—you’re dealing not only with the physical process of molecular folding, which is intractable, but with its interactions with everything else in the cell, which is intractability squared. So there is content there, but it’s unreadable to us and thus appears unconstrained. If we had a very large quantum computer we could perhaps find the 3D structure “encoded” by it and its interaction partners, and would see the conservation of this 3D structure from fish to human.
That’s interesting. I guess my next question is, how confident are we that this sequence has been undergoing close-to-neutral selection?
I ask because if it has been undergoing close-to-neutral selection, that implies that almost all possible mutations in that region are fitness-neutral. (Which is why my thoughts turned to “something is necessary, but it doesn’t matter what”. When you call that unlikely, is that because there’s no known mechanism for it, or you just don’t think there was sufficient evidence for the hypothesis, or something else?) But… according to this study they’re not, which leaves me very confused. This doesn’t even feel like I just don’t know enough, it feels like something I think I know is wrong.
There is no truly “neutral” evolution, as all DNA sequences are subject to several constraints, such as maintaining GC content and preventing promoters from popping up needlessly. There is also large variability in mutation rates along different DNA regions. Together, this results in high variance in the “neutral” mutation rate, and given the huge genome it is (probably) impossible to detect even regions evolving at a quarter of the neutral mutation rate. I think this is the case here.
This extends what zslastsman wrote regarding structure.
We can’t be totally confident. I’d guess that if you did a sensitive test of fitness (you’d need a big fish tank and a lot of time) you’d find the human sequence didn’t rescue the deletion perfectly. They’ve done this recently in C. elegans—looking at long-term survival at the population level—and they find a huge number of apparently harmless mutations are very detrimental at the population level.
The reason I’d say it was unlikely is just that spacers of that kind aren’t common (I don’t know of any that aren’t inside genes). If there were two sequences on either side that needed to bend around to each other to make contact, it could be plausible, but since they selected by epigenetic marks, rather than sequence conservation, it would be odd and novel if they’d managed to perfectly delete such a spacer (actually, that would be very interesting in itself).
I think you are being confused by two things:
1) The mutation I said they made was deliberately targeted to a splice site; splice sites are constrained (though you can’t use them to identify functional sequences, because they are very small and so occur randomly outside functional sequence all the time).
2) You are thinking too simplistically about sequence constraint. RNA folds by wrapping up and forming helices with itself, so the effect of a mutation depends on the rest of the sequence. Each mutation releases constraint on some base pairs and introduces it on others. So as this sequence wanders through sequence space, it does so in a way that preserves relationships, not absolute sequence. From its current position in sequence space, many mutations would be detrimental, but those residues may get the chance to mutate later on, when other residues have relieved them. This applies to proteins as well, by the way: proteins are far more conserved in 3D shape than in linear sequence.
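To make the “relationships, not absolute sequence” point concrete, here is a toy check (sequences invented for illustration) showing that two quite different stems can still base-pair perfectly, because each change on one strand is matched by a compensatory change on the other.

```python
# Toy illustration: RNA structural constraint preserves base *pairing*,
# not the bases themselves.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def stem_pairs_ok(five_prime: str, three_prime: str) -> bool:
    """Check that the 5' arm pairs with the reversed 3' arm, position by
    position (Watson-Crick pairs plus G-U wobble)."""
    return all((a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime)))

print(stem_pairs_ok("GGCAU", "AUGCC"))  # "ancestral" stem: True
# "Derived" stem: 2 of 5 bases on each arm have changed, but every change on
# one arm is compensated on the other, so the helix still forms.
print(stem_pairs_ok("AGUAU", "AUACU"))  # True
```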
The DNA in the zebrafish was deleted, and the human version was inserted later, without affecting the main DNA (probably using a “plasmid”). Without the human DNA “insert”, there was a developmental defect. With either the human DNA insert or the original zebrafish DNA (as an insert), there was no developmental defect, leading to the conclusion that the human version is functionally equivalent to the zebrafish version.
How do we know whether replacing the insert with a random sequence of base pairs of the same length would also avoid the developmental defect?
There are several complications addressed in the article, which I did not describe. Anyway, using a “control vector” is considered trivial, and I believe they checked this.
That’s true of protein-coding sequence, but things are a little bit more difficult for regulatory DNA because
1) Regulatory DNA is under MUCH less sequence constraint—the relevant binding proteins are not individually fussy about their binding sites
2) Regulatory networks have a lot of redundancy
3) Regulatory mutations can be much more easily compensated for by other mutations—because we’re dealing with analog networks, rather than strings of amino acids.
Regulatory evolution is an immature field, but it seems that an awful lot of change can occur in a short time. The literature is full of sequences that have an experimentally provable activity (put them on a plasmid with a reporter gene and off it goes) and yet show no conservation between species. There’s probably a lot more functional sequence that won’t just work on its own on a plasmid, or show a noticeable effect from knockouts. It may be that regulatory networks are composed of a continuous distribution, from a few constrained elements with strong effects down to lots of unconstrained weak ones. The latter will be very, very difficult to distinguish from junk DNA.
Data with lots of redundancy does, in a certain sense, contain a lot of junk. Junk that, although it helps reliably transmit the data, doesn’t change the meaning of the data (or doesn’t change it by much).
Yeah. What’s relevant to this discussion is complexity, not number of base pairs.
This actually isn’t necessarily true: apparently neutral evolution doesn’t always mean a sequence is nonfunctional. If there is a section of the genome A that needs to act on another section of the genome C, with section B in between, and A needs to act on C at a precise (or relatively precise) genomic distance, then B can evolve neutrally even though it’s still necessary for the action of A on C, since it provides the spacing.
Thus, serving a purely structural function.
In that case the complexity in bits of B, for length N, becomes log2(N) instead of 2*N. It’s not quite 0, but it’s a lot closer.
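A quick back-of-the-envelope version of that, assuming 2 bits per fully specified base pair and that a pure spacer only needs its length specified:

```python
import math

N = 10_000  # hypothetical spacer length in base pairs

fully_specified_bits = 2 * N        # 4 possible bases -> 2 bits per base pair
length_only_bits = math.log2(N)     # a pure spacer: only its length matters

print(f"fully specified sequence: {fully_specified_bits} bits")   # 20000 bits
print(f"length-only spacer      : {length_only_bits:.1f} bits")   # ~13.3 bits
```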
The only definitively nonfunctional DNA is that which has been deleted. “Nonfunctional DNA” is temporarily inactive legacy code which may at any time be restored to an active role.
In the context “how complicated is a human brain?”, DNA which is currently inactive does not count towards the answer.
That said (by which I mean “what follows doesn’t seem relevant now that I’ve realised the above, but I already wrote it”),
Is inactive DNA more likely to be restored to an active role than to get deleted? I’m not sure it makes sense to consider it functional just because it might start doing something again. When you delete a file from your hard disk, it could theoretically be restored until the disk space is actually repurposed; but if you actually wanted the file around, you just wouldn’t have deleted it. That’s not a great analogy, but...
My gut says that any large section of inactive DNA is more likely to become corrupted than to become reactivated. A corruption is pretty much any mutation in that section, whereas I imagine reactivating it would require one of a small number of specific mutations.
Counterpoint: a corruption has only a small probability of becoming fixed in the population; if reactivation is helpful, that still only has a small probability of becoming fixed, but it’s a much higher small probability.
Counter-counterpoint: no particular corruption would need to be fixed in the whole population. If there are several corruptions at independent 10% penetration each, a reactivating mutation will have a hard time becoming fixed.
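To put rough numbers on “a much higher small probability”, here is a toy Wright-Fisher simulation (population size and selection coefficient are invented); the textbook expectations are about 1/N for a new neutral mutation in a haploid population of size N, and roughly 2s for a mildly beneficial one.

```python
import random

def fixation_probability(pop_size: int, s: float, trials: int = 10_000) -> float:
    """Fraction of trials in which one new mutant copy reaches fixation in a
    haploid Wright-Fisher population (mutant copies weighted by 1 + s)."""
    fixed = 0
    for _ in range(trials):
        count = 1  # one new mutant copy
        while 0 < count < pop_size:
            p = count * (1 + s) / (count * (1 + s) + (pop_size - count))
            count = sum(random.random() < p for _ in range(pop_size))
        fixed += count == pop_size
    return fixed / trials

N = 100
print("neutral mutation  :", fixation_probability(N, 0.00))  # ~ 1/N  = 0.01
print("beneficial (s=5%) :", fixation_probability(N, 0.05))  # ~ 2s  ~= 0.10
```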
Here’s the concept I wanted: evolutionary capacitance.
Yeah, the most common protein (fragment) in the human genome is reverse transcriptase, which is only used by viruses (as far as we know). It just gets in there from old (generationally speaking) virus infections. But I’d still be surprised if we haven’t figured out some way to use those fragments left in there.
Why should we? It’s like postulating a human evolutionary use for acne—the zits don’t have to be useful for us, they’re already plenty useful for the bacterium that makes them happen.
Do you mean in the sense that we’ve adapted to having all this junk in our DNA, and would now get sick if it was all removed? That’s possible (though onions seem to be fine with more/less of it).
Well, it takes something like 8 ATP per base pair to replicate DNA, so that’s a pretty hefty metabolic load. Which means that, on average, junk DNA needs to offset that selection pressure somehow. The viruses in our lab will splice out a gene that doesn’t give a benefit in maybe around 5 generations? Humans are much better at accurate replication, but I’d still think useless DNA would be lost fairly quickly.
I read that and thought: how much is that?
ATP may release roughly 14 kcal/mol; the actual amount varies with local conditions (temperature, pressure) and chemical concentrations. An adult human body contains very roughly 50 trillion cells. However, different cells divide at very different rates. I tried to find data estimating total divisions in the body; this Wikipedia article says 10,000 trillion divisions per human lifetime. (Let one lifetime = 80 years ≈ 2.52e9 seconds.)
Now, what is a trillion? I shall assume the short scale, trillion = 1e12, and weep at the state of popular scientific literature that counts in “thousands of trillions” instead of actual numbers. This means 10000e12=1e16 cell divisions per lifetime.
We then get 14e3 / 6.022e23 (Avogadro’s constant) = 2.325e-20 calories per extra base pair replication; and 1e16 / 2.52e9 = 3.968e6 cell divisions per second on average. So an extra base pair in all of your cells costs 9.23e-14 calories per second. Note those are actual calories, not the kilocalories sometimes called “calories” marked on food. Over your lifetime an extra base pair would cost 2.325e-4 calories. That’s about 0.00000023 kilocalories in your lifetime. You probably can’t make a voluntary muscle movement so small that it wouldn’t burn orders of magnitude more energy.
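Here is the same arithmetic as a small Python sketch; for good measure it also folds in the roughly 8 ATP per base pair quoted upthread, which nudges the per-base-pair numbers up by about a factor of eight without changing the conclusion.

```python
AVOGADRO = 6.022e23                  # molecules per mole
ATP_CAL_PER_MOL = 14e3               # ~14 kcal/mol = 14,000 cal/mol (condition-dependent)
ATP_PER_BP = 8                       # rough figure quoted upthread
DIVISIONS_PER_LIFETIME = 1e16
LIFETIME_SECONDS = 80 * 365.25 * 24 * 3600   # ~2.52e9 s

cal_per_atp = ATP_CAL_PER_MOL / AVOGADRO                           # ~2.3e-20 cal
cal_per_bp_copy = cal_per_atp * ATP_PER_BP                         # ~1.9e-19 cal
divisions_per_second = DIVISIONS_PER_LIFETIME / LIFETIME_SECONDS   # ~4e6 per second

per_second = cal_per_bp_copy * divisions_per_second
per_lifetime = cal_per_bp_copy * DIVISIONS_PER_LIFETIME

print(f"one extra bp, whole body: {per_second:.1e} cal/s")          # ~7e-13 cal/s
print(f"one extra bp, lifetime  : {per_lifetime:.1e} cal "
      f"(= {per_lifetime / 1000:.1e} kcal)")                        # ~2e-3 cal, ~2e-6 kcal
```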
I am by no means an expert, but I expect evolution can’t act on something that small—it would be swamped by environmental differences. A purely nonfunctional extra-base-pair mutation seems to have an evolutionary cost of pretty much zero if you only count DNA replication costs.
Well, maybe the numbers I got are wrong. Let’s calculate the cost of replicating the whole genome. That’s about 3.2 Gb (giga base pairs), so replicating this would cost 3.2e12*2.325e-20 = 7.44e-8 calories, or 0.2952 calorie/second, or 25.5 kilocalories per day. That sounds reasonable.
What do you think? 8 ATP / base pair replication seems like a tiny energy expenditure. But, I don’t know what the typical scale is for cell biology.
Well first off, I’m going entirely on memory with the 8 ATP number. I’m 90% certain it is at least that much, but 16 is also sticking in my head as a number. The best reference I can give you is probably that you get ~30 ATP per glucose molecule that you digest. (Edit: that’s for aerobic metabolism, not anaerobic. Anaerobic is more like 2 ATP per glucose molecule.)
The other thing to consider is that typically, your cell divisions are going to be concentrated in the first 1/6th of your life or so. So averaging it over 80 years may be a little disingenuous. Cells certainly still grow later in life, but they slow down a lot.
I agree splicing out a single base is not likely to generate any measurable fitness advantage. But if you have 90% of your genome that is “junk”, that’s 0.9*25.5 kcal/day, which is about 1% of a modern daily diet, and probably a much larger portion of the diet in the ancestral environment. Requiring eating 1% more food over the course of one’s lifetime seems to me like it would be significant, or at least approaching it. But what do I know, I’m just guessing, really.
Thanks for the math though, that was interesting.
Using 16 ATP instead of 8, and 80/6 = 13.33 years, won’t change the result significantly. The energy cost still seems too small, by many orders of magnitude, to claim natural selection based on energy expenditure.
1% of diet is a selectable-sized difference, certainly. But the selection pressure applies to individual base pair mutations, which are conserved or lost independently of one another (ignoring locality effects etc). The total genome size, or total “junk” size, can’t generate selection pressure unless different humans have significantly different genome size. But it looks like that’s not the case.
I am confused why you believe this. Evolution need not splice out bases one base at a time. You can easily have replication errors that could splice out tens of thousands of bases at a time.
No, replication is more robust than that. I have never heard of large insertions or deletions during replication, except in highly repetitive regions (and there only dozens of bases, I think).
However, meiotic crossover is sloppy, providing the necessary variation.
Speaking of meiotic crossover, non-coding DNA provides space between coding regions, reducing the likelihood of crossover breaking them.
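A toy version of that spacing argument (the base-pair figures are invented): if a crossover breakpoint lands uniformly at random along the genome, the chance it interrupts coding sequence scales with the coding fraction, so non-coding padding dilutes that risk per crossover.

```python
# Probability that a single, uniformly placed crossover breakpoint lands
# inside coding sequence, for genomes with different amounts of non-coding
# padding. Coding content is held fixed; the figures are illustrative only.

CODING_BP = 5e7

for noncoding_bp in (0.0, 5e8, 3e9):
    genome_bp = CODING_BP + noncoding_bp
    p_hit_coding = CODING_BP / genome_bp
    print(f"genome {genome_bp:.1e} bp -> P(breakpoint in coding) = {p_hit_coding:.3f}")
```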
Meiotic crossover is what I meant, actually. Generally the polymerase itself wouldn’t skip unless the region is highly repetitive, you’re right.
...I seem to have assumed the number of BP changed by small or point mutations would make up the majority of all BP changed by mutations. (I was probably primed because you started out by talking about the energy cost per BP.) Now that you’ve pointed that out, I have no good reason for that belief. I should look for quantified sources of information.
OK, so now we need to know 1) what metabolic energy order of magnitude is big enough for selection to work, and 2) the distribution of mutation sizes. I don’t feel like looking for this info right now, maybe later. It does seem plausible that for the right values of these two variables, the metabolic costs would be big enough for selection to act against random nonfunctional mutations.
But apparently there is a large amount of nonfunctional DNA, and I’ve also read that some nonfunctional mutations are fixed by drift (i.e. selection is zero on net). That’s some evidence for my guess that some (many?) nonfunctional mutations, maybe only small ones, are too small for selection pressure due to metabolic costs to have much effect.
Yeah, I will definitely concede small ones have negligible costs. And I’m not sure the answer to 1) is known, and I doubt 2) is well quantified. A good rule of thumb for 2) though is that “if you’re asking whether or not it’s possible, it probably is”. At least that’s the rule of thumb I’ve developed from asking questions in classes.
Cool calculation, but just off the top of my head, you would also need energy for DNA repair processes, which my naive guess would be O(n) in DNA length and is constantly ongoing.
Good point. And there may well be other ways that “junk” genes are metabolically expensive. For instance, real “junk” sequences probably aren’t perfectly nonfunctional. Maybe they make the transcription or expression of other genes more (or less) costly, or they use up energy and materials being occasionally transcribed into nonfunctional bits of RNA or protein, or bind some factors, or who knows what else. And then selection can act on that.
But the scale just seems too small for any of that to matter in most cases—because it has to matter at the scale of a single base pair, since that’s the size of a mutation, and point mutations can be conserved or lost independently of one another.
What is the metabolic cost (per cell per second) scale or order of magnitude where natural selection begins to operate?