Actually, we can guess that a piece of DNA is nonfunctional if it seems to have undergone neutral evolution (roughly, accumulation of functionally equivalent mutations) at a rate which implies that it was not subject to any noticeable purifying selection over evolutionary time. Leaving aside transposons, repetition, and so on, that’s a main part of how we know that large amounts of junk DNA really are junk.
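To make that rate argument concrete, here is a minimal sketch in Python of the comparison it implies, with all numbers invented: measure a candidate region’s divergence against a nearby proxy for the neutral rate (for example, ancestral repeats or fourfold-degenerate sites) and ask whether it is evolving detectably slower.

```python
import math

def substitution_fraction(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions that differ (indels ignored)."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def constraint_z(candidate_frac: float, neutral_frac: float, n_sites: int) -> float:
    """Negative z means the candidate diverges more slowly than the neutral
    proxy, which is the signature of purifying selection. This is a crude
    normal approximation to a binomial; real pipelines model rates properly."""
    se = math.sqrt(neutral_frac * (1 - neutral_frac) / n_sites)
    return (candidate_frac - neutral_frac) / se

# Invented numbers: 1000 aligned sites, 8% diverged vs. a 12% neutral baseline.
print(round(constraint_z(0.08, 0.12, 1000), 1))  # -3.9: detectably slower
```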
There are pieces of DNA that preserve function but undergo neutral evolution. A recent Nature article found a non-protein-coding piece of DNA that is necessary for development (by being transcribed into RNA), that had undergone close-to-neutral evolution from zebrafish to human, yet maintained functional conservation. That is, taking the human transcript and inserting it into zebrafish spares it from death, indicating that (almost) completely different DNA performs the same function, and that a simple test for departures from neutral evolution probably can’t detect it.
I’m having trouble working out the experimental conditions here. I take it they replaced a sequence of zebrafish DNA with its human equivalent, which seemed to have been undergoing nearly neutral selection, and didn’t observe developmental defects. But what was the condition where they did observe defects? If they just removed that section of DNA, that could suggest that some sequence is needed there but its contents are irrelevant. If they replaced it with a completely different section of DNA, that would be a lot more surprising.
You are correct—given the information above it is possible (though unlikely) that the DNA was just there as a spacer between two other things and its content was irrelevant. However, the study controlled for this—they also mutated the zebrafish DNA in specific places and were able to induce the same defects as the deletion did.
What’s happening here is that the DNA is transcribed into non-protein-coding RNA. This RNA’s function and behavior will be determined by, but impossible to predict from, its sequence—you’re dealing not only with the physical process of molecular folding, which is intractable, but with its interactions with everything else in the cell, which is intractability squared. So there is content there, but it’s unreadable to us and thus appears unconstrained. If we had a very large quantum computer we could perhaps find the 3D structure “encoded” by it and its interaction partners, and would see the conservation of this 3D structure from fish to human.
That’s interesting. I guess my next question is, how confident are we that this sequence has been undergoing close-to-neutral selection?
I ask because if it has been undergoing close-to-neutral selection, that implies that almost all possible mutations in that region are fitness-neutral. (Which is why my thoughts turned to “something is necessary, but it doesn’t matter what”. When you call that unlikely, is that because there’s no known mechanism for it, or you just don’t think there was sufficient evidence for the hypothesis, or something else?) But… according to this study they’re not, which leaves me very confused. This doesn’t even feel like I just don’t know enough, it feels like something I think I know is wrong.
if it has been undergoing close-to-neutral selection, that implies that almost all possible mutations in that region are fitness-neutral.

There is no truly “neutral” evolution, as all DNA sequences are subject to several constraints, such as maintaining GC content and preventing spurious promoters from appearing. There is also large variability in mutation rate along different DNA regions. Together, this produces high variance in the “neutral” mutation rate, and because the genome is so large, it is (probably) impossible to detect even regions evolving at a quarter of the neutral rate. I think this is the case here.
This extends what zslastsman wrote regarding structure.
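A minimal simulation of that detection problem, with made-up rate parameters: when the local “neutral” rate varies a lot from window to window, a region evolving at a quarter of the average rate does not stand out.

```python
import random

random.seed(0)
MEAN_RATE = 0.10   # expected substitutions per site; invented value
WINDOW = 1000      # sites per window

def window_substitutions(rate_scale: float) -> int:
    # Local neutral rate varies: gamma-distributed multiplier with mean 1.
    local_rate = MEAN_RATE * random.gammavariate(4.0, 0.25) * rate_scale
    return sum(random.random() < local_rate for _ in range(WINDOW))

neutral = [window_substitutions(1.0) for _ in range(5_000)]
constrained = window_substitutions(0.25)   # a quarter of the neutral rate

frac = sum(n <= constrained for n in neutral) / len(neutral)
print(f"constrained window: {constrained} substitutions")
print(f"{frac:.1%} of genuinely neutral windows look at least as conserved")
# Even a percent or two of false positives, across a genome of millions of
# windows, swamps the signal.
```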
We can’t be totally confident. I’d guess that if you did a sensitive test of fitness (you’d need a big fish tank and a lot of time) you’d find the human sequence didn’t rescue the deletion perfectly. They’ve done this recently in C. elegans, looking at long-term survival at the population level, and they find that a huge number of apparently harmless mutations are very detrimental at the population level.
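A toy calculation of why competition experiments see what individual assays miss; the 0.5% fitness deficit below is an invented number:

```python
def mutant_frequency(s: float, generations: int, p0: float = 0.5) -> float:
    """Deterministic selection: mutant odds shrink by (1 - s) per generation."""
    p = p0
    for _ in range(generations):
        p = p * (1 - s) / (p * (1 - s) + (1 - p))
    return p

for gens in (10, 100, 500):
    print(gens, round(mutant_frequency(0.005, gens), 3))
# 10 -> 0.487, 100 -> 0.377, 500 -> 0.076: invisible per individual,
# unmistakable after enough generations of competition.
```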
The reason I’d say it was unlikely is just that spacers of that kind aren’t common (I don’t know of any that aren’t inside genes). If there were two sequences on either side that needed to bend around to each other to make contact, it could be plausible, but since they selected by epigenetic marks, rather than sequence conservation, it would be odd and novel if they’d managed to perfectly delete such a spacer (actually, that would be very interesting in itself).
I think you are being confused by two things:

1) The mutation I said they made was deliberately targeted to a splice site, and splice sites are constrained (though you can’t use them to identify functional sequences, because they are very small and so occur randomly outside functional sequence all the time).

2) You are thinking too simplistically about sequence constraint. RNA folds by wrapping up and forming helices with itself, so the effect of a mutation depends on the rest of the sequence. Each mutation releases constraint on some base pairs and introduces it at others. So as this sequence wanders through sequence space, it does so in a way that preserves relationships, not absolute sequence (see the sketch after this comment). From its current position in sequence space, many mutations would be detrimental, but those residues may get the chance to mutate later on, once other residues have relieved them. This applies to proteins as well, by the way: proteins are far more conserved in 3D shape than in linear sequence.
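Here is a toy sketch of point 2. The structure check is deliberately crude (it verifies a given structure rather than folding anything), and both sequences are invented, but it shows two sequences that differ at every single position while satisfying the same hairpin:

```python
# Legal RNA pairs, including G-U wobble.
PAIRS = {("A","U"), ("U","A"), ("G","C"), ("C","G"), ("G","U"), ("U","G")}

def fits_structure(seq: str, dotbracket: str) -> bool:
    """True if every '('...')' pair in the structure is a legal base pair."""
    stack = []
    for base, sym in zip(seq, dotbracket):
        if sym == "(":
            stack.append(base)
        elif sym == ")" and (stack.pop(), base) not in PAIRS:
            return False
    return not stack

structure = "((((....))))"
seq_a = "GGCAUUCGUGCC"   # invented "fish" version
seq_b = "AUGCGAAAGCAU"   # invented "human" version; every base differs
print(fits_structure(seq_a, structure), fits_structure(seq_b, structure))
# True True: same structure, disjoint sequences.
```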
The DNA in the zebrafish was deleted, and the human version was inserted later, without affecting the main DNA (probably using a “plasmid”). Without the human DNA “insert”, there was a developmental defect. With either the human DNA insert or the original zebrafish DNA (as an insert), there was no developmental defect, leading to the conclusion that the human version is functionally equivalent to the zebrafish version.
How do we know whether, if the insert were replaced with a random sequence of base pairs of the same length, there would be no developmental defect either?
There are several complications addressed in the article, which I did not describe. Anyway, using a “control vector” is standard practice, and I believe they checked this.
That’s true of protein-coding sequence, but things are a little bit more difficult for regulatory DNA because:

1) Regulatory DNA is under MUCH less sequence constraint—the relevant binding proteins are not individually fussy about their binding sites.

2) Regulatory networks have a lot of redundancy.

3) Regulatory mutations can be much more easily compensated for by other mutations—because we’re dealing with analog networks, rather than strings of amino acids.
Regulatory evolution is an immature field, but it seems that an awful lot of change can occur in a short time. The literature is full of sequences that have an experimentally provable activity (put them on a plasmid with a reporter gene and off it goes) and yet show no conservation between species. There’s probably a lot more functional sequence that won’t just work on its own on a plasmid, or show a noticeable effect from knockouts. It may be that regulatory networks are composed of a continuous distribution from a few constrained elements with strong effects down to lots of unconstrained weak ones. The latter will be very, very difficult to distinguish from junk DNA.
Data with lots of redundancy does, in a certain sense, contain a lot of junk. Junk that, although it helps reliably transmit the data, doesn’t change the meaning of the data (or doesn’t change it by much).
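A minimal illustration of that information-theoretic point, using a 3x repetition code (nothing biological is implied): the redundant copies add no meaning, but they let the message survive corruption.

```python
def encode(bits):
    # Each meaningful bit is stored three times.
    return [b for b in bits for _ in range(3)]

def decode(coded):
    # Majority vote within each triple recovers the original bit.
    return [int(sum(coded[i:i + 3]) >= 2) for i in range(0, len(coded), 3)]

message = [1, 0, 1, 1, 0, 0, 1, 0]
coded = encode(message)            # 24 stored bits carry 8 bits of meaning
noisy = list(coded)
for i in (2, 7, 12, 22):           # corrupt four scattered positions
    noisy[i] ^= 1
print(decode(noisy) == message)    # True: the redundancy absorbed the damage
```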
Yeah. What’s relevant to this discussion is complexity, not number of base pairs.
Actually, we can guess that a piece of DNA is nonfunctional if it seems to have undergone neutral evolution (roughly, accumulation of functionally equivalent mutations) at a rate which implies that it was not subject to any noticeable purifying selection over evolutionary time.

This actually isn’t necessarily true. If there is a section of the genome A that needs to act on another section of the genome C, with section B in between, and A needs to act on C with a precise (or relatively so) genomic distance between them, then B can evolve neutrally even though it’s still necessary for the action of A on C, since it provides the spacing.
Thus, serving a purely structural function.
In that case the complexity in bits of B, for length N, becomes log2(N) instead of 2*N. It’s not quite 0, but it’s a lot closer.
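A quick worked version of that count, for a hypothetical 10 kb spacer:

```python
import math

N = 10_000                     # hypothetical spacer length in base pairs
print(2 * N)                   # 20000 bits if every base is constrained
print(round(math.log2(N), 1))  # 13.3 bits if only the length matters
```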
The only definitively nonfunctional DNA is that which has been deleted. “Nonfunctional DNA” is temporarily inactive legacy code which may at any time be restored to an active role.
In the context “how complicated is a human brain?”, DNA which is currently inactive does not count towards the answer.
That said (by which I mean “what follows doesn’t seem relevant now that I’ve realised the above, but I already wrote it”),
Is inactive DNA more likely to be restored to an active role than to get deleted? I’m not sure it makes sense to consider it functional just because it might start doing something again. When you delete a file from your hard disk, it could theoretically be restored until the disk space is actually repurposed; but if you actually wanted the file around, you just wouldn’t have deleted it. That’s not a great analogy, but...
My gut says that any large section of inactive DNA is more likely to become corrupted than to become reactivated. A corruption is pretty much any mutation in that section, whereas I imagine reactivating it would require one of a small number of specific mutations.
Counterpoint: a corruption has only a small probability of becoming fixed in the population; if reactivation is helpful, that still only has a small probability of becoming fixed, but it’s a much higher small probability.
Counter-counterpoint: no particular corruption would need to be fixed in the whole population. If there are several corruptions at independent 10% penetration each, a reactivating mutation will have a hard time becoming fixed.
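For concreteness, a crude Wright-Fisher sketch of the counterpoint’s fixation probabilities, with invented population size and selection coefficient: a new neutral mutant fixes with probability about 1/(2N), a beneficial one with probability about 2s; both small, but not equally small.

```python
import random

def fixes(pop_size: int, s: float) -> bool:
    """Follow one new mutant allele until it is fixed or lost."""
    count = 1                                  # one copy among 2N alleles
    while 0 < count < 2 * pop_size:
        p = count / (2 * pop_size)
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))   # selection step
        count = sum(random.random() < p_sel for _ in range(2 * pop_size))
    return count == 2 * pop_size

random.seed(0)
TRIALS = 5_000
for s in (0.0, 0.05):                          # neutral vs. mildly beneficial
    rate = sum(fixes(50, s) for _ in range(TRIALS)) / TRIALS
    print(f"s = {s}: fixation rate ~ {rate:.3f}")
# Expect about 1/(2N) = 0.01 for the neutral allele and roughly 2s = 0.1
# for the beneficial one: both small, but one is ten times the other.
```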
Here’s the concept I wanted: evolutionary capacitance.