It is true that protein-coding-gene expression is a process that kind of resembles an algorithm.
That’s because it is an algorithm. What else would it be?
It is not logical operations being performed on a bit string; it is chemical reactions and catalysis in actual three-dimensional space.
Of course those are logical operations being performed on a bit string; again, what else would they be? Magical uncomputable non-functions?
Your major point—which I agree with—seems to be that there are a lot of hard-to-quantify factors and influences that go into determining the result, and that a focus on just DNA does not capture those interactions. But that just means a (mathematical/algorithmic) description that merely focuses on DNA would be inadequate on some levels, and that an actual complete description of a cell may require additional information. However, that doesn’t mean the more thorough description wouldn’t also be an algorithm, a program that the cell executes / that describes the cell’s behavior; it would merely be a more complex one. Even that is debatable:
I agree that a long inert DNA polymer wouldn’t do much on its own, just as the same information on some HDD wouldn’t. However, from an informational perspective, if you found a complete set of a mammoth’s DNA and assorted “codings”, would that be enough for a resourceful agent to reconstitute a mammoth specimen? If the agent is also aware of some basic data about cellular structures—certainly. But I’d reckon that given enough time, even an agent with little such knowledge would figure out much of the DNA’s purpose, and eventually be able to recreate a mammoth oocyte to fill with the DNA. If that were indeed so, it would mean that, from an informational perspective, the overhead is not strictly necessary to encode the “mammoth essence”—or at least most of it.
I think you could reword my point to be something like: by the time you are doing something algorithmically/computationally that really recapitulates the important things happening in a cell, you are doing something more akin to a physics simulation than to running Turing machines on DNA tape. At that point, when your ‘decompression algorithm’ is physics itself, calling it an algorithm seems a little out of place.
In another post just now I wrote that a genome and its products also define a whole landscape of states, not one particular organism. I can’t help but wonder just how huge that space of living states is, and how many of them correspond to normal cell types or cell states in such a mammoth, and how intractable it would be to find that one set of states that corresponds to ‘mammoth oocyte’ and produces a self-perpetuating multicellular system.
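To make that ‘landscape of states’ picture a little more concrete, here is a minimal sketch using a completely made-up three-gene Boolean network (the genes and the update rules are invented for illustration, not taken from any real genome). Even this toy system carves its 2^3 states into a few distinct attractors, the rough analogue of stable cell states, and the state space doubles with every gene you add:

```python
# Toy sketch: a made-up three-gene Boolean "regulatory network".
# Each gene is ON (1) or OFF (0); the update rules are invented for
# illustration only. Even this tiny system carves its 2**3 = 8 states
# into a few attractors (the rough analogue of stable cell states).

from itertools import product

def step(state):
    """One synchronous update of the toy network (a, b, c)."""
    a, b, c = state
    return (
        int(not b),  # a is repressed by b
        int(not a),  # b is repressed by a (a classic toggle switch)
        int(a),      # c is switched on by a
    )

def attractor_of(state):
    """Follow updates until the trajectory repeats; return the cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return tuple(sorted(seen[seen.index(state):]))

basins = {}
for start in product((0, 1), repeat=3):
    basins.setdefault(attractor_of(start), []).append(start)

for cycle, states in basins.items():
    print(f"attractor {cycle}: reached from {len(states)} of 8 starting states")
```

With tens of thousands of genes, and far more than two expression levels per gene, enumerating that landscape directly is hopeless, which is part of why finding the one set of states corresponding to ‘mammoth oocyte’ looks so intractable.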
In your opinion, are there any physical processes which are not algorithms?
(Qualifier so I’m not drowning in some reality fluid) I’d say that there aren’t any physical processes that cannot be completely described as algorithms.
I can’t help but wonder just how huge that space of living states is, and how many of them correspond to normal cell types or cell states in such a mammoth, and how intractable it would be to find that one set of states that corresponds to ‘mammoth oocyte’ and produces a self-perpetuating multicellular system.
I wouldn’t overestimate its additional complexity, especially given that most of it ultimately derives from the relationships between different areas of the DNA sequence itself. For the predictability of the results of slightly varying those states, consider the success and predictability of results (on a viral level, not a clinical-outcomes level) in manipulating lentiviruses and AAVs; see for example this NEJM paper.
No physics-level simulation needed to accurately predict what a cell will do when switching out parts of its genome.
If it were otherwise, the whole natural virus ecosystem itself would break down (if you think about it).
EDIT: A different example that comes to mind: insulin synthesized in a laboratory strain of Escherichia coli that has been genetically altered with recombinant DNA to produce biosynthetic human insulin. No surprises there either.
My main response there is that in those situations you are making one small change to a pre-existing system, using elements that have previously been qualitatively characterized. In the case of the viral gene therapy, it’s a case of adding to the cell a DNA construct consisting of a crippled virus (which can’t actually replicate in normal cells) for insertion purposes, a promoter element that turns on an adjacent reading frame in any human cellular context, and the reading frame for the gene in question, which has had all the frills and splice sites removed. In the case of insulin in bacteria, it’s a case of adding to the bacteria the human insulin reading frame and a few processing-enzyme reading frames, each attached to a constantly-on bacterial promoter element. The overall systems of the cells are left intact, and you are just piggybacking on them.
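To put that in informational terms, the prediction in these cases is mostly bookkeeping over parts whose behavior has already been established empirically, something like the following sketch (the part names and placeholder sequences are invented; this is an illustration, not an actual cloning protocol):

```python
# Rough sketch of the bookkeeping behind a recombinant construct:
# gluing together pre-characterized parts, each with a known
# qualitative behavior ("if I stick this in, it will do X").
# The sequences and behaviors below are placeholders, not real parts.

CHARACTERIZED_PARTS = {
    # part name: (placeholder sequence, empirically established behavior)
    "bacterial_promoter": ("TTGACA...TATAAT", "constantly on in E. coli"),
    "insulin_orf":        ("ATG...TAA", "human insulin reading frame, introns removed"),
    "processing_orf":     ("ATG...TAA", "processing enzyme reading frame"),
}

def build_construct(part_names):
    """Concatenate characterized parts and report the expected behavior."""
    sequence = "".join(CHARACTERIZED_PARTS[name][0] for name in part_names)
    expectations = [f"{name}: {CHARACTERIZED_PARTS[name][1]}" for name in part_names]
    return sequence, expectations

construct, expected = build_construct(
    ["bacterial_promoter", "insulin_orf",
     "bacterial_promoter", "processing_orf"]
)
print("construct:", construct)
for note in expected:
    print("expect  :", note)
```

All of the predictive power lives in the prior characterization of the individual parts, not in any simulation of the whole cell.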
You can do things like this because in many living systems you have elements that have been isolated and tested, and about which you can say “if I stick this in, it will do X”. That has for the most part been figured out empirically over a long time, by putting into cells elements that are whole, truncated, or mutated in some way and seeing which ones work and which ones don’t. These days, by examining their chemical structures, we have physical and chemical explanations for a bunch of them and how they work, and we are starting to get better at predicting them in particular organismal contexts, though it’s still much, much harder in multicellular creatures with huge genomes than in those with compact genomes*.
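For a flavour of what that empirical figuring-out looks like, here is a toy sketch of generating the whole/truncated/mutated variants for a candidate element (the sequence and the numbers are made up); the computation is trivial, and all the real information comes back from testing the variants in cells:

```python
# Toy sketch of how a regulatory element gets characterized empirically:
# make whole, truncated, and point-mutated versions of a candidate
# sequence, then test each one in cells and see which still work.
# The candidate sequence and the parameters are invented placeholders.
import random

random.seed(0)
BASES = "ACGT"
candidate = "TTGACAGCTAGCTATAATGGATCC"  # made-up candidate element

def truncation_series(seq, step=4):
    """Progressively shorter versions, trimmed from the 5' end."""
    return [seq[i:] for i in range(0, len(seq), step)]

def point_mutants(seq, n=3):
    """A few random single-base substitutions."""
    mutants = []
    for _ in range(n):
        pos = random.randrange(len(seq))
        new_base = random.choice([b for b in BASES if b != seq[pos]])
        mutants.append(seq[:pos] + new_base + seq[pos + 1:])
    return mutants

variants = [candidate] + truncation_series(candidate)[1:] + point_mutants(candidate)

for v in variants:
    # Placeholder for the actual experiment: put the variant into cells,
    # measure expression, record which versions still work.
    print(f"{v:<24}  -> send to the wet lab")
```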
When I was saying that physics-like things were needed, I was more referring to a situation in which you do not have a pre-existing living thing and are trying to work out what an element does only from its sequence. When you can test things in the correct context and start figuring out which proteins and DNA elements are important for what, you can leap over this and tell what is important for what even before you really understand the physical reasons. If you were starting from just the DNA sequence and didn’t really understand what the non-DNA context for it was, or possibly even how the DNA helps produce that non-DNA context, you get a much less tractable problem.
*(It’s worth noting that the ease of analysis of noncoding elements is wildly different in different organisms. Bacteria and yeast have compact promoter elements with DNA sequences of dozens to hundreds of base pairs each, often with easily identifiable protein binding sites. In animals, a promoter element can be in chunks strewn across hundreds of kilobases (though several kilobases is more typical) and is usually defined as ‘this is the smallest piece of DNA we could include and still get it to express properly’, with only a subset of computationally predicted protein binding sites actually turning out to be functionally important. A yeast centromere element for fiber attachment to chromosomes during cell division is a precisely defined 125 base pair sequence that assembles a complex of anchoring proteins on itself, while a human centromere can be the size of an entire yeast genome and is a huge array of short repeats that might just bind the fiber-anchoring proteins a little bit better than random DNA. Noncoding elements get larger and less straightforward much faster than coding elements as genome size increases.)
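As a small illustration of why only a subset of computationally predicted binding sites turns out to matter: a short motif will match random DNA surprisingly often, so the more noncoding sequence there is to search, the more candidate sites pile up that have to be sifted experimentally. A toy sketch with a random ‘genome’ and a made-up six-base motif:

```python
# Toy sketch: a made-up six-base "binding motif" matches random DNA
# about once every 4**6 ~= 4,000 bases just by chance, so the number
# of computationally predicted sites scales with how much noncoding
# sequence there is to search.
import random

random.seed(1)
MOTIF = "TGACTC"  # made-up six-base motif

def count_hits(length):
    genome = "".join(random.choice("ACGT") for _ in range(length))
    return sum(genome[i:i + len(MOTIF)] == MOTIF
               for i in range(length - len(MOTIF) + 1))

for length in (1_000, 100_000, 1_000_000):
    print(f"{length:>10,} bp of random DNA: {count_hits(length)} motif hits")
```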
EDIT: As for viral ecosystems, viruses can hop from species to species because related species share a lot of cellular machinery, even when the splits happened hundreds of millions of years ago, and the virus just has to work well enough (and will immediately start adapting to its new host). Seeing as life is more than three gigayears old, though, there are indeed barriers that viruses cannot cross: you will not find a virus that can infect both a bacterium and a mammal, or a mammal and a plant. When they hop from species to species or population to population, the differences can render some species resistant or change the end behavior of the virus, and you get things like the simian immunodeficiency virus hardly affecting chimps while HIV, separated from it by only about a century, kills its human host.
If you were starting from just the DNA sequence and didn’t really understand what the non-DNA context for it was, or possibly even how the DNA helps produce that non-DNA context, you get a much less tractable problem.
I can’t help but feel this is related to (what I perceive as) a vast overrating of the plausibility of uploading from cryonically preserved brain remnants. It’s late at night and I’m still woozy from finals, but it feels like someone who’s discovered they enjoy, say, classical music, without much grasp of music theory or even the knowledge of how to play any instrument, figuring it can’t be too hard to just brute-force a piano riff of, say, the fourth movement of Beethoven’s 9th if they just figure out by listening which notes to play. The mistake being made is a subtler yet more important one than simply underestimating the algorithmic complexity of the desired output.