Wei Dai comments on Oversimplification when generalizing from DNA?

Wei Dai 25 Dec 2012 21:38 UTC
2 points
But physical interactions, the presence of non-DNA molecules, and the previous shape/state of the organism are integral, carry vast quantities of information, and are the context that makes the DNA represent information in the first place rather than just being an unstable polymer.

Your comment is really interesting, but I question this “vast quantities of information”. Suppose I want to implement a cell as a physics simulation and watch it live on in my computer.
- The relevant laws of physics don’t take a huge number of bits to describe or implement on a computer.
- If we take all the non-DNA molecules (excluding the proteins and RNA which are already coded for in the DNA), are there any really complex ones that would take a lot of bits to specify (compared to the DNA)? Or are there that many different kinds of molecules?
- The exact shape and state of a cell do contain a lot of information, but surely most of that is irrelevant. For example, I can probably get away with approximating the initial shape of the simulated cell as a sphere or ellipsoid (or some other simple geometric shape), which takes many fewer bits to specify than its actual shape. Same thing with the distributions of molecules inside the cell. I probably don’t need information about the exact location of each molecule, but can approximate the distributions using relatively simple density gradients.
So even after reading your comment, I think it’s likely that most of the complexity (in the sense of number of bits needed to implement a simulation) of a cell is in its genome. Am I wrong in any of my guesses above, or missing something else?
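To make the bit accounting above concrete, here is a minimal back-of-the-envelope sketch in Python. It is not a cell model. Every number and name in it is an assumption chosen for illustration: 2 bits per base, a genome of a few million base pairs (roughly bacterial scale), a hypothetical count of 1000 distinct molecular species, 4 density-gradient parameters per species, 3 shape parameters for an ellipsoid, and 32 bits per parameter. CellBioGuy's objection, in these terms, is that the coarse-state term may be much larger than assumed here.

```python
# Back-of-the-envelope bit accounting. All inputs are illustrative
# assumptions, not measurements of any real cell.

BITS_PER_BASE = 2  # four nucleotides -> log2(4) = 2 bits per base


def genome_bits(num_base_pairs):
    """Bits needed to write out the DNA sequence with no compression."""
    return num_base_pairs * BITS_PER_BASE


def coarse_state_bits(num_species, params_per_species,
                      shape_params=3, bits_per_param=32):
    """Bits for a coarse description of the non-DNA state:
    a simple shape (e.g. ellipsoid semi-axes) plus a few
    density-gradient parameters per molecular species."""
    total_params = shape_params + num_species * params_per_species
    return total_params * bits_per_param


def genome_fraction(num_base_pairs, num_species, params_per_species):
    """Fraction of the total description taken up by the genome."""
    g = genome_bits(num_base_pairs)
    s = coarse_state_bits(num_species, params_per_species)
    return g / (g + s)


if __name__ == "__main__":
    # Hypothetical inputs: 4.6 million base pairs, 1000 species,
    # 4 gradient parameters per species.
    print(genome_bits(4_600_000))
    print(coarse_state_bits(1000, 4))
    print(genome_fraction(4_600_000, 1000, 4))
```

Under these assumed inputs the genome dominates the total, but the conclusion depends entirely on how many parameters the non-DNA state really needs, which is the point in dispute.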
- CellBioGuy 29 Dec 2012 1:23 UTC
  3 points
  You can implement A simulation. But making that simulation correspond to any particular thing that has existed in the real world is harder.
  
  Physics itself is not hard. Applying it to large numbers of particles is hard.
  
  As for non-DNA molecules, there are all kinds of small-molecule metabolites which are constantly being converted back and forth, some of which are very important (they bind to the big molecules, are part of metabolism, and I have seen some brand new research about particular proteins that only fold properly around a ‘nucleus’ formed by a particular 6-carbon molecule). But the main point I was trying to make was more along the following lines (addressing the third bullet):
  
  Shape is more detailed than general cell shape. There is fine structure in terms of internal fibers, distributions of molecules, impermeable barriers that segregate things, etc. Some of these, like the aforementioned membranes in bacteria, the self-perpetuating but never-made-from-scratch compartments that distill out their components from the general cell milieu, don’t necessarily have the DNA as a determinant but rather as something that sets up the circumstances in which they are stable. Other things, like the amounts and simple distributions of molecules, all come from previous states, and most possible distributions don’t correspond to any real state (though doubtless many of them would be unstable and, once you instantiate them, collapse down to one attractor or another that normally exists).
  
  I have a hard time trying to think of the nature of the correspondence between these things and bits for a simulation besides positions of molecules, and I’m not sure in what context those bits are specified. A little help?
  - Wei Dai 29 Dec 2012 3:10 UTC
    2 points
    
    I have a hard time trying to think of the nature of the correspondence between these things and bits for a simulation besides positions of molecules, and I’m not sure in what context those bits are specified. A little help?
    
    What you do is write a program that generates a set of particles and places them into the simulated cell, such that the resulting cell is viable and functionally equivalent to the original cell. Take the program and count its length in bits. If you haven’t programmed before you may not have much intuition about this. In that case think of it this way: if you have to describe the shape/internal structure/distributions (ETA: and structures) of molecules, in natural language and/or mathematical notation, in sufficient detail that someone else could create a physics simulation of the cell based on your description, how many bits would that take, and what fraction of those would be taken up by the DNA sequences?
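As a toy illustration of the "write a program and count its length" idea, the sketch below generates particle positions inside an ellipsoid from a few parameters and compares the size of those parameters with the size of an explicit list of coordinates. It is a sketch under stated assumptions, not a viable cell: the function names are hypothetical, the placement is uniform rather than following density gradients, and nothing here checks that the result is functionally equivalent to a real cell. The generator's own source text adds a fixed overhead that is not counted.

```python
import random


def generate_particles(seed, count, semi_axes):
    """Place `count` particles uniformly inside an ellipsoid
    (rejection sampling from the bounding box). Uniform placement
    is a simplifying assumption."""
    rng = random.Random(seed)
    a, b, c = semi_axes
    particles = []
    while len(particles) < count:
        x = rng.uniform(-a, a)
        y = rng.uniform(-b, b)
        z = rng.uniform(-c, c)
        # Keep the point only if it lies inside the ellipsoid.
        if (x / a) ** 2 + (y / b) ** 2 + (z / c) ** 2 <= 1.0:
            particles.append((x, y, z))
    return particles


def description_bits(seed, count, semi_axes):
    """Bits in the parameter text that, together with the generator
    above, reproduces the layout exactly."""
    text = repr((seed, count, semi_axes))
    return 8 * len(text.encode("utf-8"))


def explicit_bits(count, bits_per_coordinate=64):
    """Bits needed to list every coordinate explicitly."""
    return count * 3 * bits_per_coordinate


if __name__ == "__main__":
    params = (0, 1000, (2.0, 1.0, 1.0))  # hypothetical values
    layout = generate_particles(*params)
    print(len(layout), description_bits(*params), explicit_bits(params[1]))
```

The same seed and parameters always reproduce the same layout, so the short parameter text plus the generator stands in for the long explicit list. Whether a real cell's internal structure admits a comparably short generator is exactly the open question in this thread.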