Out of curiosity, has the protein problem yet been mathematically formalized so that it can be handed over to computers? That is, do we understand molecular dynamics well enough to automatically discover all possible proteins, starting from the known ones?
We could list them in order, if that’s what you mean. It would be a Library of Babel. Could we determine their structures? No. Need much much more computational power, for starters.
That’s what I intended to refer to—the tertiary structure. Have they thrown the full mathematical toolbox at it? I’ve heard that predicting it is NP-complete, in that it’s order-dependent, etc., but have they thrown all the data-mining tricks at it to find out whatever additional regularity constrains the folding so as to make it less computationally intensive?
The reason I don’t immediately assume the best minds have already considered this, is that there seems to be some disconnect between the biological and mathematical communities—just recently I heard that they just got around to using the method of adjacency matrix eigenvectors (i.e. what Google’s PageRank uses) for identifying critical species in ecosystems.
Predicting the ground state of a protein is NP-hard. But nature can’t solve NP-hard problems either, so predicting what actually happens when a protein folds is merely in BQP.
I would expect most proteins found in natural organisms to be in some sense easy instances of the protein folding problem, i.e. that the BQP method finds the ground state. Because the alternative is getting stuck in local minima, which probably means it doesn’t fold consistently to the same shape, which is probably an evolutionary disadvantage. But if there are any remaining differences, then for the purpose of protein structure prediction it’s actually the local minimum that’s the right answer, and the NP problem that doesn’t get solved is irrelevant.
And yes there are quantum simulation people hard at work on the problem, so it’s not just biologists. But I don’t know enough of the details to say whether they’ve exhausted the conventional toolbox of heavy-duty math yet.
But nature can’t solve NP-hard problems in general either, so predicting what actually happens when a protein folds is merely in BQP.
That explains why I’ve seen descriptions of folding prediction algorithms that run in polynomial time, on the order of n^5 or less with n = number of amino acids in primary chain.
I wanted to add that many proteins found in nature require chaperones to fold correctly. These can be any other molecules—usually proteins, RNA-zymes, or lipids—that influence the folding process to either assist or prevent certain configurations. They can even form temporary covalent bonds with the protein being folded. (Or permanent ones; some working proteins have attached sugars, metals, other proteins, etc.) And the protein making machinery in the ribosomes has a lot of complexity as well—amino acid chains don’t just suddenly appear and start folding.
All this makes it much harder to predict the folding and action of a protein in a real cell environment. In vivo experiments can’t be replaced by calculations without simulating a big chunk of the whole cell on a molecular level.
Why do you think nature can’t solve NP-hard problems? When you dip a twisted wire with 3D structure into a dish of liquid soap and water, and pull it out and get a soap film, didn’t it just solve an NP problem?
All of the bonds and atoms in a protein are “computing” simultaneously, so the fact that the problem is NP in terms of number of molecules isn’t a problem. I don’t understand BQP & so can’t comment on that.
Incidentally, your observation about consistent folding is often right, but some proteins have functions that depending on them folding to different shapes under different conditions. Usually these shapes are similar. I don’t know if any proteins routinely fold into 2 very-different shapes.
Why do you think nature can’t solve NP-hard problems? When you dip a twisted wire with 3D structure into a dish of liquid soap and water, and pull it out and get a soap film, didn’t it just solve an NP problem?
Oh no. Ohhhhh no. Not somebody trying to claim that nature solves problems in NP in polynomial time because of bubble shape minimization again!
Like Cyan just said, nature does not solve the NP problem of a global optimal configuration. It just finds a local optimum, which is already known to be computationally easy! Here’s a reference list.
All of the bonds and atoms in a protein are “computing” simultaneously,
And there are at most N^2 of them, so that doesn’t transform exponential into tractable. It’s not even a Grover speedup (2^N → 2^(N/2)), which we do know how to get out of a quantum computer.
Out of curiosity, has the protein problem yet been mathematically formalized so that it can be handed over to computers? That is, do we understand molecular dynamics well enough to automatically discover all possible proteins, starting from the known ones?
We could list them in order, if that’s what you mean. It would be a Library of Babel. Could we determine their structures? No. Need much much more computational power, for starters.
That’s what I intended to refer to—the tertiary structure. Have they thrown the full mathematical toolbox at it? I’ve heard that predicting it is NP-complete, in that it’s order-dependent, etc., but have they thrown all the data-mining tricks at it to find out whatever additional regularity constrains the folding so as to make it less computationally intensive?
The reason I don’t immediately assume the best minds have already considered this, is that there seems to be some disconnect between the biological and mathematical communities—just recently I heard that they just got around to using the method of adjacency matrix eigenvectors (i.e. what Google’s PageRank uses) for identifying critical species in ecosystems.
Predicting the ground state of a protein is NP-hard. But nature can’t solve NP-hard problems either, so predicting what actually happens when a protein folds is merely in BQP.
I would expect most proteins found in natural organisms to be in some sense easy instances of the protein folding problem, i.e. that the BQP method finds the ground state. Because the alternative is getting stuck in local minima, which probably means it doesn’t fold consistently to the same shape, which is probably an evolutionary disadvantage. But if there are any remaining differences, then for the purpose of protein structure prediction it’s actually the local minimum that’s the right answer, and the NP problem that doesn’t get solved is irrelevant.
And yes there are quantum simulation people hard at work on the problem, so it’s not just biologists. But I don’t know enough of the details to say whether they’ve exhausted the conventional toolbox of heavy-duty math yet.
This is a nice insight.
That explains why I’ve seen descriptions of folding prediction algorithms that run in polynomial time, on the order of n^5 or less with n = number of amino acids in primary chain.
I wanted to add that many proteins found in nature require chaperones to fold correctly. These can be any other molecules—usually proteins, RNA-zymes, or lipids—that influence the folding process to either assist or prevent certain configurations. They can even form temporary covalent bonds with the protein being folded. (Or permanent ones; some working proteins have attached sugars, metals, other proteins, etc.) And the protein making machinery in the ribosomes has a lot of complexity as well—amino acid chains don’t just suddenly appear and start folding.
All this makes it much harder to predict the folding and action of a protein in a real cell environment. In vivo experiments can’t be replaced by calculations without simulating a big chunk of the whole cell on a molecular level.
Why do you think nature can’t solve NP-hard problems? When you dip a twisted wire with 3D structure into a dish of liquid soap and water, and pull it out and get a soap film, didn’t it just solve an NP problem?
All of the bonds and atoms in a protein are “computing” simultaneously, so the fact that the problem is NP in terms of number of molecules isn’t a problem. I don’t understand BQP & so can’t comment on that.
Incidentally, your observation about consistent folding is often right, but some proteins have functions that depending on them folding to different shapes under different conditions. Usually these shapes are similar. I don’t know if any proteins routinely fold into 2 very-different shapes.
Oh no. Ohhhhh no. Not somebody trying to claim that nature solves problems in NP in polynomial time because of bubble shape minimization again!
Like Cyan just said, nature does not solve the NP problem of a global optimal configuration. It just finds a local optimum, which is already known to be computationally easy! Here’s a reference list.
The more convoluted the wire structure, the more likely the soap film is to be in a stable sub-optimal configuration.
And there are at most N^2 of them, so that doesn’t transform exponential into tractable. It’s not even a Grover speedup (2^N → 2^(N/2)), which we do know how to get out of a quantum computer.