If reality cannot solve NP-hard problems as easily as proteins are being folded, and yet proteins are getting folded, then that implies that one of the following must be true:
It turns out that reality can solve NP-hard problems after all
Protein folding is not an NP-hard problem (which implies that it is not properly understood)
Reality is not solving protein folding; it merely has a very good approximation that works on some but not necessarily all proteins (including most examples found in nature)
I am not familiar whether e.g. papers like these (“We show that the protein folding problem in the two-dimensional H-P model is NP-complete.”) accurately models what we’d call “protein folding” in nature (just because the same name is used), but prima facie there is no reason to doubt the applicability, at least for the time being. (This precludes 2.)
Regarding 3, I don’t think it would make sense to say “reality is using only a good approximation of protein folding, and by the way, we define protein folding as that which occurs in nature.” That which happens in reality is precisely—and by definition not only an approximation of—that which we call “protein folding”, isn’t it?
It’s #3. (B.Sc. in biochemistry, did my Ph.D. in proteomics.)
First, the set of polypeptide sequences that have a repeatable final conformation (and therefore “work” biologically) is tiny in comparison to the set of all possible sequences (of the 20-or-so naturally amino acid monomers). Pick a random sequence of reasonable length and make many copies and you get a gummy mess. The long slow grind of evolution has done the hard work of finding useful sequences.
Second, there is an entire class of proteins called chaperones) that assist macromolecular assembly, including protein folding. Even so, folding is a stochastic process, and a certain amount of newly synthesized proteins misfold. Some chaperones will then tag the misfolded protein with ubiquitin, which puts it on a path that ends in digestion by a proteasome.
Aaronson used to blog about instances where people thought they found nature solving a hard problem very quickly, and usually there turns out to be a problem like the protein misfolding thing; the last instance I remember was soap films/bubbles perhaps solving NP problems by producing minimal Steiner trees, and Aaronson wound up experimenting with them himself. Fun stuff.
Apologies; looking back at my post, I wasn’t clear on 3.
Protein folding, as I understand it, is the process of finding a way to fold a given protein that globally minimizes some mathematical function. I’m not sure what that function is, but this is the definition that I used in my post.
Option 2 raises the possibility that globally minimizing that function is not NP-hard, but is merely misunderstood in some way.
Option 3 raises the possibility that proteins are not (in nature) finding a global minimum; rather, they are finding a local minimum through a less computationally intensive process. Furthermore, it may be that, for proteins which have certain limits on their structure and/or their initial conditions, that local minimum is the same as the global minimum; this may lead to natural selection favouring structures which use such ‘easy proteins’, leading to the incorrect impression that a general global minimum is being found (as opposed to a handy local minimum).
this may lead to natural selection favouring structures which use such ‘easy proteins’, leading to the incorrect impression that a general global minimum is being found (as opposed to a handy local minimum).
NP hard problems are solvable (in the theoretical sense) by definition, the problem lies in their resource requirements (running time, for the usual complexity classes) as defined in relation to a UTM. (You know this, just establishing a basis.)
The assumption that the universe can be perfectly described by a computable model is satisfied just by a theoretical computational description existing, it says nothing about tractability (running times) and being able to submerge complexity classes in reality fluid or having some thoroughly defined correspondence (other than when we build hardware models ourselves, for which we define all the relevant parameters, e.g. CPU clock speed).
You may think along the lines of “if reality could (easily) solve NP hard problems for arbitrarily chosen and large inputs, we could mimick that approach and thus have a P=NP proving algorithm”, or something along those lines.
My difficulty is in how even to describe the “number of computational steps” that reality takes—do we measure it in relation to some computronium-hardware model, do we take it as discrete or continuous, what’s the sampling rate, picoseconds (as CCC said further down), Planck time intervals, or what?
In short, I have no idea about the actual computing power in terms of resource requirements of the underlying reality fluid, and thus can’t match it against UTMs in order to compare running times. Maybe you can give me some pointers.
Kawoomba, there is no known case of any NP-hard or NP-complete solution which physics finds.
In the case of proteins, if finding the lowest energy minimum of an arbitrary protein is NP-hard, then what this means in practice is that some proteins will fold up into non-lowest-energy configurations. There is no known case of a quantum process which finds an NP-hard solution to anything, including an energy minimum; on our present understanding of complexity theory and quantum mechanics ‘quantum solvable’ is still looking like a smaller class than ‘NP solvable’. Read Scott Aaronson for more.
One example here is the Steiner tree problem, which is NP-complete and can sort of be solved using soap films. Bringsjord and Taylor claimed this implies that P = NP. Scott Aaronson did some experimentation and found that soap films 1) can get stuck at local minima and 2) might take a long time to settle into a good configuration.
Heh. I remember that one, and thinking, “No… no, you can’t possibly do that using a soap bubble, that’s not even quantum and you can’t do that in classical, how would the soap molecules know where to move?”
Well. I mean, it’s quantum. But the ground state is a lump of iron, or maybe a black hole, not a low-energy soap film, so I don’t think waiting for quantum annealing will help.
I saw someone do the experiment once (school science project). Soap bubbles are pretty good at solving three- and four-element cases, as long as you make sure that all the points are actually connected.
I don’t think that three- and four-element cases have local minima, do they? That avoids (1) and a bit of gentle shaking can help speed up (2).
In the case of proteins, if finding the lowest energy minimum of an arbitrary protein is NP-hard, then what this means in practice is that some proteins will fold up into non-lowest-energy configurations.
My difficulty is in how even to describe the “number of computational steps” that reality takes
Probably the best way is to simply define a “step” in some easily measurable way, and then sit there with a stopwatch and try a few experiments. (For protein folding, the ‘stopwatch’ may need to be a fairly sophisticated piece of scientific timing instrumentation, of course, and observing the protein as it folds is rather tricky).
Another way is to take advantage of the universal speed limit to get a theoretical upper bound to the speed that reality runs at; assume that the protein molecule folds in a brute-force search pattern that never ends until it hits the right state, and assume that at any point in that process, the fastest-moving part of the molecule moves at the speed of light (it won’t have to move far, which helps) and that the sudden, intense acceleration doesn’t hurt the molecule. It’s pretty certain to be slower than that, so if this calculation says it takes longer than an hour, then it’s pretty clear that the brute force approach is not what the protein is using.
The brute-force solution, if sampling conformations at picosecond rates, has been estimated to require a time longer than the age of the universe to fold certain proteins. Yet proteins fold on a millisecond scale or faster.
That requires that the proteins fold more or less randomly, and that the brute-force algorithm is in the -folding-, rather than the development of mechanisms which force certain foldings.
In order for the problem to hold, one of three things has to hold true:
1.) The proteins fold randomly (evidence suggests otherwise, as mentioned in the wikipedia link)
2.) Only a tiny subset of possible forced foldings are useful (that is, if there are a billion different ways for protein to be forced to fold in a particular manner, only one of them does what the body needs them to do) - AND anthropic reasoning isn’t valid (that is, we can’t say that our existence requires that evolution solved this nearly-impossible-to-arrive-at-through-random-processes)
3.) The majority of possible forced holdings are incompatible (that is, if protein A folds one way, then protein B -must- fold in a particular manner, or life isn’t possible) - AND anthropic reasoning isn’t valid
ETA: If anthropic reasoning is valid AND either 2 or 3 hold otherwise, it suggests our existence was considerably less likely than we might otherwise expect.
That requires that the proteins fold more or less randomly, and that the brute-force algorithm is in the -folding-, rather than the development of mechanisms which force certain foldings.
Ah. I apologise for having misunderstood you.
In that case, yes, the mechanisms for the folding may very well have developed by a brute-force type algorithm, for all I know. (Which, on this topic, isn’t all that much) But… what are those mechanisms?
Google has pointed me to an article describing an algorithm that can apparently predict folded protein shapes pretty quickly (a few minutes on a single laptop).
Original paper here. From a quick glance, it looks like it’s only effective for certain types of protein chains.
Protein folding models must be inaccurate if they are NP-hard. Reality itself is not known to be able to solve NP-hard problems.
Yet the proteins are folding. Is that not “reality” solving the problem?
If reality cannot solve NP-hard problems as easily as proteins are being folded, and yet proteins are getting folded, then that implies that one of the following must be true:
It turns out that reality can solve NP-hard problems after all
Protein folding is not an NP-hard problem (which implies that it is not properly understood)
Reality is not solving protein folding; it merely has a very good approximation that works on some but not necessarily all proteins (including most examples found in nature)
Yes, and I’m leaning towards 1.
I am not familiar whether e.g. papers like these (“We show that the protein folding problem in the two-dimensional H-P model is NP-complete.”) accurately models what we’d call “protein folding” in nature (just because the same name is used), but prima facie there is no reason to doubt the applicability, at least for the time being. (This precludes 2.)
Regarding 3, I don’t think it would make sense to say “reality is using only a good approximation of protein folding, and by the way, we define protein folding as that which occurs in nature.” That which happens in reality is precisely—and by definition not only an approximation of—that which we call “protein folding”, isn’t it?
What do you think?
It’s #3. (B.Sc. in biochemistry, did my Ph.D. in proteomics.)
First, the set of polypeptide sequences that have a repeatable final conformation (and therefore “work” biologically) is tiny in comparison to the set of all possible sequences (of the 20-or-so naturally amino acid monomers). Pick a random sequence of reasonable length and make many copies and you get a gummy mess. The long slow grind of evolution has done the hard work of finding useful sequences.
Second, there is an entire class of proteins called chaperones) that assist macromolecular assembly, including protein folding. Even so, folding is a stochastic process, and a certain amount of newly synthesized proteins misfold. Some chaperones will then tag the misfolded protein with ubiquitin, which puts it on a path that ends in digestion by a proteasome.
Thank you, Cyan. It’s good to occasionally get someone into the debate who actually has a good understanding of the subject matter.
Aaronson used to blog about instances where people thought they found nature solving a hard problem very quickly, and usually there turns out to be a problem like the protein misfolding thing; the last instance I remember was soap films/bubbles perhaps solving NP problems by producing minimal Steiner trees, and Aaronson wound up experimenting with them himself. Fun stuff.
Apologies; looking back at my post, I wasn’t clear on 3.
Protein folding, as I understand it, is the process of finding a way to fold a given protein that globally minimizes some mathematical function. I’m not sure what that function is, but this is the definition that I used in my post.
Option 2 raises the possibility that globally minimizing that function is not NP-hard, but is merely misunderstood in some way.
Option 3 raises the possibility that proteins are not (in nature) finding a global minimum; rather, they are finding a local minimum through a less computationally intensive process. Furthermore, it may be that, for proteins which have certain limits on their structure and/or their initial conditions, that local minimum is the same as the global minimum; this may lead to natural selection favouring structures which use such ‘easy proteins’, leading to the incorrect impression that a general global minimum is being found (as opposed to a handy local minimum).
Yup.
No. Not 1. It would be front-page news all over the universe if it were 1.
NP hard problems are solvable (in the theoretical sense) by definition, the problem lies in their resource requirements (running time, for the usual complexity classes) as defined in relation to a UTM. (You know this, just establishing a basis.)
The assumption that the universe can be perfectly described by a computable model is satisfied just by a theoretical computational description existing, it says nothing about tractability (running times) and being able to submerge complexity classes in reality fluid or having some thoroughly defined correspondence (other than when we build hardware models ourselves, for which we define all the relevant parameters, e.g. CPU clock speed).
You may think along the lines of “if reality could (easily) solve NP hard problems for arbitrarily chosen and large inputs, we could mimick that approach and thus have a P=NP proving algorithm”, or something along those lines.
My difficulty is in how even to describe the “number of computational steps” that reality takes—do we measure it in relation to some computronium-hardware model, do we take it as discrete or continuous, what’s the sampling rate, picoseconds (as CCC said further down), Planck time intervals, or what?
In short, I have no idea about the actual computing power in terms of resource requirements of the underlying reality fluid, and thus can’t match it against UTMs in order to compare running times. Maybe you can give me some pointers.
Kawoomba, there is no known case of any NP-hard or NP-complete solution which physics finds.
In the case of proteins, if finding the lowest energy minimum of an arbitrary protein is NP-hard, then what this means in practice is that some proteins will fold up into non-lowest-energy configurations. There is no known case of a quantum process which finds an NP-hard solution to anything, including an energy minimum; on our present understanding of complexity theory and quantum mechanics ‘quantum solvable’ is still looking like a smaller class than ‘NP solvable’. Read Scott Aaronson for more.
One example here is the Steiner tree problem, which is NP-complete and can sort of be solved using soap films. Bringsjord and Taylor claimed this implies that P = NP. Scott Aaronson did some experimentation and found that soap films 1) can get stuck at local minima and 2) might take a long time to settle into a good configuration.
Heh. I remember that one, and thinking, “No… no, you can’t possibly do that using a soap bubble, that’s not even quantum and you can’t do that in classical, how would the soap molecules know where to move?”
Well. I mean, it’s quantum. But the ground state is a lump of iron, or maybe a black hole, not a low-energy soap film, so I don’t think waiting for quantum annealing will help.
waves soap-covered wire so it settles into low-energy minimum
dies as it turns into iron
I also seem to recall Penrose hypothesizing something about quasicrystals, though he does have an axe to grind so I’m quite sceptical.
I saw someone do the experiment once (school science project). Soap bubbles are pretty good at solving three- and four-element cases, as long as you make sure that all the points are actually connected.
I don’t think that three- and four-element cases have local minima, do they? That avoids (1) and a bit of gentle shaking can help speed up (2).
Yup.
Probably the best way is to simply define a “step” in some easily measurable way, and then sit there with a stopwatch and try a few experiments. (For protein folding, the ‘stopwatch’ may need to be a fairly sophisticated piece of scientific timing instrumentation, of course, and observing the protein as it folds is rather tricky).
Another way is to take advantage of the universal speed limit to get a theoretical upper bound to the speed that reality runs at; assume that the protein molecule folds in a brute-force search pattern that never ends until it hits the right state, and assume that at any point in that process, the fastest-moving part of the molecule moves at the speed of light (it won’t have to move far, which helps) and that the sudden, intense acceleration doesn’t hurt the molecule. It’s pretty certain to be slower than that, so if this calculation says it takes longer than an hour, then it’s pretty clear that the brute force approach is not what the protein is using.
What exactly am I missing in this argument? Evolution is perfectly capable of brute-force solutions. That’s pretty much what it’s best at.
The brute-force solution, if sampling conformations at picosecond rates, has been estimated to require a time longer than the age of the universe to fold certain proteins. Yet proteins fold on a millisecond scale or faster.
See: Levinthal’s paradox.
That requires that the proteins fold more or less randomly, and that the brute-force algorithm is in the -folding-, rather than the development of mechanisms which force certain foldings.
In order for the problem to hold, one of three things has to hold true: 1.) The proteins fold randomly (evidence suggests otherwise, as mentioned in the wikipedia link) 2.) Only a tiny subset of possible forced foldings are useful (that is, if there are a billion different ways for protein to be forced to fold in a particular manner, only one of them does what the body needs them to do) - AND anthropic reasoning isn’t valid (that is, we can’t say that our existence requires that evolution solved this nearly-impossible-to-arrive-at-through-random-processes) 3.) The majority of possible forced holdings are incompatible (that is, if protein A folds one way, then protein B -must- fold in a particular manner, or life isn’t possible) - AND anthropic reasoning isn’t valid
ETA: If anthropic reasoning is valid AND either 2 or 3 hold otherwise, it suggests our existence was considerably less likely than we might otherwise expect.
Ah. I apologise for having misunderstood you.
In that case, yes, the mechanisms for the folding may very well have developed by a brute-force type algorithm, for all I know. (Which, on this topic, isn’t all that much) But… what are those mechanisms?
Google has pointed me to an article describing an algorithm that can apparently predict folded protein shapes pretty quickly (a few minutes on a single laptop).
Original paper here. From a quick glance, it looks like it’s only effective for certain types of protein chains.
That too. Even NP-hard problems are often easy if you get the choice of which one to solve.