Yes, this is a good point. That said, while protein folding had not been entirely solved yet, it had been greatly accelerated by projects such as FoldIt, which leverage multiple human minds working in parallel on the problem all over the world. Sure, we can’t get a perfect answer with such a distributed/human-powered approach, but a perfect answer isn’t really required in practice; all we need is an answer that has a sufficiently high chance of being correct.
If we assume that there’s nothing supernatural (or “emergent”) about human minds [1], then it is likely that the problem is at least tractable. Given the vast computational power of existing computers, it is likely that the AI would have access to at least as many computational resources as the sum of all the brains who are working on FoldIt. Given Moore’s Law, it is likely that the AI would soon surpass FoldIt, and will keep expanding its power exponentially, especially if the AI is able to recursively improve its own hardware (by using purely conventional means, at least initially).
[1] Which is an assumption that both my Nanodevil’s Advocate persona and I share.
Protein folding models are generally at least as bad as NP-hard, and some models may be worse. This means that exponential improvement is unlikely. Simply put, one probably gets diminishing marginal returns for how much one can computer further in terms of how much improvement one has already done.
If reality cannot solve NP-hard problems as easily as proteins are being folded, and yet proteins are getting folded, then that implies that one of the following must be true:
It turns out that reality can solve NP-hard problems after all
Protein folding is not an NP-hard problem (which implies that it is not properly understood)
Reality is not solving protein folding; it merely has a very good approximation that works on some but not necessarily all proteins (including most examples found in nature)
I am not familiar whether e.g. papers like these (“We show that the protein folding problem in the two-dimensional H-P model is NP-complete.”) accurately models what we’d call “protein folding” in nature (just because the same name is used), but prima facie there is no reason to doubt the applicability, at least for the time being. (This precludes 2.)
Regarding 3, I don’t think it would make sense to say “reality is using only a good approximation of protein folding, and by the way, we define protein folding as that which occurs in nature.” That which happens in reality is precisely—and by definition not only an approximation of—that which we call “protein folding”, isn’t it?
It’s #3. (B.Sc. in biochemistry, did my Ph.D. in proteomics.)
First, the set of polypeptide sequences that have a repeatable final conformation (and therefore “work” biologically) is tiny in comparison to the set of all possible sequences (of the 20-or-so naturally amino acid monomers). Pick a random sequence of reasonable length and make many copies and you get a gummy mess. The long slow grind of evolution has done the hard work of finding useful sequences.
Second, there is an entire class of proteins called chaperones) that assist macromolecular assembly, including protein folding. Even so, folding is a stochastic process, and a certain amount of newly synthesized proteins misfold. Some chaperones will then tag the misfolded protein with ubiquitin, which puts it on a path that ends in digestion by a proteasome.
Aaronson used to blog about instances where people thought they found nature solving a hard problem very quickly, and usually there turns out to be a problem like the protein misfolding thing; the last instance I remember was soap films/bubbles perhaps solving NP problems by producing minimal Steiner trees, and Aaronson wound up experimenting with them himself. Fun stuff.
Apologies; looking back at my post, I wasn’t clear on 3.
Protein folding, as I understand it, is the process of finding a way to fold a given protein that globally minimizes some mathematical function. I’m not sure what that function is, but this is the definition that I used in my post.
Option 2 raises the possibility that globally minimizing that function is not NP-hard, but is merely misunderstood in some way.
Option 3 raises the possibility that proteins are not (in nature) finding a global minimum; rather, they are finding a local minimum through a less computationally intensive process. Furthermore, it may be that, for proteins which have certain limits on their structure and/or their initial conditions, that local minimum is the same as the global minimum; this may lead to natural selection favouring structures which use such ‘easy proteins’, leading to the incorrect impression that a general global minimum is being found (as opposed to a handy local minimum).
this may lead to natural selection favouring structures which use such ‘easy proteins’, leading to the incorrect impression that a general global minimum is being found (as opposed to a handy local minimum).
NP hard problems are solvable (in the theoretical sense) by definition, the problem lies in their resource requirements (running time, for the usual complexity classes) as defined in relation to a UTM. (You know this, just establishing a basis.)
The assumption that the universe can be perfectly described by a computable model is satisfied just by a theoretical computational description existing, it says nothing about tractability (running times) and being able to submerge complexity classes in reality fluid or having some thoroughly defined correspondence (other than when we build hardware models ourselves, for which we define all the relevant parameters, e.g. CPU clock speed).
You may think along the lines of “if reality could (easily) solve NP hard problems for arbitrarily chosen and large inputs, we could mimick that approach and thus have a P=NP proving algorithm”, or something along those lines.
My difficulty is in how even to describe the “number of computational steps” that reality takes—do we measure it in relation to some computronium-hardware model, do we take it as discrete or continuous, what’s the sampling rate, picoseconds (as CCC said further down), Planck time intervals, or what?
In short, I have no idea about the actual computing power in terms of resource requirements of the underlying reality fluid, and thus can’t match it against UTMs in order to compare running times. Maybe you can give me some pointers.
Kawoomba, there is no known case of any NP-hard or NP-complete solution which physics finds.
In the case of proteins, if finding the lowest energy minimum of an arbitrary protein is NP-hard, then what this means in practice is that some proteins will fold up into non-lowest-energy configurations. There is no known case of a quantum process which finds an NP-hard solution to anything, including an energy minimum; on our present understanding of complexity theory and quantum mechanics ‘quantum solvable’ is still looking like a smaller class than ‘NP solvable’. Read Scott Aaronson for more.
One example here is the Steiner tree problem, which is NP-complete and can sort of be solved using soap films. Bringsjord and Taylor claimed this implies that P = NP. Scott Aaronson did some experimentation and found that soap films 1) can get stuck at local minima and 2) might take a long time to settle into a good configuration.
Heh. I remember that one, and thinking, “No… no, you can’t possibly do that using a soap bubble, that’s not even quantum and you can’t do that in classical, how would the soap molecules know where to move?”
Well. I mean, it’s quantum. But the ground state is a lump of iron, or maybe a black hole, not a low-energy soap film, so I don’t think waiting for quantum annealing will help.
I saw someone do the experiment once (school science project). Soap bubbles are pretty good at solving three- and four-element cases, as long as you make sure that all the points are actually connected.
I don’t think that three- and four-element cases have local minima, do they? That avoids (1) and a bit of gentle shaking can help speed up (2).
In the case of proteins, if finding the lowest energy minimum of an arbitrary protein is NP-hard, then what this means in practice is that some proteins will fold up into non-lowest-energy configurations.
My difficulty is in how even to describe the “number of computational steps” that reality takes
Probably the best way is to simply define a “step” in some easily measurable way, and then sit there with a stopwatch and try a few experiments. (For protein folding, the ‘stopwatch’ may need to be a fairly sophisticated piece of scientific timing instrumentation, of course, and observing the protein as it folds is rather tricky).
Another way is to take advantage of the universal speed limit to get a theoretical upper bound to the speed that reality runs at; assume that the protein molecule folds in a brute-force search pattern that never ends until it hits the right state, and assume that at any point in that process, the fastest-moving part of the molecule moves at the speed of light (it won’t have to move far, which helps) and that the sudden, intense acceleration doesn’t hurt the molecule. It’s pretty certain to be slower than that, so if this calculation says it takes longer than an hour, then it’s pretty clear that the brute force approach is not what the protein is using.
The brute-force solution, if sampling conformations at picosecond rates, has been estimated to require a time longer than the age of the universe to fold certain proteins. Yet proteins fold on a millisecond scale or faster.
That requires that the proteins fold more or less randomly, and that the brute-force algorithm is in the -folding-, rather than the development of mechanisms which force certain foldings.
In order for the problem to hold, one of three things has to hold true:
1.) The proteins fold randomly (evidence suggests otherwise, as mentioned in the wikipedia link)
2.) Only a tiny subset of possible forced foldings are useful (that is, if there are a billion different ways for protein to be forced to fold in a particular manner, only one of them does what the body needs them to do) - AND anthropic reasoning isn’t valid (that is, we can’t say that our existence requires that evolution solved this nearly-impossible-to-arrive-at-through-random-processes)
3.) The majority of possible forced holdings are incompatible (that is, if protein A folds one way, then protein B -must- fold in a particular manner, or life isn’t possible) - AND anthropic reasoning isn’t valid
ETA: If anthropic reasoning is valid AND either 2 or 3 hold otherwise, it suggests our existence was considerably less likely than we might otherwise expect.
That requires that the proteins fold more or less randomly, and that the brute-force algorithm is in the -folding-, rather than the development of mechanisms which force certain foldings.
Ah. I apologise for having misunderstood you.
In that case, yes, the mechanisms for the folding may very well have developed by a brute-force type algorithm, for all I know. (Which, on this topic, isn’t all that much) But… what are those mechanisms?
Google has pointed me to an article describing an algorithm that can apparently predict folded protein shapes pretty quickly (a few minutes on a single laptop).
Original paper here. From a quick glance, it looks like it’s only effective for certain types of protein chains.
Actually, as someone with background in Biology I can tell you that this is not a problem you want to approach atoms-up. It’s been tried, and our computational capabilities fell woefully short of succeeding.
I should explain what “woefully short” means, so that the answer won’t be “but can’t the AI apply more computational power than us?”. Yes, presumably it can. But the scales are immense. To explain it, I will need an analogy.
Not that long ago, I had the notion that chess could be fully solved; that is, that you could simply describe every legal position and every position possible to reach from it, without duplicates, so you could use that decision tree to play a perfect game. After all, I reasoned, it’s been done with checkers; surely it’s just a matter of getting our computational power just a little bit better, right?
First I found a clever way to minimize the amount of bits necessary to describe a board position. I think I hit 34 bytes per position or so, and I guess further optimization was possible. Then, I set out to calculate how many legal board positions there are.
I stopped trying to be accurate about it when it turned out that the answer was in the vicinity of 10^68, give or take a couple orders of magnitude. That’s about a billionth billionth of the TOTAL NUMBER OF ATOMS IN THE ENTIRE UNIVERSE. You would literally need more than our entire galaxy made into a huge database just to store the information, not to mention accessing it and computing on it.
So, not anytime soon.
Now, the problem with protein folding is, it’s even more complex than chess. At the atomic level, it’s incredibly more complex than chess. Our luck is, you don’t need to fully solve it; just like today’s computers can beat human chess players without spanning the whole planet. But they do it with heuristics, approximations, sometimes machine learning (though that just gives them more heuristics and approximations). We may one day be able to fold proteins, but we will do so by making assumptions and approximations, generating useful rules of thumb, not by modeling each atom.
Yes, I understand what “exponential complexity” means :-)
It sounds, then, like you’re on the side of kalla724 and myself (and against my Devil’s Advocate persona): the AI would not be able to develop nanotechnology (or any other world-shattering technology) without performing physical experiments out in meatspace. It could do so in theory, but in practice, the computational requirements are too high.
But this puts severe constraints on the speed with which the AI’s intelligence explosion could occur. Once it hits the limits of existing technology, it will have to take a long slog through empirical science, at human-grade speeds.
Actually, I don’t know that this means it has to perform physical experiments in order to develop nanotechnology. It is quite conceivable that all the necessary information is already out there, but we haven’t been able to connect all the dots just yet.
At some point the AI hits a wall in the knowledge it can gain without physical experiments, but there’s no good way to know how far ahead that wall is.
It is quite conceivable that all the necessary information is already out there, but we haven’t been able to connect all the dots just yet.
Wouldn’t this mean that creating fully functional self-replicating nanotechnology is just a matter of performing some thorough interdisciplinary studies (or meta-studies or whatever they are called) ? My impression was that there are currently several well-understood—yet unresolved—problems that prevent nanofactories from becoming a reality, though I could be wrong.
The way I see it, there’s no evidence that these problems require additional experimentation to resolve, rather than find an obscure piece of experimentation that has already taken place and whose relevance may not be immediately obvious.
Sure, that more experimentation is needed is probable; but by no means certain.
Thorough interdisciplinary studies may or may not lead to nanotechnology, but they’re fairly certain to lead to something new. While there are a fair number of (say) marine biologists out there, and a fair number of astronomers, there are probably rather few people who have expertise in both fields; and it’s possible that there exists some obscure unsolved problem in marine biology whose solution is obvious to someone who’s keeping up on the forefront of astronomy research. Or vice versa.
Or substitute in any other two fields of your choice.
First I found a clever way to minimize the amount of bits necessary to describe a board position. I think I hit 34 bytes per position or so, and I guess further optimization was possible.
Indeed, using a very straightforward Huffman encoding (1 bit for an for empty cell, 3 bits for pawns) you can get it down to 24 bytes for the board alone. Was an interesting puzzle.
Looking up “prior art” on the subject, you also need 2 bytes for things like “may castle”, and other more obscure rules.
There’s further optimizations you can do, but they are mostly for the average case, not the worst case.
It’s been tried, and our computational capabilities fell woefully short of succeeding.
Is that because we don’t have enough brute force, or because we don’t know what calculation to apply it to?
I would be unsurprised to learn that calculating the folding state having global minimum energy was NP-complete; but for that reason I would be surprised to learn that nature solves that problem, rather than finding a local minimum.
I don’t have a background in biology, but my impression from Wikipedia is that the tension between Anfinsen’s dogma and
Levinthal’s paradox is yet unresolved.
A-la Levinthal’s paradox, I can say that throwing a marble down a conical hollow at different angles and force can have literally trillions of possible trajectories; a-la Anfinsen’s dogma, that should not stop me from predicting that it will end up at the bottom of the cone; but I’d need to know the shape of the cone (or, more specifically, its point’s location) to determine exactly where that is—so being able to make the prediction once I know this is of no assistance for predicting the end position with a different, unknown cone.
Similarly, Eliezer is able to predict that a grandmaster chess player would be able to bring a board to a winning position against himself, even though he has no idea what moves that would entail or which of the many trillions of possible move sets the game would be comprised of.
Problems like this cannot be solved on brute force alone; you need to use attractors and heuristics to get where you want to get.
So yes, obviously nature stumbled into certain stable configurations which propelled it forward, rather than solve the problem and start designing away. But even if we can never have enough computing power to model each and every atom in each and every configuration, we might still get a good enough understanding of the general laws for designing proteins almost from scratch.
I would think it would be possible to cut the space of possible chess positions down quite a bit by only retaining those which can result from moves the AI would make, and legal moves an opponent could make in response. That is, when it becomes clear that a position is unwinnable, backtrack, and don’t keep full notes on why it’s unwinnable.
This is more or less what computers do today to win chess matches, but the space of possibilities explodes too fast; even the strongest computers can’t really keep track of more than I think 13 or 14 moves ahead, even given a long time to think.
Merely storing all the positions that are unwinnable—regardless of why they are so—would require more matter than we have in the solar system. Not to mention the efficiency of running a DB search on that...
Not to mention the efficiency of running a DB search on that...
Actually, with proper design, that can be made very quick and easy. You don’t need to store the positions; you just need to store the states (win:black, win:white, draw—two bits per state).
The trick is, you store each win/loss state in a memory address equal to the 34-byte (or however long) binary number that describes the position in question. Checking a given state is then simply a memory retrieval from a known address.
I suspect that with memory on the order of 10^70 bytes, that might involve additional complications; but you’re correct, normally this cancels out the complexity problem.
Merely storing all the positions that are unwinnable—regardless of why they are so—would require more matter than we have in the solar system. Not to mention the efficiency of running a DB search on that...
The storage space problem is insurmountable. However searching that kind of database would be extremely efficient (if the designer isn’t a moron). The search speed would have a lower bound of very close to (diameter of the sphere that can contain the database / c). Nothing more is required for search purposes than physically getting a signal to the relevant bit, and back, with only minor deviations from a straight line each way. And that is without even the most obvious optimisations.
If your chess opponent is willing to fly with you in a relativistic rocket and you only care about time elapsed from your own reference frame rather than the reference frame of the computer (or most anything else of note) you can even get down below that diameter / light speed limit, depending on your available fuel and the degree of accelleration you can survive.
Yes, this is a good point. That said, while protein folding had not been entirely solved yet, it had been greatly accelerated by projects such as FoldIt, which leverage multiple human minds working in parallel on the problem all over the world. Sure, we can’t get a perfect answer with such a distributed/human-powered approach, but a perfect answer isn’t really required in practice; all we need is an answer that has a sufficiently high chance of being correct.
If we assume that there’s nothing supernatural (or “emergent”) about human minds [1], then it is likely that the problem is at least tractable. Given the vast computational power of existing computers, it is likely that the AI would have access to at least as many computational resources as the sum of all the brains who are working on FoldIt. Given Moore’s Law, it is likely that the AI would soon surpass FoldIt, and will keep expanding its power exponentially, especially if the AI is able to recursively improve its own hardware (by using purely conventional means, at least initially).
[1] Which is an assumption that both my Nanodevil’s Advocate persona and I share.
Protein folding models are generally at least as bad as NP-hard, and some models may be worse. This means that exponential improvement is unlikely. Simply put, one probably gets diminishing marginal returns for how much one can computer further in terms of how much improvement one has already done.
Protein folding models must be inaccurate if they are NP-hard. Reality itself is not known to be able to solve NP-hard problems.
Yet the proteins are folding. Is that not “reality” solving the problem?
If reality cannot solve NP-hard problems as easily as proteins are being folded, and yet proteins are getting folded, then that implies that one of the following must be true:
It turns out that reality can solve NP-hard problems after all
Protein folding is not an NP-hard problem (which implies that it is not properly understood)
Reality is not solving protein folding; it merely has a very good approximation that works on some but not necessarily all proteins (including most examples found in nature)
Yes, and I’m leaning towards 1.
I am not familiar whether e.g. papers like these (“We show that the protein folding problem in the two-dimensional H-P model is NP-complete.”) accurately models what we’d call “protein folding” in nature (just because the same name is used), but prima facie there is no reason to doubt the applicability, at least for the time being. (This precludes 2.)
Regarding 3, I don’t think it would make sense to say “reality is using only a good approximation of protein folding, and by the way, we define protein folding as that which occurs in nature.” That which happens in reality is precisely—and by definition not only an approximation of—that which we call “protein folding”, isn’t it?
What do you think?
It’s #3. (B.Sc. in biochemistry, did my Ph.D. in proteomics.)
First, the set of polypeptide sequences that have a repeatable final conformation (and therefore “work” biologically) is tiny in comparison to the set of all possible sequences (of the 20-or-so naturally amino acid monomers). Pick a random sequence of reasonable length and make many copies and you get a gummy mess. The long slow grind of evolution has done the hard work of finding useful sequences.
Second, there is an entire class of proteins called chaperones) that assist macromolecular assembly, including protein folding. Even so, folding is a stochastic process, and a certain amount of newly synthesized proteins misfold. Some chaperones will then tag the misfolded protein with ubiquitin, which puts it on a path that ends in digestion by a proteasome.
Thank you, Cyan. It’s good to occasionally get someone into the debate who actually has a good understanding of the subject matter.
Aaronson used to blog about instances where people thought they found nature solving a hard problem very quickly, and usually there turns out to be a problem like the protein misfolding thing; the last instance I remember was soap films/bubbles perhaps solving NP problems by producing minimal Steiner trees, and Aaronson wound up experimenting with them himself. Fun stuff.
Apologies; looking back at my post, I wasn’t clear on 3.
Protein folding, as I understand it, is the process of finding a way to fold a given protein that globally minimizes some mathematical function. I’m not sure what that function is, but this is the definition that I used in my post.
Option 2 raises the possibility that globally minimizing that function is not NP-hard, but is merely misunderstood in some way.
Option 3 raises the possibility that proteins are not (in nature) finding a global minimum; rather, they are finding a local minimum through a less computationally intensive process. Furthermore, it may be that, for proteins which have certain limits on their structure and/or their initial conditions, that local minimum is the same as the global minimum; this may lead to natural selection favouring structures which use such ‘easy proteins’, leading to the incorrect impression that a general global minimum is being found (as opposed to a handy local minimum).
Yup.
No. Not 1. It would be front-page news all over the universe if it were 1.
NP hard problems are solvable (in the theoretical sense) by definition, the problem lies in their resource requirements (running time, for the usual complexity classes) as defined in relation to a UTM. (You know this, just establishing a basis.)
The assumption that the universe can be perfectly described by a computable model is satisfied just by a theoretical computational description existing, it says nothing about tractability (running times) and being able to submerge complexity classes in reality fluid or having some thoroughly defined correspondence (other than when we build hardware models ourselves, for which we define all the relevant parameters, e.g. CPU clock speed).
You may think along the lines of “if reality could (easily) solve NP hard problems for arbitrarily chosen and large inputs, we could mimick that approach and thus have a P=NP proving algorithm”, or something along those lines.
My difficulty is in how even to describe the “number of computational steps” that reality takes—do we measure it in relation to some computronium-hardware model, do we take it as discrete or continuous, what’s the sampling rate, picoseconds (as CCC said further down), Planck time intervals, or what?
In short, I have no idea about the actual computing power in terms of resource requirements of the underlying reality fluid, and thus can’t match it against UTMs in order to compare running times. Maybe you can give me some pointers.
Kawoomba, there is no known case of any NP-hard or NP-complete solution which physics finds.
In the case of proteins, if finding the lowest energy minimum of an arbitrary protein is NP-hard, then what this means in practice is that some proteins will fold up into non-lowest-energy configurations. There is no known case of a quantum process which finds an NP-hard solution to anything, including an energy minimum; on our present understanding of complexity theory and quantum mechanics ‘quantum solvable’ is still looking like a smaller class than ‘NP solvable’. Read Scott Aaronson for more.
One example here is the Steiner tree problem, which is NP-complete and can sort of be solved using soap films. Bringsjord and Taylor claimed this implies that P = NP. Scott Aaronson did some experimentation and found that soap films 1) can get stuck at local minima and 2) might take a long time to settle into a good configuration.
Heh. I remember that one, and thinking, “No… no, you can’t possibly do that using a soap bubble, that’s not even quantum and you can’t do that in classical, how would the soap molecules know where to move?”
Well. I mean, it’s quantum. But the ground state is a lump of iron, or maybe a black hole, not a low-energy soap film, so I don’t think waiting for quantum annealing will help.
waves soap-covered wire so it settles into low-energy minimum
dies as it turns into iron
I also seem to recall Penrose hypothesizing something about quasicrystals, though he does have an axe to grind so I’m quite sceptical.
I saw someone do the experiment once (school science project). Soap bubbles are pretty good at solving three- and four-element cases, as long as you make sure that all the points are actually connected.
I don’t think that three- and four-element cases have local minima, do they? That avoids (1) and a bit of gentle shaking can help speed up (2).
Yup.
Probably the best way is to simply define a “step” in some easily measurable way, and then sit there with a stopwatch and try a few experiments. (For protein folding, the ‘stopwatch’ may need to be a fairly sophisticated piece of scientific timing instrumentation, of course, and observing the protein as it folds is rather tricky).
Another way is to take advantage of the universal speed limit to get a theoretical upper bound to the speed that reality runs at; assume that the protein molecule folds in a brute-force search pattern that never ends until it hits the right state, and assume that at any point in that process, the fastest-moving part of the molecule moves at the speed of light (it won’t have to move far, which helps) and that the sudden, intense acceleration doesn’t hurt the molecule. It’s pretty certain to be slower than that, so if this calculation says it takes longer than an hour, then it’s pretty clear that the brute force approach is not what the protein is using.
What exactly am I missing in this argument? Evolution is perfectly capable of brute-force solutions. That’s pretty much what it’s best at.
The brute-force solution, if sampling conformations at picosecond rates, has been estimated to require a time longer than the age of the universe to fold certain proteins. Yet proteins fold on a millisecond scale or faster.
See: Levinthal’s paradox.
That requires that the proteins fold more or less randomly, and that the brute-force algorithm is in the -folding-, rather than the development of mechanisms which force certain foldings.
In order for the problem to hold, one of three things has to hold true: 1.) The proteins fold randomly (evidence suggests otherwise, as mentioned in the wikipedia link) 2.) Only a tiny subset of possible forced foldings are useful (that is, if there are a billion different ways for protein to be forced to fold in a particular manner, only one of them does what the body needs them to do) - AND anthropic reasoning isn’t valid (that is, we can’t say that our existence requires that evolution solved this nearly-impossible-to-arrive-at-through-random-processes) 3.) The majority of possible forced holdings are incompatible (that is, if protein A folds one way, then protein B -must- fold in a particular manner, or life isn’t possible) - AND anthropic reasoning isn’t valid
ETA: If anthropic reasoning is valid AND either 2 or 3 hold otherwise, it suggests our existence was considerably less likely than we might otherwise expect.
Ah. I apologise for having misunderstood you.
In that case, yes, the mechanisms for the folding may very well have developed by a brute-force type algorithm, for all I know. (Which, on this topic, isn’t all that much) But… what are those mechanisms?
Google has pointed me to an article describing an algorithm that can apparently predict folded protein shapes pretty quickly (a few minutes on a single laptop).
Original paper here. From a quick glance, it looks like it’s only effective for certain types of protein chains.
That too. Even NP-hard problems are often easy if you get the choice of which one to solve.
Hmm, ok, my Nanodevil’s Advocate persona doesn’t have a good answer to this one. Perhaps some SIAI folks would like to step in and pick up the slack ?
I’m afraid not.
Actually, as someone with background in Biology I can tell you that this is not a problem you want to approach atoms-up. It’s been tried, and our computational capabilities fell woefully short of succeeding.
I should explain what “woefully short” means, so that the answer won’t be “but can’t the AI apply more computational power than us?”. Yes, presumably it can. But the scales are immense. To explain it, I will need an analogy.
Not that long ago, I had the notion that chess could be fully solved; that is, that you could simply describe every legal position and every position possible to reach from it, without duplicates, so you could use that decision tree to play a perfect game. After all, I reasoned, it’s been done with checkers; surely it’s just a matter of getting our computational power just a little bit better, right?
First I found a clever way to minimize the amount of bits necessary to describe a board position. I think I hit 34 bytes per position or so, and I guess further optimization was possible. Then, I set out to calculate how many legal board positions there are.
I stopped trying to be accurate about it when it turned out that the answer was in the vicinity of 10^68, give or take a couple orders of magnitude. That’s about a billionth billionth of the TOTAL NUMBER OF ATOMS IN THE ENTIRE UNIVERSE. You would literally need more than our entire galaxy made into a huge database just to store the information, not to mention accessing it and computing on it.
So, not anytime soon.
Now, the problem with protein folding is, it’s even more complex than chess. At the atomic level, it’s incredibly more complex than chess. Our luck is, you don’t need to fully solve it; just like today’s computers can beat human chess players without spanning the whole planet. But they do it with heuristics, approximations, sometimes machine learning (though that just gives them more heuristics and approximations). We may one day be able to fold proteins, but we will do so by making assumptions and approximations, generating useful rules of thumb, not by modeling each atom.
Yes, I understand what “exponential complexity” means :-)
It sounds, then, like you’re on the side of kalla724 and myself (and against my Devil’s Advocate persona): the AI would not be able to develop nanotechnology (or any other world-shattering technology) without performing physical experiments out in meatspace. It could do so in theory, but in practice, the computational requirements are too high.
But this puts severe constraints on the speed with which the AI’s intelligence explosion could occur. Once it hits the limits of existing technology, it will have to take a long slog through empirical science, at human-grade speeds.
Actually, I don’t know that this means it has to perform physical experiments in order to develop nanotechnology. It is quite conceivable that all the necessary information is already out there, but we haven’t been able to connect all the dots just yet.
At some point the AI hits a wall in the knowledge it can gain without physical experiments, but there’s no good way to know how far ahead that wall is.
Wouldn’t this mean that creating fully functional self-replicating nanotechnology is just a matter of performing some thorough interdisciplinary studies (or meta-studies or whatever they are called) ? My impression was that there are currently several well-understood—yet unresolved—problems that prevent nanofactories from becoming a reality, though I could be wrong.
The way I see it, there’s no evidence that these problems require additional experimentation to resolve, rather than find an obscure piece of experimentation that has already taken place and whose relevance may not be immediately obvious.
Sure, that more experimentation is needed is probable; but by no means certain.
Thorough interdisciplinary studies may or may not lead to nanotechnology, but they’re fairly certain to lead to something new. While there are a fair number of (say) marine biologists out there, and a fair number of astronomers, there are probably rather few people who have expertise in both fields; and it’s possible that there exists some obscure unsolved problem in marine biology whose solution is obvious to someone who’s keeping up on the forefront of astronomy research. Or vice versa.
Or substitute in any other two fields of your choice.
Indeed, using a very straightforward Huffman encoding (1 bit for an for empty cell, 3 bits for pawns) you can get it down to 24 bytes for the board alone. Was an interesting puzzle.
Looking up “prior art” on the subject, you also need 2 bytes for things like “may castle”, and other more obscure rules.
There’s further optimizations you can do, but they are mostly for the average case, not the worst case.
I didn’t consider using 3 bits for pawns! Thanks for that :) I did account for such variables as may castle and whose turn it is.
Is that because we don’t have enough brute force, or because we don’t know what calculation to apply it to?
I would be unsurprised to learn that calculating the folding state having global minimum energy was NP-complete; but for that reason I would be surprised to learn that nature solves that problem, rather than finding a local minimum.
I don’t have a background in biology, but my impression from Wikipedia is that the tension between Anfinsen’s dogma and Levinthal’s paradox is yet unresolved.
The two are not in conflict.
A-la Levinthal’s paradox, I can say that throwing a marble down a conical hollow at different angles and force can have literally trillions of possible trajectories; a-la Anfinsen’s dogma, that should not stop me from predicting that it will end up at the bottom of the cone; but I’d need to know the shape of the cone (or, more specifically, its point’s location) to determine exactly where that is—so being able to make the prediction once I know this is of no assistance for predicting the end position with a different, unknown cone.
Similarly, Eliezer is able to predict that a grandmaster chess player would be able to bring a board to a winning position against himself, even though he has no idea what moves that would entail or which of the many trillions of possible move sets the game would be comprised of.
Problems like this cannot be solved on brute force alone; you need to use attractors and heuristics to get where you want to get.
So yes, obviously nature stumbled into certain stable configurations which propelled it forward, rather than solve the problem and start designing away. But even if we can never have enough computing power to model each and every atom in each and every configuration, we might still get a good enough understanding of the general laws for designing proteins almost from scratch.
I would think it would be possible to cut the space of possible chess positions down quite a bit by only retaining those which can result from moves the AI would make, and legal moves an opponent could make in response. That is, when it becomes clear that a position is unwinnable, backtrack, and don’t keep full notes on why it’s unwinnable.
This is more or less what computers do today to win chess matches, but the space of possibilities explodes too fast; even the strongest computers can’t really keep track of more than I think 13 or 14 moves ahead, even given a long time to think.
Merely storing all the positions that are unwinnable—regardless of why they are so—would require more matter than we have in the solar system. Not to mention the efficiency of running a DB search on that...
Actually, with proper design, that can be made very quick and easy. You don’t need to store the positions; you just need to store the states (win:black, win:white, draw—two bits per state).
The trick is, you store each win/loss state in a memory address equal to the 34-byte (or however long) binary number that describes the position in question. Checking a given state is then simply a memory retrieval from a known address.
I suspect that with memory on the order of 10^70 bytes, that might involve additional complications; but you’re correct, normally this cancels out the complexity problem.
The storage space problem is insurmountable. However searching that kind of database would be extremely efficient (if the designer isn’t a moron). The search speed would have a lower bound of very close to (diameter of the sphere that can contain the database / c). Nothing more is required for search purposes than physically getting a signal to the relevant bit, and back, with only minor deviations from a straight line each way. And that is without even the most obvious optimisations.
If your chess opponent is willing to fly with you in a relativistic rocket and you only care about time elapsed from your own reference frame rather than the reference frame of the computer (or most anything else of note) you can even get down below that diameter / light speed limit, depending on your available fuel and the degree of accelleration you can survive.