Finding new useful proteins tends to occur via point mutations so that can’t be NP-complete either.
This does not follow. It may be that finding new useful proteins just takes a very long time and is very inefficient. The rest of your comment seems correct though.
Actually, even that understates the argument. If you can take a 20,000 base sequence and get something useful by point-perturbing at least one of the 20,000 bases in the sequence, then whatever just happened was 50 lightyears from being NP-hard—you only had to search through 19 variants on each of 20,000 cases.
Huh? How does this argument work? That doesn’t mean that evolution can’t happen in real life, it would be a reason to think that evolution is very slow (which it is!) or that evolution is missing a lot of interesting proteins (which seems plausible).
If you can take a 20,000 base sequence and get something useful by point-perturbing at least one of the 20,000 bases in the sequence, then whatever just happened was 50 lightyears from being NP-hard—you only had to search through 19 variants on each of 20,000 cases.
I’m not sure I follow your logic. Are you arguing that because log 20,000 <19? Yes, you can check every possible position in a base sequence this way, but there are still a lot more proteins than those 19. One doesn’t get something special from just changing a specific base. Moreover, even if something interesting does happen for changing a specific one, it might not happen if one changes some other one.
missing a lot of interesting proteins (which seems plausible).
Definitely, since evolution keeps introducing new interesting proteins.
That doesn’t mean that evolution can’t happen in real life, it would be a reason to think that evolution is very slow (which it is!)
But it’s not slow on a scale of e^n for even modestly large n. If you can produce millions of proteins with hundreds to thousands of amino acids in a few billion years, then approximate search for useful proteins is not inefficient like finding the lowest-energy conformation is (maybe polynomial approximation, or the base is much better, or functional chunking lets you effectively reduce n greatly...).
[...evolution is] missing a lot of interesting proteins (which seems plausible).
Definitely, since evolution keeps introducing new interesting proteins.
Wait, the fact the evolution is often introducing a interesting new proteins is evidence that evolution is missing a lot of interesting proteins? How does that follow?
Switch the scenario around: if evolution never produced interesting new proteins (anymore, after time T), would that be evidence that there are no other interesting proteins than what evolution produced?
Switch the scenario around: if evolution never produced interesting new proteins (anymore, after time T), would that be evidence that there are no other interesting proteins than what evolution produced?
Yes.
That would be evidence that the supply of interesting proteins had been exhausted, just as computer performance at tic-tac-toe and checkers has stopped improving. I don’t see where you’re coming from here.
Because evolution can’t get stuck in the domain of attraction of a local optimum? It always finds any good points?
Edit to add: Intelligent humans can quickly refactor their programs out of poor regions of designspace. Evolution must grope within its neighborhood.
2nd Edit: How about this argument:
“Evolution has stopped producing interesting new ways of flying; therefore, there are probably no other interesting ways of accomplishing flight, since after all, if there were a good way of doing it, evolution would find it.”
Point mutations aren’t the only way for new things to be produced. You can also recombine large chunks and domains together from multiple previous genes.
Hell, there are even examples of genes evolving via a frame-shift that knocks the 3-base frame of a gene off by one producing a gobbeldygook protein that selection then acts upon...
Yes, we can be definitely confident that there are more interesting proteins in the vicinity because of continuing production. We have less evidence about more distant extrapolations, although they could exist too.
It’s just that, from the context, you seemed to be making a claim about evolution’s ability to find all cool proteins, rather than just the ones within organisms’ local search neighborhood (which would thus be within evolution’s reach).
Hence why you appeared, from my reading, to be making the common mistake of attributing intelligence (and global search capabilities) to evolution, which it definitely does not have.
This insinuation was compounded by your comparison to human-intelligence-designed game algorithms, further making it sound like you attributed excessive search capability to evolution.
(And I’m a little scared, to be honest, that the linked comment got several upvotes.)
If you actually recognize the different search capabilities of evolution version more intelligent algorithms, I suggest you retract, or significantly revise, the linked comment.
20 standard amino acids, so 19 * 20,000 one-amino-acid-changed variants. If you can find something by searching 380,000 cases, it wasn’t an NP-hard problem.
EDIT: Actually, since I originally said 20,000 bases, it should just be 3 variants per base (4 standard DNA bases). I don’t know if there’s any significant 20,000-unit peptides.
20 amino acids, so 19 * 20,000 one-amino-acid-changed variants.
Ok. Got it.
If you can find something by searching 380,000 cases, it wasn’t an NP-hard problem.
This does not follow. Instances of NP-hard problems can be quite easy. For example, consider the traveling salesman with 2 cities or SAT with 2 or 3 variables. These are easy enough to solve by hand. But the general problems are still NP-hard.
Note incidentally, that part of why proteins are difficult is that even when you do have all the variations, how they behave in the actual living thing is really complicated. You can’t easily grow thousands of mice each with each small protein difference and see what happens. So there are a lot of associated problems with brute forcing this sort of thing.
Sure, but standard padding arguments can make NP-hard problems with much larger initial easy instances. It may be that there’s some general heuristic that real life, non-constructed problems never act this way, but that’s a distinct claim than what Eliezer seems to be making here.
If you can find something by searching 380,000 cases, it wasn’t an NP-hard problem.
The “NP” terminology is typically not a reference to hard problems are on an absolute scale. It’s a about how the difficulty of the problem in a specified class changes as its scale increases. So, there’s no issue with evolution solving particular instances of problems from classes of problem that are NP-hard—and talk about solving a whole class of such problems would be silly—all such classes consist of infinite sets of problems.
This does not follow. It may be that finding new useful proteins just takes a very long time and is very inefficient. The rest of your comment seems correct though.
Then evolution wouldn’t happen in real life.
Actually, even that understates the argument. If you can take a 20,000 base sequence and get something useful by point-perturbing at least one of the 20,000 bases in the sequence, then whatever just happened was 50 lightyears from being NP-hard—you only had to search through 19 variants on each of 20,000 cases.
Huh? How does this argument work? That doesn’t mean that evolution can’t happen in real life, it would be a reason to think that evolution is very slow (which it is!) or that evolution is missing a lot of interesting proteins (which seems plausible).
I’m not sure I follow your logic. Are you arguing that because log 20,000 <19? Yes, you can check every possible position in a base sequence this way, but there are still a lot more proteins than those 19. One doesn’t get something special from just changing a specific base. Moreover, even if something interesting does happen for changing a specific one, it might not happen if one changes some other one.
Definitely, since evolution keeps introducing new interesting proteins.
But it’s not slow on a scale of e^n for even modestly large n. If you can produce millions of proteins with hundreds to thousands of amino acids in a few billion years, then approximate search for useful proteins is not inefficient like finding the lowest-energy conformation is (maybe polynomial approximation, or the base is much better, or functional chunking lets you effectively reduce n greatly...).
Wait, the fact the evolution is often introducing a interesting new proteins is evidence that evolution is missing a lot of interesting proteins? How does that follow?
Switch the scenario around: if evolution never produced interesting new proteins (anymore, after time T), would that be evidence that there are no other interesting proteins than what evolution produced?
Yes.
That would be evidence that the supply of interesting proteins had been exhausted, just as computer performance at tic-tac-toe and checkers has stopped improving. I don’t see where you’re coming from here.
Because evolution can’t get stuck in the domain of attraction of a local optimum? It always finds any good points?
Edit to add: Intelligent humans can quickly refactor their programs out of poor regions of designspace. Evolution must grope within its neighborhood.
2nd Edit: How about this argument:
“Evolution has stopped producing interesting new ways of flying; therefore, there are probably no other interesting ways of accomplishing flight, since after all, if there were a good way of doing it, evolution would find it.”
Point mutations aren’t the only way for new things to be produced. You can also recombine large chunks and domains together from multiple previous genes.
Hell, there are even examples of genes evolving via a frame-shift that knocks the 3-base frame of a gene off by one producing a gobbeldygook protein that selection then acts upon...
Carl wasn’t commenting on whether it would be very strong evidence but whether it would be evidence.
Yes, we can be definitely confident that there are more interesting proteins in the vicinity because of continuing production. We have less evidence about more distant extrapolations, although they could exist too.
That makes a lot more sense.
It’s just that, from the context, you seemed to be making a claim about evolution’s ability to find all cool proteins, rather than just the ones within organisms’ local search neighborhood (which would thus be within evolution’s reach).
Hence why you appeared, from my reading, to be making the common mistake of attributing intelligence (and global search capabilities) to evolution, which it definitely does not have.
This insinuation was compounded by your comparison to human-intelligence-designed game algorithms, further making it sound like you attributed excessive search capability to evolution.
(And I’m a little scared, to be honest, that the linked comment got several upvotes.)
If you actually recognize the different search capabilities of evolution version more intelligent algorithms, I suggest you retract, or significantly revise, the linked comment.
20 standard amino acids, so 19 * 20,000 one-amino-acid-changed variants. If you can find something by searching 380,000 cases, it wasn’t an NP-hard problem.
EDIT: Actually, since I originally said 20,000 bases, it should just be 3 variants per base (4 standard DNA bases). I don’t know if there’s any significant 20,000-unit peptides.
Ok. Got it.
This does not follow. Instances of NP-hard problems can be quite easy. For example, consider the traveling salesman with 2 cities or SAT with 2 or 3 variables. These are easy enough to solve by hand. But the general problems are still NP-hard.
Note incidentally, that part of why proteins are difficult is that even when you do have all the variations, how they behave in the actual living thing is really complicated. You can’t easily grow thousands of mice each with each small protein difference and see what happens. So there are a lot of associated problems with brute forcing this sort of thing.
Removed for duplication with comment elsewhere that discussed padding.
Sure, but standard padding arguments can make NP-hard problems with much larger initial easy instances. It may be that there’s some general heuristic that real life, non-constructed problems never act this way, but that’s a distinct claim than what Eliezer seems to be making here.
The “NP” terminology is typically not a reference to hard problems are on an absolute scale. It’s a about how the difficulty of the problem in a specified class changes as its scale increases. So, there’s no issue with evolution solving particular instances of problems from classes of problem that are NP-hard—and talk about solving a whole class of such problems would be silly—all such classes consist of infinite sets of problems.