Thanks for sharing! I hope this post doesn’t split the conversation into too many directions for you (Luke and Ben) to respond to, and that all commenters will do their best to be polite, address issues directly, and clearly label what comes from intuition and what is shown by argument.
Ben wrote:
Steve’s argument for the inevitability of these drives in AIs is based on evolutionary ideas, and would seem to hold up in the case that there is a population of distinct AIs competing for resources—but the argument seems to fall apart in the case of other possibilities like an AGI mindplex.
(For reference, we’re talking about this paper. The AI drives it lists are: (1) AIs will want to self-improve, (2) AIs will want to be rational, (3) AIs will try to preserve their utility functions, (4) AIs will try to prevent counterfeit utility, (5) AIs will be self-protective, and (6) AIs will want to acquire resources and use them efficiently.)
I don’t think it’s true that this depends on evolutionary ideas. Rather, these all seem to follow from the definitions of intelligence and goals. Consider the first drive, self-improvement. Whatever goal(s) the AI has, it knows that in the future it’ll be trying to make those goals happen, and that those future attempts will be more effective if it’s smarter. That’s barely more than a tautology, but it’s enough to show that it’ll think self-improvement is good. Now, it might be that making itself smarter is too difficult, slow, or expensive, in which case self-improvement will be seen as good, but not prioritized.
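To spell out that step, here is a toy expected-utility comparison. It is only a sketch: the goal value, probabilities, and costs are made-up illustrative numbers, not anything from Omohundro’s paper.

```python
# Toy model: an agent deciding whether to spend resources on self-improvement.
# All numbers here are made-up illustrative assumptions, not claims about real systems.

def expected_value(p_success, goal_value, cost):
    """Expected utility of a plan: chance of achieving the goal times its value, minus the plan's cost."""
    return p_success * goal_value - cost

goal_value = 100.0           # how much the agent values achieving its goal
p_now = 0.2                  # chance of success if it acts at its current capability
p_after_improvement = 0.6    # chance of success if it first becomes smarter

for improvement_cost in (5.0, 50.0):
    act_now = expected_value(p_now, goal_value, cost=0.0)
    improve_first = expected_value(p_after_improvement, goal_value, cost=improvement_cost)
    better = "improve first" if improve_first > act_now else "act now"
    print(f"improvement cost {improvement_cost}: act now = {act_now}, improve first = {improve_first} -> {better}")

# With a cheap improvement (cost 5) the agent prefers to self-improve first;
# with an expensive one (cost 50) self-improvement still looks good in principle
# but is not the best use of resources -- the "good, but not prioritized" case.
```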
That part of the argument is pretty rigorous. The intuitive continuation is that I think AIs will find self-improvement to be cheap and easy, at least until they’re well above human level. That part depends on what sort of refinements to the algorithm are available after it has demonstrated human-level intelligence, which depends on how deep into the pool of possible refinements the human developers got, and how many new ideas are suggested by the proof of concept. It also depends on how useful additional hardware is, whether Moore’s Law is still going, and whether the AI’s algorithms can benefit from moving along the CPU->GPU->ASIC axis.
The second drive (AIs will want to be rational) is subject to the same debate about whether humans want to be rational that Ben and Luke had earlier. My perspective is that all the cases where rationality appears to fail are cases where it’s being misapplied; and while there are quite a few different definitions of “rationality” in play, the one Omohundro and Eliezer use is more or less that “rationality is whatever wins”. Under that definition, the claim that AIs will want to be rational is tautological. The tricky part, and I think this is where your disagreement comes from, is that rationality is also sometimes defined to mean “using symbolic reasoning instead of intuition”, and it is sometimes claimed that these two definitions are the same (i.e., it is claimed that symbolic reasoning wins and intuition doesn’t). My own perspective is that an AI will probably need something analogous to intuition.
Similar arguments apply to the other AI drives, but this comment is getting long. Getting back to the idea that an evolutionary context with competing AIs would matter—if some AIs had Omohundro’s drives and some didn’t, then competition would filter out all the ones that didn’t. But the argument here is that all AIs that can reason about goals will either have, or choose to give themselves, those same drives. That would mean no evolutionary filter is necessary.
Well, unless their values say to do otherwise...
Yes—Ben is not correct about this—Universal Instrumental Values are not a product of evolution.
Do we have a guarantee that AIs will want to win?
“Winning” refers to achieving whatever ends the AI wants. If the AI does not want anything, there is nothing for it to succeed at, and it is therefore not intelligent.
If you create a bunch of sufficiently powerful AIs then whichever one is left after a few years is the one which wanted to win.
Not quite. Notice that the word “win” here is mapping onto a lot of different meanings: the one used in the grandparent and great-grandparent (unless I misunderstood it) is “the satisfaction of goals.” What one means by “goals” is not entirely clear: if I build a bacterium whose operation results in the construction of more bacteria, is it appropriate to claim it has “goals” in the same sense that a human has “goals”? A readily visible difference is that the human’s goals are accessible to introspection, whereas the bacterium’s aren’t, and whether or not that difference is material depends on what you want to use the word “goals” for.
The meaning for “win” that I’m inferring from the parent is “dominate,” which is different from “has goals and uses reason to perform better at fulfilling those goals.” One can imagine a setup in which an AI without explicit goals can defeat an AI with explicit goals. (The tautology is preserved because one can say afterwards that it was clearly irrational to have explicit goals, but I mostly wanted to point out another wrinkle that should be considered rather than knock down the tautology.)
Right: what I said isn’t true under all circumstances, and there are certainly criteria for “winning” other than domination.
What I meant was that as soon as you introduce an AI that has domination as a goal or subgoal into the system, it will tend to wipe out any other AIs that don’t have some kind of drive to win. If an AI can be persuaded to be indifferent about the future, then the dominating AI can choose to exploit that.
We have a guarantee that that universal claim is not true :P But it seems like a reasonable property to expect of an AI built by humans.
Whatever goal(s) the AI has, it knows that in the future it’ll be trying to make those goals happen, and that those future attempts will be more effective if it’s smarter.
This isn’t true. You can adjust the strength of modern chess software. There are many reasons why an AI is not going to attempt to become as intelligent as possible. But the most important reason is that it won’t care if you don’t make it care.
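To make the chess point concrete, here is a minimal sketch of capping an engine’s playing strength. It assumes the python-chess library and a locally installed Stockfish binary (the path and the skill value below are placeholders), and uses Stockfish’s own “Skill Level” UCI option.

```python
# Sketch: deliberately capping a chess engine's playing strength.
# Assumes the python-chess library is installed and a Stockfish binary exists
# at the path below (both the path and the skill value are placeholders).
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("/usr/bin/stockfish")
engine.configure({"Skill Level": 3})   # Stockfish's UCI option: 0 (weakest) to 20 (strongest)

board = chess.Board()
result = engine.play(board, chess.engine.Limit(time=0.1))
print("Move chosen at reduced strength:", result.move)

engine.quit()
```

The point is only that nothing forces the engine to play as well as it can; its strength is whatever its operators configure it to be.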
The intuitive continuation is that I think AIs will find self-improvement to be cheap and easy, at least until they’re well above human level.
I am seriously unable to see how anyone could come to believe this.
You are confused and uninformed. Please read up on instrumental values.