I’m talking about the fact that humans can (and sometimes do) sort of optimize the universe. Like, you can reason about the way the universe is and decide to work on causing it to be in a certain state.
So people say they have general goals, but in reality they remain human beings with various tendencies, and continue to act according to those tendencies, and only support that general goal to the extent that it’s consistent with those other behaviors.
This could very well be the case, but humans still sometimes sort of optimize the universe. Like, I’m saying it’s at least possible to sort of optimize the universe in theory, and humans do this somewhat, not that humans directly use universe-optimizing to select their actions. If a way to write universe-optimizing AGIs exists, someone is likely to find it eventually.
I think it is perfectly possible to develop an AI intelligent enough to pass the Turing Test, but which still would not have anything (not even “passing the Turing Test”) as a general goal that would take over its behavior and make it conquer the world.
I agree with this. There are some difficulties with self-modification (as elaborated in my other comment), but it seems probable that this can be done.
And I would expect the first AIs to be of this kind by default, because of the difficulty of ensuring that the whole of the AI’s activity is ordered to one particular goal.
Seems pretty plausible. Obviously it depends on what you mean by “AI”; certainly, most modern-day AIs are this way. At the same time, this is definitely not a reason to not worry about AI risk, because (a) tool AIs could still “accidentally” optimize the universe depending on how search for self-modifications and other actions happens, and (b) we can’t bet on no one figuring out how to turn a superintelligent tool AI into a universe optimizer.
I do agree with a lot of what you say: it seems like a lot of people talk about AI risk in terms of universe-optimization, when we don’t even understand how to optimize functions over the universe given infinite computational power. I do think that non-universe-optimizing AIs are under-studied, that they are somewhat likely to be the first human-level AGIs, and that they will be extraordinarily useful for solving some FAI-related problems. But none of this makes the problems of AI risk go away.
Ok. I don’t think we are disagreeing here much, if at all. I’m not maintaining that there’s no risk from AI, just that the default original AI is likely not to be universe-optimizing in that way. When I said in the bet “without paying attention to Friendliness”, that did not mean without paying attention to risks, since of course programmers even now try to make their programs safe, but just that they would not try to program it to optimize everything for human goals.
Also, I don’t understand why so many people thought my side of the bet was a bad idea, when Eliezer is betting at odds of 100 to 1 against me, and in fact there are plenty of other ways I could win the bet, even if my whole theory is wrong. For example, it is not even specified in the bet that the AI has to be self-modifying, just superintelligent, so it could be that first a human-level AI is constructed, not superintelligent and not self-modifying, and then people build a superintelligence simply by adding on lots of hardware. In that case it is not clear at all that it would have any fast way to take over the world, even if it had the ability and desire to optimize the universe. First it would have to acquire the ability to self-modify, which perhaps it could do by convincing people to give it that ability or by taking other actions in the external world to take over first. But that could take a while, which would mean that I would still win the bet: we would still be around acting normally with a superintelligence in the world. Of course, winning the bet wouldn’t do me much good in that particular situation, but I’d still win. And that’s just one example; I can think of plenty of other ways I could win the bet even while being wrong in theory. I don’t see how anyone can reasonably think he’s 99% certain both that my theory is wrong and that none of these other things will happen.
Do you realize you failed to specify any of that? I feel I’m being slightly generous by interpreting “and the world doesn’t end” to mean a causal relationship, e.g. the existence of the first AGI has to inspire someone else to create a more dangerous version if the AI doesn’t do so itself. (Though I can’t pay if the world ends for some other reason, and I might die beforehand.) Of course, you might persuade whatever judge we agree on to rule in your favor before I would consider the question settled.
(In case it’s not clear, the comment I just linked comes from 2010 or thereabouts. This is not a worry I made up on the spot.)
Given the fact that the bet is 100 to 1 in my favor, I would be happy to let you judge the result yourself.
Or you could agree to whatever result Eliezer agrees with. However, with Eliezer the conditions are specified, and “the world doesn’t end” just means that we’re still alive with the artificial intelligence running for a week.