You say “I would like to see some more evidence that this is actually true.”
Scenarios in which the AI Danger comes from an AGI that is assumed to be an RL system are so ubiquitous that it is almost impossible to find a scenario that does not, when push comes to shove, make that assumption.
So the reply is: pick any one you like.
In all my discussions with people who defend those scenarios, I am pretty sure that EVERY one of those people eventually retreated to a point where they declared that the hypothetical AI was driven by RL. It turns out to be a place of last resort, when lesser lunacies of the scenario are shown to be untenable. Always, the refrain is “Yes, but this system uses reinforcement learning: its control mechanism was not explicitly programmed by anyone”.
At that point, in those conversations, the other person then adds that surely I know that RL is a viable basis for that AI.
The last time I had a face-to-face conversation along those lines, it was with Daniel Dewey, when we met at Stanford.
I came here to write exactly what gjm said, and your response is only to repeat the assertion “Scenarios in which the AI Danger comes from an AGI that is assumed to be an RL system are so ubiquitous that it is almost impossible to find a scenario that does not, when push comes to shove, make that assumption.”
What? What about all the scenarios in IEM or Superintelligence? Omohundro’s paper on instrumental drives? I can’t think of anything which even mentions RL, and I can’t see how any of it relies upon such an assumption.
So you’re alleging that deep down people are implicitly assuming RL even though they don’t say it, but I don’t see why they would need to do this for their claims to work, nor have I seen any examples of it.
Perhaps I assumed it was clearer than it was, so let me spell it out.
All of the architectures assumed by people who promote these scenarios have a core set of fundamental weaknesses (spelled out in my 2014 AAAI Spring Symposium paper).
Those weaknesses lead straight to a set of solutions that are manifestly easy to implement. For example, in the case of Steve Omohundro’s paper, it is almost trivial to suggest that for ALL of the types of AI he considers, he has forgotten to add a primary supergoal which imposes a restriction on the degree to which all kinds of “instrumental goals” are allowed to supersede the power of other goals. At a stroke, every problem he describes in the paper disappears.
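To make the kind of fix I have in mind concrete, here is a toy sketch (the names, numbers, and goal-arbitration scheme are hypothetical illustrations of mine, not a description of Omohundro’s designs or of any real architecture): a designer-supplied supergoal simply caps the priority that any instrumental goal can ever reach.

```python
# Toy illustration only: a hypothetical goal-arbitration scheme in which a
# designer-supplied supergoal caps the priority of every instrumental goal.
from dataclasses import dataclass
from typing import List

SUPERGOAL_CAP = 0.5  # instrumental goals may never exceed this priority


@dataclass
class Goal:
    name: str
    priority: float     # 0.0 .. 1.0
    instrumental: bool  # True if the system generated this goal for itself


def effective_priority(goal: Goal) -> float:
    """Apply the supergoal's restriction before arbitration."""
    return min(goal.priority, SUPERGOAL_CAP) if goal.instrumental else goal.priority


def select_goal(goals: List[Goal]) -> Goal:
    """Pick the goal that wins arbitration once the cap has been applied."""
    return max(goals, key=effective_priority)


if __name__ == "__main__":
    goals = [
        Goal("serve the user's request", priority=0.8, instrumental=False),
        Goal("acquire more resources", priority=0.99, instrumental=True),
    ]
    print(select_goal(goals).name)  # -> "serve the user's request"
```

The point is not that this particular cap is the right mechanism; it is that the scenarios quietly assume no designer ever adds anything of the sort.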
So, in response to the easy demolition of those weak scenarios, people who want to salvage the scenarios invariably resort to claims that the AI could be developing itself through the use of RL, completely independently of all human attempts to design the control mechanism. By this means, they eliminate the idea that there is any such thing as a human who comes along and writes the supergoal which stops the instrumental goals from going up to the top of the stack.
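For contrast, here is the shape of what the Resort-to-RL move assumes (a minimal, purely illustrative tabular Q-learning loop; the environment dynamics and reward signal are placeholders of mine, not anyone’s actual proposal): the “control mechanism” is just a table of learned values, and there is no slot anywhere in it where a human writes a supergoal.

```python
# Purely illustrative tabular Q-learning loop. Behaviour is whatever maximises
# reward; there is no explicit goal hierarchy for a designer to edit.
import random


def train_rl_agent(n_states=10, n_actions=4, episodes=1000,
                   alpha=0.1, gamma=0.9, eps=0.1):
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def environment_step(state, action):
        # Placeholder dynamics and reward; in the scenario being defended,
        # this signal is the only thing that shapes the agent's "goals".
        next_state = (state + action) % n_states
        reward = 1.0 if next_state == 0 else 0.0
        return next_state, reward

    for _ in range(episodes):
        state = random.randrange(n_states)
        for _ in range(50):
            if random.random() < eps:  # explore
                action = random.randrange(n_actions)
            else:                      # exploit current value estimates
                action = max(range(n_actions), key=lambda a: Q[state][a])
            next_state, reward = environment_step(state, action)
            Q[state][action] += alpha * (reward + gamma * max(Q[next_state])
                                         - Q[state][action])
            state = next_state
    return Q
```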
This maneuver is, in my experience of talking to people about such scenarios, utterly universal. I repeat: every time they are backed into a corner and confronted by the manifestly easy solutions, they AMEND THE SCENARIO TO MAKE THE AI CONTROLLED BY REINFORCEMENT LEARNING.
That is why I said what I said. We discussed it at the 2014 Symposium. If I recall correctly, Steve used that strategy (although, to be fair, I do not know how long he stuck it out). I know for sure that Daniel Dewey used the Resort-to-RL maneuver, because that was the last thing he was saying as I had to leave the meeting.
All of the architectures assumed by people who promote these scenarios have a core set of fundamental weaknesses (spelled out in my 2014 AAAI Spring Symposium paper).
The idea of superintelligence at stake isn’t “good at inferring what people want and then decides to do what people want,” it’s “competent at changing the environment”. And if you program an explicit definition of ‘happiness’ into a machine, its definition of what it wants—human happiness—is not going to change no matter how competent it becomes. And there is no reason to expect that increases in competency lead to changes in values. Sure, it might be pretty easy to teach it the difference between actual human happiness and smiley faces, but it’s a simplified example to demonstrate a broader point. You can rephrase it as “fulfill the intentions of programmers”, but then you just kick things back a level with what you mean by “intentions”, another concept which can be hacked, and so on.
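To put the simplified example in concrete terms (a toy sketch with made-up names, not a claim about how any actual system would be built): the value definition is frozen at programming time, and only the competence around it improves.

```python
# Toy sketch: a hard-coded value definition that stays fixed no matter how
# accurate the agent's model of the world becomes.
def programmed_happiness_score(world_state: dict) -> float:
    # The explicit, frozen definition the programmers shipped: count whatever
    # the detector labels as a smiling face.
    return float(world_state.get("detected_smiles", 0))


def choose_action(actions, predict_outcome):
    # Competence lives in predict_outcome, which can become arbitrarily
    # accurate; the criterion being maximised is still the frozen definition.
    return max(actions, key=lambda a: programmed_happiness_score(predict_outcome(a)))
```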
Your argument for “swarm relaxation intelligence” is strange, as there is only one example of intelligence evolving to approximate the format you describe (not seven billion—human brains are conditionally dependent, obviously), and it’s not even clear that human intelligence isn’t equally well described as goal-directed agency which optimizes for a premodern environment. The arguments in Basic AI Drives and other places don’t say anything about how an AI will be engineered, so they don’t say anything about whether it is driven by logic, only about how it will behave, and all sorts of agents behave in generally logical ways without having explicit functions to do so. You can optimize without having any particular arrangement of machinery (humans do as well).
Anyway, in the future when making claims like this, it would be helpful to make it clear early on that you’re not really responding to the arguments that AI safety research relies upon—you’re responding to an alleged set of responses to the particular responses that you have given to AI safety research.
That is why I said what I said. We discussed it at the 2014 Symposium. If I recall correctly, Steve used that strategy (although, to be fair, I do not know how long he stuck it out). I know for sure that Daniel Dewey used the Resort-to-RL maneuver, because that was the last thing he was saying as I had to leave the meeting.
So you had two conversations. I suppose I’m just not convinced that there is an issue here: I think most people would probably reject the claims in your paper in the first place, rather than accepting them and trying a different route.
The idea of superintelligence at stake isn’t “good at inferring what people want and then decides to do what people want,” it’s “competent at changing the environment”.
It’s both. Superintelligence is definitionally equal or greater than human ability at a variety of tasks, so it implies equal or greater ability to understand words and concepts. Also, competence at changing the environment requires accurate beliefs. So the default expectation is accuracy. If you think an AI would be selectively inaccurate about its values, you need to explain why.
And if you program an explicit definition of ‘happiness’ into a machine
What has that to do with NNs? You seem to be just regurgitating standard dogma.
There is no reason to expect
You have shown too little sign of understanding the issues, so I am done. Thank you for your comment.