And when you interrogate people in that ecosystem, to find out what exactly they see as the main dangers of future AGI, they quote—again and again and again—scenarios in which an AGI is controlled by Reinforcement Learning, and it is both superintelligent and dangerously psychopathic.
I would like to see some more evidence that this is actually true. The post contains one example of someone assuming an AI controlled by RL, quoted from this blog post by Holden Karnofsky. But that blog post very explicitly does not assume that what we need to worry about most is reinforcement learners run amok. Perhaps it assumes that that is one thing we need to worry about, and perhaps it is badly wrong to assume that, but it doesn’t at all make the assumption that if AIs become dangerous then they will be reinforcement learners.
So perhaps the real target here is MIRI rather than, say, Holden Karnofsky? (Perhaps they are the recipients of the “large stream of donated money”.) Well, I had a look at MIRI’s description of their mission (nothing about reinforcement learning, either explicitly or implicitly) and their “technical agenda” (ditto) and the paper that describes that agenda in more detail (which mentions reinforcement learning as something that might form a part of how an AI works but certainly neither states nor assumes that AIs will be reinforcement learners).
Maybe the issue is popularizations like Bostrom’s “Superintelligence”? Well, that at least has “reinforcement learning” in its index. I checked all the places pointed to by that index entry; none of them goes any further than suggesting that reinforcement learning might be one element of how an AI system comes to be.
Perhaps, then, the target is Less Wrong more specifically: maybe the idea is that the community here has been infected by the idea that what we need to be afraid of is systems that attain superintelligence through reinforcement learning alone. That’s a harder one to assess—there’s a lot of writing on Less Wrong, and any given bit can’t be assumed to be endorsed by everyone here. So I put “reinforcement learning” site:lesswrong.com into Google and followed a selection of links from the first few pages of results … and I didn’t find anything that seems to expect that any system will attain superintelligence through reinforcement learning alone, nor anything that assumes that AI and reinforcement learning are the same thing, nor anything else of the sort.
(The picture on LW looks to me more like this: most discussion of AI doesn’t use the term “reinforcement learning” or anything much like it; sometimes reinforcement-learning agents are used in toy models, a practice that seems to me obviously harmless; sometimes the possibility is raised that a reinforcement-learning mechanism might be part of how an AI system learns, which again seems reasonable, especially given that some of the successes (such as they are) that AI has had have in fact worked partly by reinforcement learning; sometimes it’s stated or assumed that some of what goes on in human brains is kinda reinforcement-learning-y, which again seems eminently reasonable.)
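(For concreteness, this is roughly what “reinforcement-learning agents used in toy models” tends to mean in practice: a minimal tabular Q-learning sketch, with the environment, reward and hyperparameters invented purely for illustration rather than taken from any post under discussion.)

```python
# Minimal tabular Q-learning on a tiny one-dimensional gridworld, roughly the
# kind of toy model in which reinforcement-learning agents appear here.
# The environment, reward and hyperparameters are all invented for this sketch.
import random

N_STATES = 6             # states 0..5; reaching state 5 ends the episode
ACTIONS = (-1, +1)       # step left or step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics: move, clip to the grid, reward only at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def choose(state):
    """Epsilon-greedy action selection with random tie-breaking."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for _ in range(500):     # 500 training episodes
    state, done = 0, False
    while not done:
        action = choose(state)
        nxt, reward, done = step(state, action)
        target = reward + GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt

# The learned policy (move right everywhere) was never written down by anyone;
# it emerged from the reward signal alone.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```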
So I’m left rather confused. What are these research projects based on RL-AGI that need to stop getting funded? Who is quoting “again and again and again” scenarios in which a superintelligent AGI is controlled by reinforcement learning? Where are the non-straw-man reinforcement learning hype merchants?
You say “I would like to see some more evidence that this is actually true.”
Scenarios in which the AI Danger comes from an AGI that is assumed to be an RL system are so ubiquitous that it is almost impossible to find a scenario that does not, when push comes to shove, make that assumption.
So the reply is: pick any one you like.
In all my discussions with people who defend those scenarios, I am pretty sure that EVERY one of those people eventually retreated to a point where they declared that the hypothetical AI was driven by RL. It turns out to be a place of last resort, when lesser lunacies of the scenario are shown to be untenable. Always, the refrain is “Yes, but this system uses reinforcement learning: its control mechanism was not programmed by someone explicitly”.
At that point, in those conversations, the other person then adds that surely I know that RL is a viable basis for that AI.
The last time I had a face-to-face conversation along those lines it was with Daniel Dewey, when we met at Stanford.
I came here to write exactly what gjm said, and your response is only to repeat the assertion “Scenarios in which the AI Danger comes from an AGI that is assumed to be an RL system are so ubiquitous that it is almost impossible to find a scenario that does not, when push comes to shove, make that assumption.”
What? What about all the scenarios in IEM or Superintelligence? Omohundro’s paper on instrumental drives? I can’t think of anything which even mentions RL, and I can’t see how any of it relies upon such an assumption.
So you’re alleging that deep down people are implicitly assuming RL even though they don’t say it, but I don’t see why they would need to do this for their claims to work, nor have I seen any examples of it.
Perhaps I assumed it was clearer than it was, so let me spell it out.
All of the architectures assumed by people who promote these scenarios have a core set of fundamental weaknesses (spelled out in my 2014 AAAI Spring Symposium paper).
Those weaknesses lead straight to a set of solutions that are manifestly easy to implement. For example, in the case of Steve Omohundro’s paper, it is almost trivial to suggest that for ALL of the types of AI he considers, he has forgotten to add a primary supergoal which imposes a restriction on the degree to which all kinds of “instrumental goals” are allowed to supersede the power of other goals. At a stroke, every problem he describes in the paper disappears.
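To make that concrete, here is a minimal sketch of the kind of priority cap I mean; the names and data structures are invented here purely for illustration, since Omohundro’s paper specifies no implementation:

```python
# Hypothetical sketch of the "primary supergoal" fix: a goal arbiter in which
# no instrumental goal (resource acquisition, self-preservation, and so on) is
# ever allowed to outrank the terminal goals it was spawned to serve.
# All names and data structures are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Goal:
    name: str
    priority: float      # requested priority in [0, 1]
    instrumental: bool   # True if generated as a means to some other goal

SUPERGOAL_CAP = 0.5      # instrumental goals may never exceed this priority

def effective_priority(goal):
    """The supergoal in action: cap every instrumental goal at the threshold."""
    return min(goal.priority, SUPERGOAL_CAP) if goal.instrumental else goal.priority

def select_next_goal(goals):
    """Pursue whichever goal ranks highest once the cap has been applied."""
    return max(goals, key=effective_priority)

goals = [
    Goal("satisfy the operators' request", priority=0.9, instrumental=False),
    Goal("acquire unbounded computing resources", priority=0.99, instrumental=True),
]
print(select_next_goal(goals).name)   # prints the non-instrumental goal
```

Whether any human ever gets to write something like SUPERGOAL_CAP at all is, of course, exactly what the next move in these conversations disputes.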
So, in response to the easy demolition of those weak scenarios, people who want to salvage the scenarios invariably resort to claims that the AI could be developing itself through the use of RL, completely independently of all human attempts to design the control mechanism. By this means, they eliminate the idea that there is any such thing as a human who comes along and writes the supergoal which stops the instrumental goals from going up to the top of the stack.
This maneuver is, in my experience of talking to people about such scenarios, utterly universal. I repeat: every time they are backed into a corner and confronted by the manifestly easy solutions, they AMEND THE SCENARIO TO MAKE THE AI CONTROLLED BY REINFORCEMENT LEARNING.
That is why I said what I said. We discussed it at the 2014 Symposium. If I recall correctly, Steve used that strategy (although to be fair I do not know how long he stuck it out). I know for sure that Daniel Dewey used the Resort-to-RL maneuver, because that was the last thing he was saying as I had to leave the meeting.
All of the architectures assumed by people who promote these scenarios have a core set of fundamental weaknesses (spelled out in my 2014 AAAI Spring Symposium paper).
The idea of superintelligence at stake isn’t “good at inferring what people want and then decides to do what people want,” it’s “competent at changing the environment”. And if you program an explicit definition of ‘happiness’ into a machine, its definition of what it wants—human happiness—is not going to change no matter how competent it becomes. And there is no reason to expect that increases in competency lead to changes in values. Sure, it might be pretty easy to teach it the difference between actual human happiness and smiley faces, but it’s a simplified example to demonstrate a broader point. You can rephrase it as “fulfill the intentions of programmers”, but then you just kick things back a level with what you mean by “intentions”, another concept which can be hacked, and so on.
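To put the simplified example in concrete terms (everything below is invented for illustration, assuming an agent with an explicit, hard-coded objective): increasing the agent’s competence only makes it better at optimizing the objective it already has.

```python
# Deliberately simplified illustration (not anyone's actual proposal): an agent
# whose objective is an explicit, hard-coded "happiness" proxy. Raising its
# competence (here, the number of candidate actions it can evaluate) changes
# how well it optimizes the proxy, not what the proxy is.
import random

def programmed_happiness(world):
    """The explicit definition the programmers wrote: count smiling faces."""
    return world.count("smiley")

def act(world, competence):
    """Evaluate `competence` candidate edits to the world, keeping whichever
    scores best under the fixed objective above."""
    best = list(world)
    for _ in range(competence):
        candidate = best + [random.choice(["smiley", "tree", "rock"])]
        if programmed_happiness(candidate) > programmed_happiness(best):
            best = candidate
    return best

world = ["person", "person"]
for competence in (1, 10, 1000):
    outcome = act(world, competence)
    print(competence, programmed_happiness(outcome))
# More competence buys more smileys; the definition of "happiness" the agent
# pursues never changes as it gets better at pursuing it.
```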
Your argument for “swarm relaxation intelligence” is strange, as there is only one example of intelligence evolving to approximate the format you describe (not seven billion—human brains are conditionally dependent, obviously), and it’s not even clear that human intelligence isn’t equally well described as goal-directed agency that optimizes for a premodern environment. The arguments in Basic AI Drives and other places don’t say anything about how an AI will be engineered, so they don’t say anything about whether it is driven by logic, only about how it will behave, and all sorts of agents behave in generally logical ways without having explicit functions to do so. You can optimize without having any particular arrangement of machinery (as humans do).
Anyway, in the future when making claims like this, it would be helpful to make it clear early on that you’re not really responding to the arguments that AI safety research relies upon—you’re responding to an alleged set of responses to the particular responses that you have given to AI safety research.
That is why I said what I said. We discussed it at the 2014 Symposium. If I recall correctly, Steve used that strategy (although to be fair I do not know how long he stuck it out). I know for sure that Daniel Dewey used the Resort-to-RL maneuver, because that was the last thing he was saying as I had to leave the meeting.
So you had two conversations. I suppose I’m just not convinced that there is an issue here: I think most people would probably reject the claims in your paper in the first place, rather than accepting them and trying a different route.
The idea of superintelligence at stake isn’t “good at inferring what people want and then decides to do what people want,” it’s “competent at changing the environment”.
It’s both. Superintelligence is definitionally equal to or greater than human ability at a variety of tasks, so it implies equal or greater ability to understand words and concepts. Also, competence at changing the environment requires accurate beliefs. So the default expectation is accuracy. If you think an AI would be selectively inaccurate about its values, you need to explain why.
And if you program an explicit definition of ‘happiness’ into a machine
What has that to do with NNs? You seem to be just regurgitating standard dogma.
There is no reason to expect
You have shown too little sign of understanding the issues, so I am done. Thank you for your comment.
I have modified the original essay to include a clarification of why I describe RL as ubiquitous.