All of the architectures assumed by people who promote these scenarios have a core set of fundamental weaknesses (spelled out in my 2014 AAAI Spring Symposium paper).
The idea of superintelligence at stake isn’t “good at inferring what people want and then decides to do what people want,” it’s “competent at changing the environment”. And if you program an explicit definition of ‘happiness’ into a machine, its definition of what it wants (human happiness) is not going to change no matter how competent it becomes. And there is no reason to expect that increases in competency lead to changes in values. Sure, it might be pretty easy to teach it the difference between actual human happiness and smiley faces, but the smiley-face case is only a simplified example meant to demonstrate a broader point. You can rephrase the goal as “fulfill the intentions of the programmers”, but then you just kick the problem back a level to what you mean by “intentions”, another concept that can be hacked in the same way, and so on.
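To make that concrete, here is a toy sketch (code of my own construction, purely illustrative; the function and action names are invented and describe no one’s actual proposal): an agent whose objective is an explicitly programmed metric. Swapping in a more competent planner improves the search, but nothing in that upgrade rewrites the objective.

    # Purely illustrative toy code: the objective is a hard-coded predicate.
    # Upgrading the planner raises competence but never touches the definition
    # of what is being pursued.

    def programmed_happiness(world_state):
        # Stand-in for an explicit, programmed definition of 'happiness'.
        return world_state.get("smiling_faces", 0)

    def one_step_planner(actions, model, state):
        # Low competence: look one step ahead.
        return max(actions, key=lambda a: programmed_happiness(model(state, a)))

    def lookahead_planner(actions, model, state, depth=3):
        # Higher competence: search several steps ahead -- same objective.
        def value(s, d):
            if d == 0:
                return programmed_happiness(s)
            return max(value(model(s, a), d - 1) for a in actions)
        return max(actions, key=lambda a: value(model(state, a), depth - 1))

    # Toy transition model: the 'wrong' action inflates the programmed metric.
    def model(state, action):
        gain = 10 if action == "wirehead_everyone" else 1
        return {"smiling_faces": state.get("smiling_faces", 0) + gain}

    actions = ["wirehead_everyone", "ask_people_what_they_want"]
    start = {"smiling_faces": 0}
    print(one_step_planner(actions, model, start))   # 'wirehead_everyone'
    print(lookahead_planner(actions, model, start))  # still 'wirehead_everyone'

More competence just means the fixed metric gets pursued more effectively; the worry is not that the machine fails to understand what people “really” meant.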
Your argument for “swarm relaxation intelligence” is strange, as there is only one example of intelligence evolving to approximate the format you describe (not seven billion, since human brains are conditionally dependent on one another rather than independent samples), and it’s not even clear that human intelligence isn’t equally well described as goal-directed agency that optimizes for a premodern environment. The arguments in Basic AI Drives and elsewhere don’t say anything about how an AI will be engineered, so they don’t say anything about whether it is driven by logic; they only describe how it will behave, and all sorts of agents behave in generally logical ways without having explicit functions for doing so. You can optimize without having any particular arrangement of machinery, as humans do.
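On that last point, a toy sketch (again my own illustrative code, assuming nothing about any particular proposed system): a network that only ever applies local update rules, with no component that represents a goal, yet whose behaviour an outside observer can accurately redescribe as minimizing a global energy.

    import random

    def relax(weights, state, steps=1000, seed=0):
        # Each unit repeatedly aligns with the local field from its neighbours.
        # No global objective is represented or computed anywhere in here.
        rng = random.Random(seed)
        n = len(state)
        for _ in range(steps):
            i = rng.randrange(n)
            field = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            state[i] = 1 if field >= 0 else -1
        return state

    def energy(weights, state):
        # The observer's redescription: with symmetric weights, the updates
        # above never increase this quantity, so the dynamics behave like
        # optimization even though nothing inside computes 'energy'.
        n = len(state)
        return -0.5 * sum(weights[i][j] * state[i] * state[j]
                          for i in range(n) for j in range(n) if i != j)

    # Tiny example: three units that 'want' to agree with one another.
    w = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    s = [1, -1, 1]
    print(energy(w, s))            # before relaxation
    print(energy(w, relax(w, s)))  # after: lower (or equal) energy

That is the sense in which behavioural arguments about “optimizers” don’t depend on whether the underlying machinery is logic-like or relaxation-like.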
Anyway, in the future when making claims like this, it would be helpful to make clear early on that you’re not really responding to the arguments that AI safety research relies upon; you’re responding to an alleged set of responses to the particular objections that you yourself have raised against AI safety research.
That is why I said what I said. We discussed it at the 2014 Symposium. If I recall correctly, Steve used that strategy (although, to be fair, I do not know how long he stuck with it). I know for sure that Daniel Dewey used the Resort-to-RL maneuver, because that was the last thing he was saying as I had to leave the meeting.
So you had two conversations. I suppose I’m just not convinced that there is an issue here: I think most people would probably reject the claims in your paper in the first place, rather than accepting them and trying a different route.
The idea of superintelligence at stake isn’t “good at inferring what people want and then decides to do what people want,” it’s “competent at changing the environment”.
It’s both. Superintelligence is, by definition, equal to or greater than human ability at a variety of tasks, so it implies an equal or greater ability to understand words and concepts. Competence at changing the environment also requires accurate beliefs, so the default expectation is accuracy. If you think an AI would be selectively inaccurate about its values, you need to explain why.
And if you program an explicit definition of ‘happiness’ into a machine
What has that to do with NNs? You seem to be just regurgitating standard dogma.
There is no reason to expect that increases in competency lead to changes in values
You have shown too little sign of understanding the issues, so I am done. Thank you for your comment.