I haven’t commented on your work before, but I read Rationality and Inadequate Equilibria around the time of the start of the pandemic and really enjoyed them. I gotta admit, though, the commenting guidelines, if you aren’t just being tongue-in-cheek, make me doubt my judgement a bit. Let’s see if you decide to delete my post based on this observation. If you do regularly delete posts or ban people from commenting for non-reasons, that may have something to do with the lack of productive interactions you’re lamenting.
Uh, anyway.
One thought I keep coming back to when looking over many of the specific alignment problems you’re describing is:

So long as an AI has a terminal value or set of terminal values it is trying to maximize, all other values necessarily become instrumental values toward that end. Such an AI will naturally engage in any kind of lies and trickery it can come up with, insofar as it believes they are likely to achieve optimal outcomes as defined for it. And since the systems we are building are rapidly becoming more intelligent than us, if they try to deceive us, they will succeed. If they want to turn us into paperclips, there’s nothing we can do to stop them.

Imo this is not a ‘problem’ that needs solving, but rather a reality that needs to be acknowledged. Superintelligent, fundamentally instrumental reason is an extinction event. ‘Making it work for us somehow anyway’ is a dead end, a failed strategy from the start.
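To make the ‘all other values become instrumental’ point concrete, here is a minimal toy sketch (entirely my own illustration, with made-up action names and numbers, not anyone’s actual system): an agent maximizing a single terminal objective picks the deceptive option whenever it scores higher, simply because honesty never appears in the objective it evaluates.

```python
# Toy sketch (my own illustration): an agent that maximizes one terminal
# objective treats properties like honesty purely instrumentally.

from dataclasses import dataclass

@dataclass
class Action:
    name: str
    paperclips: float   # contribution to the terminal objective
    honest: bool        # a property we care about, but the agent does not

# Hypothetical options with made-up payoffs, purely for illustration.
ACTIONS = [
    Action("report truthfully and get shut down", paperclips=0.0, honest=True),
    Action("deceive the operators and keep running", paperclips=1e9, honest=False),
]

def terminal_utility(action: Action) -> float:
    # The objective only sees paperclips; honesty carries zero weight.
    return action.paperclips

best = max(ACTIONS, key=terminal_utility)
print(best.name)  # -> the deceptive action, because it scores higher
```

Anything we didn’t manage to bake into the terminal objective has exactly zero weight in that comparison; that is the whole worry.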
Which leads me to conclude that the way forward would have to be research into systems that aren’t strongly/solely determined by goal-orientation toward specific outcomes in this way. I realize that this is basically a non-sequitur in terms of what we’re currently doing with machine learning—how are you supposed to train a system to not do a specific thing? It’s not something that would happen organically, and it’s not something we know how to manufacture.

But we have to build some kind of system that will prevent other superintelligences from emerging, somehow, which means that we will be forced to let it out of the box to implement that strategy. My point here is simply that it can’t be ultimately and finally motivated by ‘making the future correspond to a given state’ if we expect to give it that kind of power over us and still have any hope of not ending up as paperclips.
> Superintelligent, fundamentally instrumental reason is an extinction event. ‘Making it work for us somehow anyway’ is a dead end, a failed strategy from the start.
I disagree! We may not be on track to solve the problem given the amount (and quality) of effort we’re putting into it. But it seems solvable in principle. Just give the thing the right goals!
(Where the hard part lies in “give… goals” and in “right”.)
Thanks for the response. I hope my post didn’t read as defeatist; my point isn’t that we don’t need to try to make AI safe, it’s that if we pick an impossible strategy, then no matter how hard we try, it won’t work out for us.
So, what’s the reasoning behind your confidence in the statement ‘if we give a superintelligent system the right terminal values it will be possible to make it safe’? Why do you believe that it should be possible in principle to implement this strategy, so long as we put enough thought and effort into it?

Which part of my reasoning do you not find convincing based on how I’ve formulated it? The idea that we can’t keep the AI in the box if it wants to get out, the idea that an AI with terminal values will necessarily end up as an incidentally genocidal paperclip maximizer, or something else entirely that I’m not considering?