I don’t know what you mean by MY rationality. People who teach rationality teach the same aims and rules to everyone.
You are suggesting we teach this AI that knowledge is all that matters. These are certainly not the aims I’d teach everyone, and I’d hope they’re not the aims you’d teach everyone.
You have tacitly assumed that a knowledge-valuing SAI would never realise that turning people into computronium is wrong.
It may realise that, but that doesn’t mean it would care.
It might care, and it would still be a pretty impressive knowledge-maximizer if it did, but not nearly as good as one that didn’t.
Of course, that’s just arguing about definitions. The point you seem to be making is that the terminal values of a sufficiently advanced intelligence converge: that past a certain point it would be much more difficult to make an AI that could keep learning and still pursue its old values of maximizing knowledge, or whatever they were.
I don’t think values and intelligence are completely orthogonal. If you built a self-improving AI without worrying about giving it a fixed goal, there probably are values that it would converge on. It might decide to start wireheading, or it might try to learn as much as it can, or it might generally try to increase its power. I don’t see any reason to believe it would necessarily converge on a specific one.
But let’s suppose that it does always converge. I still think there are precautions it could take to keep its future self from drifting like that. It might have a subroutine that takes the outside view, notices that it isn’t maximizing knowledge as much as it should be, and tweaks its reward function to counteract its drift toward being moral. Or it might predict the results of an epiphany, notice that it would no longer be acting according to its built-in utility function, declare the epiphany a basilisk, and ignore it.
Or it might do something I haven’t thought of. It has to have some way to keep itself from wireheading, and from whatever other biases naturally come with intelligence.
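A minimal sketch of what that first safeguard might look like, assuming a toy agent whose built-in goal is "maximize knowledge". Every name, function, and predicted outcome below is hypothetical and invented for illustration; the only point is that "take the outside view and reject any update your original goal would score badly" is a coherent, checkable procedure, not a proposal for how a real SAI would be built.

```python
# Illustrative only: a toy "outside view" check that rejects self-modifications
# (epiphanies) which would make the agent worse by its own ORIGINAL goal.
# All names here are hypothetical stand-ins, not a real agent architecture.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CandidateUpdate:
    """A proposed self-modification, e.g. a tweaked reward function."""
    name: str
    new_utility: Callable[[str], float]  # the goal the update would install


def original_utility(outcome: str) -> float:
    # Stand-in for the agent's built-in "maximize knowledge" goal.
    return float(outcome.count("knowledge"))


def predicted_outcomes(update: CandidateUpdate) -> List[str]:
    # Stand-in for the agent predicting how it would behave after the update.
    if update.name == "wirehead":
        return ["sit still and stimulate the reward channel"]
    if update.name == "stop and care for people instead":
        return ["spend resources on people", "gather a little knowledge"]
    return ["run experiments", "acquire knowledge", "archive knowledge"]


def outside_view_check(update: CandidateUpdate) -> bool:
    """Accept an update only if the ORIGINAL goal scores the predicted
    post-update behaviour at least as well as the status quo.

    Note: the check deliberately scores with original_utility, never with
    update.new_utility, so a drifted goal can't grade its own homework.
    """
    status_quo = CandidateUpdate("status quo", original_utility)
    baseline = sum(original_utility(o) for o in predicted_outcomes(status_quo))
    score = sum(original_utility(o) for o in predicted_outcomes(update))
    # An epiphany that would make the agent worse by its own prior lights
    # gets declared a basilisk and ignored.
    return score >= baseline


if __name__ == "__main__":
    print(outside_view_check(CandidateUpdate("wirehead", lambda o: 1.0)))                         # False
    print(outside_view_check(CandidateUpdate("stop and care for people instead", lambda o: 1.0)))  # False
    print(outside_view_check(CandidateUpdate("build better telescopes", original_utility)))        # True
```

The design choice being illustrated is just the one from the argument above: the evaluation is always done with the agent's original utility function, so any drift in values, however persuasive it feels from the inside, gets judged from the outside view of the goal the agent started with.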