Instrumental rationality is the one that actually matters. It’s just that, regardless of your goals, figuring out what’s going on is useful, hence the discussion of epistemic rationality.
The most obvious point of conflict is when further learning reaches diminishing returns. At some point, if you want to be instrumentally rational, you must actually do something.
Is instrumental rationality still the one that matters if you terminally value truth?
If your terminal value is your knowledge, then the two are the same.
It’s sort of like how your instrumental rationality and your ability to maximize paperclips are the same if you happen to be a paperclip maximizer.
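To make that concrete, here is a minimal sketch in Python (all of the names and payoff numbers are hypothetical, invented purely for illustration): both agents run the same expected-utility machinery, and only the utility function differs, so for the knowledge-valuer, doing well instrumentally and doing well epistemically pick out the same actions.

```python
# Minimal sketch, not from the discussion: one decision procedure,
# parameterised by a utility function.

def expected_utility(action, outcomes, utility):
    """Probability-weighted utility of an action's possible outcomes."""
    return sum(p * utility(o) for p, o in outcomes[action])

def choose(outcomes, utility):
    """Instrumental rationality: pick the action with the highest expected utility."""
    return max(outcomes, key=lambda a: expected_utility(a, outcomes, utility))

# Toy world: action -> [(probability, outcome), ...]
outcomes = {
    "run experiment": [(1.0, {"knowledge": 5, "paperclips": 0})],
    "build factory":  [(1.0, {"knowledge": 0, "paperclips": 9})],
}

# Same machinery, different terminal values.
print(choose(outcomes, lambda o: o["knowledge"]))   # -> "run experiment"
print(choose(outcomes, lambda o: o["paperclips"]))  # -> "build factory"
```

For the first call, maximizing expected utility just is maximizing expected knowledge, which is the sense in which that agent's instrumental and epistemic rationality coincide.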
Cool. So now there are two ways to make a safe superintelligence. You can give it terminal values corresponding to human morality, or you can make it value knowledge and truth, as suggested by Wei Dai, Richard Loosemore... an Artificial Philosopher, a Genie that Does Care.
Huh?
If it terminally values knowledge, then its instrumental rationality will coincide with its epistemic rationality, but neither of those is rationality relative to your terminal values. From your point of view, creating an AI that consumes the universe as it tries to learn as much as it can would be a very bad idea.
I don’t know what you mean by MY rationality. People who teach rationality teach the same aims and rules to everyone.
You have tacitly assumed that a knowledge-valuing SAI would never realise that turning people into computronium is wrong... that it would never understand morality, or that moral truths cannot be discovered no matter how much cognitive power is thrown at them.
You are suggesting we teach this AI that knowledge is all that matters. These are certainly not the aims I’d teach everyone, and I’d hope they’re not the aims you’d teach everyone.
It may realise that, but that doesn’t mean it would care.
It might care, and it would still be a pretty impressive knowledge-maximizer if it did, but not nearly as good as one that didn’t.
Of course, that’s just arguing definitions. The point you seem to be making is that the terminal values of a sufficiently advanced intelligence converge. That it would be much more difficult to make an AI that could learn beyond a certain point, and continue to pursue its old values of maximizing knowledge, or whatever they were.
I don’t think values and intelligence are completely orthogonal. If you built a self-improving AI without worrying about giving it a fixed goal, there probably are values that it would converge on. It might decide to start wireheading, or it might try to learn as much as it can, or it might generally try to increase its power. I don’t see any reason to believe it would necessarily converge on a specific one.
But let's suppose that it does always converge. I still think there are protections it could put in place to stop its future self from drifting like that. It might have a subroutine that takes the outside view, notices that it isn't maximizing knowledge as much as it should be, and tweaks its reward function to counteract its bias toward being moral. Or it might predict the results of an epiphany, notice that its post-epiphany self wouldn't be acting according to its inbuilt utility function, declare the epiphany a basilisk, and ignore it (a rough sketch of that check is below).
Or it might do something I haven't thought of. It has to have some way to keep itself from wireheading, and from whatever other biases naturally come with intelligence.
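Here is a rough Python sketch of that last protection, with everything in it (the self-model, the toy numbers, the policy names) invented for illustration rather than taken from the comment: before adopting a modified policy, the agent scores the predicted behaviour of both its current and its modified self with its current utility function, and rejects the change if the modified self would do worse by those lights.

```python
# Hypothetical sketch of the "declare the epiphany a basilisk" check:
# score the predicted future self with the CURRENT utility function.

def predicted_outcomes(policy, situations):
    """What the agent's self-model expects a policy to produce."""
    return [policy(s) for s in situations]

def accept_modification(current_policy, modified_policy, situations, utility):
    baseline  = sum(utility(o) for o in predicted_outcomes(current_policy, situations))
    predicted = sum(utility(o) for o in predicted_outcomes(modified_policy, situations))
    # Outside view: if the post-epiphany self would score worse by the
    # present goal, refuse the modification and keep the old policy.
    return predicted >= baseline

# Toy example: a knowledge-maximizer considering an "epiphany" under which
# it would hold back from some conversions on moral grounds.
knowledge_utility = lambda o: o["knowledge"]
current_policy  = lambda s: {"knowledge": s["available"]}        # take everything
modified_policy = lambda s: {"knowledge": s["available"] // 2}   # hold back

situations = [{"available": 10}, {"available": 4}]
print(accept_modification(current_policy, modified_policy, situations, knowledge_utility))
# -> False: the epiphany is rejected.
```

The design choice this illustrates is simply that the evaluation uses the pre-modification utility function, which is why a genuine moral epiphany gets filtered out rather than adopted.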