I don’t understand the question. Maximizing paperclips at the expense of everything else strikes me as “distinctly un-human”, doesn’t it?
I think maximizing paperclips is at least comprehensible to a human as a thing that some agent might take actions to do, even if it’s extremely narrow from a human point of view. I suspect that the more alien tasks are likely to be in the opposite direction: too complex for us to comprehend, which also makes examples of them difficult to find and post.
While reading, I was thinking along the lines of: in the space of all possible mappings from states of the universe (including histories) to “value”, what proportion would make any sense at all? I suspect almost none. The problem is that almost all of them are also indescribably complex. Now, an emerging super-intelligence isn’t likely to have a purely random value function, and might not have anything we recognise as a value function at all, but it may still be an exercise that gives some hint at just how different non-human agents could possibly be.
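A toy version of that counting argument may help, under some loudly simplified assumptions of my own: the universe has only `n` bits of state, “value” is a single bit, and a mapping “makes sense” only if it can be written down in at most `k` bits. All names below are hypothetical and purely illustrative.

```python
from fractions import Fraction


def count_value_functions(n_state_bits: int) -> int:
    """Number of distinct mappings from n-bit states to a one-bit value.

    There are 2**n states and each one is independently assigned 0 or 1.
    """
    return 2 ** (2 ** n_state_bits)


def max_describable(k_description_bits: int) -> int:
    """Upper bound on how many mappings can have a description of <= k bits.

    There are 2**0 + 2**1 + ... + 2**k = 2**(k + 1) - 1 bit strings of
    length at most k, and each string describes at most one mapping.
    """
    return 2 ** (k_description_bits + 1) - 1


def describable_fraction_bound(n_state_bits: int, k_description_bits: int) -> Fraction:
    """Upper bound on the fraction of mappings that are 'simple' (assumed sense)."""
    return Fraction(
        max_describable(k_description_bits),
        count_value_functions(n_state_bits),
    )


if __name__ == "__main__":
    # A 5-bit "universe" with descriptions limited to 16 bits.
    print(float(describable_fraction_bound(5, 16)))
```

Even in this tiny setting the bound is already small, and it shrinks doubly exponentially as `n` grows while `k` stays fixed. That is only a bound on short descriptions, not a claim about which mappings a real agent would end up with.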
I suspect that almost all of those would still lead to comprehensible instrumental goals though, such as “find out what the state of the universe actually is”, and “take over a lot of the universe to direct it toward a more highly valued state”.
Yeah, in one sense this question seems impossible to answer—“help me comprehend something incomprehensible to me by definition.”
But there’s another type of answer: of the utility functions that are alien in OP’s sense, it is possible that most will share patterns discernible by humans. OP could be asking what those patterns are.
I’m not sure how worthwhile it is to try to predict the high-level behavior of a generic superintelligence whose goals we don’t understand.