A corollary gives a necessary condition for friendliness: if the utility function of an AI can take values much larger than the complexity of the input, then the AI is unfriendly.
How does that work at all? They’re not measured by the same unit (bits vs. utils), and you can multiply a utility function by a positive constant or add or subtract an arbitrary constant and still have it represent the same preferences.
The question and Weissman’s answer are good, so this is just a distraction: are utils and bits really thought of as units? The mathematical formalism of e.g. physics doesn’t actually have (or doesn’t require) units, but you can extract them by thinking about the symmetries of the theory: e.g. distance is measured in the same units vertically and horizontally because the laws of physics stay the same after changing some coordinates. How do people think about this in economics?
The concept can be rescued, at least from that objection, by saying instead that there should be some value alpha such that, for any description of a state of the universe, the utility of that state is less than alpha times the complexity of that description. That is, asymptotically, utility grows at most linearly in complexity.
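A minimal sketch of how such a bound might be applied, assuming description length in bits as a crude, computable stand-in for Kolmogorov complexity (which is uncomputable); the constant ALPHA and the helper names here are illustrative, not part of the original proposal:

```python
# Sketch of the "utility bounded by alpha times description complexity" rule,
# using description length in bits as a proxy for Kolmogorov complexity.

ALPHA = 10.0  # hypothetical utils-per-bit constant


def description_complexity_bits(description: str) -> float:
    """Crude proxy for complexity: length of the description in bits."""
    return 8.0 * len(description.encode("utf-8"))


def bounded_utility(raw_utility: float, description: str) -> float:
    """Cap the utility of a state at ALPHA times its description's complexity."""
    return min(raw_utility, ALPHA * description_complexity_bits(description))


# A Pascalian mugger promises an astronomically large payoff with a short pitch.
pitch = "Give me $5 or I simulate and torture 3^^^3 people."
promised_utils = 1e100

# Under the proposed rule, the utility the agent can assign is capped by the
# (small) complexity of the pitch, so the mugging loses its leverage.
print(bounded_utility(promised_utils, pitch))  # far smaller than 1e100
```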
However, the utility function still isn’t up for grabs. If our actual true utility function violates this rule, I don’t want to say that an AGI is unfriendly for maximizing it.
Of course. The proposal here is that “our actual true utility function” does not violate this rule, since we are not in fact inclined to give in to a Pascalian mugger.
Sometimes, when I go back and read my own comments, I wonder just what goes on in that part of my brain that translates concepts into typed out words when I am not paying it conscious attention.
Anyways, let “our actual true utility function” refer to the utility function that best describes our collective values, which we only manage to effectively achieve in certain environments that match the assumptions inherent in our heuristics. Thinking of it this way, one might wonder whether Pascalian muggers fit into these environments, and, if not, how much our instinctual reaction to them actually tells us about our values.