A corollary is a necessary condition for friendliness: if the utility function of an AI can take values much larger than the complexity of the input, then it is unfriendly. This kills Pascal’s mugging and paperclip maximizers with the same stone. It even sounds simple and formal enough to imagine testing it on a given piece of code.
> A corollary is a necessary condition for friendliness: if the utility function of an AI can take values much larger than the complexity of the input, then it is unfriendly.
How does that work at all? Utility and complexity aren’t measured in the same units (utils vs. bits), and you can multiply a utility function by a positive constant, or add or subtract an arbitrary constant, and still have it represent the same preferences.
The question and Weissman’s answer are good, so this is just a distraction: are utils and bits really thought of as units? The mathematical formalism of e.g. physics doesn’t actually have (or doesn’t require) units, but you can extract them by thinking about the symmetries of the theory: e.g. distance is measured in the same units vertically and horizontally because the laws of physics stay the same under rotations of the coordinate axes. How do people think about this in economics?
The concept can be rescued, at least from that objection, by saying instead that there should be some value alpha such that, for any description of a state of the universe, the utility of that state is less than alpha times the complexity of that description. That is, utility grows at most linearly in description complexity.
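Spelling that proposal out as a single condition (the notation here is just one choice: U for the utility function, K(s) for the complexity of the shortest description of a state s):

```latex
\exists\, \alpha > 0 \quad \text{such that} \quad \forall s : \; U(s) \le \alpha \cdot K(s)
```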
However, the utility function still isn’t up for grabs. If our actual true utility function violates this rule, I don’t want to say that an AGI is unfriendly for maximizing it.
> However, the utility function still isn’t up for grabs. If our actual true utility function violates this rule, I don’t want to say that an AGI is unfriendly for maximizing it.
Of course. The proposal here is that “our actual true utility function” does not violate this rule, since we are not in fact inclined to give in to a Pascalian mugger.
Sometimes, when I go back and read my own comments, I wonder just what goes on in that part of my brain that translates concepts into typed out words when I am not paying it conscious attention.
Anyways, let “our actual true utility function” refer to the utility function that best describes the collective values we only manage to pursue effectively in environments that match the assumptions built into our heuristics. Thinking of it this way, one might wonder whether Pascalian muggers fit into those environments, and, if not, how much our instinctual reaction to them really indicates about our values.
I think I agree.
Perhaps one way to state the complexity-of-value thesis would be to say that the utility function should be bounded by Kolmogorov complexity.
It doesn’t quite kill Pascal’s mugging: the threat does have to have some minimum level of credibility, but that minimum can still be low enough that you hand over the cash. Pascal’s mugging is only killed if the expected utility of handing over the cash is negative. To show this I think you really do need to evaluate the probability to the end.
Neither does it kill paperclip maximizers. A collection of N paperclips requires about log2(N) bits to describe, plus a description of the properties of a paperclip. So the paperclip maximizer can still have a constantly increasing utility as it makes more paperclips; your rule would just bound that utility to growing like log(N).
Good line of thought though: there may still be something in here.
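To make the expected-utility point above concrete, here is a minimal sketch in Python. Every number in it is a placeholder chosen for illustration, not a figure anyone in the discussion has argued for; the idea is only that a complexity-bounded utility caps the payoff the mugger can promise, so a tiny credence is enough to make paying a losing move.

```python
# A minimal sketch of the expected-utility check described above.
# All numbers are illustrative placeholders, not estimates from the thread.

def expected_utility_of_paying(p_threat_real, utility_of_averting_it, cost_of_paying):
    """Expected utility of handing over the cash to the mugger."""
    return p_threat_real * utility_of_averting_it - cost_of_paying

alpha = 1.0                   # assumed utils-per-bit bound on the utility function
claim_complexity_bits = 1000  # rough size of the mugger's "3^^^^3 disutilons" claim
p = 1e-9                      # credence that the threat is real
cost = 5.0                    # utility lost by handing over the cash

print(expected_utility_of_paying(p, alpha * claim_complexity_bits, cost))
# -> about -5.0: negative under these assumptions, so you keep the cash
```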
To “kill Pascal’s mugging” one doesn’t have to give advice on how to deal with threats generally.
I think that N paperclips takes about complexity-of-N, plus the complexity of a paperclip, bits to describe. “Complexity of N” can be much lower than log(N): e.g. the complexity of 3^^^3 is smaller than the Wikipedia article on Knuth’s up-arrow notation. “3^^^3 paperclips” has very low complexity and very high utility.
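A toy way to see the gap, using the length of a short program as a crude stand-in for Kolmogorov complexity (the particular number is just one small enough to actually compute):

```python
# Toy illustration: the shortest description of a number N can be far smaller
# than log2(N), the number of bits needed to write N out in binary.
# Program length is only a crude stand-in for Kolmogorov complexity.

description = "2**(2**20)"                 # a short program denoting N
N = eval(description)                      # the number itself

bits_to_write_N_out = N.bit_length()       # about log2(N): 1,048,577 bits
bits_to_describe_N = 8 * len(description)  # 80 bits of program text

print(bits_to_write_N_out, bits_to_describe_N)
# 3^^^3 is the same phenomenon pushed to an extreme: the description stays a
# few lines long, while log2(N) becomes astronomically large.
```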
Ah, you’re right.
But I think that a decision theory is better (better fulfills desiderata of universality, simplicity, etc.) if it treats Pascal’s mugging with the same method it uses for other threats.
Why? Is “threat” a particularly “natural” category?
From my perspective, Pascal’s mugging is simply an argument showing that a human-friendly utility function should have a certain property, not a special class of problem to be solved.
Hah. Well, we can apply my exact same argument with different words to show why I agree with you:
> But I think that a decision theory is better (better fulfills desiderata of universality, simplicity, etc.) if it treats threats with the same method it uses for other decision problems.
> Pascal’s mugging is only killed if the expected utility of handing over the cash is negative.
This will be the case in the scenario under discussion, due to the low probability of the mugger’s threat (in the “3^^^^3 disutilons” version), or the (relatively!) low disutility (in the “3^^^^3 persons” version, under Michael Vassar’s proposal).
> So the paperclip maximizer can still have a constantly increasing utility as it makes more paperclips; your rule would just bound that utility to growing like log(N).
Yes; it would be a “less pure” paperclip maximizer, but still an unfriendly AI.
The rule is (proposed to be) necessary for friendliness, not sufficient by any means.