The standard Dutch-book arguments seem like a pretty good reason to be VNM-rational in the relevant sense.
I think that’s kinda circular reasoning, the way you’re using it in context:
If I have preferences exclusively about the state of the world in the distant future, then Dutch-book arguments indeed show that I should be VNM-rational. But if I don’t have such preferences, then someone could say “hey Steve, your behavior is Dutch-bookable”, and I am allowed to respond “OK, but I still want to behave that way”.
For example, the first (Yudkowsky) post mentions a hypothetical person at a restaurant. When they have an onion pizza, they’ll happily pay $0.01 to trade it for a pineapple pizza. When they have a pineapple pizza, they’ll happily pay $0.01 to trade it for a mushroom pizza. When they have a mushroom pizza, they’ll happily pay $0.01 to trade it for an onion pizza. The person goes around and around, wasting their money in a self-defeating way (a.k.a. “getting money-pumped”).
That post describes the person as behaving sub-optimally. But if you read carefully, the author sneaks in a critical background assumption: the person in question has preferences about what pizza they wind up eating, and they’re making these decisions based on those preferences. But what if they don’t? What if the person has no preference whatsoever about pizza? What if instead they’re an asshole restaurant customer who derives pure joy from making the waiter run back and forth to the kitchen?! Then we can look at the same behavior, and we wouldn’t describe it as self-defeating “getting money-pumped”; instead, we would describe it as the skillful satisfaction of the person’s own preferences! They’re buying cheap entertainment! So that would be an example of preferences-not-concerning-future-states.
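Here’s a toy sketch of that contrast (my own illustration, not from the post; both utility functions and the $0.02-per-trip entertainment value are made-up assumptions). The exact same trading behavior is a pure loss under a utility function whose domain is the final world-state, and optimal under one whose domain includes the trajectory:

```python
# Toy illustration (hypothetical utilities, not from the original post):
# the same cyclic pizza-trading behavior scored two different ways.

TRADE_COST = 0.01
CYCLE = ["onion", "pineapple", "mushroom"]  # each trade moves one step around

def run_trades(n_trades):
    """Start with an onion pizza and pay $0.01 per swap around the cycle."""
    pizza, money_spent, waiter_trips = "onion", 0.0, 0
    for _ in range(n_trades):
        pizza = CYCLE[(CYCLE.index(pizza) + 1) % len(CYCLE)]
        money_spent += TRADE_COST
        waiter_trips += 1
    return pizza, money_spent, waiter_trips

def u_final_state(pizza, money_spent, waiter_trips):
    # Domain = final world-state only. After any whole number of cycles the
    # pizza is unchanged, so the trades are pure monetary loss: a money pump.
    return -money_spent

def u_trajectory(pizza, money_spent, waiter_trips):
    # Domain includes the trajectory: the customer enjoys each waiter trip
    # (valued at a made-up $0.02), so the same behavior maximizes utility.
    return 0.02 * waiter_trips - money_spent

for n in (0, 3, 30):
    outcome = run_trades(n)
    print(f"{n:2d} trades: u_final_state = {u_final_state(*outcome):+.2f}, "
          f"u_trajectory = {u_trajectory(*outcome):+.2f}")
```

Run it and the final-state utility drops with every trip around the cycle while the trajectory utility climbs: whether the behavior counts as “getting money-pumped” depends entirely on the domain of the utility function.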
(I’m assuming in this comment that the domain (input) of the VNM utility function is purely the state of the world in the distant future. If you don’t assume that, then saying that I should have a VNM utility function is true but trivial, and in particular doesn’t imply instrumental convergence. Again, more discussion here.)
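One way to make the triviality concrete (my notation, not from the linked discussion): for any behavior whatsoever, define a utility function over complete histories $h$ by

$$u(h) = \begin{cases} 1 & \text{if } h \text{ is a history the behavior could produce,} \\ 0 & \text{otherwise.} \end{cases}$$

The behavior maximizes expected $u$ by construction, so “has a VNM utility function over histories” rules nothing out, and in particular implies no convergent instrumental subgoals.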
(I agree that humans do in fact have preferences about the state of the world in the future, and that AGIs will too, and that this leads to instrumental convergence and is important, etc. I’m just saying that humans don’t exclusively have preferences about the state of the world in the future, and AGIs might be the same, and that this caveat is potentially important.)