If you negate 1 or 3, then you have an additional consideration bearing on what your mind should be shaped like, and the conclusion “you had better be shaped such that your behavior is interpretable as maximizing some (sensible) utility function, or else you are exploitable or will miss out on profitable bets” doesn’t straightforwardly follow.
I feel like people keep imagining that VNM rationality is some highly specific cognitive architecture. But the real check for VNM rationality is (approximately) just “can you be Dutch booked?”
I think I can care about external incentives on how my mind runs and not be Dutch booked. Therefore, that isn’t in conflict with VNM rationality. It does mean there is some kind of high-dimensional utility function my behavior is consistent with maximizing, but it doesn’t mean that I or anyone else has access to that utility function, that it has a particularly natural basis, or that an explicit representation of it is useful for doing the actual computation of how I make decisions.
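To make the “can you be Dutch booked?” check concrete, here is a minimal money-pump sketch (the agent, preferences, and numbers are all hypothetical, purely for illustration): an agent with cyclic strict preferences over three outcomes will pay a small fee for each “upgrade” and end up holding exactly what it started with, strictly poorer.

```python
# Minimal money-pump sketch (hypothetical agent and numbers, purely illustrative):
# an agent with cyclic strict preferences A > B > C > A pays a small fee for each
# "upgrade" and ends up holding what it started with, strictly poorer.

def prefers(x, y):
    """Cyclic strict preferences: A > B, B > C, C > A."""
    return (x, y) in {("A", "B"), ("B", "C"), ("C", "A")}

def run_money_pump(start="C", fee=1, rounds=3):
    holding, wealth = start, 0
    offers = {"C": "B", "B": "A", "A": "C"}  # each offer is strictly preferred to the holding
    for _ in range(rounds):
        offer = offers[holding]
        if prefers(offer, holding):  # the agent accepts any strictly preferred swap, for a fee
            holding, wealth = offer, wealth - fee
    return holding, wealth

print(run_money_pump())  # ('C', -3): back to the starting holding, three fees poorer
```

An agent whose choices are representable as maximizing some utility function can’t be cycled like this; that behavioral test, not any particular internal architecture, is what the check amounts to.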
I feel like this discussion could do with some disambiguation of what “VNM rationality” means.
VNM assumes consequentialism. If you define consequentialism narrowly, this yields specific results, such as instrumental convergence.
You can redefine what constitutes a consequence arbitrarily. But, along the lines of what Steven Byrnes points out in his comment, redefining it can get rid of instrumental convergence. In the extreme case, you can define a utility function for literally any pattern of behaviour.
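One way to make that extreme case concrete (a standard construction, not specific to anyone in this thread): for any fixed behaviour pattern π, let “consequences” include the agent’s own action history h and define

\[
u_\pi(h) \;=\;
\begin{cases}
1 & \text{if every action in } h \text{ is the one } \pi \text{ prescribes,}\\
0 & \text{otherwise.}
\end{cases}
\]

An agent that simply executes π then maximizes \(u_\pi\) perfectly, so the utility-function framing by itself rules out nothing; it only constrains behaviour once you commit to a less gerrymandered notion of consequences.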
When you say you feel like you can’t be Dutch booked, you are at least implicitly assuming some definition of consequences with respect to which you can’t be Dutch booked. To claim that one is rationally required to adopt any particular definition of consequences in one’s utility function is basically circular, since you only care about being Dutch booked according to it if you actually care about that definition of consequences. It’s in this sense that the VNM theorem is trivial.
BTW I am concerned that self-modifying AIs may self-modify towards becoming VNM-0 agents.
But the reason is not that such self-modification is “rational”.
It’s just that (narrowly defined) consequentialist agents care about preserving and improving their abilities and proclivities to pursue their consequentialist goals, so tendencies towards VNM-0 will be reinforced in a feedback loop. Likewise for inter-agent competition.