(I don’t think any of these, except 2, are things that the VNM axioms rely on. The rest seem totally compatible to me. I agree that 2 is interesting, and I’ve liked Scott Garrabrant’s exploration of the stuff)
The VNM axioms refer to an “agent” who has “preferences” over lotteries of outcomes. It seems to me this is challenging to interpret if there isn’t a persistent agent, with a persistent mind, who assigns Bayesian subjective probabilities to outcomes (which I’m assuming it has some ability to think about and care about, i.e. my (4)), and who chooses actions based on their preferences between lotteries. That is, it seems to me the axioms rely on there being a mind that is certain kinds of persistent/unaffected.
Do you (habryka) mean there’s a new “utility function” at any given moment, made of “outcomes” that can include parts of how the agent runs its own insides? Or can you say more about how VNM is compatible with the negations of my 1, 3, and 4, or otherwise give me more traction for figuring out where our disagreement is coming from?
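For concreteness, here is one standard textbook statement of the setup (my sketch for reference, not necessarily the exact formulation anyone in this thread has in mind): a single preference relation over lotteries $p, q, r$ on a fixed outcome set, satisfying

$$\begin{aligned}
&\text{Completeness:} && p \succeq q \ \text{or}\ q \succeq p\\
&\text{Transitivity:} && p \succeq q \ \text{and}\ q \succeq r \ \Rightarrow\ p \succeq r\\
&\text{Continuity:} && p \succ q \succ r \ \Rightarrow\ \exists\, \alpha,\beta \in (0,1):\ \alpha p + (1-\alpha)r \succ q \succ \beta p + (1-\beta)r\\
&\text{Independence:} && p \succ q \ \Rightarrow\ \alpha p + (1-\alpha)r \succ \alpha q + (1-\alpha)r \quad \text{for all } \alpha \in (0,1]
\end{aligned}$$

The theorem then gives a utility function $u$ on outcomes with $p \succeq q \iff \mathbb{E}_p[u] \ge \mathbb{E}_q[u]$. Note that a single fixed relation over a single fixed lottery space is what the axioms quantify over, which is where the “persistent agent with persistent preferences” reading comes in.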
I was reasoning mostly from “what’re the assumptions required for an agent to base its choices on the anticipated external consequences of those choices.”
It seems to me this is challenging to interpret if there isn’t a persistent agent, with a persistent mind, who assigns Bayesian subjective probabilities to outcomes
Right, but if there isn’t a persistent agent with a persistent mind, then we no longer have an entity to which predicates of rationality apply (at least in the sense in which the term “rationality” is usually understood in this community). Talking about it in terms of “it’s no longer VNM-rational” feels like saying “it’s no longer wet” when you change the subject of discussion from physical bodies to abstract mathematical structures. Or am I misunderstanding you?
I was trying to explain to Habryka why I thought (1), (3) and (4) are parts of the assumptions under which the VNM utility theorem is derived.
I think all of (1), (2), (3) and (4) are part of the context I’ve usually pictured in understanding VNM as having real-world application, at least. And they’re part of this context because I’ve been wanting to think of a mind as having persistence, and persistent preferences, and persistent (though rationally updated) beliefs about what lotteries of outcomes can be chosen via particular physical actions, and stuff. (E.g., in Scott’s example about the couple, one could say “they don’t really violate independence; they just care also about process-fairness” or something, but, … it seems more natural to attach words to real-world scenarios in such a way as to say the couple does violate independence. And when I try to reason this way, I end up thinking that all of (1)-(4) are part of the most natural way to try to get the VNM utility theorem to apply to the world with sensible, non-Grue-like word-to-stuff mappings.)
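To spell out the independence point with a generic fairness case (my reconstruction of this kind of example, not necessarily Scott’s exact setup): let $a$ = “we do what Alice wants” and $b$ = “we do what Bob wants”, and suppose the couple is indifferent between them but strictly prefers flipping a coin:

$$a \sim b, \qquad \tfrac{1}{2}a + \tfrac{1}{2}b \ \succ\ a$$

Independence applied to $a \sim b$ (mixing each side with $r = a$ at weight $\tfrac{1}{2}$) gives $\tfrac{1}{2}b + \tfrac{1}{2}a \sim \tfrac{1}{2}a + \tfrac{1}{2}a = a$, contradicting the strict preference. So over the narrow outcome space $\{a, b\}$ the couple does violate independence; the “they also care about process-fairness” move rescues the axiom only by enlarging the outcome space so that the coin flip itself is part of the outcome.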
I’m not sure why Habryka disagrees. I feel like lots of us are talking past each other in this subthread, and am not sure how to do better.
I don’t think I follow your (Mateusz’s) remark yet.
It may be that some of the good reasons to not be VNM right now, will continue to be such. In that case, there’s no point at which you want to be VNM, and in some senses you don’t even limit to VNM. (E.g. you might limit to VNM in the sense that, for any local ontology thing, as long as it isn’t revised, you tend toward VNMness; but the same mind might fail to limit to VNM in that, on any given day, the stuff it is most concerned/involved with makes it look quite non-VNM.)
If you negate 1 or 3, then you have an additional factor/consideration in what your mind should be shaped like, and the conclusion “you better be shaped such that your behavior is interpretable as maximizing some (sensible) utility function or otherwise you are exploitable or miss out on profitable bets” doesn’t straightforwardly follow.
I feel like people keep imagining that VNM rationality is some highly specific cognitive architecture. But the real check for VNM rationality is (approximately) just “can you be Dutch booked?”.
I think I can care about external incentives on how my mind runs, and not be Dutch booked. Therefore, it’s not in conflict with VNM rationality. This means there is some kind of high-dimensional utility function that my behavior is consistent with, but it doesn’t mean that I or anyone else has access to that utility function, or that it has a particularly natural basis, or that an explicit representation of it is useful for doing the actual computation of how I am making decisions.
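To make “can you be Dutch booked?” slightly more concrete, here is a toy sketch (my illustration, not something from the thread; the option names and preferences are made up): with a finite set of options and strict pairwise preferences, being money-pumpable comes down to having a cycle in the “strictly prefers” graph.

```python
# Toy illustration of the "can you be Dutch booked?" check, stripped down to the
# transitivity part: over a finite set of options with strict pairwise preferences,
# a bookie can money-pump you exactly when your "strictly prefers" graph has a cycle.
# Option names and preferences below are made up for illustration.

def find_preference_cycle(prefers):
    """prefers maps (a, b) -> True when a is strictly preferred to b.
    Returns a list of options forming a strict-preference cycle, or None."""
    options = {x for pair in prefers for x in pair}
    graph = {x: [b for (a, b), strict in prefers.items() if strict and a == x]
             for x in options}

    def dfs(node, path, visited):
        visited.add(node)
        path.append(node)
        for nxt in graph[node]:
            if nxt in path:                      # found a cycle: a money pump exists
                return path[path.index(nxt):] + [nxt]
            if nxt not in visited:
                cycle = dfs(nxt, path, visited)
                if cycle:
                    return cycle
        path.pop()
        return None

    visited = set()
    for start in options:
        if start not in visited:
            cycle = dfs(start, [], visited)
            if cycle:
                return cycle
    return None


# Cyclic preferences: the agent will pay a little at each step to trade around the loop.
prefs = {("apple", "banana"): True, ("banana", "cherry"): True, ("cherry", "apple"): True}
print(find_preference_cycle(prefs))  # e.g. ['apple', 'banana', 'cherry', 'apple']

# Acyclic preferences: no pump is possible in this toy model.
prefs2 = {("apple", "banana"): True, ("banana", "cherry"): True, ("apple", "cherry"): True}
print(find_preference_cycle(prefs2))  # None
```

This obviously leaves out lotteries, continuity, and independence; it’s only meant to make the flavour of the check concrete, not to capture the full theorem.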
I feel like this discussion could do with some disambiguation of what “VNM rationality” means.
VNM assumes consequentialism. If you define consequentialism narrowly, this has specific implications, notably instrumental convergence.
You can redefine what constitutes a consequence arbitrarily. But, along the lines of what Steven Byrnes points out in his comment, redefining this can get rid of instrumental convergence. In the extreme case you can define a utility function for literally any pattern of behaviour.
When you say you feel like you can’t be Dutch booked, you are at least implicitly assuming some definition of consequences in terms of which you can’t be Dutch booked. To claim that you are rationally required to adopt any particular definition of consequences in your utility function is basically circular, since you only care about being Dutch booked according to it if you actually care about that definition of consequences. It’s in this sense that the VNM theorem is trivial.
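To illustrate the “trivial” reading (a standard gerrymandering construction, sketched by me rather than quoted from anyone above): take the “consequence” to be the entire action history $\tau$, and for any fixed behaviour pattern $\pi$ define

$$u_\pi(\tau) = \begin{cases} 1 & \text{if } \tau \text{ is consistent with } \pi\\ 0 & \text{otherwise}\end{cases}$$

Then $\pi$ maximizes expected utility relative to $u_\pi$, whatever $\pi$ is. So the substantive content of “VNM-rational” depends entirely on what is allowed to count as a consequence.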
BTW, I am concerned that self-modifying AIs may self-modify towards being VNM-0 agents.
But the reason is not because such self modification is “rational”.
It’s just that (narrowly defined) consequentialist agents care about preserving and improving their abilities and proclivities to pursue their consequentialist goals, so tendencies towards VNM-0 will be reinforced in a feedback loop. Likewise for inter-agent competition.