The standard Dutch-book arguments seem like a pretty good reason to be VNM-rational in the relevant sense.
I mean, there are arguments about as solid as the “VNM utility theorem” pointing to CDT, but CDT is nevertheless not always the thing to aspire to, because CDT is based on an assumption/approximation that is not always a good-enough approximation (namely, CDT assumes our minds have no effects except via our actions; e.g., it assumes our minds have no direct effects on others’ predictions about us).
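(To gloss that CDT assumption a bit more formally, as a reference sketch rather than a claim about any specific formulation in this thread:)

```latex
\[
  \mathrm{EU}_{\mathrm{CDT}}(a) \;=\; \sum_{o} P\bigl(o \mid \operatorname{do}(a)\bigr)\, U(o)
\]
% CDT ranks actions by this causal expected utility: the decision is modeled as
% influencing the world only through the action taken. If others' predictions
% correlate with the decision procedure itself (Newcomb-like setups), that
% channel is simply not represented in the model.
```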
Some assumptions the VNM utility theorem is based on, that I suspect aren’t always good-enough approximations for the worlds we are in:
1) VNM assumes there are no important external incentives that’ll give you more of what you care about if you run your mind certain ways rather than others. So, for example:
1a) “You” (the decision-maker process we are modeling) can choose anything you like, without risk of losing control of your hardware. (Contrast case: if the ruler of a country chooses unpopular policies, they are sometimes ousted. If a human chooses dieting/unrewarding problems/social risk, they sometimes lose control of themselves.)
1b) There are no costs to maintaining control of your mind/hardware. (Contrast case: if a company hires some brilliant young scientists to be creative on its behalf, it often has to pay a steep overhead if it additionally wants to make sure those scientists don’t disrupt its goals/beliefs/normal functioning.)
1c) We can’t acquire more resources by changing who we are via making friends, adopting ethics that our prospective friends want us to follow, etc.
2) VNM assumes the independence axiom. (Contrast case: Maybe we are a “society of mind” that has lots of small ~agents that will only stay knitted together if we respect “fairness” or something. And maybe the best ways of doing this violate the independence axiom. See Scott Garrabrant.) (Aka, I’m agreeing with Jan.) (The axiom, and a sketch of how such fairness preferences can conflict with it, are spelled out just after this list.)
2a) And maybe this’ll keep being true, even if we get to reflect a lot, if we keep wanting to craft in new creative processes that we don’t want to pay to keep fully supervised.
3) (As Steven Byrnes notes) we care only about the external world, and don’t care about the process we use to make decisions. (Contrast case: we might have process preferences, as well as outcome preferences.)
4) We have accurate external reference. Like, we can choose actions based on what external outcomes we want, and this power is given to us for free, stably. (Contrast case: ethics is sometimes defended as a set of compensations for how our maps predictably diverge from the territory, e.g. running on untrustworthy hardware, or “respect people, because they’re bigger than your map of them so you should expect they may benefit from e.g. honesty in ways you won’t manage to specifically predict.”) (Alternate contrast case: it’s hard to build a mind that can do external reference toward e.g. “diamonds”).
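As flagged in (2): here is the independence axiom spelled out, plus a sketch of how fairness-style preferences can conflict with it. This takes the simplest reading of the fairness example (two pure options the parties are indifferent between, with the fair 50/50 mixture strictly preferred to either), which may not be exactly Scott’s framing.

```latex
% Independence: for all lotteries $L, M, N$ and all $p \in (0,1]$,
\[
  L \succeq M \iff pL + (1-p)N \;\succeq\; pM + (1-p)N.
\]
% Suppose the pure options satisfy $A \sim B$, but the fair coin flip is
% strictly preferred: $\tfrac{1}{2}A + \tfrac{1}{2}B \succ A$. Apply
% independence with $L = A$, $M = B$, $N = A$, $p = \tfrac{1}{2}$. From
% $A \succeq B$ we get
\[
  \tfrac{1}{2}A + \tfrac{1}{2}A \;\succeq\; \tfrac{1}{2}B + \tfrac{1}{2}A,
  \quad\text{i.e.}\quad A \;\succeq\; \tfrac{1}{2}A + \tfrac{1}{2}B,
\]
% contradicting the strict preference for the coin flip. To keep independence,
% the fairness of the process has to be folded into the "outcomes" themselves,
% which is the re-description move discussed below.
```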
(I don’t think any of these, except 2, are things that the VNM axioms rely on. The rest seem totally compatible to me. I agree that 2 is interesting, and I’ve liked Scott Garrabrant’s exploration of the stuff.)
The VNM axioms refer to an “agent” who has “preferences” over lotteries of outcomes. It seems to me this is challenging to interpret if there isn’t a persistent agent, with a persistent mind, who assigns Bayesian subjective probabilities to outcomes (which I’m assuming it has some ability to think about and care about, i.e. my (4)), and who chooses actions based on their preferences between lotteries. That is, it seems to me the axioms rely on there being a mind that is certain kinds of persistent/unaffected.
Do you (habryka) mean there’s a new “utility function” at any given moment, made of “outcomes” that can include parts of how the agent runs its own inside? Or can you say more about how VNM is compatible with the negations of my 1, 3, and 4, or otherwise give me more traction for figuring out where our disagreement is coming from?
I was reasoning mostly from “what’re the assumptions required for an agent to base its choices on the anticipated external consequences of those choices.”
“It seems to me this is challenging to interpret if there isn’t a persistent agent, with a persistent mind, who assigns Bayesian subjective probabilities to outcomes”
Right, but if there isn’t a persistent agent with a persistent mind, then we no longer have an entity to which predicates of rationality apply (at least in the sense that the term “rationality” is usually understood in this community). Talking about it in terms of “it’s no longer VNM-rational” feels like saying “it’s no longer wet” when you change the subject of discussion from physical bodies to abstract mathematical structures.
Or am I misunderstanding you?
I was trying to explain to Habryka why I thought (1), (3) and (4) are parts of the assumptions under which the VNM utility theorem is derived.
I think all of (1), (2), (3) and (4) are part of the context I’ve usually pictured in understanding VNM as having real-world application, at least. And they’re part of this context because I’ve been wanting to think of a mind as having persistence, and persistent preferences, and persistent (though rationally updated) beliefs about what lotteries of outcomes can be chosen via particular physical actions, and stuff. (E.g., in Scott’s example about the couple, one could say “they don’t really violate independence; they just care also about process-fairness” or something, but it seems more natural to attach words to real-world scenarios in such a way as to say the couple does violate independence. And when I try to reason this way, I end up thinking that all of (1)-(4) are part of the most natural way to try to get the VNM utility theorem to apply to the world with sensible, non-Grue-like word-to-stuff mappings.)
I’m not sure why Habryka disagrees. I feel like lots of us are talking past each other in this subthread, and am not sure how to do better.
I don’t think I follow your (Mateusz’s) remark yet.
It may be that some of the good reasons to not be VNM right now, will continue to be such. In that case, there’s no point at which you want to be VNM, and in some senses you don’t even limit to VNM. (E.g. you might limit to VNM in the sense that, for any local ontology thing, as long as it isn’t revised, you tend toward VNMness; but the same mind might fail to limit to VNM in that, on any given day, the stuff it is most concerned/involved with makes it look quite non-VNM.)
If you negate 1 or 3, then you have an additional factor/consideration in what your mind should be shaped like, and the conclusion “you better be shaped such that your behavior is interpretable as maximizing some (sensible) utility function or otherwise you are exploitable or miss out on profitable bets” doesn’t straightforwardly follow.
I feel like people keep imagining that VNM rationality is some highly specific cognitive architecture. But the real check for VNM rationality is (approximately) just “can you be Dutch-booked?”.
I think I can care about external incentives on how my mind runs, and not be Dutch-booked. Therefore, it’s not in conflict with VNM rationality. This means there is some kind of high-dimensional utility function according to which I am behaving, but it doesn’t mean that I or anyone else has access to that utility function, or that the utility function has a particularly natural basis, or that an explicit representation of it is useful for doing the actual computation of how I make decisions.
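To make “can you be Dutch-booked?” concrete, here is a minimal money-pump sketch in Python (a toy with made-up cyclic preferences, not a model of anything anyone here endorses): an agent whose strict preferences cycle will pay a small fee at each step around the cycle and end up holding what it started with, strictly poorer.

```python
# Minimal money-pump sketch (toy illustration). An agent with cyclic strict
# preferences A > B > C > A pays a small fee for each "upgrade" around the
# cycle and ends up with its original holding, strictly poorer.

def prefers(strict_ranking, x, y):
    """True if the agent strictly prefers x to y under the given ranking."""
    return (x, y) in strict_ranking

# Cyclic strict preferences: A over B, B over C, C over A.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}

holding, cents = "A", 0
fee = 1  # the bookie charges one cent per trade

# Each offer is (new, old): swap the current holding `old` for `new`,
# which the agent strictly prefers, in exchange for the fee.
offers = [("C", "A"), ("B", "C"), ("A", "B")]

for _ in range(3):  # three full loops around the cycle
    for new, old in offers:
        if holding == old and prefers(cyclic, new, old):
            holding, cents = new, cents - fee

print(holding, cents)  # "A" -9: same holding as at the start, nine cents down
```

An agent whose choices among the offered trades are complete and transitive never walks this loop; that behavioral test is the content of the check, whether or not anyone can write the corresponding utility function down explicitly.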
I feel like this discussion could do with some disambiguation of what “VNM rationality” means.
VNM assumes consequentialism. If you define consequentialism narrowly, this has specific results in terms of instrumental convergence.
You can redefine what constitutes a consequence arbitrarily. But, along the lines of what Steven Byrnes points out in his comment, redefining this can get rid of instrumental convergence. In the extreme case you can define a utility function for literally any pattern of behaviour.
When you say you feel like you can’t be Dutch-booked, you are at least implicitly assuming some definition of consequences you can’t be Dutch-booked in terms of. To claim that one is rationally required to adopt any particular definition of consequences in your utility function is basically circular, since you only care about being Dutch-booked according to it if you actually care about that definition of consequences. It’s in this sense that the VNM theorem is trivial.
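To make the “extreme case” above concrete, here is a toy sketch (assuming “outcomes” are taken to be complete action-histories, which is exactly the move being criticized): any observed pattern of behaviour maximizes the utility function that scores that exact history and nothing else.

```python
# Toy sketch: if outcomes are whole action-histories, every behaviour pattern
# trivially maximizes some utility function, so the representation by itself
# has no predictive content.

def make_trivializing_utility(observed_history):
    """Return a utility function that scores the observed history 1, all else 0."""
    def utility(history):
        return 1.0 if history == observed_history else 0.0
    return utility

# Any sequence of actions at all...
observed = ("turn left", "hum a tune", "refuse the bet", "turn left again")

u = make_trivializing_utility(observed)
assert u(observed) == 1.0                       # the behaviour "maximizes utility"
assert u(("take the bet", "turn right")) == 0.0

# Instrumental convergence only follows once "consequences" are restricted to
# something narrower than the agent's own action-history.
```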
BTW I am concerned that self-modifying AIs may self-modify towards VNM-0 agents.
But the reason is not because such self modification is “rational”.
It’s just that (narrowly defined) consequentialist agents care about preserving and improving their abilities and proclivities to pursue their consequentialist goals, so tendencies towards VNM-0 will be reinforced in a feedback loop. Likewise for inter-agent competition.