I don’t think I understand. The standard dutch-book arguments seem like a pretty good reason to be VNM-rational in the relevant sense. I don’t feel like “wanting more of some things and less of other things” is a particularly narrow part of potential human mind-space (or AI mind-space), so it makes sense that people would want to behave accordingly.
It happens that this also implies there is some hypothetical utility function one could use to model those people’s/AIs’ behavior, but especially given bounded rationality, a huge amount of uncertainty, and the dominance of convergent instrumental goals, that part feels like it matters relatively little, either for humans or AIs (like, I don’t know what I care about, neither do AIs know what they care about, we are all many steps of reflection away from being coherent in our values, but if you offer me a taxi from New York to SF, and a taxi from SF back to New York, I will still not take that at any price, unless I really like driving in taxis).
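Here’s a toy sketch of the taxi point, with made-up numbers: whatever utilities you assign to ending up in each city, the paid round trip returns you to your starting state minus the fare, so it’s dominated by staying put; it only comes out positive if the ride itself is worth something to you.

```python
def bundle_value(utility_of_city: dict, start: str, legs: list, fare: float) -> float:
    """Utility gain from accepting every leg of the bundle, relative to staying put."""
    end = legs[-1][1] if legs else start   # city you end up in after the last leg
    return utility_of_city[end] - utility_of_city[start] - fare

# Whatever hypothetical values go here, the round trip nets out to minus the fare.
utility_of_city = {"New York": 3.0, "SF": 7.0}
round_trip = [("New York", "SF"), ("SF", "New York")]
print(bundle_value(utility_of_city, "New York", round_trip, fare=100.0))  # -100.0
```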
My impression is that most people who converged on doubting VNM as a norm of rationality also converged on the view that the problem it has in practice is that it isn’t necessarily stable under some sort of compositionality/fairness. E.g. Scott here, Richard here.
The broader picture could be something like: yes, there is some selection pressure from the dutch-book arguments, but there are stronger selection pressures coming from being part of bigger things, or from being composed of parts.
Yepp, though note that this still feels in tension with the original post to me—I expect to find a clean, elegant replacement for VNM, not just a set of approximately-equally-compelling alternatives.
Why? Partly because of inside views which I can’t explain in brief. But mainly because that’s how conceptual progress works in general. There is basically always far more hidden beauty and order in the universe than people are able to conceive (because conceiving of it is nearly as hard as discovering it—like, before Darwin, people wouldn’t have been able to explain what type of theory could bring order to biology).
I read the OP (perhaps uncharitably) as coming from a perspective of historically taking VNM much too seriously, and in this post kinda floating the possibility “what if we took it less seriously?” (this is mostly not from things I know about Anna, but rather a read on how it’s written). And to that I’d say: yepp, take VNM less seriously, but not at the expense of taking the hidden order of the universe less seriously.
I… don’t think I’m taking the hidden order of the universe non-seriously. If it matters, I’ve been obsessively rereading Christopher Alexander’s “The Nature of Order” books, and trying to find ways to express some of what he’s looking at in LW-friendly terms; this post is part of an attempt at that. I have thousands and thousands of words of discarded drafts about it.
Re: why I think there might be room in the universe for multiple aspirational models of agency, each of which can be self-propagating for a time, in some contexts: Biology and culture often seem to me to have multiple kinda-stable equilibria. Like, eyes are pretty great, but so is sonar, and so is a sense of smell, or having good memory and priors about one’s surroundings, and each fulfills some of the same purposes. Or diploidy and haplodiploidy are both locally-kinda-stable reproductive systems.
What makes you think I’m insufficiently respecting the hidden order of the universe?
I think a crux here is that the domain of values/utility functions is one in which it’s likely that multiple structures are equally compelling. My big reason for this probably derives from my being a moral relativist here: morality is something like a real thing, but it’s not objective or universal, and other people are allowed to hold different moralities and not update on them.
(Side note, but most of the objections that a lot of people have to moral realism can be alleviated just by being a moral relativist, rather than a moral anti-realist.)
The standard dutch-book arguments seem like pretty good reason to be VNM-rational in the relevant sense.
I think that’s kinda circular reasoning, the way you’re using it in context:
If I have preferences exclusively about the state of the world in the distant future, then dutch-book arguments indeed show that I should be VNM-rational. But if I don’t have such preferences, then someone could say “hey Steve, your behavior is dutch-bookable”, and I am allowed to respond “OK, but I still want to behave that way”.
For example, the first (Yudkowsky) post mentions a hypothetical person at a restaurant. When they have an onion pizza, they’ll happily pay $0.01 to trade it for a pineapple pizza. When they have a pineapple pizza, they’ll happily pay $0.01 to trade it for a mushroom pizza. When they have a mushroom pizza, they’ll happily pay $0.01 to trade it for an onion pizza. The person goes around and around, wasting their money in a self-defeating way (a.k.a. “getting money-pumped”).
That post describes the person as behaving sub-optimally. But if you read carefully, the author sneaks in a critical background assumption: the person in question has preferences about what pizza they wind up eating, and they’re making these decisions based on those preferences. But what if they don’t? What if the person has no preference whatsoever about pizza? What if instead they’re an asshole restaurant customer who derives pure joy from making the waiter run back and forth to the kitchen?! Then we can look at the same behavior, and we wouldn’t describe it as self-defeating “getting money-pumped”; instead, we would describe it as the skillful satisfaction of the person’s own preferences! They’re buying cheap entertainment! So that would be an example of preferences-not-concerning-future-states.
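A minimal sketch of the two readings in code (the $0.01 fee is from the example above; the entertainment value is a made-up number for illustration):

```python
# Cyclic pairwise preferences: pay $0.01 to move one step around the cycle
# onion -> pineapple -> mushroom -> onion -> ...
CYCLE = {"onion": "pineapple", "pineapple": "mushroom", "mushroom": "onion"}
TRADE_FEE = 0.01  # dollars paid per swap, as in the example above

def money_spent(start_pizza: str, n_trades: int) -> float:
    """Total dollars handed over after n_trades swaps around the cycle."""
    pizza, spent = start_pizza, 0.0
    for _ in range(n_trades):
        pizza = CYCLE[pizza]   # swap for the momentarily-preferred pizza
        spent += TRADE_FEE
    return round(spent, 2)

# Reading 1: preferences are only over which pizza gets eaten. Then 300 swaps
# leave the person $3.00 poorer and holding the same pizza: the classic money pump.
print(money_spent("onion", 300))  # 3.0

# Reading 2: utility is over the process, say some entertainment value per waiter
# trip (hypothetical number below). The identical behavior now maximizes the
# person's utility; nothing self-defeating is happening.
ENTERTAINMENT_PER_TRIP = 0.02  # hypothetical dollar-equivalent enjoyment per trip
print(round(300 * ENTERTAINMENT_PER_TRIP - money_spent("onion", 300), 2))  # 3.0 net gain
```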
(I’m assuming in this comment that the domain (input) of the VNM utility function is purely the state of the world in the distant future. If you don’t assume that, then saying that I should have a VNM utility function is true but trivial, and in particular doesn’t imply instrumental convergence. Again, more discussion here.)
(I agree that humans do in fact have preferences about the state of the world in the future, and that AGIs will too, and that this leads to instrumental convergence and is important, etc. I’m just saying that humans don’t exclusively have preferences about the state of the world in the future, and AGIs might be the same, and that this caveat is potentially important.)
The standard dutch-book arguments seem like pretty good reason to be VNM-rational in the relevant sense.
I mean, there are arguments about as solid as the “VNM utility theorem” pointing to CDT, but CDT is nevertheless not always the thing to aspire to, because CDT is based on an assumption/approximation that is not always a good-enough approximation (namely, CDT assumes our minds have no effects except via our actions, e.g. it assumes our minds have no direct effects on others’ predictions about us).
Some assumptions the VNM utility theorem is based on, which I suspect aren’t always good-enough approximations for the worlds we are in (the standard axioms themselves are restated after this list, for reference):
1) VNM assumes there are no important external incentives that’ll give you more of what you care about depending on how you run your mind (certain ways rather than others). So, for example:
1a) “You” (the decision-maker process we are modeling) can choose anything you like, without risk of losing control of your hardware. (Contrast case: if the ruler of a country chooses unpopular policies, they are sometimes ousted. If a human chooses dieting/unrewarding problems/social risk, they sometimes lose control of themselves.)
1b) There are no costs to maintaining control of your mind/hardware. (Contrast case: if a company hires some brilliant young scientists to be creative on its behalf, it often has to pay a steep overhead if it additionally wants to make sure those scientists don’t disrupt its goals/beliefs/normal functioning.)
1c) We can’t acquire more resources by changing who we are via making friends, adopting ethics that our prospective friends want us to follow, etc.
2) VNM assumes the independence axiom. (Contrast case: Maybe we are a “society of mind” that has lots of small ~agents that will only stay knitted together if we respect “fairness” or something. And maybe the best ways of doing this violate the independence axiom. See Scott Garrabrant.) (Aka, I’m agreeing with Jan.)
2a) And maybe this’ll keep being true, even if we get to reflect a lot, if we keep wanting to craft in new creative processes that we don’t want to pay to keep fully supervised.
3) (As Steven Byrnes notes) we care only about the external world, and don’t care about the process we use to make decisions. (Contrast case: we might have process preferences, as well as outcome preferences.)
4) We have accurate external reference. Like, we can choose actions based on what external outcomes we want, and this power is given to us for free, stably. (Contrast case: ethics is sometimes defended as a set of compensations for how our maps predictably diverge from the territory, e.g. running on untrustworthy hardware, or “respect people, because they’re bigger than your map of them so you should expect they may benefit from e.g. honesty in ways you won’t manage to specifically predict.”) (Alternate contrast case: it’s hard to build a mind that can do external reference toward e.g. “diamonds”).
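(For reference, since it matters below which of these are axioms of the theorem versus background context for applying it, here is roughly the standard statement; wording approximate:)

```latex
% Standard VNM setup (approximate wording): an agent has a preference relation
% \succeq over lotteries p, q, r, i.e. probability distributions over a fixed
% set of outcomes, satisfying:
\begin{enumerate}
  \item Completeness: $p \succeq q$ or $q \succeq p$.
  \item Transitivity: if $p \succeq q$ and $q \succeq r$, then $p \succeq r$.
  \item Continuity: if $p \succeq q \succeq r$, then there is some
        $\alpha \in [0,1]$ with $\alpha p + (1-\alpha) r \sim q$.
  \item Independence: $p \succeq q$ if and only if
        $\alpha p + (1-\alpha) r \succeq \alpha q + (1-\alpha) r$
        for all $r$ and all $\alpha \in (0,1]$.
\end{enumerate}
% Conclusion: there exists a utility function $u$ on outcomes, unique up to
% positive affine transformation, such that $p \succeq q$ iff
% $\mathbb{E}_p[u] \ge \mathbb{E}_q[u]$.
```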
(I don’t think any of these, except 2, are things that the VNM axioms rely on. The rest seem totally compatible to me. I agree that 2 is interesting, and I’ve liked Scott Garrabrant’s exploration of the stuff.)
The VNM axioms refer to an “agent” who has “preferences” over lotteries of outcomes. It seems to me this is challenging to interpret if there isn’t a persistent agent, with a persistent mind, who assigns Bayesian subjective probabilities to outcomes (which I’m assuming it has some ability to think about and care about, i.e. my (4)), and who chooses actions based on their preferences between lotteries. That is, it seems to me the axioms rely on there being a mind that is certain kinds of persistent/unaffected.
Do you (habryka) mean there’s a new “utility function” at any given moment, made of “outcomes” that can include parts of how the agent runs its own inside? Or can you say more about how VNM is compatible with the negations of my 1, 3, and 4, or otherwise give me more traction for figuring out where our disagreement is coming from?
I was reasoning mostly from “what’re the assumptions required for an agent to base its choices on the anticipated external consequences of those choices.”
It seems to me this is challenging to interpret if there isn’t a persistent agent, with a persistent mind, who assigns Bayesian subjective probabilities to outcomes
Right, but if there isn’t a persistent agent with a persistent mind, then we no longer have an entity to which predicates of rationality apply (at least in the sense that the term “rationality” is usually understood in this community). Talking about it in terms of “it’s no longer VNM-rational” feels like saying “it’s no longer wet” when you change the subject of discussion from physical bodies to abstract mathematical structures.
I was trying to explain to Habryka why I thought (1), (3) and (4) are parts of the assumptions under which the VNM utility theorem is derived.
I think all of (1), (2), (3) and (4) are part of the context I’ve usually pictured in understanding VNM as having real-world application, at least. And they’re part of this context because I’ve been wanting to think of a mind as having persistence, and persistent preferences, and persistent (though rationally updated) beliefs about what lotteries of outcomes can be chosen via particular physical actions, and stuff. (E.g., in Scott’s example about the couple, one could say “they don’t really violate independence; they just care also about process-fairness” or something, but, … it seems more natural to attach words to real-world scenarios in such a way as to say the couple does violate independence. And when I try to reason this way, I end up thinking that all of (1)-(4) are part of the most natural way to try to get the VNM utility theorem to apply to the world with sensible, non-Grue-like word-to-stuff mappings.)
I’m not sure why Habryka disagrees. I feel like lots of us are talking past each other in this subthread, and am not sure how to do better.
I don’t think I follow your (Mateusz’s) remark yet.
It may be that some of the good reasons to not be VNM right now, will continue to be such. In that case, there’s no point at which you want to be VNM, and in some senses you don’t even limit to VNM. (E.g. you might limit to VNM in the sense that, for any local ontology thing, as long as it isn’t revised, you tend toward VNMness; but the same mind might fail to limit to VNM in that, on any given day, the stuff it is most concerned/involved with makes it look quite non-VNM.)
If you negate 1 or 3, then you have an additional factor/consideration in what your mind should be shaped like, and the conclusion “you better be shaped such that your behavior is interpretable as maximizing some (sensible) utility function, or otherwise you are exploitable or miss out on profitable bets” doesn’t straightforwardly follow.
I feel like people keep imagining that VNM rationality is some highly specific cognitive architecture. But the real check for VNM rationality is (approximately) just “can you be dutch booked?”.
I think I can care about external incentives on how my mind runs, and not be dutch booked. Therefore, it’s not in conflict with VNM rationality. This means there is some kind of high-dimensional utility function according to which I am behaving, but it doesn’t mean that I or anyone else has access to that utility function, or that the utility function has a particularly natural basis, or that an explicit representation of it is useful for doing the actual computation of how I make decisions.
I feel like this discussion could do with some disambiguation of what “VNM rationality” means.
VNM assumes consequentialism. If you define consequentialism narrowly, this has specific results in terms of instrumental convergence.
You can redefine what constitutes a consequence arbitrarily. But, along the lines of what Steven Byrnes points out in his comment, redefining this can get rid of instrumental convergence. In the extreme case you can define a utility function for literally any pattern of behaviour.
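One way to sketch that extreme case (notation mine): take the “outcomes” to be complete action-observation histories, fix any behavior pattern, i.e. any policy π, and define

```latex
% Rationalizing an arbitrary policy \pi once "consequences" are allowed to
% include the whole action-observation history h:
\[
  u(h) \;=\;
  \begin{cases}
    1 & \text{if every action in } h \text{ matches what } \pi \text{ prescribes given the preceding history,} \\
    0 & \text{otherwise.}
  \end{cases}
\]
% Under any beliefs about the environment, following \pi maximizes E[u], so the
% behavior is representable as expected-utility maximization; but nothing like
% instrumental convergence follows, because u is defined over histories rather
% than over narrowly construed future world-states.
```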
When you say you feel like you can’t be dutch booked, you are at least implicitly assuming some definition of consequences you can’t be dutch booked in terms of. To claim that one is rationally required to adopt any particular definition of consequences in one’s utility function is basically circular, since you only care about being dutch booked according to it if you actually care about that definition of consequences. It’s in this sense that the VNM theorem is trivial.
BTW, I am concerned that self-modifying AIs may self-modify towards becoming VNM-0 agents.
But the reason is not that such self-modification is “rational”.
It’s just that (narrowly defined) consequentialist agents care about preserving and improving their abilities and proclivities to pursue their consequentialist goals, so tendencies towards VNM-0 will be reinforced in a feedback loop. Likewise for inter-agent competition.
As far as I can tell, “the standard Dutch book arguments” aren’t even a reason why one’s preferences must conform to all the VNM axioms, much less a “pretty good” reason.
(We’ve had this discussion many times before, and it frustrates me that people seem to forget about this every time.)
I too get frustrated that people seem to forget about the arguments against your positions every time and keep bringing up positions long addressed :P
We can dig into it again; it might be worth carving out some explicit time for it if you want to. Otherwise it seems fine for you to register your disagreement. My guess is it could be good to link to some of the past conversations, if you have a link handy, for the benefit of other readers (I don’t right now).
Here’s one.