Another piece of related work: Simon Zhuang, Dylan Hadfield-Menell: Consequences of Misaligned AI. The authors assume a model where the state of the world is characterized by multiple “features”. There are two key assumptions: (1) our utility is (strictly) increasing in each feature, so, by definition, features are things we care about (I imagine money, QALYs, chocolate). (2) We have a limited budget, and any increase in any of the features always has a non-zero cost. The paper shows that: (A) if you are only allowed to tell your optimiser about a strict subset of the features, all of the non-specified features get thrown under the bus. (B) However, if you can optimise things gradually, then you can alternate which features you focus on, and somehow things will end up being pretty okay.
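To make the summary concrete, here is a minimal sketch of the model as I read it; the feature names, weights, costs, and the greedy allocation rule are my own inventions for illustration, not the paper’s actual construction.

```python
# Toy sketch of the model as I read it (feature names, weights, costs, and the
# greedy allocation are all my own, not the paper's actual construction).
# Assumption 1: utility is strictly increasing in each feature.
# Assumption 2: every unit of every feature has a strictly positive cost,
# paid out of a hard budget.

FEATURES = ["money", "qalys", "chocolate"]
WEIGHTS  = {"money": 1.0, "qalys": 3.0, "chocolate": 0.5}  # marginal utility per unit
COST     = {"money": 1.0, "qalys": 1.0, "chocolate": 1.0}  # per-unit cost, all > 0
BUDGET   = 10.0

def optimise(proxy_features):
    """Spend the whole budget on the proxy feature with the best utility per cost;
    features outside the proxy earn no credit, so they get nothing."""
    best = max(proxy_features, key=lambda f: WEIGHTS[f] / COST[f])
    allocation = {f: 0.0 for f in FEATURES}
    allocation[best] = BUDGET / COST[best]
    return allocation

# Result (A): tell the optimiser about only a strict subset of the features,
# and the unspecified features end up at their minimum.
print(optimise(["money", "qalys"]))
# -> {'money': 0.0, 'qalys': 10.0, 'chocolate': 0.0}
```

(With strictly concave per-feature utilities the optimum would spread the budget across the proxy features rather than going all-in on one, but the omitted features would still get nothing, which is the substance of (A).)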
Personal note: Because of assumption (2), I find result (A) extremely unsurprising, and perhaps misleading. Yes, it is true that at the Pareto frontier of resource allocation, there is no room for positive-sum interactions (i.e., getting better on some axis must hurt us on some other axis). But assumption (2) instead claims that positive-sum interactions are literally never possible. This is clearly untrue in the real world, for the things we care about.
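To spell the point out with made-up numbers of my own (again, not the paper’s): the zero-sum character is only forced once the whole budget is spent, but assumption (2) still attaches a cost to every increase even before that.

```python
# My own toy restatement (not from the paper): with a hard budget and a strictly
# positive per-unit cost for every feature, a feature can be raised without
# touching the others only while some budget is left over; once the budget binds,
# every gain must come out of some other feature. And even while there is slack,
# the raise still consumes budget: the model has no counterpart of a genuinely
# costless improvement.
BUDGET = 10.0
COST   = {"money": 1.0, "qalys": 1.0, "chocolate": 1.0}  # all strictly positive

def slack(allocation):
    return BUDGET - sum(COST[f] * x for f, x in allocation.items())

def can_raise_without_hurting_others(allocation, feature, amount=1.0):
    return slack(allocation) >= COST[feature] * amount

frontier = {"money": 2.0, "qalys": 8.0, "chocolate": 0.0}   # whole budget spent
print(can_raise_without_hurting_others(frontier, "chocolate"))  # False: zero-sum from here on
```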
That said, I find the result (B) quite interesting, and I don’t mean to hate on the paper :-).
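For what it’s worth, here is a toy version of the alternating idea in (B) as I understand the summary; this is my own sketch, not the paper’s actual procedure.

```python
# Toy version of the alternation in (B), as I understand the summary above
# (my own sketch, not the paper's actual procedure): spend the budget in small
# increments and rotate which single feature the optimiser is told about.
FEATURES = ["money", "qalys", "chocolate"]
BUDGET, STEP = 10.0, 0.5

allocation = {f: 0.0 for f in FEATURES}
spent, step_count = 0.0, 0
while spent + STEP <= BUDGET:
    focus = FEATURES[step_count % len(FEATURES)]  # alternate the feature in focus
    allocation[focus] += STEP                     # spend a small increment on it
    spent += STEP
    step_count += 1

print(allocation)
# -> {'money': 3.5, 'qalys': 3.5, 'chocolate': 3.0}: nothing gets thrown under the bus
```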
The real issue IMO is assumption 1, the assumption that utility is strictly increasing. Assumption 2 is, barring rather exotic regimes far into the future, basically always correct, since there is a minimum cost to increasing the features IRL and it isn’t 0; for irreversible computation in particular, this is always the case.
Increasing utility IRL is not free.
Assumption 1 is plausibly violated for some goods, provided utility grows slower than logarithmically, but the worry here is that status might actually be a good whose utility strictly increases, at least in relative terms.
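To put toy shapes to the two cases being contrasted here (my own illustration, not from the paper): a good one can literally have enough of, versus a relative-status term that keeps paying off no matter how much one already has.

```python
import math

# Two toy utility shapes (my own illustration, not from the paper).

def u_capped(x):
    """A good you can literally have enough of: utility is flat past the cap,
    so assumption 1 (strictly increasing in the feature) fails beyond that point."""
    return min(x, 1.0)

def u_relative_status(own, average):
    """A relative-status term: what matters is your level versus everyone else's,
    so more is always strictly better, however much you already have."""
    return math.log(own / average)

print(u_capped(10.0) - u_capped(5.0))                                  # 0.0: nothing left to gain
print(u_relative_status(20.0, 10.0) - u_relative_status(10.0, 10.0))   # ~0.69: still worth pursuing
```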
Assumption 2 is, barring rather exotic regimes far into the future, basically always correct, since there is a minimum cost to increasing the features IRL and it isn’t 0; for irreversible computation in particular, this is always the case.
Increasing utility IRL is not free.
I think this is a misunderstanding of what I meant. (And it probably only makes sense to try to clear the misunderstanding up if you have read the paper and disagree with my interpretation of it, rather than if your reaction is based only on my summary. I’m not sure which of the two is the case.)
What I was trying to say is that the most natural interpretation of the paper’s model does not allow for things like: In state 1, the world is exactly as it is now, except that you decided to sleep on the floor every day instead of in your bed (for no particular reason), and you are tired and miserable all day. State 2 is exactly the same as state 1, except you decided that it would be smarter to sleep in your bed. And now, state 2 is just strictly better than state 1 (at least in all respects that you would care to name). Essentially, the paper’s model requires, by assumption, that it is impossible to get any efficiency gains (like “don’t sleep on the floor” or “use this more efficient design instead”) or mutually beneficial deals (like helping two sides negotiate and avoid a war).
Yes, I agree that you can interpret the model in ways that avoid this. E.g., maybe by sleeping on the floor, your bed will last longer. And sure, any action at all requires computation. I am just saying that these are perhaps not the interpretations that people initially imagine when reading the paper. So unless you are using an interpretation like that, it is important to notice those strong assumptions.
Essentially, the paper’s model requires, by assumption, that it is impossible to get any efficiency gains (like “don’t sleep on the floor” or “use this more efficient design instead”) or mutually beneficial deals (like helping two sides negotiate and avoid a war).
Yeah, that was a different assumption that I didn’t realize was there; I thought the assumption was solely that we have a limited budget and every increase in a feature has a non-zero cost, which is a very different claim.
I sort of wish the two assumptions were distinguished, because they are very, very different (for example, you can have positive-sum interactions/trade so long as the cost is sufficiently low and the utility gain is sufficiently high, which is pretty common).
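A quick toy calculation of the point in the parentheses, with numbers I made up: a non-zero cost for every increase is perfectly compatible with strongly positive-sum moves, as long as the gain dwarfs the cost.

```python
# Toy numbers (mine, purely illustrative): brokering a deal has a small but
# non-zero cost, yet raises two things we care about by much more, so the
# "every increase has a non-zero cost" reading alone does not rule out
# positive-sum interactions.
cost_of_brokering = 0.1    # small but non-zero
gain_from_peace   = 5.0    # both sides avoid a war
gain_from_trade   = 3.0
net_gain = gain_from_peace + gain_from_trade - cost_of_brokering
print(net_gain)            # 7.9: clearly positive-sum despite the non-zero cost
```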
I definitely interpreted the model like this, in that I was assuming all the costs and benefits are included by default:
Yes, I agree that you can interpret the model in ways that avoid this. E.g., maybe by sleeping on the floor, your bed will last longer. And sure, any action at all requires computation. I am just saying that these are perhaps not the interpretations that people initially imagine when reading the paper. So unless you are using an interpretation like that, it is important to notice those strong assumptions.