Martín Soto comments on Richard Ngo’s Shortform

Martín Soto 20 Mar 2024 23:57 UTC
1 point
0
But you need some mechanism for actually updating your beliefs about U
Yep, but you can just treat it as another observation channel into UDT. You could, if you want, treat it as a computed number you observe in the corner of your eye, and then just apply UDT maximizing U, and you don’t need to change UDT in any way.
UDT says to pay here
(Let’s not forget this depends on your prior, and we don’t have any privileged way to assign priors to these things. But that’s a tangential point.)
I do agree that there’s not any sharp distinction between situations where it “seems good” and situations where it “seems bad” to get mugged. After all, if all you care about is maximizing EV, then you should take all muggings. It’s just that, when we do that, something feels off (to us humans, maybe due to risk-aversion), and we go “hmm, probably this framework is not modelling everything we want, or missing some important robustness considerations, or whatever, because I don’t really feel like spending all my resources and creating a lot of disvalue just because in the world where 1 + 1 = 3 someone is offering me a good deal”. You start to see how your abstractions might break, and how you can’t get any satisfying notion of “complete updatelessness” (that doesn’t go against important intuitions). And you start to rethink whether this is what we normatively want, or what we realistically see in agents.
- Richard_Ngo 21 Mar 2024 0:12 UTC
  2 points
  0
  Parent
  Yep, but you can just treat it as another observation channel into UDT.
  Hmm, I’m confused by this. Why should we treat it this way? There’s no actual observation channel, and in order to derive information about utilities from our experiences, we need to specify some value learning algorithm. That’s the role V is playing.
  It’s just that, when we do that, something feels off (to us humans, maybe due to risk-aversion), and we go “hmm, probably this framework is not modelling everything we want, or missing some important robustness considerations, or whatever, because I don’t really feel like spending all my resources and creating a lot of disvalue just because in the world where 1 + 1 = 3 someone is offering me a good deal”.
  Obviously I am not arguing that you should agree to all moral muggings. If a pain-maximizer came up to you and said “hey, looks like we’re in a world where pain is way easier to create than pleasure, give me all your resources”, it would be nuts to agree, just like it would be nuts to get mugged by “1+1=3”. I’m just saying that “sometimes you get mugged” is not a good argument against my position, and definitely doesn’t imply “you get mugged everywhere”.
  - Martín Soto 21 Mar 2024 0:42 UTC
    1 point
    0
    Parent
    There’s no actual observation channel, and in order to derive information about utilities from our experiences, we need to specify some value learning algorithm.
    Yes, absolutely! I just meant that, once you give me whatever V you choose to derive U from observations, I will just be able to apply UDT on top of that. So under this framework there doesn’t seem to be anything new going on, because you are just choosing an algorithm V at the start of time, and then treating its outputs as observations. That’s, again, why this only feels like a good model of “completely crystallized rigid values”, and not of “organically building them up slowly, while my concepts and planner module also evolve, etc.”.^[1]
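A minimal sketch of that point, under toy assumptions: the worlds, priors, experiences, and the particular `V` below are all hypothetical and chosen only for illustration. `V` is fixed once, its output is treated as just another observation, and an unmodified policy search maximizes prior-expected utility over maps from observed signal to action.

```python
from itertools import product

# Hypothetical toy setup: none of these names or numbers come from the thread.
WORLDS = {"pleasure_easy": 0.5, "pain_easy": 0.5}   # world -> prior probability
EXPERIENCE = {"pleasure_easy": "felt_good", "pain_easy": "felt_bad"}
ACTIONS = ["create", "abstain"]


def V(experience):
    """A fixed value-learning algorithm, chosen at the start of time.

    It maps a raw experience to a signal about U. The agent then treats
    this signal as an ordinary observation.
    """
    return "U_says_create_good" if experience == "felt_good" else "U_says_create_bad"


def utility(world, action):
    """True utility of an action in a world (unknown to the agent up front)."""
    if action == "abstain":
        return 0.0
    return 1.0 if world == "pleasure_easy" else -1.0


def udt_policy():
    """Pick the signal -> action map with the highest prior-expected utility."""
    signals = sorted({V(e) for e in EXPERIENCE.values()})
    best_policy, best_ev = None, float("-inf")
    for choice in product(ACTIONS, repeat=len(signals)):
        policy = dict(zip(signals, choice))
        ev = sum(
            prior * utility(world, policy[V(EXPERIENCE[world])])
            for world, prior in WORLDS.items()
        )
        if ev > best_ev:
            best_policy, best_ev = policy, ev
    return best_policy, best_ev
```

Nothing in `udt_policy` knows that the signal is about values rather than about the world, which is the sense in which the decision procedure itself is left unchanged.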
    definitely doesn’t imply “you get mugged everywhere”
    Wait, but how does your proposal differ from EV maximization (with moral uncertainty as part of the EV maximization itself, as I explain above)?
    Because anything that is doing pure EV maximization “gets mugged everywhere”. Meaning if you actually have the beliefs (for example, that the world where suffering is hard to produce could exist), you just take those bets.
    Of course, if you don’t have such “extreme” beliefs it doesn’t, but then we’re no longer talking about decision-making but about belief-formation. You could say “I will just do EV maximization, but never have extreme beliefs that lead to suspicious-looking behavior”, but that’d be hiding the problem under belief-formation, and it doesn’t seem to be the kind of efficient mechanism that agents really implement to avoid these failure modes.
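To make the arithmetic concrete, here is a rough sketch with made-up numbers: the two-world split, the function names, and every parameter are assumptions for illustration only. A pure EV maximizer pays whenever its credence in the strange world times the promised payoff outweighs the cost, so the only thing that can stop it is the credence itself.

```python
def expected_value(credences, payoffs):
    """Expected payoff of one action, given credences over worlds."""
    return sum(credences[world] * payoffs[world] for world in credences)


def ev_maximizer_pays(credence_in_strange_world, cost, payoff_in_strange_world):
    """Return True if a pure EV maximizer accepts the mugging.

    Hypothetical setup: paying costs `cost` in every world and yields
    `payoff_in_strange_world` only if the strange world is actual.
    """
    credences = {
        "normal": 1.0 - credence_in_strange_world,
        "strange": credence_in_strange_world,
    }
    pay = {"normal": -cost, "strange": payoff_in_strange_world - cost}
    refuse = {"normal": 0.0, "strange": 0.0}
    return expected_value(credences, pay) > expected_value(credences, refuse)
```

With a credence of one in a million, the agent pays 100 for a promised payoff of 1e9 but refuses for a promised payoff of 1e6, and with credence zero it never pays. The decision rule is identical in all three cases; only the beliefs differ.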
    ^[1] To be clear, V can be a very general algorithm (like “run a copy of me thinking about ethics”), so that this doesn’t “feel like” having rigid values. Then I just think you’re carving reality at the wrong spot. You’re ignoring the actual dynamics of messy value formation, hiding them under V.