Richard_Ngo comments on Richard Ngo’s Shortform

Richard_Ngo Mar 21, 2024, 12:12 AM
> Yep, but you can just treat it as another observation channel into UDT.
Hmm, I’m confused by this. Why should we treat it this way? There’s no actual observation channel, and in order to derive information about utilities from our experiences, we need to specify some value learning algorithm. That’s the role V is playing.
> It’s just that, when we do that, something feels off (to us humans, maybe due to risk-aversion), and we go “hmm, probably this framework is not modelling everything we want, or missing some important robustness considerations, or whatever, because I don’t really feel like spending all my resources and creating a lot of disvalue just because in the world where 1 + 1 = 3 someone is offering me a good deal”.
Obviously I am not arguing that you should agree to all moral muggings. If a pain-maximizer came up to you and said “hey, looks like we’re in a world where pain is way easier to create than pleasure, give me all your resources”, it would be nuts to agree, just like it would be nuts to get mugged by “1+1=3”. I’m just saying that “sometimes you get mugged” is not a good argument against my position, and definitely doesn’t imply “you get mugged everywhere”.
- Martín Soto Mar 21, 2024, 12:42 AM
  > There’s no actual observation channel, and in order to derive information about utilities from our experiences, we need to specify some value learning algorithm.
  Yes, absolutely! I just meant that, once you give me whatever V you choose to derive U from observations, I will just be able to apply UDT on top of that. So under this framework there doesn’t seem to be anything new going on, because you are just choosing an algorithm V at the start of time, and then treating its outputs as observations. That’s, again, why this only feels like a good model of “completely crystallized rigid values”, and not of “organically building them up slowly, while my concepts and planner module also evolve, etc.”.^[1]
  > definitely doesn’t imply “you get mugged everywhere”
  Wait, but how does your proposal differ from EV maximization (with moral uncertainty as part of the EV maximization itself, as I explain above)?
  Because anything doing pure EV maximization “gets mugged everywhere”: if you actually hold the relevant beliefs (for example, that the world where suffering is hard to produce could exist), you just take those bets.
  Of course, if you don’t have such “extreme” beliefs, it doesn’t, but then we’re no longer talking about decision-making, but about belief-formation. You could say “I will just do EV maximization, but never have extreme beliefs that lead to suspicious-looking behavior”, but that’d be hiding the problem under belief-formation, and doesn’t seem to be the kind of efficient mechanism that agents really implement to avoid these failure modes.
  1. ^
    To be clear, V can be a very general algorithm (like “run a copy of me thinking about ethics”), so that this doesn’t “feel like” having rigid values. Then I just think you’re carving reality at the wrong spot. You’re ignoring the actual dynamics of messy value formation, hiding them under V.
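
As a rough illustration of the “gets mugged everywhere” point in the thread above, here is a minimal Python sketch of a pure EV maximizer facing a mugging offer. Everything in it is an assumption made for illustration and not part of the original discussion: the names `Hypothesis`, `mugging`, `expected_value` and `ev_maximizer_accepts`, the two-world setup, and all the numbers.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate world, including which moral or logical facts hold in it."""
    name: str
    credence: float         # subjective probability assigned to this world
    value_if_accept: float  # utility of handing over resources in this world
    value_if_refuse: float  # utility of refusing in this world


def mugging(weird_credence, weird_payoff, cost=100.0):
    """Hypothetical two-world mugging: paying costs `cost` in the ordinary
    world and yields `weird_payoff` in the weird world."""
    return [
        Hypothesis("ordinary world", 1.0 - weird_credence, -cost, 0.0),
        Hypothesis("weird world", weird_credence, weird_payoff, 0.0),
    ]


def expected_value(hypotheses, action):
    """Credence-weighted utility of `action` ('accept' or 'refuse')."""
    if action == "accept":
        return sum(h.credence * h.value_if_accept for h in hypotheses)
    if action == "refuse":
        return sum(h.credence * h.value_if_refuse for h in hypotheses)
    raise ValueError(f"unknown action: {action}")


def ev_maximizer_accepts(hypotheses):
    """A pure EV maximizer pays whenever accepting has higher expected value."""
    return expected_value(hypotheses, "accept") > expected_value(hypotheses, "refuse")


if __name__ == "__main__":
    # Same tiny credence in the weird world, different claimed stakes.
    for payoff in (1e6, 1e12):
        offer = mugging(weird_credence=1e-9, weird_payoff=payoff)
        print(payoff, ev_maximizer_accepts(offer))
```

Under these made-up numbers, whether the agent pays depends only on credence times stakes. The decision rule itself has no way to refuse, so the only way to avoid paying is to hold different beliefs, which is the sense in which the problem gets pushed into belief-formation.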