People back then certainly didn’t think of changing preferences.
Also, you can get rid of this problem by saying “you just want to maximize the variable U”. And the things you actually care about (dogs, apples) are just “instrumentally” useful in giving you U. So, for example, it is possible that in the future you will learn dogs give you a lot of U, or alternatively that apples give you a lot of U. Needless to say, this “instrumentalization” of moral deliberation is not how real agents work, and leads to getting Pascal’s mugged by the world in which you care a lot about easy things.
It’s more natural to model U as a logically uncertain variable, freely floating inside your logical inductor, shaped by its arbitrary aesthetic preferences. This doesn’t completely miss the importance of reward in shaping your values, but it’s certainly very different from how frugally computable agents do it.
I simply think the EV maximization framework breaks here. It is a useful abstraction when you already have a rigid enough notion of value, and are applying these EV calculations to a very concrete magisterium about which you can have well-defined estimates. Otherwise you get mugged everywhere. And that’s not how real agents behave.
Also, you can get rid of this problem by saying “you just want to maximize the variable U”. And the things you actually care about (dogs, apples) are just “instrumentally” useful in giving you U.
But you need some mechanism for actually updating your beliefs about U, because you can’t empirically observe U. That’s the role of V.
leads to getting Pascal’s mugged by the world in which you care a lot about easy things
I think this is fine. Consider two worlds:
In world L, lollipops are easy to make, and paperclips are hard to make.
In world P, it’s the reverse.
Suppose you’re a paperclip-maximizer in world L. And a lollipop-maximizer comes up to you and says “hey, before I found out whether we were in L or P, I committed to giving all my resources to paperclip-maximizers if we were in P, as long as they gave me all their resources if we were in L. Pay up.”
UDT says to pay here—but that seems basically equivalent to getting “mugged” by worlds where you care about easy things.
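The ex-ante calculation behind “pay up” can be made explicit with toy numbers (all assumed for illustration): from behind the veil, before learning which world you are in, the swap deal raises the paperclip-maximizer’s expected paperclips, which is why UDT endorses honoring it even once you find yourself in L.

```python
# Toy numbers (assumed for illustration): how many paperclips one unit of
# resources produces in each world, and a uniform prior over worlds.
CLIPS_PER_RESOURCE = {"L": 1, "P": 10}  # paperclips are hard in L, easy in P
PRIOR = {"L": 0.5, "P": 0.5}            # credence before learning the world

def clip_ev(deal: bool) -> float:
    """Ex-ante expected paperclips for a paperclip-maximizer that starts
    with 1 unit of resources, with or without the swap deal."""
    ev = 0.0
    for world, p in PRIOR.items():
        if deal:
            # In P the lollipop-maximizer hands over its unit (2 units total);
            # in L the paperclip-maximizer hands its own unit away (0 units).
            resources = 2 if world == "P" else 0
        else:
            resources = 1
        ev += p * resources * CLIPS_PER_RESOURCE[world]
    return ev

print(clip_ev(deal=False))  # 0.5*1*1 + 0.5*1*10 = 5.5
print(clip_ev(deal=True))   # 0.5*0*1 + 0.5*2*10 = 10.0 -> the deal wins ex ante
```

With these numbers the deal nearly doubles ex-ante expected paperclips, so the updateless policy is to honor it, even though honoring it in world L is a pure loss from the updated perspective.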
But you need some mechanism for actually updating your beliefs about U
Yep, but you can just treat it as another observation channel into UDT. You could, if you want, treat it as a computed number you observe in the corner of your eye, and then just apply UDT maximizing U, and you don’t need to change UDT in any way.
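The “observation channel” move can be sketched minimally (all names, signals, and numbers here are assumed for illustration): V’s output is appended to the observation stream, and unmodified UDT simply picks the policy, a mapping from observations to actions, that maximizes prior-expected U.

```python
import itertools

# Assumed toy setup: two possible worlds, a fixed value-learning algorithm V
# whose output the agent "observes", and the true (never directly observed)
# utility function U.
PRIOR = {"dogs_world": 0.5, "apples_world": 0.5}
ACTIONS = ["buy_dog", "buy_apple"]
OBSERVATIONS = ["signal_dogs", "signal_apples"]

def V(world):
    # Value-learning algorithm chosen at the start of time; its output is
    # treated as just another observation.
    return "signal_dogs" if world == "dogs_world" else "signal_apples"

def U(world, action):
    # True utility: 1 if you buy the thing this world makes valuable.
    if world == "dogs_world":
        return 1.0 if action == "buy_dog" else 0.0
    return 1.0 if action == "buy_apple" else 0.0

def policy_ev(policy):
    # Prior-expected U of a policy (a dict: observation -> action).
    return sum(p * U(w, policy[V(w)]) for w, p in PRIOR.items())

# Unmodified UDT: choose the observation->action mapping with highest prior EV.
best = max(
    (dict(zip(OBSERVATIONS, acts)) for acts in itertools.product(ACTIONS, repeat=2)),
    key=policy_ev,
)
print(best)             # follows V's signal in each world
print(policy_ev(best))  # 1.0
```

Nothing in the UDT machinery changed; V was folded into the world-model as a source of observations, which is exactly the sense in which this treats values as fixed at the start of time.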
UDT says to pay here
(Let’s not forget this depends on your prior, and we don’t have any privileged way to assign priors to these things. But that’s a tangential point.)
I do agree that there’s not any sharp distinction between situations where it “seems good” and situations where it “seems bad” to get mugged. After all, if all you care about is maximizing EV, then you should take all muggings. It’s just that, when we do that, something feels off (to us humans, maybe due to risk-aversion), and we go “hmm, probably this framework is not modelling everything we want, or missing some important robustness considerations, or whatever, because I don’t really feel like spending all my resources and creating a lot of disvalue just because in the world where 1 + 1 = 3 someone is offering me a good deal”. You start to see how your abstractions might break, and how you can’t get any satisfying notion of “complete updatelessness” (that doesn’t go against important intuitions). And you start to rethink whether this is what we normatively want, or what we realistically see in agents.
Yep, but you can just treat it as another observation channel into UDT.
Hmm, I’m confused by this. Why should we treat it this way? There’s no actual observation channel, and in order to derive information about utilities from our experiences, we need to specify some value learning algorithm. That’s the role V is playing.
It’s just that, when we do that, something feels off (to us humans, maybe due to risk-aversion), and we go “hmm, probably this framework is not modelling everything we want, or missing some important robustness considerations, or whatever, because I don’t really feel like spending all my resources and creating a lot of disvalue just because in the world where 1 + 1 = 3 someone is offering me a good deal”.
Obviously I am not arguing that you should agree to all moral muggings. If a pain-maximizer came up to you and said “hey, looks like we’re in a world where pain is way easier to create than pleasure, give me all your resources”, it would be nuts to agree, just like it would be nuts to get mugged by “1 + 1 = 3”. I’m just saying that “sometimes you get mugged” is not a good argument against my position, and definitely doesn’t imply “you get mugged everywhere”.
There’s no actual observation channel, and in order to derive information about utilities from our experiences, we need to specify some value learning algorithm.
Yes, absolutely! I just meant that, once you give me whatever V you choose to derive U from observations, I will just be able to apply UDT on top of that. So under this framework there doesn’t seem to be anything new going on, because you are just choosing an algorithm V at the start of time, and then treating its outputs as observations. That’s, again, why this only feels like a good model of “completely crystallized rigid values”, and not of “organically building them up slowly, while my concepts and planner module also evolve, etc.”.[1]
definitely doesn’t imply “you get mugged everywhere”
Wait, but how does your proposal differ from EV maximization (with moral uncertainty as part of the EV maximization itself, as I explain above)?
Because anything that is doing pure EV maximization “gets mugged everywhere”. Meaning, if you actually hold such beliefs (for example, that a world where suffering is hard to produce could exist), you just take those bets. Of course, if you don’t hold such “extreme” beliefs, it doesn’t get mugged, but then we’re no longer talking about decision-making, but about belief-formation. You could say “I will just do EV maximization, but never have extreme beliefs that lead to suspicious-looking behavior”, but that would be hiding the problem inside belief-formation, and doesn’t seem to be the kind of efficient mechanism that agents really implement to avoid these failure modes.
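A minimal sketch of the “takes those bets” claim (all numbers assumed for illustration): for a pure EV maximizer, even a vanishingly small credence in the mugger’s claimed world is dominated by a sufficiently large stated payoff, so the maximizer pays.

```python
# Toy Pascal's-mugging arithmetic for a pure EV maximizer (numbers assumed).

def ev_of_paying(p_extreme: float, payoff_if_true: float, cost: float) -> float:
    """Expected value of handing over `cost` resources, given credence
    `p_extreme` that the mugger's claimed high-payoff world is real."""
    return p_extreme * payoff_if_true - (1 - p_extreme) * cost

# Even a one-in-a-billion credence is swamped by a large enough claimed payoff:
print(ev_of_paying(1e-9, 1e12, 1.0) > 0)  # True -> the pure maximizer pays up
```

The only escape within the framework is to deny the extreme belief itself, which is exactly the move of relocating the problem from decision-making into belief-formation.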
To be clear, V can be a very general algorithm (like “run a copy of me thinking about ethics”), so that this doesn’t “feel like” having rigid values. Then I just think you’re carving reality at the wrong spot. You’re ignoring the actual dynamics of messy value formation, hiding them under V.