Do you think an AI extrapolating human preferences should also have a bounded utility function with a not very high bound? Do you think such an AI should give in to muggers?
Do you think an AI extrapolating human preferences should also have a bounded utility function with a not very high bound?
That phrasing suggests a certain structure, which I am suspicious of. For instance, we can have a bounded utility function which is a sum of terms reflecting different features of the world. We can have one bounded term for finite amounts of physical goods (with diminishing utility with increasing quantity), another for producing infinite goods (paperclips or happy people, whatever), another for average welfare in the multiverse according to some measure, and so forth.
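As a rough sketch of the kind of structure I have in mind (the saturating function, weights, and feature names below are placeholders for illustration, not a proposal):

```python
import math

def bounded(x, scale=1.0):
    # Saturating transform: maps [0, inf) into [0, 1), so each term stays bounded.
    return 1.0 - math.exp(-x / scale)

def utility(world):
    # 'world' is a dict of illustrative features. Each term is individually
    # bounded, so the weighted sum is bounded by the sum of the weights (here 3).
    finite_goods = bounded(world.get("physical_goods", 0.0))               # diminishing utility in quantity
    infinite_goods = 1.0 if world.get("produces_infinite_goods") else 0.0  # e.g. infinite happy people
    avg_welfare = max(0.0, min(1.0, world.get("avg_welfare", 0.0)))        # average welfare under some measure
    return 1.0 * finite_goods + 1.0 * infinite_goods + 1.0 * avg_welfare
```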
Do you think such an AI should give in to muggers?
It’s quite hard to come up with a plausible AI utility function that would give in to such a mugger, since conditional on the possibility of producing lots of happy people (or whatever) there are better ways to spend the $5 on bringing that about, e.g. high-energy physics research, well-orchestrated attempts to estimate and influence any beings running our world as a simulation, etc.
We can have one bounded term for finite amounts of physical goods (with diminishing utility with increasing quantity), another for producing infinite goods (paperclips or happy people, whatever)
So producing 10 happy people would sometimes outweigh producing infinite happy people? That sounds suspicious. Is that idea written up somewhere?
It’s quite hard to come up with a plausible AI utility function that would give in to such a mugger, since conditional on the possibility of producing lots of happy people (or whatever) there are better ways to spend the $5 on bringing that about, e.g. high-energy physics research, well-orchestrated attempts to estimate and influence any beings running our world as a simulation, etc.
I think you’re not assuming the least convenient possible world here. Consider a very smart AI that has only two options: giving $5 to the mugger, or using the money to feed one starving African kid. Should it pay the mugger?
So producing 10 happy people would sometimes outweigh producing infinite happy people? That sounds suspicious. Is that idea written up somewhere?
A lottery with a chance of producing 10 happy people could sometimes outweigh a lottery with a payoff of infinite happy people, but in a direct comparison of certain payoffs, producing infinite happy people includes producing 10 happy people.
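A toy illustration of that comparison, with made-up probabilities and utility values:

```python
# Suppose the bounded "happy people" term assigns u(10 people) = 0.5 and
# u(infinitely many people) = 0.9, near the term's bound of 1. All numbers
# here are invented for the example.
u_ten, u_inf = 0.5, 0.9

# Lottery A: 80% chance of producing 10 happy people.
# Lottery B: 1% chance of producing infinitely many happy people.
eu_a = 0.80 * u_ten   # 0.40
eu_b = 0.01 * u_inf   # 0.009
assert eu_a > eu_b    # the likely, modest payoff wins under the bounded term

# With certain payoffs, producing infinitely many happy people includes
# producing 10, so it cannot come out worse.
assert u_inf >= u_ten
```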
I think you’re not assuming the least convenient possible world here. Consider a very smart AI that has only two options: giving $5 to the mugger, or using the money to feed one starving African kid. Should it pay the mugger?
That’s not inconvenient enough. Feeding a starving African kid probabilistically displaces efforts by folk like Bill Gates or Giving What We Can enthusiasts toward reducing existential risk and increasing our likelihood of surviving to exploit wacky physics. But I’ll iron-man the dilemma to include infinite certainty that the African kid’s life will have no other effects on anything, that the incentive effects of encouraging people to perform Pascal’s muggings are nil, etc. I’ll also assume that I’m a causal decision theorist (otherwise, in a Big World, choosing to save the child can mean that my infinite counterparts do so as well, for an infinite payoff).
After that iron-manning, my current best guess is that I would prefer the AI feed the kid, unless the mugging were presented more credibly than it is in, e.g., Nick Bostrom’s write-up.
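For concreteness, here’s a toy expected-utility comparison under a bounded utility function; every number in it is invented, and it’s only meant to show why the mugging’s credibility, rather than the size of the promised payoff, does the work:

```python
U_BOUND = 1.0              # upper bound of the utility function
u_feed_kid = 1e-9          # small but certain gain from feeding the kid (made up)
p_mugger_truthful = 1e-20  # credence that paying actually buys the huge payoff (made up)
u_mugger_payoff = U_BOUND  # even an astronomical promised payoff cannot exceed the bound

eu_pay_mugger = p_mugger_truthful * u_mugger_payoff  # 1e-20
eu_feed_kid = 1.0 * u_feed_kid                       # 1e-9

# On these numbers the mugger wins only if p_mugger_truthful rises above about
# 1e-9, i.e. only if the offer is far more credible than in the standard telling.
assert eu_feed_kid > eu_pay_mugger
```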