Yes, good point. I didn’t consider how fast the Solomonoff prior weight falls off relative to the reward, but I did mention that eventually everything is dominated by programs longer than your observation bit-sequence.
So perhaps that is a good route for tightening the zero limit. For example, a 30-year-old human has an observation history of perhaps 10^16 bits, which sets an upper bound on the zero limit at probabilities of around 1/2^(10^16). (I’m not sure what that is in Knuth notation.)
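As a rough back-of-the-envelope sketch of where that 10^16 comes from (the ~10^7 bits/s sensory input rate is an assumption of mine, not something established above):

```python
import math

# Back-of-the-envelope sketch; the 1e7 bits/s sensory input rate is an
# assumed figure, not a measurement.
seconds_per_year = 365.25 * 24 * 3600
bits_per_second = 1e7
history_bits = 30 * seconds_per_year * bits_per_second
print(f"observation history ~ {history_bits:.1e} bits")      # ~9.5e15

# 2**(-history_bits) underflows any float, so report its base-10 log instead.
log10_floor = -history_bits * math.log10(2)
print(f"zero-limit probability ~ 10^({log10_floor:.2e})")    # ~10^(-2.9e15)
```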
But again, it has nothing to do with the mugger’s words or claimed reward: causality breakdown occurs when the programs are longer than the observation history. At this limit, the expected reward of any action is zero.
I’ll edit the OP to include your point and how the zero limit depends on observation history.
nothing to do with the mugger’s words or claimed reward
Yvain’s rebuttal in one of the threads you linked to is pretty good, and can be cashed out in terms of programs simulating worlds and sampling the simulations to generate observations for AIXI. You will not get a complexity penalty in bits anywhere close to the size of your observation history.
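One way to make that concrete (a toy sketch of my own; `up_arrow` is a hypothetical helper, not anything from Yvain’s comment): the machinery a program needs in order to name a 3^^^^3-sized reward is only a few hundred bytes, nowhere near 10^16 bits.

```python
# Minimal sketch: a 3^^^^3-sized number has a very short description.
# `up_arrow` is an illustrative helper; it is never called on large arguments.
import inspect

def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth up-arrow: a with n arrows applied to b."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

assert up_arrow(3, 1, 3) == 27     # 3^3
assert up_arrow(2, 2, 3) == 16     # 2^^3 = 2^(2^2)

# Description length of the reward-naming machinery, in bits:
print(8 * len(inspect.getsource(up_arrow)))   # a few thousand bits, vs ~10^16
```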
Disclaimer: one shouldn’t actually hand over money to a mugger in such cases. Human preferences can usually be better summed up in terms of bounded utility functions (which may assign [bounded] utility to infinite quantities of stuff), and there would be less implausible ways to convert $5 into vast utility than handing it over to such a character.
Yvain’s argument doesn’t hold water; it’s just an intuition pump. The mugger is irrelevant and the threat is irrelevant; it’s the tiny probabilities that matter. There is a minimum probability regime beyond which the expected reward goes to zero.
Do you think an AI extrapolating human preferences should also have a bounded utility function with a not very high bound? Do you think such an AI should give in to muggers?
Do you think an AI extrapolating human preferences should also have a bounded utility function with a not very high bound?
That phrasing suggests a certain structure, which I am suspicious of. For instance, we can have a bounded utility function which is a sum of terms reflecting different features of the world. We can have one bounded term for finite amounts of physical goods (with diminishing utility with increasing quantity), another for producing infinite goods (paperclips or happy people, whatever), another for average welfare in the multiverse according to some measure, and so forth.
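Something like the following toy sketch, where every weight and saturation scale is an illustrative assumption of mine:

```python
import math

# Toy bounded utility: a sum of individually bounded terms.
# All weights and saturation scales are illustrative assumptions.

def u_finite_goods(quantity: float) -> float:
    """Diminishing and bounded in [0, 1): more goods always help, but less and less."""
    return 1.0 - math.exp(-quantity / 1e6)

def u_infinite_goods(produced_infinite_goods: bool) -> float:
    """A bounded bonus term that switches on if infinite goods are produced."""
    return 1.0 if produced_infinite_goods else 0.0

def u_average_welfare(avg_welfare: float) -> float:
    """Bounded term for measure-weighted average welfare, squashed into (-1, 1)."""
    return math.tanh(avg_welfare)

def utility(quantity, produced_infinite_goods, avg_welfare,
            w1=1.0, w2=1.0, w3=1.0) -> float:
    # A sum of bounded terms, so the total is bounded by w1 + w2 + w3.
    return (w1 * u_finite_goods(quantity)
            + w2 * u_infinite_goods(produced_infinite_goods)
            + w3 * u_average_welfare(avg_welfare))
```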
Do you think such an AI should give in to muggers?
It’s quite hard to come up with a plausible AI utility function that would give in to such a mugger, since conditional on the possibility of producing lots of happy people (or whatever) there are better uses for the $5 in arranging that, e.g. high-energy physics research, well-orchestrated attempts to estimate and influence any beings running our world as a simulation, etc.
We can have one bounded term for finite amounts of physical goods (with diminishing utility with increasing quantity), another for producing infinite goods (paperclips or happy people, whatever)
So producing 10 happy people would sometimes outweigh producing infinite happy people? That sounds suspicious. Is that idea written up somewhere?
It’s quite hard to come up with a plausible AI utility function that would give in to such a mugger, since conditional on the possibility of producing lots of happy people (or whatever) there are better uses for the $5 in arranging that, e.g. high-energy physics research, well-orchestrated attempts to estimate and influence any beings running our world as a simulation, etc.
I think you’re not assuming the least convenient possible world here. Consider a very smart AI that has only two options: giving $5 to the mugger, or using the money to feed one starving African kid. Should it pay the mugger?
So producing 10 happy people would sometimes outweigh producing infinite happy people? That sounds suspicious. Is that idea written up somewhere?
A lottery with a chance of producing 10 happy people could sometimes outweigh a lottery with a payoff of infinite happy people, but in a direct comparison of certain payoffs, producing infinite happy people includes producing 10 happy people, so it can’t come out behind.
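A toy worked comparison (the probabilities and utility values below are made up, with utility capped at 1):

```python
# Toy expected-utility comparison under a bounded utility function.
# U_TEN and U_INFINITE are illustrative values; the probabilities are made up.
U_TEN = 0.4          # assumed utility of 10 happy people
U_INFINITE = 1.0     # the bound: utility of infinitely many happy people

lottery_a = 0.9 * U_TEN          # 90% chance of 10 happy people
lottery_b = 1e-9 * U_INFINITE    # one-in-a-billion chance of infinitely many

print(lottery_a > lottery_b)     # True: the finite lottery wins
print(U_INFINITE > U_TEN)        # True: with certainty, infinite still dominates
```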
I think you’re not assuming the least convenient possible world here. Consider a very smart AI that has only two options: giving $5 to the mugger, or using the money to feed one starving African kid. Should it pay the mugger?
That’s not inconvenient enough. Feeding a starving African kid probabilistically displaces efforts by folks like Bill Gates or Giving What We Can enthusiasts toward reducing existential risk and increasing our likelihood of surviving to exploit wacky physics. But I’ll iron-man the dilemma so that there is infinite certainty that the African kid’s life will have no other effects on anything, that the incentive effects of encouraging people to perform Pascal’s muggings are nil, and so on. I’ll also assume that I’m a causal decision theorist (otherwise, in a Big World, choosing to save the child can mean that my infinite counterparts do so as well, for an infinite payoff).
After that iron-manning, my current best guess is that I would prefer the AI feed the kid, unless the mugging were more credible than it is described as being in, e.g., Nick Bostrom’s write-up.