I am not clear at what point in this process the reward blows up such that it qualifies as a mugging. It appears that defining \pi(MAX) as achievable through human policies places the reward calculation firmly within the usual realm.
Pascal’s Wager was about the infinite gain of eternal salvation, and Eliezer’s Mugging example was as much about how the rewards are arrived at by induction as about their magnitude: the pitch was that under Solomonoff induction rewards have no meaningful cap even when statements about their likelihood do, because sheer magnitude is very cheap to communicate.
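As a rough sketch of that argument (my own paraphrase, assuming a Kolmogorov/Solomonoff-style prior; the notation is mine, not the post’s): the prior penalty on a claimed payoff $n$ scales only with its description length $K(n)$, so for compactly describable numbers the product of prior and payoff can be made arbitrarily large:

$$2^{-K(n)} \cdot n \;\gg\; 1 \quad \text{whenever } K(n) \ll \log_2 n, \quad \text{e.g. } n = 3\uparrow\uparrow\uparrow 3.$$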
Let R0 be a reasonable human reward with all its complexity, and let R1 be “the human doesn’t eat”. A modified human can max out R1 much more easily than an unmodified human can max out R0 (even though an unmodified human would be terrible at R1). Where the “Pascal” aspect comes in is that we are comparing the practical upper bound of R0 with the theoretical upper bound of R1, and choosing R1 to have the maximal such theoretical upper bound.
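To spell that comparison out (again my notation, not the post’s): writing $\Pi_H$ for unmodified human policies and $\Pi_M$ for the policies available after modification, the worry is that

$$\underbrace{\max_{\pi \in \Pi_H} \mathbb{E}_\pi[R_0]}_{\text{practical upper bound}} \;\ll\; \underbrace{\max_{\pi \in \Pi_M} \mathbb{E}_\pi[R_1]}_{\text{theoretical upper bound}},$$

with R1 picked precisely so that the right-hand side is as large as possible, even though no unmodified policy comes anywhere near it.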
Reviewing the post with your update, I think the problem may just be that the examples are de-priming my intuition. In your reply you chose ‘the human doesn’t eat’ as the reward for a modified human to maximize, which means the gains are capped at all the food humans would eat if unmodified. That is compared against brain surgery, which a bit of googling suggests costs $50-150K, far more than it costs to feed a person. As a consequence, it looks like I chunked the proposition as ‘costly intervention to achieve a bounded reward’.
However, none of this is actually implied by the math. Insofar as you expect there to be other readers like me, it may be worth changing the examples to emphasize a trivial intervention for a very high reward.
The brain surgery is an example of how the AI can transform us into the humans it wants us to be—an extreme version of wireheading.
That much I understood—my flaw was reading too much into the example.