So, at one point in my misspent youth I played with the idea of building an experimental Omega and looked into the subject in some detail.
Martin Gardner’s writeup on this back in 1973 (reprinted in The Night Is Large) explained that the core idea still works if Omega can predict with just 90% accuracy.
Your choice of ONE box pays nothing if you’re (incorrectly) predicted to two box, and pays $1M if you’re (correctly) predicted to one box, which happens 90% of the time, for a total EV of $900,000 (== (0.1)(0) + (0.9)(1,000,000)).
Your choice of TWO box pays $1k if you’re (correctly) predicted to two box, and pays $1,001,000 if you’re (incorrectly) predicted to one box, for a total EV of $101k (== 900 + 100,100 == (0.9)(1,000) + (0.1)(1,001,000)).
So the expected gain from one boxing over two boxing in a normal game, with Omega accuracy of 90%, would be $799k.
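Here is a minimal sketch of that arithmetic (my own illustration, not anything from Gardner’s essay), assuming the standard payouts of $1,000 in the transparent box and $1,000,000 in the opaque one:

```python
# A minimal sketch of the EV arithmetic above, assuming the standard
# payouts ($1,000 transparent, $1,000,000 opaque) and a 90%-accurate Omega.

ACCURACY = 0.90   # P(Omega correctly predicts your actual choice)
SMALL = 1_000     # transparent box, always yours if you take it
BIG = 1_000_000   # opaque box, filled only if one boxing was predicted

# One boxing: you get BIG iff Omega correctly predicted one boxing.
ev_one_box = ACCURACY * BIG + (1 - ACCURACY) * 0

# Two boxing: you always get SMALL, plus BIG iff Omega wrongly predicted one boxing.
ev_two_box = ACCURACY * SMALL + (1 - ACCURACY) * (SMALL + BIG)

print(f"one box EV:  ${ev_one_box:,.0f}")                # $900,000
print(f"two box EV:  ${ev_two_box:,.0f}")                # $101,000
print(f"difference:  ${ev_one_box - ev_two_box:,.0f}")   # $799,000
```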
Also, by adjusting the game’s payouts we could hypothetically make any degree of genuine human predictability (even just a reliable 51% accuracy) enough to motivate one boxing.
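A quick sanity check of that claim (again my own sketch, not from the essay): one boxing has the higher EV exactly when accuracy*BIG > SMALL + (1 - accuracy)*BIG, i.e. when (2*accuracy - 1)*BIG > SMALL, so any accuracy above 50% can be made decisive by rescaling the payouts:

```python
# One boxing wins whenever (2 * accuracy - 1) * BIG > SMALL.

def one_boxing_wins(accuracy: float, small: float, big: float) -> bool:
    """True if the one-box EV strictly exceeds the two-box EV."""
    return accuracy * big > small + (1 - accuracy) * big

# Even a barely-better-than-chance Omega (51%) favors one boxing at the
# standard payouts, since (2*0.51 - 1) * 1_000_000 = 20_000 > 1_000.
print(one_boxing_wins(0.51, small=1_000, big=1_000_000))   # True

# And where it doesn't, shrinking the small payout (or growing the big
# one) flips the inequality back in one boxing's favor.
print(one_boxing_wins(0.51, small=25_000, big=1_000_000))  # False
print(one_boxing_wins(0.51, small=25_000, big=2_000_000))  # True
```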
The super simplistic conceptual question here is the distinction between two kinds of sincerity. One kind of sincerity is assessed at the time of the promise. The other kind of sincerity is assessed retrospectively by seeing whether the promise was upheld.
Then the standard version of the game tries to drive a wedge between these concepts by supposing that an initially sincere promise might be violated by the intervention of something like “free will”, and it tries to make this seem slightly more magical (more of a far mode question?) by imagining that the promise was never even uttered; rather, the promise was stolen from the person by the magical mind-reading “Omega” entity before the person had even imagined that such a promise was possible to make.
One thing that seems clear to me is that if one boxing is profitable but not certain, then you might wish you could have done something in the past that would make it clear that you’ll one box, so that you land in the part of Omega’s calculations where the prediction is easy, rather than being one of the edge cases where Omega really has to work for its Brier score.
On the other hand, the setup is also (probably purposefully) quite fishy. The promise that “you made” is originally implicit, and depending on your understanding of the game maybe extremely abstract. Omega doesn’t just tell you what it predicted. If you one box and get nothing and complain, then Omega will probably try to twist it around and blame you for its failed prediction. If it all works, then you seem to be getting free money, and why is anyone handing out free money?
The whole thing just “feels like the setup for a scam”. Like you one box, get a million, then in your glow of positive trust you give some money to their charitable cause. Then it turns out the charitable cause was fake. Then it turns out the million dollars was counterfeit but your donation was real. Sucker!
And yet… you know, parents actually are pretty good at knowing when their kids are telling the truth or lying. And parents really do give their kids a free lunch. And it isn’t really a scam, it is just normal life as a mortal human being.
But also in the end, for someone to look their parents in the eyes and promise to be home before 10PM and really mean it for reals at the time of the promise, and then be given the car keys, and then come home at 1AM… that also happens. And wouldn’t it be great to just blame that on “free will” and “the 10% of the time that Omega’s predictions fail”?
Looping this back around to the larger AGI question, it seems like what we’re basically hoping for is to learn how to become a flawless Omega (or at least build some software that can do this job) at least for the restricted case of an AGI that we can give the car keys without fear that after it has the car keys it will play the “free will” card and grind us all up into fuel paste after promising not to.