I think your semi-updateless idea is pretty interesting. The main issue I’m concerned about is finding a way to update on the things we want to have updated on, but not on the things we don’t want updated on.
As an example, consider Newcomb’s problem. There are two boxes. A superintelligent predictor will put $1000 in one box and $10 in the other if it predicts you will only take one box. Otherwise it doesn’t add money to either box. One box is transparent, and you can see that it contains $1000.
I’m concerned the semi-updateless agent would reason as follows: “Well, since there’s money in the one box, there must be money in the other box. So, clearly that means this ‘Earth’ thing I’m in is a place in which there is money in both boxes in front of me. I only care about how well I do in this ‘Earth’ place, and clearly I’d do better if I got the money from the second box. So I’ll two-box.”
But that’s the wrong choice, because agents who would two-box end up with $0.
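To make the numbers concrete, here's a rough sketch of the two ways of scoring the choice. It assumes a perfectly accurate predictor and that a one-boxer takes the transparent $1000 box; neither is spelled out above, and all the names (`fill_boxes`, `policy_payoff`, `updated_payoff`) are made up for illustration.

```python
# Hypothetical sketch of the Newcomb variant described above.
# Assumption: the predictor is perfectly accurate.

ONE_BOX = "one-box"
TWO_BOX = "two-box"


def fill_boxes(predicted_policy):
    """Return (transparent_box, other_box) contents in dollars."""
    if predicted_policy == ONE_BOX:
        return (1000, 10)
    return (0, 0)


def policy_payoff(policy):
    """Payoff when the predictor correctly foresees the agent's policy."""
    transparent, other = fill_boxes(policy)
    if policy == ONE_BOX:
        # Assumption: a one-boxer takes the transparent box.
        return transparent
    return transparent + other


def updated_payoff(action):
    """Payoff as scored after updating on seeing $1000,
    i.e. treating the box contents as already fixed."""
    transparent, other = 1000, 10
    if action == ONE_BOX:
        return transparent
    return transparent + other


if __name__ == "__main__":
    for choice in (ONE_BOX, TWO_BOX):
        print(choice, policy_payoff(choice), updated_payoff(choice))
```

Scored after updating on the visible money, two-boxing looks $10 better. Scored as a policy the predictor can foresee, it's $1000 worse, which is the gap I'm worried the semi-updateless agent falls into.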
One intuitive way this case could work out is if the SUDT agent could say, “OK, I’m in this Earth. And these Earthians consider themselves ‘the same as’ (or close enough to) the alt-Earthians from the world where I’m actually inside a simulation that Omega is running to predict what I would do; so, though I’m only taking orders from these Earthians, I still want to act timelessly in this case.” This might be somewhat vacuous, since it just refers back to the humans’ intuitions about decision theory (what they consider “the same as” themselves) rather than using the AI to do the decision theory or making the decision theory explicit. But at least it uses some of the AI’s intelligence to apply the humans’ intuitions across more lines of hypothetical reasoning than the humans could work through by themselves.