This puts me in mind of a thought experiment Yvain posted a while ago (I’m certain he’s not the original author, but I can’t for the life of me track it any further back than his LiveJournal):
“A man has a machine with a button on it. If you press the button, there is a one in five million chance that you will die immediately; otherwise, nothing happens. He offers you some money to press the button once. What do you do? Do you refuse to press it for any amount? If not, how much money would convince you to press the button?”
This is – I think – analogous to your ‘siren world’ thought experiment. Rather than pressing the button once for £X, every time you press the button the AI simulates a new future world, and at any point you can stop and implement the future that looks best to you. Each press carries a small probability of uncovering a siren world, which you will be forced to choose because it will appear almost perfect (although you may keep pressing after uncovering the siren and find an even more deviously concealed siren, or even a utopia better than the original siren). How many times do you simulate future worlds before stopping and implementing the best so far, in order to maximize your expected utility?
Obviously the answer depends on how probable siren worlds are and on how likely the current best world is to be overtaken by a superior one on the next press (equivalent to a version of the button game in which the probability of winning money on the next press falls the more money you already have). In fact, if the probability of a siren world is sufficiently low, it may be worth the risk of generating worlds without constraints, in case the AI can simulate a world substantially better than the best world reachable by changing only the 25 yes-no questions, even if we know that the 25 yes-no questions will produce a highly livable world.
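To make that trade-off concrete, here is a minimal Monte Carlo sketch of the stopping problem. All the specific numbers (siren probability, the Pareto distribution for genuine worlds, the siren’s true disutility, trial counts) are my own illustrative assumptions rather than anything from the post; the only structural feature carried over is that a siren world looks better than any genuine world, so the chooser can only rank candidates by appearance.

```python
import random

# Toy model of the button-pressing search. Every constant below is an
# illustrative assumption, not a claim from the post.
P_SIREN = 0.002           # chance that a single press generates a siren world
SIREN_TRUE_UTILITY = -50  # sirens look perfect but are disastrous to implement
TRIALS = 5000             # Monte Carlo repetitions per setting

def press_once():
    """One button press: returns (apparent utility, true utility)."""
    if random.random() < P_SIREN:
        # A siren appears better than any genuine world.
        return (float("inf"), SIREN_TRUE_UTILITY)
    u = random.paretovariate(2.0)  # genuine worlds: heavy tail, diminishing improvements
    return (u, u)

def expected_true_utility(n_presses):
    """Average true utility of implementing the apparently best of n worlds."""
    total = 0.0
    for _ in range(TRIALS):
        worlds = (press_once() for _ in range(n_presses))
        best = max(worlds, key=lambda w: w[0])  # we can only judge by appearances
        total += best[1]
    return total / TRIALS

for n in (1, 10, 100, 300, 1000):
    print(f"{n:>5} presses: expected true utility ≈ {expected_true_utility(n):.2f}")
```

With these made-up numbers the expected true utility climbs for a while and then collapses once a siren has probably shown up somewhere in the sample, so the optimum is some finite number of presses, which is the shape of the conclusion argued for below.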
Of course, if the AI can lie to you about whether a world is good or not (which seems likely), or can produce possible worlds in a non-random fashion that increases the chance of generating a siren world (which also seems likely), then you should never push the button, because of the risk that you would be unable to stop yourself implementing the siren world which would – almost inevitably – be generated on the first try. If we can prove that the best-possible utopia is better than the best-possible siren even given IC constraints (which seems unlikely), or that the AI we have is definitely Friendly (could happen, you never know… :p ), then we should push the button an infinite number of times. But excluding these edge cases, it seems likely that the optimal decision will not be constrained in the way you describe, but will instead be an unconstrained yet non-exhaustive search – a finite number of pushes on our random-world button rather than an exhaustive search of a constrained possibility space.
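For what it’s worth, these edge cases drop out of a crude expected-utility expression under the same toy assumptions as the sketch above, where p is the per-press siren probability, G(n) the expected apparent-best genuine world after n presses, and S the siren’s true utility:

E[U(n)] ≈ (1 − p)^n · G(n) + (1 − (1 − p)^n) · S

If the AI can steer generation so that p is effectively 1, this is roughly S for every n ≥ 1, and the only safe choice is never pressing. If the AI is genuinely Friendly, or sirens are otherwise ruled out, p is effectively 0, the penalty term vanishes, and more presses never hurt (the utopia-beats-siren case is harder to capture in this crude model). Anywhere in between, the genuine term shrinks geometrically while G(n) grows only slowly, so the expression peaks at some finite n, which is the finite unconstrained search suggested above.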
I consider that to be a constrained search as well!