Thank you G Gordon and all other posters for your answers and comments! A lot of food for thought here… Below, I’ll try to summarize some general take-aways from the responses.
My main question was whether the simulation epiphany problem had already been resolved somewhere. It looks like the answer is no. Many commenters are leaning towards the significance of case 2 above, and I also feel that case 2 is very significant. Taking all comments together, I am starting to feel that the simulation epiphany problem should be disentangled into two separate problems.
Problem 1 is to consider what happens in the limit case where PAL's simulator is a perfect predictor of what Dave will do. This gets us into game theory, to reason about the likely outcomes of the associated Princess Bride type of infinite regress problem.
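To make the regress concrete, here is a minimal Python sketch (all names hypothetical, not taken from any actual PAL design) of why a naive "perfect predictor" never bottoms out once Dave's decision in turn depends on PAL's prediction:

```python
# Hypothetical sketch of the regress: PAL predicts Dave by simulating him,
# but simulated Dave reasons about PAL's prediction, so he simulates PAL
# simulating Dave, and so on, never reaching a fixed point.

def pal_predicts(depth=0, max_depth=10):
    """PAL's 'perfect' predictor: simulate Dave to see what he will do."""
    if depth > max_depth:
        raise RecursionError("infinite regress: no fixed point reached")
    return dave_decides(depth + 1, max_depth)

def dave_decides(depth, max_depth):
    """Simulated Dave bases his choice on what he thinks PAL predicted,
    which requires simulating PAL's predictor in turn."""
    prediction = pal_predicts(depth, max_depth)
    return "attack" if prediction == "attack" else "cooperate"

# pal_predicts()  # raises RecursionError: the mutual simulation never ends
```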
Problem 2 is to consider, starting from a particular agent design with a perfect predictor, what might happen when the perfect predictor is replaced with an imperfect one. Problem 2 allows for a case-by-case analysis.
In one case, PAL takes route B, and then notices that Dave does not experience the predicted helpful simulation epiphany. In this case we can consider how PAL might adjust its world model, or the world, to make this type of prediction error less likely in the future. The possibility that PAL might find it easier to change the world, not the model, might lead us to the conclusion that we had better add penalties for simulation epiphanies to the utility function. (Such penalties create an incentive for PAL to manipulate real-world Dave into never experiencing simulation epiphanies, but most safety mechanisms involve a trade-off, so I could live with such a thing if nothing better can be found.)
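As an illustration, here is a minimal sketch of such a penalty term. The outcome representation and the penalty weight are assumptions for the example, not a definitive design:

```python
# Hedged sketch: penalize any predicted outcome in which Dave experiences
# a simulation epiphany. The weight is arbitrary; tuning it is open.

EPIPHANY_PENALTY = 100.0  # assumed weight

def utility(outcome):
    """Base utility of a predicted outcome, minus a penalty whenever the
    prediction includes Dave experiencing a simulation epiphany."""
    u = outcome["base_utility"]
    if outcome["dave_has_epiphany"]:
        u -= EPIPHANY_PENALTY
    return u

# PAL picks the route whose predicted outcome maximizes this utility:
routes = {
    "A": {"base_utility": 5.0, "dave_has_epiphany": False},
    "B": {"base_utility": 8.0, "dave_has_epiphany": True},
}
best = max(routes, key=lambda r: utility(routes[r]))  # -> "A"
```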
In a second case, suppose that PAL incorrectly predicts that Dave will experience a simulation epiphany when it takes path B, and further that this incorrect prediction projects that Dave will conclude from the epiphany that he should attack PAL. This predicted outcome has very low utility, so in real life PAL will avoid taking path B. But in that case, there will also never be any error signal that allows PAL to find out that its prediction was incorrect. What does this tell us? Maybe only the trivial conclusion that PAL will need to make some exploration moves occasionally, if it wants to keep improving its world model. If PAL's designers follow through on this conclusion, and mention it in PAL's user manual, then this would also lower the probability of Dave ever believing he is in a simulation.
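A minimal sketch of what such occasional exploration moves could look like, using plain epsilon-greedy route selection; the exploration rate and route representation are assumptions for illustration only:

```python
# Hedged sketch: with small probability, PAL takes a route other than the
# predicted-utility maximizer, so predictions that would otherwise block a
# route forever (like the incorrect path-B prediction) eventually get tested.

import random

EPSILON = 0.05  # assumed exploration rate

def choose_route(predicted_utility):
    """predicted_utility: dict mapping route name -> PAL's predicted utility."""
    if random.random() < EPSILON:
        return random.choice(list(predicted_utility))  # exploration move
    return max(predicted_utility, key=predicted_utility.get)  # exploit

# Path B is predicted (perhaps wrongly) to have very low utility, but it is
# still tried ~5% of the time, generating the missing error signal.
choice = choose_route({"A": 5.0, "B": -92.0})
```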