The problem of future unaligned AI leaking into human imitation is something I have written about before. Notice that IDA-style recursion helps a lot, because instead of simulating a process that goes deep into the external timeline’s future, you’re simulating a “groundhog day” in which the researcher wakes up over and over at the same external time (more realistically, the restart time drifts forward along with the time outside the simulation) with a written record of all their previous work (but no memory of it). There can still be a problem if there is a positive probability of unaligned AI takeover in the present (i.e. during the time interval of the simulated loop), but it’s a milder problem. It can be further ameliorated if the AI has enough information about the external world to make confident predictions about the possibility of unaligned takeover during this period. The out-of-distribution problem is also less severe: the AI can occasionally query the real researcher to make sure its predictions are still on track.