The simulated TDT agent is not aware that it won’t receive a reward, and therefore it does not work. … I don’t think that the ability to simulate without rewarding the simulation is what pushes it over the threshold of “unfair”.
I do agree. I think my previous post was still exploring the “can TDT break with a simulation of itself?” question, which is interesting but orthogonal.