I should have been clearer. I don’t exactly remember what I was thinking about now. It may have been about the suggested prompt in the “Simulating Human Alignment Researchers” section: if the oracle is supposed to somehow compress everything that would have happened over 2,000 (or more?) years into its simulation, then either it must itself run for thousands or millions of years (to produce a sufficiently high-fidelity simulation), or the simulation will dramatically diverge from what is actually likely to happen in reality. (However, there is no particular relation between this idea and the halting problem or the second law.)
Alternatively, if the prompt is designed just to “prime the oracle’s imagination” before writing the textbook, rather than as an invitation to run an elaborate simulation, I don’t see how it’s any safer than plainly asking the oracle to write the alignment textbook with proofs.