Could we prevent a superintelligent oracle from making self-fulfilling prophecies by preventing it from modeling itself? This post presents three scenarios in which self-fulfilling prophecies would still occur. For example, if instead of modeling itself, it models the fact that there’s some AI system whose predictions frequently come true, it may try to predict what that AI system would say, and then say that. This would lead to self-fulfilling prophecies.
Planned summary: