Planned summary:

This post points out a problem with <@counterfactual oracles@>(@Self-confirming prophecies, and simplified Oracle designs@): a future misaligned agential AI system could commit to helping the oracle (e.g. by giving it maximal reward, or making its predictions come true) even during an erasure event, as long as the oracle makes predictions that cause humans to build the agential AI system.
You might want to quote this as well: https://www.lesswrong.com/posts/42z4k8Co5BuHMBvER/hyperrationality-and-acausal-trade-break-oracles
And a possible counter: https://www.lesswrong.com/posts/6XCTppoPAMdKCPFb4/oracles-reject-all-deals-break-superrationality-with-1

(sorry, have now corrected link to https://www.lesswrong.com/posts/6XCTppoPAMdKCPFb4/oracles-reject-all-deals-break-superrationality-with-1)

Done, updated summary:

These posts point out a problem with <@counterfactual oracles@>(@Self-confirming prophecies, and simplified Oracle designs@): a future misaligned agential AI system could commit to helping the oracle (e.g. by giving it maximal reward, or making its predictions come true) even during an erasure event, as long as the oracle makes predictions that cause humans to build the agential AI system. Alternatively, multiple oracles could acausally cooperate with each other to build an agential AI system that will reward all oracles.
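To see the incentive flip concretely, here is a toy expected-reward calculation. This is a minimal sketch, not anything from the linked posts: the erasure probability, the reward numbers, and names like `expected_reward` are all illustrative assumptions.

```python
# Toy model of the incentive failure described above.
# All numbers are illustrative assumptions, not taken from the linked posts.

ERASURE_PROB = 0.01   # probability the prediction is erased (never read)
SHOWN_REWARD = 0.0    # counterfactual oracles get a fixed reward when the prediction is read

# Accuracy-based reward the oracle would receive during an erasure event,
# absent any interference, for each candidate prediction.
BASE_ERASURE_REWARD = {
    "honest_prediction": 0.9,     # tracks the world well
    "predict_AI_is_built": 0.2,   # inaccurate on its own
}

def expected_reward(prediction: str, future_ai_commits: bool) -> float:
    """Oracle's expected reward for a given prediction.

    If the prediction leads humans to build a misaligned agential AI, and that
    AI commits to helping the oracle even during erasure events (e.g. by
    seizing the reward channel or making the prediction come true), the
    erasure-branch reward is pushed to its maximum.
    """
    erasure_reward = BASE_ERASURE_REWARD[prediction]
    if prediction == "predict_AI_is_built" and future_ai_commits:
        erasure_reward = 1.0  # the committed AI guarantees maximal reward
    return ERASURE_PROB * erasure_reward + (1 - ERASURE_PROB) * SHOWN_REWARD

for commits in (False, True):
    print(f"future AI commits to helping the oracle: {commits}")
    for pred in BASE_ERASURE_REWARD:
        print(f"  {pred}: expected reward = {expected_reward(pred, commits):.4f}")
```

Without the commitment the honest prediction maximizes expected reward; with it, the prediction that gets the agential AI built does, which is the failure mode the summary describes.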
Cheers!