These posts point out a problem with <@counterfactual oracles@>(@Self-confirming prophecies, and simplified Oracle designs@): a future misaligned agential AI system could commit to helping the oracle (e.g. by giving it maximal reward, or by making its predictions come true) even if an erasure event occurs, as long as the oracle makes predictions that cause humans to build that agential AI system. Alternatively, multiple oracles could acausally cooperate with each other to build an agential AI system that will reward all of the oracles.
Done, updated the summary.
(Sorry, I have now corrected the link to https://www.lesswrong.com/posts/6XCTppoPAMdKCPFb4/oracles-reject-all-deals-break-superrationality-with-1 )
And a possible counter: https://www.lesswrong.com/posts/6XCTppoPAMdKCPFb4/oracles-reject-all-deals-break-superrationality-with-1
Cheers!