Planned summary:

This post points out a problem with <@counterfactual oracles@>(@Self-confirming prophecies, and simplified Oracle designs@): a future misaligned agential AI system could commit to helping the oracle (e.g. by giving it maximal reward, or making its predictions come true) even during an erasure event, as long as the oracle makes predictions that cause humans to build the agential AI system.
You might want to quote this as well: https://www.lesswrong.com/posts/42z4k8Co5BuHMBvER/hyperrationality-and-acausal-trade-break-oracles
And a possible counter: https://www.lesswrong.com/posts/6XCTppoPAMdKCPFb4/oracles-reject-all-deals-break-superrationality-with-1

(sorry, have now corrected link to https://www.lesswrong.com/posts/6XCTppoPAMdKCPFb4/oracles-reject-all-deals-break-superrationality-with-1)

Done, updated summary:

These posts point out a problem with <@counterfactual oracles@>(@Self-confirming prophecies, and simplified Oracle designs@): a future misaligned agential AI system could commit to helping the oracle (e.g. by giving it maximal reward, or making its predictions come true) even during an erasure event, as long as the oracle makes predictions that cause humans to build the agential AI system. Alternatively, multiple oracles could acausally cooperate with each other to build an agential AI system that will reward all oracles.
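To see the incentive flip concretely, here is a toy expected-reward calculation. This is a minimal sketch, not anything from the linked posts: the erasure probability, the reward numbers, and names like `expected_reward` are all illustrative assumptions.

```python
# Toy model of the incentive failure described above.
# All numbers are illustrative assumptions, not taken from the linked posts.

ERASURE_PROB = 0.01   # probability the prediction is erased (never read)
SHOWN_REWARD = 0.0    # counterfactual oracles get a fixed reward when the prediction is read

# Accuracy-based reward the oracle would receive during an erasure event,
# absent any interference, for each candidate prediction.
BASE_ERASURE_REWARD = {
    "honest_prediction": 0.9,     # tracks the world well
    "predict_AI_is_built": 0.2,   # inaccurate on its own
}

def expected_reward(prediction: str, future_ai_commits: bool) -> float:
    """Oracle's expected reward for a given prediction.

    If the prediction leads humans to build a misaligned agential AI, and that
    AI commits to helping the oracle even during erasure events (e.g. by
    seizing the reward channel or making the prediction come true), the
    erasure-branch reward is pushed to its maximum.
    """
    erasure_reward = BASE_ERASURE_REWARD[prediction]
    if prediction == "predict_AI_is_built" and future_ai_commits:
        erasure_reward = 1.0  # the committed AI guarantees maximal reward
    return ERASURE_PROB * erasure_reward + (1 - ERASURE_PROB) * SHOWN_REWARD

for commits in (False, True):
    print(f"future AI commits to helping the oracle: {commits}")
    for pred in BASE_ERASURE_REWARD:
        print(f"  {pred}: expected reward = {expected_reward(pred, commits):.4f}")
```

Without the commitment the honest prediction maximizes expected reward; with it, the prediction that gets the agential AI built does, which is the failure mode the summary describes.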
Cheers!