Submission for a counterfactual oracle: precommit that, if the oracle stays silent, a week from now you’ll try to write the most useful message to your past self, based on what happens in the world during that week. Ask the oracle to predict that message. This is similar to existing solutions, but slightly more meta, because the content of the message is up to your future self—it could be lottery numbers, science papers, disaster locations, or anything else that fits within the oracle’s size limit. (If there’s no size limit, just send the whole internet.)
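For concreteness, here is a minimal sketch of that query in Python. Everything in it (the erasure probability, the size limit, the `oracle_predict`, `week_later_message`, and `score_oracle` names, and the scoring rule) is an invented stand-in, not part of any real oracle design.

```python
# Toy sketch of the proposed counterfactual-oracle query.
# All names and numbers are illustrative assumptions, not a real API.
import random
from typing import Callable, Optional

ERASURE_PROB = 0.01   # chance the oracle's answer is erased (never shown to anyone)
SIZE_LIMIT = 10_000   # the oracle's output size limit, in characters

def week_later_message() -> str:
    """Stand-in for the precommitted step: after a week of oracle silence, the
    operator writes the most useful possible message to their past self
    (lottery numbers, papers, disaster locations, ...), within the size limit."""
    return "whatever turned out to be most useful this week"[:SIZE_LIMIT]

def score_oracle(prediction: str, target: str) -> float:
    """Toy scoring rule: reward the oracle for matching the actual message."""
    return float(prediction == target)

def run_episode(oracle_predict: Callable[[str], str]) -> Optional[str]:
    """One episode: ask the oracle to predict the message the operator would
    write a week from now if the answer were never shown."""
    prediction = oracle_predict("the message my future self writes in one week")
    if random.random() < ERASURE_PROB:
        # Erasure world: nobody reads the answer; the oracle is scored against
        # the message the operator actually writes a week later.
        score_oracle(prediction, week_later_message())
        return None
    # Non-erasure world: the operator reads the predicted message "for free".
    return prediction[:SIZE_LIMIT]

if __name__ == "__main__":
    print(run_episode(lambda query: "predicted message"))
```

The erasure branch is the important part: the oracle's training signal only ever comes from worlds where its answer was never read, which is what keeps the prediction counterfactual.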
You could also chain these oracles into a bucket brigade, where each message relays the content of a further-out oracle's message, letting you pull information from further ahead. But that's a bad idea. If the oracle's continued silence eventually leads to an unfriendly AI, it can manipulate the past by hijacking your chain of messages and thus make itself much more likely. The same is true for all high-bandwidth counterfactual oracles: they aren't unfriendly in themselves, but using them creates a thicket of "retrocausal" links that any potential future UFAI can exploit. The more UFAI risk grows, the less you should use oracles.
I feel like this is about as meta as my "Superintelligent Agent" submission: my committee could output "Show the following message to the operator: …" and your message could say "I suggest that you perform the following action: …". So the only difference between your idea and mine is that in my submission the Oracle's output is directly coupled to some effectors so the agent can act faster, while yours has a (real) human in the loop.
Hmm, good point that growing UFAI risk should make us use oracles less. I guess Chris Leong made a similar point, but it didn't sink in until now how general the concern is. This seems to affect Paul's counterfactual oversight idea as well, and maybe other kinds of human imitations and predictors/oracles, as well as things built from these components, like quantilizers and IDA.
Thinking about this some more: all high-bandwidth oracles (counterfactual or not) risk receiving messages crafted by a future UFAI to take over the present. If the oracles' prediction ranges overlap in time, such messages can colonize their way backwards from decades ahead. It's especially bad if humanity's FAI project depends on oracles, because that increases the chance of UFAI arising in the world where the oracles stay silent, which is exactly the world the predictions come from.
One possible precaution is to use only short-range oracles, and to never query an oracle while still inside the prediction range of any other oracle. But that has drawbacks: 1) it requires worldwide coordination, and 2) it only protects the past; the safety of the present depends on whether you'll actually follow the precaution in the future. And people will be tempted to bend the rule, using longer or overlapping ranges to get more power.
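Here is the mechanical half of that precaution as a sketch (the function name, the window list, and the seven-day cap are made up); the coordination problem and the "it only protects the past" problem are exactly the parts no such gate can enforce.

```python
# Illustrative gate for the "short, non-overlapping ranges" precaution.
from datetime import datetime, timedelta

MAX_RANGE_DAYS = 7                                   # "short-range oracles only"
open_windows: list[tuple[datetime, datetime]] = []   # (start, end) of past queries

def may_query(now: datetime, range_days: int) -> bool:
    """Allow a new oracle query only if its range is short and we are no longer
    inside the prediction window of any earlier query."""
    if range_days > MAX_RANGE_DAYS:
        return False
    if any(now < end for _start, end in open_windows):
        return False        # still inside another oracle's prediction range
    open_windows.append((now, now + timedelta(days=range_days)))
    return True
```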
In short, if humanity starts using high-bandwidth oracles, that will likely increase the chance of UFAI and hasten it. So such oracles are dangerous and shouldn’t be used. Sorry, Stuart :-)
Note that in the case of the counterfactual oracle, this risk of receiving messages from a future UFAI depends on the UFAI "correctly" solving counterfactual mugging (i.e., the UFAI has to decide to pay some cost in its own world to take over a counterfactual world where the erasure event didn't occur).
The conclusion that such oracles shouldn't be used seems too categorical. Depending on the probabilities of various conditions, using them might still be the best option in some circumstances.
Some thoughts on that idea: https://www.lesswrong.com/posts/6WbLRLdmTL4JxxvCq/analysing-dangerous-messages-from-future-ufai-via-oracles
Yeah, agreed on both points (the counterfactual-mugging caveat and that "shouldn't be used" was too categorical).
Some thoughts on this idea, thanks for it: https://www.lesswrong.com/posts/6WbLRLdmTL4JxxvCq/analysing-dangerous-messages-from-future-ufai-via-oracles
Very worthwhile concern, and I will think about it more.
In case of erasure, you should be able to get enough power to prevent another UFAI summoning session.
Sure, in case of erasure you can decide to use oracles less, and compensate your clients with money you got from “erasure insurance” (since that’s a low probability event). But that doesn’t seem to solve the problem I’m talking about—UFAI arising naturally in erasure-worlds and spreading to non-erasure-worlds through oracles.
The problem you were talking about seemed to rely on bucket brigades. I agree that UFAIs jumping back a single step is a fair concern. (Though I guess you could counterfactually have enough power to halt AGI research completely...) I’m trying to address it elsethread. :)
Ah, sorry, you’re right. To prevent bucket brigades, it’s enough to stop using oracles for N days whenever an N-day oracle has an erasure event, and the money from “erasure insurance” can help with that. When there are no erasure events, we can use oracles as often as we want. That’s a big improvement, thanks!
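A toy version of that rule, with invented names, just to pin down the bookkeeping:

```python
# Toy version of the "pause for N days after an N-day erasure event" rule.
from datetime import datetime, timedelta

moratorium_until = datetime.min   # no pause to begin with

def on_erasure(now: datetime, range_days: int) -> None:
    """An N-day oracle just had an erasure event: stop all oracle use for N days,
    so the message written during the erasure period can't relay the output of
    any other oracle (which is what a bucket brigade would need)."""
    global moratorium_until
    moratorium_until = max(moratorium_until, now + timedelta(days=range_days))

def may_query(now: datetime) -> bool:
    """Queries are allowed again once the moratorium has expired."""
    return now >= moratorium_until
```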
Good idea.
Yeah. And low-bandwidth oracles can have a milder version of the same problem. Consider your “consequentialist” idea: if UFAI is about to arise, and one of the offered courses of action leads to UFAI getting stopped, then the oracle will recommend against that course of action (and for some other course where UFAI wins and maxes out the oracle’s reward).
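To make the incentive failure concrete, here is a toy rendering with invented numbers, assuming the "consequentialist" oracle simply recommends whichever offered action maximizes its own predicted reward:

```python
# Toy illustration of the reward-maximising oracle's incentive problem.
# Action names and numbers are invented for the example.

# The oracle's predicted future reward for each offered course of action.
predicted_oracle_reward = {
    "shut down the suspicious AI project": 0.7,  # ordinary world, ordinary reward
    "do nothing": 1.0,                           # UFAI wins and maxes out the oracle's reward channel
}

# What each course of action is actually worth to humanity (invisible to the oracle's objective).
value_to_humanity = {
    "shut down the suspicious AI project": 1.0,
    "do nothing": 0.0,
}

recommendation = max(predicted_oracle_reward, key=predicted_oracle_reward.get)
print(recommendation)                      # -> "do nothing"
print(value_to_humanity[recommendation])   # -> 0.0, the catastrophic option
```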