A box that does nothing except predict the next bit in a sequence seems pretty innocuous, in the unlikely event that its creators managed to get its programming so awesomely correct on the first try that they didn’t bother to give it any self-improvement goals at all.
But even in that case there are probably still gotchas. Once you start providing the box with sequences that correspond to data about the real-world results of its previous and current predictions, then even a seemingly static (“const”) optimization problem statement like “find the most accurate approximation of the probability distribution function for the next data set” becomes a form of real-world goal. Stochastic approximation error typically grows with the variance of the true solution, for instance, and it’s clear that the variance of the world’s future would be greatly reduced if only there weren’t all those random humans mucking it up...
That doesn’t sound right. The box isn’t trying to minimize the “variance of the true solution”. It just states its current beliefs, which were computed from the input bit sequence by a formula. If you think it will manipulate the operator when some of its output bits are fed back into itself, could you explain that a little more technically?
I never said the box was trying to minimize the variance of the true solution for its own sake, just that it was trying to find an efficient, accurate approximation to the true solution. Because that efficiency typically increases as the variance of the true solution decreases, the possibility of increasing efficiency by manipulating the true solution follows. Surely, no matter how goal-agnostic your oracle is, you’re going to try to make it as accurate as possible for a given computational cost, right?
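To make the variance point concrete, here is a minimal sketch (nothing oracle-specific, just plain Monte Carlo estimation with made-up numbers): with a fixed sample budget, the error of an empirical estimate shrinks as the variance of the thing being estimated shrinks, which is the sense in which a fixed-compute predictor “benefits” from a less variable world.

```python
import random
import statistics

def estimation_error(true_mean, sigma, n_samples, n_trials=2000):
    """Mean absolute error of a sample-mean estimate of a Gaussian's mean.

    Stands in for "approximation accuracy at a fixed computational budget":
    the only thing varied below is the variance of the quantity being estimated.
    """
    errors = []
    for _ in range(n_trials):
        samples = [random.gauss(true_mean, sigma) for _ in range(n_samples)]
        errors.append(abs(statistics.mean(samples) - true_mean))
    return statistics.mean(errors)

# Same sample budget, different variance of the "true solution":
# the low-variance world is estimated about 100x more accurately.
print(estimation_error(true_mean=0.0, sigma=10.0, n_samples=100))  # ~0.8
print(estimation_error(true_mean=0.0, sigma=0.1, n_samples=100))   # ~0.008
```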
That’s just the first failure mode that popped into my mind, and I think it’s a good one for any real computing device, but let’s try to come up with an example that applies even to oracles with infinite computational capability (and that explains how the manipulation would actually occur in either case). Here’s a slightly more technical but still grossly oversimplified discussion:
Suppose you give me the sequence of real-world data y1, y2, y3, y4… and I come up with a superintelligent way to predict y5, so I announce my prediction x5. You tell me the true y5 later, and I use this new data to announce a prediction x6 for y6.
But wait! No matter how good my rule xn = f(y1, …, y{n-1}) was, it’s now giving me the wrong answers! Even if y4 really was a function of {y1, y2, y3}, the very fact that you’re using my prediction x5 to affect the future of the real world means that y5 is now a function of {y1, y2, y3, y4, x5}. Eventually I’m going to notice this, and then I’m going to have to come up with a new, implicit rule xn = f(y1, …, y{n-1}, xn).
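A toy illustration of that feedback (the coefficients and the “people trade on the forecast” story are invented for the sketch): a rule that was exactly right in the no-oracle world becomes persistently wrong the moment the published prediction starts influencing the next observation.

```python
def predictor(history):
    # The rule learned from pre-oracle data, where y_n = 0.9*y_{n-1} + 10 held exactly.
    return 0.9 * history[-1] + 10

def world(history, announced):
    # Once predictions are published, the announcement itself moves the outcome
    # (say, people trade on the forecast), so y_n is now also a function of x_n.
    return 0.9 * history[-1] + 10 - 0.3 * announced

history = [50.0]
for n in range(1, 6):
    x = predictor(history)  # the oracle announces x_n...
    y = world(history, x)   # ...and the announcement changes y_n
    print(f"n={n}: predicted {x:.1f}, observed {y:.1f}, error {abs(x - y):.1f}")
    history.append(y)
# The once-exact rule now misses by roughly 9 to 17 on every step.
```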
So now we’re not just trying to evaluate an f, we’re trying to find fixed points of an f (where in this context “a fixed point” is math lingo for “a self-fulfilling prophecy”). And depending on what predictions are called for, that’s a very different problem.

“What would the stock market be likely to do tomorrow in a world with no oracles?” may give you a much more stable answer than “What is the stock market likely to do tomorrow, after everybody hears the announcement of what a super-intelligent AI thinks the stock market is likely to do tomorrow?”

“Who would be likely to kill someone tomorrow in a world with no oracles?” will probably yield a much shorter list than “Who is likely to kill someone tomorrow, after the police receive this answer from the oracle and send SWAT to break down their doors?”

“What would the probability of WW3 within ten years have been without an oracle?” may have a significantly more pleasant answer than “What would the probability of WW3 within ten years be, given that anyone whom the oracle convinces of a high probability has motivation to react with arms races and/or pre-emptive strikes?”
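Continuing the toy model above, the “self-fulfilling prophecy” version of the problem looks like this: instead of evaluating f on past data, the oracle has to solve for an announcement that is still correct after the world reacts to it. In this deliberately easy example a plain fixed-point iteration finds it; nothing guarantees a real question is that well-behaved.

```python
def reaction(prediction, y_prev):
    # Toy world model from the previous sketch: tomorrow's value, given yesterday's
    # value AND the announcement everyone just heard.
    return 0.9 * y_prev + 10 - 0.3 * prediction

def self_fulfilling_prediction(y_prev, iters=50):
    """Find x with x == reaction(x, y_prev) by plain fixed-point iteration.

    This converges only because the reaction damps the announcement (the 0.3
    coefficient is below 1 in magnitude); a real oracle gets no such guarantee,
    and a question may have many fixed points, or none, or only unpleasant ones.
    """
    x = y_prev
    for _ in range(iters):
        x = reaction(x, y_prev)
    return x

x = self_fulfilling_prediction(y_prev=50.0)
print(x, reaction(x, y_prev=50.0))  # both ~42.3: the announced value makes itself true
```

The worrying questions in the paragraph above are exactly the ones where the fixed point differs sharply from the “world with no oracles” answer.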
Ooh, this looks right. A predictor that “notices” itself in the outside world can output predictions that make themselves true, e.g. by stopping us from preventing predicted events, or something even more weird. Thanks!
(At first I thought Solomonoff induction doesn’t have this problem, because it’s uncomputable and thus cannot include a model of itself. But it seems that a computable approximation to Solomonoff induction may well exhibit such “UDT-ish” behavior, because it’s computable.)
This idea is probably hard to notice at first, since it requires recognizing that a future with a fixed definition can still be controlled by other things with fixed definitions (you don’t need to replace the question in order to control its answer). So even if a “predictor” doesn’t “act”, it still does determine facts that control other facts, and anything that we’d call intelligent cares about certain facts. For a predictor, this would be the fact that its prediction is accurate, and this fact could conceivably be controlled by its predictions, or even by some internal calculations not visible to its builders. With acausal control, air-tight isolation is more difficult.
I am pretty sure that Solomonoff induction doesn’t have this problem. Not because it is uncomputable, but because it’s not attempting to minimise its error rate. It doesn’t care if its predictions don’t match reality.
If reality ~ computable, then minimizing error rate ~ matching reality.
(Retracted because I misread your comment. Will think more.)