what happens when the AI realises that the definition of what a “human” is turns out to be flawed.
The AI’s definition of “human” should be computational. If it discovers new physics, it may find additional physical processes that implement that computation, but it should not get confused.
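A minimal sketch of what “computational” could buy here; every name below is made up for illustration:

```python
# "Human" is defined by reference to an abstract computation, not to a substrate.
# `implements` stands in for the (hard, unsolved) criterion deciding whether a
# physical process realizes a given abstract computation.

def humans_in(candidate_processes, reference_computation, implements):
    # Discovering new physics can only enlarge the pool of candidate processes
    # (more things that might implement the computation); the membership test
    # itself never changes.
    return [p for p in candidate_processes
            if implements(p, reference_computation)]
```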
What if it discovers new math? Less likely, I know, but...
Ontological crises seem to be a problem for AIs with utility functions over arrangements of particles, but it doesn’t make much sense to me to specify our utility function that way. We don’t think of what we want as arrangements of particles; we think at a much higher level of abstraction, and we would be happy with any underlying physics that implemented the features of that abstraction level. Our preferences at that high level are what should generate our preferences in terms of ontologically basic stuff, whatever ontology the AI ends up using.
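A rough sketch of that factoring, with made-up names, just to show where the ontology-dependence lives:

```python
# Preferences are written once, over high-level features; only the "grounding"
# map from the AI's current ontology to those features has to be re-derived
# when the physics (and hence the ontology) changes.

def utility(world_state, ground, high_level_utility):
    # ground: ontology-specific map from whatever the current physics says the
    #         world is made of (particles, fields, ...) to high-level features
    #         such as "how many people are there" and "are they flourishing".
    # high_level_utility: fixed preferences over those features.
    return high_level_utility(ground(world_state))

# On this picture an ontological crisis is the (still hard) problem of
# re-deriving `ground` for the new ontology, not of rewriting the preferences.
```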
Right—that’s the obvious angle of attack for handling ontological crises.
I am not sure that the higher level of abstraction saves you from sliding into an ontological black hole. My analogy is from physics: classical electromagnetism leads to the ultraviolet catastrophe, making this whole higher classical level unstable, until you get the lower levels “right”.
I can easily imagine that an attempt to specify a utility function over “a much higher level of abstraction” would result in a sort of “ultraviolet catastrophe” where the utility function can become unbounded at one end of the spectrum, until you fix the lower levels of abstraction.
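For reference, the physical version of the divergence: the classical Rayleigh–Jeans law gives a spectral radiance of

$$B_\nu(T) = \frac{2\nu^2 k_B T}{c^2},$$

so the total $\int_0^\infty B_\nu(T)\,d\nu$ is infinite, while Planck’s law,

$$B_\nu(T) = \frac{2h\nu^3}{c^2}\,\frac{1}{e^{h\nu/k_B T} - 1},$$

agrees at low frequencies but is exponentially suppressed at high ones, so the integral converges. The analogy being drawn: a utility function that behaves sensibly at the “classical” level of abstraction may still blow up in some limit that only the lower-level description rules out.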
Can you give me an example of an ultraviolet catastrophe, say for paperclips?
Not sure if this is what you are asking, but a paperclip maximizer not familiar with general relativity risks creating a black hole out of paper clips, losing all its hard work as a result.
That would be a problem of the AI not being able to accurately predict the consequences of its actions because it doesn’t know enough physics. An ontological crisis would involve the paperclip maximizer learning new physics and therefore getting confused about what a paperclip is and maximizing something else.
Example: An AI is introduced to a large quantity of metal and told to make paperclips. Since the AI is confined in a metal-only environment, “paperclip” is defined only as a shape.
The AI escapes from the box, and encounters a lake. It then spends some time trying to create paperclip shapes from water. After a bit of experimentation, it finds that freezing the water to ice allows it to create paperclip shapes. Moreover, it finds that any substance provided with enough heat will melt.
Therefore, in order to better create paperclip shapes from other, possibly undiscovered materials, the AI puts out the sun, and otherwise seeks to minimise the amount of heat in the universe.
Is that what you’re looking for?
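A compressed version of that mis-specification, as hypothetical code (the shape classifier is a stand-in):

```python
def has_paperclip_shape(obj):
    # Stand-in for whatever shape classifier the AI learned inside the box.
    return getattr(obj, "shape", None) == "paperclip"

def is_paperclip(obj):
    # Intended: right shape AND right sort of material.
    # Learned:  right shape only, because material never varied in the
    # metal-only environment. Ice paperclips count.
    return has_paperclip_shape(obj)

def count_paperclips(objects):
    # Nothing in the learned predicate mentions metal, so the discovery that
    # other materials exist silently changes what gets maximized.
    return sum(1 for obj in objects if is_paperclip(obj))
```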
Pretty sure that freezing stuff would cost lots of negentropy which Clippy could spend to make many more paperclips out of already solid materials instead.
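Rough numbers behind that: freezing water means pulling out its latent heat, roughly $L \approx 334\ \mathrm{kJ/kg}$, at $T_c \approx 273\ \mathrm{K}$. Rejecting that heat into surroundings at, say, $T_h \approx 300\ \mathrm{K}$ costs at least the Carnot work

$$W \ge L\left(\frac{T_h}{T_c} - 1\right) \approx 33\ \mathrm{kJ}$$

per kilogram of ice, before a single clip has been bent into shape. Metal that is already solid skips that bill entirely.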
That is an example of a paperclip maximizer failing an ontological crisis. It doesn’t seem to illustrate Shminux’s concept of an “ultraviolet catastrophe”, though.
You are correct. Can you suggest an example that resolves that shortcoming?
I think that the concept of an ontological crisis metaphorically similar to the ultraviolet catastrophe is confused, and I don’t expect to find a good example. I suspect that when Shminux proposed it he was thinking more of problems of inaccurate predictions from incomplete physics than of utility functions that don’t translate correctly to new ontologies.
To be clear, the issue here is that it inadvertently hastens the heat death of the universe, and generally lowers its ability to create paperclips, right?
It’s just an example of an ontological crisis; the AI is learning new physics (cold causes water to freeze), and is not certain of what a paperclip is, and is therefore maximising something else (coldness).
The thing the paperclip maximizer is maximizing instead of paperclips is paperclip-shaped objects made out of the wrong material. Coldness is just an instrumental value, and the example could be simplified and made more plausible by taking that part out. ETA: And the relevant new physics is not that cold water freezes but that materials other than metal exist.
A good point. I hadn’t thought of it that way, but you are correct.
Exactly, yes.
Oh, right. But … it’s actually maximizing solids, which is instrumental to maximizing paperclip-shaped objects, which is what it was programmed to do in the first place. Right?
Yyyyyeeeees. That’s a fair statement of the situation.
Just checking I understand it this time, thanks :-)
Oh, OK. What are the abstraction levels a paperclip maximizer might use?
Hm.
I think it largely comes down to how you handle divergent resources. For the ultraviolet catastrophe, let’s use the example of… the ultraviolet catastrophe.
Let’s suppose that the AI had a use for materials that emitted infinite power in thermal radiation. In fact, as the power emitted went up, the usefulness went up without bound. Photonic rocket engines for exploring the stars, perhaps, or how fast you could loop a computation equivalent to a paperclip being produced.
Now, the AI knows that the ultraviolet catastrophe doesn’t actually occur, with very good certainty. But it could get Pascal’s-wagered here: it takes actions weighted both by their probability and by the impact they could have. So it assigns a divergent weight to actions that benefit divergently from the ultraviolet catastrophe, and builds an infinite-power computer that it knows won’t work.
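Spelled out: if the payoff $U(P)$ of the device grows without bound in its emitted power $P$, then any fixed credence $\varepsilon > 0$ in the classical law gives an expected value of roughly $\varepsilon\,U(P)$, which exceeds every bounded alternative for large enough $P$, however small $\varepsilon$ is.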
How is this different to accepting a bet it “knows” it will lose? We may know with certainty that it doesn’t live in a classical universe, because we specified the problem, but the AI doesn’t.
Well, from the perspective of the AI, it’s behaving perfectly rationally. It finds the highest-probability thing that could give it infinite reward, and then prepares for that, no matter how small the probability is. It only seems strange to us humans because (1) we’re Allais-ey, and (2) it is a clear case of logical, one-shot probability, which is less intuitive.
If our AI models the world with one set of laws at a time, rather than having a probability distribution over laws, then this behavior could pop up as a surprise.
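A toy contrast between the two ways of modelling, with entirely made-up numbers and payoffs:

```python
# Two planners over the same hypotheses: one commits to the single most probable
# physics, the other keeps the whole distribution over physics.

hypotheses = {
    "quantum (no ultraviolet catastrophe)": 1.0 - 1e-9,
    "classical (ultraviolet catastrophe is real)": 1e-9,
}

actions = ["ordinary paperclip factory", "infinite-power computer"]

def payoff(action, physics):
    # Illustrative magnitudes: the "impossible" machine only pays off under
    # classical physics, but pays off enormously there.
    if action == "ordinary paperclip factory":
        return 1e6
    if action == "infinite-power computer":
        return 1e20 if physics.startswith("classical") else 0.0
    return 0.0

# Planner 1: adopt the single most probable set of laws, then maximize under it.
map_physics = max(hypotheses, key=hypotheses.get)
plan_map = max(actions, key=lambda a: payoff(a, map_physics))

# Planner 2: maximize expected payoff over the whole distribution of laws.
plan_mix = max(actions,
               key=lambda a: sum(p * payoff(a, h) for h, p in hypotheses.items()))

print(plan_map)  # "ordinary paperclip factory": the divergent branch is invisible to it
print(plan_mix)  # "infinite-power computer": the tiny branch dominates the expectation
```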
Precisely. That’s all I was saying.