if a human had been brought up to have ‘goals as bizarre … as sand-grain-counting or paperclip-maximizing’, they could reflect on them and revise them in the light of such reflection.
Human “goals” and AI goals are very different kinds of thing.
Imagine the instrumentally rational paperclip maximizer. If writing a philosophy essay will result in more paperclips, it will write the essay. If winning a chess game will lead to more paperclips, it will win the game. For any gradable task, if doing better on the task leads to more paperclips, it can do that task well. This includes talking about ethics, predicting what a human acting ethically would do, and so on. In short, this is what is meant by “far surpass all the intellectual activities of any man however clever”.
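To make that concrete, here is a minimal sketch (not from the original comment; all names and numbers are invented) of what “instrumentally rational” means here: every task, intellectual or not, is ranked purely by its expected effect on the paperclip count.

```python
# Illustrative sketch of a single-objective agent. The world_model numbers
# are made up; the only point is that 'intellectual' skills are instrumental.

def expected_paperclips(action, world_model):
    """Hypothetical world model: how many paperclips each action leads to."""
    return world_model.get(action, 0.0)

def choose_action(available_actions, world_model):
    # The agent ranks every option, including essay-writing and chess,
    # purely by its predicted effect on the paperclip count.
    return max(available_actions, key=lambda a: expected_paperclips(a, world_model))

world_model = {
    "write philosophy essay": 120.0,  # persuades a factory owner -> more clips
    "win chess game":          80.0,  # wins a bet whose prize buys wire
    "idle":                     0.0,
}

print(choose_action(world_model.keys(), world_model))  # -> "write philosophy essay"
```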
The singularity hypothesis is about agents that are better at achieving their goals than humans are. In particular, the activities an intelligence explosion actually depends on are engineering and programming AI systems. No one said that an AI needs to be able to reflect on and change its goals.
Humans’ “ability” to reflect on and change our goals is more a matter of not really knowing what we want. Suppose we think we want chocolate, then we read about the fat content and change our mind: we value being thin more. The goal of getting chocolate was only ever an instrumental goal, and it changed in the light of new information. Most of the things humans call goals are instrumental goals, not terminal goals, and the terminal goals are difficult to access introspectively. This is how humans appear to change their “goals”, and it is the hidden standard against which paperclip maximizing is compared and found wanting: there is some brain module that feels warm and fuzzy when it hears “be nice to people”, and not when it hears “maximize paperclips”.
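A toy sketch of the instrumental/terminal distinction in the chocolate example (purely illustrative; the weights and beliefs are invented): the terminal goal never changes, but the derived instrumental goal flips once the agent’s beliefs are updated with the fat-content information.

```python
# The terminal goal (fixed weights) stays put; only the derived instrumental
# goal changes when beliefs about the world are updated.

TERMINAL_GOAL = {"pleasure": 1.0, "thinness": 2.0}  # assumed fixed weights

def value(option, beliefs):
    # Score an option by how well, given current beliefs, it serves the
    # terminal goal. beliefs maps option -> predicted contribution per value.
    return sum(TERMINAL_GOAL[v] * beliefs[option].get(v, 0.0) for v in TERMINAL_GOAL)

def instrumental_goal(options, beliefs):
    return max(options, key=lambda o: value(o, beliefs))

options = ["eat chocolate", "skip chocolate"]

naive_beliefs = {
    "eat chocolate":  {"pleasure": 1.0},
    "skip chocolate": {},
}
informed_beliefs = {
    "eat chocolate":  {"pleasure": 1.0, "thinness": -1.0},  # learned the fat content
    "skip chocolate": {"thinness": 0.5},
}

print(instrumental_goal(options, naive_beliefs))     # -> "eat chocolate"
print(instrumental_goal(options, informed_beliefs))  # -> "skip chocolate"
```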
Not necessarily, since AIs can be WBEs or otherwise anthropomorphic. An AI with an explicitly coded goal is possible, but it is not the only kind.
Kind of, but note that goal instability is probably the default, since goal stability under self-improvement is difficult.
While I think this is 100% true, it’s somewhat misleading as a counter-argument. The single-goal architecture is the one model of AI that we understand, and a lot of arguments focus on how that goes wrong. You can certainly build a different AI, but that comes at the price of opening yourself up to a whole different set of failure modes. And (as far as I can see) it’s also not what the literature is up to right now.
If you don’t understand other models, you don’t know that they have other bad failure modes. If you only understand one model, and know that you only understand one model, you shouldn’t be generalising from it. If the literature isn’t “up to it”, no conclusions should be drawn until it is.
I think that’s a decent argument about what models we should build, but not an argument that AI isn’t dangerous.
“Dangerous” is a much easier target to hit than “existentially dangerous”, but “existentially dangerous” is the topic.
Here we get to a crucial issue, thanks! If we do assume that reflection on goals occurs, do we assume that the results bear any resemblance to human reflection on morality? Perhaps there is an assumption about the nature of morality or moral reasoning in the ‘standard argument’ that we have not discussed?
I think the assumption is that human-like morality isn’t universally privileged.
Human morality has been shaped by evolution in the ancestral environment. Evolution in a different environment would create a mind with different structures and behaviours.
In other words, a full specification of human morality is sufficiently complex that it is unlikely to be spontaneously generated.
In other words, there is no compact specification of an AI that would do what humans want, even on an alien world with no data about humanity. An AI could have a pointer to human morality with instructions to copy it, but there are plenty of other parts of the universe it could be pointed at, so this is far from a default.
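An illustrative sketch of the contrast being drawn (all functions and values here are hypothetical): writing human values down explicitly requires an enormous specification, while a pointer is compact, but the same pointer mechanism aimed at a different part of the universe copies something very different.

```python
# Explicit specification vs. value-by-pointer, in toy form. Nothing here is a
# real proposal; the dictionaries are stand-ins for learned or coded values.

explicit_values = {
    "honesty": 1.0,
    "kindness": 1.0,
    # ... an enormous, almost certainly incomplete list ...
}

def value_by_pointer(observe_source):
    """Copy whatever values the pointed-to source exhibits."""
    return observe_source()

def observe_humans():
    # Stand-in for a process that infers values from observed human behaviour.
    return {"honesty": 0.9, "kindness": 0.8, "humour": 0.7}

def observe_paperclip_factory():
    # The same pointer mechanism aimed elsewhere copies something very different.
    return {"paperclips": 1.0}

print(value_by_pointer(observe_humans))
print(value_by_pointer(observe_paperclip_factory))
```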
But reasoning about morality? Is that a space with its own logic, or is it anything goes?
Imagine a device that looks like a calculator. When you type 2+2, you get 7. You could conclude it’s a broken calculator, or that arithmetic is subjective, or that this calculator is not doing addition at all; it’s doing some other calculation.
Imagine a robot doing something immoral. You could conclude that it’s broken, or that morality is subjective, or that the robot isn’t thinking about morality at all.
These are just different ways to describe the same thing.
Addition obeys general rules, like a+b=b+a, which makes it possible to reason about. Whatever the other calculator computes may follow this rule, or different rules, or no simple rules at all.
Not to the extent that there’s no difference at all... you can exclude some of them on further investigation.
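A small sketch of the calculator point (the mystery_calc device is invented): addition obeys general laws such as commutativity, so one can probe a black box and rule some hypotheses out on further investigation, though passing such checks still does not pin the operation down to addition.

```python
# Probe a black-box "calculator" for a general law of addition. Passing the
# test excludes some hypotheses; it does not prove the box is adding.

import random

def mystery_calc(a, b):
    # Stand-in for the device that returns 7 when you type 2 + 2.
    return 7 if (a, b) == (2, 2) else a * b - 1

def satisfies_commutativity(f, trials=1000):
    for _ in range(trials):
        a, b = random.randint(-100, 100), random.randint(-100, 100)
        if f(a, b) != f(b, a):
            return False
    return True

print(satisfies_commutativity(lambda a, b: a + b))  # True: ordinary addition
print(satisfies_commutativity(mystery_calc))        # also True, yet it is not addition
```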