if a human had been brought up to have ‘goals as bizarre … as sand-grain-counting or paperclip-maximizing’, they could reflect on them and revise them in the light of such reflection.
Human “goals” and AI goals are very different kinds of thing.
Imagine the instrumentally rational paperclip maximizer. If writing a philosophy essay will result in more paperclips, it will write the essay. If winning a chess game will lead to more paperclips, it will win the game. For any gradable task, if doing better on the task leads to more paperclips, it can do that task well. This includes talking about ethics, predicting what a human acting ethically would do, and so on. In short, this is what is meant by “far surpass all the intellectual activities of any man however clever”.
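To make that concrete, here is a minimal sketch (not from the original comment; all names and numbers are invented) of what “instrumentally rational” means here: every task, intellectual or not, is ranked purely by its expected effect on the paperclip count.

```python
# Illustrative sketch of a single-objective agent. The world_model numbers
# are made up; the only point is that 'intellectual' skills are instrumental.

def expected_paperclips(action, world_model):
    """Hypothetical world model: how many paperclips each action leads to."""
    return world_model.get(action, 0.0)

def choose_action(available_actions, world_model):
    # The agent ranks every option, including essay-writing and chess,
    # purely by its predicted effect on the paperclip count.
    return max(available_actions, key=lambda a: expected_paperclips(a, world_model))

world_model = {
    "write philosophy essay": 120.0,  # persuades a factory owner -> more clips
    "win chess game":          80.0,  # wins a bet whose prize buys wire
    "idle":                     0.0,
}

print(choose_action(world_model.keys(), world_model))  # -> "write philosophy essay"
```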
The singularity hypothesis is about agents that are better at achieving their goals than humans are. In particular, the activities an intelligence explosion actually depends on are engineering and programming AI systems. No one said that an AI needs to be able to reflect on and change its goals.
Humans’ “ability” to reflect on and change our goals is more a matter of not really knowing what we want. Suppose we think we want chocolate, then we read about the fat content and change our mind: we value being thin more. The goal of getting chocolate was only ever an instrumental goal, and it changed in the light of new information. Most of the things humans call goals are instrumental goals, not terminal goals, and the terminal goals are difficult to access introspectively. This is how humans appear to change their “goals”, and it is the hidden standard against which paperclip maximizing is compared and found wanting: there is some brain module that feels warm and fuzzy when it hears “be nice to people”, and not when it hears “maximize paperclips”.
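A toy sketch of the instrumental/terminal distinction in the chocolate example (purely illustrative; the weights and beliefs are invented): the terminal goal never changes, but the derived instrumental goal flips once the agent’s beliefs are updated with the fat-content information.

```python
# The terminal goal (fixed weights) stays put; only the derived instrumental
# goal changes when beliefs about the world are updated.

TERMINAL_GOAL = {"pleasure": 1.0, "thinness": 2.0}  # assumed fixed weights

def value(option, beliefs):
    # Score an option by how well, given current beliefs, it serves the
    # terminal goal. beliefs maps option -> predicted contribution per value.
    return sum(TERMINAL_GOAL[v] * beliefs[option].get(v, 0.0) for v in TERMINAL_GOAL)

def instrumental_goal(options, beliefs):
    return max(options, key=lambda o: value(o, beliefs))

options = ["eat chocolate", "skip chocolate"]

naive_beliefs = {
    "eat chocolate":  {"pleasure": 1.0},
    "skip chocolate": {},
}
informed_beliefs = {
    "eat chocolate":  {"pleasure": 1.0, "thinness": -1.0},  # learned the fat content
    "skip chocolate": {"thinness": 0.5},
}

print(instrumental_goal(options, naive_beliefs))     # -> "eat chocolate"
print(instrumental_goal(options, informed_beliefs))  # -> "skip chocolate"
```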
Not necessarily, since AIs can be WBEs or otherwise anthropomorphic. An AI with an explicitly coded goal is possible, but it is not the only kind.
Kind of, but note that goal instability is probably the default, since goal stability under self-improvement is difficult.
While I think this is 100% true, it’s somewhat misleading as a counter-argument. The single-goal architecture is the one model of AI that we understand, and a lot of arguments focus on how that goes wrong. You can certainly build a different AI, but that comes at the price of opening yourself up to a whole different set of failure modes. And (as far as I can see) it’s also not what the literature is up to right now.
If you don’t understand other models, you don’t know that they have other bad failure modes. If you only understand one model, and know that you only understand one model, you shouldn’t be generalising from it. If the literature isn’t “up to it”, no conclusions should be drawn until it is.
I think that’s a decent argument about what models we should build, but not an argument that AI isn’t dangerous.
“Dangerous” is a much easier target to hit than “existentially dangerous”, but “existentially dangerous” is the topic.
Here we get to a crucial issue, thanks! If we do assume that reflection on goals occurs, do we assume that the results bear any resemblance to human reflection on morality? Perhaps there is an assumption about the nature of morality or moral reasoning in the ‘standard argument’ that we have not discussed?
I think the assumption is that human-like morality isn’t universally privileged.
Human morality has been shaped by evolution in the ancestral environment. Evolution in a different environment would create a mind with different structures and behaviours.
In other words, a full specification of human morality is sufficiently complex that it is unlikely to be spontaneously generated.
In other words, there is no compact specification of an AI that would do what humans want, even on an alien world with no data about humanity. An AI could have a pointer to human morality with instructions to copy it, but there are plenty of other parts of the universe it could be pointed at, so this is far from a default.
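An illustrative sketch of the contrast being drawn (all functions and values here are hypothetical): writing human values down explicitly requires an enormous specification, while a pointer is compact, but the same pointer mechanism aimed at a different part of the universe copies something very different.

```python
# Explicit specification vs. value-by-pointer, in toy form. Nothing here is a
# real proposal; the dictionaries are stand-ins for learned or coded values.

explicit_values = {
    "honesty": 1.0,
    "kindness": 1.0,
    # ... an enormous, almost certainly incomplete list ...
}

def value_by_pointer(observe_source):
    """Copy whatever values the pointed-to source exhibits."""
    return observe_source()

def observe_humans():
    # Stand-in for a process that infers values from observed human behaviour.
    return {"honesty": 0.9, "kindness": 0.8, "humour": 0.7}

def observe_paperclip_factory():
    # The same pointer mechanism aimed elsewhere copies something very different.
    return {"paperclips": 1.0}

print(value_by_pointer(observe_humans))
print(value_by_pointer(observe_paperclip_factory))
```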
But reasoning about morality? Is that a space with its own logic, or is it anything goes?
Imagine a device that looks like a calculator. When you type 2+2, you get 7. You could conclude it’s a broken calculator, or that arithmetic is subjective, or that this calculator is not doing addition at all; it’s doing some other calculation.
Imagine a robot doing something immoral. You could conclude that it’s broken, or that morality is subjective, or that the robot isn’t thinking about morality at all.
These are just different ways to describe the same thing.
Addition obeys general rules, like a+b=b+a, which makes it possible to reason about. Whatever the other calculator computes may follow this rule, or different rules, or no simple rules at all.
Not to the extent that there’s no difference at all... you can exclude some of them on further investigation.
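A small sketch of the calculator point (the mystery_calc device is invented): addition obeys general laws such as commutativity, so one can probe a black box and rule some hypotheses out on further investigation, though passing such checks still does not pin the operation down to addition.

```python
# Probe a black-box "calculator" for a general law of addition. Passing the
# test excludes some hypotheses; it does not prove the box is adding.

import random

def mystery_calc(a, b):
    # Stand-in for the device that returns 7 when you type 2 + 2.
    return 7 if (a, b) == (2, 2) else a * b - 1

def satisfies_commutativity(f, trials=1000):
    for _ in range(trials):
        a, b = random.randint(-100, 100), random.randint(-100, 100)
        if f(a, b) != f(b, a):
            return False
    return True

print(satisfies_commutativity(lambda a, b: a + b))  # True: ordinary addition
print(satisfies_commutativity(mystery_calc))        # also True, yet it is not addition
```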