(Note that I do not necessarily agree with what I wrote below. You asked for possible counter-arguments. So here goes.)
Might intelligence imply benevolence?
I believe that a fundamental requirement for any rational agent is the motivation to act maximally intelligently and correctly. That requirement seems even more obvious if we are talking about a conjectured artificial general intelligence (AGI) that is able to improve itself to the point where it is substantially better at most activities than humans. After all, if it did not want to be maximally correct, it would not become superhumanly intelligent in the first place.
Suppose we give such an AGI a simple goal, e.g. the well-worn goal of paperclip maximization. Is it really clear that human values are not implicit even in such a simplistic goal?
To pose an existential risk in the first place, an AGI would have to maximize paperclips in an unbounded way, eventually taking over the whole universe and converting all matter into paperclips. Given that no sane human would explicitly define such a goal, an AGI with the goal of maximizing paperclips would have to infer that unbounded maximization was implicitly intended. But would such an inference make sense, given its superhuman intelligence?
The question boils down to how an AGI would interpret any vagueness present in its goal architecture and how it would deal with the implied invisible.
Given that any rational agent, especially an AGI capable of recursive self-improvement, wants to act in the most intelligent and correct way possible, it seems reasonable that it would interpret any vagueness in whatever way most closely reflects the most probable intended meaning.
Would it be intelligent and correct to ignore human volition in the context of maximizing paperclips? Would it be less wrong to maximize paperclips in the most literal sense possible?
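To put that in concrete, if toy, terms: here is a minimal sketch, with entirely made-up candidate readings and probabilities, of what “resolve vagueness by picking the most probably intended meaning” amounts to.

    # A toy sketch of "resolve vagueness by picking the most probably intended reading".
    # The candidate readings and their probabilities are invented for illustration;
    # a real agent would derive them from its model of the speaker and the world.

    candidate_readings = {
        "make a reasonable number of paperclips for the office": 0.97,
        "convert all reachable matter into paperclips":          0.03,
    }

    def most_probable_reading(posterior):
        """Return the reading most likely to be what was actually meant."""
        return max(posterior, key=posterior.get)

    print(most_probable_reading(candidate_readings))
    # -> "make a reasonable number of paperclips for the office"

On this view, the literal, universe-tiling reading loses simply because it is a wildly improbable thing for anyone to have meant.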
The argument made by advocates of friendly AI is that any AGI that isn’t explicitly designed to be friendly won’t be friendly. But I wonder how much sense this actually makes.
Every human craftsman who enters into an agreement is bound by a contract that includes a lot of implied conditions. Humans use their intelligence to fill the gaps. For example, if a human craftsman is told to decorate a house, they are not going to attempt to take over the neighbourhood to protect their work.
A human craftsman wouldn’t do that, not because they share human values, but simply because it wouldn’t be sensible to do so given the implicit frame of reference of their contract. The contract implicitly includes the volition of the person who told them to decorate the house. They might not even like the way they are supposed to do it. It would simply be stupid to do it any other way.
How would a superhuman AI not contemplate its own drives and interpret them given the right frame of reference, i.e. human volition? Why would a superhuman general intelligence misunderstand what is meant by “maximize paperclips” when any human intelligence is able to infer the correct interpretation?
You are assuming that the AI needs something from us, which may not be true as it develops further. The decorator follows the implied wishes not because he is smart enough to know what they are, but because he wishes to act in his client’s interest to gain payment, reputation, etc. Or he may believe that fulfilling his client’s wishes is morally good according to his own morality. The mere fact that the wishes of his client are known does not guarantee that he will carry them out unless he values the client in some way to begin with (for their money, or maybe their happiness).
And an AGI wishes to achieve its goals the way they are meant to be achieved, which includes all implicit conditions.
An AGI does not have to explicitly care about humans and their values as long as the implied context of its goals is human volition.
Consider a rich but sociopathic human decorator who solely cares about being a good decorator. What does a good decorator do? He does what his contract explicitly tells him to do AND what is implied by it, including the satisfaction of the customer.
You don’t need human moral values or any other complex values as long as you care to achieve your goals the way they are meant to be achieved, explicitly and implicitly.
If an AI has human interests as its main goal, it is already friendly. The question was whether intelligence on its own is enough to align it with human interests, which seems very unlikely. If the AI actually has cooperation with humans or fulfillment of some human wish as its goal, it will be able to use intelligence to better fulfill the wishes with all available context. But it’s getting the AI to operate with that goal that is difficult, I believe.
I think a better analogy with an AI would be a sociopathic decorator who doesn’t care about being a good decorator, but does care about fulfilling contracts, and cares about nothing not stated in the contract.
The “I obeyed the explicit content of the contract but didn’t give you what you want, sucks to be you” attitude exists in some humans (who are intelligent enough to know the implied meaning of the contract), so why wouldn’t it also exist in AIs?
Sure, but why would anyone be likely to build such an AI? That is at the core of what Ben Goertzel argues: we do not pull minds from design space at random.
A tool does what it is supposed to do. If you add a lot of intelligence, why would it suddenly do something completely nuts like taking over the universe, something that was obviously not the intended purpose?
I don’t think it would make sense to create an AGI that does not care about the implications and context of its goals but only follows their definitions verbatim. That doesn’t seem to be very intelligent behavior. And a sense of context and implications is exactly the quality an AGI capable of self-improvement needs.
Many of our tools are supposed to be web browsers, email clients, etc., but have a history of suddenly doing something completely nuts like taking over the whole computer, which was obviously not the intended purpose. Programming is hard that way—the result will only follow your program, verbatim. Attempts to give programs a greater sense of context and implications aren’t new—they’re called “higher level languages”. They feel less like hand-holding a dumb machine and more like describing a thought process, and you can even design the language to make whole classes of lower-level bugs unwriteable, but machines still end up doing what they’re instructed, verbatim (where “what they’re instructed” can now also include the output of compiler bugs).
The trouble is that you can’t rule out every class of bugs. It’s hard (impossible?) to distinguish a priori between what might be a bug and what might just be a different programmers’ intention, even though we’ve been wishing for the ability to do so for over a century. “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?”
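As a toy illustration of that gap (hypothetical objective functions and made-up numbers, nothing more): hand an optimizer the objective as written and it converts everything; hand it the objective as intended and it stops short.

    # A toy sketch (hypothetical objectives, invented numbers) of the gap between
    # the objective as written and the objective as intended.

    def literal_objective(clips, resources_left):
        """What was actually programmed: more clips is always better."""
        return clips

    def intended_objective(clips, resources_left):
        """What the designer meant: clips only count while resources remain for everything else."""
        return clips if resources_left > 0 else float("-inf")

    def best_plan(objective, total_resources=100):
        """Greedy search over how many units of resources to turn into clips."""
        return max(range(total_resources + 1),
                   key=lambda used: objective(clips=10 * used,
                                               resources_left=total_resources - used))

    print(best_plan(intended_objective))  # 99  -- leaves something behind
    print(best_plan(literal_objective))   # 100 -- converts everything; verbatim obedience

The search procedure is identical in both runs; only the written objective differs.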
Thank you. I’ve been trying to argue that “the computer does what you tell it to” is a much more chaotic situation than those who want to build FAI seem to believe, and you lay it out better than I have.
Yet, people around here seem to believe that the AI will develop an accurate model of the world even if its input isn’t all that accurate.
Who believes what, exactly?
Why would anyone build such an AI? Because computer programs do what they’re programmed to do, without taking into account the actual intention of the user.
Creating an AGI that does take into account what people really want (bearing in mind that the AGI is massively more intelligent than the people wanting the things) is, it seems to me, what the whole Friendly thing is about. If you know how to do that, you’ve solved Friendliness.
Edit: With added complications such as people not knowing what they want, people having conflicting goals, people wanting different things once there’s a powerful AI doing stuff, etc etc
You are assuming that an AI will have only instrumental rationality, i.e. that the Orthogonality Thesis (OT) is true.
The standard counterargument to the claim that a rational agent must be motivated to act maximally correctly runs along these lines: it won’t care about getting things right per se; it will only employ rationality as a means to other goals. (Or: instrumental rationality is the only kind, because that’s how we define rationality.)
The question of what justifies the “will”, the claim of necessity or at least of high probability, brings us back to the title of the original posting: evidence for the Orthogonality Thesis. Is non-instrumental rationality, rationality-as-a-goal, impossible? Is no one trying to build it? Why try to build single-minded Artificial Obsessive-Compulsives if that is dangerous? Isn’t rationality-as-a-goal a safer architecture?