I think of it this way:
Chance SIAI’s AI is Unfriendly: 80%
Chance anyone else’s AI is Unfriendly: >99%
Chance SIAI builds their AI first: 10%
Chance SIAI builds their AI first while making all their designs public: <1% (no change to other probabilities)
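To make the contrast these numbers imply explicit, here is a minimal sketch of the arithmetic, assuming the figures above, assuming exactly one project builds its AI first, and reading “>99%” and “<1%” as 0.99 and 0.01; the function name and the two-branch simplification are illustrative, not anything proposed in the thread.

```python
# A rough sketch of the arithmetic implied by the probabilities above; not
# anyone's actual model. Assumes exactly one project finishes first and reads
# ">99%" and "<1%" as 0.99 and 0.01.

P_UNFRIENDLY_SIAI = 0.80    # chance SIAI's AI is Unfriendly
P_UNFRIENDLY_OTHER = 0.99   # chance anyone else's AI is Unfriendly (stated as >99%)


def p_first_ai_unfriendly(p_siai_first: float) -> float:
    """Overall chance the first AI built is Unfriendly, given SIAI's chance of finishing first."""
    return (p_siai_first * P_UNFRIENDLY_SIAI
            + (1 - p_siai_first) * P_UNFRIENDLY_OTHER)


print(p_first_ai_unfriendly(0.10))  # designs kept private: ~0.971
print(p_first_ai_unfriendly(0.01))  # designs made public:  ~0.988
```

On these figures, publishing would raise the estimated chance that the first AI is Unfriendly from roughly 97% to roughly 99%, which is the trade-off the later comments argue over.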
An AI that is successfully “Friendly” poses an existential risk of a kind that other AIs don’t pose. The main risk from an unfriendly AI is that it will kill all humans. That isn’t much of a risk; humans are on the way out in any case. Whereas the main risk from a “friendly” AI is that it will successfully impose a single set of values, defined by hairless monkeys, on the entire Universe until the end of time.
And, if you are afraid of unfriendly AI because you’re afraid it will kill you—why do you think that a “Friendly” AI is less likely to kill you? An “unfriendly” AI is following goals that probably appear random to us. There are arguments that it will inevitably take resources away from humans, but these are just that—arguments. Whereas a “friendly” AI will be designed to try to seize absolute power, and take every possible measure to prevent humans from creating another AI. If your name appears on this website, you’re already on its list of people whose continued existence will be risky.
(Also, all these numbers seem to be pulled out of thin air.)
I see no reason an AI with any other expansionist value system will not exhibit the exact same behaviour, except towards a different goal. There’s nothing so special about human values (except that they’re, y’know, good, but that’s a different issue).
You’re using a different definition of “friendly” than I am. An 80% chance SIAI’s AI is Unfriendly already contains all of your “takes over but messes everything up in unpredictable ways” scenarios.
The numbers were exaggerated for effect, to show contrast and my thought process. It seems to me that you think the probabilities are reversed.
One definition of the term explains:

The term “Friendly AI” refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals.
See the “non-human-harming” bit. Regarding:

If your name appears on this website, you’re already on its list of people whose continued existence will be risky.
Yes, one of their PR problems is that they are implicitly threatening their rivals. In the case of Ben Goertzel, some of the threats are appearing IRL. Let us hear the tale of how threats and nastiness will be avoided. Having no plan is not a good plan, in this particular case.
An AI that is successfully “Friendly” poses an existential risk of a kind that other AIs don’t pose. The main risk from an unfriendly AI is that it will kill all humans. That isn’t much of a risk

What do you mean by existential risk, then? I thought things that killed all humans were, by definition, existential risks.
humans are on the way out in any case.

What, if anything, do you value that you expect to exist in the long term?
There are arguments that [an UFAI] will inevitably take resources away from humans, but these are just that—arguments.

Pretty compelling arguments, IMO. It’s simple—the vast majority of goals can be achieved more easily if one has more resources, and humans control resources, so an entity that is able to self-improve will tend to seize control of all the resources and therefore take those resources away from the humans.
Do you have a counterargument, or something relevant to the issue that isn’t just an argument?
AI will be designed to try to seize absolute power, and take every possible measure to prevent humans from creating another AI. If your name appears on this website, you’re already on its list of people whose continued existence will be risky.

Not much risk. Hunting down irrelevant blog commenters is a greater risk than leaving them be. There isn’t much of a window during which any human is the slightest threat, and during that window going around killing people would just increase the risk to it.
The window is presumably between now and when the winner is obvious—assuming we make it that far.
IMO, there’s plenty of scope for paranoia in the interim. Looking at the logic so far, some teams will reason that unless their chosen values get implemented, much of value is likely to be lost. They will then multiply that by a billion years and a billion planets—and conclude that their competitors might really matter.
Killing people might indeed backfire—but that still leaves plenty of scope for dirty play.
No. Reread the context. This is the threat from “F”AI, not from designers. The window opens when someone clicks ‘run’.
Uh huh. So: a worldview difference. Corps and orgs will most likely go from 90% human to 90% machine through the well-known and gradual process of automation, gaining power as they go—and the threats from bad organisations are unlikely to appear suddenly at some point.
If we take those probabilities as a given, they strongly encourage a strategy that increases the chance that the first seed AI is Friendly.
jsalvatier already had a suggestion along those lines:

I wonder if SIAI could publicly discuss the values part of the AI without discussing the optimization part.
A public Friendly design could draw funding, benefit from technical collaboration, and hopefully end up used in whichever seed AI wins. Unfortunately, you’d have to decouple the F and AI parts, which is impossible.
Isn’t CEV an attempt to separate F and AI parts?
It’s half of the F. Between the CEV and the AGI is the ‘goal stability under recursion’ part.
It’s a good first step.
I don’t understand your impossibility comment, then.
I’m talking about publishing a technical design of Friendliness that’s conserved under self-improving optimization without also publishing (in math and code) exactly what is meant by self-improving optimization. CEV is a good first step, but a programmatically reusable solution it is not.
On doing the impossible:

Before you the terrible blank wall stretches up and up and up, unimaginably far out of reach. And there is also the need to solve it, really solve it, not “try your best”.
OK, I understand that much better now. Great point.