The paper had nothing to do with what you talked about in your opening paragraph, and your comment:
Please go read some actual scientific material rather than assuming that The Metamorphosis of Prime Intellect is up-to-date with the current literature
… was extremely rude.
I build AI systems, and I have been working in the field (and reading the literature) since the early 1980s.
Even so, I would be happy to answer questions if you could read the paper carefully enough to see that it was not about the topic you thought it was about.
The paper had nothing to do with what you talked about in your opening paragraph
What? Your post starts with:
My goal in this essay is to analyze some widely discussed scenarios that predict dire and almost unavoidable negative behavior from future artificial general intelligences, even if they are programmed to be friendly to humans.
Eli’s opening paragraph explains the “basic UFAI doomsday scenario”. How is this not what you talked about?
The paper’s goal is not to discuss “basic UFAI doomsday scenarios” in the general sense, but to discuss the particular case where the AI goes all pear-shaped EVEN IF it is programmed to be friendly to humans.
That last part (even if it is programmed to be friendly to humans) is the critical qualifier that narrows down the discussion to those particular doomsday scenarios in which the AI does claim to be trying to be friendly to humans—it claims to be maximizing human happiness—but in spite of that it does something insanely wicked.
So, Eli says:
The basic UFAI doomsday scenario is: the AI has vast powers of learning and inference with respect to its world-model, but has its utility function (value system) hardcoded. Since the hardcoded utility function does not specify a naturalization of morality, or CEV, or whatever, the UFAI proceeds to tile the universe in whatever it happens to like (which are things we people don’t like), precisely because it has no motivation to “fix” its hardcoded utility function
… and this clearly says that the type of AI he has in mind is one that is not even trying to be friendly. Rather, he talks about how its
hardcoded utility function does not specify a naturalization of morality, or CEV, or whatever
And then he adds that
the UFAI proceeds to tile the universe in whatever it happens to like
… which has nothing to do with the cases that the entire paper is about, namely the cases where the AI is trying really hard to be friendly, but doing it in a way that we did not intend.
If you read the paper, all of this becomes obvious pretty quickly; but if you only skim-read a few paragraphs, you might well get the wrong impression. I suspect that is what happened.
namely the cases where the AI is trying really hard to be friendly, but doing it in a way that we did not intend.
If the AI knows what "friendly" is or what "mean" means, then your conclusion is trivially true. The problem is programming those in: that's what FAI is all about.
I still agree with Eli and think you’re “really failing to clarify the issue”, and claiming that xyz is not the issue does not resolve anything. Disengaging.
Yes, it was rude.
The paper had nothing to do with what you talked about in your opening paragraph
Except that the paper was about more-or-less exactly what I said in that paragraph. But the whole lesson is: do not hard-code things into AGI systems. Luckily, we learn this lesson everywhere: symbolic, first-order logic-based AI failed miserably. It failed not only to generate a superintelligent ethicist; it failed, in fact, even to detect which pictures are cat pictures or to perform commonsense inference.
I build AI systems, and I have been working in the field (and reading the literature) since the early 1980s.
Ok, and how many of those possessed anything like human-level cognitive abilities? How many were intended to, but failed? How many were designed on a solid basis in statistical learning?
No, as I just explained to SimonF, below, that is not what it is about.
I will repeat what I said:
The paper’s goal is not to discuss “basic UFAI doomsday scenarios” in the general sense, but to discuss the particular case where the AI goes all pear-shaped EVEN IF it is programmed to be friendly to humans.
That last part (even if it is programmed to be friendly to humans) is the critical qualifier that narrows down the discussion to those particular doomsday scenarios in which the AI does claim to be trying to be friendly to humans—it claims to be maximizing human happiness—but in spite of that it does something insanely wicked.
So, you said:
The basic UFAI doomsday scenario is: the AI has vast powers of learning and inference with respect to its world-model, but has its utility function (value system) hardcoded. Since the hardcoded utility function does not specify a naturalization of morality, or CEV, or whatever, the UFAI proceeds to tile the universe in whatever it happens to like (which are things we people don’t like), precisely because it has no motivation to “fix” its hardcoded utility function
… and this clearly says that the type of AI you have in mind is one that is not even trying to be friendly. Rather, you talk about how its
hardcoded utility function does not specify a naturalization of morality, or CEV, or whatever
And then you add that
the UFAI proceeds to tile the universe in whatever it happens to like
… which has nothing to do with the cases that the entire paper is about, namely the cases where the AI is trying really hard to be friendly, but doing it in a way that we did not intend.
There is very little distinction, from the point of view of actual behaviors, between a supposedly-Friendly-but-actually-not AI and a regular UFAI. Well, maybe the former will wait a bit longer before its pathological behavior shows up. Maybe. I really don't want to be the sorry bastard who tries that experiment: it would just be downright embarrassing.
But of course, the simplest way to bypass this, as previously mentioned in my comment and by nearly all authors on the issue, is precisely to specify the utility function as the outcome of an inference problem, so that additional interaction with humans causes the AI to update its utility function and become Friendlier over time.
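To make that concrete, here is a minimal toy sketch of what treating the utility function as the outcome of an inference problem could look like: a posterior over candidate utility functions, updated from human approval or disapproval, with actions chosen by posterior-expected utility. The candidate functions, actions, outcomes, and numbers are all made up for the example; this is an illustration of the idea, not anything from the paper.

```python
# Toy sketch: the utility function as the target of an inference problem.
# The agent keeps a posterior over candidate utility functions, updates it
# from human approval/disapproval, and acts on posterior-expected utility.
# Everything here (candidates, outcomes, numbers) is invented for illustration.

# Hypothetical candidate utility functions over outcomes.
CANDIDATES = {
    "maximize_smiles":    lambda outcome: outcome["smiles"],
    "maximize_wellbeing": lambda outcome: outcome["wellbeing"],
    "maximize_dopamine":  lambda outcome: outcome["dopamine"],
}

def update_posterior(posterior, outcome, human_approves, noise=0.1):
    """Bayesian update: candidates that rank the outcome the way the human did
    gain probability mass; candidates that disagree lose it."""
    unnormalized = {}
    for name, utility in CANDIDATES.items():
        predicted_approval = utility(outcome) > 0.5          # crude likelihood model
        likelihood = (1 - noise) if predicted_approval == human_approves else noise
        unnormalized[name] = posterior[name] * likelihood
    total = sum(unnormalized.values())
    return {name: p / total for name, p in unnormalized.items()}

def choose_action(posterior, actions):
    """Pick the action whose outcome has the highest posterior-expected utility."""
    def expected_utility(outcome):
        return sum(p * CANDIDATES[name](outcome) for name, p in posterior.items())
    return max(actions, key=lambda a: expected_utility(actions[a]))

if __name__ == "__main__":
    posterior = {name: 1 / len(CANDIDATES) for name in CANDIDATES}   # uniform prior

    # Toy predicted outcomes for two available actions.
    actions = {
        "dopamine_drip":  {"smiles": 1.0, "wellbeing": 0.0, "dopamine": 1.0},
        "ask_and_assist": {"smiles": 0.6, "wellbeing": 0.9, "dopamine": 0.3},
    }

    # Simulated human feedback: the human disapproves of the forced drip.
    posterior = update_posterior(posterior, actions["dopamine_drip"], False)
    posterior = update_posterior(posterior, actions["ask_and_assist"], True)

    print(posterior)                          # mass shifts toward "maximize_wellbeing"
    print(choose_action(posterior, actions))  # -> "ask_and_assist"
```

A real system would obviously need a far richer hypothesis space and likelihood model; the point is just that continued feedback shifts probability mass away from the dopamine-drip reading of "make humans happy".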
Causal inference that allows for deliberate conditioning of distributions on complex, counterfactual scenarios should actually help with this. Causal reasoning does dissolve into counterfactual reasoning, after all, so rational action on evaluative criteria can be considered a kind of push-and-pull force acting on an agent's trajectory through the space of possible histories: undesirable counterfactuals push the agent's actions away (i.e., push the agent to prevent their becoming real), while desirable counterfactuals pull the agent's actions towards themselves (i.e., the agent takes actions to achieve those events as goals) :-p.
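And here is an equally made-up sketch of that push-and-pull picture: a hardcoded three-action "causal model" stands in for real causal inference, the agent evaluates the counterfactual each action would realize, and the evaluative criterion pushes it away from actions whose counterfactuals are undesirable and pulls it toward the rest.

```python
# Toy illustration of the push-and-pull picture. The agent simulates the
# counterfactual each action would bring about under a tiny hardcoded model,
# scores it with an evaluative criterion, and picks the best-scoring action.
# All actions, variables, and numbers are invented for the example.

def intervene(action):
    """Crude stand-in for the do-operator: what would happen if the agent
    forced `action` to occur?"""
    outcomes = {
        "forced_dopamine_drip": {"consent": 0.0, "wellbeing": 0.2},
        "offer_help":           {"consent": 1.0, "wellbeing": 0.8},
        "do_nothing":           {"consent": 1.0, "wellbeing": 0.4},
    }
    return outcomes[action]

def desirability(counterfactual):
    """Evaluative criterion: desirable counterfactuals pull (score up),
    undesirable ones push (score down)."""
    score = counterfactual["wellbeing"]
    if counterfactual["consent"] < 0.5:   # violating consent pushes hard
        score -= 1.0
    return score

actions = ["forced_dopamine_drip", "offer_help", "do_nothing"]
print(max(actions, key=lambda a: desirability(intervene(a))))   # -> "offer_help"
```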