Steven, I’m a little surprised that the paper you reference convinces you of a high probability of imminent danger. I have read this paper several times, and would summarize its relevant points as follows:
1. We tend to anthropomorphise, so our intuitive ideas about how an AI would behave might be biased. In particular, assuming that an AI will be “friendly” because people are more or less friendly might be wrong.
2. Through self-improvement, AI might become intelligent enough to accomplish tasks much more quickly and effectively than we expect.
3. This super-effective AI would have the ability (perhaps just as a side effect of its goal attainment) to wipe out humanity. Because of the bias in (1), we do not give sufficient credence to this possibility, when in fact it is the default scenario unless the AI is constructed very carefully to avoid it.
4. It might be possible to do that careful construction (that is, create a Friendly AI) if we work hard on achieving that task. It is not impossible.
The only arguments for the likelihood of imminence, despite little to no apparent progress toward a machine capable of acting intelligently in the world and rapidly rewriting its own source code, are:
A. a “loosely analogous historical surprise”—the above-mentioned nuclear reaction analogy.
B. the observation that breakthroughs do not occur on predictable timeframes, so it could happen tomorrow.
C. the possibility that we already have sufficient prerequisites for the breakthrough to occur (computing power, programming productivity, etc.).
I find all of these points reasonable enough and imagine that most people would agree. The problem is going from this set of “mights” and suggestive analogies to a probability of imminence. You can’t expect to get much traction for something that might happen someday; you have to link from possibility to likelihood. That people make this leap without saying how they got there is why observers refer to the believers as a sort of religious cult. Perhaps the case is made somewhere, but I haven’t seen it. I know that Yudkowsky and Hanson debated a closely related topic on Overcoming Bias at some length, but I found Eliezer’s case to be completely unconvincing.
I just don’t see it myself… “Seed AI” (as one example of a sort of scenario sketch) was written almost a decade ago and contains many different requirements. As far as I can see, none of them has seen any meaningful progress in the meantime. If multiple or many breakthroughs are necessary, let’s see one of them for starters. One might hypothesize that just one magic-bullet breakthrough is necessary, but that sounds more like a paranoid fantasy than a credible scientific hypothesis.
Now, I’m personally sympathetic to these ideas (check the SIAI donor page if you need proof), and if the lack of a case from possibility to likelihood leaves me cold, it shouldn’t be surprising that society as a whole remains unconvinced.
Given the stakes, if you already accept the expected utility maximization decision principle, it’s enough to become convinced that there is even a nontrivial probability of this happening. The paper seems to be adequate for snapping the reader’s mind out of conviction in the absurdity and impossibility of dangerous AI.
The stakes on the other side of the equation are also the survival of the human race.
Refraining from developing AI unless we can formally prove it is safe may also lead to extinction if it reduces our ability to cope with other existential threats.
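To make the trade-off in the last few comments concrete, here is a toy expected-utility comparison. Every number below is a made-up placeholder, not anyone's actual estimate; the point is only that under expected utility maximization, with survival versus extinction as the stakes, the decision turns on which small probability of extinction is larger.

```python
# Toy expected-utility comparison for the trade-off discussed above.
# All probabilities are made-up placeholders, not anyone's actual estimates.

U_SURVIVAL = 1.0      # utility if humanity survives
U_EXTINCTION = 0.0    # utility if it does not

def expected_utility(p_extinction):
    return p_extinction * U_EXTINCTION + (1.0 - p_extinction) * U_SURVIVAL

# Option A: push ahead on AI without a strong safety case.
p_ufai_disaster = 0.05        # hypothetical chance of an unfriendly-AI catastrophe
eu_push_ahead = expected_utility(p_ufai_disaster)

# Option B: hold back until safety can be formally established, accepting some
# extra exposure to other existential risks in the meantime.
p_other_xrisk = 0.02          # hypothetical chance another threat gets us first
eu_hold_back = expected_utility(p_other_xrisk)

print(f"EU(push ahead) = {eu_push_ahead:.3f}, EU(hold back) = {eu_hold_back:.3f}")
# Whichever option has the higher expected utility wins under this principle.
```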
“Enough” is ambiguous; your point is true but doesn’t affect Vladimir’s if he meant “enough to justify devoting a large amount of your attention (given the current distribution of allocated attention) to the risk of UFAI hard takeoff”.
Hmm, I was thinking more of being convinced there’s a “significant probability”, for a definition of “significant probability” that may be much lower than the one you intended. I’m not sure if I’d also claim the paper convinces me of a “high probability”. Agreed that it would be more convincing to the general public if there were an argument for that. I may comment more after rereading.
Apparently you and others have some sort of estimate of a probability distribution over time leading you to be alarmed enough to demand action. Maybe it’s, say, “a 1% chance of hard takeoff in the next 20 years” or something like that. Say what it is and how you got to it from “conceivability” or “non-impossibility”. If there is a reasoned link that can be analyzed producing such a result, it is no longer a leap of faith; it can be reasoned about rationally and discussed in more detail. Don’t get hung up on the exact number, use a qualitative measure if you like, but the point is how you got there.
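As one illustration of the kind of explicit step being asked for here, a very simple route to such a figure is a constant annual hazard rate compounded over the horizon. The rate below is purely hypothetical, chosen only so the arithmetic reproduces the “1% in 20 years” figure used as an example above.

```python
# Purely illustrative: compounding a hypothetical constant annual hazard rate
# into a cumulative probability over a 20-year horizon.
p_annual = 0.0005   # made-up: chance of a hard takeoff in any single year
years = 20

p_within_horizon = 1.0 - (1.0 - p_annual) ** years
print(f"P(hard takeoff within {years} years) = {p_within_horizon:.2%}")  # about 1%
```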
I am not attempting to ridicule hard takeoff or Friendly AI, just giving my opinion about the thesis question of this post: “what can we do to efficiently change the opinion of millions of people...”
Hanson’s position was that something like a singularity will occur due to smarter-than-human cognition, but he differs from Eliezer by claiming that it will be a distributed intelligence analogous to the economy: trillions of smart human uploads and narrow AIs exchanging skills and subroutines.
He still ultimately supports the idea of a fast transition, based on historical transitions. I think Robin would say that something midway between 2 weeks and 20 years is reasonable. Ultimately, if you think Hanson has a stronger case, you’re still talking about a fast transition to superintelligence that we need to think about very carefully.
Indeed:
In the CES model (which this author prefers), if the next number of doubles of DT were the same as one of the last three DT doubles, the next doubling time would be either 1.3, 2.1, or 2.3 weeks. This suggests a remarkably precise estimate of an amazingly fast growth rate.
See also Economic Growth Given Machine Intelligence:
Let us now consider the simplest endogenous growth model … lowering α̃ just a little, from .25 to .241, reduces the economic doubling time from 16 years to 13 months … Reducing α̃ further to .24 eliminates diminishing returns and steady growth solutions entirely.
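For intuition about why a small change in such a parameter can matter so much, here is a generic toy growth model. It is explicitly not the CES or endogenous-growth model quoted above; the functional form and numbers are only illustrative. With output Y = A·K^a and a fixed reinvestment rate, successive doubling times lengthen while a < 1, hold constant at a = 1, and shrink once diminishing returns disappear (a > 1):

```python
# Toy illustration only -- not the model from Hanson's paper. It shows how the
# qualitative character of growth changes as a returns parameter crosses the
# point where diminishing returns vanish.
def doubling_times(a, n_doublings=6, A=1.0, s=0.2, K0=1.0, dt=1e-3, t_max=1e4):
    """Euler-integrate dK/dt = s*A*K**a and return how long each successive
    doubling of output Y = A*K**a takes."""
    K, t, last = K0, 0.0, 0.0
    target = 2.0 * A * K0 ** a
    gaps = []
    while t < t_max and len(gaps) < n_doublings:
        K += dt * s * A * K ** a
        t += dt
        if A * K ** a >= target:
            gaps.append(t - last)
            last, target = t, 2.0 * target
    return gaps

for a in (0.95, 1.00, 1.05):
    print(f"a = {a:.2f}:", [round(g, 1) for g in doubling_times(a)])
```

The point is only qualitative: near the threshold where diminishing returns go away, small parameter changes swing doubling times dramatically, which is the flavor of the quoted results.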
My current thinking is that AI might be in the space of things we can’t understand.
While we are improving our knowledge of the brain, no one is coming up with simple theories that explain the brain as a whole; what we get is bits and pieces with no coherent design that we can see.
Under this scenario AI is still possible, but if we do make it, it will be done by semi-blindly copying the machinery we have, with random tweaks. And if it does start to self-improve, it will be doing so with random tweaks only, since it will share our inability to comprehend itself.
Why does AI design need to have anything to do with the brain? (Third Alternative: ab initio development based on a formal normative theory of general intelligence, not a descriptive theory of human intelligence, comprehensible even to us, to say nothing of itself once it gets smart enough.)
(Edit: Also, it’s a huge leap from “no one is coming up with simple theories of the brain yet” to “we may well never understand intelligence”.)
A specific AI design need be nothing like the design of the brain. However the brain is the only object we know of in mind space, so having difficulty understanding it is evidence, although very weak, that we may have difficulty understanding minds in general.
We might expect this to be a special case, since we are trying to understand the methods of understanding themselves, so we are being somewhat self-referential.
If you read my comment you’ll see I only raised it as a possibility, something to try and estimate the probability of, rather than necessarily the most likely case.
What would you estimate the probability of this scenario being, and why?
There might be formal proofs, but they would probably depend on the definition of things like what understanding is. I’ve been trying to think of mathematical formalisms to explore this question, but I haven’t come up with a satisfactory one yet.
Have you looked at AIXI?
It is trivial to say one AIXI can’t comprehend another instance of AIXI, if by “comprehend” you mean form an accurate model.
AIXI expects the environment to be computable and is itself incomputable. So if one AIXI comes across another, it won’t be able to form a true model of it.
However I am not sure of the value of this argument as we expect intelligence to be computable.
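For reference, here is a rough sketch of Hutter’s AIXI action rule, written from memory, so consult Hutter’s papers for the precise formulation. The final mixture ranges only over computable environment programs q, which is exactly why an incomputable agent such as AIXI itself cannot appear in another AIXI’s model class.

```latex
% Sketch of AIXI's expectimax action selection (after Hutter). Actions are a_i,
% observation-reward pairs are o_i r_i, U is a universal Turing machine, l(q)
% is the length of program q, and m is the horizon. The final sum mixes over
% computable programs only, hence an incomputable AIXI lies outside this class.
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
       \bigl[ r_k + \cdots + r_m \bigr]
       \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-l(q)}
```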
Seems plausible. However, under this model there’s still room for self-improvement using something like genetic algorithms; that is, it could make small, random tweaks, but find and implement the best ones much faster than humans possibly could. Then it could still be recursively self-improving.
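A minimal sketch of the blind variation-and-selection loop being described here: the “program” is just a bit string and the fitness function is a stand-in benchmark, so this only illustrates the shape of the process, not anything like real self-modification.

```python
import random

# Minimal blind variation-and-selection loop, a stand-in for "make a small
# random tweak, keep it if it tests at least as well" self-improvement.
# The candidate is a bit string and fitness() is a toy benchmark, not a real
# capability measure.
random.seed(0)

def fitness(candidate):
    # Toy benchmark: number of 1-bits. A self-improving system would instead
    # score a candidate successor on real performance tests.
    return sum(candidate)

def mutate(candidate):
    # Flip one randomly chosen bit -- a blind tweak with no model of why it helps.
    i = random.randrange(len(candidate))
    tweaked = list(candidate)
    tweaked[i] ^= 1
    return tweaked

current = [random.randint(0, 1) for _ in range(256)]
for _ in range(2000):
    challenger = mutate(current)
    if fitness(challenger) >= fitness(current):  # keep tweaks that test no worse
        current = challenger

print("final fitness:", fitness(current), "out of", len(current))
```

Even this toy loop climbs to (or very near) the optimum without ever “understanding” its own representation, which is the sense in which tweak-and-test improvement does not require self-comprehension.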
A lot of us think this scenario is much more likely. Mostly those on the side of Chaos in a particular Grand Narrative. Plug for The Future and its Enemies—arguably one of the most important works in political philosophy from the 20th century.
That is much weaker than the type of RSI that is supposed to cause FOOM. For one thing, you are only altering software, not hardware; and secondly, I don’t think a system that replaces itself with a random variation, even one that has been tested, will necessarily be better if it doesn’t understand itself. Random alterations may cause madness, or introduce bugs or other problems a long time after the change.
Note: Deliberate alterations may also cause madness or introduce bugs or other problems a long time after the change.
The idea with Eliezer-style RSI is alterations that are formally proved to be good.
I think this is a convincing case but clearly others disagree. Do you have specific suggestions for arguments that could be expanded upon?