Given we survive long enough, we’ll find a way to write a self-modifying program that has, or can develop, human-level intelligence.
How can I arrive at the belief that it is possible for an algorithm to improve itself in a way that achieves something sufficiently similar to human-level intelligence? That it is possible in principle is not in question here. But is it possible given limited resources? And if it is possible given limited resources, is it efficient enough to pose an existential risk?
The capacity for self-modification follows from ‘artificial human intelligence,’ but since we’ve just seen links to writers ignoring that fact I thought I’d state it explicitly.
Humans can learn, but that is far from what is necessary to reach a level above your own, on your own. Also, how do you know that any given level of intelligence is capable of handling its own complexity effectively? Many humans are not capable of handling the complexity of the brain of a worm.
This necessarily gives the AI the potential for greater-than-human intelligence due to our known flaws.
That humans have a hard time changing their flaws might be an actual feature, a trade-off between plasticity, efficiency and the need for goal stability.
Given A, the intelligence would improve itself to the point where we could no longer predict its actions in any detail.
I don’t think that is a reasonable assumption; see my post here. The short version: I don’t think that intelligence can be applied to itself efficiently.
...the AI could escape from any box we put it in. (IIRC this excludes certain forms of encryption, but I see no remotely credible scenario in which we sufficiently encrypt every self-modifying AI forever.)
Well, even humans can persuade their guards to let them out. I agree.
...the AI could wipe out humanity if it ‘wanted’ to do so.
I think it is unlikely that most AI designs will not hold. I agree with the argument that any AGI that isn’t made to care about humans won’t care about humans. But I also think that the same argument applies to spatio-temporal scope boundaries and resource limits. Even if the AGI is not told to hold, e.g. compute as many digits of Pi as possible, I consider it a far-fetched assumption that any AGI intrinsically cares to take over the universe as fast as possible to compute as many digits of Pi as possible. Sure, if all of that is presupposed then it will happen, but I don’t see that most AGI designs are like that. Most that have the potential for superhuman intelligence, but which are given simple goals, will in my opinion just bob up and down as slowly as possible. This is an antiprediction, not a claim to the contrary. What makes you sure that it will be different?
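To make the point about resource limits concrete, here is a minimal toy sketch of my own (not from either commenter; the function names and the use of the mpmath library are assumptions) contrasting a goal specification that includes an explicit limit with one that is open-ended:

```python
# Toy illustration only: a goal with an explicit resource limit
# versus an open-ended one. Uses mpmath for arbitrary-precision pi.
from mpmath import mp


def pi_digits_bounded(max_digits: int) -> str:
    """Bounded goal: compute pi to a fixed precision and stop."""
    mp.dps = max_digits + 2          # working precision (decimal places)
    return mp.nstr(mp.pi, max_digits)


def pi_digits_unbounded():
    """Open-ended goal: keep demanding more precision forever.
    Nothing here 'wants' anything beyond the next iteration;
    it is simply a loop with no stopping condition."""
    digits = 16
    while True:
        mp.dps = digits + 2
        yield mp.nstr(mp.pi, digits)
        digits *= 2                  # each step asks for more resources


if __name__ == "__main__":
    print(pi_digits_bounded(50))     # terminates: the limit is part of the goal
```

The bounded version terminates because the limit is part of the goal; the unbounded version never stops asking for more precision, which is the kind of open-ended objective the disagreement is about.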
Humans can learn, but that is far from what is necessary to reach a level above your own, on your own.
Yes, you also need the ability to self-modify and the ability to take 20 or fail and keep going. But I just argued that the phrase “on your own” obscures the issue, because if one AGI has a chance to rewrite itself (and does not take over the world) then I see no realistic way to stop another from trying at some point.
Also, how do you know that any given level of intelligence is capable of handling its own complexity effectively?
I don’t think I need to talk about “any given level”. If humans maintain a civilization long enough (and I don’t necessarily accept Eliezer’s rough timetable here) we’ll understand our own level well enough to produce human-strength AGI directly or indirectly. By definition, the resulting AI will have at least a chance of understanding the process that produced it, given time. (When I try to think of an exception I find myself thinking of uploads, and perhaps byzantine programs that evolved inside computers. These might in theory fail to understand all but the human-designed parts of the process. But the second example seems unlikely on reflection, as it suggests vast amounts of wasted computation. Likewise, though I don’t know how much importance to attach to this, it seems to this layman as if biologists laugh at uploads and consider them a much harder problem than an AI with the power to program. Yet you’d need detailed knowledge of the brain’s biology to make an upload.) And of course it can think faster than we do in many areas (or if it can’t due to artificial restrictions, the next one can).
I don’t think that intelligence can be applied to itself efficiently.
You’ve established inefficiency as a logical possibility (in my judgement) but don’t seem to have given much argument for it. I count two sentences on your P2 that directly address the issue. And you have yet to engage with the cumulative probability argument. Note that a human-level AGI which can see problems or risks of self-modification may also see risk in avoiding it.
Even if the AGI is not told to hold, e.g. compute as many digits of Pi as possible,
If it literally has no other goals then it doesn’t sound like an AGI. The phrase “potential for superhuman intelligence” sounds like it refers to a part of the program that other people could (and, in my view, will) use to create a super-intelligence by combining it with more dangerous goals.
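As a purely illustrative sketch of that last point (my own construction; the hill_climb name and the toy objectives are hypothetical, not anything proposed in the thread), the capability component of such a program can be written so that it knows nothing about its goal, which is what makes it recombinable with other goals:

```python
# Toy illustration: a goal-agnostic search component combined with
# interchangeable objectives, for the structural point only.
import random
from typing import Callable, List


def hill_climb(objective: Callable[[List[float]], float],
               start: List[float],
               steps: int = 1000) -> List[float]:
    """A generic (and deliberately weak) optimizer: it only queries the
    objective it is handed; the goal itself lives entirely outside it."""
    best = list(start)
    best_score = objective(best)
    for _ in range(steps):
        candidate = [x + random.gauss(0.0, 0.1) for x in best]
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


# The same component, handed two different goals:
stay_near_origin = lambda v: -sum(x * x for x in v)   # a modest, bounded aim
grow_without_limit = lambda v: sum(v)                  # an open-ended aim

print(hill_climb(stay_near_origin, [5.0, -3.0]))
print(hill_climb(grow_without_limit, [5.0, -3.0]))
```

Swapping the objective requires no change to the search component, which is the sense in which the “potential for superhuman intelligence” part of a program could be combined with more dangerous goals by other people.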