If one were to approach it as an actual problem, it would certainly be worthwhile to focus on applying the safety engineering practices from other fields: making it fail-safe, whenever possible by omitting, rather than adding, features. E.g. a nuclear reactor can’t blow up like a nuke chiefly because it lacks an implosion assembly, lacks fuel that is free of neutron emitters, lacks a neutron initiator, etc.
For instance, a “reward optimizer” would, normally, merely combine the reward button signal with the clock signal to produce the value which is actually being optimized. The fantastic adventures of the robot boy who’s trying to hold a button down until the heat death of the universe need not be relevant; results are likely to be far less spectacular and go along the lines of setting time to MAX_INT (or more likely, optimizing directly for the final result after the time is factored in), or in case of a more stupid system, starting a fire in the lab because turning off the cooling fans through some driver glitch has raised the oscillator frequency a little bit.
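A toy sketch of that failure mode, in Python. Everything in it is hypothetical (the action names, the tick counts, the value of `MAX_INT`); it assumes only that the optimized value is the reward signal multiplied by elapsed clock time, and that the clock is something the system can write to.

```python
# Toy model: the optimized value is (reward signal) x (elapsed clock time).
# All names and numbers are illustrative assumptions, not a real system.
MAX_INT = 2**31 - 1


def objective(reward_per_tick: int, clock_ticks: int) -> int:
    """The value actually being optimized: reward combined with the clock."""
    return reward_per_tick * clock_ticks


# Each hypothetical action maps to the (reward_per_tick, clock_ticks) it yields.
ACTIONS = {
    "do_nothing": (0, 1_000),
    "hold_button_for_a_year": (1, 31_536_000),  # one tick per second
    "set_clock_to_max_int": (1, MAX_INT),       # tamper with the clock instead
}


def best_action(actions: dict) -> str:
    """Pick the action with the highest objective value."""
    return max(actions, key=lambda name: objective(*actions[name]))


if __name__ == "__main__":
    for name, (reward, ticks) in ACTIONS.items():
        print(f"{name}: {objective(reward, ticks)}")
    print("chosen:", best_action(ACTIONS))
```

Under these assumptions, the cheapest way to a large value is to tamper with the clock, not to hold the button down for a long time.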
Of course, given that we lack any good idea of how a practical AGI might be built, and that the theoretical implementations are highly technical and difficult to mentally process, it is too speculative for us to know at present what the features might be and what could be omitted. In science fiction, all you can do is take our (ontologically basic for humans) notion of intelligence and bolt some variety of laws of robotics (or a constitution of robotics, or some other form of wish list) on top of it.
For the “explosion”:
Consider an alien hivemind beehive made of rather unintelligent bees. They’re trying to build an artificial bee. If they build one, its intelligence is far below that of the beehive, and so below the threshold of any intelligence explosion (assuming that the beehive is roughly at the cusp of intelligence explosion and assuming an intelligence explosion is possible). Yes, eventual displacement of the beehive may happen, but not through some instant “explosion”.
The AI, its hardware and software, will be a product of literally millions of human-years of work by very bright (by human standards) individuals working on various aspects of the relevant technology, and it’s not clear why you would expect different results than for the above-mentioned alien bees. It is clear why you would want that in a movie: it makes for a better plot than “people slowly lose jobs”.
Security by omission is a very good point. The same is true in omitting options in protocols. If, for instance, “let the AI out of its box” or even “Give this AI extra hardware or communication with the broader public” are not official options for the result of a certain phase of the project, that makes it all the harder for it to happen.
Questions about architecture, and how we could begin to “bolt on” behavioral constraints are critical. And that’s precisely why we need to experiment. I suspect many architectures will be highly opaque and correspondingly difficult to constrain or directly modify. A simulated human brain might fall under this category.
Well, one thing about practical problem-solving software is that in reality the action space is very high-dimensional and enormous. To make any kind of search work in anything close to reasonable computing time, one needs to be able to control the way the search looks for solutions. The ‘optimizations’ coincide with greater control over how something works and with movement away from theoretically simple models.
Mere absence of certain aspects of generality can simultaneously render AI massively easier to build (and able to solve the problems we want solved while running on weaker hardware), and massively safer. Safer in a way that cannot be described as some goals added onto the basic human notion of intelligence, just as a nuclear power plant’s safety cannot be described as that of a nuke with extra controls. Very little of the safety you find in engineering is based upon additions to a theoretical ideal. The problem of making, say, a table saw safer (that clever mechanism which instantly stops and retracts the blade when touched) is dramatically different from the problem of making an ideal forcefield cutting blade safer.
As for a simulated human brain, there’s really no reason to expect that to lead to some extremely rapid intelligence explosion. What people think is likely depends on what they are exposed to, and this applies as much to the regular masses who think terrorism is a huge risk because it’s all over the TV as to nerds who think this sort of scenario is a huge problem because it’s all over the sci-fi.
It’s the very quintessence of what Bruce Schneier termed a “movie plot threat” (with a movie plot solution as well). It may seem like worrying about movie plot threats can’t hurt. But it diverts resources from general safety towards safety against overly specific plots. E.g. instead of being more concerned about the economic impact of emerging technologies, individuals so inclined focus on an overly specific (and correspondingly unlikely) scenario, which was created primarily for entertainment purposes.
Security is both built into (design) and bolted onto (passwords, anti-virus software) software. It is built into (structural integrity, headlights, rules of the road) and bolted onto (seatbelts, airbags) cars. Safety will be architecture dependent. Provable safety in the sense that MIRI researches might be awkward to incorporate into many architectures, if it is possible at all.
If an intelligence explosion is possible, it is probably possible with any architecture, but much more efficient with some. But we won’t really know until we experiment enough to at least understand properties of these architectures under naive scaling based on computational resources.
I mention brain emulation specifically because it’s the clearest path we have to artificial intelligence (in the same sense that fusion is a clear path to supplying global energy needs: the theory is known and sound, but the engineering obstacles could put it off indefinitely). And presumably once you can make one brain in silico you could make it smarter than a person’s by a number of methods.
I’m presuming that at some point, we will want an AI that can program other AIs or self-modify in unexpected ways to improve itself.
But you’re right, external safety could be a stopgap not until we could make FOOM-capable AI provably safe, but until we could make FOOM impossible, and keep humans in the driver’s seat.
The bolted-on security, though, is never bolted onto some idealized notion originating from fiction. That has every potential to be even more distant from what’s needed than hypothetical teleport-gate safety is from airbags.
As for brain emulation, the necessary computational power is truly immense, and the path towards such is anything but clear.
With regards to foom, it seems to me that the belief in foom is related to a certain ignorance of the intelligence already present, or of its role in the “takeoff”. The combined human (and software) intelligence working on the relevant technologies is already massively superhuman, in the sense of superiority to any individual human. The end result is that the takeoff starts earlier and proceeds more slowly, much like how, if you try to bring together chunks of plutonium, the substantial level of spontaneous fission already present means the chain reaction will reach a massive power level before the coefficient of multiplication gets larger than 1.
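The plutonium analogy can be made concrete with the textbook subcritical-multiplication relation: with a constant spontaneous source S per generation and a multiplication coefficient k < 1, the population settles at S / (1 - k), which grows without bound as k approaches 1. A minimal Python sketch, with an arbitrary source strength chosen purely for illustration:

```python
def steady_state_level(source: float, k: float) -> float:
    """Equilibrium population for a constant source and coefficient k < 1."""
    if not 0.0 <= k < 1.0:
        raise ValueError("the closed form only holds for 0 <= k < 1")
    return source / (1.0 - k)


def simulate(source: float, k: float, generations: int) -> float:
    """Iterate n -> k * n + source to check the closed form numerically."""
    n = 0.0
    for _ in range(generations):
        n = k * n + source
    return n


if __name__ == "__main__":
    # The source strength of 1.0 is an arbitrary illustrative unit.
    for k in (0.5, 0.9, 0.99, 0.999):
        print(k, steady_state_level(1.0, k))
```

The level rises steeply while k is still below 1, which is the sense in which the power is already large before the reaction becomes self-sustaining.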
I agree with the point about how any intelligence that constructs a supercomputer is already superhuman, even if it’s just humans working in concert. I think this provides a major margin of safety. I am not quite as skeptical of takeoff overall as you seem to be. But a big science style effort is likely to minimize a lot of risks while a small one is not likely to succeed at all.
Brain emulation is definitely hard, but no other route even approaches plausibility currently. We’re 5 breakthroughs away from brain emulation, and 8 away from anything else. So using brain emulation as one possible scenario isn’t totally unreasonable imo.
Why do you expect “foom” from brain emulation, though?
My theory is that such expectations are driven by it being so far away that it is hard to picture us getting there gradually; instead you picture skipping straight to some mind upload that can run a thousand copies of itself or the like...
What I expect from the first “mind upload” is a simulated epileptic seizure, refined gradually into some minor functionality. It is not an actual upload, either: samples of different human brains were used to infer the general network topology and the like, and that has been simulated, and learns things, running at below realtime, on a computer that consumes many megawatts of power and costs more per day than the most expensive movie star or the like. For the price of that computer you could hire a hundred qualified engineers, each thinking perhaps 10 times faster than this machine. It is gradually refined, with immense difficulty, into human-level performance. There is nothing like some easy ways to make it smarter; those were already used to make it work in the first place.
This would be contemporary to (and making use of) software that can and did, by simulation and refinement of parameters, do utterly amazing things: a more advanced variation of the software that can design ultra-efficient turbine blades and the like today. (Non-AI, non-autonomous software which can also be used to design DNA for some cancer-curing virus or, in the hands of deliberately malicious normal humans, an everyone-killing virus or the like, rendering the upload itself fairly irrelevant as a threat.)
What I suspect many futurists imagine is the mind upload of a fully working human mind, appearing in the computer, talking and the like. That is the starting point, and their mental model got there by magic, not by imagining actual progress. Then there are some easy tweaks, which are again magicked into the mental model, with no reduction to anything. The imaginary upload strains one’s mental simulator’s capacity quite a bit, and in the futurist’s mental model it is not contemporary to any particularly cool technology. So the mind upload enjoys advantages akin to those of a modern army sent back in time to 1000 BC (with nothing needing any fuel to operate or runways to take off from). And so the imaginary mind upload easily takes over the imaginary world.
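The cost comparison in that paragraph is simple arithmetic. A small Python sketch, using only the rough figures given there and treating the emulation's thinking speed as the unit:

```python
# Illustrative figures from the paragraph above; the unit of "thinking
# speed" is arbitrary (the emulation's own speed is defined as 1).
ENGINEERS_FOR_SAME_PRICE = 100
ENGINEER_SPEED_VS_EMULATION = 10


def human_advantage(engineers: int, speedup: int) -> int:
    """Total human thinking throughput relative to the single emulation."""
    return engineers * speedup


if __name__ == "__main__":
    print(human_advantage(ENGINEERS_FOR_SAME_PRICE, ENGINEER_SPEED_VS_EMULATION))
```

On those figures, the same money buys roughly a thousand times the emulation's thinking throughput in human engineers.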
I think your points are valid. I don’t necessarily expect FOOM from anything; I just find it plausible (based on Eliezer’s arguments about all the possible methods of scaling that might be available to an AI).
I am pitching my arguments towards people who expect FOOM, but the possibility of non-FOOM for a longish while is very real.
And it is probably unwarranted to say anything about architecture, you’re right.
But suppose we have human-level AIs, then decide to consciously build a substantially superhuman AI. Or we have superhuman AIs that can’t FOOM, and actively seek to make one that can. The same points apply.
It seems to me that this argument (and arguments which rely on unspecified methods and the like) boils down to breaking the world model by adding things with an unclear creation history and unclear decomposition into components, and the resulting non-reductionist, magic-infested mental world model misbehaving. Just as it always did in human history, yielding gods and the like.
You postulate that unspecified magic can create superhuman intelligence: it arises without any mental model of the necessary work, of problems being solved, returns diminishing, and available optimizations being exhausted. Is it a surprise that in this broken mental model (broken because we don’t know how the AI would be built), because the work is absent, the superhuman intelligence in question creates a still greater intelligence in days, merely continuing the trend of its unspecified creation? If it’s not at all surprising, then it’s not informative that the mental model goes in this direction.