I think you are missing a crucial point here. It might be the case (arguably, it is likely to be the case) that the only feasible way to construct a human level AGI without mind uploading (WBE) is to create a self-improving AGI. Such an AGI will start from subhuman intelligence but use its superior introspection and self-modification powers to go supercritical and rapidly rise in intelligence. Assuming we don’t have an automatic shut-down triggered by the AGI reaching a certain level of intelligence (since it’s completely unclear how to implement one), the AGI might go superhuman rapidly after reaching human level intelligence, without anyone having the chance to stop it. Once superhuman, no boxing protocol will make it safe. Why? Because a superhuman intelligence will find ways to get out of the box that you have no chance to guess. Because you cannot imagine the exponentially large space of all the things the AGI can try, you have no way to guarantee it cannot find needles in that haystack whose existence you don’t even suspect.
As a side note, it is not possible to meaningfully prevent a program from self-modifying if it is running on a universal Turing machine. Arbitrary code execution is always possible, at least with a mild performance penalty (the program can always implement an interpreter).
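As a minimal sketch of this point (hypothetical names, toy logic, not a real AI design): a host program whose own source is never rewritten can still behave as a self-modifying system, because it interprets a rule table that it is free to edit. Restricting the outer environment does not remove this inner layer of self-modification.

```python
# Toy illustration: the host code below is fixed, but the program it
# interprets (the "rules" table) rewrites itself as it runs.

rules = {"step": ["increment", "maybe_rewrite"]}  # "code" stored as editable data
state = {"counter": 0}

def run_step():
    for op in list(rules["step"]):
        if op == "increment":
            state["counter"] += 1
        elif op == "maybe_rewrite":
            # The interpreted program edits its own rule table: the
            # self-modification happens one level above the fixed host code.
            if state["counter"] % 3 == 0:
                rules["step"].append("increment")

for _ in range(5):
    run_step()

print(rules["step"], state["counter"])
```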
I think you are missing a crucial point here. It might be the case (arguably, it is likely to be the case) that the only feasible way to construct a human level AGI without mind uploading (WBE) is to create a self-improving AGI. Such an AGI will start from subhuman intelligence but use its superior introspection and self-modification powers to go supercritical and rapidly rise in intelligence.
Seems unlikely. If this seed AI is the most intelligent thing humans can design, and yet it is significantly less intelligent than humans, how can it design something more intelligent than itself?
Software development is hard. We don’t know any good heuristic for approaching it with a narrow AI like a chess-playing engine. We can automate some stuff, like compiling from a high-level programming language to machine code, performing type checking and some optimizations. But automating “coding” (going from a specification to a runnable program) or the invention of new algorithms are probably “AI-complete” problems: we would need an AGI, or something remarkably close to it, in order to do that.
Assuming we don’t have an automatic shut-down triggered by the AGI reaching a certain level of intelligence (since it’s completely unclear how to implement one), the AGI might go superhuman rapidly after reaching human level intelligence, without anyone having the chance to stop it.
Even if it can self-improve its code, and this doesn’t quickly run into diminishing returns (which is a quite likely possibility), it would still have limited hardware resources and limited access to outside knowledge. Having, say, a SAT solver which is 5x faster than the industry state of the art won’t automatically turn you into an omniscient god.
As a side note, it is not possible to meaningfully prevent a program from self-modifying if it is running on a universal Turing machine.
A universal Turing machine is not physically realizable, but even if it were, your claim is false. Running Tetris on a UTM won’t result in self-modification.
Arbitrary code execution is always possible, at least with a mild performance penalty (the program can always implement an interpreter).
Only if the program has access to an interpreter that can execute arbitrary code.
Seems unlikely. If this seed AI is the most intelligent thing humans can design, and yet it is significantly less intelligent than humans, how can it design something more intelligent than itself?
Because humans are not optimized for designing AI. Evolution is much less intelligent than humans and yet it designed something more intelligent than itself: humans. Only it did so very inefficiently, and it doesn’t bootstrap. But that doesn’t mean you need something initially as intelligent as a human to do it efficiently.
But automating “coding” (going from a specification to a runnable program) or the invention of new algorithms are probably “AI-complete” problems: we would need an AGI, or something remarkably close to it, in order to do that.
They are. It just doesn’t mean the AI has to be as smart as a human the moment it is born.
Even if it can self-improve its code, and this doesn’t quickly run into diminishing returns (which is a quite likely possibility)...
There is no reason to believe the diminishing returns point is around human intelligence. Therefore if it is powerful enough to make it to human level, it is probably powerful enough to make it much further.
A universal Turing machine is not physically realizable, but even if it were, your claim is false. Running Tetris on a UTM won’t result in self-modification… Only if the program has access to an interpreter that can execute arbitrary code.
The program is the interpreter. My point is that you cannot prevent self-modification by constraining the environment (as long as the environment admits universal computation), you can only prevent it by constraining the program itself. I don’t see how RAM limitations significantly alter this conclusion.
Because humans are not optimized for designing AI.
While a sub-human narrow AI can be optimized for designing general AI? That seems unlikely.
Evolution is much less intelligent than humans and yet it designed something more intelligent than itself: humans.
Speaking of biological evolution in teleological terms always carries the risk of false analogy. If we were to reboot evolution from the Cambrian, it is by no means certain that it would again produce humans, or something of similar intelligence, within the same time frame.
Moreover, evolution is a process of adaptation to the environment. How can a boxed narrow AI produce something which is well adapted to the environment outside the box?
There is no reason to believe the diminishing returns point is around human intelligence. Therefore if it is powerful enough to make it to human level, it is probably powerful enough to make it much further.
Why not? Humans can’t self-improve to any significant extent. The things we can design that are less intelligent than humans can’t self-improve to any significant extent either. Algorithmic efficiency, however defined, is always going to be bounded.
The program is the interpreter.
The program doesn’t have a supernatural ghost who can decide “I’m going to be an interpreter starting from now”. Either it is an interpreter (in which case it is not an AI) or it is not. You can give an AI an interpreter as part of its environment. Or transistors to build a computer, or something like that. It can even make a cellular automaton using pebbles or do some other “exotic” form of computation. But all these things run into the resource limits of the box, and the more exotic it is, the higher the resource requirements per unit of work done.
While a sub-human narrow AI can be optimized for designing general AI? That seems unlikely.
I think it’s either that or we won’t be able to build any human-level AGI without WBE.
If we were to reboot evolution from the Cambrian, it is by no means certain that it would again produce humans, or something of similar intelligence, within the same time frame.
Agreed. However, hominid evolution was clearly not pure luck, since it involved significant improvement over a relatively short time span.
Moreover, evolution is a process of adaptation to the environment. How can a boxed narrow AI produce something which is well adapted to the environment outside the box?
Evolution produced something which is adapted to a very wide range of environments, including environments vastly different from the environment in which evolution happened. E.g., US astronauts walked on the surface of the Moon, which is very different from anything relevant to evolution. We call this something “general intelligence”. Ergo, it is possible to produce general intelligence by a process which has little of it.
Humans can’t self-improve to any significant extent. The things we can design that are less intelligent than humans can’t self-improve to any significant extent either.
My point is that it’s unlikely the point of diminishing returns is close to human intelligence. If this point is significantly below human intelligence then IMO we won’t be able to build AGI without WBE.
The program doesn’t have a supernatural ghost who can decide “I’m going to be an interpreter starting from now”. Either it is an interpreter (in which case it is not an AI) or it is not.
It is an AI which contains an interpreter as a subroutine. My point is, if you somehow succeed in freezing a self-modifying AI at a point at which it is already interesting but not yet dangerous, then the next experiment has to start from scratch anyway. You cannot keep running it while magically turning self-modification off, since self-modification is an inherent part of the program. This stands in contrast to your ability to e.g. turn on/off certain input/output channels.
I think it’s either that or we won’t be able to build any human-level AGI without WBE.
Why?
Agreed. However, hominid evolution was clearly not pure luck, since it involved significant improvement over a relatively short time span.
It wasn’t pure luck; there was selective pressure. But this signal towards improvement is often weak and noisy, and it doesn’t necessarily correlate well with intelligence: a chimp is smarter than a lion, but not generally more evolutionarily fit. Even Homo sapiens went through a population bottleneck around 70,000 years ago which almost led to extinction.
It is my intuition that if something as complex and powerful as human-level intelligence can be engineered in the foreseeable future, then it would have to use some kind of bootstrapping. I admit it is possible that I’m wrong and that in fact progress in AGI will come through a very long sequence of small improvements and that the AGI will be given no introspective / self-modification powers. In this scenario, a “proto-singularity” is a real possibility. However, what I think will happen is that we won’t make significant progress before we develop a powerful mathematical formalism. Once such a formalism exists, it will be much more efficient to use it in order to build a pseudo-narrow self-modifying AI than to keep improving AI “brick by brick”.
Such an AGI will start from subhuman intelligence but use its superior introspection and self-modification powers to go supercritical and rapidly rise in intelligence.
If this seed AI is the most intelligent thing humans can design, and yet it is significantly less intelligent than humans, how can it design something more intelligent than itself?
Because humans are not optimized for designing AI. Evolution is much less intelligent than humans and yet it designed something more intelligent than itself: humans. Only it did so very inefficiently, and it doesn’t bootstrap. But that doesn’t mean you need something initially as intelligent as a human to do it efficiently.
Does the fact that evolution “designed” something more intelligent than itself, inefficiently, imply that we can efficiently design something less intelligent than ourselves that can in turn efficiently design something much more intelligent than its creators?
And your confidence in this is high enough to believe that such an AI can’t be contained? Picture me staring in utter disbelief.
People already suck at telling whether Vitamin D is good for you, yet some people seem to believe that they can have non-negligible confidence about the power and behavior of artificial general intelligence.
Even if it can self-improve its code, and this doesn’t quickly run into diminishing returns (which is a quite likely possibility)...
There is no reason to believe the diminishing returns point is around human intelligence.
For important abilities, such as persuasion, there are good reasons to believe that there are no minds much better than humans. There are no spoken or written mind hacks that can be installed and executed in a human brain.
Does the fact that evolution “designed” something more intelligent than itself, inefficiently, imply that we can efficiently design something less intelligent than ourselves that can in turn efficiently design something much more intelligent than its creators?
Either we can design a human-level AGI (without WBE) or we cannot. If we cannot, this entire discussion about safety protocols is irrelevant. Maybe we need some safety protocols for experiments with WBE, but that’s a different story. If we can, then it seems likely that there exists a subhuman AGI which is able to design a superhuman AGI (because there’s no reason to believe human-level intelligence is a special point, and because this weaker intelligence will be better optimized for designing AGI than humans are). Such a self-improvement process creates a positive feedback loop which might lead to a very rapid rise in intelligence.
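As a toy model of the feedback loop described above (illustrative numbers only; the assumption that each redesign cycle yields a fixed relative improvement is mine, not a claim about real dynamics), capability compounds geometrically, and the human level is not a special point in the dynamic:

```python
# Toy model of recursive self-improvement (illustration only).
capability = 0.5        # hypothetical starting point, as a fraction of human level
gain_per_cycle = 0.3    # hypothetical relative improvement per redesign cycle

for cycle in range(1, 11):
    capability *= 1 + gain_per_cycle
    print(f"cycle {cycle}: capability = {capability:.2f}")

# Nothing here treats 1.0 (human level) as special: the same multiplicative
# step that reaches it also carries the system well past it.
```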
People already suck at telling whether Vitamin D is good for you, yet some people seem to believe that they can have non-negligible confidence about the power and behavior of artificial general intelligence.
Low confidence means stronger safety requirements, not the other way around.
For important abilities, such as persuasion, there are good reasons to believe that there are no minds much better than humans.
One of the arguments I heard for humans being the bare minimum level of intelligence for a technological civilization is that there existed no further evolutionary pressure to select for even higher levels of general intelligence.
You just claim that there can be levels of intelligence below us that are better than us at designing levels of intelligence above us and that we can create such intelligences. In my opinion such a belief requires strong justification.
People already suck at telling whether Vitamin D is good for you, yet some people seem to believe that they can have non-negligible confidence about the power and behavior of artificial general intelligence.
Low confidence means stronger safety requirements, not the other way around.
Yes. Something is very wrong with this line of reasoning. I hope GiveWell succeeds in writing a post on this soon. My technical skills are not sufficient to formalize my doubts.
I’ll just say as much. I am not going to spend resources on the possibility of catching some exotic disease, even though it could kill me in a horrible way, when there are other more likely risks that could cripple me.
What are these reasons?
I list some caveats here. Even humans hit diminishing returns on many tasks and just stop exploring and start exploiting. For persuasion this should be pretty obvious. Improving a sentence you want to send to your gatekeeper for a million subjective years does not make it one hundred thousand times more persuasive than improving it for 10 subjective years.
In a fist fight with someone, strategy gives you only a small advantage if your opponent is much stronger. An AI trying to take over the world would have to account for its fragility when fighting humans, who are adapted to living outside the box.
To take over the world you require either excellent persuasion skills or raw power. That an AI could somehow become good at persuasion, given its huge inferential distance, its lack of direct insight, and its lack of a theory of mind, is in my opinion nearly impossible. And regarding the acquisition of raw power, you will have to show how it is likely to acquire it without just conjecturing technological magic.
At the time of the first AI, the global infrastructure will still require humans to keep it running. You need to show that the AI is independent enough of this infrastructure that it can risk its demise in a confrontation with humans.
There are a huge number of questions looming in the background. How would the AI hide its motivations and make predictions about human countermeasures? Why would it be given unsupervised control of the equipment necessary to construct molecular factories?
I can of course imagine science fiction stories where an AI does anything. That proves nothing.
I am not going to spend resources on the possibility of catching some exotic disease, even though it could kill me in a horrible way, when there are other more likely risks that could cripple me.
Allow me to make a different analogy. Suppose that someone is planning to build a particle accelerator of unprecedented power. Some experts claim the accelerator is going to create a black hole which will destroy Earth. Other experts think differently. Everyone agrees (in stark contrast to what happened with LHC) that our understanding of processes at these energies is very poor. In these conditions, do you think it would be a good idea to build the accelerator?
In these conditions, do you think it would be a good idea to build the accelerator?
It would not be a good idea. Ideally, you should then try to raise your confidence that it won’t destroy the world so far that the expected benefits of building it outweigh the risks. But that’s probably not feasible, and I have no idea where to draw the line.
If you can already build something, and there are good reasons to be cautious, then it has passed the threshold where I can afford to care, without risking wasting my limited attention on risks approaching Pascal’s mugging type scenarios.
I like to make the comparison between an extinction-type asteroid, spotted with telescopes and calculated to have a .001 probability of hitting Earth in 2040, vs. a 50% probability of extinction by unfriendly AI at the same time. The former calculation is based on hard facts and empirical evidence, while the latter is purely inference-based and therefore very unstable.
In other words, one may assign 50% probability to “a coin will come up heads” and “there is intelligent life on other planets,” but one’s knowledge about the two scenarios is different in important ways.
ETA:
Suppose there are 4 risks. One mundane risk has a probability of 1/10 and you assign 20 utils to its prevention. Another less likely risk has a probability of 1/100 but you assign 1000 utils to its prevention. Yet another risk is very unlikely, having a probability of 1/1000, but you assign 1 million utils to its prevention. The fourth risk is extremely unlikely, having a probability of 10^-10000, but you assign 10^10006 utils to its prevention. All else equal, which one would you choose to prevent and why?
If you wouldn’t choose risk 4, then why wouldn’t the same line of reasoning, or intuition, be similarly valid in choosing risk number 1 over 2 or 3? And in case that you would choose risk 4 then do you also give money to a Pascalian mugger?
The important difference between an AI risks charity and a deworming charity can’t be its expected utility, because that results in Pascal’s mugging. Neither can the difference be that deworming is more probable than AI risk, because that argument also works against deworming: just choose a cause that is even more probable than deworming.
And in case you are saying that AI risk is the most probable underfunded risk, then what is the greatest lower bound for “probable” here and how do you formally define it? In other words, “probable” in conjunction with “underfunded” doesn’t work either, because any case of Pascal’s mugging is underfunded as well. You’d have to formally define and justify some well-grounded minimum for “probable”.
Unfriendly AI does not pass this threshold. The probability of unfriendly AI is too low, and the evidence is too “brittle”.
Earlier you said: “People already suck at telling whether Vitamin D is good for you, yet some people seem to believe that they can have non-negligible confidence about the power and behavior of artificial general intelligence.” Now you’re making high-confidence claims about AGI. Also, I remind you that the discussion started from my criticism of the proposed AGI safety protocols. If there is no UFAI risk then the safety protocols are pointless.
In other words, one may assign 50% probability to “a coin will come up heads” and “there is intelligent life on other planets,” but one’s knowledge about the two scenarios is different in important ways.
Not in ways that have to do with expected utility calculation.
Suppose there are 4 risks. One mundane risk has a probability of 1/10 and you assign 20 utils to its prevention. Another less likely risk has a probability of 1/100 but you assign 1000 utils to its prevention. Yet another risk is very unlikely, having a probability of 1/1000, but you assign 1 million utils to its prevention. The fourth risk is extremely unlikely, having a probability of 10^-10000, but you assign 10^10006 utils to its prevention. All else equal, which one would you choose to prevent and why?
Risk 4, since it corresponds to the highest expected utility.
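Spelling out the arithmetic behind that answer, using the numbers given above (risk 4 is kept in exponent form, since 10^-10000 underflows ordinary floating point):

```python
# Expected utility of preventing each of the four risks described above.
risks = [
    ("risk 1", 1 / 10, 20),
    ("risk 2", 1 / 100, 1_000),
    ("risk 3", 1 / 1_000, 1_000_000),
]
for name, p, utils in risks:
    print(name, p * utils)               # 2.0, 10.0, 1000.0

# Risk 4: probability 10^-10000, payoff 10^10006 utils. Working in exponents:
print("risk 4", 10 ** (-10000 + 10006))  # 1000000, the highest of the four
```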
And in case that you would choose risk 4 then do you also give money to a Pascalian mugger?
My utility function is bounded (I think), so you can only Pascal-mug me so much.
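A sketch of why the bound matters (the cap and the numbers are purely hypothetical): with an unbounded utility function the mugger can always quote a payoff large enough to dominate the calculation, whereas a cap limits the offer’s expected utility to at most p times the cap.

```python
# Bounded vs. unbounded utility facing a Pascal's-mugging-style offer.
U_MAX = 10 ** 9                        # hypothetical cap on utility

def expected_utility(p, promised_utils, bounded):
    utils = min(promised_utils, U_MAX) if bounded else promised_utils
    return p * utils

p = 10 ** -20                          # how likely the mugger's story is judged to be
print(expected_utility(p, 10 ** 30, bounded=False))  # 1e+10: dominates the decision
print(expected_utility(p, 10 ** 30, bounded=True))   # 1e-11: negligible
```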
And in case you are saying that AI risk is the most probable underfunded risk...
I have no idea whether it is underfunded. I can try to think about it, but it has little to do with the present discussion.
There are many ways self-modification can be restricted. Only certain numerical parameters may be modified, or only some source may be modified while other stuff remains a black box. If it has to implement its own interpreter, that’s not a “mild performance penalty”, it’s a gargantuan one, not to mention that it can be made impossible.
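As a minimal sketch of the first kind of restriction mentioned above (all names and bounds are hypothetical): the system may propose changes, but a fixed gate applies only those that touch whitelisted numerical parameters and stay within hard limits.

```python
# Hypothetical gate allowing self-modification of whitelisted parameters only.
WHITELIST = {            # parameter name -> (lower bound, upper bound)
    "learning_rate": (1e-6, 1.0),
    "exploration": (0.0, 0.5),
}
params = {"learning_rate": 0.01, "exploration": 0.1}

def apply_update(name, value):
    if name not in WHITELIST:
        raise PermissionError(f"{name} is not self-modifiable")
    low, high = WHITELIST[name]
    params[name] = min(max(value, low), high)  # clamp into the allowed range

apply_update("learning_rate", 0.05)                 # allowed, within bounds
# apply_update("objective", "maximize paperclips")  # would raise PermissionError
```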
You can also freeze self-modification abilities at any given time and examine the current machine to evaluate intelligence.
These are only examples, but I think we are far too far away from constructing an AI to assume that the first ones would be introspective or highly self-modifying. And by the time we start building one, we’ll know, and we’ll be able to prepare procedures to put in place.
What I strongly doubt is that somebody messing around in their basement (or the corporate lab equivalent) will stumble on a superintelligence by accident. And the alternative is that a coherent, large, well-funded effort will build something with many theories and proofs-of-concept partial prototypes along the way to guide safety procedures.
There are many ways self-modification can be restricted. Only certain numerical parameters may be modified, or only some source may be modified while other stuff remains a black box. If it has to implement its own interpreter, that’s not a “mild performance penalty”, it’s a gargantuan one, not to mention that it can be made impossible.
If you place too many restrictions you will probably never reach human-like intelligence.
You can also freeze self-modification abilities at any given time and examine the current machine to evaluate intelligence.
If you do it frequently you won’t reach human-like intelligence in a reasonable span of time. If you do it infrequently, you will miss the transition into superhuman and it will be too late.
These are only examples, but I think we are far too far away from constructing an AI to assume that the first ones would be introspective or highly self-modifying. And by the time we start building one, we’ll know, and we’ll be able to prepare procedures to put in place… a coherent, large, well-funded effort will build something with many theories and proofs-of-concept partial prototypes along the way to guide safety procedures.
A coherent, large, well-funded effort can still make a fatal mistake. The Challenger was such an effort. The Chernobyl power plant was such an effort. Trouble is, this time the stakes are much higher.