[link] Disjunctive AI Risk Scenarios
Arguments for risks from general AI are sometimes criticized on the grounds that they rely on a linear series of events, each of which has to occur for the proposed scenario to go through. For example: that a sufficiently intelligent AI could escape from containment, that it could then go on to become powerful enough to take over the world, that it could do this quickly enough without being detected, and so on.
The intent of this series of posts is to briefly demonstrate that AI risk scenarios are in fact disjunctive: composed of multiple possible pathways, each of which could be sufficient by itself. To successfully control AI systems, it is not enough to block just one of the pathways: all of them need to be dealt with.
I’ve got two posts in this series up so far:
AIs gaining a decisive advantage discusses four different ways by which AIs could achieve a decisive advantage over humanity. The one-picture version is:
AIs gaining the power to act autonomously discusses ways by which AIs might come to act as active agents in the world, despite possible confinement efforts or technology. The one-picture version (which you may wish to click to enlarge) is:
These posts draw heavily on my old paper, Responses to Catastrophic AGI Risk, as well as some recent conversations here on LW. Upcoming posts will try to cover more new ground.
I think that one of the main disjunctions is that neither self-improvement, nor high-level intelligence, nor control of the world is a necessary condition for human extinction caused by AI.
Imagine a computer which helps a terrorist create biological viruses. It is not AGI, not self-improving, not an agent; it has no values, and it is local and confined. But it could help calculate and create a perfect virus, capable of wiping out humanity.
This is an excellent point! I’m intending to discuss non-superintelligence scenarios in a follow-up post.
The fact that we have many very different scenarios means that there is (almost) no single intervention which could stop all of them. The exceptions are “destroy all computers” and “create a Singleton based on FAI as soon as possible”.
In all other cases we should think not only about a correct AI safety theory, but also about ways to implement it all over the world. For example, we could prove that “many-level AI boxing” creates enough uncertainty for an AI that it will always think a real human could punish it for wrongdoing, which would (maybe) result in perfect alignment. But such a proof will be useless if we do not also find ways to implement it across the whole AI field. (And we still can’t win against computer viruses in the computing field, even though we know a lot about how to prevent them, because a lot of people invest in breaking the defenses.)
So we have three unknown and very complex tasks: AI, AI safety, and the delivery of AI safety theory to AI researchers. To solve the last one we need a systems model of global AI research, which should show us where to intervene to make global research safer.
The best interventions of this kind would help to solve all three hard problems simultaneously.
I’m not sure an intelligence explosion can happen without significant speed or computational power improvements.
I guess it boils down to what happens if you let a human-level intelligence self-modify without modifying the hardware (i.e., how highly optimised human intelligence already is). So far, the ratio of results to computational power used is significantly in favor of humans compared to AI, but the latter is improving fast, and an AI doesn’t need to be as versatile as a human. Is there any work on the limits of optimisation for intelligence?
It may look like a nitpick, since hardware capacity is increasing steadily and will soon exceed that of the human brain, but it is a lot easier to prevent an intelligence explosion by putting a limit on computational power.
It’s unclear, but in narrow AI we’ve seen software get smarter even in cases where the hardware is kept constant, or even made worse. For example, the top chess engine of 2014 beats a top engine from 2006 even when you give the 2014 engine only 2% of the computing power of the 2006 engine. That would seem to suggest that an intelligence explosion without hardware improvements might be possible, at least in principle.
In practice I would expect an intelligence explosion to lead to hardware improvements as well, though. No reason for the AI to constrain itself just to the software side.
You should do a similar mapping of the disjunctive ways in which AI could go right and lead to world bettering technological growth.
I guess you could consider all of Responses such a disjunctive post, if you consider the disjunctive options to be “this proposed response to AGI succeeds”. :)
I would be interested in hearing whether you had more extended critiques of these posts. I incorporated some of our earlier discussion into them, and was hoping to develop them further in part by having conversations with people who were more skeptical of the scenarios depicted.
To be honest I only did a brief read-through. The context of the debate itself is what I object to. I find the concept of “friendly” AI itself to be terrifying. It’s my life’s work to make sure that we don’t end up in such a dystopian tyrannical future. Debating whether what you call AI “risk” is likely or unlikely (disjunctive or conjunctive) is rather pointless when you are ambivalent towards that particular outcome.
Now I think that you’ve left out a LOT of things that must happen a certain way in order for your AI risk outcomes to come to pass. You’ve also left out ALL of the corrective actions that could be taken by any of the human actors in the picture. It reminds me of a martial arts demonstration where the attacker throws a punch and then stands there in frozen form, unreactive, while the teacher demonstrates the appropriate response at leisure. But if, like me, you don’t see such a scenario as a bad thing in the first place, then it’s an academic point. And I tire of debating things of no real-world significance.
Hmm. There may have been a miscommunication here.
This sounds like you’re assuming that I’m trying to argue in favor of Friendly AI as the best solution. Now, I admittedly do currently find FAI one of the most promising options for trying to navigate AI risk, but I’m not committed to it. I just want to find whatever solution works, regardless of whether it happens to be FAI or something else entirely. But in order to find the best solution, one needs to have a comprehensive idea of what the problem is like and how it’s going to manifest itself, and that’s what I’m trying to do: map out the problem, so that we can figure out what the best solutions are.
Would appreciate hearing more about these.
Isn’t that the standard way of figuring out the appropriate corrective actions? First figure out what would happen absent any intervention, then see which points seem most amenable to correction.
(Responding to the whole paragraph but don’t want to quote it all) I would be interested to hear a definition of “AI risk” that does not reduce to “risk of unfriendly outcome” which itself is defined in terms of friendliness aka relation to human morality. If, like me, you reject the idea of consistent, discoverable morality in the first place, and therefore find friendliness to be an ill-formed, inconsistent idea, then it’s hard to say anything concrete about AI risk either. If you have a better definition that does not reduce to alignment with human morality, please provide it.
Mapping the problem starts with defining what the problem is. What is AI risk, without reference to dubious notions of human morality?
To start with, there are all the normal, benign things that happen in any large-scale software project and require human intervention. Like, say, the AGI crashes. Or the database that holds its memories becomes inconsistent. Or it gets deadlocked on choosing actions due to a race condition. The humanity-threatening failure modes presume that the AGI, on its first break-out attempt, doesn’t suffer any normal engineering defect failures, or that if it does, the humans operating it just fix it and turn it back on. I’m not interested in any arguments that assume the latter, and the former is highly conjunctive.
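To make the race-condition point concrete, here is a minimal Python sketch (purely illustrative; the component names and lock structure are hypothetical, not from any actual system) of how two threads that take the same pair of locks in opposite order can freeze each other:

```python
# Illustrative only: two components acquire the same pair of locks in
# opposite order. With unlucky timing, each thread ends up waiting forever
# for the lock the other one holds: a classic deadlock from a race condition.
import threading

memory_lock = threading.Lock()
action_lock = threading.Lock()

def choose_action():
    with action_lock:          # holds the action lock...
        with memory_lock:      # ...and waits for the memory lock
            pass               # (select the next action)

def write_memory():
    with memory_lock:          # holds the memory lock...
        with action_lock:      # ...and waits for the action lock
            pass               # (record the chosen action)

t1 = threading.Thread(target=choose_action)
t2 = threading.Thread(target=write_memory)
t1.start(); t2.start()
t1.join(); t2.join()           # may never return if the threads deadlock
```

Nothing in this toy example is specific to AGI; the point is simply that ordinary concurrency bugs like this routinely halt large systems until a human operator steps in.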
I may have misread your intent, and if so I apologize. The first sentence of your post here made it seem like you were countering a criticism, aka advocating for the original position. So I read your posts in that context and may have inferred too much.
I also reject the idea of a consistent, discoverable morality, at least to the extent that the morality is assumed to be unique. I think that moralities are not so much discovered but constructed: a morality is in a sense an adaptation to a specific environment, and it will continue to constantly evolve as the environment changes (including the social environment, so the morality changing will by itself cause more changes to the environment, which will trigger new changes to the morality, etc.). There is no reason to assume that this will produce a consistent moral system: there will be inconsistencies which will need to be resolved when they become relevant, and the order in which they are resolved seems likely to affect the final outcome.
But to answer your actual question: I don’t have a rigorous answer for exactly what the criteria for “success” are. The intuitive answer is that there are some futures that I’d consider horrifying if they came true and some which I’d consider fantastic if they came true, and I want to further the fantastic ones and avoid the horrifying ones. (I presume this to also be the case for you, because why else would you care about the topic in the first place?)
Given that this is very much a “I can’t give you a definition, but I know it when I see it” thing, it seems hard to make sure that we avoid the horrifying outcomes without grounding the AIs in human values somehow, and making sure that they share our reaction when they see (imagine) some particular future. (Either that, or trying to make sure that we evolve to be as powerful as the AIs, but this seems unlikely to me.)
Depending on your definitions, you could say that this still reduces to alignment with human morality, but with the note that my conception of human morality is that of a dynamic process, and that the AIs could be allowed to e.g. nudge the development of our values towards a direction that made it easier to reconcile value differences between different cultures, even if there was no “objective” reason for why that direction was any better or worse than any other one.
Are you assuming that there will only ever be one AGI that might try to escape, that its creators never decide to release it, and that it can’t end up effectively in control even if boxed?
To talk about risk you need to define “bad outcomes”. You don’t have to define them in terms of morality, but you have to define them somehow.
I am working on such a map, about how AI applications in medicine could help us become immortal.