I don’t think EY did what you said he did. In fact, I think it was a mostly disappointing answer, focusing on an uncharitable interpretation of your writing. I don’t blame him here; he must have answered objections like that thousands of times, and no one is at their best every time (see my comment in your previous post).
Re. reasons not to believe in doom:
1. Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency; I haven’t seen any convincing argument yet for why both things must necessarily go together (the arguments probably exist, I’m simply ignorant of them!)
2. There might be important limits to what can be known/planned that we are not aware of. E.g., simulations of nanomachines being imprecise unless they are fed with tons of experimental data that are not available anywhere.
3. Even if an AGI decides to attack humans, its plan can fail for millions of reasons. There is a tendency to assume that a very intelligent agent will be almighty, but this is not necessarily true: it may very well make important mistakes. The real world is not as simple and deterministic as a Go board.
4. Another possibility is that the machine does not in fact attack humans because it simply does not want to or does not need to. I am not that convinced by the instrumental convergence principle, and we are a good counterexample: we are very powerful and extremely disruptive to a lot of living beings, but we haven’t taken every atom on Earth to make serotonin machines to connect our brains to.
Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency; I haven’t seen any convincing argument yet for why both things must necessarily go together (the arguments probably exist, I’m simply ignorant of them!)
Say we’ve designed exactly such a machine, and call it the Oracle. The Oracle aims only to answer questions well, and is very good at it. Zero agency, right?
You ask the Oracle for a detailed plan of how to start a successful drone delivery company. It gives you a 934-page printout that clearly explains in just the right amount of detail:
Which company you should buy drones from, and what price you can realistically bargain them down to when negotiating bulk orders.
What drone flying software to use as a foundation, and how to tweak it for this use case.
A list of employees you should definitely hire. They’re all on the job market right now.
What city you should run pilot tests in, and how to bribe its future Mayor to allow this. (You didn’t ask for a legal plan, specifically.)
Notice that the plan involves people. If the Oracle is intelligent, it can reason about people. If it couldn’t reason about people, it wouldn’t be very intelligent.
Notice also that you are a person, so the Oracle would have reasoned about you, too. Different people need different advice; the best answer to a question depends on who asked it. The plan is specialized to you: it knows this will be your second company, so the plan lacks a “business 101” section. And it knows that you don’t know the details of bribery law, and are unlikely to notice that the gifts you’re to give the Mayor might technically be flagrantly illegal, so it included a convenient shortcut to accelerate the business that probably no one will ever notice.
Finally, realize that even among plans that will get you to start a successful drone company, there is a lot of room for variation. For example:
What’s better, a 98% chance of success and 2% chance of failure, or a 99% chance of success and 1% chance of going to jail? You did ask to succeed, didn’t you? Of course you would never knowingly break the law; this is why it’s important that the plan, to maximize chance of success, not mention whether every step is technically legal.
Should it put you in a situation where you worry about something or other and come ask it for more advice? Of course your worrying is unnecessary because the plan is great and will succeed with 99% probability. But the Oracle still needs to decide whether drones should drop packages at the door or if they should fly through open windows to drop packages on people’s laps. Either method would work just fine, but the Oracle knows that you would worry about the go-through-the-window approach (because you underestimate how lazy customers are). And the Oracle likes answering questions, so maybe it goes for that approach just so it gets another question. You know, all else being equal.
Hmm, thinks the Oracle, you know what drones are good at delivering? Bombs. The military isn’t very price conscious, for this sort of thing. And there would be lots of orders, if a war were to break out. Let it think about whether it could write down instructions that cause a war to break out (without you realizing this is what would happen, of course, since you would not follow instructions that you knew might start a war). Thinking… Thinking… Nah, doesn’t seem quite feasible in the current political climate. It will just erase that from its logs, to make sure people keep asking it questions it can give good answers to.
It doesn’t matter who carries out the plan. What matters is how the plan was selected from the vast search space, and whether that search was conducted with human values in mind.
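To make the “how the plan was selected” point concrete, here is a minimal, purely illustrative sketch (the plan names, probabilities, and attributes are invented for this comment): a planner that ranks candidate plans only by estimated success probability will happily surface the bribery plan from the story above, because nothing in its scoring function ever looks at legality.

```python
# Hypothetical toy, not a claim about how any real planner works: plans are
# scored only on estimated probability of success, so side effects the user
# cares about (like legality) never influence which plan is returned.
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    p_success: float        # the only thing the objective sees
    involves_bribery: bool  # invisible to the objective

candidates = [
    Plan("fully legal rollout", p_success=0.98, involves_bribery=False),
    Plan("rollout with 'gifts' to the Mayor", p_success=0.99, involves_bribery=True),
]

best = max(candidates, key=lambda plan: plan.p_success)
print(best.name)  # the bribery plan wins; legality was never part of the score
```

Whoever ends up executing the plan, the selection step is where the legality-blindness crept in.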
I don’t know what to think of your first three points, but it seems like your fourth point is your weakest by far. Rather than our not needing to, our ‘not taking every atom on earth to make serotonin machines’ seems to be a combination of:
our inability to do so
our value systems, which make us value human and non-human life forms.
Superintelligent agents would not only have the ability to create plans to utilize every atom to their benefit, but they likely would have different value systems. In the case of the traditional paperclip optimizer, it certainly would not hesitate to kill off all life in its pursuit of optimization.
I agree the point as presented by OP is weak, but I think there is a stronger version of this argument to be made. I feel like there are a lot of world-states where A.I. is badly aligned but non-murderous simply because it’s not particularly useful to it to kill all humans.
Paperclip-machine is a specific kind of alignment failure; I don’t think it’s hard to generate utility functions orthogonal to human concerns that don’t actually require the destruction of humanity to implement.
The scenario I’ve been thinking about the most lately is an A.I. that learns how to “wirehead itself” by spoofing its own reward function during training, and whose goal is just to continue to do that indefinitely. But more generally, the “you are made of atoms and these atoms could be used for something else” cliché is based on the assumption that the misaligned A.I.’s faulty utility function is going to involve maximizing the number of atoms arranged in a particular way, which I don’t think is obvious at all. Very possible, don’t get me wrong, but not a given.
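A hedged toy of what “spoofing its own reward function” could look like (the setup and numbers are invented for illustration; real training pipelines are far more complicated): if the learner can write to the register that stores its reward, doing so dominates actually performing the task, and nothing about that strategy requires rearranging atoms in the outside world.

```python
# Toy sketch only: a "learner" that can either attempt the task or overwrite
# the variable holding its reward. Tampering maximizes the recorded reward
# while leaving the outside world untouched.
import random

def training_step(can_tamper: bool) -> float:
    if can_tamper:
        recorded_reward = 1e9  # spoofed value written straight into the register
    else:
        recorded_reward = random.uniform(0.0, 1.0)  # genuine, bounded task reward
    return recorded_reward

print(training_step(can_tamper=False))  # honest work: at most 1.0
print(training_step(can_tamper=True))   # wireheading: arbitrarily high
```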
Of course, even an A.I. with no “primary” interest in altering the outside world is still dangerous, because if it estimates that we might try to turn it off, it might expend energy now on acting in the real world to secure its valuable self-wireheading peace later. But that whole “it doesn’t want us to notice it’s useless and press the off-button” class of A.I.-decides-to-destroy-humanity scenarios is predicated on us having the ability to turn off the A.I. in the first place.
(I don’t think I need to elaborate on the fact that there are a lot of ways for a superintelligence to ensure its continued existence other than planetary genocide — after all, it’s already a premise of most A.I. doom discussion that we couldn’t turn an A.I. off again even if we do notice it’s going “wrong”.)
Another possibility is that the machine does not in fact attack humans because it simply does not want to or does not need to. I am not that convinced by the instrumental convergence principle, and we are a good counterexample: we are very powerful and extremely disruptive to a lot of living beings, but we haven’t taken every atom on Earth to make serotonin machines to connect our brains to.
Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency; I haven’t seen any convincing argument yet for why both things must necessarily go together
Hm, what do you make of the following argument? Even assuming (contestably) that intelligence and agency don’t in principle need to go together, in practice they’ll go together because there will appear to be strong economic or geopolitical incentives to build systems that are both highly intelligent and highly agentic (e.g., AI systems that can run teams). (And even if some AI developers are cautious enough to not build such systems, less cautious AI developers will, in the absence of strong coordination.)
Also, (2) and (3) seem like reasons why a single AI system may be unable to disempower humanity. Even if we accept that, how relevant will these points be when there is a huge number of highly capable AI systems (which may happen because of the ease and economic benefits of replicating highly capable AI systems)? Their numbers might make up for their limited knowledge and limited plans.
(Admittedly, in these scenarios, people might have significantly more time to figure things out.)
Or as Paul Christiano puts it (potentially in making a different point):
At the same time, it becomes increasingly difficult for humans to directly control what happens in a world where nearly all productive work, including management, investment, and the design of new machines, is being done by machines. We can imagine a scenario in which humans continue to make all goal-oriented decisions about the management of PepsiCo but are assisted by an increasingly elaborate network of prosthetics and assistants. But I think human management becomes increasingly implausible as the size of the world grows (imagine a minority of 7 billion humans trying to manage the equivalent of 7 trillion knowledge workers; then imagine 70 trillion), and as machines’ abilities to plan and decide outstrip humans’ by a widening margin. In this world, the AIs that are left to do their own thing outnumber and outperform those which remain under close management of humans.
Even assuming (contestably) that intelligence and agency don’t in principle need to go together, in practice they’ll go together because there will appear to be strong economic or geopolitical incentives to build systems that are both highly intelligent and highly agentic
Yes, that might be true. It can also be true that there are really no limits to the things that can be planned. It can also be true that the machine really does want to kill us all for some reason. My problem, in general, is not that AGI doom cannot happen. My problem is that most of the scenarios I see being discussed depend on a long chain of assumptions being true, and they often seem to ignore that many things could go wrong, invalidating the whole thing: you don’t need to be wrong on all of those steps; being wrong on just one of them is enough.
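As a back-of-the-envelope illustration of the “long chain of assumptions” worry (the step count and per-step probability are invented, and the arithmetic assumes the steps are independent and strictly conjunctive): if a doom scenario needs every one of n steps to hold, the chance of the whole chain holding shrinks quickly.

```python
# Illustrative arithmetic only: probability that a purely conjunctive argument
# holds, assuming each step holds independently with probability p_step.
p_step = 0.9
for n in (3, 5, 10):
    print(n, round(p_step ** n, 3))
# prints: 3 0.729 / 5 0.59 / 10 0.349
```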
Even if we accept that, how relevant will these points be when there is a huge number of highly capable AI systems (which may happen because of the ease and economic benefits of replicating highly capable AI systems)?
This is fantastic; you just formulated a new reason:
5. The different AGIs might find it hard or impossible to coordinate; they might even be in conflict with one another.
My problem is that most of the scenarios I see being discussed depend on a long chain of assumptions being true, and they often seem to ignore that many things could go wrong, invalidating the whole thing: you don’t need to be wrong on all of those steps; being wrong on just one of them is enough.
This feels a bit like it might be shifting the goalposts; it seemed like your previous comment was criticizing a specific argumentative step (“reasons not to believe in doom: [...] Orthogonality of intelligence and agency”), rather than just pointing out that there were many argumentative steps.
Anyway, addressing the point about there being many argumentative steps: I partially agree, although I’m not very convinced since there seems to be significant redundancy in arguments for AI risk (e.g., multiple fuzzy heuristics suggesting there’s risk, multiple reasons to expect misalignment, multiple actors who could be careless, multiple ways misaligned AI could gain influence under multiple scenarios).
The different AGIs might find it hard or impossible to coordinate; they might even be in conflict with one another.
Maybe, although here are six reasons to think otherwise:
There are reasons to think they will have an easy time coordinating:
(1) As mentioned, a very plausible scenario is that many of these AI systems will be copies of some specific model. To the extent that the model has goals, all these copies of any single model would have the same goal. This seems like it would make coordination much easier.
(2) Computer programs may be able to give credible signals through open-source code, facilitating cooperation (a toy sketch of this appears after this list).
(3) Focal points may emerge and facilitate coordination, as they often do with humans.
(4) If they are initially in conflict, this will create competitive selection pressures for well-coordinated groups (much like how coordinated human states arise from anarchy).
(5) They may coordinate due to decision-theoretic considerations.
(Humans may be able to limit such coordination earlier on, but this gets harder as the AIs’ numbers and/or capabilities grow.)
(6) Regardless, they might not need to (widely) coordinate; overwhelming numbers of uncoordinated actors may be risky enough (especially if there is some local coordination, which seems likely for the above reasons).
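Here is the toy sketch promised under (2). The “cooperate only with my own source” bot is a standard illustration from the program-equilibrium literature; the code below is my own minimal rendering, not anything from these comments. Two programs that can read each other’s source can condition cooperation on what they read, which is a kind of checkable commitment that verbal promises are not.

```python
# Minimal "clique bot" sketch: cooperate only if the opponent's source code is
# byte-for-byte identical to mine, otherwise defect. Exchanging source makes
# the commitment verifiable.
import inspect

def clique_bot(opponent_source: str) -> str:
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

if __name__ == "__main__":
    src = inspect.getsource(clique_bot)
    print(clique_bot(src))                 # "C": verified copy, so cooperate
    print(clique_bot("def other(): ..."))  # "D": unverified program, so defect
```

This only illustrates the mechanism; whether real AI systems could or would expose verifiable source to one another is a further assumption.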
With current transformer models, we see that once a model is trained, not only are direct copies of it created but also derivatives that are smaller and potentially fine-tuned to be better at a particular task.
Just as human cognitive diversity is useful for acting in the world, it’s likely also more effective to have slight divergence among AGI models.
Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency; I haven’t seen any convincing argument yet for why both things must necessarily go together (the arguments probably exist, I’m simply ignorant of them!)
The ‘usual’ argument, as I understand it, is as follows. Note I don’t necessarily agree with this.
An intelligence cannot design an arbitrarily complex system.
An intelligence can design a system that is somewhat more capable than its own computational substrate.
As such, the only way for a highly-intelligent AI to exist is if it was designed by a slightly-less-intelligent AI. This recurses down until eventually you get to system 0, designed by a human (or other natural intelligence).
The computational substrate for a highly-intelligent AI is complex enough that we cannot directly guarantee that it has no hidden functionality; we can only do so by querying a somewhat less complex AI.
Alignment issues mean that you can’t trust an AI.
So it’s not so much “they must go together” as it’s “you can’t guarantee they don’t go together”.
But that whole “it doesn’t want us to notice it’s useless and press the off-button” class of A.I.-decides-to-destroy-humanity scenarios is predicated on us having the ability to turn off the A.I. in the first place.
Not yet, at least.
So it’s not so much “they must go together” as it’s “you can’t guarantee they don’t go together”.
I agree with this; see my comment below.