Are you saying that I’d have to kill everyone so no one can build AGI?
Yup. Anything short of that is just a delaying tactic.
From the last part of your comment, you seem to agree with that, actually. 1000 years is still just a delay.
But I didn’t see you as presenting preventing fully general, self-improving AGI as a delaying tactic. I saw you as presenting it as a solution.
Also, isn’t suppressing fully general AGI actually a separate question from building narrow AI? You could try to suppress fully general AGI and narrow AI. Or you could build narrow AI while still also trying to do fully general AGI. You can do either with or without the other.
you have to provide evidence that a) this is distracting relevant people from doing things that are more productive (such as solving alignment?)
I don’t know if it’s distracting any individuals from finding any way to guarantee good AGI behavior[1]. But it definitely tends to distract social attention from that. Finding one “solution” for a problem tends to make it hard to continue any negotiated process, including government policy development, for doing another “solution”. The attitude is “We’ve solved that (or solved it for now), so on to the next crisis”. And the suppression regime could itself make it harder to work on guaranteeing behavior.
True, I don’t know if the good behavior problem can be solved, and am very unsure that it can be solved in time, regardless.
But at the very least, even if we’re totally doomed, the idea of total, permanent suppression distracts people from getting whatever value they can out of whatever time they have left, and may lead them to actions that make it harder for others to get that value.
AND b) that solving alignment before we can build AGI is not only possible, but highly likely.
Oh, no, I don’t think that at all. Given the trends we seem to be on, things aren’t looking remotely good.
I do think there’s some hope for solving the good behavior problem, but honestly I pin more of my hope for the future on limitations of the amount of intelligence that’s physically possible, and even more on limitations of what you can do with intelligence no matter how much of it you have. And another, smaller, chunk on it possibly turning out that a random self-improving intelligence simply won’t feel like doing anything that bad anyway.
… but even if you were absolutely sure you couldn’t make a guaranteed well-behaved self-improving AGI, and also absolutely sure that a random self-improving AGI meant certain extinction, it still wouldn’t follow that you should turn around and do something else that also won’t work. Not unless the cost were zero.
And the cost of the kind of totalitarian regime you’d have to set up to even try for long-term suppression is far from zero. Not only could it stop people from enjoying what remains, but when that regime failed, it could end up turning X-risk into S-risk by causing whatever finally escaped to have a particularly nasty goal system.
For all the people who continuously claim that it’s impossible to coordinate humankind into not doing obviously stupid things, here are some counter examples: We have the Darwin awards for precisely the reason that almost all people on earth would never do the stupid things that get awarded. A very large majority of humans will not let their children play on the highway, will not eat the first unknown mushrooms they find in the woods, will not use chloroquine against covid, will not climb into the cage in the zoo to pet the tigers, etc.
Those things are obviously bad from an individual point of view. They’re bad in readily understandable ways. The bad consequences are very certain and have been seen many times. Almost all of the bad consequences of doing any one of them accrue personally to whoever does it. If other people do them, it still doesn’t introduce any considerations that might drive you to want to take the risk of doing them too.
Yet lots of people DID (and do) take hydroxychloroquine and ivermectin for COVID, a nontrivial number of people do in fact eat random mushrooms, and the others aren’t unheard-of. The good part is that when somebody dies from doing one of those things, everybody else doesn’t also die. That doesn’t apply to unleashing the killer robots.
… and if making a self-improving AGI were as easy as eating the wrong mushrooms, I think it would have happened already.
The challenge here is not the coordination, but the common acceptance that certain things are stupid.
Pretty much everybody nowadays has a pretty good understanding of the outlines of the climate change problem. The people who don’t are pretty much the same people who eat horse paste. Yet people, in the aggregate, have not stopped making it worse. Not only has every individual not stopped, but governments have been negotiating about it for like 30 years… agreeing at every stage on probably inadequate targets… which they then go on not to meet.
… and climate change is much, much easier than AGI. Climate change rules could still be effective without perfect compliance at an individual level. And there’s no arms race involved, not even between governments. A climate change defector may get some economic advantage over other players, but doesn’t get an unstoppable superweapon to use against the other players. A climate change defector also doesn’t get to “align” the entire future with the defector’s chosen value system. And all the players know that.
Speaking of arms races, many people think that war is stupid. Almost everybody thinks that nuclear war is stupid, even if they don’t think nuclear deterrence is stupid. Almost everybody thinks that starting a war you will lose is stupid. Yet people still start wars that they will lose, and there is real fear that nuclear war can happen.
This is maybe hard in certain cases, but NOT impossible. Sure, this will maybe not hold for the next 1,000 years, but it will buy us time.
I agree that suppressing full-bore self-improving ultra-general AGI can buy time, if done carefully and correctly. I’m even in favor of it at this point.
But I suspect we have some huge quantitative differences, because I think the best you’ll get out of it is probably less than 10 years, not anywhere near 1000. And again I don’t see what substituting narrow AI has to do with it. If anything, that would make it harder by requiring you to tell the difference.
I also think that putting too much energy into making that kind of system “non-leaky” would be counterproductive. It’s one thing to make it inconvenient to start a large research group, build a 10,000-GPU cluster, and start trying for the most agenty thing you can imagine. It’s both harder and more harmful to set up a totalitarian surveillance state to try to control every individual’s use of gaming-grade hardware.
And there are possible measures to reduce the ability of the most stupid 1% of humanity to build AGI and kill everyone.
A climate change defector also doesn’t get to “align” the entire future with the defector’s chosen value system.
I do understand that the problem with AGI is exactly that you don’t know how to align anything with anything at all, and if you know you can’t, then obviously you shouldn’t try. That would be stupid.
The problem is that there’ll be an arms race to become able to do so… and a huge amount of pressure to deploy any solution you think you have as soon as you possibly can. That kind of pressure leads to motivated cognition and institutional failure, so you become “sure” that something will work when it won’t. It also leads to building up all the prerequisite capabilities for a “pivotal act”, so that you can put it into practice immediately when (you think) you have an alignment solution.
But I didn’t see you as presenting preventing fully general, self-improving AGI as a delaying tactic. I saw you as presenting it as a solution.
Actually, my point in this post is that we don’t NEED AGI for a great future, because often people equate Not AGI = Not amazing future (or even a terrible one) and I think this is wrong. The point of this post is not to argue that preventing AGI is easy.
However, it’s actually very simple: If we build a misaligned AGI, we’re dead. So there are only two options: A) solve alignment, B) not build AGI. If not A), then there’s only B), however “impossible” that may be.
Yet lots of people DID (and do) take hydroxychloroquine and ivermectin for COVID, a nontrivial number of people do in fact eat random mushrooms, and the others aren’t unheard-of.
Yes. My hope is not that 100% of mankind will be smart enough to not build an AGI, but that maybe 90+% will be good enough, because we can prevent the rest from getting there, at least for a while. Currently, you need a lot of compute to train a Sub-AGI LLM. Maybe we can put the lid on who’s getting how much compute, at least for a time. And maybe the top guys at the big labs are among the 90% non-insane people. Doesn’t look very hopeful, I admit.
Anyway, I haven’t seen you offer an alternative. Once again, I’m not saying not developing AGI is an easy task. But saying it’s impossible (while not having solved alignment) is saying “we’ll all die anyway”. If that’s the case, then we may as well try the “impossible” things and at least die with dignity.
Actually, my point in this post is that we don’t NEED AGI for a great future, because often people equate Not AGI = Not amazing future (or even a terrible one) and I think this is wrong.
I don’t have so much of a problem with that part.
It would prevent my personal favorite application for fully general, strongly superhuman AGI… which is to have it take over the world and keep humans from screwing things up more. I’m not sure I’d want humans to have access to some of the stuff non-AGI could do… but I don’t think there’s any way to prevent that.
If we build a misaligned AGI, we’re dead. So there are only two options: A) solve alignment, B) not build AGI. If not A), then there’s only B), however “impossible” that may be.
C) Give up.
Anyway, I haven’t seen you offer an alternative.
You’re not going to like it...
Personally, if made king of the world, I would try to discourage at least large-scale efforts to develop either generalized agents or “narrow AI”, especially out of opaque technology like ML. That’s because narrow AI could easily become parts or tools for a generalized agent, because many kinds of narrow AI are too dangerous in human hands, and because the tools and expertise for narrow AI are too close to those for generalized AGI. It would be extremely difficult to suppress one in practice without suppressing the other.
I’d probably start by making it as unprofitable as I could by banning likely applications. That’s relatively easy to enforce because many applications are visible. A lot of the current narrow AI applications need bannin’ anyhow. Then I’d start working on a list of straight-up prohibitions.
Then I’d dump a bunch of resources into research on assuring behavior in general and on more transparent architectures. I would not actually expect it to work, but it has enough of a chance to be worth a try. That work would be a lot more public than most people on Less Wrong would be comfortable with, because I’m afraid of nasty knock-on effects from trying to make it secret. And I’d be a little looser about capability work in service of that goal than in service of any other.
I would think very hard about banning large aggregations of vector compute hardware, and putting various controls on smaller ones, and would almost certainly end up doing it for some size thresholds. I’m not sure what the thresholds would be, nor exactly what the controls would be. This part would be very hard to enforce regardless.
I would not do anything that relied on perfect enforcement for its effectiveness, and I would not try to ramp up enforcement to the point where it was absolutely impossible to break my rules, because I would fail and make people miserable. I would titrate enforcement and stick with measures that seemed to be working without causing horrible pain.
I’d hope to get a few years out of that, and maybe a breakthrough on safety if I were tremendously lucky. Given perfect confidence in a real breakthrough, I would try to abdicate in favor of the AGI.
If made king of only part of the world, I would try to convince the other parts to collaborate with me in imposing roughly the same regime. How I reacted if they didn’t do that would depend on how much leverage I had and what they did seem to be doing. I would try really, really hard not to start any wars over it. Regardless of what they said they were doing I would assume that they were engaging in AGI research under the table. Not quite sure what I’d do with that assumption, though.
But I am not king of the world, and I do not think it’s feasible for me to become king of the world.
I also doubt that the actual worldwide political system, or even the political systems of most large countries, can actually be made to take any very effective measures within any useful amount of time. There are too many people out there with too many different opinions, too many power centers with contrary interests, too much mutual distrust, and too many other people with too much skill at deflecting any kind of policy initiative down ways that sort of look like they serve the original purpose, but mostly don’t. The devil is often in the details.
If it is possible to get the system to do that, I know that I am not capable of doing so. I mean, I’ll vote for it, maybe write some letters, but I know from experience that I have nearly no ability to persuade the sorts of people who’d need to be persuaded.
I am also not capable of solving the technical problem myself and doing some “pivotal act”. In fact I’m pretty sure I have no technical ideas for things to try that aren’t obvious to most specialists. And I don’t much buy any of the ideas I’ve heard from other people.
My only real hopes are things that neither I nor anybody else can influence, especially not in any predictable direction, like limitations on intelligence and uncertainty about doom.
So my personal solution is to read random stuff, study random things, putter around in my workshop, spend time with my kid, and generally have a good time.
We’re not as far apart as you probably think. I’d agree with most of your decisions. I’d even vote for you to become king! :) Like I wrote, I think we must also be cautious with narrow AI as well, and I agree with your points about opaqueness and the potential of narrow AI turning into AGI. Again, the purpose of my post was not to argue how we could make AI safe, but to point out that we could have a great future without AGI. And I still see a lot of beneficial potential in narrow AI, IF we’re cautious enough.
Independent of potential for growing into AGI and {S,X}-risk resulting from that?
With the understanding that these are very rough descriptions that need much more clarity and nuance, that one or two of them might be flat out wrong, that some of them might turn out to be impossible to codify usefully in practice, that there might be specific exceptions for some of them, and that the list isn’t necessarily complete:
Recommendation systems that optimize for “engagement” (or proxy measures thereof).
Anything that identifies or tracks people, or proxies like vehicles, in spaces open to the public. Also collection of data that would be useful for this.
Anything that mass-classifies private communications, including closed group communications, for any use by anybody not involved in the communication.
Anything specifically designed to produce media showing real people in false situations or to show them saying or doing things they have not actually done.
Anything that adaptively tries to persuade anybody to buy anything or give anybody money, or to hold or not hold any opinion of any person or organization.
Anything that tries to make people anthropomorphize it or develop affection for it.
Anything that tries to classify humans into risk groups based on, well, anything.
Anything that purports to read minds or act as a lie detector, live or on recorded or written material.
Good list. Another one that caught my attention in the EU act was AIs specialised in subliminal messaging. People’s choices can be somewhat conditioned for or against things by feeding them sensory data even if it’s not consciously perceptible; it can also affect their emotional states more broadly.
I don’t know how effective this stuff is in real life, but I know that it at least works.
Anything that tries to classify humans into risk groups based on, well, anything.
A particular example of that one is systems of social scoring, which are surely gonna be used by authoritarian regimes. You can screw people up in so many ways when social control is centralised with AI systems. It’s great to punish people for not being chauvinists.
What in detail would you like to do?
I don’t like the word “alignment” for reasons that are largely irrelevant here.
Replying to myself to clarify this:
I do understand that the problem with AGI is exactly that you don’t know how to align anything with anything at all, and if you know you can’t, then obviously you shouldn’t try. That would be stupid.
The problem is that there’ll be an arms race to become able to do so… and a huge amount of pressure to deploy any solution you think you have as soon as you possibly can. That kind of pressure leads to motivated cognition and institutional failure, so you become “sure” that something will work when it won’t. It also leads to building up all the prerequisite capabilities for a “pivotal act”, so that you can put it into practice immediately when (you think) you have an alignment solution.
… which basically sets up a bunch of time bombs.
I agree with that.
Fine. I’ll take it.
Which? I wonder.
A particular example of that one is systems of social scoring, which are surely gonna be used by authoritarian regimes. You can screw people up in so many ways when social control is centralised with AI systems. It’s great to punish people for not being chauvinists.
This is already beginning in China.