I feel like this is a good example of a post that—IMO—painfully misses the primary objection of many people it is trying to persuade (e.g., me): how can we stop 100.0% of people from building AGI this century (let alone in future centuries)? How can we possibly ensure that there isn’t a single person over the next 200 years who decides “screw it, they can’t tell me what to do,” and builds misaligned AGI? How can we stop the race between nation-states that may lack verification mechanisms? How can we identify and enforce red lines while companies are actively looking for loopholes and other ways to push the boundaries?
The point of this comment is less to say “this definitely can’t be done” (although I do think such a future is fairly implausible/unsustainable), and more to say “why did you not address this objection?” You probably ought to have a dedicated section that very clearly addresses this objection in detail. Without such a clearly signposted section, I felt like I mostly wasted my time skimming your article, to be entirely honest.
I think the post addresses a key objection that many people opposed to EA and longtermist concerns have voiced with the EA view of AI, and I thought it was fairly well written for the points it set out to make, without also addressing the mostly unrelated point that you wanted it to cover.
The point of this comment is less to say “this definitely can’t be done” (although I do think such a future is fairly implausible/unsustainable), and more to say “why did you not address this objection?” You probably ought to have a dedicated section that very clearly addresses this objection in detail.
That’s a valid point, thank you for making it. I have given some explanation of my point of view in my reply to jbash, but I agree that this should have been in the post in the first place.
I’ll say it. It definitely can’t be done.
You cannot permanently stop self-improving AGI from being created or run. Not without literally destroying all humans.
You can’t stop it for a “civilizationally significant” amount of time. Not without destroying civilization.
You can slow it down by maybe a few years (probably not decades), and everybody should be trying to do that. However, it’s a nontrivial effort, has important costs of many kinds, quickly reaches a point of sharply diminishing returns, is easy to screw up in actually counterproductive ways, and involves meaningful risk of making the way it happens worse.
If you want to quibble, OK, none of those things are absolute certainties, because there are no absolute certainties. They are, however, the most certain things in the whole AI risk field of thought.
What really concerns me is that the same idea has been coming up continuously since (at least) the 1990s, and people still talk about it as if it were possible. It’s dangerous; it distracts people into fantasies, and keeps them from thinking clearly about what can actually be done.
I’ve noticed a pattern of such proposals talking about what “we” should do, as if there were a “we” with a unified will that ever could or would act in a unified way. There is no such “we”. Although it’s not necessarily wrong to ever use the word “we”, using it carelessly is a good way to lead yourself into such errors.
There are definitely timelines where privacy and privacy imbalance were eliminated completely and AGI development was prevented, possibly indefinitely, or possibly until alignment was guaranteed. (Obviously true, if only in an insignificant share of timelines; bear with me.)
Imagine that everyone could ‘switch’ their senses to any other sensor with minimal effort: cameras and microphones on devices, drones, and people, or, with not-too-distant technology, any human sense. Imagine also a storage system allowing access to those feeds back in time; in practice there might be a rolling storage window, with longer windows for certain sets of the data.
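A minimal sketch of the rolling-window retention part, where the categories, durations, and names are purely illustrative assumptions rather than anything from the comment above:

```python
# Hypothetical tiered retention: ordinary sensor feeds are kept for a short
# window, flagged categories (e.g. compute-facility footage) for much longer.
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = {
    "default": timedelta(days=30),
    "compute_facility": timedelta(days=3650),  # roughly 10 years (assumed)
}

@dataclass
class Recording:
    category: str
    captured_at: datetime

def is_expired(rec: Recording, now: datetime) -> bool:
    """True if the recording has aged past its category's retention window."""
    window = RETENTION.get(rec.category, RETENTION["default"])
    return now - rec.captured_at > window

# Example: a 60-day-old ordinary recording is purged, datacenter footage is kept.
now = datetime(2024, 1, 1)
old = now - timedelta(days=60)
print(is_expired(Recording("default", old), now))           # True
print(is_expired(Recording("compute_facility", old), now))  # False
```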
I think a society like this could hold off AGI, that getting to such a society is possibly easier than solving alignment first, and that such a society, while still having some difficulties, would be an improvement for most people over both an imagined ordinary AGI-less society and our current one.
It is a hard idea to come around to; there are a lot of rather built-in notions that scream against it. A few otherwise rational people have told me that they’d rather have misaligned AGI, that ‘privacy is a human right’ (with no further justification given), and so on.
It seems that all the problems associated with a lack of privacy depend on some privacy imbalance. For example, your SSH key is leaked (in such a society, keys might still be used to protect control), you don’t know who has it, and you don’t know when they abuse it. If somehow everyone had it, and everyone knew that everyone had it, the situation would be much less bad. Someone might still abuse what they know, against their own best interests, but they could quickly be recognized as someone who does that and be prevented from doing it again via therapy or imprisonment.
Depending on the technology available, there could still be windows for small, quick wrongs: someone could use your SSH key to delete a little of your data because they don’t like your face, simply hoping that no one is watching or will ever rewatch it. I don’t know how much of an issue these would be, but between systems being modified for complete global information freedom (for instance, triggering and marking a recording whenever someone accesses a system) and incentives to scout for wrongdoing and build tools that help with that scouting, I think there is good potential for the total cost of wrongdoing to go down versus the alternative.
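A minimal sketch of the access-marking part, assuming a hypothetical append-only public log; the hash chaining is just one possible way to make later tampering detectable, not a claim about how such a system would actually be built:

```python
import hashlib
import json
import time

AUDIT_LOG = []  # stand-in for an append-only, globally readable log

def log_access(actor_id: str, credential_fingerprint: str, action: str) -> dict:
    """Record who used which credential for what, chained to the previous entry."""
    prev_hash = AUDIT_LOG[-1]["entry_hash"] if AUDIT_LOG else "genesis"
    entry = {
        "actor": actor_id,
        "credential": credential_fingerprint,
        "action": action,
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    # Hash chaining makes silent edits to earlier entries detectable.
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    AUDIT_LOG.append(entry)
    return entry

# Example: a "wrongdoing scout" can later search for uses of a leaked key.
log_access("alice", "SHA256:abc123", "ssh-login:backup-server")
suspicious = [e for e in AUDIT_LOG if e["credential"] == "SHA256:abc123"]
```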
I don’t think conformity, groupthink, tribalism, and the like would get the best of us. They might still have some impact, but transparency almost always seems to reduce them: strawmen become less relevant when actual arguments are replayable, and fabricated claims about others are easily debunked, and possibly punished under new laws. Yes, you wouldn’t be able to hide from a hateful group that dislikes some arbitrary thing about you, but they wouldn’t be able to circulate falsehoods about you to fuel that hate, they would be an easy cash cow for any wrongdoing scouts (and scouts who encourage such groups would be a larger cash cow for other scouts), and combined with social pressure to literally put ourselves in one another’s shoes, I think it is reasonable to at least consider that these problems won’t get worse.
Society got along okay with little to no privacy in the past, but if we later decide that we want it back for, say, bedroom activities, I think abstraction could be used to rebuild some forms of privacy in a way that still prevents us from building a misaligned AGI. For example, it could be that only people on the other side of the world, whom you don’t know, can review what happens in your bedroom; if you abuse your partner there or start building AGI, they could raise it to a larger audience. Maybe some narrow AI could be used to make things more efficient and allow you private time.
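A minimal sketch of the “distant strangers review your footage” idea, with made-up names, a made-up distance cutoff, and no claim that this is the right design:

```python
import math
import random

def _distance_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def pick_reviewers(subject, candidates, contacts, k=3, min_km=10_000):
    """Sample k reviewers who are far away from the subject and not known to them."""
    eligible = [c for c in candidates
                if c["id"] not in contacts
                and _distance_km(subject["loc"], c["loc"]) >= min_km]
    return random.sample(eligible, k=min(k, len(eligible)))

# Example with made-up people: a subject near Berlin gets reviewers near Sydney.
subject = {"id": "s1", "loc": (52.5, 13.4)}
candidates = [{"id": f"r{i}", "loc": (-33.9 + i, 151.2)} for i in range(5)]
print(pick_reviewers(subject, candidates, contacts={"r0"}))
```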
In such a society, I think we could distinguish alignment work from capability work. Maybe we’d need to lean heavily in the safe direction and delay aligned AGI by a lot, but conceivably it would allow us to end up with an aligned AGI, at which point we could bring back full privacy.
I wonder what the path looked like in timelines that reached this technology and social structure. I imagine they could have started with just computer systems and drones. Maybe they had the technology ready and rolled it out only once people were on board, which may have been spurred by widespread fear of misaligned AGI. Maybe they were quite lucky, and the other dominant cultural forces all had ulterior incentives for such a system to be in place. I do think it would be quite hard to get everyone on board, but it seems to me like a reasonable route to aligned AGI that is worth talking about more.
This is an interesting thought. I think even without AGI, we’ll have total transparency of human minds soon—already AI can read thoughts in a limited way. Still, as you write, there’s an instinctive aversion against this scenario, which sounds very much like an Orwellian dystopia. But if some people have machines that can read minds, which I don’t think we can prevent, it may indeed be better if everyone could do it—deception by autocrats and bad actors would be much harder that way. On the other hand, it is hard to imagine that the people in power would agree to that: I’m pretty sure that Xi or Putin would love to read the minds of their people, but won’t allow them to read theirs. Also it would probably be possible to fake thoughts and memories, so the people in power could still deceive others. I think it’s likely that we wouldn’t overcome this imbalance anytime soon. This only shows that the future with “narrow” AI won’t be easy to navigate either.
You cannot permanently stop self-improving AGI from being created or run. Not without literally destroying all humans.
You can’t stop it for a “civilizationally significant” amount of time. Not without destroying civilization.
I’m not sure what this is supposed to mean. Are you saying that I’d have to kill everyone so no one can build AGI? Maybe, but I don’t think so. Or are you saying that not building an AGI will destroy all humans? This I strongly disagree with. I don’t know what a “civilizationally significant” amount of time is. For me, the next 10 years are a “significant” amount of time.
What really concerns me is that the same idea has been coming up continuously since (at least) the 1990s, and people still talk about it as if it were possible. It’s dangerous; it distracts people into fantasies, and keeps them from thinking clearly about what can actually be done.
This is a very strong claim. Calling ideas “dangerous” is in itself dangerous IMO, especially if you’re not providing any concrete evidence. If you think talking about building narrow AI instead of AGI is “dangerous” or a “fantasy”, you have to provide evidence that a) this is distracting relevant people from doing things that are more productive (such as solving alignment?) AND b) that solving alignment before we can build AGI is not only possible, but highly likely. The “fantasy” here to me seems to be that b) could be true. I can see no evidence for that at all.
For all the people who continuously claim that it’s impossible to coordinate humankind into not doing obviously stupid things, here are some counterexamples: we have the Darwin Awards precisely because almost all people on earth would never do the stupid things that get awarded. A very large majority of humans will not let their children play on the highway, will not eat the first unknown mushrooms they find in the woods, will not use chloroquine against COVID, will not climb into the cage in the zoo to pet the tigers, etc. The challenge here is not the coordination, but the common acceptance that certain things are stupid. This is maybe hard in certain cases, but NOT impossible. Sure, this will maybe not hold for the next 1,000 years, but it will buy us time. And there are possible measures to reduce the ability of the most stupid 1% of humanity to build AGI and kill everyone.
That said, I agree that my proposal is very difficult to put into practice. The problem is, I don’t have a better idea. Do you?
Are you saying that I’d have to kill everyone so no one can build AGI?
Yup. Anything short of that is just a delaying tactic.
From the last part of your comment, you seem to agree with that, actually. 1000 years is still just a delay.
But I didn’t see you as presenting preventing fully general, self-improving AGI as a delaying tactic. I saw you as presenting it as a solution.
Also, isn’t suppressing fully general AGI actually a separate question from building narrow AI? You could try to suppress both fully general AGI and narrow AI. Or you could build narrow AI while still also trying to build fully general AGI. You can do either with or without the other.
you have to provide evidence that a) this is distracting relevant people from doing things that are more productive (such as solving alignment?)
I don’t know if it’s distracting any individuals from finding any way to guarantee good AGI behavior[1]. But it definitely tends to distract social attention from that. Finding one “solution” for a problem tends to make it hard to continue any negotiated process, including government policy development, for doing another “solution”. The attitude is “We’ve solved that (or solved it for now), so on to the next crisis”. And the suppression regime could itself make it harder to work on guaranteeing behavior.
True, I don’t know if the good behavior problem can be solved, and I am very unsure that it can be solved in time, regardless.
But at the very least, even if we’re totally doomed, the idea of total, permanent suppression distracts people from getting whatever value they can out of whatever time they have left, and may lead them to actions that make it harder for others to get that value.
AND b) that solving alignment before we can build AGI is not only possible, but highly likely.
Oh, no, I don’t think that at all. Given the trends we seem to be on, things aren’t looking remotely good.
I do think there’s some hope for solving the good behavior problem, but honestly I pin more of my hope for the future on limitations of the amount of intelligence that’s physically possible, and even more on limitations of what you can do with intelligence no matter how much of it you have. And another, smaller, chunk on it possibly turning out that a random self-improving intelligence simply won’t feel like doing anything that bad anyway.
… but even if you were absolutely sure you couldn’t make a guaranteed well-behaved self-improving AGI, and also absolutely sure that a random self-improving AGI meant certain extinction, it still wouldn’t follow that you should turn around and do something else that also won’t work. Not unless the cost were zero.
And the cost of the kind of totalitarian regime you’d have to set up to even try for long-term suppression is far from zero. Not only could it stop people from enjoying what remains, but when that regime failed, it could end up turning X-risk into S-risk by causing whatever finally escaped to have a particularly nasty goal system.
For all the people who continuously claim that it’s impossible to coordinate humankind into not doing obviously stupid things, here are some counterexamples: we have the Darwin Awards precisely because almost all people on earth would never do the stupid things that get awarded. A very large majority of humans will not let their children play on the highway, will not eat the first unknown mushrooms they find in the woods, will not use chloroquine against COVID, will not climb into the cage in the zoo to pet the tigers, etc.
Those things are obviously bad from an individual point of view. They’re bad in readily understandable ways. The bad consequences are very certain and have been seen many times. Almost all of the bad consequences of doing any one of them accrue personally to whoever does it. If other people do them, it still doesn’t introduce any considerations that might drive you to want to take the risk of doing them too.
Yet lots of people DID (and do) take hydroxychloroquine and ivermectin for COVID, a nontrivial number of people do in fact eat random mushrooms, and the others aren’t unheard-of. The good part is that when somebody dies from doing one of those things, everybody else doesn’t also die. That doesn’t apply to unleashing the killer robots.
… and if making a self-improving AGI were as easy as eating the wrong mushrooms, I think it would have happened already.
The challenge here is not the coordination, but the common acceptance that certain things are stupid.
Pretty much everybody nowadays has a pretty good understanding of the outlines of the climate change problem. The people who don’t are pretty much the same people who eat horse paste. Yet people, in the aggregate, have not stopped making it worse. Not only has every individual not stopped, but governments have been negotiating about it for like 30 years… agreeing at every stage on probably inadequate targets… which they then go on not to meet.
… and climate change is much, much easier than AGI. Climate change rules could still be effective without perfect compliance at an individual level. And there’s no arms race involved, not even between governments. A climate change defector may get some economic advantage over other players, but doesn’t get an unstoppable superweapon to use against the other players. A climate change defector also doesn’t get to “align” the entire future with the defector’s chosen value system. And all the players know that.
Speaking of arms races, many people think that war is stupid. Almost everybody thinks that nuclear war is stupid, even if they don’t think nuclear deterrence is stupid. Almost everybody thinks that starting a war you will lose is stupid. Yet people still start wars that they will lose, and there is real fear that nuclear war can happen.
This is maybe hard in certain cases, but NOT impossible. Sure, this will maybe not hold for the next 1,000 years, but it will buy us time.
I agree that suppressing full-bore self-improving ultra-general AGI can buy time, if done carefully and correctly. I’m even in favor of it at this point.
But I suspect we have some huge quantitative differences, because I think the best you’ll get out of it is probably less than 10 years, not anywhere near 1000. And again I don’t see what substituting narrow AI has to do with it. If anything, that would make it harder by requiring you to tell the difference.
I also think that putting too much energy into making that kind of system “non-leaky” would be counterproductive. It’s one thing to make it inconvenient to start a large research group, build a 10,000-GPU cluster, and start trying for the most agenty thing you can imagine. It’s both harder and more harmful to set up a totalitarian surveillance state to try to control every individual’s use of gaming-grade hardware.
And there are possible measures to reduce the ability of the most stupid 1% of humanity to build AGI and kill everyone.
What in detail would you like to do?
[1] I don’t like the word “alignment” for reasons that are largely irrelevant here.
Replying to myself to clarify this:
A climate change defector also doesn’t get to “align” the entire future with the defector’s chosen value system.
I do understand that the problem with AGI is exactly that you don’t know how to align anything with anything at all, and if you know you can’t, then obviously you shouldn’t try. That would be stupid.
The problem is that there’ll be an arms race to become able to do so… and a huge amount of pressure to deploy any solution you think you have as soon as you possibly can. That kind of pressure leads to motivated cognition and institutional failure, so you become “sure” that something will work when it won’t. It also leads to building up all the prerequisite capabilities for a “pivotal act”, so that you can put it into practice immediately when (you think) you have an alignment solution.
… which basically sets up a bunch of time bombs.
I agree with that.
Fine. I’ll take it.
But I didn’t see you as presenting preventing fully general, self-improving AGI as a delaying tactic. I saw you as presenting it as a solution.
Actually, my point in this post is that we don’t NEED AGI for a great future, because often people equate Not AGI = Not amazing future (or even a terrible one) and I think this is wrong. The point of this post is not to argue that preventing AGI is easy.
However, it’s actually very simple: If we build a misaligned AGI, we’re dead. So there are only two options: A) solve alignment, B) not build AGI. If not A), then there’s only B), however “impossible” that may be.
Yet lots of people DID (and do) take hydroxychloroquine and ivermectin for COVID, a nontrivial number of people do in fact eat random mushrooms, and the others aren’t unheard-of.
Yes. My hope is not that 100% of mankind will be smart enough to not build an AGI, but that maybe 90+% will be good enough, because we can prevent the rest from getting there, at least for a while. Currently, you need a lot of compute to train even a sub-AGI LLM. Maybe we can put a lid on who’s getting how much compute, at least for a time. And maybe the top guys at the big labs are among the 90% non-insane people. Doesn’t look very hopeful, I admit.
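For a rough sense of scale, here is the commonly cited approximation that training cost is about 6 × parameters × tokens FLOPs; the model size, token count, and hardware throughput below are illustrative assumptions, not figures from anyone in this thread:

```python
# Back-of-envelope: why training a large LLM currently requires concentrated compute.
params = 70e9          # 70B-parameter model (assumed)
tokens = 1.4e12        # 1.4T training tokens (assumed)
train_flops = 6 * params * tokens            # ~5.9e23 FLOPs

gpu_flops = 3e14       # ~300 TFLOP/s effective per accelerator (assumed)
n_gpus = 1000
seconds = train_flops / (gpu_flops * n_gpus)
print(f"{train_flops:.2e} FLOPs, about {seconds / 86400:.0f} days on {n_gpus} GPUs")
# On the order of weeks on a thousand accelerators, which is why compute
# concentration is a plausible (if temporary) point of control.
```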
Anyway, I haven’t seen you offer an alternative. Once again, I’m not saying that not developing AGI is an easy task. But saying it’s impossible (while not having solved alignment) is saying “we’ll all die anyway”. If that’s the case, then we might as well try the “impossible” things and at least die with dignity.
Actually, my point in this post is that we don’t NEED AGI for a great future, because often people equate Not AGI = Not amazing future (or even a terrible one) and I think this is wrong.
I don’t have so much of a problem with that part.
It would prevent my personal favorite application for fully general, strongly superhuman AGI… which is to have it take over the world and keep humans from screwing things up more. I’m not sure I’d want humans to have access to some of the stuff non-AGI could do… but I don’t think there’s any way to prevent that.
If we build a misaligned AGI, we’re dead. So there are only two options: A) solve alignment, B) not build AGI. If not A), then there’s only B), however “impossible” that may be.
C) Give up.
Anyway, I haven’t seen you offer an alternative.
You’re not going to like it...
Personally, if made king of the world, I would try to discourage at least large-scale efforts to develop either generalized agents or “narrow AI”, especially out of opaque technology like ML. That’s because narrow AI could easily become parts or tools for a generalized agent, because many kinds of narrow AI are too dangerous in human hands, and because the tools and expertise for narrow AI are too close to those for generalized AGI. It would be extremely difficult to suppress one in practice without suppressing the other.
I’d probably start by making it as unprofitable as I could by banning likely applications. That’s relatively easy to enforce because many applications are visible. A lot of the current narrow AI applications need bannin’ anyhow. Then I’d start working on a list of straight-up prohibitions.
Then I’d dump a bunch of resources into research on assuring behavior in general and on more transparent architectures. I would not actually expect it to work, but it has enough of a chance to be worth a try. That work would be a lot more public than most people on Less Wrong would be comfortable with, because I’m afraid of nasty knock-on effects from trying to make it secret. And I’d be a little looser about capability work in service of that goal than in service of any other.
I would think very hard about banning large aggregations of vector compute hardware, and putting various controls on smaller ones, and would almost certainly end up doing it for some size thresholds. I’m not sure what the thresholds would be, nor exactly what the controls would be. This part would be very hard to enforce regardless.
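A minimal sketch of what such a tiered threshold rule could look like; the tiers and numbers are placeholders precisely because, as said above, the right thresholds and controls are unclear:

```python
BAN_THRESHOLD_FLOPS = 1e18       # aggregate FLOP/s above which aggregation is banned (assumed)
CONTROL_THRESHOLD_FLOPS = 1e16   # above this, registration and controls apply (assumed)

def classify_cluster(num_accelerators: int, flops_per_accelerator: float) -> str:
    """Classify a hardware aggregation under the hypothetical tiered regime."""
    aggregate = num_accelerators * flops_per_accelerator
    if aggregate >= BAN_THRESHOLD_FLOPS:
        return "prohibited"
    if aggregate >= CONTROL_THRESHOLD_FLOPS:
        return "registered-and-controlled"
    return "unrestricted"

# Example: 10,000 accelerators at ~300 TFLOP/s each land in the prohibited tier,
# while a small hobbyist rig stays unrestricted.
print(classify_cluster(10_000, 3e14))   # prohibited
print(classify_cluster(8, 3e14))        # unrestricted
```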
I would not do anything that relied on perfect enforcement for its effectiveness, and I would not try to ramp up enforcement to the point where it was absolutely impossible to break my rules, because I would fail and make people miserable. I would titrate enforcement and stick with measures that seemed to be working without causing horrible pain.
I’d hope to get a few years out of that, and maybe a breakthrough on safety if I were tremendously lucky. Given perfect confidence in a real breakthrough, I would try to abdicate in favor of the AGI.
If made king of only part of the world, I would try to convince the other parts to collaborate with me in imposing roughly the same regime. How I reacted if they didn’t would depend on how much leverage I had and what they actually seemed to be doing. I would try really, really hard not to start any wars over it. Regardless of what they said they were doing, I would assume that they were engaging in AGI research under the table. Not quite sure what I’d do with that assumption, though.
But I am not king of the world, and I do not think it’s feasible for me to become king of the world.
I also doubt that the actual worldwide political system, or even the political systems of most large countries, can actually be made to take any very effective measures within any useful amount of time. There are too many people out there with too many different opinions, too many power centers with contrary interests, too much mutual distrust, and too many other people with too much skill at deflecting any kind of policy initiative down paths that sort of look like they serve the original purpose, but mostly don’t. The devil is often in the details.
If it is possible to get the system to do that, I know that I am not capable of doing so. I mean, I’ll vote for it, maybe write some letters, but I know from experience that I have nearly no ability to persuade the sorts of people who’d need to be persuaded.
I am also not capable of solving the technical problem myself and doing some “pivotal act”. In fact, I’m pretty sure I have no technical ideas for things to try that aren’t obvious to most specialists. And I don’t much buy any of the ideas I’ve heard from other people.
My only real hopes are things that neither I nor anybody else can influence, especially not in any predictable direction, like limitations on intelligence and uncertainty about doom.
So my personal solution is to read random stuff, study random things, putter around in my workshop, spend time with my kid, and generally have a good time.
We’re not as far apart as you probably think. I’d agree with most of your decisions. I’d even vote for you to become king! :) Like I wrote, I think we must be cautious with narrow AI as well, and I agree with your points about opaqueness and the potential of narrow AI turning into AGI. Again, the purpose of my post was not to argue how we could make AI safe, but to point out that we could have a great future without AGI. And I still see a lot of beneficial potential in narrow AI, IF we’re cautious enough.
Which? I wonder.
Independent of potential for growing into AGI and {S,X}-risk resulting from that?
With the understanding that these are very rough descriptions that need much more clarity and nuance, that one or two of them might be flat-out wrong, that some of them might turn out to be impossible to codify usefully in practice, that there might be specific exceptions for some of them, and that the list isn’t necessarily complete--
Recommendation systems that optimize for “engagement” (or proxy measures thereof).
Anything that identifies or tracks people, or proxies like vehicles, in spaces open to the public. Also collection of data that would be useful for this.
Anything that mass-classifies private communications, including closed group communications, for any use by anybody not involved in the communication.
Anything specifically designed to produce media showing real people in false situations or to show them saying or doing things they have not actually done.
Anything that adaptively tries to persuade anybody to buy anything or give anybody money, or to hold or not hold any opinion of any person or organization.
Anything that tries to make people anthropomorphize it or develop affection for it.
Anything that tries to classify humans into risk groups based on, well, anything.
Anything that purports to read minds or act as a lie detector, live or on recorded or written material.
Good list. Another one that caught my attention in the EU AI Act was AIs specialised in subliminal messaging. People’s choices can be somewhat conditioned for or against things by feeding them sensory data, even if it’s not consciously perceptible, and it can also affect their emotional states more broadly.
I don’t know how effective this stuff is in real life, but I know that it at least works.
Anything that tries to classify humans into risk groups based on, well, anything.
A particular example of that one is social scoring systems, which are surely going to be used by authoritarian regimes. You can screw people up in so many ways when social control is centralised with AI systems; it’s a great way to punish people for not being chauvinists.
This is already beginning in China.