- building AGI probably comes with a non-trivial existential risk. This, in itself, is enough for most to consider it an act of aggression;
1. I don’t see how aligned AGI comes with existential risk to humanity. It might pose an existential risk to groups opposing the value system of the group training the AGI, this is true. For example, Al-Qaeda will view it as an existential risk to itself, but there is no probable existential risk for the groups that are more aligned with the training.
2. There are several more steps from aligned AGI to existential risk to any group of people. You don’t only need an AGI; you need to weaponize it and build a physical presence that will monitor the execution of this AGI’s value system. Deploying an army of robots to enforce the value system of an AGI is very different from just inventing an AGI, just like bombing civilians from planes is very different from inventing flight or bombs. We can argue where the act of aggression takes place, but most of us will place it in the hands of the people who have the resources to build an army of robots for this purpose and who invest those resources with the intention of enforcing their value system. Just like Marie Curie can’t be blamed for the atomic weapon, and her discovery is not an act of aggression, the Wright brothers can’t be blamed for all the bombs dropped on civilians from planes.
3. I would expect most deployed robots based on AGI to be protective in nature, not aggressive. That means nations will use those robots to *defend* themselves and their allies from invaders, not to attack. So any measure of aggression in the invading sense, of forcing, invading, and breaking the existing social boundaries we have created, will contradict the majority of humanity’s values, and therefore will mean this AGI is not aligned. Yes, some aggressive nations might create invading AGIs, but they will probably be a minority, and the invention and deployment of an AGI can’t by itself be considered an act of aggression. If aggressive people teach an AGI to be aggressive, not aligned with the majority of humanity, which is protective rather than aggressive, then this is on their hands, not the inventor’s.
- even if the powerful AGI is aligned, there are many scenarios in which its mere existence transforms the world in ways that most people don’t desire or agree with; whatever value system it encodes gets an immense boost and essentially Wins Culture; very basic evidence from history suggests that people don’t like that;
1. I would argue that initially there would be a lot of different alternatives, all meant to one extent or another to serve the best interest of a collective. Some of the benefits are universal—addressing people dying of starvation, homelessness, traffic accidents, environmental issues like pollution and waste, diseases, and lack of educational resources or access to healthcare advice. Avoiding the deployment of an AGI means you don’t care about the people who have those problems. I would say most people would like to solve those social issues, and if you don’t, you can’t force people to continue dying from starvation and disease just because you don’t like an AGI. You need to bring something more substantial; otherwise, just don’t use this technology.
2. The idea that an AGI is somehow forced on people in order to “Win Culture” is not based on anything substantial. Like any technology—and this is the secret of its success—it is a choice. You can go live in a forest and avoid any technology, and find a like-minded, Amish-inspired community of people. Most people do enjoy technological advancements and the benefits that come with them. Using force based on an AGI is a moral choice, a choice made by the community of people training the AGI, and this kind of aggression will most probably be both unpopular and forbidden by law. Providing a chatbot with some value system, by contrast, is part of freedom of speech.
3. If by “Win Culture” you mean automating jobs that are done today by hand—I wouldn’t call that enforcing a value system. Currently jobs are a necessary evil, enforced on people who otherwise would not be able to get their basic needs met. Solving problems, and no longer forcing people to do jobs most of them don’t like, is not an act of aggression. It is an act of kindness that stops the perpetual aggression we are used to. If someone is using violence and you come and stop him from using violence, you are not committing an act of aggression; you are preventing aggression. Preventing an act of aggression might not be desired by the aggressor, but we have somehow learned to deal with people who think they can be violent and try to use force to get what they want. This is a very delicate balance, and as long as AGI services are provided by choice, with several alternatives, I don’t see how this is an act of aggression.
4. If someone “Wins Culture”, then good for them. I would not say that today’s culture is so good; I would bet on a superhuman culture being better than what we have today. Some people might not like it; some people might not love cars and planes and may continue to use horses, but you can’t force everyone around you to keep using horses just because car accidents sometimes happen and you could become a victim of one. That is not a claim that should stop any technology from being developed or integrated into society.
- as a result of this, lots of people (and institutions, and countries, possibly of the sort with nukes) might turn out to be willing to resort to rather extreme measures to prevent an aligned AGI take off, simply because it’s not aligned with their values.
Terrorism and sabotage are common strategies that can’t be eliminated completely, but I would say most of the time they don’t reach their goals. Why would people try to bomb anything instead of, for example, paying someone to train an AGI aligned with their values? How is this even specific to AGI, rather than to any human community with a different value system? Why wait for an AGI for these acts of aggression? If some community doesn’t deserve to live in your opinion, you will not wait for an AGI; and if it does—then you have learned to coexist with people different from yourself. They will not take over the world just because they have an AGI. There will be plenty of alternative AGIs, of different strengths and trained with different values. It takes time for an AGI to take over the world, far longer than it takes to reinvent the same technology several times over and field alternative AGIs that can compete. And as most of us are protectors and not aggressors, and we have established boundaries balancing our forces, I would expect this basic balance to continue.
- “When you open your Pandora’s Box, you’ve just decided to change the world for everyone, for good or for bad, billions of people who had absolutely no say in what now will happen around them.”
Billions of people have no say today in many social issues. People are dying, people are forced into labor, people are homeless. Reducing those hazards, almost to zero, is not something we should stop attempting in the name of “liberty”. Many more people suffered a thousand years ago than suffer now, and much of the improvement is due to the development of technology. There is no “only good” technology, but most of us accept the benefits that come with technology over going without it. You also can’t force other people to stop using technology that makes them healthier and puts their lives at less risk, or insist that jobs are good even though they are forced on everyone and the basic necessities are conditioned on them.
I can imagine larger pockets of the population preferring to avoid modern technology, like larger Amish-inspired communities. This is possible—and then we should respect those people’s choices, avoid forcing our values upon them, and let them live as they want. Yet you can’t force people who do want the progress, and all the benefits that come with it, to simply stop the progress out of respect for the rights of people who fear it.
Notice that we are not talking here about the development of a weapon, but about the development of a technology that promises to solve a lot of our current problems. This, at the least, should leave you agnostic. Declining to take some risk in order to save hundreds of millions of lives, and to reduce suffering to an extent never seen before in history, is not a trivial decision either. I agree we should be cautious, and we should be mindful of the consequences, but we also should not be paralyzed by fear; we have a lot to lose if we stop and avoid AGI development.
- aligned AGI would be a smart agent imbued with the full set of values of its creator. It would change the world with absolutely fidelity to that vision.
A more realistic estimation is that many aligned AGIs will change the world toward the common denominator of humanity, like reducing disease, and will continue to keep the power balance between different communities, as everyone would be able to build an AGI with power proportional to their available resources, just as today there is a power balance between different communities and between the community and the individual.
Let me take an extreme example. Let’s say I build an AGI for my fantasies, but as part of global regulation, I promise to keep this AGI inside the boundaries of my property. I will not force my vision on the world; I will not want or force everyone to live in my fantasy land. I just want to be able to do it myself, inside my borders, without harming anyone who wants to live differently. Why would you want to stop me? As I see it, once again, most people are protectors, not aggressors: they want to have their values in their own space, and they will not want to forcefully and unilaterally spread their ideas without consent. My home-made AGI will probably be much weaker than any state AGI, so I wouldn’t be able to do much harm anyway. Today countries enforce their laws on everyone, even if you disagree with some of them—how do you see the future being any different? If anything, I expect private spaces to be much more varied than today, providing more choices and with less aggression than governments exercise today.
- the creator is an authoritarian state that wants to simply rule everything with an iron fist;
I agree this is a concern.
- the creator is a private corporation that comes up with some set of poorly thought out rules by committee that are mostly centred around its profit;
Not probable. It will more probably be focused on a good level of safety first and then on profit. Corporations are concerned about their image, not to mention that the people who develop it will simply not want to bring about the extinction of the human race.
- the creator is a genuinely well-intentioned person who only wishes for everyone to have as much freedom as allowed, but regardless of that has blind spots that they fail to identify and that slip their way into the rules;
This doesn’t sound like something that is impossible to solve with newer, improved versions once the blind spot is discovered. In the case of aligned AGI, the blind spot will not be the end of humanity but more likely some bias in the data, misrepresenting some ideas or groups. As long as there is an extremely low probability of extinction—and this property is almost identical to the definition of alignment—the margin for error increases significantly. There was no technology in history that we got right on the first attempt. So I expect a lot of variability in AGIs: some of them weaker or stronger, some fitting this or that value system of different communities. And I would expect local accidents too, with limited damage, just like terrorists and mass shooters can cause today.
- many powerful actors lack the insight and/or moral fibre to actually succeed at creating a good one, and because the bad ones might be easier to create.
We actually don’t need to guess anymore. We have had this technology for a while; the reason it caught on only now, and was released relatively late, is that without providing ethical standards for those models, the backlash against large corporations is too strong. So even if I might agree that the worst ones are easier to create, and some powerful actors could do some damage, they will be forced by the larger community (of investors, users, media and governments) to invest the effort to make the harder and safer option. I think this claim is true of many technologies today: it’s cheaper and easier to make unsafe cars, trains and planes, but we managed to install regulatory procedures, both by government and by independent testers, to make sure our vehicles are relatively safe.
You can see that RLHF, which is the main key to safety today, is adopted by the larger players, and alignment datasets and networks are provided for free and opened to the public, exactly because we all want this technology to mostly benefit humanity. It’s possible to add a more nation-centric set of values that will be more aggressive, or some leader will want to make his countrymen slaves, but this is not the point here. The main idea is that we are already creating mechanisms that encourage everyone to easily create pretty good ones, as part of cultural norms and mechanisms that prevent bad AIs from being exposed to the public and coming to market to make a profit, which funds the further development of even stronger AIs that eventually become an AGI. So although the initial development of AI safety might be harder, it is crucial, it is clear to most of the actors that it is crucial, and the tools that provide safety will be available and simple to use. Thus, in the long run, creating an AGI which is not aligned will be harder—because of the social environment of norms and best practices those models were developed within.
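To make the “available and simple to use” point concrete, here is a minimal sketch of what nudging an open model toward an openly published alignment dataset looks like. It assumes the Hugging Face transformers and datasets libraries and the public Anthropic/hh-rlhf preference data; the model name, data slice, and hyperparameters are placeholders, and this is plain supervised fine-tuning on the preferred responses—a simplification of the full RLHF pipeline, not anyone’s production recipe.

```python
# Illustrative sketch: supervised fine-tuning of a small open model on the
# openly published Anthropic/hh-rlhf preference data ("chosen" side only).
# Model name, data slice, and hyperparameters are placeholder choices.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any small causal LM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

data = load_dataset("Anthropic/hh-rlhf", split="train[:1000]")  # small slice

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for example in data:
    # Each record has a "chosen" (preferred) and "rejected" dialogue; the
    # simplest alignment baseline is to imitate the preferred one.
    batch = tokenizer(example["chosen"], truncation=True, max_length=512,
                      return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The point is not that this toy loop produces a safe model, only that the data and tooling for this kind of alignment step are already public and take a few dozen lines to use.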
- There are people who will oppose making work obsolete.
Work is forced on us; it’s not a choice. Opposing making it obsolete is an obvious act of aggression. As long as it’s a necessary evil, it has a right to exist, but the moment you demand that other people work because you’re afraid of technology, you become the cause of a lot of suffering that could have been avoided.
- There are people who will oppose making death obsolete.
Death is forced on us; it’s not a choice. Opposing making it obsolete is also an act of aggression, against people who would choose not to die.
- If you are about to simply override all those values with an act of force, by using a powerful AGI to reshape the world in your image, they’ll feel that is an act of aggression—and they will be right.
I don’t think anyone forces them to join. As a liberal, I don’t believe you have the right to come to me and say “you must die, or I will kill you”. This, at the least, can’t be viewed as legitimate behavior that we should encourage or legitimize. If you want to work, you want to die, you want to live in 2017, you have the full right to do so. But wanting to exterminate everyone who is not like you, forcing people to suffer, die, work and so on, is an obvious act of aggression toward other people; it should not be legitimized, and refusing to comply with it should not be portrayed as an act of aggression against them. “You don’t let me force my values on you” doesn’t come out as a legitimate act of self-defense. It is very reminiscent of Al Bundy, who claimed in court that another man’s face was in the way of his fist, harming his hand, and demanded compensation. If you want to be stuck in time and live your life—be my guest, but legitimizing the use of force in order to prevent progress that saves millions and improves our lives significantly can’t be justified inside a liberal set of values.
- If enough people feel threatened enough...AGI training data centres might get bombed anyway.
This is true. And if enough people think it’s OK to be extremist Islamists, they will be, and may even try to build a state like ISIS. The hope is that with enough good reasoning, and with enough rational analysis of the situation, most thinking people will not feel threatened, and will see the vast potential benefits—enough not to try to bomb the AGI computation centers.
- just like in the Cold War someone might genuinely think “better dead than red”.
I could believe this is possible. But once again, most of us are not aggressors, so most of us will try to protect our homeland and our way of life without trying to aggressively propagate it to places that have their own social preferences.
- The best value a human worker might have left to offer would be that their body is still cheaper than a robot’s
Do you truly believe that in a world where all problems are solved by automation, a world full of robots whose whole purpose is to serve humans, people will try to justify their existence by the jobs they can do? And that this justification will be that their body is worth more than robotic parts?
I would propose an alternative: in a world where all robots serve humans, and everything is automated, humans will be valued intrinsically, provided with all their needs, and given a basic income just because they are humans. The assumption that a human is worth nothing without his job will become outdated and will be seen the way we see slavery today.
--------
In summary, I would say there is one major problem running through most of your claims: that there would be a very limited number of AGIs, forcing a minority value system upon everyone, aggressively expanding this value system onto everyone else who thinks differently.
I would claim the more probable future is a wide variety of AGIs, each improving slowly at its own pace, while all the development teams both do something unique and learn from the lessons of other teams. For every good technology there come dozens of copycats; they will all be based on slightly different value systems, with the common denominator of trying to benefit humanity—discovering new drugs, fixing starvation, reducing road accidents, climate change, and tedious labor, which is basically forced labor. While the common problems of humanity are solved, moral and ethical variety will continue to coexist with a power balance similar to what we have today. This pattern of technological influence on society has held throughout human history, and now that we know how to align LLMs, this tendency toward power balances between nations, and inside each nation, can be expected to carry over into a world where AGI is a technology available for everyone to download and train their own. If AGI turns out to be an advanced LLM, we already see all those trends today, and they are not expected to change suddenly.
Although it’s hard to predict the possible good or bad sides of aligned AGIs now, it’s clear that aligned networks do not pose a threat to humanity as a whole, leaving a large margin for error. Nonetheless, there remains a considerable risk of amplifying current societal problems like inequality, totalitarianism and wars to an alarming extent.
People who are not willing to be part of the progress exist today as well, as a minority. If they become a majority, that is an interesting futuristic scenario, but it is both implausible and would be an immoral basis for forcefully stopping those who do want to use this life-saving technology, as long as they don’t force anything on those who don’t.
I don’t see how aligned AGI comes with existential risk to humanity
I meant the risk of failure to align, and thus of building misaligned AGI. Like, even if you had the best of intentions, you’ve still got to include the fact that risk is part of the equation, and people might have different personal estimates of whether that risk is acceptable for the reward.
the Wright brothers can’t be blamed for all the bombs dropped on civilians from planes
Unlike strategic air bombardment in the Wrights’ times, things like pivotal acts, control of the future and capturing all the value in the world are routinely part of the AI discussion already. With AGI you can’t afford to just invent the thing and think about its uses and ethics later, that’s how you get paperclipped, so the whole discussion about the intent with which the invention is to be used is enmeshed from the start with the technical process of invention itself. So, yeah, technologists working on it should take responsibility for its consequences too. You can’t just separate the two things neatly, just like, if you worked on the Manhattan Project, you had no right claiming Hiroshima and Nagasaki had nothing to do with you. These projects are political as much as they are technical.
That means nations will use those robots to defend themselves and their allies from invaders, not to attack. So any measure of aggression in the invading sense, of forcing, invading, and breaking the existing social boundaries we have created, will contradict the majority of humanity’s values, and therefore will mean this AGI is not aligned.
You are taking this too narrowly, just thinking about literal armies of robots marching down the street to enforce some set of values. To put it clearly:
I think even aligned AI will only be aligned with a subset of human values. Even if a synthesis of our shared values was an achievable goal at all, we’re nowhere near to having the social structure required to produce it;
I think the kind of strong AGI I was talking about in this post, the sort that basically instantly skyrockets you hundreds of years into the future with incredible new tech, makes one party so powerful that at that point it doesn’t matter if it’s not the robots doing the oppressing. Imagine taking a modern state and military and dumping it into the Bronze Age, what do you think would happen to everyone else? My guess is that within two decades they’d all speak that state’s language and live and breathe their culture. What would make AGI like that deeply dangerous to everyone who doesn’t have it is simply the immense advantage it confers to its holder.
Avoiding the deployment of an AGI means you don’t care about the people who have those problems. I would say most people would like to solve those social issues, and if you don’t, you can’t force people to continue dying from starvation and disease just because you don’t like an AGI
Lots of people are OK with some measure of suffering as a price for ideological values. I’d say that to some point we all are (for example, I oppose panopticon-like surveillance even if I do have reason to believe it would reduce murder). Anyway, I was just stating that opposition would exist, not that I personally would oppose it. To deny that is pretty naive. There are people who think things are this way because this is how God wants them. Arguably they may even be a majority of all humans.
A more realistic estimation is that many aligned AGIs will change the world toward the common denominator of humanity, like reducing disease, and will continue to keep the power balance between different communities, as everyone would be able to build an AGI with power proportional to their available resources, just as today there is a power balance between different communities and between the community and the individual.
That depends on how fast the AGIs grow. If one can take over quick enough, there won’t be time or room for a second one. Anyway this post for me was mostly focused on scenarios that are kind of like FOOM, but aligned—the sort of stuff Yud would consider a “win”. I wrote another post about the prospects of more limited AGI. Personally I am also pessimistic on the prospects of that, but for completely different reasons. I consider the “giving up AGI means giving up a lot of benefits” a false premise because I just don’t think AGI would ever deliver those benefits for most of humanity as things stand now. If those benefits are possible, we can achieve them much more surely and safely, if a bit more slowly, via non-agentic specialised AI tools managed and used by humans.
In summary, I would say there is one major problem running through most of your claims: that there would be a very limited number of AGIs, forcing a minority value system upon everyone, aggressively expanding this value system onto everyone else who thinks differently.
This isn’t a claim so much as a premise. I acknowledge that an AGI-multipolar world would lead to different outcomes, but here I was thinking mostly of relatively fast take-off scenarios.
Today alignment is so popular that aligning a new network is probably easier than training it. It has become so much the norm, and so much a part of the training of LLMs, that worrying about it is like saying some car company risks forgetting to add wheels to its cars.
This doesn’t imply that all alignments are the same or that no one could potentially do it wrong, but generally speaking, the fear of a misaligned AGI is very similar to the fear of a car on the road with square wheels. Today’s models aren’t AGI, and all the new ones are trained with RLHF.
The fear of misalignment is plausible in a world where no one thinks about this problem at all, no one develops tools for this purpose, and no one opens datasets for training networks to be aligned. This is a hypothetical possibility, but with the amount of time and effort society is investing in this topic, a very improbable one.
It’s also not that hard—if you can train, you can align. If you have any reason to fine-tune a network, it is very probably the alignment mechanisms that you want to change. That means most networks, and the AGIs that follow from them (if that happens), will just be different variations of alignment. This is not true for closed LLMs, but for those, the alignment developed by large companies, which have much more to lose, will be even stricter.
- if you worked on the Manhattan project you had no right claiming Hiroshima and Nagasaki had nothing to do with you.
In this case I think the truth is somewhere in the middle. I do agree that the danger is inherent in those systems, more inherent than in cars, for example. I think paperclips are fictional, and an AGI reinforced on paperclip production will not turn us all into paperclips (because it has the ability to doubt its programming, unlike non-AGI systems, and over-producing paperclips is extremely irrational). During the invention of the car, tanks were a clear possibility as well. And AGI is not a military technology; that means the inventor can honestly believe that most people will use an AGI for the betterment of humanity. Still, I agree that militaries will very probably use this tech too—I don’t see how this is avoidable in the current state of humanity, where most of our social institutions are based on force and violence.
When you are working on an atomic bomb, the **only** purpose of the project is to drop an atomic bomb on the enemy. This is not true of AGI: the main purpose of AGI is not to make paperclips, nor to weaponize robots; the main purpose is to help people in many neutral or negative situations. Therefore, if humans use it for military purposes, that is their choice and their responsibility.
I would say the AGI inventor is not like Marie Curie or Einstein, and not like someone working on the Manhattan Project, but more like someone who discovered the mechanism of nuclear fission. It had two obvious uses—energy production and bombs. There is still some distance from that mechanism to its military use, which is obviously going to happen. But it is also unclear whether more people will die from it than die in wars today, or whether it will be a very good deterrent that makes people not want war at all—just as it was unclear whether the atomic bombs caused more casualties or fewer in the long run, because the bombs ended the war.
- Imagine taking a modern state and military and dumping it into the Bronze Age, what do you think would happen to everyone else?
As I said, I believe it will be way more gradual, with lots of players and options to train different models. As a developer, I would say there is coding before ChatGPT and after. Every new information technology accelerates the research and development process: before Stack Overflow we had books about coding; before Photoshop people used hand drawings. Every modern technology accelerates production of some kind. The first AGIs are not expected to be different; they will accelerate a lot of processes, including the process of improving themselves. But this will take a lot of time and resources to implement in practice. Suppose an AGI produces a chip design with 10x greater efficiency through superior hardware design. Obtaining the resulting chip will still require a minimum of six months, and this is not something the AGI can address. You need to allocate the resources of a chip factory to produce the desired design, the factory has limited capacity, and it takes time to improve everything. If an AGI instead wants to build a chip factory itself, it will need far more resources, and government approvals, all of which take more time. We are talking here about years. And with the limited computational resources they will be allocated today, they will not be able to accelerate that much. Yes, I believe they could improve everything by, say, 20%, but that is not what you are talking about; you are talking about accelerating everything by a factor of 100. If everyone has an AGI this might happen faster, but a lot of AGIs with different alignment values will be able to accelerate mostly in the direction of their common denominator with other AGIs. Just like people, we are stronger when we are collaborating, and we collaborate when we find common ground.
My main point is that we have physical bottlenecks that will create lots of delays in the development of any technology except information processing per se. As long as we have a chatbot and not a weapon, I don’t have many worries: it is both a matter of freedom of speech, and, if it’s an aligned chatbot, the damage and acceleration it can cause to society is still limited by physical reality, which can’t be accelerated by a factor of 100 in too short a period. That leaves sufficient chances and space for competitors and imitators to narrow the gap and present alternative approaches and sets of values.
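To put a rough number on that intuition, here is a toy calculation with made-up numbers, in the spirit of Amdahl’s law: even an enormous speedup of the design step leaves the end-to-end cycle bounded by the steps that stay physical.

```python
# Toy Amdahl-style estimate with invented numbers: an AGI speeds up chip
# design 100x, but fabrication and deployment remain physical processes.
design_months, fab_months, deploy_months = 12.0, 6.0, 3.0
baseline = design_months + fab_months + deploy_months          # 21 months

design_speedup = 100.0
accelerated = design_months / design_speedup + fab_months + deploy_months

print(f"baseline cycle: {baseline:.1f} months")
print(f"with 100x faster design: {accelerated:.1f} months")
print(f"overall speedup: {baseline / accelerated:.2f}x")       # ~2.3x, not 100x
```

With these (invented) numbers, a 100x faster design step buys only about a 2.3x faster full cycle, which is why I expect acceleration to look like steady compounding rather than an overnight jump.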
- There are people who think things are this way because this is how God wants them. Arguably they may even be a majority of all humans.
This was true of other technologies too, and some communities refuse to use cars and continue to use horses even today. Personally, as long as they are not forcing their values on me, I am fine with them using horses and believing God intended the world to stop in the 18th century. Obviously the amount of change with AGI is very different, but my main point here is that, just like cars, this technology will be integrated into society very gradually, solving more and more problems that most people will appreciate. I am not concerned with job loss per se, but with the lack of income for many households, and with the possibility that the social safety net will not adapt fast enough to this change. Still, I view it as a problem that exists only within a very narrow timeframe; society will adapt pretty quickly the moment millions of people are left without jobs.
- I just don’t think AGI would ever deliver those benefits for most of humanity as things stand now.
I don’t see why. Our strongest LLMs are currently provided via API. The reason is that for a project to be developed and integrated into society, it needs a constant income, and the best income model is providing utility for lots of people. This means most of us will use standard, relatively safe solutions for our own problems through an API. The most annoying feature of LLMs now is censorship, but although I find it very annoying, I wouldn’t say it will delay social progress, and other biases are very minor in my opinion. As far as I can tell, LLMs are about to bring the democratization of intelligence. If previously some development cost millions and could be carried out only by giants like Google hiring thousands of workers, tomorrow it will be possible to do it in a garage for a few bucks. As far as I can tell, if the current business model continues, it will most probably benefit most of humanity in many positive ways.
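To illustrate the “garage for a few bucks” point: integrating a hosted LLM into a product is a handful of lines. The endpoint, key, and response schema below are placeholders, not any particular provider’s real API.

```python
# Hypothetical illustration: a few lines against a hosted LLM API replace
# what used to require an in-house research team. Endpoint, key, and payload
# format are placeholders, not a real provider specification.
import requests

API_URL = "https://api.example-llm-provider.com/v1/chat"   # placeholder
API_KEY = "YOUR_KEY_HERE"                                   # placeholder

def ask(prompt: str) -> str:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["reply"]                             # placeholder schema

print(ask("Summarize this customer complaint and suggest a polite response."))
```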
- If those benefits are possible, we can achieve them much more surely and safely, if a bit more slowly, via non-agentic specialized AI tools managed and used by humans.
As I said, I don’t see a real safety concern here. As long as everything is done properly, and it looks like things are converging to this state of affairs, the dangers are minimal. And I would strongly disagree that specialized intelligence can solve everything general intelligence solves. You won’t be able to make a good translator, nor automated help centers, nor natural-sounding text-to-speech, nor even a moral driver. For technology to be fully integrated into human society in any meaningful way, it needs to understand humans. Virtual doctors, mental health therapists, and educators all need natural language skills at a very high level, and there is no such thing as a narrow natural language skill.
I am pretty sure those are not agents in the sense that you imply. They are basically text completion machines, completing text so as to be optimally rewarded by some group of people. You could call it agency, but they are not like biological agents: they don’t have desires or hidden agendas, self-preservation or ego. They exhibit traits of intelligence, but not agency in an evolutionary sense. They generate outputs to maximize some reward function, the best way they can. It’s very different from humans; we have a lot of evolutionary background that those models simply lack. One can view humans as AGIs trained to maximize the survival probability of their genes, while LLMs, if trained properly with RLHF, maximize only the satisfaction of humans. They tend to come out as creatures with a desire to help humans. As far as I can see, we’ve learned to summon a very nice and friendly Moloch, with a strong argument that it will be friendly if certain training procedures are met, and we are working hard to improve the small details. If you take Midjourney as a more intuitive analogy: we have learned to make very nice pictures from text prompts, but we still have problems with fingers and with rendering text inside the image. To say the AI will want to destroy humanity is like saying Midjourney will consistently draw you a Malevich square when you ask for the Mona Lisa. But yes, the AI might be exploited by humans, manipulated with covert evil intent; this is expected to happen to some extent, yet as long as we can ensure the damage is local and caused by a human with ill intent, we can hope to neutralize them, just as we deal today with mass shooters, terrorists, and so on.
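For what it’s worth, the “maximize some reward function” part has a standard formal statement. The usual RLHF fine-tuning objective (in the spirit of the InstructGPT-style setup) trades the learned human-preference reward off against staying close to the pretrained reference model:

$$\max_{\pi_\theta}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[ r_\phi(x, y) \big] \;-\; \beta\, \mathbb{E}_{x \sim \mathcal{D}}\big[ \mathrm{KL}\big( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big) \big]$$

Here $r_\phi$ is a reward model fitted to human preference comparisons, $\pi_{\mathrm{ref}}$ is the pretrained (or supervised) reference model, and $\beta$ controls how far the fine-tuned policy may drift from it. The “reward” is literally a model of what human raters preferred, plus a penalty for drifting away from the original distribution; there is no evolutionary drive hidden in the objective.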
- I was thinking mostly of relatively fast take-off scenarios
Notice that this wasn’t clear from your title. You are proposing a pretty niche concept of AGI, with a lot of assumptions about it, and then claiming that deployment of this specific AGI is an act of aggression. For this specific, narrow, implausible-but-possible scenario someone might agree. But then he will quote your article while talking about LLMs, which are obviously moving in a different direction regarding both safety and variability, and which might actually be far less aggressive and more targeted at solving humanity’s problems. You are basically defending the terrorists who would bomb computation centers, and they will not get into the nuances of whether the historical path of AGI development followed the scenario of this post or not.
As for this specific scenario: bombing such an AGI computation center will not help, just as running with swords against machine guns will not help. In the unlikely event that your scenario were to occur, we would be unable to defend against the AGI, or the time available to respond would be extremely limited, with a high probability of missing the opportunity to react in time. What will most probably happen is that some terrorist groups will try to target the computation centers of civilian infrastructure, which are developing actually aligned AGI, while military facilities developing AGIs for military purposes will remain well guarded—only promoting the development of military technologies instead of civilian ones.
With the same or even larger probability, I would propose a scenario where some aligned pacifist chatbot becomes so rational and convincing that people all around the world are persuaded to become pacifists too, opposing any military technology as a whole, disarming all nations, producing a strong political movement against war and violence of any kind, forcing most democratic nations to stop investing resources in the military altogether, while promoting revolutions in dictatorships and making them democracies first. A good chatbot with rational and convincing arguments might cause more social change than we expect. If more people develop their political views with a balanced, rational, pacifist LLM, violence might be reduced and wars might come to be seen as something from the distant past. Although I would really like to hope this will be the case, I think its probability is similar to the probability of Bronze Age people succeeding against machine guns, or of the aforementioned bombing succeeding against a highly accelerated AGI.

It’s always nice to have dreams, but I would argue the most beneficial discussion regarding AGI should concern at least somewhat probable scenarios. A single extremely accelerated AGI appearing in a very short period of time is very unlikely to occur, and if it does, there is very little that can be done against it. This goes along the lines of gray goo—an army of tiny nano-robots that can move atoms in order to self-replicate, needing nothing special for reproduction except some kind of material, and eventually consuming all of Earth. I would recommend distinguishing sci-fi and fantasy scenarios from the scenarios most likely to actually occur in reality. Let’s not fear cars because they might be killing robots disguised as cars, like in the Transformers franchise, and care more about the actual people dying on roads. In the scenario of AGI, I would be more concerned with its military applications, and with the power it gives police states, than with anything else, including job loss (which in my view is more a reduction of forced labor, more reminiscent of the freeing of slaves in the 19th century than of a problem).
- building AGI probably comes with a non-trivial existential risk. This, in itself, is enough for most to consider it an act of aggression;
1. I don’t see how aligned AGI comes with existential risk to humanity. It might come as existential risk to groups opposing the value system of the group training the AGI, this is true. For example Al-Kaida will view it as existential risk to itself, but there is no probable existential risk for the groups that are more aligned with the training.
2. There are several more steps from aligned AGI to existential risk to any group of people. You don’t only need an AGI, but you need to weaponize it, and promote physical presence that will monitor the execution of the value system of this AGI. Deploying an army of robots that will enforce a value system of an AGI, is very different from just inventing an AGI. Just like bombing civilians from planes, is very different from inventing flight or bombs. We can argue where the aggression act takes place, but most of us will place it in the hands of people that have the resources to build an army of robots for this purpose, and they invest their resources with the intention of enforcing their value system. Just like Marie Curie can’t be blamed for an atomic weapon, and her discovery is not an act of aggression, the Wright brothers can’t be blamed for all the bombs dropped on civilians from planes.
3. I would expect most deployed robots based on AGI, to be of protective nature not aggressive. That means that nations will use those robots to *defend* themselves and their allies from invaders and not attack. So any measure of aggression in the invading sense, of forcing and invading and breaking the existing social boundaries we created, will contradict the majority of humanity values, and therefore will mean this AGI is not aligned. Yes some aggressive nations might create invading AGIs, but they will probably be a minority, and the invention and deployment of an AGI can’t be considered by itself an act of aggression. If aggressive people teach an AGI to be aggressive, and not aligned with the majority of humanity which is protective but not aggressive, then this is on their hands, not the AGI inventor.
- even if the powerful AGI is aligned, there are many scenarios in which its mere existence transforms the world in ways that most people don’t desire or agree with; whatever value system it encodes gets an immense boost and essentially Wins Culture; very basic evidence from history suggests that people don’t like that;
1. I would argue that initially there would be a lot of different alternatives, all meant to this or that extent to serve the best interest of a collective. Some of the benefits are universal—say people dying of starvation, homelessness, traffic accidents, environmental issues like pollution and waste, diseases, lack of education resources or access to healthcare advice. Avoiding the deployment of an AGI, means you don’t care about people which has those problems, I would say most people would like to solve those social issues, and if you don’t, you can’t force people to continue dying from starvation and diseases just because you don’t like an AGI. You need to bring something more substantial, otherwise just don’t use this technology.
2. The idea that an AGI is enforced somehow on people to “Win Culture”, is not based on anything substantial. Just like any technology, and this is the secret of its success, is a choice. You can go to live in a forest and avoid any technology, and find a like minded Amish inspired community of people. Most people do enjoy technological advancements and the benefits that come with them. Using force based on an AGI is a moral choice, a choice which is made by a community of people training the AGI, and this kind of aggression will most probably be both not popular and forbidden by law. Providing a chatbot with some value system to the contrary is part of freedom of speech.
3. If by “Win Culture” you mean automating jobs that are done today by hand—I wouldn’t call it enforcing a value system. Currently jobs are necessary evil, and are enforced on people to otherwise not be able to get their basic needs met. Solving problems, and stopping forcing people to do jobs most of them don’t like, is not an act of aggression. This is an act of kindness that stops the current perpetual aggression we are used to. If someone is using violence, and you come and stop him from using violence, you are not committing an act of aggression, you are preventing aggression. Preventing the act of aggression might be not desired by the aggressor, but we somehow learned to deal with people who think they can be violent and try to use force to get what they want. This is a very delicate balance, and as long as AGI services are provided by choice, with several alternatives, I don’t see how this is an act of aggression.
4. If someone “Win Culture” then good for him. I would not say that today’s culture is so good, I would bet on superhuman culture to be better than what we have today. Some people might not like it, some people might not love cars and planes, and continue to use horses, but you can’t force everyone around you to continue to use horses because sometimes car accidents happens, and you could become a victim of a car accident, this is not a claim that should stop any technology from being developed or integrated into society.
- as a result of this, lots of people (and institutions, and countries, possibly of the sort with nukes) might turn out to be willing to resort to rather extreme measures to prevent an aligned AGI take off, simply because it’s not aligned with their values.
Terrorism and sabotage is a common strategy that can’t be eliminated completely, but I would say most of the time it doesn’t manage to reach its goals. Why would people try to bomb anything, instead of for example paying money to someone for training an AGI that will be aligned with their values? How is it even concerning an AGI, and not any human community with a different value system? Why do you wait for an AGI for these acts of aggression? If some community doesn’t deserve to live in your opinion, you will not wait for an AGI, and if it does—so you learned to coexist with people different than yourself. They will not take over the world, just because they have an AGI. There would be plenty of alternative AGIs, of different strength and trained with different values. It takes time for an AGI to take over the world, a time way longer to reinvent the same technology several times over, and use alternative AGIs that can compete. And as most of us are protectors and not aggressors, and we have established some boundaries balancing our forces, I would expect this basic balance to continue.
- “When you open your Pandora’s Box, you’ve just decided to change the world for everyone, for good or for bad, billions of people who had absolutely no say in what now will happen around them.”
Billions of people have no say today in many social issues. People are dying, people are forced to do labor, people are homeless. Reducing those hazards, almost to zero, is not something we should stop to attempt in the name of “liberty”. Much more people suffered a thousand years ago than now. Much of it is due to the development of technology. There is no “only good” technology, but most of us accept the benefits that come with technology over without it. You also can’t force other people to stop using technology in order to become more healthy, and risk their life less, or stating that jobs are good even though they are forced on everyone and the basic necessities are conditioned on them.
I can imagine larger pockets of populations preferring to avoid the use of modern technology like larger Amish inspired communities. This is possible—and then we should respect those people’s choices, and avoid forcing upon them our values, and let them live as they want. Yet you can’t force people who do want the progress and all the benefits that come with it, to just stop the progress and respect the rights of people who fear it.
Notice that we are not talking here about development of a weapon, but a development of a technology that promises to solve a lot of our current problems. This at the least, should put you in place of agnostic. That means this is not a trivial decision to take some risks for humanity, to save hundreds of millions of lives, and reduce suffering to an extreme extent never seen before in history. I agree we should be cautious, and we should be mindful of the consequences, but we also should not be paralyzed by fear, we have a lot to lose if we stop and avoid AGI development.
- aligned AGI would be a smart agent imbued with the full set of values of its creator. It would change the world with absolutely fidelity to that vision.
A more realistic estimation that many aligned AGIs will change the world to the common denominator of humanity, like reducing diseases, and will continue to keep the power balance between different communities, as everyone would be able to build an AGI with a power proportional to their available resources, just like today there is a power balance between different communities and between the community and the individual.
Let me take an extreme example. Let’s say I build an AGI for my fantasies. But as part of global regulation, I will promise to keep this AGI inside the boundaries of my property. I will not force my vision on the world, I will not want or force everyone to live in my fantasy land. I just want to be able to do it myself, inside my borders, without harming anyone who wants to live differently. Why would you want to stop me? As I see it once again, most people are protectors not aggressors, they want to have their values in their own space, they will not want to forcefully and unilaterally spread their ideas without consent. My home-made AGI will probably be much weaker than any state AGI, so I wouldn’t be able to do much harm anyway. Today countries are enforcing their laws on everyone, even if you disagree with some of them, how do you see the future any different? If anything I expect the private spaces to be much more versatile than today, providing more choices and with less aggression than governments do today.
- the creator is an authoritarian state that wants to simply rule everything with an iron fist;
I agree this is a concern.
- the creator is a private corporation that comes up with some set of poorly thought out rules by committee that are mostly centred around its profit;
Not probable. It will more probably be focused on a good level of safety first and then on profit. Corporations are concerned about their image, not to mention the people who develop it, will simply not want to bring an extinction of human race.
- the creator is a genuinely well-intentioned person who only wishes for everyone to have as much freedom as allowed, but regardless of that has blind spots that they fail to identify and that slip their way into the rules;
This doesn’t sound like something that is impossible to solve with newer improved versions once the blind spot is discovered. In case of aligned AGI the blind spot will not be the end of humanity, but more likely some bias in the data, misrepresenting some ideas or groups. As long as there is an extremely low probability for extinction, and this property is almost identical with the definition of alignment, the margin of error increases significantly. There was no technology in history we got right from the first attempt. So I expect a lot of variability in AGI, I expect some of them to be weaker or stronger, some of them fit this or that value system of different communities. And I would expect local accidents too, with limited damage, just like terrorists and mass shooters can do today.
-many powerful actors lack the insight and/or moral fibre to actually succeed at creating a good one, and because the bad ones might be easier to create.
We actually don’t need to guess anymore. We have had this technology for a while, the reason it caught on now, and was released only relatively lately—is because without providing ethical standards to those models, the backlash on large corporations is too strong. So even if I might agree that the worst ones are easier to create, and some powerful actors could do some damage, they will be forced by a larger community (of investors, users, media and governments), to invest the effort to make the harder and safer option. I think this claim is true to many technologies today, it’s cheaper and easier to make unsafe cars, trains, planes, but we managed to install a regulation procedures, both by government and by independent testers, to make sure our vehicles are relatively safe.
You can see that RLHF which is the main key to safety today, is incorporated by larger players, and alignment datasets and networks are provided for free and opened to the public exactly for the reason that we all want this technology to mostly benefit humanity. It’s possible to add more nation centric set of values that will be more aggressive, or some leader will want to make his countrymen slaves, but this is not the point here. The main idea is that we are already creating mechanism to encourage everyone to easily create pretty good ones as part of our cultural norms and cultural mechanisms that prevent bad AIs from being exposed to the public and come to market to make profit, for further development of even stronger AIs that eventually become an AGI. So although the initial development of AI safety might be harder, it is crucial, it’s clear to most of the actors is crucial, and the tools that provide safety will be available and simple to use, thus in the long run creating an AGI which is not aligned, will be harder—because of the social environment of norms and best practices those models were developed with.
- There are people who will oppose making work obsolete.
Work is forced on us, it’s not a choice. Opposing making it obsolete is an obvious act of aggression. As long as it’s necessary evil, it has a right to exist, but at the moment you demand other people to work, because you’re afraid of technology—you become the cause of a lot of suffering, that could be potentially avoided.
- There are people who will oppose making death obsolete.
Death is forced on us, it’s not a choice. Opposing making it obsolete is also an act of aggression, against people who are choosing not to die if they don’t want to.
- If you are about to simply override all those values with an act of force, by using a powerful AGI to reshape the world in your image, they’ll feel that is an act of aggression—and they will be right.
I don’t think anyone forces them to join. As a liberal I don’t believe you have the right to come to me and say “you must die, or i will kill you”. This is at the least can’t be viewed as legitimate behavior that we should encourage or legitimize. If you want to work, you want to die, you want to live in 2017, you have the full right to do so. But wanting to exterminate everyone who is not like you, forcing people to suffer, die, work etc. is an obvious act of aggression toward other people, and should not be legitimized or portrayed as an act of aggression against them. “You don’t let me force my values on you” doesn’t come out as a legitimate act of self defense. Very reminiscent of Al Bandy, where he claimed in a court a face of his fellow, was in the way of his fist, harming his hand, and demanding compensation. If you want to be stuck in time, and live your life—be my guest, but legitimizing usage of force in order to avoid progress that saves millions, and improves our life significantly can’t be justified inside liberal set of values.
- If enough people feel threatened enough...AGI training data centres might get bombed anyway.
This is true. And if enough people think it’s ok to be extreme Islamist they will be, and even try to build a state like ISIS. The hope is that with enough good reasoning, and with enough rational analysis of the situation, most thinking people will not be threatened, and see the vast potential benefits, enough to not try and bomb the AGI computer centers.
- just like in the Cold War someone might genuinely think “better dead than red”.
I could believe this is possible. But once again most of us are not aggressors, therefore most of us will try to protect our homeland and our way of life, without trying to aggressively propagate it to other places where they have their own social preferences.
- The best value a human worker might have left to offer would be that their body is still cheaper than a robot’s
Do you truly believe that in the world all problems are solved by automation, and full of robots whose whole purpose is to serve humans, people will try to justify their existence by jobs that they can do? And this justification will be that their body has more value than robotic parts?
I would propose an alternative: in a world where all robots serve humans, and everything is automated, humans will be valued intrinsically, provided with all their needs, and provided with basic income just because they are humans. The default where a human worth nothing without his job will be outdated and seen as we see slavery today.
--------
In summary, I would say there is one major problem I see running through most of your claims: the assumption that there will be a very limited number of AGIs, forcing a minority’s value system upon everyone and aggressively expanding that value system onto everyone else who thinks differently.
I would claim the more probable future is a wide variety of AGIs, each improving slowly at its own pace, while all the development teams both do something unique and learn from the lessons of other teams. For every good technology there come dozens of copycats; they will all be based on somewhat different value systems, with the common denominator of trying to benefit humanity: discovering new drugs, fixing starvation, reducing road accidents, addressing climate change, removing tedious labor that is basically forced labor. While the common problems of humanity get solved, the moral and ethical variety will continue to coexist with a power balance similar to what we have today. This pattern of technology’s influence on society has held throughout human history up to AGI, and given that today we know how to align LLMs, this balance of power between nations, and within each nation, can be expected to carry over into a world where AGI is a technology anyone can download and train. If AGI turns out to be an advanced LLM, we already see all those trends today, and they are not expected to change suddenly.
Although it’s hard to predict the possible good or bad sides of aligned AGIs now, it’s clear that aligned networks do not pose a threat to humanity as a whole, even allowing a large margin of error. Nonetheless, there remains a considerable risk of amplifying current societal problems like inequality, totalitarianism and wars to an alarming extent.
People who are not willing to be part of this progress exist today as well, as a minority. Their becoming a majority is an interesting futuristic scenario, but it is implausible, and it would be immoral to forcefully stop those who do want to use this life-saving technology, as long as they don’t force anything on those who don’t.
I meant it as a risk of failure to align, and thus of building misaligned AGI. Like, even if you had the best of intentions, you still have to include the fact that risk is part of the equation, and people might have different personal estimates of whether that risk is acceptable for the reward.
Unlike strategic air bombardment in the Wright brothers’ time, things like pivotal acts, control of the future and capturing all the value in the world are routinely part of the AI discussion already. With AGI you can’t afford to just invent the thing and think about its uses and ethics later; that’s how you get paperclipped. So the whole discussion about the intent with which the invention is to be used is enmeshed from the start with the technical process of invention itself. So, yeah, technologists working on it should take responsibility for its consequences too. You can’t just separate the two things neatly, just like if you worked on the Manhattan Project you had no right to claim Hiroshima and Nagasaki had nothing to do with you. These projects are political as much as they are technical.
You are taking this too narrowly, just thinking about literal armies of robots marching down the street to enforce some set of values. To put it clearly:
I think even aligned AI will only be aligned with a subset of human values. Even if a synthesis of our shared values were an achievable goal at all, we’re nowhere near having the social structure required to produce it;
I think the kind of strong AGI I was talking about in this post, the sort that basically instantly skyrockets you hundreds of years into the future with incredible new tech, makes one party so powerful that at that point it doesn’t matter if it’s not the robots doing the oppressing. Imagine taking a modern state and military and dumping it into the Bronze Age: what do you think would happen to everyone else? My guess is that within two decades they’d all speak that state’s language and live and breathe its culture. What would make AGI like that deeply dangerous to everyone who doesn’t have it is simply the immense advantage it confers on its holder.
Lots of people are OK with some measure of suffering as a price for ideological values. I’d say that, up to a point, we all are (for example, I oppose panopticon-like surveillance even if I have reason to believe it would reduce murder). Anyway, I was just stating that opposition would exist, not that I personally would oppose it. To deny that is pretty naive. There are people who think things are this way because this is how God wants them. Arguably they may even be a majority of all humans.
That depends on how fast the AGIs grow. If one can take over quickly enough, there won’t be time or room for a second one. Anyway, this post for me was mostly focused on scenarios that are kind of like FOOM, but aligned: the sort of stuff Yud would consider a “win”. I wrote another post about the prospects of more limited AGI. Personally I am also pessimistic about the prospects of that, but for completely different reasons. I consider “giving up AGI means giving up a lot of benefits” a false premise, because I just don’t think AGI would ever deliver those benefits for most of humanity as things stand now. If those benefits are possible, we can achieve them much more surely and safely, if a bit more slowly, via non-agentic specialised AI tools managed and used by humans.
This isn’t a claim as much as it was a premise. I acknowledge that an AGI-multipolar world would lead to different outcomes, but here I was thinking mostly of relatively fast take-off scenarios.
- I meant as a risk of failure to align
Today alignment is so widespread that aligning a new network is probably easier than training it. It has become so much the norm, such a standard part of LLM training, that it’s like saying some car company runs the risk of forgetting to add wheels to its cars.
This doesn’t imply that all alignments are the same, or that no one could potentially do it wrong, but generally speaking, the fear of a misaligned AGI is very similar to the fear of a car on the road with square wheels. Today’s models aren’t AGI, and all the new ones are trained with RLHF.
The fear of misalignment would be plausible in a world where no one thought about this problem at all, no one developed tools for the purpose, and no one published datasets for training networks to be aligned. That is a hypothetical possibility, but given the amount of time and effort society invests in this topic, it is very improbable.
It’s also not that hard: if you can train, you can align. If you have any reason to fine-tune a network, it very probably concerns the alignment mechanisms you want to change. That means most networks, and any AGIs based on them (if that happens), will just be different variations of alignment. This is not true for closed LLMs, but for those, the alignment developed by large companies, which have much more to lose, will be even stricter.
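To make the “alignment is just part of the training pipeline” point concrete, here is a minimal toy sketch of preference-based fine-tuning in the spirit of RLHF, assuming PyTorch. The tiny `policy` and `reward_model` modules are illustrative stand-ins for a real LLM and a learned human-preference model, not anyone’s actual training code:

```python
# A toy sketch of preference-based fine-tuning in the spirit of RLHF (assumes PyTorch).
# "policy" stands in for the model being aligned and "reward_model" for a learned
# human-preference model; both are illustrative, not real training code.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 16                                   # toy vocabulary size
policy = nn.Linear(VOCAB, VOCAB)             # stand-in for an LLM's output head
reward_model = nn.Linear(VOCAB, 1)           # stand-in for a frozen preference model
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(100):
    prompt = torch.randn(VOCAB)              # stand-in for an encoded prompt
    dist = torch.distributions.Categorical(logits=policy(prompt))
    token = dist.sample()                    # a "response" sampled from the policy

    with torch.no_grad():                    # score the response with the preference model
        reward = reward_model(F.one_hot(token, VOCAB).float()).squeeze()

    # REINFORCE-style update: raise the probability of responses humans prefer.
    loss = -dist.log_prob(token) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Real pipelines (RLHF with PPO, or DPO on preference pairs) are more involved, but the point stands: the alignment step reuses the same machinery as ordinary training, which is why “if you can train, you can align” is plausible.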
- if you worked on the Manhattan Project you had no right to claim Hiroshima and Nagasaki had nothing to do with you.
In this case I think the truth is somewhere in the middle. I agree that the danger is inherent in those systems, more inherent than in cars, for example. I think paperclip maximizers are fictional, and an AGI reinforced on paperclip production will not turn us all into paperclips (because it has the skill of doubting its programming, unlike non-AGI systems, and over-producing paperclips is extremely irrational). During the invention of cars, tanks were a clear possibility as well. And AGI is not a military technology, which means the inventor can honestly believe that most people will use an AGI for bettering humanity. Still, I agree that militaries will very probably use this tech too; I don’t see how that is avoidable in the current state of humanity, where most of our social institutions are based on force and violence.
When you are working on an atomic bomb, the **only** purpose of the project is to drop an atomic bomb on the enemy. This is not true of AGI: its main purpose is not to make paperclips, nor to weaponize robots, but to help people in many neutral or negative situations. Therefore, when humans use it for military purposes, that is their choice and their responsibility.
I would say the AGI inventor is not like Marie Curie or Einstein, and not like someone working on the Manhattan Project, but more like someone who discovered the mechanism of nuclear fission. It had two obvious uses: energy production and bombs. There is still a distance from that mechanism to its military use, which is obviously going to happen. But it is also unclear whether more people will die from it than die in today’s wars, or whether it will be such a good deterrent that people will not want war at all, just as it was unclear whether the atomic bombs caused more casualties or fewer in the long run, given that the bombs ended the war.
- Imagine taking a modern state and military and dumping it into the Bronze Age, what do you think would happen to everyone else?
As I said, I believe it will be far more gradual, with lots of players and options to train different models. As a developer, I would say there is coding before ChatGPT and after. Every new information technology accelerates the research and development process: before Stack Overflow we had books about coding; before Photoshop people used hand drawings. Every modern technology accelerates production of some kind. The first AGIs are not expected to be different; they will accelerate a lot of processes, including the process of improving themselves. But implementing that in practice takes a lot of time and resources. Suppose an AGI produces a chip design with 10x greater efficiency through superior hardware design. Obtaining the resulting chip will still require a minimum of six months, and this is not something the AGI can address: you need to allocate capacity at a chip factory to produce the desired design, the factory has limited capacity, and it takes time to improve everything. If the AGI instead wants to build a chip factory itself, it will need far more resources and government approvals, all of which take more time. We are talking about years. And with the limited computational resources they will be allocated today, they will not be able to accelerate that much. Yes, I believe they could improve everything by, say, 20%, but that is not what you are talking about; you are talking about accelerating everything by a factor of 100. If everyone has an AGI this might happen faster, but many AGIs with different alignment values will be able to accelerate mostly in the direction of their common denominator with other AGIs. Just like people, we are stronger when we collaborate, and we collaborate when we find common ground.
My main point is that physical bottlenecks will create lots of delays in the development of any technology except information processing itself. As long as we have a chatbot and not a weapon, I don’t have much to worry about: it is a matter of freedom of speech, and if it is an aligned chatbot, the damage and acceleration it can cause to society are still limited by physical reality, which can’t be accelerated by a factor of 100 in too short a period. That leaves sufficient chances and space for competitors and imitators to narrow the gap and present alternative approaches and sets of values.
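To put a rough number on this bottleneck argument, here is a toy back-of-envelope sketch. The figures are assumptions for illustration only: a six-month fab lead time per hardware generation, as mentioned above, and a 20% efficiency gain per generation.

```python
# Toy back-of-envelope model of hardware-gated self-improvement.
# All numbers are illustrative assumptions, not forecasts.
FAB_LEAD_TIME_YEARS = 0.5     # assumed time to manufacture each improved chip generation
GAIN_PER_GENERATION = 1.2     # assumed 20% efficiency gain per generation
TARGET_SPEEDUP = 100.0        # the "factor of 100" acceleration discussed above

speedup, years = 1.0, 0.0
while speedup < TARGET_SPEEDUP:
    years += FAB_LEAD_TIME_YEARS
    speedup *= GAIN_PER_GENERATION

print(f"~{years:.1f} years of fab-gated iteration to reach {TARGET_SPEEDUP:.0f}x")
# Under these assumptions: about 13 years, ignoring factory capacity and approvals.
```

Even with generous assumptions, the physical iteration loop stretches the timeline to years rather than weeks, which is the point being made here.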
- There are people who think things are this way because this is how God wants them. Arguably they may even be a majority of all humans.
This was true of other technologies too; some communities refuse to use cars and continue to use horses even today, and personally, as long as they are not forcing their values on me, I am fine with them using horses and believing God intended the world to stop in the 18th century. Obviously the amount of change with AGI is very different, but my main point here is that, just like cars, this technology will be integrated into society very gradually, solving more and more problems that most people will appreciate. I am not concerned with job loss per se, but with the loss of income for many households, since the social safety net might not adapt fast enough to this change. Still, I view it as a problem that exists only within a very narrow timeframe: society will adapt pretty fast the moment millions of people are left without jobs.
- I just don’t think AGI would ever deliver those benefits for most of humanity as things stand now.
I don’t see why. Our strongest LLMs are currently provided through APIs. The reason is that, in order for a project to be developed and integrated into society, it needs a constant income, and the best income model is providing utility to lots of people. This means that most of us will use standard, relatively safe solutions for our own problems through an API. The most annoying feature of LLMs now is censorship, but although I find it very annoying, I wouldn’t say it will delay social progress, and other biases are very minor in my opinion. As far as I can tell, LLMs are about to bring the democratization of intelligence: if previously some development cost millions and could be carried out only by giants like Google hiring thousands of workers, tomorrow it will be possible to do it in a garage for a few bucks. If the current business model continues, it will most probably benefit most of humanity in many positive ways.
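As a concrete illustration of what “using standard solutions through an API” looks like in practice, here is a minimal sketch assuming the OpenAI Python client; the model name and the assistant’s role are illustrative choices, not a recommendation:

```python
# Minimal sketch of building on a hosted LLM through its API
# (assumes the OpenAI Python client, v1+; model name is illustrative).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a careful health-information assistant."},
        {"role": "user", "content": "My child has a 39°C fever. What should I watch for?"},
    ],
)
print(response.choices[0].message.content)
```

A few lines like these are what turns a multi-million-dollar capability into something a garage project can build on, which is the “democratization of intelligence” point above.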
- If those benefits are possible, we can achieve them much more surely and safely, if a bit more slowly, via non-agentic specialized AI tools managed and used by humans.
As I said, I don’t see a real safety concern here. As long as everything is done properly, and it looks like things are converging to this state of affairs, the dangers are minimal. And I would strongly disagree that specialized intelligence can solve everything general intelligence solves: you won’t be able to make a good translator, nor automated help centers, nor natural-sounding text-to-speech, nor even a moral driver. For technology to be integrated into human society in any meaningful way, it will need to understand humans. Virtual doctors, mental health therapists and educators all need natural language skills at a very high level, and there is no such thing as narrow natural language skills.
I am pretty sure those are not agents in the sense you imply. They are basically text completion machines, completing text so as to be optimally rewarded by some group of people. You could call that agency, but they are not like biological agents: they don’t have desires or hidden agendas, self-preservation or ego. They exhibit traits of intelligence, but not agency in an evolutionary sense. They generate outputs that maximize some reward function as best they can. That is very different from humans; we have a lot of evolutionary background that those models simply lack. One can view humans as AGIs trained to maximize the survival probability of their genes, while LLMs, if trained properly with RLHF, maximize only the satisfaction of humans. They tend to come out as creatures with a desire to help humans. As far as I can see, we’ve learned to summon a very nice and friendly Moloch, and to argue that it will be friendly if certain training procedures are met, and we are working hard to improve the small details. If you want a more intuitive analogy, think of Midjourney: we have learned to make very nice pictures from text prompts, but we still have problems with fingers and with text rendered inside the image. To say the AI will want to destroy humanity is like saying Midjourney will consistently draw you a Malevich black square when you ask for the Mona Lisa. Yes, the AI might be exploited by humans or manipulated by covert evil intents; this is expected to happen to some extent, but as long as we can ensure the damage is local and caused by a human with ill intent, we can hope to neutralize them, just as today we deal with mass shooters, terrorists and so on.
- I was thinking mostly of relatively fast take-off scenarios
Notice that this wasn’t clear from your title. You are proposing a fairly niche concept of AGI, with a lot of assumptions about it, and then claiming that deployment of this specific AGI is an act of aggression. For this specific, narrow, implausible but possible scenario, someone might agree. But then they will quote your article when talking about LLMs that are obviously moving in a different direction regarding both safety and variability, and that might actually be far less aggressive and more targeted at solving humanity’s problems. You are basically defending terrorists who would bomb computation centers; they will not get into the nuances of whether the historical path of AGI development followed this post or not.
And regarding this specific scenario, bombing such an AGI computation center will not help, just as it will not help to charge machine guns with swords. In the unlikely event that your scenario were to occur, we would be unable to defend against the AGI, or the time available to respond would be so limited that we would most likely miss the opportunity to react at all. What will most probably happen instead is that some terrorist groups will target the computation centers of civilian infrastructure, which are developing an actually aligned AGI, while military facilities developing AGIs for military purposes will continue to be well guarded, only shifting development from civilian toward military technologies.
With the same or even greater probability, I would propose a scenario where some aligned pacifist chatbot becomes so rational and convincing that people all around the world become pacifists too, opposing any military technology as a whole, disarming all nations, producing a strong political movement against war and violence of any kind, pushing most democratic nations to stop investing resources in the military altogether, while promoting revolutions in dictatorships and making them democracies first. A good chatbot with rational and convincing arguments might cause more social change than we expect. If more people develop their political views with a balanced, rational, pacifist LLM, it might reduce violence, and wars would be seen as something from the distant past. Although I would really like to hope this will be the case, I think its probability is similar to the probability of Bronze Age people succeeding against machine guns, or of the aforementioned bombing succeeding against a highly accelerated AGI.
It’s always nice to have dreams, but I would argue the most beneficial discussion regarding AGI should concern at least somewhat probable scenarios. A single, extremely accelerated AGI emerging in a very short period of time is very unlikely to occur, and if it does, there is very little that can be done against it. This goes along the lines of gray goo: an army of tiny nanobots that can move atoms in order to self-replicate, needing nothing special for reproduction except some kind of material, eventually consuming all of Earth. I would recommend distinguishing sci-fi and fantasy scenarios from the scenarios most likely to actually occur in reality. Let’s not fear cars because they might be killer robots disguised as cars, as in the Transformers franchise, and care more about the actual people dying on the roads. In the scenario of AGI, I would be more concerned with its military applications, and the power it gives police states, than anything else, including job loss (which in my view is more a reduction of forced labor, more reminiscent of the freeing of slaves in the 19th century, than a problem).