Mod note. (LW mods are trying out moderating in public rather than via PMs. This may feel a bit harsh if you’re not used to this sort of thing, but we’re aiming for a culture where feedback feels more natural. I think it is important to do this publicly for a) accountability and b) so people can form a better model of how the LW moderators operate.)
I do think globally banning autonomous weapons is a reasonable idea, but the framing of this post feels pretty off.
I downvoted for the first paragraph, which makes an (IMO wrong) assumption that this is the first step towards AI alignment. The paragraph also seems more like it is trying to build social consensus than explain information to me. I don’t think that is never appropriate on LW, but I think it requires a lot more context than this post provides (e.g. this post doesn’t mention anything about how autonomous weapons relate to existential risk. I think there’s a plausible connection between the two, but this post doesn’t spell it out or address potential failure modes of confusing the two).
My very similar post had a somewhat better reception, although certainly people disagreed. I think there are two things going on. Firstly, Lucas’s post, and perhaps my post, could have been better written.
Secondly, and this is just my opinion, people coming from the orthodox alignment position (EY) have become obsessed with the need for a pure software solution, and have no interest in shoring up civilization’s general defenses by banning the most dangerous technologies that an AI could use. As I understand it, they feel that focusing on how the AI does the deed is a misconception, because the AI will be so smart that it could kill you with a butter knife and no hands.
Possibly the crux here is related to what is a promising path, what is a waste of time, and how much collective activism effort we have left, given the time on the clock. Let me know if you disagree with this model.
Yes, the linked post makes a lot of sense: wet labs should be heavily regulated.
Most of the disagreement here is based on two premises:
A: Other vectors (wet labs, etc.) present a greater threat. Maybe, though intelligent weapons are the most clearly misanthropic variant of AI.
B: AI will become so powerful, so quickly, that limiting its vectors of attack will not be enough.
If B is true, the only solution is a general ban on AI research. However, this would need to be a coordinated effort across the globe. There is far more support for halting intelligent weapons development than for a general ban. A general ban could come as a subsequent agreement.
How is the framing of this post “off”? It provides an invitation for agreement on a thesis. The thesis is very broad, yes, and it would certainly be good to clarify these ideas.
What is the purpose of sharing information, if that information does not lead in the direction of a consensus? Would you have us share information simply to disagree on our interpretation of it?
The relationship between autonomous weapons and existential risk is this: autonomous weapons have built-in targeting and engagement capabilities. To make an analogy to a human warrior: in a rogue AI scenario, any autonomous weapons the AI gained access to would serve as the ‘sword-arm’ of the rogue AI, while a reasoning model would provide the ‘brains’ to direct and coordinate it. The first step towards regaining control would be to disarm the rogue AI, as one might disarm a human, or remove the stinger from a stingray. The more limited the weaponry the AI has access to, the easier it would be to disarm.
A high-level thing about LessWrong is that we’re primarily focused on sharing information, not advocacy. There may be a later step where you advocate for something, but on LessWrong the dominant mode is discussing and explaining ideas, so that we can think clearly about what’s true.
Advocacy pushes you down a path of simplifying ideas rather than clearly articulating what’s true, and pushing for consensus for the sake of coordination regardless of whether you’ve actually found the right thing to coordinate on.
“What is the first step towards alignment” isn’t something there’s a strong consensus on, but I don’t think it’s banning autonomous weapons, for a few reasons:
banning weapons doesn’t help solve alignment; it just makes the consequences of one particular type of misalignment less bad. The first and biggest problem with AI alignment is that it’s a confusing domain we haven’t dealt with before, and I think many first steps are more like “become less confused” than “do a particular thing”.
from the perspective of “hampering the efforts of a soft takeoff”, it’s not obvious you’d pick autonomous weapons over “dramatically improving the security of computer systems” or “better controlling wet labs that the AI could hire to develop novel pathogens”. If you ban autonomous weapons, the AI can still just hire mercenaries – killer robots help but are neither necessary nor sufficient for an AI takeover.
I bring this up to highlight that we’re nowhere near a place where it’s “obvious” that this is the first step, and that you can skip to building consensus towards it.
My intent here is to communicate some subtle things about the culture and intent of LessWrong, so you can decide whether you want to stick around and participate. This is not a forum for arbitrary types of communication; it’s meant to focus on truthseeking first. Our experience is that people who veer towards advocacy-first or consensus-first tend to subtly degrade truthseeking norms in ways that are hard to reverse.
I also think there are a number of object level things about AI alignment you’re missing. I think your argument here is a reasonable piece of a puzzle but I wouldn’t at all call it “the first step towards AI alignment”. If you want to stick around, expect to have a lot to learn.
“Advocacy pushes you down a path of simplifying ideas rather than clearly articulating what’s true, and pushing for consensus for the sake of coordination regardless of whether you’ve actually found the right thing to coordinate on.”
Simplifying (abstracting) ideas allows us to use them efficiently.
Coordination allows us to combine our talents to achieve a common goal.
The right thing is the one which best helps us achieve our cause.
Our cause, in terms of alignment, is making intelligent machines that help us.
The first step towards helping us is not killing us.
Intelligent weapons are machines with built-in intelligence capabilities specialized for the task of killing humans.
Yes, a rogue AI could try to kill us in other ways: bioweapons, power grid sabotage, communications sabotage, etc. Limiting the development of new microorganisms, especially with regard to AI, would also be a very good step. However, bioweapons research requires human action, and there are very few humans who are both capable of and willing to cause human extinction. Sabotage of civilian infrastructure could cause a lot of damage, especially sabotage of the power grid, which may be vulnerable to cyberattack. https://www.gao.gov/blog/securing-u.s.-electricity-grid-cyberattacks
Human mercenaries causing a societal collapse? That would mean a large number of individuals who are willing to take orders from a machine to actively harm their communities. Very unlikely.
The more human action an AI requires to function, the more likely it is that a human will notice and eliminate a rogue AI. Unfortunately, the development of weapons which require less human action is proceeding rapidly.
Suppose an LLM or other reasoning model were to enter a bad loop, maybe as the result of a joke, in which it sought to destroy humanity. Suppose it wrote a program which, when installed by an unsuspecting user, created a much smaller model, and this model used other machines to communicate with autonomous weapons, instructing them to destroy key targets. The damage arising in this scenario would be proportional to the power and intelligence of the autonomous weapons. Hence the need to stop developing them immediately.
“Human mercenaries causing a societal collapse? That would mean a large number of individuals who are willing to take orders from a machine to actively harm their communities. Very unlikely.”
I’m wondering how you can hold that position given the recent social disorder we’ve seen all over the world, where social-media-driven outrage cycles have been a significant accelerating factor. People are absolutely willing to “take orders from a machine” (i.e. participate in collective action based on memes from social media) in order to “harm their communities” (i.e. cause violence and property destruction).
These memes have been magnified by the words of politicians and media. We need our leaders to discuss things more reasonably.
That said, restricting social media could also make sense. A requirement for in-person verification and limitation to a single account per site could be helpful.