How is the framing of this post “off”? It provides an invitation for agreement on a thesis. The thesis is very broad, yes, and it would certainly be good to clarify these ideas.
What is the purpose of sharing information, if that information does not lead in the direction of a consensus? Would you have us share information simply to disagree on our interpretation of it?
The relationship between autonomous weapons and existential risk is this: autonomous weapons have built-in targeting and engagement capabilities. To make an analogy to a human warrior: in a rogue AI scenario, any autonomous weapons the AI gained access to would serve as the rogue AI’s ‘sword-arm’, while a reasoning model would provide the ‘brains’ to direct and coordinate them. The first step towards regaining control would be to disarm the rogue AI, as one might disarm a human or remove the stinger from a stingray. The more limited the weaponry the AI has access to, the easier it would be to disarm.
A high-level thing about LessWrong is that we’re primarily focused on sharing information, not advocacy. There may be a later step where you advocate for something, but on LessWrong the dominant mode is discussing and explaining, so that we can think clearly about what’s true.
Advocacy pushes you down a path of simplifying ideas rather than clearly articulating what’s true, and pushing for consensus for the sake of coordination regardless of whether you’ve actually found the right thing to coordinate on.
“What is the first step towards alignment” isn’t something there’s a strong consensus on, but I don’t think it’s banning autonomous weapons, for a few reasons:
First, banning weapons doesn’t help solve alignment; it just makes the consequences of one particular type of misalignment less bad. The biggest problem with AI alignment is that it’s a confusing domain we haven’t dealt with before, and I think many first steps are more like “become less confused” than “do a particular thing”.
Second, from the perspective of “hampering the efforts of a soft takeoff”, it’s not obvious you’d pick autonomous weapons over “dramatically improving the security of computer systems” or “better controlling wetlabs that the AI could hire to develop novel pathogens”. If you ban autonomous weapons, the AI can still just hire mercenaries – killer robots help but are neither necessary nor sufficient for an AI takeover.
I bring this up to highlight that we’re nowhere near a place where it’s “obvious” that this is the first step, and that you can skip to building consensus towards it.
My intent here is to communicate some subtle things about the culture and intent of LessWrong, so you can decide whether you want to stick around and participate. This is not a forum for arbitrary types of communication; it’s meant to focus on truthseeking first. Our experience is that people who veer towards advocacy-first or consensus-first tend to subtly degrade truthseeking norms in ways that are hard to reverse.
I also think there are a number of object level things about AI alignment you’re missing. I think your argument here is a reasonable piece of a puzzle but I wouldn’t at all call it “the first step towards AI alignment”. If you want to stick around, expect to have a lot to learn.
“Advocacy pushes you down a path of simplifying ideas rather than clearly articulating what’s true, and pushing for consensus for the sake of coordination regardless of whether you’ve actually found the right thing to coordinate on.”
Simplifying (abstracting) ideas allows us to use them efficiently.
Coordination allows us to combine our talents to achieve a common goal.
The right thing is the one which best helps us achieve our cause.
Our cause, in terms of alignment, is making intelligent machines that help us.
The first step towards helping us is not killing us.
Intelligent weapons are machines with built-in intelligence capabilities specialized for the task of killing humans.
Yes, a rogue AI could try to kill us in other ways: bioweapons, power grid sabotage, communications sabotage, etc. Limiting the development of new microorganisms, especially with regard to AI, would also be a very good step. However, bioweapons research requires human action, and there are very few humans who are both capable of and willing to cause human extinction. Sabotage of civilian infrastructure could cause a lot of damage, especially to the power grid, which may be vulnerable to cyberattack. https://www.gao.gov/blog/securing-u.s.-electricity-grid-cyberattacks
Human mercenaries causing a societal collapse? That would mean a large number of individuals who are willing to take orders from a machine to actively harm their communities. Very unlikely.
The more human action an AI requires to function, the more likely a human is to notice and eliminate a rogue AI. Unfortunately, the development of weapons which require less human action is proceeding rapidly.
Suppose an LLM or other reasoning model were to enter a bad loop, maybe as the result of a joke, in which it sought to destroy humanity. Suppose it wrote a program which, when installed by an unsuspecting user, created a much smaller model, and this model used other machines to communicate with autonomous weapons, instructing them to destroy key targets. The damage arising in this scenario would be proportional to the power and intelligence of the autonomous weapons. Hence the need to stop developing them immediately.
“Human mercenaries causing a societal collapse? That would mean a large number of individuals who are willing to take orders from a machine to actively harm their communities. Very unlikely.”
I’m wondering how you can hold that position, given all the recent social disorder we’ve seen all over the world in which social-media-driven outrage cycles have been a significant accelerating factor. People are absolutely willing to “take orders from a machine” (i.e. participate in collective action based on memes from social media) in order to “harm their communities” (i.e. cause violence and property destruction).
These memes have been magnified by the words of politicians and media. We need our leaders to discuss things more reasonably.
That said, restricting social media could also make sense. A requirement for in-person verification and a limit of one account per person per site could be helpful.