Thanks for your reply!

It continues to feel very bizarre to me to interpret the word “accident” as strongly implying “nobody was being negligent, nobody is to blame, nobody could have possibly seen it coming, etc.”. But I don’t want to deny your lived experience. I guess you interpret the word “accident” as having those connotations, and I figure that if you do, there are probably other people who do too. Maybe it’s a regional dialect thing, or different fields use the term in different ways, who knows. So anyway, going forward, I will endeavor to keep that possibility in mind and maybe put in some extra words of clarification where possible to head off misunderstandings. :)
As I mentioned elsewhere, it is specifically the dichotomy of “accident vs. misuse” that I think is the most problematic and misleading.
I agree with this point.
I do think that there’s a pretty solid dichotomy between (A) “the AGI does things specifically intended by its designers” and (B) “the AGI does things that the designers never wanted it to do”.
I want to use the word “accident” universally for all bad outcomes downstream of (B), regardless of how grossly negligent and reckless people were etc., whereas you don’t want to use the word “accident”. OK, let’s put that aside.
I think (?) that we both agree that bad outcomes downstream of (A) are not necessarily related to “misuse” / “bad actors”. E.g., if there’s a war with AGIs on both sides, and humans are wiped out in the crossfire, I don’t necessarily want to say that either side was a “bad actor”, or that either side’s behavior constitutes “misuse”.
So yeah, I agree that “accident vs misuse” is not a good dichotomy for AGI x-risk.
It inappropriately separates “coordination problems” and “everyone follows the manual”
Thanks, that’s interesting.
I didn’t intend my chart to imply that “everyone follows the manual” doesn’t also require avoiding coordination problems and avoiding bad decisions etc. Obviously it does—or at least, that was obvious to me. Anyway, your feedback is noted. :)
It seems to suppose that there is such a manual, or the goal of creating one. However, if we coordinate effectively, we can simply forgo development and deployment of dangerous technologies ~indefinitely.
I agree that “never ever create AGI” is an option in principle. (It doesn’t strike me as a feasible option in practice; does it to you? I know this is off-topic, I’m just curious)
Your text seems to imply that you don’t think creating such a manual is a goal at all. That seems so crazy to me that I have to assume I’m misunderstanding. (If there were a technical plan that would lead to aligned AGI, we would want to know what it is, right? Isn’t that the main thing that you & your group are working on?)
I do think that there’s a pretty solid dichotomy between (A) “the AGI does things specifically intended by its designers” and (B) “the AGI does things that the designers never wanted it to do”.
1) I don’t think this dichotomy is as solid as it seems once you start poking at it… e.g. in your war example, it would be odd to say that the designers of the AGI systems that wiped out humans intended for that outcome to occur. Intentions are perhaps best thought of as incomplete specifications.
2) From our current position, I think “never ever create AGI” is a significantly easier thing to coordinate around than “don’t build AGI until/unless we can do it safely”. I’m not very worried that we will coordinate too successfully and never build AGI and thus squander the cosmic endowment. This is both because I think that’s quite unlikely, and because I’m not sure we’ll make very good / the best use of it anyways (e.g. think S-risk, other civilizations).

3) I think the conventional framing of AI alignment is something between vague and substantively incorrect, as well as being misleading. Here is a post I dashed off about that: https://www.lesswrong.com/posts/biP5XBmqvjopvky7P/a-note-on-terminology-ai-alignment-ai-x-safety. I think creating such a manual is an incredibly ambitious goal, and I think more people in this community should aim for more moderate goals. I mostly agree with the perspective in this post: https://coordination.substack.com/p/alignment-is-not-enough, but I could say more on the matter.

4) RE connotations of accident: I think they are often strong.
1) Oh, sorry, what I meant was: the generals in Country A want their AGI to help them “win the war”, even if it involves killing people in Country B + innocent bystanders. And vice-versa for Country B. And then, between the efforts of both AGIs, the humans are all dead. But nothing here was either an “AGI accident” (i.e., unintended-by-the-designers behavior) or “AGI misuse”, by my definitions.
But anyway, yes I can imagine situations where it’s unclear whether “the AGI does things specifically intended by its designers”. That’s why I said “pretty solid” and not “rock solid” :) I think we probably disagree about whether these situations are the main thing we should be talking about, versus edge-cases we can put aside most of the time. From my perspective, they’re edge-cases. For example, the scenarios where a power-seeking AGI kills everyone are clearly on the “unintended” side of the (imperfect) dichotomy. But I guess it’s fine that other people are focused on different problems from me, and that “intent-alignment is poorly defined” may be a more central consideration for them. ¯\_(ツ)_/¯
3) I like your “note on terminology” post. But I also think of myself as subscribing to “the conventional framing of AI alignment”. I’m kinda confused that you see the former as counter to the latter.
2) From our current position, I think “never ever create AGI” is a significantly easier thing to coordinate around than “don’t build AGI until/unless we can do it safely”. I’m not very worried that we will coordinate too successfully and never build AGI and thus squander the cosmic endowment…
If you’re working on that, then I wish you luck! It does seem maybe feasible to buy some time. It doesn’t seem feasible to put off AGI forever. (But I’m not an expert.) It seems you agree.
I think creating such a manual is an incredibly ambitious goal, and I think more people in this community should aim for more moderate goals.
Obviously the manual will not be written by one person, and obviously some parts of the manual will not be written until the endgame, when we know more about AGI than we do today. But we can still try to make as much progress on the manual as we can, right?
The post you linked says “alignment is not enough”, which I see as obviously true, but that post doesn’t say “alignment is not necessary”. So, we still need that manual, right?
Delaying AGI forever would obviate the need for a manual, but it sounds like you’re only hoping to buy time—in which case, at some point we’re still gonna need that manual, right?
It’s possible that some ways to build AGI are safer than others. Having that information would be very useful, because then we can try to coordinate around never building AGI in the dangerous ways. And that’s exactly the kind of information that one would find in the manual, right?