I agree on pretty much every point you’ve raised. I agree that there’s a huge danger in successful DWIMAC or alignment-to-a-person. It could well lead to catastrophic conflict. I think this deserves a lot more analysis, because the creators of AGI are probably going to shoot for that if there’s not a much better argument against it than we’ve seen so far.
This has gotten entirely off-topic for this post; I’m not sure where we went off-topic, but it didn’t start with my last comment. And as you say, I think the choice of alignment target is almost as important as technical alignment techniques.
On the other hand, if alignment to human values isn’t a stable target, we might be better off relying on the good nature of whoever both aligns their AGI to their intent/values and wins the AGI war. It’s easier to indulge one’s good nature when there is nearly zero downside to doing so, because you have incontestable control over the known lightcone. Even if horrible things happened in that war, most humans would prefer a happy, flourishing group of humans to be their friends. Sociopaths are the exception, though, so this route does not fill me with confidence either.
I think there’s more to be worked out here.
Your suggestion is that multiple DWIMAC AGIs with different allegiances might develop both the wisdom and a means of cooperating and splitting the rapidly expanding pie. I also place some guarded optimism in that possibility.
I’m not sure I’m the best person to be thinking and speculating about issues like that: I’m pretty sure I’m a better AI engineer than I am a philosopher/ethicist, and there are a lot of people more familiar with the AI policy space than I am. On the other hand, I’m pretty sure I’ve spent longer thinking about the intersection of AI and ethics/philosophy than the great majority of AI engineers have (around fifteen years), and few of the AI policy people I’ve read have written much on the question “if we solve the alignment problem, what should we attempt to align AI to, and what might the social and Realpolitik consequences of different choices be?” (And then there’s the complicating question of “Are there also internal/technical/stability-under-reflection/philosophical constraints on that choice?”, to which I strongly suspect the short answer is “yes”, even though I’m not a moral realist.)

There was some discussion of this sort of thing on Less Wrong about 10–15 years ago, but back then we knew a lot less about what sort of AI we were likely to be aligning, what its strengths and weaknesses would be, and how human-like versus alien and incomprehensible an intelligence it would be. (The theoretical assumptions on Less Wrong back then tended more toward some combination of direct construction, like AIXI, and/or reinforcement learning, rather than SGD token-prediction from the Internet.) So we now have a lot more useful information about where the hard and easy parts are likely to be, and about the sociopolitical context.
I feel the same way about being unqualified to consider the geopolitical dynamics. But I also agree that the questions of technical alignment and best alignment target are interconnected (e.g., instruction-following as target seems to make technical alignment much easier). Therefore, I think no single human being is qualified to answer the whole question. As such, I think we need collaboration with people with other expertise. Do you happen to have any references or names for people who understand geopolitics and might grapple with technical alignment questions in conjunction with them?
I agree that we have much better footing to address both the technical and alignment target questions now than 10–15 years ago. So I think we need a new concerted effort.
Do you happen to have any references or names for people who understand geopolitics and might grapple with technical alignment questions in conjunction with them?
Also no, but I’m sure there are many such people reading Less Wrong/the Alignment Forum. Perhaps one or both of us should write posts outlining the issues, and see if we can get a discussion started?