Yes, the terminology I’m using is AGI = roughly comparable to human capacity, possibly somewhat higher or lower in narrow areas, such that a contest between human society and a rogue AGI is an interesting contest, and may depend on who gets to pick the terrain on which it’s conducted; whereas ASI = at least significantly beyond human capacity across almost all areas that matter, such that a contest between human society and a rogue ASI is a foregone conclusion.
On style of alignment: in the post I touched on the question of what happens if you have multiple ASIs aligned to the well-being of different sets of humans: my prediction was that it very likely leads to an intelligence race and then a high-tech war. This is also my concern for DWIMAC-aligned AI in the possession of different groups of humans: if the technological difference between the capabilities of different groups gets too high, we could see a repeat of the events described in Guns, Germs, and Steel. That didn’t happen during the Cold War because of Mutual Assured Destruction, since the technological differential between the two sides never got that big (and to the extent that the Soviet Bloc lost the Cold War, it was primarily because it started to lose the technological race). I agree that Realpolitik may initially pull us towards DWIMAC alignment; I’m concerned that that may be an x-risk in the somewhat longer term. Most likely one human-led faction pulls ahead, and then co-opts/conquers/takes over/exterminates all the other factions. At the end of that you only have one faction, and if they’re wise enough to realize they don’t want to repeat the process, they may move over to a design aligned to the well-being of all humanity. I’m arguing that we should foresee and avoid that mistake, but I agree there’s a significant risk that we won’t be that wise/magnanimous/sensible.
Anyway, the topic you raise is basically orthogonal to the subject of my post: the technique I outline here can be used to aim for any (philosophically and ethically self-consistent) form of alignment that we can create a large synthetic training set describing a great many examples of. In describing an example of the approach, I assumed my preferred style of alignment, but the technique is broadly applicable, including to DWIMAC alignment. The real question you’re raising is what the stable convergence target is for the cycle of self-improvement of aligned AIs assisting us in building better-aligned AIs, the cycle this technique is intended to get us to the start of: a topic which is pretty speculative at this point, and more the subject of my posts on the basin of convergence to alignment than this one. It’s an interesting question though, and I’m thinking about it, and if I reach any interesting conclusions I’ll likely write another post.
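To illustrate what I mean by “broadly applicable”, here is a minimal sketch of how the same synthetic-data machinery could be pointed at different alignment targets. All names, structures, and placeholder strings below are hypothetical illustrations, not the actual pipeline from the post; the only point is that the target is a parameter of the generation process.

```python
# Illustrative sketch only: the generation/labeling details in the post are not
# reproduced here. The idea is just that the same synthetic-data machinery can be
# pointed at different alignment targets by swapping the specification that the
# example generator conditions on. All names below are hypothetical.

from dataclasses import dataclass


@dataclass
class AlignmentTarget:
    name: str
    description: str  # natural-language spec the generator conditions on


WELLBEING_OF_HUMANITY = AlignmentTarget(
    name="well-being of all humanity",
    description="Act so as to further the long-term well-being of humanity as a whole.",
)

DWIMAC = AlignmentTarget(
    name="DWIMAC (do what I mean and check)",
    description="Follow the principal's instructions as intended, checking when unsure.",
)


def make_training_example(scenario: str, target: AlignmentTarget) -> dict:
    """Pair a scenario with a demonstration of behaviour aligned to `target`.

    In a real pipeline the demonstration would be generated (and filtered) by
    existing models plus human review; here it is just a placeholder string.
    """
    return {
        "prompt": f"Alignment target: {target.description}\nScenario: {scenario}",
        "completion": f"[demonstration of behaviour consistent with {target.name}]",
    }


scenarios = [
    "A user asks the AI to help one company outmaneuver a rival by deceptive means.",
    "The AI discovers an action that benefits its principal but harms third parties.",
]

dataset = [make_training_example(s, WELLBEING_OF_HUMANITY) for s in scenarios]
# Swapping in DWIMAC (or any other self-consistent target) reuses the same machinery:
dwimac_dataset = [make_training_example(s, DWIMAC) for s in scenarios]
```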
A very brief stab at this: suppose an ASI is created by a corporation. The purpose of a creation is to maximize the well-being of its creator(s) (see my basin-of-convergence posts for a justification), in this case the shareholders of the company (in proportion to their shareholding, presumably). The question then becomes to what extent it is in the interests of those shareholders for the ASI to align to the interests of other people as well. The answer in a multipolar world, where there are several such ASIs of comparable power levels, is probably that the risk of war is too high unless they all align significantly to the well-being of all humanity, and only have a preference towards their individual corporate shareholders to whatever limited extent avoids excessive conflict. Whereas in a unipolar world, the sole ASI is capable of outmaneuvering the rest of humanity and creating an oligarchy of the shareholders, and would presumably do so if it believed that was in their interest (or under DWIMAC, if they believed it was in their interest). Ethically, humans have a strong instinctive sense of fairness, but that generally applies in situations where individual power levels are comparable and the advantages of cooperating on iterated non-zero-sum games outweigh those of winning a non-iterated zero-sum game. By definition, taking over the world for your shareholders is a non-iterated zero-sum game, except for situations where conflict can make it negative-sum.
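To make that last game-theoretic point a bit more concrete, here is a toy back-of-the-envelope comparison. Every number, function name, and payoff structure below is made up purely for illustration, not taken from any model above:

```python
# Toy numbers, purely illustrative: a back-of-the-envelope comparison of the
# expected value of cooperating indefinitely in an iterated non-zero-sum game
# versus attempting a one-shot winner-takes-all "takeover", which may turn
# negative-sum if it triggers a costly conflict.

def iterated_cooperation_value(per_round_payoff: float, discount: float) -> float:
    """Present value of an indefinitely repeated cooperative payoff stream."""
    return per_round_payoff / (1.0 - discount)


def one_shot_takeover_value(prize: float, p_win: float, conflict_cost: float) -> float:
    """Expected value of a single winner-takes-all attempt that risks a war."""
    return p_win * prize - (1.0 - p_win) * conflict_cost


coop = iterated_cooperation_value(per_round_payoff=1.0, discount=0.99)        # = 100.0
grab = one_shot_takeover_value(prize=150.0, p_win=0.6, conflict_cost=200.0)   # = 10.0

print(f"iterated cooperation: {coop:.1f}, one-shot takeover: {grab:.1f}")
# With these (made-up) numbers cooperation dominates; raise p_win or the prize
# enough and the inequality flips.
```

The sketch is only meant to show why the unipolar case, where one side's chance of winning approaches certainty and the conflict cost it bears approaches zero, is exactly where the cooperative fairness instinct stops doing any work.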
I agree on pretty much every point you’ve raised. I agree that there’s a huge danger in successful DWIMAC or alignment-to-a-person. It could well lead to catastrophic conflict. I think this deserves a lot more analysis, because the creators of AGI are probably going to shoot for that if there’s not a much better argument against it than we’ve seen so far.
This was entirely off-topic for this post; I don’t know where we got off topic, but it didn’t start in my last comment. And as you say, I think the choice of alignment target is almost as important as technical alignment techniques.
On the other hand, if alignment to human values isn’t a stable target, we might be better off relying on the good nature of whoever both aligns their AGI to their intent/values and wins the AGI war. It’s easier to indulge one’s good nature when there is nearly zero downside to doing so, because you have incontestable control over the known lightcone. Even if horrible things happened in that war, most humans would prefer a happy, flourishing group of humans to be their friends. Sociopaths are the exception, so this route does not fill me with confidence either.
I think there’s more to be worked out here.
Your suggestion is that multiple DWIMAC AGIs with different allegiances might establish both the wisdom and a means of cooperating and splitting the rapidly expanding pie. I also place some guarded optimism in that possibility.
I’m not sure if I’m the best person to be thinking/speculating about issues like that: I’m pretty sure I’m a better AI engineer than I am a philosopher/ethicist, and there are a lot of people more familiar with the AI policy space than I am. On the other hand, I’m pretty sure I’ve spent longer thinking about the intersection of AI and ethics/philosophy than the great majority of AI engineers have (as in fifteen years), and few of the AI policy people that I’ve read have written much on “if we solve the Alignment problem, what should we attempt to align AI to, and what might the social and Realpolitik consequences of different choices be?” (And then there’s the complicating question of “Are there also internal/technical/stability-under-reflection/philosophical constraints on that choice?”, to which I strongly suspect the short answer is “yes”, even though I’m not a moral realist.) There was some discussion of this sort of thing about 10–15 years ago on Less Wrong, but back then we knew a lot less about what sort of AI we were likely to be aligning, what its strengths and weaknesses would be, and how human-like vs. alien and incomprehensible an intelligence it would be (the theoretical assumptions back then on Less Wrong tended to be more around some combination of direct construction like AIXI and/or reinforcement learning, rather than SGD token-prediction from the Internet). So we have a lot more useful information now about where the hard and easy parts are likely to be, and about the sociopolitical context.
I feel the same way about being unqualified to consider the geopolitical dynamics. But I also agree that the questions of technical alignment and the best alignment target are interconnected (e.g., instruction-following as a target seems to make technical alignment much easier). Therefore, I think no single human being is qualified to answer the whole question, so we need collaboration with people with other expertise. Do you happen to have any references or names for people who understand geopolitics and might grapple with technical alignment questions in conjunction with them?
I agree that we have much better footing to address both the technical and alignment target questions now than 10-15 years ago. So I think we need a new concerted effort.
> Do you happen to have any references or names for people who understand geopolitics and might grapple with technical alignment questions in conjunction with them?
Also no, but I’m sure there are many such people reading Less Wrong/the Alignment Forum. Perhaps one or both of us should write posts outlining the issues, and see if we can get a discussion started?