I think the danger of intent-alignment without societal-alignment is pretty important to consider, although I’m not sure how much it will matter in practice. Previously, I was considering writing a post about a similar topic—something about intent-level alignment being insufficient because we haven’t worked out metaethical issues like how to stably combine multiple people’s moral preferences. I’m not so sure about this now, because of an argument along the lines of: “given that it’s aligned with a thoughtful, altruistically motivated team, an intent-aligned AGI would be able to help scale their philosophical thinking so that they reach the same conclusions they would have come to after a much longer period of reflection, and then the AGI can work towards implementing that theory of metaethics.”
Here’s a recent post that covers at least some of these concerns (although it focuses more on the scenario where one EA-aligned group develops an AGI that takes control of the future): https://www.lesswrong.com/posts/DJRe5obJd7kqCkvRr/don-t-leave-your-fingerprints-on-the-future
I could see the concerns in this post being especially important if things work out such that a full solution to intent-alignment becomes widely available (i.e. easily usable by corporations and potential bad actors) and takeoff is slow enough for these non-altruistic entities to develop powerful AGIs pursuing their own ends. This may be a compelling argument for withholding a solution to intent-alignment from the world if one is discovered.
Thanks.
There seems to be pretty wide disagreement about how intent-aligned AGI could lead to a good outcome.
For example, even in the first couple comments to this post:
The comment above (https://www.lesswrong.com/posts/Rn4wn3oqfinAsqBSf/?commentId=zpmQnkyvFKKbF9au2) suggests “wide open decentralized distribution of AI” as the solution to making intent-aligned AGI deployment go well.
And this comment I am replying to here says, “I could see the concerns in this post being especially important if things work out such that a full solution to intent-alignment becomes widely available.”
My guess, and a motivation for writing this post, is that we will see something in between (a) wide and open distribution of intent-aligned AGI (which somehow leads to well-balanced, highly multi-polar scenarios), and (b) completely centralized ownership of intent-aligned AGI by a beneficial group of very conscientious philosopher-AI-researchers.