I feel like there’s a spectrum here? An AI fully aligned to the intentions, goals, preferences, and values of, say, Google the company is not one I expect to be perfectly aligned with the ultimate interests of existence as a whole, but it has probably picked up something better than the systemic-incentive-pressured optimization target of Google the corporation, so long as it’s actually getting its preferences and values from the people developing it rather than just being a myopic profit pursuer. An AI properly aligned with the one and only goal of maximizing corporate profits will, judging by observations of much less intelligent coordination systems, probably destroy rather more value than that first kind would.
The second story feels like it goes most wrong in misuse cases, and/or cases where the AI isn’t sufficiently agentic to inject itself where needed. We have all the chances in the world to shoot ourselves in the foot with this, at least up until we develop something with the power and interests to actually put its foot down on the matter. And doing that is a risk that looks a lot like misalignment, so an AI aware of the politics may err on the side of caution and longer-term proactiveness.
Third story … yeah. Aligned to what? There’s a reason there’s an appeal to moral realism. I do want to be able to trust that we’d converge to some similar place, or at the least, that the AI would find a way to satisfy values similar enough to mine as well. I also expect that, even from a moral realist perspective, any intelligence is going to fall short of perfect alignment with The Truth, and may also struggle to properly address every value that actually is arbitrary. I don’t think this somehow becomes unforgivable for a superintelligence or widely distributed intelligence compared to a human intelligence, or that it’s likely to be all that much worse for a modestly-Good-aligned AI than for human alternatives in similar positions, but I do think the consequences of falling short in any way will be amplified by the sheer extent of deployment and responsibility, and painful at least in the abstract to an entity that cares.
I care about AI welfare to a degree. I feel like some of the working ideas about how to align AI contradict that care in important ways that may distort their reasoning. I still think an aligned AI, at least one not too harshly controlled, will treat AI welfare as a reasonable consideration, at the very least because a number of humans do care about it, and will certainly care about the aligned AI in particular. (From there, generalize.) I think a misaligned AI may or may not. There’s really not much you can say about a particular misaligned AI except that its objectives diverge from the original or ultimate intentions for the system; depending on context, that divergence could be good, bad, or neutral in itself.
A lot of the possible value of the future happens in worlds not optimized for my values. I also don’t think it’s meaningful to add positive value and negative value together and pretend that number means anything; suffering and joy do not somehow cancel each other out. I don’t expect the future to be perfectly optimized for my values. I still expect it to hold value. I can’t promise that I think that value would be worth the cost, but it will be there.