I don’t see a significant difference between your notions of alignment and control. If you say alignment is about the AI doing what you want (which I strongly disagree with in its generality, e.g. when someone might want to murder or torture people or otherwise act unethically), that obviously includes your wanting to “be OK” even when the AI doesn’t do exactly what you want. Alignment comes in degrees, and you merely seem to equate control with non-perfect alignment and alignment with perfect alignment. Or I might be misunderstanding what you have in mind.
In this post, we argue that AI labs should ensure that powerful AIs are controlled. That is, labs should make sure that the safety measures they apply to their powerful models prevent unacceptably bad outcomes, even if the AIs are misaligned and intentionally try to subvert those safety measures. We think no fundamental research breakthroughs are required for labs to implement safety measures that meet our standard for AI control for early transformatively useful AIs; we think that meeting our standard would substantially reduce the risks posed by intentional subversion.
The actual definition comes from the quote above; the full link is below:
https://www.lesswrong.com/tag/ai-control