Another way of putting it: A parochially aligned AI (for task T) needs to understand task T, but doesn’t need to have common-sense “background values” like “don’t kill anyone”.
Narrow AIs might require parochial alignment techniques in order to learn to perform tasks that we don’t know how to write a good reward function for. And we might try to combine parochial alignment with capability control in order to get something like a genie without having to teach it background values. When/whether that would be a good idea is unclear ATM.
Got it, thanks. The part of the definition that I didn’t grasp was “H’s preferences over the intended task domain”.