I was surprised Bostrom said he thinks it would be easier to implement a working domesticity solution than to directly specify values; he had earlier pointed out why even modest goals (e.g. making exactly 10 paperclips) could result in AIs tiling the universe with computational infrastructure.
Also, he described indirect normativity as motivating an AI to carry out a particular process to determine what our goals are and then to satisfy those goals, and mentioned in a footnote that you could instead motivate the AI to satisfy the goals that would be output by such a process, so that it has only instrumental reasons to carry out the process. The second option seems superior to me. The only reason we want the AI to carry out the process is so that it can optimize for its result, so directly motivating it to carry out the process needlessly promotes an expected instrumental goal into a final goal.

This could even matter in practice: if the computation we would ask the AI to carry out is morally relevant in its own right, we might want the AI to determine the output of the process in an ethical manner that we could not specify ahead of time, but that the AI might be able to work out before completing the process. If the AI is directly motivated to perform the computation, that might constrain it from optimizing the computation from the standpoint of any agents being simulated within it.
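To make the distinction concrete, here is a minimal toy sketch in Python of the two goal structures; all the names and the utility functions are my own hypothetical illustrations, not anything from the book:

```python
# Toy sketch of the two goal structures. "value_finding_process" stands in for
# the deliberative computation (e.g. some extrapolation of human preferences);
# everything here is an illustrative assumption, not Bostrom's formalism.

def value_finding_process():
    """Stand-in for the (expensive, possibly morally relevant) process."""
    return "extrapolated_goal"

def utility_process_as_final_goal(history):
    # Option 1: the AI's final goal includes actually carrying out the process.
    # Utility is zero unless the process itself was performed as specified.
    ran_process = history.get("ran_process", False)
    achieved = history.get("achieved", set())
    return 1.0 if ran_process and value_finding_process() in achieved else 0.0

def utility_output_as_final_goal(history):
    # Option 2: the AI's final goal is only to satisfy whatever the process
    # *would* output. Running the process is merely an instrumental way to
    # learn that output, so the AI may compute it however it judges best.
    achieved = history.get("achieved", set())
    return 1.0 if value_finding_process() in achieved else 0.0

# A world where the right outcome was achieved without literally running the
# canonical process scores 1.0 under option 2 but 0.0 under option 1.
world = {"ran_process": False, "achieved": {"extrapolated_goal"}}
print(utility_process_as_final_goal(world))   # 0.0
print(utility_output_as_final_goal(world))    # 1.0
```

Under option 2 the AI is free to substitute a gentler or more ethical way of computing the same answer, which is exactly the flexibility the footnote's formulation preserves.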