This is perhaps the most promising solution (if we want to stick with Eliezer’s approach). I’m not sure it really works though. How does your IE process meta-moral arguments, for example, arguments about whether average utilitarianism or total utilitarianism is right? (Presumably BE wants IE to be influenced by those arguments in roughly the same way that BE would.) What does “right” mean to it while it’s thinking about those kinds of arguments?
It could refer to the evaluation of potential self-improvements. What the agent actually does is not necessarily right, and even the option with the highest goodness-score (which the agent may fail to find) is not necessarily right, because the agent could instead self-improve and compute a righter action using its improved architecture, which might not involve any goodness score at all.
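To make the distinction concrete, here is a toy sketch (purely illustrative, with hypothetical names and hard-coded values, not anyone's actual proposal): the action the current agent would pick by maximizing its explicit goodness-score can differ from the action a self-improved successor would pick, and the successor's decision procedure need not expose any goodness-score to maximize.

```python
from typing import Callable, List


def best_by_score(actions: List[str], goodness_score: Callable[[str], float]) -> str:
    """What the current architecture does: argmax over its explicit score."""
    return max(actions, key=goodness_score)


def successor_decision(actions: List[str]) -> str:
    """Stand-in for the improved architecture's decision procedure.

    Hypothetically it deliberates in some richer way (here just hard-coded),
    and it assigns no goodness-score that could be compared to the old one.
    """
    return "c"


def score(a: str) -> float:
    # Toy goodness-score over three candidate actions.
    return {"a": 0.1, "b": 0.9, "c": 0.5}[a]


actions = ["a", "b", "c"]

print(best_by_score(actions, score))  # "b": the best the current agent can see
print(successor_decision(actions))    # "c": possibly righter, reached by a
                                      # process that no longer scores actions
```

The point of the sketch is only that "right" can't simply mean "highest on my current goodness-score," since the self-improved decision may be better by the agent's own lights while being produced by a procedure with no such score.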