One of the problems with Oracles is that they may mislead us (e.g. via social engineering or seduction) to achieve their goals. Most ways of addressing this either require strict accuracy in the response (which means the response might be incomprehensible or useless, e.g. if it's given in terms of atom positions) or some measure of the human reaction to the response (which brings back the risk of social engineering).
If we can program the AI to distinguish between human beliefs and values, we can give it goals like "ensure that human beliefs post-answer are more accurate, while their values are (almost?) unchanged". This solves the issue of defining an accurate answer, and removes that part of the risk (though it doesn't remove others).
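To sketch the idea a bit more formally (the notation here is illustrative, not part of the original proposal): let $b$ and $v$ be the human's beliefs and values before the answer, $b'(a)$ and $v'(a)$ their beliefs and values after hearing answer $a$, $A(\cdot)$ some measure of belief accuracy, and $d(\cdot,\cdot)$ a distance on values. The Oracle's goal could then be written as

$$\max_{a}\; A\big(b'(a)\big) \quad \text{subject to} \quad d\big(v'(a),\, v\big) \le \epsilon,$$

for some small $\epsilon$ capturing "values are (almost?) unchanged". This still presupposes that we can define $A$ and $d$ over beliefs and values respectively, which is exactly the belief/value distinction the proposal relies on.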