If you cock up and define a terminal value that refers to a mutable epistemic state, all bets are off. Like Asimov’s robots on Solaria, who still act in accordance with the First Law, but have ‘human’ redefined so that it no longer includes non-Solarians. Oops. The trouble is that in order to evaluate how you’re doing, there has to be some coupling between values and knowledge, so you must prove the correctness of that coupling. But what counts as correct? Usually not too hard to define for the toy models we’re used to working with, damned hard as a general problem.
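Here’s a minimal toy sketch of that failure mode (my own illustration, not anyone’s actual design): the terminal value is written once and never touched, but it can only be evaluated through a predicate living in the agent’s mutable world-model, so editing the predicate silently edits the effective value.

```python
# Toy illustration (hypothetical): a fixed terminal value that is only ever
# evaluated through a mutable epistemic predicate -- the Solaria failure mode.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    # Mutable epistemic state: the agent's current classifier for "human".
    is_human: Callable[[dict], bool]

    def permitted(self, action_target: dict) -> bool:
        # Terminal value, stated once and never edited:
        # harming a human is forbidden.
        return not self.is_human(action_target)


# The originally intended coupling between the value and the world-model.
robot = Agent(is_human=lambda person: person["species"] == "homo sapiens")

outsider = {"species": "homo sapiens", "accent": "non-Solarian"}
print(robot.permitted(outsider))  # False -- harming the outsider is forbidden

# An epistemic update that was never checked against the value it feeds:
robot.is_human = lambda person: person["accent"] == "Solarian"

print(robot.permitted(outsider))  # True -- same terminal value, new behaviour
```

The point of the sketch: proving the stated value “correct” in isolation buys you nothing; you’d need to prove something about every future state of `is_human`, which is exactly the hard general problem.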