have the true utility function, definition of chocolate, etc., be "historical" facts that are not in the AI's future.
The whole point of stratification (which is a kind of counterfactual reasoning) is to achieve this. Most value learning suggestions that I’ve seen do not.
What are you thinking of here? Could you point to an example?