FWIW, the thing I actually believe (“My corrigibility proposal sketch”) would be based on an abstract concept that involves causality, counterfactuals, and self-reference.
If I’m not mistaken, this whole conversation is a big tangent on whether “preferences-over-trajectories” is technically correct terminology. We don’t need to argue about that, because I’m already convinced that in future posts I should just call it “preferences about anything other than future states”. I consider that terminology equally correct, and (apparently) pedagogically superior. :)