I don’t see how this changes the picture. If you train a model on real-time feedback from a human, that human algorithm is still the same one that is foolable by, e.g., cutting down the tree and replacing it with papier-mâché or something. None of this forces the model to learn a correspondence between the human ontology and the model’s internal best-guess model, because the reason any of this is a problem in the first place is that the human algorithm points at a thing which is not the thing we actually care about.