Even worse, in some sense we are hoping that the AI isn’t smart enough to learn the true explanation for the training examples, which is that humans picked them.
Ah right, I don’t often think about this, but it’s a good point and a likely source of Goodharting as systems become more capable.