To clarify, when I said “performs well”, I did not mean “learns human values well”, nor did I have any sort of scoring rule in mind. I meant that the algorithm learns patterns which are actually present in the world, much like earlier when I talked about “the human-labelling-algorithm ‘working correctly’”.
Ah well. I’ll probably argue with you more about this elsewhere, then :)