I think this is evidence against the hypothesis that a system trained to make lots of correct predictions will thereby intrinsically value making lots of correct predictions.
Note that Yudkowsky said:
maybe if you train a thing really hard to predict humans, then among the things that it likes are tiny, little pseudo-things that meet the definition of human, but weren’t in its training data, and that are much easier to predict
which isn’t at all the same thing as intrinsically valuing making lots of correct predictions. A better analogy would be the question of whether humans like things that are easier to visually predict. (Except that visual prediction was presumably only one of many things that went into human RL, so this prediction is presumably weaker for humans than it is for GPT-n?)