I think this is evidence against the hypothesis that a system trained to make lots of correct predictions will thereby intrinsically value making lots of correct predictions.
Note that Yudkowsky said:
maybe if you train a thing really hard to predict humans, then among the things that it likes are tiny, little pseudo-things that meet the definition of human, but weren’t in its training data, and that are much easier to predict
which isn’t at all the same thing as intrinsically valuing making lots of correct predictions. A better analogy would be the question of whether humans like things that are easier to visually predict. (Except that visual prediction was presumably only one of many things that went into human RL, so this prediction is presumably weaker for humans than it is for GPT-n?)