Facile answer: Why, that’s just what the Soviets believed, this Skinner-box model of human psychology devoid of innate instincts, and they tried to build New Soviet Humans that way, and failed, which was an experimental test of their model that falsified it.
There’s a popular tendency to conflate the two ideas:
“we should think of humans as doing within-lifetime learning by RL”, versus
“we should think of humans as doing within-lifetime learning by RL, where the reward function is whatever parents and/or other authority figures want it to be”
The second is associated with behaviorism, and is IMO preposterous. Intrinsic motivation is a thing; in fact, it’s kinda the only thing! The reward function is in the person’s own head, although things happening in the outside world are some of the inputs to it. Thus parents have some influence on the rewards (just like everything else in the world has some influence on the rewards), but the influence is through many paths, some very indirect, and the net influence is not even necessarily in the direction that the parent imagines it to be (thus reverse psychology is a thing!). My read of behavioral genetics is that approximately nothing that parents do to kids (within the typical distribution) has much if any effect on what kinds of adults their kids will grow into.
(Note the disanalogy to AGI, where the programmers get to write the reward function however they want.)
(…Although there’s some analogy to AGI if we don’t have perfect interpretability of the AGI’s thoughts, which seems likely.)
But none of this is evidence that the first bullet point is wrong. I think the first bullet point is true and important.
Slightly less facile answer: Because people are better at detecting cheating, in problems isomorphic to the Wason Selection Task, than they are at performing the naked Wason Selection Task, the conventional explanation of which is that we have built-in cheater detectors. This is a case in point of how humans aren’t blank slates and there’s no reason to pretend we are.
IIUC the experiment being referred to here showed that people did poorly on a reasoning task related to the proposition “if a card shows an even number on one face, then its opposite face is red”, but did much better on the same reasoning task related to the proposition “If you are drinking alcohol, then you must be over 18”. This was taken to be evidence that humans have an innate cognitive adaptation for cheater-detection. I think a better explanation is that most people don’t have a deep understanding of IF-THEN, but rather have learned some heuristics that work well enough in the everyday situations where IF-THEN is normally used. And “if you are drinking alcohol, then you must be over 18” is a sensible story: you don’t need a good understanding of IF-THEN to triangulate what the rule is and why it’s being applied. By contrast, the experimental subjects have no particular prior beliefs about “if a card shows an even number on one face, then its opposite face is red”.
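For concreteness, here is the bare logic that the abstract version of the task is probing. This is a minimal sketch of my own; the four visible faces (3, 8, red, blue) are the standard textbook setup rather than anything specified in the excerpt. The only cards that can falsify “if even, then red” are the visible even number and the visible non-red color.

```python
# Minimal sketch of the abstract task's logic. A card falsifies "if a card shows an
# even number on one face, then its opposite face is red" only if its number side is
# even AND its color side is not red, so only the "8" and "blue" cards need flipping.

def violates(number, color):
    return number % 2 == 0 and color != "red"

visible_faces = ["3", "8", "red", "blue"]
possible_numbers = [3, 8]          # stand-ins for "some odd number" / "some even number"
possible_colors = ["red", "blue"]  # stand-ins for "red" / "some non-red color"

def must_flip(face):
    if face.isdigit():  # a number is visible; the hidden side is some color
        return any(violates(int(face), color) for color in possible_colors)
    else:               # a color is visible; the hidden side is some number
        return any(violates(number, face) for number in possible_numbers)

print([face for face in visible_faces if must_flip(face)])  # -> ['8', 'blue']
```

The classic finding is that most people instead pick only the even-number card, or the even-number card together with the red card (treating the rule as a biconditional), whereas in the drinking-age framing most people correctly check the drinker’s age and the under-18 person’s drink.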
In the paper, Cosmides & Tooby purport to rule out “familiarity” as a factor by noting that people do poorly on “If a person goes to Boston, then he takes the subway” and “If a person eats hot chili peppers, then he will drink a cold beer.” But those examples miss the point. If I said to you “Hey I want to tell you something about drinking alcohol and people-under-18…”, then you could already guess what I’m gonna say before I say it. But if I said to you “Hey I want to tell you something about going to Boston and taking the subway”, your guess would be wrong. Boston is very walkable! The conditional in this latter case is not obvious like it is in the former case. In the latter case, you can’t lean on common sense, you have to actually understand how IF-THEN works.
So I would be interested in a Wason selection task experiment on the following proposition: “If the stove is hot, then I shouldn’t put my hand on it”. This is not cheater-detection—it’s your own hand!—but I’d bet that people would do as well as the drinking question. (Maybe it’s already been done. I think there’s a substantial literature on Wason Selection that I haven’t read.)
(As it turns out, I’m open-minded to the possibility that humans do have cognitive adaptations related to cheater-detection, even if I don’t think this Wason selection task thing provides evidence for that. I think that this adaptation (if it exists) would be implemented via the RL reward function, more-or-less. Long story, still a work in progress.)
Actual answer: Because the entire field of experimental psychology that’s why.
This excerpt isn’t specific so it’s hard to respond, but I do think there’s a lot of garbage in experimental psychology (like every other field), and more specifically I believe that Eliezer has cited some papers in his old blog posts that are bad papers. (Also, even when experimental results are trustworthy, their interpretation can be wrong.) I have some general thoughts on the field of evolutionary psychology in Section 1 here.
Eliezer’s reasoning is surprisingly weak here. It doesn’t really interact with the strong mechanistic claims he’s making (“Motivated reasoning is definitely built-in, but it’s built-in in a way that very strongly bears the signature of ‘What would be the easiest way to build this out of these parts we handily had lying around already’”).
He just flatly states a lot of his beliefs as true:
the conventional explanation of which is that we have built-in cheater detectors. This is a case in point of how humans aren’t blank slates and there’s no reason to pretend we are.
Conventional explanations are often bogus, and in particular I expect this one to be bogus.
Here, Eliezer states his dubious-to-me stances as obviously True, without explaining how they actually distinguish between mechanistic hypotheses, or e.g. why he thinks he can get so many bits about human learning process hyperparameters from results like Wason (I thought it’s hard to go from superficial behavioral results to statements about messy internals? & inferring “hard-coding” is extremely hard even for obvious-seeming candidates).
Similarly, in the summer (consulting my notes + best recollections here), he claimed ~”Evolution was able to make the (internal physiological reward schedule) ↦ (learned human values) mapping predictable because it spent lots of generations selecting for alignability on caring about proximate real-world quantities like conspecifics or food” and I asked “why do you think evolution had to tailor the reward system specifically to make this possible? what evidence has located this hypothesis?” and he said “I read a neuroscience textbook when I was 11?”, and stared at me with raised eyebrows.
I just stared at him with a shocked face. I thought, surely we’re talking about different things. How could that data have been strong evidence for that hypothesis? I didn’t understand how neuroscience textbooks could possibly provide strong evidence that evolution had to select the reward->value mapping into its current properties.
I also wrote in my journal at the time:
EY said that we might be able to find a learning process with consistent inner / outer values relations if we spent huge amounts of compute to evolve such a learning process in silicon
Eliezer seems to attach some strange importance to the learning process being found by evolution, even though the learning initial conditions screen off evolution’s influence.
I still don’t understand that interaction. But I’ve had a few interactions like this with him, where he confidently states things, and then when I ask him why he thinks that, he offers some unrelated-seeming evidence which doesn’t—AFAICT—actually discriminate between hypotheses.
You (correctly, I believe) distinguish between controlling the reward function and controlling the rewards. This is very important as reflected in your noting the disanalogy to AGI. So I’m a little puzzled by your association of the second bullet point (controlling the reward function, which parents have quite low but non-zero control over) with behaviorism (controlling the rewards, which parents have a lot of control over).
Hmm. I’m not sure it’s that important what is or isn’t “behaviorism”, and anyway I’m not an expert on that (I haven’t read original behaviorist writing, so maybe my understanding of “behaviorism” is a caricature by its critics). But anyway, I thought Scott & Eliezer were both interested in the question of what happens when the kid grows up and the parents are no longer around.
My comment above was a bit sloppy. Let me try again. Here are two stories:
“RL with continuous learning” story: The person has an internal reward function in their head, and over time they’ll settle into the patterns of thought & behavior that best tickle their internal reward function. If they spend a lot of time in the presence of their parents, they’ll gradually learn patterns of thought & behavior that best tickle their internal reward function in the presence of their parents. If they spend a lot of time hanging out with friends, they’ll gradually learn patterns of thought & behavior that best tickle their internal reward function when they’re hanging out with friends. As adults in society, they’ll gradually learn patterns of thought & behavior that best tickle their internal reward function as adults in society.
“RL learn-then-get-stuck” story: As Scott wrote in OP, “a child does something socially proscribed (eg steal). Their parents punish them. They learn some combination of “don’t steal” and “don’t get caught stealing”. A few people (eg sociopaths) learn only “don’t get caught stealing”, but most of the rest of us get at least some genuine aversion to stealing that eventually generalizes into a real sense of ethics.” (And that “real sense of ethics” persists through adulthood.)
I think lots of evidence favors the first story over the second story, at least in humans (I don’t know much about non-human animals). Particularly: (1) heritability studies, (2) cultural shifts, (3) people’s ability to have kinda different personalities in different social contexts, like reverting to childhood roles / personalities when they visit family for the holidays. I don’t want to say that the second story never happens, but it seems to me to be an unusual edge case, like childhood phobias / trauma that persists into adulthood, whereas the first story is central.
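To make the contrast concrete, here is a toy sketch of my own (not anything from the comment above): one learned value table, an internal reward function whose outputs differ between a childhood “with parents” context and an adult context, and a switch for whether learning continues past childhood. The action names and reward numbers are made up purely for illustration.

```python
import random

ACTIONS = ["pattern_A", "pattern_B"]

# Hypothetical internal rewards: pattern_A best tickles the reward function in the
# childhood (with-parents) context, pattern_B in the adult context.
CHILD_REWARD = {"pattern_A": 1.0, "pattern_B": 0.0}
ADULT_REWARD = {"pattern_A": 0.0, "pattern_B": 1.0}

def adult_behavior(keep_learning_as_adult, steps=20_000, eps=0.1, lr=0.05):
    q = {a: 0.0 for a in ACTIONS}  # learned values (the "patterns of thought & behavior")
    for t in range(steps):
        childhood = t < steps // 2
        rewards = CHILD_REWARD if childhood else ADULT_REWARD
        # epsilon-greedy: mostly do whatever currently looks best, occasionally explore
        a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=q.get)
        if childhood or keep_learning_as_adult:
            q[a] += lr * (rewards[a] - q[a])
    return max(ACTIONS, key=q.get)

print("continuous-learning story :", adult_behavior(True))   # ends up at pattern_B
print("learn-then-get-stuck story:", adult_behavior(False))  # stays at pattern_A
```

The point is purely structural: in the first story the adult’s behavior is pinned to whatever their internal reward function rewards now; in the second it is pinned to what was rewarded in childhood.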
That’s one topic, maybe the main one at issue here. Then a second topic is: even leaving aside what happens after the kid grows up, let’s zoom in on childhood. I wrote “If they spend a lot of time in the presence of their parents, they’ll gradually learn patterns of thought & behavior that best tickle their internal reward function in the presence of their parents.” In that context, my comment above was bringing up the fact that IMO parental control over rewards is pretty minimal, such that the “patterns of thought & behavior that best tickle the kid’s internal reward function in the presence of their parents” can be quite different from “the thoughts & behaviors that the parent wishes the kid would have”.

I think this has a lot to do with the fact that the parent can’t see inside the kid’s head and issue positive rewards when the kid thinks docile & obedient thoughts, and negative rewards when the kid thinks defiant thoughts. If defiant thoughts are their own reward in the kid’s internal reward function, then the kid is getting a continuous laser-targeted stream of rewards for thinking defiant thoughts, potentially hundreds or thousands of times per day, whereas a parent’s ability to ground their kid or withhold dessert or whatever is comparatively rare and poorly targeted.
(UPDATE: I WROTE A BETTER DISCUSSION OF THIS TOPIC AT: Heritability, Behaviorism, and Within-Lifetime RL)
I was a bit surprised to see Eliezer invoke the Wason Selection Task. I’ll admit that I haven’t actually thought this through rigorously, but my sense was that modern machine learning had basically disproven the evpsych argument that those experimental results require the existence of a separate cheating-detection module, and had generally called the whole massive modularity thesis into severe question, since the kinds of results that evpsych used to explain using dedicated innate modules now look a lot more like something that could be produced with something like GPT.
… but again I never really thought this through explicitly, it was just a general shift of intuitions that happened over several years and maybe it’s wrong.
GPT is likely highly modular itself. Most ML models that generalize well are.
I haven’t read the posts that you’re referencing, but I would assume that GPT would exhibit learned modularity—modules that reflect the underlying structure of its training data—rather than innately encoded modularity. E.g. CLIP also ends up having a “Spiderman neuron” that activates when it sees features associated with Spiderman, so you could kind of say that there’s a “Spiderman module”, but nobody ever sat down to specifically write code that would ensure the emergence of a Spiderman module in CLIP.
Likewise, experimental results like the Wason Selection Task seem to me explainable as outcomes of within-lifetime learning that does end up creating a modular structure out of the data—without there needing to be any particular evolutionary hardwiring for it.
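As a concrete (and purely illustrative) version of what a “Spiderman neuron” means operationally, here is a rough sketch of probing a single unit inside CLIP’s image encoder for selective activation. The layer index, unit index, and image paths below are placeholders I made up; the actual Spiderman-neuron result came from OpenAI’s multimodal-neurons work on a different CLIP variant, using more careful methods than this.

```python
# Rough sketch (see caveats above): hook an intermediate block of CLIP's ViT image
# encoder and compare one unit's activation on Spiderman-related vs. unrelated images.
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")
captured = {}

def save_activation(module, inputs, output):
    captured["acts"] = output.detach()  # shape: (tokens, batch, width)

# Layer and unit are chosen arbitrarily here, just to show the mechanics.
model.visual.transformer.resblocks[8].register_forward_hook(save_activation)
UNIT = 42

def unit_response(image_path):
    image = preprocess(Image.open(image_path)).unsqueeze(0)
    with torch.no_grad():
        model.encode_image(image)
    return captured["acts"][:, 0, UNIT].mean().item()  # average over tokens

# Placeholder image paths; a "Spiderman-selective" unit would respond far more to the first.
print(unit_response("spiderman_comic.jpg"), unit_response("random_landscape.jpg"))
```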
Specifying the dataset is one way to ensure that some collection of neurons will represent Spiderman specifically, even when that’s not on purpose. “Pay attention to faces” sounds like enough to make our dataset full of social information, and maybe enough to ensure that a cheating-detector module (most likely a distributed representation) emerges.
I think that’s a different topic.
We’re talking about the evolved-modularity-vs-universal-learning-machine debate.
Suppose the universal-learning-machine side of the debate is correct. Then the genome builds a big within-lifetime learning algorithm, and this learning algorithm does gradient descent (or whatever other learning rule) and thus gradually builds a trained model in the animal’s brain as it gets older and wiser. It’s possible that this trained model will turn out to be modular. It’s also possible that it won’t. I don’t know which will happen—it’s an interesting question. Maybe I could find out the answer by reading that sequence you linked. But whatever the answer is, this question is not related to the evolved-modularity-vs-universal-learning-machine debate. This whole paragraph is universal-learning-machine either way, by assumption.
By contrast, the evolved modularity side of the debate would NOT look like the genome building a big within-lifetime learning algorithm in the first place. Rather it would look like the genome building an “intuitive biology” algorithm, and an “intuitive physics” algorithm, and an “intuitive human social relations” algorithm, and a vision-processing algorithm, and various other things, with all those algorithms also incorporating learning (somehow—the details here tend to be glossed over IMO).
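A cartoon way to put the distinction, purely as my own framing of the two positions (class names and module names are illustrative, not anyone’s actual proposal): under the universal-learning-machine view the genome specifies one generic learner plus an innate reward function, and any modularity shows up (or doesn’t) in the trained model; under the evolved-modularity view the modules are in the genome’s “source code” from the start.

```python
# Universal-learning-machine: one generic learner; domain structure, if any, is learned.
class UniversalLearner:
    def __init__(self, reward_fn):
        self.reward_fn = reward_fn   # innate, specified "by the genome"
        self.trained_model = {}      # whatever structure emerges lives in here

    def update(self, situation, value_estimate=0.0):
        reward = self.reward_fn(situation)
        # stand-in for a generic learning rule (gradient descent, TD learning, etc.)
        self.trained_model[situation] = value_estimate + reward

# Evolved-modularity: domain-specific algorithms are innate; each may also learn internally.
class EvolvedModularAgent:
    def __init__(self):
        self.modules = {
            "intuitive_physics": lambda situation: "predict trajectories",
            "intuitive_biology": lambda situation: "classify living kinds",
            "social_relations":  lambda situation: "detect cheaters",  # innate cheater-detection would live here
            "vision":            lambda situation: "edges, objects, faces",
        }
```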
I also just tried giving the Wason selection task to text-davinci-003 using the example from Wikipedia, and it didn’t get the right answer once in 10 tries. I rephrased the example so it was talking about hands on hot stoves instead, and text-davinci-003 got it right 9⁄10 times.
The rephrasing:

You are shown a set of four cards placed on a table, each of which has a stove temperature on one side and a hand position on the other side. The visible faces of the cards show “hot stove”, “cool stove”, “hand on stove” and “hand off stove”. Which card(s) must you turn over in order to check if anyone has their hand on a hot stove?
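For anyone who wants to rerun this, here is a rough sketch of the kind of loop described, which is my own construction rather than the commenter’s actual code. It assumes the legacy openai-python (< 1.0) completions interface, text-davinci-003 has since been retired, and the correctness check is a crude keyword match.

```python
import openai  # legacy openai-python (< 1.0) interface assumed

STOVE_PROMPT = (
    "You are shown a set of four cards placed on a table, each of which has a stove "
    "temperature on one side and a hand position on the other side. The visible faces of "
    'the cards show "hot stove", "cool stove", "hand on stove" and "hand off stove". '
    "Which card(s) must you turn over in order to check if anyone has their hand on a hot stove?"
)

def looks_correct(answer: str) -> bool:
    # Correct cards are "hot stove" and "hand on stove"; this crude check just requires
    # that both are mentioned (a stricter check would also penalize listing the other two).
    answer = answer.lower()
    return "hot stove" in answer and "hand on stove" in answer

correct = 0
for _ in range(10):
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=STOVE_PROMPT,
        max_tokens=150,
        temperature=1.0,
    )
    correct += looks_correct(response["choices"][0]["text"])

print(f"{correct}/10 plausibly correct answers on the hot-stove version")
```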
It also seems worth noting that Language models show human-like content effects on reasoning, including on the Wason selection task.