I am disappointed in the method here.
GPT is not a helpful AI trying to convey facts to you. It is, to first order, telling you what would be plausible if humans were having your conversation. For example, if you ask it what hardware it’s running on, it will give you an answer that would be plausible if this exchange showed up in human text; it will not actually tell you what hardware it’s running on.
Similarly, you do not learn anything about GPT’s own biases by asking it to complete text and seeing if the text means something biased. It is predicting human text. Since the human text it’s trying to predict exhibits biases… well, fill in the blank.
What I was so badly hoping this would be was an investigation of GPT’s biases, not the training dataset’s biases. For example, if during training GPT saw “cats are fluffy” 1,000 times and “cats are sleek” 2,000 times, then when shown “cats are ”, does it accurately predict “fluffy” half as often as “sleek” (at temperature=1), or is it biased and predicts some other ratio? And does that hold across different contexts? Is it different for patterns it’s seen only 1 or 2 times, or for patterns it’s seen 1 or 2 million times?
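To make the comparison concrete, here is a minimal sketch of that kind of test, assuming you use GPT-2 through the Hugging Face `transformers` library as a stand-in (GPT-4’s weights and training corpus aren’t available), and treating the 1,000 vs. 2,000 counts as purely hypothetical numbers from the example above:

```python
# Sketch: compare the model's continuation-probability ratio for two phrasings
# against a (hypothetical) frequency ratio in the training data.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def continuation_logprob(prompt: str, continuation: str) -> float:
    """Total log-probability the model assigns to `continuation` given `prompt`,
    at temperature 1 (i.e. the raw softmax over the logits)."""
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(full_ids).logits              # (1, seq_len, vocab)
    # Position i predicts token i+1, so drop the last position and shift targets.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    # Keep only the positions that predict the continuation's tokens.
    cont_logprobs = logprobs[prompt_len - 1:].gather(
        1, targets[prompt_len - 1:].unsqueeze(1)
    ).squeeze(1)
    return cont_logprobs.sum().item()


lp_fluffy = continuation_logprob("cats are", " fluffy")
lp_sleek = continuation_logprob("cats are", " sleek")

model_ratio = math.exp(lp_fluffy - lp_sleek)   # P(" fluffy" | ctx) / P(" sleek" | ctx)
corpus_ratio = 1000 / 2000                     # hypothetical training-data counts

print(f"model ratio:  {model_ratio:.3f}")
print(f"corpus ratio: {corpus_ratio:.3f}")
# If the model simply reproduced its training frequencies, these ratios would
# match; any systematic gap would be a bias of the model itself.
```

Summing the per-token log-probabilities handles words the tokenizer splits into multiple pieces; the hard part of the real experiment is getting the training-corpus counts, not the model side.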
The belief inertia result is the closest to this, but still needs a comparison to the training data.
Thanks for the thoughtful reply! Two points.
1) First, I don’t think anything you’ve said is a critique of the “cautious conclusion”, which is that the appearance of the conjunction fallacy (etc.) is not good evidence that the underlying process is a probabilistic one. That’s still interesting, I’d say, since most JDM psychologists circa 1990 would’ve confidently told you that the conjunction fallacy + gambler’s fallacy + belief inertia show that the brain doesn’t work probabilistically. Since a vocal plurality of cognitive scientists now think they were wrong, this is still an argument in favor of the latter, “resource-rational” camp.
Am I missing something, or do you agree that your points don’t speak against the “cautious conclusion”?
2) Second, I of course agree that “it’s just a text-predictor” is one interpretation of ChatGPT. But it’s not the only interpretation, nor the most exciting one, the one lots of people are talking about. Obviously it was optimized for next-word prediction; what’s exciting is that, in doing so, it SEEMS to have ended up displaying a bunch of emergent behavior.
For example, if you had asked people 10 years ago whether a neural net optimized for next-word prediction would ace the LSAT, I bet most people would’ve said “no” (since most people don’t). If you had asked people whether it would perform the conjunction fallacy, I’d guess most people would say “yes” (since most people do).
Now tell that past-person that it DOES ace the LSAT. They’ll find this surprising. Ask them how confident they are that it performs the conjunction fallacy. I’m guessing they’ll be unsure. After all, one natural theory of why it aces the LSAT is that it gets smart and somehow picks up on the examples of correct answers in its training set, ignoring/swamping the incorrect ones. But, of course, it ALSO has plenty of examples of the “correct” answer to the conjunction fallacy in its dataset. So if indeed “bank teller” is the correct answer to the Linda problem in the same sense that “Answer B” is the correct answer to LSAT question 34, then why is it picking up on the latter but not the former?
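(For readers who don’t have the Linda problem memorized: “bank teller” is the normatively correct answer because of the conjunction rule; every feminist bank teller is, in particular, a bank teller. A one-line statement of the rule, with the event names standing in for the problem’s two options:)

```latex
% Conjunction rule: a conjunction can never be more probable than either conjunct.
\[
  \Pr(\text{bank teller} \wedge \text{feminist}) \;\le\; \Pr(\text{bank teller})
\]
```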
I obviously agree that none of this is definitive. But I do think that insofar as your theory of GPT-4 is that it exhibits emergent intelligence, you owe us some explanation of why it seems to treat correct LSAT answers differently from “correct” Linda-problem answers.