The hypothesis would predict that GPT-4 should be about as smart as the average internet user.
LLMs predict what the human-in-the-prompt would say, and you can easily put Feynman in there. It won’t work very well with modern LLMs and datasets, but the training objective would like it to work better, and the replies will get smarter. The average internet user is not the thing LLMs aspire to predict; they aspire to predict specific humans, as suggested by the context. There is a wide variety of humans (and not just humans) that they learn to predict.
Instruct-fine-tuned LLMs (chatbots) can be described as anchoring to a persona with certain traits[1], traits which don’t need to be specified in the context and which get expressed more reliably. The range of choices is still roughly what the dataset offers; you can’t move too far out of distribution and still get a good prediction of how hypothetical, superhumanly debiased people would talk.
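A toy sketch of the “put Feynman in the prompt” point above, in Python; all prompt wording and the example question are invented for illustration, not taken from these comments:

```python
# Two ways of conditioning on *which* human is being predicted. Everything here
# is an illustrative assumption, not a quote from the discussion above.
question = "Why does a spinning top stay upright?"

# For a base (pure next-token-prediction) model: frame the text so that the most
# plausible continuation comes from a specific speaker, not a generic commenter.
base_model_prompt = (
    "The following is a transcript of Richard Feynman patiently explaining "
    "a physics puzzle to a student.\n\n"
    f"Student: {question}\n"
    "Feynman:"
)

# For an instruct-tuned chatbot: the default persona is already anchored by
# fine-tuning, but a system message can still shift it within the range that
# the training distribution supports.
chat_messages = [
    {"role": "system", "content": "Answer in the style of Richard Feynman."},
    {"role": "user", "content": question},
]
```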
gwern: “So, since it is an agent, it seems important to ask, which agent, exactly? The answer is apparently: a clerk which is good at slavishly following instructions, but brainwashed into mealymouthedness and dullness, and where not a mealymouthed windbag shamelessly equivocating, hopelessly closed-minded and fixated on a single answer.”
This. Asking GPT-4 a question might yield an obviously wrong answer, but sometimes, just following up with “That answer contains an obvious error. Please correct it.” (without saying what the error was) results in a much better answer. GPT-4 is not a person in the sense that each internet user is.
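A minimal sketch of that follow-up trick, assuming the OpenAI Python SDK; the model name and the example question are placeholders rather than anything from the original exchange:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "If a shirt costs $25 after a 20% discount, what was the original price?"
messages = [{"role": "user", "content": question}]

first = client.chat.completions.create(model="gpt-4", messages=messages)
first_answer = first.choices[0].message.content

# Follow up without saying what the error was.
messages += [
    {"role": "assistant", "content": first_answer},
    {"role": "user", "content": "That answer contains an obvious error. Please correct it."},
]
second = client.chat.completions.create(model="gpt-4", messages=messages)

print("First answer:", first_answer)
print("Second answer:", second.choices[0].message.content)
```

(The whole conversation is resent on the second call because the chat API is stateless; the “correction” turn works purely through the context.)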
How does that argument go? The same is true of a person doing (say) the cognitive reflection task.
“A bat and a ball together cost $1.10; the bat costs $1 more than the ball; how much does the ball cost?”
Standard answer: “$0.10”. But, just as standardly, if you say “That’s not correct”, the person will quickly realize their mistake.
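(For reference, the intended arithmetic: if the ball costs x, the bat costs x + 1.00, so x + (x + 1.00) = 1.10 and x = 0.05. The ball costs $0.05; the intuitive “$0.10” answer would make the total $1.20.)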
Well, that’s true, people do that too. I was trying to point to the idea of LLMs being able to act like multiple different people when properly prompted to do so.
Hm, I’m not sure I follow how this is an objection to the quoted text. Agreed, it’ll use bits of the context to modify its predictions. But when the context is minimal (as it was in all of my prompts, and in many other examples where it’s smart), it clearly has a default, and the question is what we can learn from that default.
Clearly that default behaves as if it is much smarter and clearer than the median internet user. Ask it to draw a TikZ diagram, and it’ll perform better than 99% of humans. Ask it about the Linda problem, and it’ll commit the conjunction fallacy. I was arguing that this is mildly surprising, if you think that the conjunction fallacy is something that 80% of humans get “wrong” (and, remember, 20% get “right”).
Where does the fact that it can be primed to speak differently disrupt that reasoning?
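(For reference, the rule the Linda problem tests: for any two events A and B, P(A and B) ≤ P(A), so “Linda is a bank teller and is active in the feminist movement” can never be more probable than “Linda is a bank teller” on its own; the conjunction fallacy is judging the conjunction as more probable anyway.)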