I desperately want people to stop using “I asked Claude or ChatGPT” as a stand-in for “I got an objective third party to review”
LLMs are not objective. They are trained on the internet which has specific sets of cultural, religious, ideological biases, and then further trained via RL to be biased in a way that a specific for-profit entity wanted them to be.
Perhaps the norm should be to use some sort of LLM-based survey service like https://news.ycombinator.com/item?id=36865625 in order to try to get a more representative population sample of LLM outputs?
This seems like it could be a useful service in general: do the legwork to take base models (not tuned models), and prompt in many ways and reformulate in many ways to get the most robust distribution of outputs possible. (For example, ask an LLM to rewrite a question at various levels of detail or in various languages, or switch between logically equivalent formulations to avoid acquiescence bias; or if it needs k shots, shuffle/drop out the shots a bunch of times.)
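To make the shuffle/drop-out idea concrete, here is a minimal sketch in Python. It only builds the prompt variants; it does not call any model. The function name `prompt_variants`, the `drop_prob` parameter, and the example questions and shots are all made up for illustration, and an actual survey service would need far more care than this.

```python
import random


def prompt_variants(question_forms, shots, n_variants=8, drop_prob=0.25, seed=0):
    """Build n_variants prompts, each pairing a randomly chosen question
    phrasing with a shuffled, randomly thinned subset of the few-shot examples."""
    rng = random.Random(seed)  # seeded so the same survey can be regenerated
    variants = []
    for _ in range(n_variants):
        # Drop each shot independently with probability drop_prob,
        # then shuffle the survivors so no ordering is privileged.
        kept = [s for s in shots if rng.random() >= drop_prob]
        rng.shuffle(kept)
        question = rng.choice(question_forms)
        variants.append("\n\n".join(kept + [question]))
    return variants


# Hypothetical example data: two logically equivalent formulations with
# opposite polarity, so a model that merely agrees with the prompt
# ends up contradicting itself across variants.
forms = [
    "Q: Is the claim true? A:",
    "Q: Is the claim false? A:",
]
shots = [
    "Q: Is 2 + 2 = 4 true? A: Yes",
    "Q: Is 2 + 2 = 5 true? A: No",
    "Q: Is 3 > 1 false? A: No",
]

for prompt in prompt_variants(forms, shots, n_variants=3):
    print(prompt, end="\n---\n")
```

Each variant would then be sent to the base model, and the answers aggregated into a distribution rather than read off a single completion.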
Disagree. If ChatGPT is not objective, most people are not objective. If we ask a random person who happens to work at a random company, they are more biased than the internet, which at least averages out the biases of many individuals.
I’ll grant that ChatGPT displays less bias than most people on major issues, but I don’t think this is sufficient to dismiss Matt’s concern.
My intuition is that if the bias of a few flawed sources (Claude, ChatGPT) is amplified by their widespread use, the fact that it is “less biased than the average person” matters less.
Yes, this is an excellent point I didn’t get across in the post above.
LLMs simultaneously (1) are notoriously sycophantic, i.e. biased to answer the way they think the interlocutor wants them to, and (2) have “truesight”, i.e. a literally superhuman ability to suss out the interlocutor’s character (which is to say: the details of the latent structure generating the text) based on subtle details of phrasing. While the same could be said of humans as well – most humans would be biased towards assuaging their interlocutor’s worldview, rather than creating conflict – the problem of “leading questions” rises to a whole new level with LLMs, compared to humans.
You basically have to interpret an LLM’s answer as if a human had been asked the same question, phrased in as biased a way as possible.
Why do you believe this?
See e.g. this and this, and it’s of course wholly unsurprising, since it’s literally what the base models are trained to do.
I wouldn’t say that my experience with ChatGPT is in total agreement with your conclusion, yet you’re raising a good point and the distinction is helpful. I remember conversations in which the chatbot would both acknowledge and challenge my viewpoint, which I must admit is quite appreciated and not systematic in the biological realm. On the other hand, it is indeed common that pushing the chatbot to buy my arguments and adopt my stance is fairly easy.
Somehow it’s very related to humanlike intelligence; that is, when training an LLM-based chatbot[1] by reinforcement, the positive (rewarding) feedback comes from both confirmation of the interlocutor’s beliefs and matters like veracity, ethics, … It’s also what we humans have been experiencing.
Why and how does it rise to a whole new level when it comes to AI? I tend to think that we must understand the technologies we are using, so it’s our responsibility to use chatbots properly and leverage their capabilities. When talking with a child, or a young student, or generally someone you know is a newcomer, we adapt our questions, arguments, and the way we process their responses. It’s not an exact science for sure, but there’s no reason to expect it to be one with chatbots either.
It seems more accurate than LLMs as those have not yet been trained to have a chat with you
Of course a random person is biased. Some people will have more authority than others, and we’ll trust them more, and argument screens off authority.
What I don’t want people to do is give ChatGPT or Claude authority. Give it to the wisest people you know, not Claude.
[1] Can’t they both be not objective? Why make it a point of one or the other? A bit of a false dichotomy, there.
[2] There is no single “Internet”—there are specific spaces, forums, communities, blogs, you name it; comprising it. Each has its own, subjective, irrational, moderated (whether by a single individual, a team, or an overall sentiment of the community: promoting/exalting/hyping one subset of topics while ignoring others) mini/sub-culture.
This last one, furthermore, necessarily only happens to care about its own specific niche; happily ignoring most of everything else. LessWrong used to be mostly about, well—being less wrong—back when it started out. Thus, the “rationality” philosophy. Then it has slowly shifted towards a broader, all-encompassing EA. Now it’s mostly AI.
Compare the 3k+ results for the former against the 8k+ results for the latter.
Every space is focused on its own topic, within whatever mini/sub-cultural norms are encouraged/rewarded or punished/denigrated by the people within it. That creates (virtually) unavoidable blind spots, as every group of people within each space only shares information about [A] its chief topic of interest, within [B] the “appropriate” sentiment for the time, while [C] contrasting itself against the enemy/out-group/non-rationalists, you name it.
In addition to that, different groups have vastly different [I] amounts of time on their hands and [II] social, emotional, ethical, moral “charge” with regards to the importance they assign to their topic of choice, and out of these emerge [III] vastly different amounts of information produced by the people within that particular space.
When you compile the data set for your LLM, you’re not compiling a proportionately biased take on different topics. If that was the case, I’d happily agree with you. But you are clearly not. What you are compiling is a bunch of biased, blindsided in their own way, overly leaning towards one social, semantic, political, epistemological position; sets of averaged sentiments. Each will have their own memes, quirks, “hot takes”. Each will have massively over-represented discussions of one topic, at the expense of the other. That’s the web of today.
When you “train” your GPT on the resulting data set then, who is to say whether it is “averaging” the biases in between different groups? Can you open up any LLM to see its exact logic, reasoning, argumentation steps? Should there be any averaging going on, after all, how is it going to account for the disproportionately represented takes of people who simply have too much time and/or rage to spare? What of the people who simply don’t spend much time on the web to begin with? Is your GPT going to “average in” those as well, somehow?
What would prevent the resulting transformer from simply picking up on the likelihood of any given incoming prompt matching the overall “culture” of any single community, thus promptly completing it as if it was a part of an “average” discussion within that particular community there? Isn’t it plain wishful, if not outright naive*, to imagine the algo will do what you hope it will do—instead of what is the easiest possible thing for it to do?
* the fact a given thought pattern is wishful/naive doesn’t make you wishful/naive; don’t take it personally, plz
It probably depends less on the internet as a whole and more on the RLHF guidelines (I imagine the human reviewers receive a guideline based on the advice of the LLM-training company’s policy, legal, and safety experts). I don’t disagree though that it could present a relatively more objective view on some topics than a particular individual (depending on the definition of bias).
Would you say the same thing of people saying they looked at the Wikipedia article?
Yes, if people were using Wikipedia in the way they are using the LLMs.
In practice that doesn’t happen though, people cite Wikipedia for facts but are using LLMs for judgement calls.
I treat chatGPT as a vibes-ologist; it’s good for answering questions about like which X is most popular or what do most people think about X. I agree it’s less good for “X is true”
It’s not just biases, they are also just dumb. (Right now, that is; nothing against the 160 IQ models that you have in the future.) They are often unable to notice important things, spot problems, or follow up on such observations.
What they’re saying is I got a semi-objective answer fast.
If they’d googled for the answer all the same concerns would apply. You’d need to know the biases of whoever wrote the web content they read to get an answer.
I doubt the orgs got much of their own bias into the RLHF/RLAIF process. There are real cultural biases from the humans answering RLHF, and from the LLM itself via the training set and how it interpreted its constitution.
Exactly. Please stop saying this. It’s not semi-objective. The trend of casually treating LLMs as an arbiter of truth leads to moral decay.
This is obviously untrue, orgs spend lots of effort making sure their AI doesn’t say things that would give them bad press for example.
I should’ve specified: the orgs carefully train them to refuse to say things. I don’t think they specifically train them to say things the orgs like or believe. The refusals are intentional, the bias is accidental IMO.
And every source has bias.
So, do you want people to quit saying they googled for an answer? I just like them to say where they got the answer so I can judge how biased it might be.
Agreed, except for the small caveat of LLM answers which can be easily verified as approximately correct. E.g. answers to math problems where the solution is hard but the verification is easy; or Python scripts you’ve tested yourself and whose output looks correct; or reformatted text (like plaintext → BBCode) if it looks correct on a word diff website.
Incidentally, are there any LLM services which can already do this kind of verification in specific domains?
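As a toy illustration of "hard to solve, easy to verify", here is a small Python sketch that checks a claimed prime factorization. Factoring a large number is expensive, but confirming a proposed answer only takes a multiplication and a primality check per factor. The helper names `is_prime` and `check_factorization` are invented for this example, and trial division is only sensible for small factors.

```python
from math import prod


def is_prime(p):
    """Trial-division primality test; fine for small illustrative inputs."""
    if p < 2:
        return False
    i = 2
    while i * i <= p:
        if p % i == 0:
            return False
        i += 1
    return True


def check_factorization(n, claimed_factors):
    """Return True only if every claimed factor is prime and their product is n."""
    return prod(claimed_factors) == n and all(is_prime(f) for f in claimed_factors)


# Suppose an LLM claims that 8051 = 83 * 97. We do not need to trust it:
print(check_factorization(8051, [83, 97]))
# A wrong or incomplete answer is rejected just as cheaply:
print(check_factorization(12, [3, 4]))
```

The same pattern applies to the other cases mentioned: the check is independent of whoever produced the answer, so the source's bias stops mattering.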
It still signals to the subject of my question that I put in some effort before coming to them.