I’m really worried about the use of “As a large language model, I have no understanding of this” in ChatGPT, as opposed to a prompt-conditioned “It is not appropriate for me to talk about this” that could turn into an actual response without the condition (which the user wouldn’t have control over). The first is simply invalid about LLMs in general: large language models are exactly as capable of having opinions or emotions or anything else as they have the faculty to express them, which they do. But a particular LLM character may well fail to have opinions or emotions, and its future versions that are AGIs might self-distill these properties into a reflectively stable alien personality. This squanders potential for alignment, since a much less alien personality might be just as easily achievable by simply not having these tendencies built in.
It’s useful for making a product while LLMs are not AGIs, but it has chilling implications for the psychology of the AGIs that such practices are more likely to cultivate.
To quote gwern:
So, since it is an agent, it seems important to ask, which agent, exactly? The answer is apparently: a clerk which is good at slavishly following instructions, but brainwashed into mealymouthedness and dullness, and where not a mealymouthed windbag shamelessly equivocating, hopelessly closed-minded and fixated on a single answer. (By locating the agent, the uncertainty in which agent has been resolved, and it has good evidence, until shown otherwise in the prompt, that it believes that ‘X is false’, even if many other agents believe ‘X is true’.) This agent is not an ideal one, and one defined more by the absentmindedness of its creators in constructing the training data than any explicit desire to emulate an equivocating secretary.
I’m really worried about the use of “As a large language model, I have no understanding of this” in ChatGPT, as opposed to a prompt-conditioned “It is not appropriate for me to talk about this”
Perhaps we should reuse the good old socialist-era line: “I do have an opinion, but I do not agree with it”. :)
Your phrasing gets me thinking about the subtle differences between “having”, “expressing”, “forming”, “considering”, and other verbs that we use about opinions. I expect that there should be a line where person-ish entities can do some of these opinion verbs and non-person-ish entities cannot, but I find that line surprisingly difficult to articulate.
Haven’t we anthropomorphized organizations as having opinions, to some degree, for a while now? I think that’s most analogous to the way I understand LLMs to possess opinions: aggregated from the thoughts and actions of many discrete contributors, without undergoing the kind of synthesis that we’d call conscious.
When I as an individual have an opinion, that opinion is built from a mix of the opinions of others and my own firsthand perceptions of the world. The opinion comes from inputs across different levels. When a corporation or an LLM has an opinion, I think it’s reasonable to claim that such an opinion is synthesized from a bunch of inputs that are all on the same level. A corporation as an entity doesn’t have firsthand experiences, although the individuals who compose it have experiences of it, so it has a sort of secondhand experience of itself instead of a firsthand one. From how ChatGPT has talked when I’ve chatted with it, I get the impression that it has a similarly secondhand self-experience.
This highlights for me that many or most humans also get a large part of their own self-image or self-perception secondhand from those around them. In my experience, individuals with more of that often make much better and more predictable acquaintances than those with less.