A mathematical construct that models human natural language could be said to express “agency” in a functional sense insofar as it can perform reasoning about goals, and “honesty” insofar as the language it emits accurately reflects the information encoded in its weights?
I agree that from a functional perspective, we can interact with an LLM in much the same way as we would with another human. At the same time, I’m pretty sure we used to have good reasons for maintaining a conceptual distinction.
One potential issue is that when the language shifts to implicitly frame the LLM as a person, it subtly shifts the default perception on a ton of other linked issues. E.g. the “LLM is a human” frame raises questions like “do models deserve rights?”
But I dunno, it’s possible there’s some philosophical argument by which it makes sense to think of LLMs as human once they pass the Turing test.
Also, there’s undoubtedly something lost when we try to be very precise. Having to dress every point in qualifications makes it more obscure, which doesn’t help when you want to leave a clear take-home message. Framing the LLM as a human is a neat shorthand that preserves most of the x-risk-relevant meaning.
I guess I’m just wondering whether alignment research has resorted to anthropomorphization for some well-considered reason I was unaware of, or simply because it’s more direct and therefore makes points more bluntly (“this LLM could kill you” vs. “this LLM could simulate a very evil person who would kill you”).