I’m pretty confused as to why it’s become much more common to anthropomorphise LLMs.
At some point in the past the prevailing view was “a neural net is a mathematical construct and should be understood as such”. Assigning fundamentally human qualities like honesty or self-awareness was considered an epistemological faux pas.
Recently it seems like this trend has started to reverse. In particular, prosaic alignment work seems to be a major driver of the vocabulary shift. Nowadays we speak of LLMs as having internal goals, agency, and self-identity, and even discuss their welfare.
I know it’s been a somewhat gradual shift, which is why I hadn’t caught it until now, but I’m still really confused. Is the change in language driven by the qualitative shift in capabilities? Do the old arguments no longer apply?
I suppose a mathematical construct that models human natural language could be said to express “agency” in a functional sense, insofar as it can reason about goals, and “honesty” insofar as the language it emits accurately reflects the information encoded in its weights?
I agree that from a functional perspective, we can interact with an LLM in the same way as we would another human. At the same time I’m pretty sure we used to have good reasons for maintaining a conceptual distinction.
One potential issue is that when the language shifts to implicitly frame the LLM as a person, that subtly shifts the default perception of a ton of other linked issues. E.g. the “LLM is a human” frame raises questions like “do models deserve rights?”.
But I dunno, it’s possible that there’s some philosophical argument by which it makes sense to think of LLMs as human once they pass the Turing test.
Also, there’s undoubtedly something lost when we try to be very precise. Having to dress discourse in qualifications makes the point more obscure, which doesn’t help when you want to leave a clear take-home message. Framing the LLM as a human is a neat shorthand that preserves most of the x-risk-relevant meaning.
I guess I’m just wondering whether alignment research has resorted to anthropomorphization because of some well-considered reason I’m unaware of, or simply because it’s more direct and therefore makes points more bluntly (“this LLM could kill you” vs. “this LLM could simulate a very evil person who would kill you”).