Can you say more about what you mean by “Where can I find a post or article arguing that the internal cognitive model of contemporary LLMs is quite alien, strange, non-human, even though they are trained on human text and produce human-like answers, which are rendered “friendly” by RLHF?”
Like, obviously it’s gonna be alien in some ways and human-like in other ways. Right? How similar does it have to be to humans, in order to count as not an alien? Surely you would agree that if we were to do a cluster analysis of the cognition of all humans alive today + all LLMs, we’d end up with two distinct clusters (the LLMs and then humanity) right?
> Like, obviously it’s gonna be alien in some ways and human-like in other ways. Right
It has been said that since LLMs predict human output, they will, if sufficiently improved, be quite human—that they will behave in a quite human way.
> Can you say more about what you mean by “Where can I find a post
As part of a counterargument to that, we could find evidence that their logical structure is quite different from that of humans. I’d like to see such a write-up.
> Surely you would agree that if we were to do a cluster analysis of the cognition of all humans alive today + all LLMs, we’d end up with two distinct clusters (the LLMs and then humanity) right?
I agree, but I’d like to see some article or post arguing that.
OK, thanks.
Your answer to my first question isn’t really an answer—“they will, if sufficiently improved, be quite human—they will behave in a quite human way.” What counts as “quite human?” Also are we just talking about their external behavior now? I thought we were talking about their internal cognition.
You agree about the cluster analysis thing, though, so maybe that’s a way to be more precise about this. The claim you are hoping to see argued for is: “If we magically had access to the cognition of all current humans and LLMs, with mechinterp tools etc. to automatically understand and categorize it, and we did a cluster analysis of the whole human+LLM population, we’d find that there are two distinct clusters: the human cluster and the LLM cluster.” Is that right?

If so, then here’s how I’d make the argument. I’d enumerate a bunch of differences between LLMs and humans, differences like “LLMs don’t have bodily senses,” “LLMs experience way more text over the course of their training than humans experience in their lifetimes,” “LLMs have way fewer parameters,” and “LLMs’ internal learning rule is SGD whereas humans use Hebbian learning or whatever,” and so forth. Then for each difference I’d say “this seems like the sort of thing that might systematically affect what kind of cognition happens, to an extent greater than typical intra-human differences like skin color, culture-of-childhood, language-raised-with, etc.”

Then I’d add it all up and be like “even if we are wrong about a bunch of these claims, it still seems like overall the cluster analysis is gonna keep humans and LLMs apart instead of mingling them together. Like, what the hell else could it do? Divide everyone up by language, maybe, and have primarily-English LLMs in the same cluster as humans raised speaking English, and then non-English speakers and non-English LLMs in the other cluster? That’s probably my best guess as to how else the cluster analysis could shake out, and it doesn’t seem very plausible to me. And even if it were true, it would be true on the level of ‘what concepts are used internally’ rather than more broadly about the stuff that really matters, like what the goals/values/architecture of the system is (i.e. how those concepts are used).”
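To make the clustering claim concrete, here’s a toy sketch of what I mean. It is purely illustrative: the feature names, the numbers, and the use of scikit-learn’s KMeans are all just stand-ins, since nobody actually has a way to reduce a mind’s cognition to a feature vector.

```python
# Purely hypothetical: pretend we could score each mind on a few cognitive
# dimensions. Every feature name and number below is made up for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Columns: [embodiment, log10(lifetime tokens), log10(parameter count), SGD-vs-Hebbian]
humans = rng.normal(loc=[1.0, 9.0, 14.0, 0.0], scale=0.3, size=(1000, 4))
llms = rng.normal(loc=[0.0, 13.0, 11.0, 1.0], scale=0.3, size=(50, 4))

minds = np.vstack([humans, llms])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(minds)

# If the between-group gaps dominate the intra-human spread, the two clusters
# recovered by k-means line up almost perfectly with the human/LLM split.
human_cluster = np.bincount(labels[:1000]).argmax()
llm_cluster = np.bincount(labels[1000:]).argmax()
print("humans and LLMs land in different clusters:", human_cluster != llm_cluster)
```

The real argument is over whether the between-group differences actually swamp intra-human variation; the sketch just shows what “two distinct clusters” would cash out to if they do.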