Daniel Kokotajlo comments on What is the best argument that LLMs are shoggoths?

Daniel Kokotajlo 18 Mar 2024 14:29 UTC
2 points
OK, thanks.
Your answer to my first question isn’t really an answer: “they will, if sufficiently improved, be quite human; they will behave in a quite human way.” What counts as “quite human”? Also, are we just talking about their external behavior now? I thought we were talking about their internal cognition.

You agree about the cluster analysis thing though, so maybe that’s a way to be more precise about this. The claim you are hoping to see argued for is “If we magically had access to the cognition of all current humans and LLMs, with mechinterp tools etc. to automatically understand and categorize it, and we did a cluster analysis of the whole human+LLM population, we’d find that there are two distinct clusters: the human cluster and the LLM cluster.”

Is that right?

If so, then here’s how I’d make the argument. I’d enumerate a bunch of differences between LLMs and humans, differences like “LLMs don’t have bodily senses” and “LLMs experience way more text over the course of their training than humans experience in their lifetimes” and “LLMs have way fewer parameters” and “LLMs’ internal learning rule is SGD whereas humans use Hebbian learning or whatever” and so forth, and then for each difference say “this seems like the sort of thing that might systematically affect what kind of cognition happens, to an extent greater than typical intra-human differences like skin color, culture-of-childhood, language-raised-with, etc.” Then add it all up and be like “even if we are wrong about a bunch of these claims, it still seems like overall the cluster analysis is gonna keep humans and LLMs apart instead of mingling them together. Like what the hell else could it do? Divide everyone up by language maybe, and have primarily-English LLMs in the same cluster as humans raised speaking English, and then non-English speakers and non-English LLMs in the other cluster? That’s probably my best guess as to how else the cluster analysis could shake out, and it doesn’t seem very plausible to me, and even if it were true, it would be true on the level of ‘what concepts are used internally’ rather than more broadly about stuff that really matters like what the goals/values/architecture of the system is (i.e. how they are used).”
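
To make the shape of that argument concrete, here is a toy sketch. Everything in it is an assumption for illustration: the "cognition features" are just synthetic Gaussian vectors, `shift` is a made-up stand-in for a systematic human-vs-LLM difference measured in units of within-group variation, and the clustering is a hand-rolled 2-means. It says nothing about real humans or LLMs. It only shows the mechanical point that a cluster analysis splits along group lines when the between-group difference is large relative to the within-group spread.

```python
import random


def make_population(n, dims, shift, rng):
    """Draw n synthetic feature vectors: unit Gaussian noise around `shift` in each dimension."""
    return [[rng.gauss(shift, 1.0) for _ in range(dims)] for _ in range(n)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=50):
    """Plain 2-means with a deterministic start: the first point and the point farthest from it."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centroids = [list(c0), list(c1)]
    labels = None
    for _ in range(iters):
        new_labels = [
            0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, truth):
    """Fraction of points that belong to the majority group of their assigned cluster."""
    total = 0
    for k in (0, 1):
        members = [t for lab, t in zip(labels, truth) if lab == k]
        if members:
            total += max(members.count("human"), members.count("llm"))
    return total / len(truth)


def experiment(shift, n=100, dims=5, seed=0):
    """Cluster n hypothetical 'humans' and n hypothetical 'LLMs' and report cluster purity."""
    rng = random.Random(seed)
    humans = make_population(n, dims, 0.0, rng)   # hypothetical human features
    llms = make_population(n, dims, shift, rng)   # same noise, systematically shifted
    truth = ["human"] * n + ["llm"] * n
    return purity(two_means(humans + llms), truth)


if __name__ == "__main__":
    for shift in (0.0, 1.0, 10.0):
        print(f"shift={shift}: purity={experiment(shift):.2f}")
```

Purity near 0.5 means the two clusters mingle the groups; purity near 1.0 means the clusters are the groups. In this toy, a shift that is large relative to the within-group spread should push purity to 1.0, while a shift of zero should leave it close to 0.5. The actual disagreement is about which regime the real differences put us in, and the sketch can't settle that.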