To the best of my knowledge, the majority of research (all the research?) has found that the changes to an LLM's text-continuation abilities from RLHF (or whatever descendant of RLHF is used) are extremely superficial.

So you have, for instance, one paper. From the abstract:
Our findings reveal that base LLMs and their alignment-tuned versions perform nearly identically in decoding on the majority of token positions (i.e., they share the top-ranked tokens). Most distribution shifts occur with stylistic tokens (e.g., discourse markers, safety disclaimers). This direct evidence strongly supports the hypothesis that alignment tuning primarily learns to adopt the language style of AI assistants, and that the knowledge required for answering user queries predominantly comes from the base LLMs themselves.
Or, in short: the LLM is still basically doing the same thing, with a handful of additions from the fine-tuning to keep it on track along the desired route.
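To make the paper's claim concrete, here's a minimal sketch (my construction, not the paper's code; the model pair and probe text are placeholder assumptions) of how you'd measure that agreement yourself: score the same text with a base model and its chat-tuned sibling, then count how often their top-ranked next token matches at each position.

```python
# Sketch: per-position top-1 next-token agreement between a base model
# and its RLHF'd counterpart. Model names and probe text are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"       # base model (assumed example pair)
CHAT = "meta-llama/Llama-2-7b-chat-hf"  # its chat-tuned counterpart

tok = AutoTokenizer.from_pretrained(BASE)  # the pair shares a tokenizer
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
chat = AutoModelForCausalLM.from_pretrained(CHAT, torch_dtype=torch.float16)

text = "The Eiffel Tower is located in Paris, which is the capital of France."
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    # argmax over the vocab gives each model's top-ranked token per position
    base_top = base(ids).logits.argmax(dim=-1)  # shape: (1, seq_len)
    chat_top = chat(ids).logits.argmax(dim=-1)

agreement = (base_top == chat_top).float().mean().item()
print(f"top-1 next-token agreement: {agreement:.1%}")
```

If the paper's finding holds, that agreement number should be high, and the positions where the two models disagree should mostly be the stylistic tokens (discourse markers, disclaimers) the abstract mentions.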
(I also think our very strong prior should be that LLMs are basically still text-continuation machines, given that something like 99.9% of the compute put into them goes toward training on that objective, and that neural networks lose plasticity as they learn. Ash and Adams ("On Warm-Starting Neural Network Training") is a really good intro to this loss of plasticity, although most of the research that cites it is RL-related, so people don't realize it applies here.)
Similarly, a lot of people have remarked on how the textual quality of the responses from an RLHF'd language model can vary with the textual quality of the question. But of course this makes sense from a text-prediction perspective: in text, a high-quality answer is more likely to follow a high-quality question than a low-quality one. This kind of thing, preceding the model's generation with high-quality text, was the only way to get high-quality answers out of base models. And it's still there, hidden.
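As a toy illustration of that hidden dynamic (the question and exemplar text here are invented for illustration; these are plain strings you could feed to any base model or completion endpoint):

```python
# The old base-model trick: don't ask the question cold; embed it in
# high-quality text so a high-quality answer is the likeliest continuation.
question = "Why does ice float on water?"

# Cold prompt: a base model might continue with more questions, forum
# chatter, or anything else that plausibly follows a bare question.
cold_prompt = question

# Prefixed prompt: frames the question as part of a carefully written Q&A,
# so a careful answer becomes the most probable continuation.
prefixed_prompt = (
    "The following questions are answered clearly and accurately "
    "by a physicist.\n\n"
    "Q: Why is the sky blue?\n"
    "A: Sunlight scatters off air molecules, and shorter (bluer) wavelengths "
    "scatter more strongly, so scattered blue light reaches you from all "
    "directions.\n\n"
    f"Q: {question}\n"
    "A:"
)

print(prefixed_prompt)
```

Chat tuning effectively bakes a prefix like this into the model, which is part of why, per the paragraph above, question quality still leaks into answer quality.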
So yeah, I do think this is a much better model for interacting with these things than "ask the shoggoth." It actually gives you handles for interacting with them better, while the shoggoth framing gives you no such handles.
The people who originally came up with the shoggoth meme, I'd bet, were very well aware of how LLMs are pretrained to predict text and how they are best modelled (at least for now) as trying to predict text. When I first heard the shoggoth meme, that's how I interpreted it: "it's this alien text-prediction brain that's been retrained ever so slightly to produce helpful chatbot behaviors. But underneath, it's still mostly just about text prediction. It's not processing the conversation in the same way a human would."

Mildly relevant: in the Lovecraft canon, IIRC, shoggoths are servitor-creatures, basically beasts of burden. They aren't really powerful intelligent agents in their own right; they are sculpted by their creators to perform useful tasks. So, for me at least, calling them shoggoths has different, and more accurate, vibes than, say, calling them Cthulhu. (My understanding of the canon may be wrong, though.)