I agree about the “longer responses”.

I’m unsure about the “personality trait” framing. There are two senses of “introspection” for humans. One is introspecting on your current mental state (“I feel a headache starting”); the other is being introspective about patterns in your behavior (e.g. “I tend to dislike violent movies” or “I tend to be shy around new people”). The former sense is more relevant to philosophy and psychology, and less often discussed in daily life. The issue with the latter sense is that a model may not have privileged access to facts like this: if another model had the same observational data, it could learn the same fact.
So I’m most interested in the former kind of introspection, or in cases of the latter where it would take large and diverse datasets (which are hard to construct) for another model to reach the same generalization.
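To make the “no privileged access” point concrete, here is a minimal sketch (the data, threshold, and function names are all invented for illustration): any observer with the same behavioral logs recovers the same trait, so stating the trait is not evidence of introspective access.

```python
# Toy behavioral log for one agent: (violence_level, liked) pairs.
# All values are invented for illustration.
logs = [(0.9, False), (0.8, False), (0.7, False), (0.1, True), (0.2, True)]

def infer_trait(observations):
    """Nothing here requires *being* the agent; any observer can run this."""
    reactions_to_violent = [liked for violence, liked in observations if violence > 0.5]
    if reactions_to_violent and not any(reactions_to_violent):
        return "tends to dislike violent movies"
    return "no clear aversion"

# The agent's "self-report" and an outside model's inference coincide,
# because both are functions of the same observational data.
self_report = infer_trait(logs)
outside_view = infer_trait(logs)
assert self_report == outside_view
print(self_report)  # -> tends to dislike violent movies
```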
> One is introspecting on your current mental state (“I feel a headache starting”)
That’s mostly what I had in mind as well. It still implies the ability to access a hierarchical model of your current state.
You’re not just able to access low-level facts like “I am currently outputting the string ‘disliked’”; you also have access to high-level facts like “I disliked the third scene because it was violent”, “I found the plot arcs boring”, and “I hated this movie”, from which the low-level behaviors are generated.
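As a minimal sketch of that hierarchy (the class and field names are invented; this is not a claim about any real model’s internals), the low-level output is generated from high-level state, and introspective access here means being able to read the state, not just the emitted string:

```python
from dataclasses import dataclass

@dataclass
class MovieOpinion:
    # High-level facts, from which the behavior is generated.
    hated_movie: bool
    scene_reactions: dict   # e.g. {"scene_3": "disliked: violent"}
    plot_arcs_boring: bool

def emitted_string(state: MovieOpinion) -> str:
    # The low-level behavior is a function of the high-level state.
    return "disliked" if state.hated_movie else "liked"

state = MovieOpinion(
    hated_movie=True,
    scene_reactions={"scene_3": "disliked: violent"},
    plot_arcs_boring=True,
)

print(emitted_string(state))    # low-level fact: the output string
print(state.scene_reactions)    # high-level fact: *why* it was disliked
print(state.plot_arcs_boring)   # another high-level fact behind the output
```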
Or using your example, “I feel a headache starting” is itself a high-level claim. The low-level claim is “I am experiencing a negative-valence sensation of magnitude X in sensory modality A”, and the concept of a “headache” is a natural abstraction over a dataset of such low-level sensory experiences.
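In the same toy style (the modality label and threshold are arbitrary stand-ins, not a proposed model of sensation), the abstraction maps low-level (modality, valence, magnitude) reports onto the high-level label:

```python
def low_level_report(modality: str, valence: float, magnitude: float) -> dict:
    """The raw claim: a sensation in modality A, with a valence and a magnitude."""
    return {"modality": modality, "valence": valence, "magnitude": magnitude}

def abstract_label(report: dict) -> str:
    # Crude stand-in for the learned abstraction: "headache" labels a cluster
    # of negative-valence, head-localized sensations above some magnitude.
    if (report["modality"] == "head"
            and report["valence"] < 0
            and report["magnitude"] > 0.3):
        return "I feel a headache starting"
    return "no salient high-level label"

sensation = low_level_report("head", valence=-0.6, magnitude=0.4)
print(abstract_label(sensation))  # -> I feel a headache starting
```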