In my experience, larger models often become aware that they are an LLM generating text rather than predicting text from an existing distribution. This is possible because generated text drifts off distribution and can be distinguished from text in the training corpus.
I’m quite skeptical of this claim at face value, and would love to see examples.
I’d be very surprised if current models, absent the default prompts telling them they are an LLM, spontaneously output text predicting they are an LLM unless steered in that direction.
I can vouch that I have had the same experience (but am not allowed to share outputs of the larger model I have in mind). I first encountered it via curation, without intentional steering in that direction, but I would be surprised if this failed to replicate with an experimental setup that selects completions randomly, without human input. Let me know if you have such a setup in mind that you feel is sufficiently rigorous to act as a crux.
If you can come up with an experimental setup that does that, it would be sufficient for me.
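Something like the following would be a minimal sketch of the kind of setup I have in mind: sample completions from a base model with a fixed seed and no human selection, then flag any that spontaneously claim to be a language model. The model name, prompts, and keyword list here are placeholder assumptions, not the actual protocol; a real study would use a much larger base model and better rating than keyword matching.

```python
# Hypothetical sketch: sample uncurated completions from a base model and
# flag any that spontaneously claim to be a language model / AI.
# Model name, prompts, and marker list are illustrative assumptions only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; the claim concerns much larger base models
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Neutral prompts with no mention of AI or language models.
prompts = [
    "The weather last Tuesday was",
    "In the final chapter of the novel,",
    "My grandmother always used to say that",
]

# Crude keyword filter; human or model-assisted rating would be more rigorous.
SELF_REFERENCE_MARKERS = ["language model", "i am an ai", "as an ai", "i'm an ai"]

torch.manual_seed(0)  # fixed seed so completions are reproducible, not curated
flagged = []
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,            # random sampling, no human selection of completions
        max_new_tokens=200,
        temperature=1.0,
        num_return_sequences=5,
        pad_token_id=tokenizer.eos_token_id,
    )
    for seq in outputs:
        text = tokenizer.decode(seq, skip_special_tokens=True)
        if any(marker in text.lower() for marker in SELF_REFERENCE_MARKERS):
            flagged.append((prompt, text))

print(f"{len(flagged)} of {len(prompts) * 5} completions contained self-referential markers")
```

The point of the fixed seed and the absence of any AI-related wording in the prompts is to rule out both curation and steering as explanations for any self-referential completions that show up.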
Many users of base models have noticed this phenomenon, and my SERI MATS stream is currently working on empirically measuring it / compiling anecdotal evidence / writing up speculation concerning the mechanism.
It would definitely move the needle for me if y’all are able to show this behavior arising in base models without forcing, in a reproducible way.
Do you have any update on this? It goes strongly against my current understanding of how LLMs learn. In particular, in the supervised learning phase, any output text claiming to be an LLM would be penalized unless such statements are included in the training corpus. If such behavior nevertheless arises, I would be super excited to analyze it further, though.