abramdemski comments on The Waluigi Effect (mega-post)

abramdemski 13 Mar 2023 19:32 UTC
LLMs are high-order Markov models, which means they can't really balance two different hypotheses in the way you describe. Because evidence eventually drops out of memory, the probability of the waluigi becomes very small rather than dropping to zero. This makes an eventual waluigi transition inevitable, as claimed in the post.
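
The argument above can be illustrated with a toy sketch. Everything in it is a hypothetical assumption, not something from the post: there are only two hypotheses, the luigi never misbehaves, the waluigi behaves well on each token with probability 0.9, and the first misbehaving token counts as the waluigi transition. The sketch compares an ideal Bayesian that remembers the whole history with an order-k model that only sees the last k tokens.

```python
import math
from typing import Optional

# Hypothetical toy model (all names and numbers are illustrative assumptions):
# the "luigi" never misbehaves, while the "waluigi" behaves well on any given
# token with probability P_GOOD_WALUIGI. The first misbehaving token is
# treated as the waluigi transition.
P_GOOD_WALUIGI = 0.9
PRIOR_WALUIGI = 0.5


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def posterior_waluigi(n_good: int) -> float:
    """P(waluigi | n_good well-behaved tokens in context), via log-odds."""
    log_odds = math.log(PRIOR_WALUIGI / (1 - PRIOR_WALUIGI)) + n_good * math.log(P_GOOD_WALUIGI)
    return sigmoid(log_odds)


def prob_no_transition(steps: int, window: Optional[int]) -> float:
    """Probability of generating `steps` tokens with no misbehaviour.

    window=None: an ideal Bayesian that conditions on the whole history.
    window=k: an order-k Markov model that only sees the last k tokens,
    so older evidence in favour of the luigi drops out of memory.
    """
    survive = 1.0
    for t in range(steps):
        seen = t if window is None else min(t, window)
        # Only the waluigi misbehaves, so the predictive chance of a bad token
        # is P(waluigi | visible context) * P(bad | waluigi).
        p_bad = posterior_waluigi(seen) * (1 - P_GOOD_WALUIGI)
        survive *= 1 - p_bad
    return survive


print(prob_no_transition(2000, window=None))  # converges to the prior on luigi, 0.5
print(prob_no_transition(2000, window=20))    # decays towards 0 as steps grow
```

With unbounded memory the survival probability telescopes to 0.5 + 0.5 · 0.9^T, so there is a real chance of never transitioning. With a 20-token window, P(waluigi | context) never falls below its value after 20 good tokens (about 0.11), so the per-step hazard is bounded below and survival goes to zero.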
- Cleo Nardo 13 Mar 2023 23:08 UTC
  You’re correct. The finite context window biases the dynamics towards simulacra which can be evidenced by short prompts, i.e. biases away from luigis and towards waluigis.
  
  But let me be more pedantic and less dramatic than I was in the article: the waluigi transitions aren't inevitable. The waluigis are approximately absorbing classes in the Markov chain, but there are other approximately absorbing classes which the luigi can fall into. For example, endlessly cycling through the same word (mode collapse) is also an approximately absorbing class.
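
  A toy sketch of the point about competing absorbing classes, under hypothetical assumptions: the chain runs over three coarse "simulacra classes" rather than over tokens, and the transition probabilities A, B and EPS are made up. When the luigi can leak into more than one approximately absorbing class, its mass splits between them roughly in the ratio of the exit rates, so the waluigi is likely but not certain.

```python
# Hypothetical 3-state chain over simulacra classes (illustrative numbers only).
STATES = ("luigi", "waluigi", "collapse")
A = 0.02    # per-step probability luigi -> waluigi
B = 0.01    # per-step probability luigi -> mode collapse
EPS = 1e-6  # small leak out of each approximately absorbing class

TRANSITIONS = {
    "luigi": {"luigi": 1 - A - B, "waluigi": A, "collapse": B},
    "waluigi": {"luigi": EPS, "waluigi": 1 - EPS, "collapse": 0.0},
    "collapse": {"luigi": EPS, "waluigi": 0.0, "collapse": 1 - EPS},
}


def step(dist):
    """One step of the chain: push probability mass along every transition."""
    new = {s: 0.0 for s in STATES}
    for src, p in dist.items():
        for dst, q in TRANSITIONS[src].items():
            new[dst] += p * q
    return new


def evolve(steps):
    """Distribution over classes after `steps` steps, starting from the luigi."""
    dist = {"luigi": 1.0, "waluigi": 0.0, "collapse": 0.0}
    for _ in range(steps):
        dist = step(dist)
    return dist


final = evolve(2000)
print(final)  # roughly: waluigi 2/3, collapse 1/3, luigi near 0
```

  Because both absorbing classes leak at the same rate and the luigi always exits in the ratio A:B, the waluigi share settles near A / (A + B) rather than at 1.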
  - abramdemski 13 Mar 2023 23:15 UTC
    What report is the image pulled from?
    - Cleo Nardo 13 Mar 2023 23:21 UTC
      “Open Problems in GPT Simulator Theory” (forthcoming)
      Specifically, this is a chapter on the preferred basis problem for GPT Simulator Theory.
      
      TL;DR: GPT Simulator Theory says that the language model $\mu : T^{k} \to \Delta(T)$ decomposes into a linear interpolation $\mu = \sum_{s \in S} \alpha_{s} \mu_{s}$, where each $\mu_{s} : T^{k} \to \Delta(T)$ is a “simulacrum” and the amplitudes $\alpha_{s}$ update in an approximately Bayesian way. However, this decomposition is non-unique, making GPT Simulator Theory either ill-defined, arbitrary, or trivial. By comparing this problem to the preferred basis problem in quantum mechanics, I construct various potential solutions and compare them.
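
      A minimal sketch of the decomposition and its non-uniqueness, under hypothetical assumptions: a two-token vocabulary, simulacra that are i.i.d. biased coins, and exact (rather than approximate) Bayesian updates of the amplitudes $\alpha_s$. The helper names (`iid`, `mixture`, `seq_prob`) are illustrative. Three different sets of simulacra (three coins; a two-coin mixture plus a fair coin; and the trivial one-element decomposition) produce the same $\mu$ on every context.

```python
from fractions import Fraction as F
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Token = str
Context = Tuple[Token, ...]
Model = Callable[[Context], Dict[Token, F]]
VOCAB = ("a", "b")


def iid(p_a: F) -> Model:
    """Hypothetical simulacrum: emits 'a' with fixed probability p_a."""
    return lambda ctx: {"a": p_a, "b": 1 - p_a}


def seq_prob(model: Model, ctx: Context) -> F:
    """Probability the model assigns to the whole context, token by token."""
    prob = F(1)
    for i, tok in enumerate(ctx):
        prob *= model(ctx[:i])[tok]
    return prob


def mixture(components: Sequence[Model], priors: Sequence[F]) -> Model:
    """mu(x | ctx) = sum_s alpha_s(ctx) * mu_s(x | ctx), where the amplitude
    alpha_s(ctx) is the Bayesian posterior prior_s * P_s(ctx), renormalised."""
    def mu(ctx: Context) -> Dict[Token, F]:
        weights = [a * seq_prob(m, ctx) for m, a in zip(components, priors)]
        total = sum(weights)
        alphas = [w / total for w in weights]
        return {x: sum(al * m(ctx)[x] for al, m in zip(alphas, components)) for x in VOCAB}
    return mu


# Three decompositions of one and the same language model.
flat = mixture([iid(F(9, 10)), iid(F(1, 2)), iid(F(1, 10))], [F(1, 3)] * 3)
nested = mixture(
    [mixture([iid(F(9, 10)), iid(F(1, 10))], [F(1, 2), F(1, 2)]), iid(F(1, 2))],
    [F(2, 3), F(1, 3)],
)
trivial = mixture([flat], [F(1)])

for n in range(4):
    for ctx in product(VOCAB, repeat=n):
        assert flat(ctx) == nested(ctx) == trivial(ctx)
```

      Exact `Fraction` arithmetic is used so that the three decompositions can be compared with `==`; Bayesian mixtures are associative, which is why regrouping the simulacra leaves $\mu$ unchanged.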