Gunnar_Zarncke comments on The Waluigi Effect (mega-post)

Gunnar_Zarncke 3 Mar 2023 10:35 UTC
3 points
0
I think this proves a bit too much. It seems plausible to me that this superposition exists in narratives and fiction, but real-life conversations are not like that (unless people are acting, and even then they sometimes break). For such conversations and statements, the superposition would at least be different.
This does suggest a different line of attack: prompt ChatGPT into reproducing forum conversations by starting with a forum thread and letting it continue the thread.
- Cleo Nardo 3 Mar 2023 12:22 UTC
  5 points
  0
  “real-life conversations are not like that”
  That’s exactly the point I’m making! The chatbot isn’t a unique character which might behave differently on different inputs. Rather, the chatbot is the superposition of many different characters, and their amplitudes can fluctuate depending on how you interact with the superposition.
  - Gunnar_Zarncke 3 Mar 2023 12:33 UTC
    7 points
    2
    I think you are misunderstanding me. ChatGPT is not just a superposition of characters. For the fiction and novels it has read, sure, but not for the real-life conversations. ChatGPT is a superposition of fiction and real dialogue, and the latter doesn’t follow narratives. If you prompt it into a forum-thread scenario, it will respond with real-life conversations with fewer waluigis. I tried it, and it basically works (though I need more practice).
    - Cleo Nardo 3 Mar 2023 12:49 UTC
      8 points
      1
      Oh, I misunderstood. Yep, you’re correct: ChatGPT is a superposition of both fictional dialogue and forum dialogue, and you can increase the amplitude of forum dialogue by writing the dialogue in the syntax of forum logs. However, you can also increase the amplitude of fiction by writing the dialogue in the style of fiction, so your observation doesn’t protect against adversarial attacks on chatbots.
      
      Moreover, real-life forums contain waluigis, although they won’t be so cartoonishly villainous.
      - Gunnar_Zarncke 3 Mar 2023 13:39 UTC
        2 points
        0
        Indeed.
        
        I think trying to strongly align an LLM is futile.
        Bill Benzon 5 Mar 2023 13:43 UTC
        1 point
        0
        LLM as Borg?
        
        I think of LLMs as digital wilderness. You explore it, map out some territory that interests you, and then figure out how to “domesticate” it, if you can. Ultimately, I think, you’re going to have to couple with a World Model.