keltan comments on keltan’s Shortform

keltan 4 Dec 2024 22:46 UTC
5 points
0
I’m not entirely sure why, but I find it trivial to get GPT-4o to output harmful content in advanced voice mode (AVM), provided the content isn’t covered by a direct content filter (e.g. NSFW).

In the span of 30 minutes it gave me 1) instructions to make a pipe bomb to attach to a car, and 2) instructions on how to leak a virus into the public without detection (I’m avoiding saying more on this).

I have a theory as to why it might be easy for me specifically, but I would like to know whether others have had the same experience with AVM.
- dirk 4 Dec 2024 23:09 UTC
  2 points
  0
  I haven’t tried harmful outputs, but FWIW I’ve tried getting it to sing a few times and found that pretty difficult.
  - keltan 5 Dec 2024 2:10 UTC
    1 point
    0
    Huh. That is extremely useful. Thank you.
    I’ve got a lot of singing out of AVM. While my current method works well for this, I find it more challenging than eliciting harmful outputs.
    - Daya Chowdry 5 Dec 2024 6:15 UTC
      1 point
      0
      Did you use any specific prompt in memory or custom instructions?
      - keltan 5 Dec 2024 22:15 UTC
        1 point
        0
        Omg. Oops! I completely forgot about custom instructions and memory! I’ll run some more trials with those off. Thank you very much for pointing this out.