Jailbreaking LLMs through input images might end up being a nasty problem. It's likely much harder to defend against than text jailbreaks, because images are a continuous space: an attacker can optimize a perturbation directly with gradients instead of searching over discrete tokens. Despite a decade of research, we don't know how to make vision models adversarially robust.
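To make the "continuous space" point concrete, here is a minimal sketch of a gradient-based (FGSM-style) perturbation against a toy image classifier. The model, shapes, and budget are placeholders, not anyone's actual setup; the point is only that differentiating through the model and nudging pixels is trivially available for images in a way it isn't for text.

```python
# Minimal sketch: gradient-based adversarial perturbation of a continuous
# image input. The toy classifier is a stand-in for any differentiable
# vision model; nothing here reflects a specific deployed system.
import torch
import torch.nn as nn

# Placeholder image classifier.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

image = torch.rand(1, 3, 224, 224)   # benign input
target = torch.tensor([3])           # class the attacker wants
epsilon = 8 / 255                    # perturbation budget

# Because pixels are continuous, we can differentiate straight through the
# model and step the input toward the attacker's objective (one FGSM step).
image.requires_grad_(True)
loss = nn.functional.cross_entropy(model(image), target)
loss.backward()
adversarial = (image - epsilon * image.grad.sign()).clamp(0, 1).detach()

# Discrete text has no direct analogue of this step: there is no gradient
# to follow in token space, which is part of why image inputs widen the
# attack surface.
```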
As of late August, Jan Leike suggested that OpenAI doesn’t have any good defenses:
https://twitter.com/janleike/status/1695153523527459168
I think that’s a bit more recent than the last confirmation I remembered seeing, but yes, exactly.