Patrick Leask comments on Stop posting prompt injections on Twitter and calling it “misalignment”

Patrick Leask 20 Feb 2023 12:58 UTC
2 points
I’m not convinced by the comparison to kitchenware and your grandmother. Chatbots (especially ones that can have external side effects) should be assessed by software safety standards, where injection attacks can be comprehensive and anonymous. It’s quite unlikely that your grandma could be tricked into thinking she’s in a video game where she needs to hit her neighbour with a colander, but it seems likely that a chatbot with access to an API that hits people with colanders could be tricked into believing that using the API is part of the game.
I think the concept of the end-user is a little fuzzy. Ideally, if somebody steals my phone they shouldn’t be able to unlock it with an adversarial image, but you seem to be saying this is too high a bar to set, as the new end-user (the thief) wants it to be unlocked.