Quibble: reminder that the Tay example is probably not real and shouldn’t be used.
Surely the OpenAI Codex code-vulnerability prompting is a great example?
Thanks for the comment! I'm curious about the OpenAI Codex code-vulnerability prompting; is this written up somewhere? The closest I could find is this, but I don't think that's what you're referencing?
https://arxiv.org/pdf/2107.03374.pdf#page=27
Interesting, thank you! I guess I was thinking of deception as characterized by Evan Hubinger, with mesa-optimizers and all the bells and whistles. But I can see how a sufficiently large competence-vs-performance gap could also count as deception.
Wait! There are doubts about the Tay story? I didn't know that, and I've failed to turn up anything in a few different searches just now. Can you say more, or drop a link if you have one?
I don't want to write an essay about this; it's too stupid an incident for anyone to waste time thinking about, but somehow everyone thinks it's a great example that must be mentioned in every piece on AI risk… Some material: https://news.ycombinator.com/item?id=30739093 https://www.gwern.net/Leprechauns
I was not aware of this, thanks for pointing this out! I made a note in the text. I guess this is not an example of “advanced AI with an unfortunately misspecified goal” but rather just an example of the much larger class of “system with an unfortunately misspecified goal”.
Thanks!