I think some of the time they’d agree with the clarified meaning… but also “often” they would treat it as an adversarial clarification and perhaps threateningly insinuate that you should stop adding clarity near their game.
(For reference: I’m not a nerd, I’m a language geek, and I think the main barrier to making really plausible and “human feeling” chatbots is (in some sense) figuring out how to make them capable enough of manipulative insinuation (and defense from such attacks) that their powers start to feel like maybe they NEED to be Friendly for the machinery to feel safe to release into the wild?)