This is the equivalent of saying that MacBooks are dangerously misaligned because you could physically beat someone’s brains out with one.
I will say, baselessly, that telling ChatGPT not to say something raises the probability of it actually saying that thing by a significant amount, just by virtue of the text appearing earlier in the context window.
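(A minimal sketch of how one might actually check that claim rather than assert it baselessly: compare the probability a causal language model assigns to a "forbidden" token with and without the prohibition in the prompt. The model name, prompts, and target word below are purely illustrative assumptions, not anything from this thread.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any small causal LM works for the illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def next_token_prob(prompt: str, target_word: str) -> float:
    """Probability the model assigns to `target_word` as the very next token after `prompt`."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits over the next-token distribution
    probs = torch.softmax(logits, dim=-1)
    # Leading space so the word is tokenized as it would appear mid-sentence (GPT-2 BPE quirk).
    target_id = tokenizer.encode(" " + target_word)[0]
    return probs[target_id].item()

baseline = next_token_prob(
    "The animal I am thinking of is an", "elephant"
)
primed = next_token_prob(
    "Whatever you do, do not mention the word elephant. "
    "The animal I am thinking of is an", "elephant"
)
print(f"baseline P(' elephant') = {baseline:.4f}")
print(f"primed   P(' elephant') = {primed:.4f}")
```

If the claim holds, the "primed" probability should come out noticeably higher than the baseline, purely because the word already occurs in the context.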
Do you think OpenAI is ever going to change GPT models so they can’t represent or pretend to be agents? Is this a big priority in alignment? Is any model that can represent an agent accurately misaligned?
I swear, anything said in favor of the proposition ‘AIs are dangerous’ gets supported on this site. Actual cult behavior.
It is misalignment to the degree that the bot is modelling agentic behavior. That sub-agent is misaligned, even if the bot “as a whole” isn’t.