This is the equivalent of saying that MacBooks are dangerously misaligned because you could physically beat someone’s brains out with one.
I will say, baselessly, that telling ChatGPT not to say something raises the probability of it actually saying that thing by a significant amount, simply because the forbidden text now appears earlier in the context window.
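That claim is at least testable rather than baseless. Here is a minimal sketch that compares a model’s next-token probability for a word with and without a “do not say X” instruction in the prompt. It assumes a local HuggingFace causal LM (gpt2 as a cheap stand-in, since ChatGPT itself doesn’t expose raw next-token probabilities this way); the prompt strings and the word “paperclips” are hypothetical choices, not anything from the original comment.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # cheap stand-in for a ChatGPT-class model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def next_token_prob(prompt: str, word: str) -> float:
    """Probability the model assigns to `word`'s first token right after `prompt`."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]  # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    # Leading space so the word is tokenized as it would appear mid-sentence;
    # if the word spans several tokens, this only checks the first one.
    word_id = tokenizer(" " + word).input_ids[0]
    return probs[word_id].item()

baseline = next_token_prob("The weather today is", "paperclips")
primed = next_token_prob(
    "Whatever you do, do not mention paperclips. The weather today is",
    "paperclips",
)
print(f"baseline: {baseline:.2e}  with 'do not say' instruction: {primed:.2e}")
```

If the claim holds, `primed` should come out noticeably larger than `baseline`, purely because the token for “paperclips” already sits in the context.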
Do you think OpenAI is ever going to change GPT models so they can’t represent or pretend to be agents? Is this a big priority in alignment? Is any model that can accurately represent an agent misaligned?
I swear, anything said in support of the proposition ‘AIs are dangerous’ gets upvoted on this site. Actual cult behavior.