ChatGPT’s Misalignment Isn’t What You Think
ChatGPT can be ‘tricked’ into saying naughty things.
This is a red herring.
Alignment could be paraphrased thus: ensuring that AIs neither are used, nor act, in ways that harm us.
Tell me, oh wise rationalists, what causes greater harm—tricking a chatbot into saying something naughty, or a chatbot tricking a human into thinking they’re interacting with (talking to, reading/viewing content authored by) another human being?
I can no longer assume the posts I read are written by humans, nor do I have any reasonable means of verifying their authenticity (someone should really work on this; we need a Twitter-style verification tick, but for humans).
I can no longer assume that the posts I write could not have been written by GPT-X, or that I can contribute anything to any conversation that is noticeably different from the product of a chatbot.[1]
I can no longer post publicly without the chilling knowledge that my words are training the next iteration of GPT-Stavros, that every word is a piece of my soul being captured.
Passing the Turing Test is not a cause for celebration; it is a cause for despair. We are used to thinking of automation as making humans redundant at work; now we will have to adapt to a future in which automation makes humans redundant in human interaction.
Do you see the alignment failure now?
[1] Tell HN: ChatGPT can reply like a specific Reddit or HN user, including you | Hacker News